Andre-lab · wpotrzebowski · May 12, 2026 · May 19, 2026
diff --git a/.streamlit/config.toml b/.streamlit/config.toml
@@ -0,0 +1,14 @@
+[server]
+headless = true
+enableCORS = false  # Streamlit overrides this to true while enableXsrfProtection = true
+enableXsrfProtection = true
+
+[browser]
+gatherUsageStats = false
+
+[theme]
+primaryColor = "#045C64"
+backgroundColor = "#FFFFFF"
+secondaryBackgroundColor = "#F4F6F4"
+textColor = "#2D3436"
+font = "sans serif"
diff --git a/MANIFEST b/MANIFEST
@@ -7,6 +7,13 @@ psisloo.py
 setup.py
 stan_models.py
 stan_utility.py
-statistics.py
+bioce_statistics.py
 variationalBayesian.py
+fullBayesianMultimodal.py
+multimodal_io.py
+prepareHDX.py
+prepareXLMS.py
+bioce_pipeline.py
+structure_observables.py
+app.py
 vbw_sc.i
diff --git a/README.md b/README.md
@@ -32,6 +32,27 @@ python variationalBayesian.py –-help
 ```
 If you see no errors but options menu pops up, you are good to go.
 
+### PyStan 3 (full Bayesian inference)
+
+Full Bayesian sampling (`fullBayesian.py`, `fullBayesianTR.py`) uses **PyStan 3**.
+Install the **`pystan`** package from pip; inside the activated conda env, the `pip` section of `bioce.yml` already requests `pystan>=3.9,<4`. In code the package is imported as **`import stan`** (not `import pystan`). See the [PyStan upgrading guide](https://pystan.readthedocs.io/en/latest/upgrading.html).
+
+**Platforms:** PyStan 3 is supported on **Linux and macOS** only (not Windows).
+
+**Smoke test** (checks that `stan` compiles and samples, and that `stan_run` / `stan_utility` run):
+
+```bash
+pip install pytest
+cd /path/to/bioce
+python -m pytest tests/test_pystan_smoke.py -q
+```
+
+The first test only checks **warmup vs sampling split** logic and passes without PyStan. The other two tests **compile and sample** a tiny model; if `pystan` is missing they are **skipped** (pytest exit code 0). With PyStan installed, all three tests should pass.
+
+Some httpstan builds reject sampler options such as `parallel_chains` or `show_progress`; `stan_run.build_and_sample` then retries with only `num_chains`, `num_warmup`, and `num_samples` (parallelism and progress follow backend defaults).
+
+`pytest.ini` in the repo root sets `pythonpath = .` so `stan_run` imports work without installing the package.
+
 ## Problems with installation on OSX 10.14 (Mojave)
 There is a known issue with xcode installation on OSX 10.14 (Mojave)
 If you see the following error:
@@ -98,6 +119,56 @@ which graphically ilustrated distribution of population weights (shown below) an
 3. Script also returns text file containing Q vector, experimental intensity,
 model intensity and experimental error.
 
+### Streamlit web app (SciLifeLab Serve)
+
+A browser UI runs the full PDB → observables → Stan pipeline:
+
+```bash
+pip install streamlit biopython "pystan>=3.9,<4"
+streamlit run app.py
+```
+
+Upload PDB files (or a zip archive) and experimental SAXS / HDX / XL files, then run inference. See `serve/SERVE.md` for Docker deployment on [SciLifeLab Serve](https://serve.scilifelab.se/docs/).
+
+SAXS from PDB requires **FoXS** or **Pepsi-SAXS** on `PATH`. HDX and XL use **BioPython** (peptide SASA proxy and Cα distances).
+
+### HDX-MS and XL-MS (multimodal inference)
+
+Ensemble-averaged HDX uptake and probabilistic XL-MS distance restraints can be combined with SAXS (or used alone) via `fullBayesianMultimodal.py`. Per-conformer observables must be precomputed (e.g. protection factors or Cα distances); Stan multiplies independent likelihoods over a shared `simplex` weight vector.
+
+**Prepare inputs**
+
+```bash
+# One file per conformer in a directory (same row order: flattened peptide×time)
+python prepareHDX.py -d hdx_predictions/ -o SimulatedHDX.txt
+
+# One distance file per conformer (rows = crosslinks, same order as xl_restraints.dat)
+python prepareXLMS.py -d xl_distances/ -o SimulatedXLdistances.txt
+```
+
+`hdx_exp.dat`: two columns `uptake sigma` per observation.  
+`xl_restraints.dat`: `res_i res_j z [d_max] [tau]` (`z=1` observed link; default `d_max=30`, `tau=3` Å).
+
+**Run inference**
+
+```bash
+# SAXS + HDX + XL
+python fullBayesianMultimodal.py \
+  -p weights.txt -f structures.txt \
+  -e simulated.dat -s SimulatedIntensities.txt \
+  -H hdx_exp.dat -S SimulatedHDX.txt \
+  -X xl_restraints.dat -D SimulatedXLdistances.txt \
+  -i 2000 -c 4 -j 4
+
+# HDX only, XL only, or XL only with the false-positive mixture (same -p priors)
+python fullBayesianMultimodal.py -p weights.txt -H hdx_exp.dat -S SimulatedHDX.txt
+python fullBayesianMultimodal.py -p weights.txt -X xl_restraints.dat -D SimulatedXLdistances.txt
+python fullBayesianMultimodal.py -p weights.txt -X xl_restraints.dat -D SimulatedXLdistances.txt --xl-fp-mixture
+```
+
+XL satisfaction is a soft logistic function of the crosslinked Cα–Cα distance `d_k` in conformer `k`; the ensemble probability is  
+`p_sat = sum_k w_k * sigmoid((d_max - d_k) / tau)`, with `z_l ~ Bernoulli(p_sat)`.
+
 ### Using chemical shift data
 1. In order to use chemical shift data, one needs to install SHIFTX2. This can be done by following instructions at:
 [SHIFTX2](http://www.shiftx2.ca/download.html). This requires running python 2.6 or later, which won't work with
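
The XL satisfaction model described in the README hunk above (`p_sat = sum_k w_k * sigmoid((d_max - d_k) / tau)` with `z_l ~ Bernoulli(p_sat)`) can be illustrated with a small NumPy sketch. This is not the Stan model from the PR. The function names and the `(crosslinks, conformers)` array layout are assumptions chosen to match the "rows = crosslinks" convention stated for the distance files, and the `eps` clipping is added here only to keep the logarithm finite.

```python
import numpy as np


def xl_satisfaction_probability(weights, distances, d_max=30.0, tau=3.0):
    """Ensemble probability that each crosslink is satisfied.

    weights   : shape (K,), non-negative and summing to 1 (a simplex)
    distances : shape (L, K), distance of crosslink l in conformer k, in Å
    """
    weights = np.asarray(weights, dtype=float)
    distances = np.asarray(distances, dtype=float)
    # Per-conformer soft satisfaction: sigmoid((d_max - d_k) / tau)
    per_conformer = 1.0 / (1.0 + np.exp(-(d_max - distances) / tau))
    # p_sat = sum_k w_k * sigmoid(...), one value per crosslink
    return per_conformer @ weights


def xl_log_likelihood(z, p_sat, eps=1e-12):
    """Bernoulli log-likelihood of the observed flags z given p_sat."""
    z = np.asarray(z, dtype=float)
    p = np.clip(p_sat, eps, 1.0 - eps)  # assumption: clip to avoid log(0)
    return float(np.sum(z * np.log(p) + (1.0 - z) * np.log(1.0 - p)))


# Two conformers with equal weight, two crosslinks.
weights = np.array([0.5, 0.5])
distances = np.array([
    [30.0, 30.0],  # exactly at d_max in both conformers
    [10.0, 60.0],  # short in one conformer, long in the other
])
p_sat = xl_satisfaction_probability(weights, distances)
```

With the defaults `d_max=30` and `tau=3`, a crosslink sitting exactly at `d_max` in every conformer has `p_sat = 0.5`, and a crosslink that is short in only one of two equally weighted conformers ends up just below 0.5.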

diff --git a/app.py b/app.py
@@ -0,0 +1,242 @@
+"""
+Bioce Streamlit app: upload PDB ensemble + SAXS / HDX / XL data → Bayesian weights.
+
+Run locally:
+  streamlit run app.py
+"""
+from __future__ import annotations
+
+import tempfile
+from pathlib import Path
+
+import matplotlib.pyplot as plt
+import numpy as np
+import pandas as pd
+import streamlit as st
+
+from bioce_pipeline import PipelineConfig, run_full_pipeline
+from scilifelab_theme import (
+    AQUA,
+    GRAPE,
+    LIME,
+    TEAL,
+    apply_matplotlib_theme,
+    inject_streamlit_theme,
+    render_footer,
+    render_header,
+)
+from structure_observables import _foxs_on_path, _HAS_BIOPYTHON, _pepsi_on_path
+
+st.set_page_config(
+    page_title="Bioce | SciLifeLab",
+    layout="wide",
+    initial_sidebar_state="expanded",
+)
+inject_streamlit_theme()
+apply_matplotlib_theme()
+render_header(
+    "Bioce — Bayesian ensemble inference",
+    "Upload PDB conformers and SAXS, HDX-MS, and/or XL-MS data to infer posterior ensemble weights.",
+)
+
+with st.sidebar:
+    st.markdown(
+        '<p style="color:#045C64;font-weight:700;font-size:0.85rem;margin-bottom:0;">'
+        "SciLifeLab · Bioce</p>",
+        unsafe_allow_html=True,
+    )
+    st.header("Sampling")
+    iterations = st.slider("Stan iterations (total per chain)", 100, 3000, 400, 100)
+    chains = st.slider("Chains", 1, 8, 2)
+    njobs = st.slider("Parallel chains", 1, 8, 2)
+    st.header("Modalities")
+    use_saxs = st.checkbox("SAXS", value=True)
+    use_hdx = st.checkbox("HDX-MS", value=False)
+    use_xl = st.checkbox("XL-MS", value=False)
+    xl_fp = st.checkbox("XL false-positive mixture", value=False, disabled=not use_xl)
+    st.divider()
+    st.markdown("**Dependencies**")
+    st.write("FoXS:", "yes" if _foxs_on_path() else "no")
+    st.write("Pepsi-SAXS:", "yes" if _pepsi_on_path() else "no")
+    st.write("BioPython:", "yes" if _HAS_BIOPYTHON else "no")
+
+st.subheader("Structure library")
+pdb_uploads = st.file_uploader(
+    "PDB files or .zip archive",
+    type=["pdb", "zip"],
+    accept_multiple_files=True,
+    help="One file per conformer in the ensemble.",
+)
+
+col1, col2 = st.columns(2)
+
+with col1:
+    st.subheader("SAXS")
+    saxs_file = st.file_uploader(
+        "Experimental SAXS (.dat)",
+        type=["dat", "txt", "csv"],
+        disabled=not use_saxs,
+        help="Columns: q, I, sigma (whitespace-separated).",
+    )
+
+with col2:
+    st.subheader("HDX-MS")
+    hdx_peptides = st.file_uploader(
+        "Peptide definitions",
+        type=["txt", "dat", "csv"],
+        disabled=not use_hdx,
+        help="One peptide per line: chain start_res end_res (e.g. A 10 25).",
+    )
+    hdx_exp = st.file_uploader(
+        "Experimental uptake",
+        type=["txt", "dat", "csv"],
+        disabled=not use_hdx,
+        help="Lines: uptake sigma — or pep_idx time uptake sigma for kinetics.",
+    )
+
+st.subheader("XL-MS")
+xl_file = st.file_uploader(
+    "Crosslink restraints",
+    type=["txt", "dat"],
+    disabled=not use_xl,
+    help="res_i res_j z [d_max] [tau] — or chain_i res_i chain_j res_j z [d_max] [tau].",
+)
+
+with st.expander("File format help"):
+    st.markdown(
+        """
+- **SAXS**: three columns `q`, `I`, `error` (same as bioce `fullBayesian.py`).
+- **HDX peptides**: `chain start end` per line (1-based residue numbers).
+- **HDX data**: `uptake sigma` per peptide, or `pep_idx time uptake sigma` for time series.
+- **XL**: `z=1` if the crosslink was observed. Distances are computed as Cα–Cα from each PDB.
+- **HDX prediction** uses mean peptide SASA (Shrake–Rupley) as a solvent-exposure proxy—not full HDX kinetics.
+        """
+    )
+
+run = st.button("Run inference", type="primary", use_container_width=True)
+
+if run:
+    if not pdb_uploads:
+        st.error("Upload at least one PDB file or a .zip archive.")
+        st.stop()
+    if use_saxs and not saxs_file:
+        st.error("SAXS enabled: upload an experimental curve.")
+        st.stop()
+    if use_hdx and (not hdx_peptides or not hdx_exp):
+        st.error("HDX enabled: upload peptide definitions and experimental uptake.")
+        st.stop()
+    if use_xl and not xl_file:
+        st.error("XL enabled: upload crosslink restraints.")
+        st.stop()
+    if use_saxs and not (_foxs_on_path() or _pepsi_on_path()):
+        st.error("Install FoXS or Pepsi-SAXS on PATH for SAXS simulation from PDBs.")
+        st.stop()
+    if (use_hdx or use_xl) and not _HAS_BIOPYTHON:
+        st.error("Install BioPython for HDX/XL from PDB (pip install biopython).")
+        st.stop()
+    if not (use_saxs or use_hdx or use_xl):
+        st.error("Enable at least one modality.")
+        st.stop()
+
+    with tempfile.TemporaryDirectory(prefix="bioce_") as tmp:
+        work = Path(tmp)
+        config = PipelineConfig(
+            work_dir=work,
+            use_saxs=use_saxs,
+            use_hdx=use_hdx,
+            use_xl=use_xl,
+            iterations=iterations,
+            chains=chains,
+            njobs=njobs,
+            xl_fp_mixture=xl_fp,
+        )
+        if use_saxs:
+            saxs_path = work / "experimental_saxs.dat"
+            saxs_path.write_bytes(saxs_file.getvalue())
+            config.saxs_experimental = saxs_path
+        if use_hdx:
+            config.hdx_peptide_lines = hdx_peptides.getvalue().decode().splitlines()
+            config.hdx_exp_lines = hdx_exp.getvalue().decode().splitlines()
+        if use_xl:
+            config.xl_restraint_lines = xl_file.getvalue().decode().splitlines()
+
+        with st.status("Running pipeline…", expanded=True) as status:
+            try:
+                st.write("Extracting PDBs and computing observables…")
+                result = run_full_pipeline(pdb_uploads, config)
+                status.update(label="Done", state="complete")
+            except Exception as exc:
+                status.update(label="Failed", state="error")
+                st.exception(exc)
+                st.stop()
+
+        m1, m2, m3, m4 = st.columns(4)
+        if result.saxs_chi2 is not None:
+            m1.metric("SAXS χ²", f"{result.saxs_chi2:.3f}")
+        if result.jsd is not None:
+            m2.metric("JSD", f"{result.jsd:.4f}")
+        if result.hdx_rmse is not None:
+            m3.metric("HDX RMSE", f"{result.hdx_rmse:.4f}")
+        if result.xl_mean_psat is not None:
+            m4.metric("Mean XL p(sat)", f"{result.xl_mean_psat:.3f}")
+
+        st.subheader("Posterior mean weights")
+        df = pd.DataFrame(result.tables["weights"])
+        st.dataframe(df, use_container_width=True)
+        fig, ax = plt.subplots(figsize=(8, 3))
+        colors = [TEAL if w == df["weight"].max() else LIME for w in df["weight"]]
+        ax.bar(df["structure"], df["weight"], color=colors, edgecolor=TEAL, linewidth=0.6)
+        ax.set_ylabel("Weight")
+        ax.set_facecolor("#FAFCFA")
+        ax.grid(axis="y", linestyle="--", alpha=0.5)
+        ax.tick_params(axis="x", rotation=45)
+        plt.tight_layout()
+        st.pyplot(fig)
+        plt.close(fig)
+
+        if "saxs_fit" in result.plots:
+            st.subheader("SAXS fit")
+            curve = np.genfromtxt(result.plots["saxs_fit"])
+            fig2, ax2 = plt.subplots(figsize=(7, 4))
+            ax2.errorbar(
+                curve[:, 0], curve[:, 1], yerr=curve[:, 3],
+                fmt="o", ms=3, color=GRAPE, ecolor=AQUA, label="Experiment", alpha=0.85,
+            )
+            ax2.plot(
+                curve[:, 0], curve[:, 2], "-", color=TEAL,
+                label="Weighted model", lw=2,
+            )
+            ax2.set_xlabel("q (Å⁻¹)")
+            ax2.set_ylabel("I(q)")
+            ax2.legend(frameon=True)
+            ax2.set_facecolor("#FAFCFA")
+            ax2.set_yscale("log")
+            st.pyplot(fig2)
+            plt.close(fig2)
+
+        for key, path in result.plots.items():
+            if Path(path).suffix == ".png":
+                st.image(str(path), caption=key)
+
+        if "xl_psat" in result.tables:
+            st.subheader("XL satisfaction probability (posterior mean)")
+            psat = result.tables["xl_psat"]
+            fig_xl, ax_xl = plt.subplots(figsize=(7, 2.8))
+            ax_xl.plot(psat, color=TEAL, marker="o", markersize=5, markerfacecolor=LIME)
+            ax_xl.fill_between(range(len(psat)), psat, alpha=0.2, color=AQUA)
+            ax_xl.set_ylim(0, 1)
+            ax_xl.set_xlabel("Crosslink index")
+            ax_xl.set_ylabel("p(satisfied)")
+            ax_xl.set_facecolor("#FAFCFA")
+            ax_xl.grid(axis="y", linestyle="--", alpha=0.5)
+            st.pyplot(fig_xl)
+            plt.close(fig_xl)
+
+        st.download_button(
+            "Download weights CSV",
+            df.to_csv(index=False).encode(),
+            file_name="bayesian_weights.csv",
+            mime="text/csv",
+        )
+
+render_footer()
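
The crosslink uploader in `app.py` documents two line layouts, `res_i res_j z [d_max] [tau]` and `chain_i res_i chain_j res_j z [d_max] [tau]`. The sketch below shows one way such lines could be parsed, using the README defaults `d_max=30` and `tau=3`. It is not the parser used by `bioce_pipeline`: `XLRestraint` and `parse_xl_line` are hypothetical names, and treating `#` as a comment marker and telling the two layouts apart by whether the first token is numeric are assumptions made here (the latter would misread numeric chain IDs).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class XLRestraint:
    res_i: int
    res_j: int
    z: int                 # 1 if the crosslink was observed
    d_max: float = 30.0    # README default, in Å
    tau: float = 3.0       # README default, in Å
    chain_i: Optional[str] = None
    chain_j: Optional[str] = None


def parse_xl_line(line):
    """Parse one restraint line; return None for blank or comment-only lines."""
    tokens = line.split("#", 1)[0].split()  # assumption: '#' starts a comment
    if not tokens:
        return None
    # Assumption: a numeric first token means the short, chain-less layout.
    short_form = tokens[0].lstrip("-").isdigit()
    n_lead = 2 if short_form else 4
    # After the leading fields: z, then optional d_max and optional tau.
    if not n_lead + 1 <= len(tokens) <= n_lead + 3:
        raise ValueError(f"cannot parse restraint line: {line!r}")
    if short_form:
        chain_i = chain_j = None
        res_i, res_j = int(tokens[0]), int(tokens[1])
    else:
        chain_i, chain_j = tokens[0], tokens[2]
        res_i, res_j = int(tokens[1]), int(tokens[3])
    rest = tokens[n_lead:]
    restraint = XLRestraint(res_i, res_j, int(rest[0]),
                            chain_i=chain_i, chain_j=chain_j)
    if len(rest) > 1:
        restraint.d_max = float(rest[1])
    if len(rest) > 2:
        restraint.tau = float(rest[2])
    return restraint


short = parse_xl_line("10 45 1")
full = parse_xl_line("A 10 B 45 0 25 2.5")
```

Keeping the defaults on the dataclass means a three-column file and a five-column file yield the same record type, so downstream code does not need to branch on the layout.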
diff --git a/bioce.yml b/bioce.yml
@@ -6,11 +6,14 @@ channels:
 - intel
 dependencies:
 - seaborn
+- biopython
+- pandas
+- streamlit
 - clang
 - openmp
 - swig
 - gsl
 - pip:
-  - pystan
+  - "pystan>=3.9,<4"
   - pandas
   - git+https://github.com/emblsaxs/sasciftools.git
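
Because `bioce.yml` now pins `pystan>=3.9,<4` while the code imports the package as `stan`, it can be handy to confirm which distribution is actually installed in the environment. The sketch below is a minimal check under stated assumptions: `satisfies_pystan_pin` and `installed_pystan_version` are hypothetical helpers, not part of the PR, and only the leading `major.minor` part of the version string is compared.

```python
import re
from importlib import metadata


def satisfies_pystan_pin(version):
    """Return True if `version` falls inside the pin pystan>=3.9,<4."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        return False
    major, minor = int(match.group(1)), int(match.group(2))
    return (3, 9) <= (major, minor) < (4, 0)


def installed_pystan_version():
    """Return the installed `pystan` distribution version, or None if absent."""
    try:
        return metadata.version("pystan")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_pystan_version()
    if found is None:
        print("pystan is not installed")
    else:
        print(f"pystan {found}: pin satisfied = {satisfies_pystan_pin(found)}")
```

The distribution name queried is `pystan` even though the import name is `stan`, which mirrors the distinction the README draws.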