14 changes: 14 additions & 0 deletions .streamlit/config.toml
@@ -0,0 +1,14 @@
[server]
headless = true
enableCORS = false
enableXsrfProtection = true

[browser]
gatherUsageStats = false

[theme]
primaryColor = "#045C64"
backgroundColor = "#FFFFFF"
secondaryBackgroundColor = "#F4F6F4"
textColor = "#2D3436"
font = "sans serif"
9 changes: 8 additions & 1 deletion MANIFEST
@@ -7,6 +7,13 @@ psisloo.py
setup.py
stan_models.py
stan_utility.py
statistics.py
bioce_statistics.py
variationalBayesian.py
fullBayesianMultimodal.py
multimodal_io.py
prepareHDX.py
prepareXLMS.py
bioce_pipeline.py
structure_observables.py
app.py
vbw_sc.i
71 changes: 71 additions & 0 deletions README.md
@@ -32,6 +32,27 @@ python variationalBayesian.py --help
```
If you see no errors and the options menu appears, you are good to go.

### PyStan 3 (full Bayesian inference)

Full Bayesian sampling (`fullBayesian.py`, `fullBayesianTR.py`) uses **PyStan 3**. Install the **`pystan`** package with pip; inside the activated conda env, the `pip` section of `bioce.yml` already requests `pystan>=3.9,<4`. In code the package is imported as **`import stan`** (not `import pystan`). See the [PyStan upgrading guide](https://pystan.readthedocs.io/en/latest/upgrading.html).

**Platforms:** PyStan 3 is supported on **Linux and macOS** only (not Windows).

**Smoke test** (checks that `stan` compiles and samples, and that `stan_run` / `stan_utility` run):

```bash
pip install pytest
cd /path/to/bioce
python -m pytest tests/test_pystan_smoke.py -q
```

The first test only checks **warmup vs sampling split** logic and passes without PyStan. The other two tests **compile and sample** a tiny model; if `pystan` is missing they are **skipped** (pytest exit code 0). With PyStan installed, all three tests should pass.

Some httpstan builds reject sampler options such as `parallel_chains` or `show_progress`; `stan_run.build_and_sample` then retries with only `num_chains`, `num_warmup`, and `num_samples` (parallelism and progress follow backend defaults).
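
The retry behaviour described above might look roughly like the following. This is a sketch, not the contents of `stan_run.py`: the helper name `sample_with_fallback`, the exception types it catches, and the stand-in `FakePosterior` class are assumptions made for illustration. Only the option names come from the text above.

```python
from typing import Any


def sample_with_fallback(posterior: Any, num_chains: int, num_warmup: int,
                         num_samples: int, **extra: Any) -> Any:
    """Try sampling with optional extras; retry with core options if rejected."""
    core = dict(num_chains=num_chains, num_warmup=num_warmup, num_samples=num_samples)
    if extra:
        try:
            return posterior.sample(**core, **extra)
        except (TypeError, ValueError):
            # Assumption: a backend that rejects an option raises one of these.
            pass
    return posterior.sample(**core)


class FakePosterior:
    """Hypothetical stand-in for a compiled model that rejects extra options."""

    def sample(self, num_chains, num_warmup, num_samples, **extra):
        if extra:
            raise ValueError(f"unsupported options: {sorted(extra)}")
        return {"num_chains": num_chains, "num_warmup": num_warmup,
                "num_samples": num_samples}


fit = sample_with_fallback(FakePosterior(), num_chains=2, num_warmup=100,
                           num_samples=100, parallel_chains=2, show_progress=False)
print(fit)
```

Falling back to the three core options keeps a run alive on stricter backends, at the cost of silently dropping the requested parallelism and progress settings.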

`pytest.ini` in the repo root sets `pythonpath = .` so `stan_run` imports work without installing the package.

## Problems with installation on OSX 10.14 (Mojave)
There is a known issue with the Xcode installation on OSX 10.14 (Mojave).
If you see the following error:
@@ -98,6 +119,56 @@ which graphically illustrated distribution of population weights (shown below) an
3. The script also returns a text file containing the Q vector, experimental intensity,
model intensity, and experimental error.

### Streamlit web app (SciLifeLab Serve)

A browser UI runs the full PDB → observables → Stan pipeline:

```bash
pip install streamlit biopython "pystan>=3.9,<4"
streamlit run app.py
```

Upload PDBs (or zip), experimental SAXS / HDX / XL files, and run inference. See `serve/SERVE.md` for Docker deployment on [SciLifeLab Serve](https://serve.scilifelab.se/docs/).

SAXS from PDB requires **FoXS** or **Pepsi-SAXS** on `PATH`. HDX and XL use **BioPython** (peptide SASA proxy and Cα distances).
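
A quick way to check whether either SAXS calculator is reachable before launching the app is sketched below. The executable names (`foxs`, `Pepsi-SAXS`, `pepsi-saxs`) are assumptions and may differ on your installation; the helper `tool_on_path` is hypothetical and is not the check used inside `structure_observables.py`.

```python
import shutil


def tool_on_path(*names: str) -> bool:
    """Return True if any of the given executable names resolves on PATH."""
    return any(shutil.which(name) is not None for name in names)


# Executable names below are assumptions; adjust them to your installation.
have_foxs = tool_on_path("foxs")
have_pepsi = tool_on_path("Pepsi-SAXS", "pepsi-saxs")
print("FoXS:", "yes" if have_foxs else "no")
print("Pepsi-SAXS:", "yes" if have_pepsi else "no")
```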

### HDX-MS and XL-MS (multimodal inference)

Ensemble-averaged HDX uptake and probabilistic XL-MS distance restraints can be combined with SAXS (or used alone) via `fullBayesianMultimodal.py`. Per-conformer observables must be precomputed (e.g. protection factors or Cα distances); Stan multiplies independent likelihoods over a shared `simplex` weight vector.
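
To make "independent likelihoods over a shared weight vector" concrete, here is a small NumPy sketch. It is not the Stan model shipped with bioce: it assumes Gaussian errors for both SAXS and HDX, ignores any SAXS scale factor, and uses hypothetical names (`gaussian_loglik`, `joint_loglik`) and toy numbers. Multiplying likelihoods corresponds to adding log-likelihoods, which is what the code does.

```python
import numpy as np


def gaussian_loglik(obs, sigma, pred):
    """Sum of independent Gaussian log-densities."""
    obs = np.asarray(obs, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    pred = np.asarray(pred, dtype=float)
    return float(np.sum(-0.5 * ((obs - pred) / sigma) ** 2
                        - np.log(sigma) - 0.5 * np.log(2.0 * np.pi)))


def joint_loglik(w, saxs=None, hdx=None):
    """Add per-modality log-likelihoods that share one weight vector w.

    Each modality is a tuple (per_conformer, obs, sigma), where per_conformer
    has shape (n_observations, n_conformers).
    """
    w = np.asarray(w, dtype=float)
    if np.any(w < 0) or not np.isclose(w.sum(), 1.0):
        raise ValueError("w must lie on the simplex")
    total = 0.0
    for modality in (saxs, hdx):
        if modality is not None:
            per_conformer, obs, sigma = modality
            # Ensemble average: weighted sum over conformers.
            total += gaussian_loglik(obs, sigma, np.asarray(per_conformer) @ w)
    return total


# Toy data for two conformers (illustrative values only).
w = [0.25, 0.75]
saxs = (np.array([[10.0, 20.0], [5.0, 9.0]]), [17.0, 8.5], [1.0, 0.5])
hdx = (np.array([[0.2, 0.6], [0.4, 0.8]]), [0.5, 0.7], [0.1, 0.1])
print(joint_loglik(w, saxs=saxs, hdx=hdx))
```

Because the terms simply add, dropping a modality (passing `None`) leaves the others untouched, which mirrors running the script with HDX only or XL only.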

**Prepare inputs**

```bash
# One file per conformer in a directory (same row order: flattened peptide×time)
python prepareHDX.py -d hdx_predictions/ -o SimulatedHDX.txt

# One distance file per conformer (rows = crosslinks, same order as xl_restraints.dat)
python prepareXLMS.py -d xl_distances/ -o SimulatedXLdistances.txt
```

`hdx_exp.dat`: two columns `uptake sigma` per observation.
`xl_restraints.dat`: `res_i res_j z [d_max] [tau]` (`z=1` observed link; default `d_max=30`, `tau=3` Å).
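
As an illustration of the restraint format, the sketch below parses one `xl_restraints.dat` line and fills in the defaults quoted above. The names `XLRestraint` and `parse_xl_line` are hypothetical; the actual reader in `multimodal_io.py` may behave differently (for example with comments or chain identifiers).

```python
from typing import NamedTuple


class XLRestraint(NamedTuple):
    res_i: int
    res_j: int
    z: int        # 1 = crosslink observed
    d_max: float  # distance cutoff in Å
    tau: float    # softness of the cutoff in Å


def parse_xl_line(line: str, d_max: float = 30.0, tau: float = 3.0) -> XLRestraint:
    """Parse `res_i res_j z [d_max] [tau]`, using defaults for missing fields."""
    fields = line.split()
    if not 3 <= len(fields) <= 5:
        raise ValueError(f"expected 3 to 5 fields, got {len(fields)}: {line!r}")
    if len(fields) >= 4:
        d_max = float(fields[3])
    if len(fields) == 5:
        tau = float(fields[4])
    return XLRestraint(int(fields[0]), int(fields[1]), int(fields[2]), d_max, tau)


print(parse_xl_line("12 87 1"))
print(parse_xl_line("12 87 1 25 2.5"))
```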

**Run inference**

```bash
# SAXS + HDX + XL
python fullBayesianMultimodal.py \
-p weights.txt -f structures.txt \
-e simulated.dat -s SimulatedIntensities.txt \
-H hdx_exp.dat -S SimulatedHDX.txt \
-X xl_restraints.dat -D SimulatedXLdistances.txt \
-i 2000 -c 4 -j 4

# HDX only, or XL only (same -p priors)
python fullBayesianMultimodal.py -p weights.txt -H hdx_exp.dat -S SimulatedHDX.txt
python fullBayesianMultimodal.py -p weights.txt -X xl_restraints.dat -D SimulatedXLdistances.txt
python fullBayesianMultimodal.py -p weights.txt -X xl_restraints.dat -D SimulatedXLdistances.txt --xl-fp-mixture
```

XL satisfaction uses a soft logistic in linker distance: ensemble probability
`p_sat = sum_k w_k * sigmoid((d_max - d_k) / tau)` with `z_l ~ Bernoulli(p_sat)`.
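
The formula above can be evaluated directly in NumPy. The sketch below is for illustration only (the inference itself runs in Stan); the function names `xl_psat` and `xl_loglik` and the toy weights and distances are assumptions.

```python
import numpy as np


def sigmoid(x):
    """Logistic function 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))


def xl_psat(w, d, d_max=30.0, tau=3.0):
    """p_sat = sum_k w_k * sigmoid((d_max - d_k) / tau) for one crosslink."""
    d = np.asarray(d, dtype=float)
    return float(np.dot(w, sigmoid((d_max - d) / tau)))


def xl_loglik(z, p_sat):
    """Bernoulli log-likelihood of the observation z (0 or 1) given p_sat."""
    return float(z * np.log(p_sat) + (1 - z) * np.log1p(-p_sat))


# Two conformers: one sits exactly at the cutoff, one is far beyond it.
w = np.array([0.7, 0.3])
d = np.array([30.0, 60.0])  # Cα distances in Å for a single crosslink
p = xl_psat(w, d)
print(p, xl_loglik(1, p))
```

A conformer exactly at `d_max` contributes half of its weight, and `tau` controls how quickly the contribution falls off beyond the cutoff.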

### Using chemical shift data
1. In order to use chemical shift data, one needs to install SHIFTX2. This can be done by following instructions at:
[SHIFTX2](http://www.shiftx2.ca/download.html). This requires running python 2.6 or later, which won't work with
242 changes: 242 additions & 0 deletions app.py
@@ -0,0 +1,242 @@
"""
Bioce Streamlit app: upload PDB ensemble + SAXS / HDX / XL data → Bayesian weights.

Run locally:
    streamlit run app.py
"""
from __future__ import annotations

import tempfile
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import streamlit as st

from bioce_pipeline import PipelineConfig, run_full_pipeline
from scilifelab_theme import (
    AQUA,
    GRAPE,
    LIME,
    TEAL,
    apply_matplotlib_theme,
    inject_streamlit_theme,
    render_footer,
    render_header,
)
from structure_observables import _foxs_on_path, _HAS_BIOPYTHON, _pepsi_on_path

st.set_page_config(
    page_title="Bioce | SciLifeLab",
    layout="wide",
    initial_sidebar_state="expanded",
)
inject_streamlit_theme()
apply_matplotlib_theme()
render_header(
    "Bioce — Bayesian ensemble inference",
    "Upload PDB conformers and SAXS, HDX-MS, and/or XL-MS data to infer posterior ensemble weights.",
)

with st.sidebar:
    st.markdown(
        '<p style="color:#045C64;font-weight:700;font-size:0.85rem;margin-bottom:0;">'
        "SciLifeLab · Bioce</p>",
        unsafe_allow_html=True,
    )
    st.header("Sampling")
    iterations = st.slider("Stan iterations (total per chain)", 100, 3000, 400, 100)
    chains = st.slider("Chains", 1, 8, 2)
    njobs = st.slider("Parallel chains", 1, 8, 2)
    st.header("Modalities")
    use_saxs = st.checkbox("SAXS", value=True)
    use_hdx = st.checkbox("HDX-MS", value=False)
    use_xl = st.checkbox("XL-MS", value=False)
    xl_fp = st.checkbox("XL false-positive mixture", value=False, disabled=not use_xl)
    st.divider()
    st.markdown("**Dependencies**")
    st.write("FoXS:", "yes" if _foxs_on_path() else "no")
    st.write("Pepsi-SAXS:", "yes" if _pepsi_on_path() else "no")
    st.write("BioPython:", "yes" if _HAS_BIOPYTHON else "no")

st.subheader("Structure library")
pdb_uploads = st.file_uploader(
    "PDB files or .zip archive",
    type=["pdb", "zip"],
    accept_multiple_files=True,
    help="One file per conformer in the ensemble.",
)

col1, col2 = st.columns(2)

with col1:
    st.subheader("SAXS")
    saxs_file = st.file_uploader(
        "Experimental SAXS (.dat)",
        type=["dat", "txt", "csv"],
        disabled=not use_saxs,
        help="Columns: q, I, sigma (whitespace-separated).",
    )

with col2:
    st.subheader("HDX-MS")
    hdx_peptides = st.file_uploader(
        "Peptide definitions",
        type=["txt", "dat", "csv"],
        disabled=not use_hdx,
        help="One peptide per line: chain start_res end_res (e.g. A 10 25).",
    )
    hdx_exp = st.file_uploader(
        "Experimental uptake",
        type=["txt", "dat", "csv"],
        disabled=not use_hdx,
        help="Lines: uptake sigma — or pep_idx time uptake sigma for kinetics.",
    )

st.subheader("XL-MS")
xl_file = st.file_uploader(
    "Crosslink restraints",
    type=["txt", "dat"],
    disabled=not use_xl,
    help="res_i res_j z [d_max] [tau] — or chain_i res_i chain_j res_j z [d_max] [tau].",
)

with st.expander("File format help"):
    st.markdown(
        """
- **SAXS**: three columns `q`, `I`, `error` (same as bioce `fullBayesian.py`).
- **HDX peptides**: `chain start end` per line (1-based residue numbers).
- **HDX data**: `uptake sigma` per peptide, or `pep_idx time uptake sigma` for time series.
- **XL**: `z=1` if the crosslink was observed. Distances are computed as Cα–Cα from each PDB.
- **HDX prediction** uses mean peptide SASA (Shrake–Rupley) as a solvent-exposure proxy—not full HDX kinetics.
"""
    )

run = st.button("Run inference", type="primary", use_container_width=True)

if run:
    if not pdb_uploads:
        st.error("Upload at least one PDB file or a .zip archive.")
        st.stop()
    if use_saxs and not saxs_file:
        st.error("SAXS enabled: upload an experimental curve.")
        st.stop()
    if use_hdx and (not hdx_peptides or not hdx_exp):
        st.error("HDX enabled: upload peptide definitions and experimental uptake.")
        st.stop()
    if use_xl and not xl_file:
        st.error("XL enabled: upload crosslink restraints.")
        st.stop()
    if use_saxs and not (_foxs_on_path() or _pepsi_on_path()):
        st.error("Install FoXS or Pepsi-SAXS on PATH for SAXS simulation from PDBs.")
        st.stop()
    if (use_hdx or use_xl) and not _HAS_BIOPYTHON:
        st.error("Install BioPython for HDX/XL from PDB (pip install biopython).")
        st.stop()
    if not (use_saxs or use_hdx or use_xl):
        st.error("Enable at least one modality.")
        st.stop()

    with tempfile.TemporaryDirectory(prefix="bioce_") as tmp:
        work = Path(tmp)
        config = PipelineConfig(
            work_dir=work,
            use_saxs=use_saxs,
            use_hdx=use_hdx,
            use_xl=use_xl,
            iterations=iterations,
            chains=chains,
            njobs=njobs,
            xl_fp_mixture=xl_fp,
        )
        if use_saxs:
            saxs_path = work / "experimental_saxs.dat"
            saxs_path.write_bytes(saxs_file.getvalue())
            config.saxs_experimental = saxs_path
        if use_hdx:
            config.hdx_peptide_lines = hdx_peptides.getvalue().decode().splitlines()
            config.hdx_exp_lines = hdx_exp.getvalue().decode().splitlines()
        if use_xl:
            config.xl_restraint_lines = xl_file.getvalue().decode().splitlines()

        with st.status("Running pipeline…", expanded=True) as status:
            try:
                st.write("Extracting PDBs and computing observables…")
                result = run_full_pipeline(pdb_uploads, config)
                status.update(label="Done", state="complete")
            except Exception as exc:
                status.update(label="Failed", state="error")
                st.exception(exc)
                st.stop()

        m1, m2, m3, m4 = st.columns(4)
        if result.saxs_chi2 is not None:
            m1.metric("SAXS χ²", f"{result.saxs_chi2:.3f}")
        if result.jsd is not None:
            m2.metric("JSD", f"{result.jsd:.4f}")
        if result.hdx_rmse is not None:
            m3.metric("HDX RMSE", f"{result.hdx_rmse:.4f}")
        if result.xl_mean_psat is not None:
            m4.metric("Mean XL p(sat)", f"{result.xl_mean_psat:.3f}")

        st.subheader("Posterior mean weights")
        df = pd.DataFrame(result.tables["weights"])
        st.dataframe(df, use_container_width=True)
        fig, ax = plt.subplots(figsize=(8, 3))
        colors = [TEAL if w == df["weight"].max() else LIME for w in df["weight"]]
        ax.bar(df["structure"], df["weight"], color=colors, edgecolor=TEAL, linewidth=0.6)
        ax.set_ylabel("Weight")
        ax.set_facecolor("#FAFCFA")
        ax.grid(axis="y", linestyle="--", alpha=0.5)
        ax.tick_params(axis="x", rotation=45)
        plt.tight_layout()
        st.pyplot(fig)
        plt.close(fig)

        if "saxs_fit" in result.plots:
            st.subheader("SAXS fit")
            curve = np.genfromtxt(result.plots["saxs_fit"])
            fig2, ax2 = plt.subplots(figsize=(7, 4))
            ax2.errorbar(
                curve[:, 0], curve[:, 1], yerr=curve[:, 3],
                fmt="o", ms=3, color=GRAPE, ecolor=AQUA, label="Experiment", alpha=0.85,
            )
            ax2.plot(
                curve[:, 0], curve[:, 2], "-", color=TEAL,
                label="Weighted model", lw=2,
            )
            ax2.set_xlabel("q (Å⁻¹)")
            ax2.set_ylabel("I(q)")
            ax2.legend(frameon=True)
            ax2.set_facecolor("#FAFCFA")
            ax2.set_yscale("log")
            st.pyplot(fig2)
            plt.close(fig2)

        for key, path in result.plots.items():
            if path.suffix == ".png":
                st.image(str(path), caption=key)

        if "xl_psat" in result.tables:
            st.subheader("XL satisfaction probability (posterior mean)")
            psat = result.tables["xl_psat"]
            fig_xl, ax_xl = plt.subplots(figsize=(7, 2.8))
            ax_xl.plot(psat, color=TEAL, marker="o", markersize=5, markerfacecolor=LIME)
            ax_xl.fill_between(range(len(psat)), psat, alpha=0.2, color=AQUA)
            ax_xl.set_ylim(0, 1)
            ax_xl.set_xlabel("Crosslink index")
            ax_xl.set_ylabel("p(satisfied)")
            ax_xl.set_facecolor("#FAFCFA")
            ax_xl.grid(axis="y", linestyle="--", alpha=0.5)
            st.pyplot(fig_xl)
            plt.close(fig_xl)

        st.download_button(
            "Download weights CSV",
            df.to_csv(index=False).encode(),
            file_name="bayesian_weights.csv",
            mime="text/csv",
        )

render_footer()
5 changes: 4 additions & 1 deletion bioce.yml
@@ -6,11 +6,14 @@ channels:
  - intel
dependencies:
  - seaborn
  - biopython
  - pandas
  - streamlit
  - clang
  - openmp
  - swig
  - gsl
  - pip:
    - pystan
    - "pystan>=3.9,<4"
    - pandas
    - git+https://github.com/emblsaxs/sasciftools.git