NIH Catalyze · Pig biomarker and titration pipeline
| Step | What | Input | Main outputs |
|---|---|---|---|
| 1 | `ammonia_analysis.py` (or `.ipynb`) | `Master sheets/*.xlsx` (10 workbooks) | `output/ammonia_concentration.csv`, `output/co2_concentration.csv`, `output/ammonia_co2_ratio.csv` |
| 2 | `spline_interpolation.py` (or `.ipynb`) | `Titration_Data_*.xlsx` (ACIDITY and ALKALINITY sheets) | `output/naoh_pH11_results.csv`, `output/citric_acid_pH4_results.csv`, `output/interpolation_parameters_table.csv`, `output/titration_curves_healthy_vs_nonhealthy.png` |
| 3 | `biomarker_analysis.py` (or `.ipynb`) | Titration workbook in repo root + CSVs in `output/` | `output/biomarker_summary.csv`, by-pig / pooled PNGs in `output/` |
Typical order: (1) ammonia notebook → (2) spline → (3) biomarker. Step 1 can run in parallel with step 2 if paths are set; step 3 needs outputs from both.
Shared code: all loading, spline, statistics, and plotting code lives in `shared_utils.py`.
```bash
pip install -r requirements.txt
```

From the `catalyze` directory (with `Master sheets/`, the titration `.xlsx` in the repo root, and `output/` for artifacts):

```bash
# Option A (CLI): create ammonia/CO2/ratio CSVs in output/
python ammonia_analysis.py --no-plots
# Then run spline + biomarker:
python spline_interpolation.py
python biomarker_analysis.py
```

Scripts create `output/` if needed (`shared_utils.DEFAULT_OUTPUT_DIR`). The titration workbook is read from `shared_utils.CATALYZE_DIR` (the same folder as `shared_utils.py`) unless you pass absolute paths.
Notebooks (ammonia_analysis.ipynb, spline_interpolation.ipynb, biomarker_analysis.ipynb) mirror the same logic for interactive use.
With explicit paths:

```bash
python ammonia_analysis.py --master-sheets-dir "Master sheets" --output-dir "output" --no-plots
python spline_interpolation.py --titration-xlsx "Titration_Data_02052026.xlsx" --output-dir "output"
python biomarker_analysis.py --titration-xlsx "Titration_Data_02052026.xlsx" --input-dir "output" --output-dir "output"
```

Each script supports `--help`.
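Because steps 1 and 2 are independent and step 3 needs outputs from both, the run order can be scripted. The sketch below is a hypothetical helper (`run_pipeline` is not part of the repo). It assumes the default CLI invocations shown above and a working directory of `catalyze/` (or an explicit `cwd`). It launches steps 1 and 2 concurrently and starts step 3 only if both exit with status 0.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Default commands from this README (assumed to run from the catalyze directory).
STEP1 = [sys.executable, "ammonia_analysis.py", "--no-plots"]
STEP2 = [sys.executable, "spline_interpolation.py"]
STEP3 = [sys.executable, "biomarker_analysis.py"]


def run_pipeline(step1=STEP1, step2=STEP2, step3=STEP3, cwd=None):
    """Run steps 1 and 2 concurrently, then step 3 once both have succeeded."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [
            pool.submit(subprocess.run, cmd, cwd=cwd, check=True)
            for cmd in (step1, step2)
        ]
        for future in futures:
            # Re-raises CalledProcessError if that step exited non-zero.
            future.result()
    # Step 3 reads the CSVs written by steps 1 and 2.
    subprocess.run(step3, cwd=cwd, check=True)
```

For example, `run_pipeline(cwd="/path/to/catalyze")`. Threads are sufficient here because each step runs in its own process. `check=True` stops the pipeline before the biomarker step can read missing or stale CSVs.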
- NaOH @ pH 11: PCHIP (monotone piecewise cubic Hermite) interpolation of NaOH vs pH on the ACIDITY sheet.
- Citric @ pH 4: linear interpolation of citric acid vs pH on the ALKALINITY sheet.
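The two interpolation rules can be illustrated with a minimal sketch using SciPy's `PchipInterpolator` and NumPy's `np.interp`. This is not the code in `shared_utils.py`. It assumes each pig's titration is available as paired pH and reagent arrays, treats the reagent amount as a function of pH, and returns NaN when the target pH lies outside the measured range. The helper names (`naoh_at_ph`, `citric_at_ph`, `_clean_sorted`) are hypothetical.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator


def _clean_sorted(ph, amount):
    """Drop NaN pairs, sort by pH, and keep the first reading at each repeated pH."""
    ph = np.asarray(ph, dtype=float)
    amount = np.asarray(amount, dtype=float)
    ok = ~(np.isnan(ph) | np.isnan(amount))
    ph, amount = ph[ok], amount[ok]
    # np.unique returns strictly increasing pH values, as both interpolators require.
    ph_unique, first_idx = np.unique(ph, return_index=True)
    return ph_unique, amount[first_idx]


def naoh_at_ph(ph, naoh, target_ph=11.0):
    """NaOH at the target pH via PCHIP; NaN if the target is outside the measured pH range."""
    x, y = _clean_sorted(ph, naoh)
    if x.size < 2:
        return float("nan")
    interp = PchipInterpolator(x, y, extrapolate=False)  # NaN outside [x.min(), x.max()]
    return float(interp(target_ph))


def citric_at_ph(ph, citric, target_ph=4.0):
    """Citric acid at the target pH via linear interpolation; NaN outside the measured range."""
    x, y = _clean_sorted(ph, citric)
    if x.size < 2:
        return float("nan")
    return float(np.interp(target_ph, x, y, left=np.nan, right=np.nan))
```

PCHIP is shape-preserving: on monotonic data it does not overshoot between points, which an unconstrained cubic spline can do on steep titration curves.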
Default outputs (from spline_interpolation.py): naoh_pH11_results.csv, citric_acid_pH4_results.csv, interpolation_parameters_table.csv, titration_curves_healthy_vs_nonhealthy.png (300 DPI).
Edit constants at the top of spline_interpolation.py if needed: INPUT_FILE, SHEET_ACIDITY, SHEET_ALKALINITY, TARGET_pH_NAOH, TARGET_pH_CITRIC, HEALTHY_PIGS / NONHEALTHY_PIGS, and output filenames.
`biomarker_analysis.py` reads citric values from `citric_acid_pH4_results.csv`, or from the alternate `citric_pH4_results.csv` if that file is present.
- `HEALTHY_PIGS` = PIG-04, PIG-06, PIG-07, PIG-08
- `NONHEALTHY_PIGS` = PIG-01, PIG-02, PIG-03, PIG-05, PIG-09, PIG-10
| p-value | Label | Marker |
|---|---|---|
| < 0.001 | Highly significant | *** |
| < 0.01 | Very significant | ** |
| < 0.05 | Significant | * |
| ≥ 0.05 | Not significant | ns |
| Cohen’s d (absolute value) | Effect size |
|---|---|
| < 0.2 | None |
| 0.2 to < 0.5 | Small |
| 0.5 to < 0.8 | Medium |
| ≥ 0.8 | Large |
Percent difference (reported in pipeline):
100 × (Healthy mean − Non-healthy mean) / Non-healthy mean
Positive ⇒ healthy higher.
Tests: `shared_utils.calculate_statistics` uses an independent two-sample t-test, the Mann–Whitney U test, and Cohen’s d (pooled SD).
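The statistics above can be sketched as follows. This is a minimal, hypothetical stand-in, not the actual `calculate_statistics`. It assumes a Student's t-test (SciPy's default `equal_var=True`), two-sided tests, NaNs dropped before testing, and the p-value and effect-size thresholds from the tables above. As a worked check of the percent-difference formula, the phosphate means give 100 × (176.95 − 120.6) / 120.6 ≈ +46.7%, matching the ~+47% reported below.

```python
import numpy as np
from scipy import stats


def significance_label(p):
    """Map a p-value to the marker column of the p-value table."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "ns"


def effect_size_label(d):
    """Classify |d| with the thresholds of the Cohen's d table."""
    d = abs(d)
    if d < 0.2:
        return "none"
    if d < 0.5:
        return "small"
    if d < 0.8:
        return "medium"
    return "large"


def cohens_d(a, b):
    """Cohen's d using the pooled SD of two samples (sample variances, ddof=1)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    na, nb = a.size, b.size
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)


def compare_groups(healthy, nonhealthy):
    """Hypothetical summary of one biomarker: means, % difference, tests, effect size."""
    h = np.asarray(healthy, dtype=float)
    n = np.asarray(nonhealthy, dtype=float)
    h, n = h[~np.isnan(h)], n[~np.isnan(n)]
    _, t_p = stats.ttest_ind(h, n)  # Student's t (equal_var=True is SciPy's default)
    _, u_p = stats.mannwhitneyu(h, n, alternative="two-sided")
    d = cohens_d(h, n)
    return {
        "healthy_mean": h.mean(),
        "nonhealthy_mean": n.mean(),
        # Positive => healthy higher, as in the percent-difference definition.
        "percent_difference": 100.0 * (h.mean() - n.mean()) / n.mean(),
        "t_p": t_p,
        "t_label": significance_label(t_p),
        "u_p": u_p,
        "cohens_d": d,
        "effect": effect_size_label(d),
    }
```

Reporting both the t-test and Mann–Whitney p-values makes it easy to see when a result depends on the normality assumption, which matters with groups of 4 and 6 animals.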
| Rank | Biomarker | p (approx.) | Cohen’s d | Note |
|---|---|---|---|---|
| 1 | Phosphate | < 0.001 | ~1.14 | Primary discriminator |
| 2 | Creatinine | ~0.045 | ~0.46 | Weak |
| 3 | NaOH @ pH 11 | ~0.054 | ~0.46 | NS |
| 4 | Urine pH | ~0.62 | ~0.08 | NS |
| 5 | Min pH (if computed) | ~0.19 | ~0.28 | NS |
| 6 | Citric @ pH 4 | ~0.97 | ~0.01 | NS |
Phosphate (primary): mean ~176.95 mM in healthy vs ~120.6 mM in non-healthy animals, about +47% in healthy; large effect (d ~1.14).
Interpretation (short): Non-healthy animals show proximal tubular phosphate wasting; urine pH and citric buffering did not separate the groups, arguing against a generic “global metabolic failure” story and toward a selective tubular signal.
Manuscript snippets (templates):
- Phosphate: “Urinary phosphate was lower in non-healthy animals (e.g., ~120.6 vs ~176.9 mM; p < 0.001, Cohen’s d ~ 1.1), consistent with impaired tubular reabsorption.”
- pH (negative): “Direct urine pH did not differ between groups (p > 0.5), consistent with preserved baseline acid–base readout in this dataset.”
| File | Role |
|---|---|
| `output/` | Default folder for CSV and PNG artifacts |
| `shared_utils.py` | Ammonia/CO₂ loaders, splines, biomarker stats, figures |
| `ammonia_analysis.py` | CLI: export ammonia/CO₂/ratio CSVs from Master sheets |
| `spline_interpolation.py` | CLI: NaOH + citric CSVs + plot |
| `biomarker_analysis.py` | CLI: full biomarker run → `biomarker_summary.csv` |
| `*.ipynb` | Interactive runs |
- Inspect the Excel layout (`pd.read_excel`, `sheet_name=…`, print columns).
- Pool values across `HEALTHY_PIGS` / `NONHEALTHY_PIGS` (see `read_phosphate_creatinine_by_pig`, `load_naoh_from_csv`, etc. in `shared_utils.py`).
- Call `calculate_statistics(healthy_arr, nonhealthy_arr)`.
- Add your step to `run_full_biomarker_analysis` (or a separate script) and append to the summary dict passed to `create_biomarker_summary`.

Reuse existing plot helpers (`plot_biomarkers_by_pig_figure`, …) for consistent styling.
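The pooling step for a new biomarker might look like the sketch below. It assumes the sheet has one row per measurement with a pig-ID column; the column names `Pig` and `MyMarker` and the helper `pool_by_group` are hypothetical, so inspect the real layout first. It is not one of the existing `shared_utils.py` loaders.

```python
import pandas as pd

# Group lists as documented in this README; keep in sync with the project constants.
HEALTHY_PIGS = ["PIG-04", "PIG-06", "PIG-07", "PIG-08"]
NONHEALTHY_PIGS = ["PIG-01", "PIG-02", "PIG-03", "PIG-05", "PIG-09", "PIG-10"]


def pool_by_group(df, value_col, pig_col="Pig"):
    """Split one biomarker column into healthy / non-healthy NumPy arrays.

    Pig IDs are normalised (whitespace stripped, upper-cased) before matching;
    non-numeric cells become NaN and are dropped.
    """
    values = pd.to_numeric(df[value_col], errors="coerce")
    pigs = df[pig_col].astype(str).str.strip().str.upper()
    healthy = values[pigs.isin(HEALTHY_PIGS)].dropna().to_numpy()
    nonhealthy = values[pigs.isin(NONHEALTHY_PIGS)].dropna().to_numpy()
    return healthy, nonhealthy
```

In the pipeline, `df` would come from something like `pd.read_excel(path, sheet_name=...)`, and the two arrays would then go to `calculate_statistics(healthy_arr, nonhealthy_arr)`. Rows whose pig ID is in neither list are ignored.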
- Import errors: install packages (section 1).
- Missing CSV: run steps in order; check working directory.
- Sheet not found: list the available sheets with `pd.ExcelFile(path).sheet_names`.
- NaN at target pH: the measured pH range may not reach the target (e.g. 11); inspect the raw titration rows.
- p or d disagrees with a past run: confirm same file version, same group labels, and same test (t-test vs Mann–Whitney).
License: Proprietary to the Catalyze project; internal use unless otherwise agreed.
When publishing, replace summary numbers with those from your current biomarker_summary.csv and locked analysis date.