plot_mavisp_ddg.py is a command-line tool for visualizing protein mutation stability predictions from MAVISp-generated CSV files. It produces grouped bar plots of ΔΔG (kcal/mol) values for multiple computational methods (FoldX, Rosetta, RaSP), with automatic standard deviation error bars when available.
The script is designed to visualize complete mutational scans, plotting all 19 possible amino-acid substitutions per residue position in a single figure. Each plot corresponds to one residue position and includes all associated stability predictions.
Both PDF and high-resolution PNG figures are generated for publication and reporting.
Key Features:
- Detects ΔΔG columns for FoldX, Rosetta, and RaSP
- Detects corresponding standard deviation columns (st. dev, stdev, std, sd)
- Excludes irrelevant columns (e.g. classification, count, rank)
- Converts values to numeric and safely ignores non-numeric entries
- Splits large datasets into configurable chunks
- Plots 19 mutations per figure, corresponding to a full amino-acid substitution set
- Generates publication-ready PDF and PNG plots
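The "converts values to numeric and safely ignores non-numeric entries" behaviour can be illustrated with a small sketch. This is not the script's own code: the helper name `to_float_or_nan` and the sample values are hypothetical, and the script may well use pandas for this step (for example `pandas.to_numeric` with `errors="coerce"`). The idea is the same either way: anything that cannot be parsed becomes NaN and is skipped when plotting.

```python
import math


def to_float_or_nan(value):
    """Return value as a float, or NaN when it cannot be parsed (hypothetical helper)."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return math.nan


# Illustrative raw cell contents, as they might appear in a CSV column.
raw = ["1.25", "-0.4", "n/a", "", None, "3"]
values = [to_float_or_nan(v) for v in raw]

# Keep only the entries that parsed; NaN entries are dropped.
usable = [v for v in values if not math.isnan(v)]
```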
Python 3.10+
Packages: numpy, pandas, matplotlib
Example module load (if using a module system):
module load python/3.10/modulefile
Input Requirements:
- Input file must be a CSV
- Must contain a column named Mutation (used as the x-axis)
- ΔΔG columns must include method keywords:
  - foldx
  - rosetta
  - rasp
- Preferred ΔΔG identifiers (optional but recommended):
  - stability, kcal, ddg, ΔΔG, delta, dG
- Standard deviation columns are detected automatically if their names contain:
  - st. dev, stdev, std, or sd
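The naming rules above can be sketched as a simple case-insensitive substring match. This is a minimal illustration, not the script's actual implementation: the function name `detect_columns`, the returned structure, and the example column names are assumptions made for this sketch.

```python
METHODS = ("foldx", "rosetta", "rasp")
STD_KEYS = ("st. dev", "stdev", "std", "sd")
EXCLUDE_KEYS = ("classification", "count", "rank")


def detect_columns(columns):
    """Group column names by method into ΔΔG and standard deviation lists.

    Hypothetical helper: matching is a case-insensitive substring search.
    """
    found = {m: {"ddg": [], "std": []} for m in METHODS}
    for col in columns:
        name = col.lower()
        # Skip columns that carry no ΔΔG values (e.g. classification).
        if any(key in name for key in EXCLUDE_KEYS):
            continue
        for method in METHODS:
            if method in name:
                kind = "std" if any(k in name for k in STD_KEYS) else "ddg"
                found[method][kind].append(col)
    return found


# Illustrative column names only; real MAVISp headers may differ.
example_columns = [
    "Mutation",
    "Stability (FoldX5, kcal/mol)",
    "Stability (FoldX5, kcal/mol, st. dev)",
    "Stability (Rosetta, kcal/mol)",
    "Stability (RaSP, kcal/mol)",
    "Stability classification (Rosetta, FoldX)",
]
detected = detect_columns(example_columns)
```

A method with an empty `std` list is one for which error bars would be omitted.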
Basic usage (CSV in current directory)
python3 plot_mavisp_ddg.py -c my_mutations.csv
This will:
- Read my_mutations.csv
- Create an output folder ./my_mutations/
- Generate plots with 19 mutations per figure
CSV in a different directory:
python3 plot_mavisp_ddg.py -c /full/path/to/my_mutations.csv
Specify output directory
python3 plot_mavisp_ddg.py -c my_mutations.csv -o plots/
Output will be saved in ./plots/.
Change chunk size
python3 plot_mavisp_ddg.py -c my_mutations.csv -n 19
This plots 19 mutations per figure.
Full example
python3 plot_mavisp_ddg.py \
-c INPP4B-simple_mode.csv \
-n 19 \
-o 19-INPP4B-simple_mode

| Flag | Description |
|---|---|
| -h, --help | Show help message and exit |
| -c CSV, --csv CSV | Input CSV file (simple or ensemble mode) |
| -o OUT, --out OUT | Output directory (default: CSV basename in current directory) |
| -n CHUNK_SIZE, --chunk-size CHUNK_SIZE | Number of mutations per plot (default: 19) |
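For readers who want to see how these flags fit together, here is a minimal `argparse` sketch that reproduces the options in the table. It is an illustration under assumptions, not the script's source: whether `-c` is strictly required, and the helper names `build_parser` and `resolve_out_dir`, are guesses made for this example.

```python
import argparse
from pathlib import Path


def build_parser():
    """Build a parser mirroring the documented flags (hypothetical reconstruction)."""
    parser = argparse.ArgumentParser(
        description="Plot MAVISp ΔΔG predictions as grouped bar plots."
    )
    # Assumption: the CSV argument is mandatory.
    parser.add_argument("-c", "--csv", required=True,
                        help="Input CSV file (simple or ensemble mode)")
    parser.add_argument("-o", "--out", default=None,
                        help="Output directory (default: CSV basename in current directory)")
    parser.add_argument("-n", "--chunk-size", type=int, default=19,
                        help="Number of mutations per plot (default: 19)")
    return parser


def resolve_out_dir(args):
    """Use -o when given, otherwise the CSV basename relative to the current directory."""
    if args.out:
        return Path(args.out)
    return Path(Path(args.csv).stem)


default_args = build_parser().parse_args(["-c", "/full/path/to/my_mutations.csv"])
custom_args = build_parser().parse_args(["-c", "my_mutations.csv", "-o", "plots/", "-n", "10"])
```

With these definitions, `resolve_out_dir(default_args)` gives `my_mutations`, which matches the `./my_mutations/` folder described under basic usage.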
For each chunk of mutations, the script generates two files:
| File | Description |
|---|---|
| CSVNAME_01.pdf | PDF plot for chunk 1 |
| CSVNAME_01.png | High-resolution PNG plot for chunk 1 |
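The chunking and file naming described above can be sketched as follows. The helper names `chunked` and `output_paths` and the example mutation labels are hypothetical. The two-digit index follows the `CSVNAME_01` pattern shown in the table; how the script numbers more than 99 chunks is not documented here.

```python
from pathlib import Path

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("chunk size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def output_paths(csv_path, out_dir, n_chunks):
    """Return one (pdf, png) path pair per chunk, numbered from 01."""
    stem = Path(csv_path).stem
    return [
        (Path(out_dir) / f"{stem}_{i:02d}.pdf", Path(out_dir) / f"{stem}_{i:02d}.png")
        for i in range(1, n_chunks + 1)
    ]


# Illustrative scan of two positions: A10 and G11, 19 substitutions each.
mutations = [f"A10{aa}" for aa in AMINO_ACIDS if aa != "A"]
mutations += [f"G11{aa}" for aa in AMINO_ACIDS if aa != "G"]

chunks = chunked(mutations, 19)
paths = output_paths("my_mutations.csv", "my_mutations", len(chunks))
```

With the default chunk size of 19 and a CSV ordered by position, each chunk lines up with one residue position.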
Each plot includes:
- ΔΔG values for all 19 substitutions
- One bar group per method
- Error bars for standard deviation (if present)
- Horizontal reference line at ΔΔG = 0
- Clear legend and axis labels
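A grouped bar plot with these elements could be drawn roughly as below. This is a sketch under assumptions rather than the script's plotting code: the function names, figure size, bar-group width of 0.8, and the 300 dpi PNG setting are all illustrative choices. The bar positioning is kept in a separate pure-Python helper so the layout arithmetic is easy to check.

```python
def grouped_bar_positions(n_groups, n_series, group_width=0.8):
    """Return (bar_width, positions); positions[s][g] is the x centre of series s in group g."""
    bar_width = group_width / n_series
    positions = []
    for s in range(n_series):
        # Centre the set of bars on the integer tick of each group.
        offset = (s - (n_series - 1) / 2) * bar_width
        positions.append([g + offset for g in range(n_groups)])
    return bar_width, positions


def plot_chunk(mutations, series, errors, out_stem):
    """Draw one figure and save it as PDF and PNG.

    series: {method label: list of ΔΔG values}
    errors: {method label: list of standard deviations}; a missing label means no error bars.
    """
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend, suitable for batch runs
    import matplotlib.pyplot as plt

    bar_width, positions = grouped_bar_positions(len(mutations), len(series))
    fig, ax = plt.subplots(figsize=(12, 4))
    for xs, (label, values) in zip(positions, series.items()):
        ax.bar(xs, values, width=bar_width, yerr=errors.get(label),
               capsize=2, label=label)
    ax.axhline(0, color="black", linewidth=0.8)  # reference line at ΔΔG = 0
    ax.set_xticks(list(range(len(mutations))))
    ax.set_xticklabels(mutations, rotation=90)
    ax.set_xlabel("Mutation")
    ax.set_ylabel("ΔΔG (kcal/mol)")
    ax.legend()
    fig.tight_layout()
    fig.savefig(f"{out_stem}.pdf")
    fig.savefig(f"{out_stem}.png", dpi=300)  # assumed resolution
    plt.close(fig)
```

Calling `plot_chunk` with, for example, three methods and 19 mutations would write one PDF and one PNG for that chunk.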
Notes:
- Column detection is case-insensitive
- If no standard deviation column is found for a method, error bars are omitted silently
- Multiple ΔΔG columns per method are supported
- Large datasets are automatically split to keep plots readable
- The script does not modify the input CSV
Workflow
- Run MAVISp and generate a CSV
- Verify the CSV contains a Mutation column
- Run plot_mavisp_ddg.py
- Collect PDF/PNG plots from the output directory
This script automates the visualization of mutation stability predictions across FoldX, Rosetta, and RaSP. It only requires a Mutation column and method keywords in the ΔΔG column names, and it splits large datasets into fixed-size chunks so that the same CSV always produces the same set of figures.