A Python GUI for PDBStat (the C/C++ binary actively maintained by Roberto Tejero) for NMR protein structure analysis.
Note: The PDBStat C/C++ binary is a separate download and is not included in this project. Install it to enable full functionality.
- Python 3.10 or higher (tested on 3.10, 3.11, 3.12)
- The PDBStat C/C++ binary (see below)
- Optional but recommended: Ghostscript (provides ps2pdf for automatic conversion of PostScript (.ps) plots to PDF)
If Ghostscript is not installed, some plots (e.g., Ramachandran/contact maps) will only be available as .ps files. To enable automatic PDF conversion, install Ghostscript:
macOS:
brew install ghostscript
Linux (Debian/Ubuntu):
sudo apt-get install ghostscript
Windows:
- Download and install from https://www.ghostscript.com/download/gsdnld.html
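The fallback described above (PDF when Ghostscript is present, .ps otherwise) can be sketched as follows. This is a minimal illustration, not the project's code: the helper name convert_ps_to_pdf is hypothetical, and the only external assumption is the standard ps2pdf usage `ps2pdf input.ps output.pdf`.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Optional


def convert_ps_to_pdf(ps_path: Path) -> Optional[Path]:
    """Hypothetical helper: convert a PostScript plot to PDF with ps2pdf.

    Returns the PDF path on success, or None (leaving the .ps file as the
    only output) when ps2pdf is not on PATH or the conversion fails.
    """
    ps2pdf = shutil.which("ps2pdf")
    if ps2pdf is None:
        return None  # Ghostscript not installed: keep the .ps plot only
    pdf_path = ps_path.with_suffix(".pdf")
    result = subprocess.run(
        [ps2pdf, str(ps_path), str(pdf_path)],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0 or not pdf_path.exists():
        return None
    return pdf_path


if __name__ == "__main__":
    import sys

    for arg in sys.argv[1:]:
        pdf = convert_ps_to_pdf(Path(arg))
        print(f"{arg}: {'converted to ' + str(pdf) if pdf else 'left as .ps'}")
```

Checking for the tool with shutil.which first keeps the "Ghostscript missing" case a normal outcome rather than an exception.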
# Install dependencies
pip install -r requirements.txt
# Run GUI
python pdbstat_gui.py
# Or analyze from command line
python examples/analyze_nmr_ensemble.py data/pdb/2jqn.pdb --backbone
- ✅ Complete PDB parsing with multi-model ensemble support
- ✅ Ensemble analysis - RMSD, RMSF, coordinate precision, well-defined regions
- ✅ GUI application - File loading, visualization, interactive analysis
- ✅ 3D Molecular Viewer - Interactive browser-based visualization with 3Dmol.js
- ✅ PDBStat Binary Integration - Seamlessly integrates with the PDBStat C/C++ binary for core analyses
- ✅ Quality checks - Ramachandran, chirality, hydrogen atoms, missing atoms
- ✅ Advanced analysis - NOE restraints, DAOP, order parameters, contact maps
- ✅ Comprehensive testing - ~1,900 tests with 59% coverage, CI on every commit
- ✅ Debug console & visibility - Interactive testing and complete logging of PDBStat operations
- ✅ Clean architecture - Modular design with manager pattern, 64% reduction in main window complexity
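Mean pairwise RMSD, one of the ensemble statistics, is conventionally computed by superimposing every pair of models before measuring deviation. The sketch below shows that textbook approach (Kabsch superposition via SVD, using NumPy); it is an illustration only and is not claimed to match how EnsembleAnalyzer or the PDBStat binary computes its values (atom selection, weighting, and fitting choices may differ).

```python
import itertools

import numpy as np


def superposed_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate arrays after optimal superposition.

    Uses the Kabsch algorithm: center both sets, find the rotation that best
    maps p onto q from an SVD of their 3x3 covariance matrix, then measure
    the remaining per-atom deviation.
    """
    p = p - p.mean(axis=0)  # remove translation
    q = q - q.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # -1 would mean a reflection; correct it
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rotation.T - q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def mean_pairwise_rmsd(models: list) -> float:
    """Average superposed RMSD over all unique pairs of models in an ensemble."""
    pairs = list(itertools.combinations(models, 2))
    return sum(superposed_rmsd(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    model_a = rng.normal(size=(50, 3))  # 50 fake "atoms"
    # Model B: model A rotated 90 degrees about z and translated
    rot_z = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    model_b = model_a @ rot_z.T + np.array([5.0, -2.0, 1.0])
    print(mean_pairwise_rmsd([model_a, model_b]))  # close to 0: only rigid motion differs
```

Because the models differ only by a rigid-body motion, superposition removes the difference entirely; a real NMR ensemble leaves a nonzero residual that reflects coordinate precision.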
All project documentation is located in the docs/ directory.
- docs/USER_GUIDE.md: A comprehensive guide for end-users covering installation, GUI usage, and the command-line interface.
- docs/DEVELOPER_GUIDE.md: The central hub for developers, detailing the development setup, project architecture, and contribution guidelines.
For more specific topics, see these sub-directories:
- docs/developer/: Detailed guides on the command sequence system, the interactive API, and binary integration.
- docs/architecture/: In-depth architectural reviews, design proposals, and historical decision documents.
CRITICAL PRINCIPLE: All PDBStat binary interaction is driven by declarative configuration, not hard-coded pexpect calls.
Each PDBStat command is modeled as a state machine with:
- States = prompts (with regex patterns and timeouts)
- Transitions = inputs
- Accept state = completion pattern
This architecture ensures:
- ✅ Single source of truth - All command patterns in one place
- ✅ Maintainable - Update config file, not scattered Python code
- ✅ Testable - Mock configurations without the binary
- ✅ Self-documenting - Configuration IS the specification
# Configuration (in command_config.py)
{
"name": "contact",
"prompts": [
{"pattern": r"\?\s*[:_]", "timeout": 10.0}, # Standard prompt
{"pattern": r"Please enter title", "timeout": 5.0}, # Custom prompt
],
}
# Usage (in processing modules)
from pdbstat.integration.pdbstat_interactive import PDBStatInteractive
interactive = PDBStatInteractive(pdbstat_dir)
output = interactive.execute_command_configured(
"contact",
inputs=["coord", "1", "5.0", "heavy", "Title", "*"],
load_pdb=pdb_file,
)
- docs/COMMAND_CONFIGURATION_ARCHITECTURE.md - Complete architectural specification
- docs/COMMAND_STATE_MACHINES.md - Visual state diagrams (Mermaid)
- GEMINI.md - Project architectural principles
- ❌ Never hard-code prompt patterns in pexpect calls
- ❌ Never hard-code timeouts
- ❌ Never bypass the configuration system
- ✅ Always define commands in command_config.py
- ✅ Always use execute_command_configured()
See the architecture docs for complete details and visual state diagrams.
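The prompt/input state machine described in this section can be illustrated with a small, self-contained sketch. Everything here is hypothetical: FakeSession stands in for a pexpect child so the code runs without the binary, and the "completion" key is an assumed name for the accept-state pattern (the configuration excerpt above shows only name and prompts). It is not the project's execute_command_configured() implementation.

```python
import re
from dataclasses import dataclass, field


@dataclass
class FakeSession:
    """Hypothetical stand-in for a pexpect-style child process.

    read_chunk() returns the next scripted piece of output ("" once the
    script is exhausted); send() records what was typed.
    """

    transcript: list
    sent: list = field(default_factory=list)
    position: int = 0

    def read_chunk(self) -> str:
        if self.position >= len(self.transcript):
            return ""
        chunk = self.transcript[self.position]
        self.position += 1
        return chunk

    def send(self, line: str) -> None:
        self.sent.append(line)


def run_configured(config: dict, inputs: list, session: FakeSession) -> str:
    """Drive one command as a state machine described by `config`.

    States are the configured prompt patterns, each input is a transition,
    and the "completion" pattern is the accept state. Per-prompt timeouts
    are omitted here; a real session would pass them to its expect() call.
    """
    prompts = [re.compile(p["pattern"]) for p in config["prompts"]]
    completion = re.compile(config["completion"])
    pending = list(inputs)
    output = []
    session.send(config["name"])  # enter the command
    while True:
        chunk = session.read_chunk()
        output.append(chunk)
        if completion.search(chunk):
            return "".join(output)  # accept state reached
        if any(p.search(chunk) for p in prompts):
            if not pending:
                raise RuntimeError(f"{config['name']}: prompt received but no inputs left")
            session.send(pending.pop(0))  # transition: answer the prompt
        elif not chunk:
            raise RuntimeError(f"{config['name']}: output ended before completion")


if __name__ == "__main__":
    contact = {
        "name": "contact",
        "prompts": [
            {"pattern": r"\?\s*[:_]", "timeout": 10.0},
            {"pattern": r"Please enter title", "timeout": 5.0},
        ],
        "completion": r"Contact map written",  # hypothetical key and text
    }
    fake = FakeSession(transcript=[
        "Coordinates or restraints? :", "Model? :", "Cutoff? :", "Atoms? :",
        "Please enter title", "Residues? :", "Contact map written\n",
    ])
    run_configured(contact, ["coord", "1", "5.0", "heavy", "Title", "*"], fake)
    print(fake.sent)  # ['contact', 'coord', '1', '5.0', 'heavy', 'Title', '*']
```

Because the driver only reads patterns from the dictionary, swapping FakeSession for a scripted transcript is all a test needs, which is the testability benefit the list above describes.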
For GUI and application-level consumers, prefer PDBStatFacade as the canonical integration surface. It manages the session lifecycle and provides convenience helpers (start_session(), get_interactive(), stop_session(), execute()) that simplify application code and testing.
Quick migration example (GUI/application code):
from pathlib import Path
from pdbstat.integration.pdbstat_facade import PDBStatFacade
pdbstat_dir = Path('/path/to/PdbStat5X')
facade = PDBStatFacade(pdbstat_dir)
facade.start_session(timeout=30.0)
try:
    interactive = facade.get_interactive()
    output = interactive.send_command_with_inputs('info', inputs=['1'], load_pdb=Path('structure.pdb'))
    # process output
finally:
    facade.stop_session()
Notes:
- Processing modules that require low-level, explicit control of the binary should continue to use PDBStatInteractive directly.
- Tests that mock or patch PDBStatInteractive may either patch the symbol in the modules that use it or patch PDBStatFacade when testing GUI-level code.
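The start/try/finally/stop pattern in the facade example above can be wrapped in a context manager so callers cannot forget the cleanup step. This is a hypothetical helper (pdbstat_session is not part of the project); it assumes only the start_session(timeout=...), get_interactive(), and stop_session() methods shown in that example.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def pdbstat_session(facade: Any, timeout: float = 30.0) -> Iterator[Any]:
    """Hypothetical helper: yield facade.get_interactive() inside a managed session.

    stop_session() runs even if get_interactive() or the caller's code raises,
    mirroring the try/finally in the facade example.
    """
    facade.start_session(timeout=timeout)
    try:
        yield facade.get_interactive()
    finally:
        facade.stop_session()


# Intended usage (requires the project and the binary, so not executed here):
#
#     facade = PDBStatFacade(Path("/path/to/PdbStat5X"))
#     with pdbstat_session(facade) as interactive:
#         output = interactive.send_command_with_inputs(
#             "info", inputs=["1"], load_pdb=Path("structure.pdb"))
```

In tests, any object with those three methods can be passed as the facade, which keeps GUI-level tests independent of the binary.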
python pdbstat_gui.py
Features:
- Load PDB files (File → Open PDB File...)
- Choose atom selection (CA or backbone)
- Run analysis with Python or Legacy implementation
- View statistics, plots, and comparisons
- Debug console for interactive binary testing
The GUI includes an interactive 3D molecular viewer powered by 3Dmol.js:
# Launch GUI and load a structure
python pdbstat_gui.py
# Click the "🧬 View in 3D" button in the Analysis section
Features:
- Interactive Controls: Rotate (left-click drag), zoom (scroll), pan (right-click drag)
- Multiple Styles: Cartoon, stick, sphere, and line representations
- Color Schemes: Spectrum, chain, secondary structure, and monochrome
- Additional: Reset view, toggle auto-spin animation
- No Dependencies: Uses system browser, no additional packages required
Browser Compatibility: Works with all modern browsers (Chrome, Firefox, Safari, Edge)
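The "no dependencies, uses the system browser" approach can be sketched with the standard library alone: write a standalone HTML page that loads 3Dmol.js and embeds the PDB text, then open it with webbrowser. This is an illustration, not how pdbstat_gui.py necessarily builds its viewer; the CDN URL, page layout, and helper names (write_viewer_html, view_in_browser) are assumptions.

```python
import json
import tempfile
import webbrowser
from pathlib import Path

# Assumed CDN location of 3Dmol.js; pin a specific version for reproducibility.
THREEDMOL_JS = "https://3Dmol.org/build/3Dmol-min.js"

PAGE_TEMPLATE = """<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <script src="__JS_URL__"></script>
</head>
<body style="margin:0">
  <div id="viewer" style="width:100vw;height:100vh;position:relative"></div>
  <script>
    const viewer = $3Dmol.createViewer(document.getElementById("viewer"),
                                       {backgroundColor: "white"});
    viewer.addModel(__PDB_DATA__, "pdb");
    viewer.setStyle({}, {cartoon: {color: "spectrum"}});
    viewer.zoomTo();
    viewer.render();
  </script>
</body>
</html>
"""


def write_viewer_html(pdb_text: str, out_path: Path) -> Path:
    """Write a standalone HTML page that renders pdb_text with 3Dmol.js."""
    # json.dumps yields a valid JS string literal; escaping "</" keeps the
    # PDB text from closing the <script> element early.
    pdb_literal = json.dumps(pdb_text).replace("</", "<\\/")
    html = PAGE_TEMPLATE.replace("__JS_URL__", THREEDMOL_JS)
    html = html.replace("__PDB_DATA__", pdb_literal)
    out_path.write_text(html, encoding="utf-8")
    return out_path


def view_in_browser(pdb_path: Path) -> None:
    """Render a PDB file in the system's default web browser."""
    out_dir = Path(tempfile.mkdtemp(prefix="pdbstat_3d_"))
    page = write_viewer_html(pdb_path.read_text(), out_dir / f"{pdb_path.stem}.html")
    webbrowser.open(page.as_uri())
```

Plain placeholder replacement is used instead of str.format because the JavaScript contains many literal braces.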
# Basic analysis
python examples/analyze_nmr_ensemble.py structure.pdb
# With backbone atoms (recommended)
python examples/analyze_nmr_ensemble.py structure.pdb --backbone
# Example output:
# Models: 20
# Mean pairwise RMSD: 0.893 Å
# Well-defined residues: 72.2%
from pdbstat.io.pdb_parser import PDBParser
from pdbstat.analysis.ensemble import EnsembleAnalyzer
# Load structure
parser = PDBParser()
ensemble = parser.parse_file("data/pdb/2jqn.pdb")
# Analyze
analyzer = EnsembleAnalyzer(ensemble, parser=parser)
stats = analyzer.calculate_statistics(use_backbone=True)
# Display
print(f"Mean pairwise RMSD: {stats['mean_pairwise_rmsd']:.3f} Å")
print(f"RMSD to mean: {stats['rmsd_to_mean']:.3f} Å")
print(f"Well-defined: {stats['pct_well_defined']:.1f}%")
Control logging via environment variables:
# Enable debug logging to see all PDBStat operations
export PDBSTAT_LOG_LEVEL=DEBUG
python pdbstat_gui.py
# Log to file
export PDBSTAT_LOG_LEVEL=INFO
export PDBSTAT_LOG_FILE=/tmp/pdbstat.log
python pdbstat_gui.py
See docs/developer/binary-integration.md for complete debugging options.
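One way environment variables like PDBSTAT_LOG_LEVEL and PDBSTAT_LOG_FILE can be mapped onto Python's standard logging module is sketched below. It is a hypothetical sketch, not the project's logging setup: the logger name "pdbstat", the WARNING default, and the log format are assumptions.

```python
import logging
import os


def configure_logging_from_env() -> logging.Logger:
    """Hypothetical sketch: map PDBSTAT_LOG_LEVEL / PDBSTAT_LOG_FILE onto stdlib logging."""
    level_name = os.environ.get("PDBSTAT_LOG_LEVEL", "WARNING").upper()
    level = getattr(logging, level_name, None)
    if not isinstance(level, int):
        level = logging.WARNING  # unknown level names fall back to WARNING
    logger = logging.getLogger("pdbstat")  # assumed logger name
    logger.setLevel(level)

    log_file = os.environ.get("PDBSTAT_LOG_FILE")
    handler = logging.FileHandler(log_file) if log_file else logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    # Replace (and close) earlier handlers so repeated calls don't duplicate output.
    for old in logger.handlers:
        old.close()
    logger.handlers[:] = [handler]
    return logger


if __name__ == "__main__":
    log = configure_logging_from_env()
    log.debug("visible only with PDBSTAT_LOG_LEVEL=DEBUG")
    log.warning("visible at the default level")
```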
Automated testing runs on every push and pull request via GitHub Actions:
- ✅ ~1,300 tests run automatically on every commit
- ✅ Tests on Python 3.10, 3.11, 3.12
- ✅ Tests on Ubuntu (Linux) and macOS
- ✅ Code quality checks (Black, isort, flake8)
- 83 tests auto-skip when the PDBStat binary is unavailable (run them locally)
View test status: GitHub Actions
# Quick test (unit + integration, no binary needed)
pytest tests/unit/ tests/analysis/ tests/core/ tests/geometry/ \
tests/io/ tests/processing/ tests/utils/ tests/edge_cases/ \
tests/integration/ -v
# With coverage report
pytest --cov=pdbstat --cov-report=html
open htmlcov/index.html  # View coverage (macOS; use xdg-open on Linux)
# Functional tests (requires PDBStat binary)
export PDBSTAT_DIR=/path/to/PdbStat5X
pytest tests/functional/ -v
# All tests (1,436 tests)
pytest tests/ -v
| Test Category | Count | Requires Binary | Runs in CI |
|---|---|---|---|
| Unit Tests | ~160 | No | ✅ Yes |
| Analysis Tests | ~45 | No | ✅ Yes |
| Core Tests | ~40 | No | ✅ Yes |
| Geometry Tests | ~30 | No | ✅ Yes |
| I/O Tests | ~35 | No | ✅ Yes |
| Processing Tests | ~40 | No | ✅ Yes |
| Edge Cases | ~40 | No | ✅ Yes |
| Integration (mocked) | ~120 | No | ✅ Yes |
| Integration (binary) | ~60 | Yes | Auto-skip |
| Functional Tests | ~40 | Yes | Ignored |
| GUI Dialog Tests | ~22 | No | Auto-skip |
| Total | ~1,927 | - | ~1,900 in CI |
Coverage: ~59% overall, 90%+ for core analysis modules
Note: Tests requiring the PDBStat binary are automatically skipped in CI but run locally when PDBSTAT_DIR is set.
See docs/architecture/FUNCTIONAL_TEST_COVERAGE.md for command-level details.
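The auto-skip behaviour described above hinges on whether PDBSTAT_DIR points at a usable directory. A minimal sketch of that check follows; find_pdbstat_dir is a hypothetical helper, and the project's actual fixtures or markers may differ (for example, by also checking for the executable inside the directory).

```python
import os
from pathlib import Path
from typing import Optional


def find_pdbstat_dir() -> Optional[Path]:
    """Hypothetical helper: return the directory named by PDBSTAT_DIR, or None.

    None means "binary not available", so binary-dependent tests should skip.
    """
    value = os.environ.get("PDBSTAT_DIR")
    if not value:
        return None
    path = Path(value).expanduser()
    return path if path.is_dir() else None


if __name__ == "__main__":
    found = find_pdbstat_dir()
    print(f"PDBStat binary tests: {'enabled' if found else 'skipped'}")
```

With pytest, such a check is typically turned into a reusable marker, e.g. requires_binary = pytest.mark.skipif(find_pdbstat_dir() is None, reason="PDBSTAT_DIR not set").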
The cwd parameter allows you to control where PDBStat writes output files. This is useful for organizing analysis results, preventing directory clutter, and testing.
from pathlib import Path
from pdbstat.processing.pdbstat_engine import PDBStatEngine
# Specify custom working directory
output_dir = Path("./analysis_output")
output_dir.mkdir(exist_ok=True)
# Create engine with custom working directory
engine = PDBStatEngine(
pdbstat_dir=Path("PdbStat5X"),
cwd=output_dir # All PDBStat files written here
)
# Process PDB file
result = engine.process(Path("data/pdb/2jqn.pdb"))
Benefits:
- Organize Output: Keep analysis results in dedicated directories
- Prevent Clutter: Avoid filling current directory with output files
- Isolate Runs: Separate different analysis runs into distinct folders
- Testing: Control file locations for predictable test behavior
The cwd parameter propagates through the processing stack:
User Code (cwd=Path("output"))
↓
PDBStatEngine(cwd=...)
↓
PDBStatFacade(cwd=...)
↓
PDBStatInteractive(cwd=...)
↓
subprocess.Popen(cwd=str(...))
The PDBStat binary subprocess runs in the specified directory, so all output files are created there.
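The last hop in the chain above is plain standard-library behaviour: a child started with subprocess.Popen(cwd=...) resolves relative paths against that directory. The sketch below demonstrates this with a Python child process standing in for the PDBStat binary (the file name contact_map.ps and the helper run_child_in are illustrative only).

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for the PDBStat binary: a child process that writes
# its output file to a *relative* path, as command-line tools typically do.
CHILD_SCRIPT = "open('contact_map.ps', 'w').write('%!PS\\n')"


def run_child_in(out_dir: Path) -> list:
    """Run the stand-in child with cwd=out_dir and list what it created there."""
    proc = subprocess.Popen([sys.executable, "-c", CHILD_SCRIPT], cwd=str(out_dir))
    if proc.wait() != 0:
        raise RuntimeError("child process failed")
    # The relative path was resolved against cwd, so the file is in out_dir.
    return sorted(p.name for p in out_dir.iterdir())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(run_child_in(Path(tmp)))  # ['contact_map.ps']
```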
The cwd parameter is validated when PDBStatInteractive is initialized:
- Must exist (raises FileNotFoundError if not)
- Must be a directory (raises NotADirectoryError if it is a file)
- Can be None (the binary then runs in the calling process's current working directory)
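The validation rules listed above translate into a few lines of pathlib code. This is a sketch of the documented behaviour, not PDBStatInteractive's source; the function name validate_cwd and the choice to return a resolved absolute path are assumptions.

```python
from pathlib import Path
from typing import Optional, Union


def validate_cwd(cwd: Optional[Union[str, Path]]) -> Optional[Path]:
    """Hypothetical sketch of the documented cwd checks.

    None -> None (the child inherits the parent's working directory)
    path -> resolved Path, after checking that it exists and is a directory
    """
    if cwd is None:
        return None
    path = Path(cwd)
    if not path.exists():
        raise FileNotFoundError(f"Working directory does not exist: {path}")
    if not path.is_dir():
        raise NotADirectoryError(f"Working directory is not a directory: {path}")
    return path.resolve()
```

Checking existence before type lets each failure raise the more specific built-in exception, matching the two error cases above.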
from pathlib import Path
from pdbstat.processing.pdbstat_engine import PDBStatEngine
pdb_files = Path("data/pdb").glob("*.pdb")
pdbstat_dir = Path("PdbStat5X")
for pdb_file in pdb_files:
    # Create output directory for each PDB
    output_dir = Path("results") / pdb_file.stem
    output_dir.mkdir(parents=True, exist_ok=True)
    # Process with isolated output
    engine = PDBStatEngine(pdbstat_dir=pdbstat_dir, cwd=output_dir)
    result = engine.process(pdb_file)
    print(f"Processed {pdb_file.name} → {output_dir}")
For architectural details, see docs/COMMAND_CONFIGURATION_ARCHITECTURE.md.
This project provides a Python GUI for the PDBStat C/C++ binary (actively maintained by Roberto Tejero), making its powerful features more accessible. It also includes some complementary Python-based analysis features.
- 3D Molecular Viewer - Interactive browser-based visualization with 3Dmol.js (no dependencies)
- GitHub Actions CI - Automated testing on every commit (1,400+ tests, Python 3.10-3.12, Linux/macOS)
- Command Sequence System - 160 tests for all 18 PDBStat commands with full prompt handling
- Phase 3 Refactoring - Extracted 4 manager classes, reduced main window complexity by 64%
- Functional Test Suite - 40 tests validating real PDBStat binary command execution
- Visibility Tools - Complete logging and debugging capabilities for PDBStat operations
- DAOP Analysis - Dihedral Angle Order Parameters implementation
- Order Parameters - Residue ordering analysis based on RMSD thresholds
- Contact Maps - Residue-residue contact map generation and visualization
- Nomenclature Conversion - IUPAC ↔ XPLOR atom naming conversion
- FindCore algorithm
- Additional file formats (mmCIF)
- Batch processing for multiple structures
- Additional quality validation metrics
If you use pdbstat in your research, please cite:
PDBStat:
@article{tejero2013pdbstat,
title={PDBStat: a universal restraint converter and restraint analysis software package for protein NMR},
author={Tejero, Roberto and Snyder, David and Mao, Binchen and Aramini, James M and Montelione, Gaetano T},
journal={Journal of Biomolecular NMR},
volume={56},
number={4},
pages={337--351},
year={2013},
publisher={Springer}
}
- PDBStat authors: Roberto Tejero, David Snyder, Binchen Mao, James M. Aramini, Gaetano T. Montelione
- Northeast Structural Genomics Consortium (NESG)
- Rutgers University Center for Advanced Biotechnology and Medicine
Status: 🚧 Beta - Feature Complete, Comprehensive Testing
Current Version: 0.4.0 (CI/CD & Command Sequence System)
Testing: ~1,900 automated tests with 59% coverage, CI on every commit (Python 3.10-3.12, Linux/macOS)
Architecture: Clean, modular design with manager pattern. Main window reduced from 4,020 to 1,442 lines (64% reduction).