PDBStat Python GUI

A Python GUI for PDBStat, the C/C++ binary for NMR protein structure analysis (actively maintained by Roberto Tejero).

Note: The PDBStat C/C++ binary is a separate download and is not included in this project. It must be installed separately to enable full functionality.

System Requirements

Python 3.10 or higher (tested on 3.10, 3.11, 3.12)
The PDBStat C/C++ binary (see below)
Optional but recommended: Ghostscript (provides ps2pdf for automatic conversion of PostScript (.ps) plots to PDF)

If Ghostscript is not installed, some plots (e.g., Ramachandran/contact maps) will only be available as .ps files. To enable automatic PDF conversion, install Ghostscript:

macOS:

brew install ghostscript

Linux (Debian/Ubuntu):

sudo apt-get install ghostscript

Windows:

Download and install from https://www.ghostscript.com/download/gsdnld.html
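
To check from Python whether ps2pdf is reachable before relying on PDF output, a minimal sketch follows. The helper name convert_ps_to_pdf is hypothetical and is not part of this project; it only uses the standard library and the usual `ps2pdf input.ps output.pdf` invocation.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Optional


def convert_ps_to_pdf(ps_path: Path) -> Optional[Path]:
    """Convert a .ps plot to .pdf with ps2pdf; return None if ps2pdf is missing."""
    # Hypothetical helper: ps2pdf is looked up on PATH at call time.
    converter = shutil.which("ps2pdf")
    if converter is None:
        return None  # Ghostscript not installed; the .ps file is left as-is
    pdf_path = ps_path.with_suffix(".pdf")
    # ps2pdf takes the input file followed by the output file.
    subprocess.run([converter, str(ps_path), str(pdf_path)], check=True)
    return pdf_path
```

A return value of None signals that only the .ps file is available, which matches the fallback behavior described above.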

Quick Start

# Install dependencies
pip install -r requirements.txt

# Run GUI
python pdbstat_gui.py

# Or analyze from command line
python examples/analyze_nmr_ensemble.py data/pdb/2jqn.pdb --backbone

Features

✅ Complete PDB parsing with multi-model ensemble support
✅ Ensemble analysis - RMSD, RMSF, coordinate precision, well-defined regions
✅ GUI application - File loading, visualization, interactive analysis
✅ 3D Molecular Viewer - Interactive browser-based visualization with 3Dmol.js
✅ PDBStat Binary Integration - Seamlessly integrates with the PDBStat C/C++ binary for core analyses
✅ Quality checks - Ramachandran, chirality, hydrogen atoms, missing atoms
✅ Advanced analysis - NOE restraints, DAOP, order parameters, contact maps
✅ Comprehensive testing - ~1,900 tests with 59% coverage, CI on every commit
✅ Debug console & visibility - Interactive testing and complete logging of PDBStat operations
✅ Clean architecture - Modular design with manager pattern, 64% reduction in main window complexity

Documentation

All project documentation is located in the docs/ directory.

docs/USER_GUIDE.md: A comprehensive guide for end-users covering installation, GUI usage, and the command-line interface.
docs/DEVELOPER_GUIDE.md: The central hub for developers, detailing the development setup, project architecture, and contribution guidelines.

For more specific topics, see the developer sub-directories:

docs/developer/: Contains detailed guides on the command sequence system, the interactive API, and binary integration.
docs/architecture/: Contains in-depth architectural reviews, design proposals, and historical decision documents.

⚠️ Core Architecture: Configuration-Driven Command Execution

CRITICAL PRINCIPLE: All PDBStat binary interaction is driven by declarative configuration, not hard-coded pexpect calls.

Why This Matters

Each PDBStat command is modeled as a state machine with:

States = prompts (with regex patterns and timeouts)
Transitions = inputs
Accept state = completion pattern

This architecture ensures:

✅ Single source of truth - All command patterns in one place
✅ Maintainable - Update config file, not scattered Python code
✅ Testable - Mock configurations without the binary
✅ Self-documenting - Configuration IS the specification
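
The state-machine idea can be illustrated without the binary. The sketch below is not the project's implementation: the Prompt class, the run_command function, the scripted transcript, and the "Done." completion pattern are all assumptions made for illustration. It only shows how prompts (states), inputs (transitions), and a completion pattern (accept state) fit together.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Prompt:
    pattern: str    # regex that identifies the prompt (a state)
    timeout: float  # seconds to wait for it (unused in this offline sketch)


def run_command(prompts: List[Prompt], inputs: List[str],
                completion: str, output_chunks: Iterable[str]) -> List[str]:
    """Walk scripted output, answering each recognized prompt in order."""
    sent: List[str] = []
    pending = iter(inputs)
    for chunk in output_chunks:
        if re.search(completion, chunk):
            return sent  # accept state reached
        if any(re.search(p.pattern, chunk) for p in prompts):
            answer = next(pending, None)
            if answer is None:
                raise RuntimeError(f"prompt {chunk!r} seen but no input left")
            sent.append(answer)  # transition: send the next input
        # any other chunk is ordinary output and is ignored
    raise RuntimeError("output ended before the completion pattern matched")


# Made-up transcript standing in for the binary's output.
prompts = [Prompt(r"\?\s*[:_]", 10.0), Prompt(r"Please enter title", 5.0)]
transcript = ["Cutoff distance ? :", "Please enter title", "Done."]
sent = run_command(prompts, ["5.0", "Title"], r"Done\.", transcript)
```

Because the prompts and completion pattern are plain data, a test can swap in a scripted transcript like this one instead of launching PDBStat.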

Example: Contact Command

# Configuration (in command_config.py)
{
    "name": "contact",
    "prompts": [
        {"pattern": r"\?\s*[:_]", "timeout": 10.0},  # Standard prompt
        {"pattern": r"Please enter title", "timeout": 5.0},  # Custom prompt
    ],
}

# Usage (in processing modules)
from pdbstat.integration.pdbstat_interactive import PDBStatInteractive

interactive = PDBStatInteractive(pdbstat_dir)
output = interactive.execute_command_configured(
    "contact",
    inputs=["coord", "1", "5.0", "heavy", "Title", "*"],
    load_pdb=pdb_file,
)

Documentation

docs/COMMAND_CONFIGURATION_ARCHITECTURE.md - Complete architectural specification
docs/COMMAND_STATE_MACHINES.md - Visual state diagrams (Mermaid)
GEMINI.md - Project architectural principles

Anti-Patterns to Avoid

❌ Never hard-code prompt patterns in pexpect calls
❌ Never hard-code timeouts
❌ Never bypass the configuration system

✅ Always define commands in command_config.py
✅ Always use execute_command_configured()

See the architecture docs for complete details and visual state diagrams.

Integration: Preferred API (PDBStatFacade) 🔧

For GUI and application-level consumers, prefer PDBStatFacade as the canonical integration surface. It manages session lifecycle and provides convenience helpers (start_session(), get_interactive(), stop_session(), execute()) that simplify code and testing.

Quick migration example (GUI/application code):

from pathlib import Path
from pdbstat.integration.pdbstat_facade import PDBStatFacade

pdbstat_dir = Path('/path/to/PdbStat5X')
facade = PDBStatFacade(pdbstat_dir)
facade.start_session(timeout=30.0)
try:
    interactive = facade.get_interactive()
    output = interactive.send_command_with_inputs('info', inputs=['1'], load_pdb=Path('structure.pdb'))
    # process output
finally:
    facade.stop_session()

Notes:

Processing modules that require low-level, explicit control of the binary should continue to use PDBStatInteractive directly.
Tests that mock/patch PDBStatInteractive may either patch the symbol in modules that use it or patch PDBStatFacade when testing GUI-level code.
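
As a sketch of the GUI-level testing approach, a MagicMock can stand in for PDBStatFacade. The helper run_info and the canned output string are hypothetical; only the method names start_session, get_interactive, send_command_with_inputs, and stop_session come from the example above.

```python
from pathlib import Path
from unittest.mock import MagicMock


def run_info(facade, pdb_file: Path) -> str:
    """Hypothetical GUI-level helper that follows the session pattern above."""
    facade.start_session(timeout=30.0)
    try:
        interactive = facade.get_interactive()
        return interactive.send_command_with_inputs(
            "info", inputs=["1"], load_pdb=pdb_file
        )
    finally:
        facade.stop_session()  # always release the session


# A MagicMock replaces the facade, so no binary is needed.
fake_facade = MagicMock()
fake_interactive = fake_facade.get_interactive.return_value
fake_interactive.send_command_with_inputs.return_value = "canned output"

result = run_info(fake_facade, Path("structure.pdb"))

# Verify the session was opened and closed exactly once.
fake_facade.start_session.assert_called_once_with(timeout=30.0)
fake_facade.stop_session.assert_called_once()
```

In real tests the same idea would typically be applied with unittest.mock.patch on the module that imports PDBStatFacade.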

Example Usage

GUI Application

python pdbstat_gui.py

Features:

Load PDB files (File → Open PDB File...)
Choose atom selection (CA or backbone)
Run analysis with Python or Legacy implementation
View statistics, plots, and comparisons
Debug console for interactive binary testing

3D Molecular Viewer

The GUI includes an interactive 3D molecular viewer powered by 3Dmol.js:

# Launch GUI and load a structure
python pdbstat_gui.py

# Click "🧬 View in 3D" button in the Analysis section

Features:

Interactive Controls: Rotate (left-click drag), zoom (scroll), pan (right-click drag)
Multiple Styles: Cartoon, stick, sphere, and line representations
Color Schemes: Spectrum, chain, secondary structure, and monochrome
Additional: Reset view, toggle auto-spin animation
No Dependencies: Uses system browser, no additional packages required

Browser Compatibility: Works with all modern browsers (Chrome, Firefox, Safari, Edge)

Command Line

# Basic analysis
python examples/analyze_nmr_ensemble.py structure.pdb

# With backbone atoms (recommended)
python examples/analyze_nmr_ensemble.py structure.pdb --backbone

# Example output:
# Models: 20
# Mean pairwise RMSD: 0.893 Å
# Well-defined residues: 72.2%

Python API

from pdbstat.io.pdb_parser import PDBParser
from pdbstat.analysis.ensemble import EnsembleAnalyzer

# Load structure
parser = PDBParser()
ensemble = parser.parse_file("data/pdb/2jqn.pdb")

# Analyze
analyzer = EnsembleAnalyzer(ensemble, parser=parser)
stats = analyzer.calculate_statistics(use_backbone=True)

# Display
print(f"Mean pairwise RMSD: {stats['mean_pairwise_rmsd']:.3f} Å")
print(f"RMSD to mean: {stats['rmsd_to_mean']:.3f} Å")
print(f"Well-defined: {stats['pct_well_defined']:.1f}%")

Configuration

Control logging via environment variables:

# Enable debug logging to see all PDBStat operations
export PDBSTAT_LOG_LEVEL=DEBUG
python pdbstat_gui.py

# Log to file
export PDBSTAT_LOG_LEVEL=INFO
export PDBSTAT_LOG_FILE=/tmp/pdbstat.log
python pdbstat_gui.py

See docs/developer/binary-integration.md for complete debugging options.
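
For orientation, the sketch below shows one way environment variables like these could be mapped onto Python's logging module. It is an assumption, not the project's code: the function name configure_logging, the logger name "pdbstat", the INFO fallback, and the message format are all illustrative.

```python
import logging
import os
from typing import Mapping, Optional

_LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def configure_logging(env: Optional[Mapping[str, str]] = None) -> logging.Logger:
    """Configure a logger from PDBSTAT_LOG_LEVEL and PDBSTAT_LOG_FILE."""
    env = os.environ if env is None else env
    # Unknown or missing level names fall back to INFO (assumed default).
    level = _LEVELS.get(env.get("PDBSTAT_LOG_LEVEL", "INFO").upper(), logging.INFO)
    logger = logging.getLogger("pdbstat")  # logger name is an assumption
    logger.setLevel(level)
    for old in list(logger.handlers):  # avoid duplicate handlers on re-configuration
        logger.removeHandler(old)
        old.close()
    log_file = env.get("PDBSTAT_LOG_FILE")
    handler = logging.FileHandler(log_file) if log_file else logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

Passing a plain dict as env keeps the function easy to test without touching the real process environment.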

Testing

Continuous Integration

Automated testing runs on every push and pull request via GitHub Actions:

✅ ~1,300 tests run automatically on every commit
✅ Tests on Python 3.10, 3.11, 3.12
✅ Tests on Ubuntu (Linux) and macOS
✅ Code quality checks (Black, isort, flake8)
⏭️ 83 tests auto-skip when PDBStat binary unavailable (run locally)

View test status: GitHub Actions

Running Tests Locally

# Quick test (unit + integration, no binary needed)
pytest tests/unit/ tests/analysis/ tests/core/ tests/geometry/ \
       tests/io/ tests/processing/ tests/utils/ tests/edge_cases/ \
       tests/integration/ -v

# With coverage report
pytest --cov=pdbstat --cov-report=html
open htmlcov/index.html  # View coverage (macOS; use xdg-open on Linux)

# Functional tests (requires PDBStat binary)
export PDBSTAT_DIR=/path/to/PdbStat5X
pytest tests/functional/ -v

# All tests (1,436 tests)
pytest tests/ -v

Test Suite Overview

Test Category	Count	Requires Binary	Runs in CI
Unit Tests	~160	No	✅ Yes
Analysis Tests	~45	No	✅ Yes
Core Tests	~40	No	✅ Yes
Geometry Tests	~30	No	✅ Yes
I/O Tests	~35	No	✅ Yes
Processing Tests	~40	No	✅ Yes
Edge Cases	~40	No	✅ Yes
Integration (mocked)	~120	No	✅ Yes
Integration (binary)	~60	Yes	⏭️ Auto-skip
Functional Tests	~40	Yes	⏭️ Ignored
GUI Dialog Tests	~22	No	⏭️ Auto-skip
Total	~1,927	-	~1,900 in CI

Coverage: ~59% overall, 90%+ for core analysis modules

Note: Tests requiring the PDBStat binary are automatically skipped in CI but run locally when PDBSTAT_DIR is set.
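
The auto-skip behavior can be approximated with a small availability check. This is a sketch under assumptions: the helper name pdbstat_available is hypothetical, and the project's actual skip logic may test for more than the directory's existence.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def pdbstat_available(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if PDBSTAT_DIR is set and points to an existing directory."""
    env = os.environ if env is None else env
    pdbstat_dir = env.get("PDBSTAT_DIR")
    return bool(pdbstat_dir) and Path(pdbstat_dir).is_dir()


# In a test module this could feed a pytest marker, for example:
#   requires_binary = pytest.mark.skipif(
#       not pdbstat_available(), reason="PDBStat binary not available"
#   )
```

Tests decorated with such a marker are reported as skipped rather than failed when the variable is unset, which is the behavior the note above describes for CI.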

See docs/architecture/FUNCTIONAL_TEST_COVERAGE.md for command-level details.

Working Directory Control

The cwd parameter allows you to control where PDBStat writes output files. This is useful for organizing analysis results, preventing directory clutter, and testing.

Basic Usage

from pathlib import Path
from pdbstat.processing.pdbstat_engine import PDBStatEngine

# Specify custom working directory
output_dir = Path("./analysis_output")
output_dir.mkdir(exist_ok=True)

# Create engine with custom working directory
engine = PDBStatEngine(
    pdbstat_dir=Path("PdbStat5X"),
    cwd=output_dir  # All PDBStat files written here
)

# Process PDB file
result = engine.process(Path("data/pdb/2jqn.pdb"))

Use Cases

Organize Output: Keep analysis results in dedicated directories
Prevent Clutter: Avoid filling current directory with output files
Isolate Runs: Separate different analysis runs into distinct folders
Testing: Control file locations for predictable test behavior

How It Works

The cwd parameter propagates through the processing stack:

User Code (cwd=Path("output"))
  ↓
PDBStatEngine(cwd=...)
  ↓
PDBStatFacade(cwd=...)
  ↓
PDBStatInteractive(cwd=...)
  ↓
subprocess.Popen(cwd=str(...))

The PDBStat binary subprocess runs in the specified directory, so all output files are created there.
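
The effect of the final step can be demonstrated with the standard library alone. In this sketch a child Python process stands in for the PDBStat binary (an assumption for illustration); it writes a file using a relative path, and the file appears in the directory passed as cwd.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Stand-in for the binary: writes a file using a relative path.
child_code = "from pathlib import Path; Path('result.txt').write_text('done')"

with tempfile.TemporaryDirectory() as tmp:
    output_dir = Path(tmp)
    # Relative paths in the child resolve against cwd.
    proc = subprocess.Popen([sys.executable, "-c", child_code], cwd=str(output_dir))
    proc.wait()
    created = sorted(p.name for p in output_dir.iterdir())

print(created)
```

The parent process's own working directory is unchanged, so several runs with different cwd values do not interfere with each other.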

Validation

The cwd parameter is validated when PDBStatInteractive is initialized:

Must exist (raises FileNotFoundError if not)
Must be a directory (raises NotADirectoryError if file)
Can be None (uses process default)
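
These rules can be sketched as a standalone function. The name validate_cwd and the error messages are hypothetical; only the three rules and the exception types come from the list above.

```python
from pathlib import Path
from typing import Optional, Union


def validate_cwd(cwd: Optional[Union[str, Path]]) -> Optional[Path]:
    """Apply the three validation rules listed above to a cwd value."""
    if cwd is None:
        return None  # fall back to the process default
    path = Path(cwd)
    if not path.exists():
        raise FileNotFoundError(f"Working directory does not exist: {path}")
    if not path.is_dir():
        raise NotADirectoryError(f"Working directory is not a directory: {path}")
    return path
```

Validating at initialization means a bad path fails immediately, before any PDBStat subprocess is started.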

Example: Batch Processing

from pathlib import Path
from pdbstat.processing.pdbstat_engine import PDBStatEngine

pdb_files = Path("data/pdb").glob("*.pdb")
pdbstat_dir = Path("PdbStat5X")

for pdb_file in pdb_files:
    # Create output directory for each PDB
    output_dir = Path("results") / pdb_file.stem
    output_dir.mkdir(parents=True, exist_ok=True)

    # Process with isolated output
    engine = PDBStatEngine(pdbstat_dir=pdbstat_dir, cwd=output_dir)
    result = engine.process(pdb_file)

    print(f"Processed {pdb_file.name} → {output_dir}")

For architectural details, see docs/COMMAND_CONFIGURATION_ARCHITECTURE.md.

Project Status

This project provides a Python GUI for the PDBStat C/C++ binary (actively maintained by Roberto Tejero), making its powerful features more accessible. It also includes some complementary Python-based analysis features.

Recently Completed ✅

3D Molecular Viewer - Interactive browser-based visualization with 3Dmol.js (no dependencies)
GitHub Actions CI - Automated testing on every commit (1,400+ tests, Python 3.10-3.12, Linux/macOS)
Command Sequence System - 160 tests for all 18 PDBStat commands with full prompt handling
Phase 3 Refactoring - Extracted 4 manager classes, reduced main window complexity by 64%
Functional Test Suite - 40 tests validating real PDBStat binary command execution
Visibility Tools - Complete logging and debugging capabilities for PDBStat operations
DAOP Analysis - Dihedral Angle Order Parameters implementation
Order Parameters - Residue ordering analysis based on RMSD thresholds
Contact Maps - Residue-residue contact map generation and visualization
Nomenclature Conversion - IUPAC ↔ XPLOR atom naming conversion

Planned 📋

FindCore algorithm
Additional file formats (mmCIF)
Batch processing for multiple structures
Additional quality validation metrics

Citation

If you use pdbstat in your research, please cite:

PDBStat:

@article{tejero2013pdbstat,
  title={PDBStat: a universal restraint converter and restraint analysis software package for protein NMR},
  author={Tejero, Roberto and Snyder, David and Mao, Binchen and Aramini, James M and Montelione, Gaetano T},
  journal={Journal of Biomolecular NMR},
  volume={56},
  number={4},
  pages={337--351},
  year={2013},
  publisher={Springer}
}

Acknowledgments

PDBStat authors: Roberto Tejero, David Snyder, Binchen Mao, James M. Aramini, Gaetano T. Montelione
Northeast Structural Genomics Consortium (NESG)
Rutgers University Center for Advanced Biotechnology and Medicine

Status: 🚧 Beta - Feature Complete, Comprehensive Testing

Current Version: 0.4.0 (CI/CD & Command Sequence System)

Testing: ~1,900 automated tests with 59% coverage, CI on every commit (Python 3.10-3.12, Linux/macOS)

Architecture: Clean, modular design with manager pattern. Main window reduced from 4,020 to 1,442 lines (64% reduction).
