elkins/pdbstat-python

PDBStat Python GUI

A Python GUI for PDBStat, the C/C++ binary for NMR protein structure analysis (actively maintained by Roberto Tejero).

Note: The PDBStat C/C++ binary is a separate download and is not included in this project. It must be installed separately to enable full functionality.

Python 3.10+ | License: MIT | Tests | Code Coverage | Code style: black

System Requirements

  • Python 3.10 or higher (tested on 3.10, 3.11, 3.12)
  • The PDBStat C/C++ binary (see below)
  • Optional but recommended: Ghostscript (provides ps2pdf for automatic conversion of PostScript (.ps) plots to PDF)

If Ghostscript is not installed, some plots (e.g., Ramachandran/contact maps) will only be available as .ps files. To enable automatic PDF conversion, install Ghostscript:

macOS:

brew install ghostscript

Linux (Debian/Ubuntu):

sudo apt-get install ghostscript

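Windows:

Download and run the installer from the official Ghostscript website (https://www.ghostscript.com/).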

Quick Start

# Install dependencies
pip install -r requirements.txt

# Run GUI
python pdbstat_gui.py

# Or analyze from command line
python examples/analyze_nmr_ensemble.py data/pdb/2jqn.pdb --backbone

Features

  • ✅ Complete PDB parsing with multi-model ensemble support
  • ✅ Ensemble analysis - RMSD, RMSF, coordinate precision, well-defined regions
  • ✅ GUI application - File loading, visualization, interactive analysis
  • ✅ 3D Molecular Viewer - Interactive browser-based visualization with 3Dmol.js
  • ✅ PDBStat Binary Integration - Seamlessly integrates with the PDBStat C/C++ binary for core analyses
  • ✅ Quality checks - Ramachandran, chirality, hydrogen atoms, missing atoms
  • ✅ Advanced analysis - NOE restraints, DAOP, order parameters, contact maps
  • ✅ Comprehensive testing - ~1,900 tests with 59% coverage, CI on every commit
  • ✅ Debug console & visibility - Interactive testing and complete logging of PDBStat operations
  • ✅ Clean architecture - Modular design with manager pattern, 64% reduction in main window complexity

Documentation

All project documentation is located in the docs/ directory.

  • docs/USER_GUIDE.md: A comprehensive guide for end-users covering installation, GUI usage, and the command-line interface.
  • docs/DEVELOPER_GUIDE.md: The central hub for developers, detailing the development setup, project architecture, and contribution guidelines.

For more specific topics, see these documentation sub-directories:

  • docs/developer/: Contains detailed guides on the command sequence system, the interactive API, and binary integration.
  • docs/architecture/: Contains in-depth architectural reviews, design proposals, and historical decision documents.

⚠️ Core Architecture: Configuration-Driven Command Execution

CRITICAL PRINCIPLE: All PDBStat binary interaction is driven by declarative configuration, not hard-coded pexpect calls.

Why This Matters

Each PDBStat command is modeled as a state machine with:

  • States = prompts (with regex patterns and timeouts)
  • Transitions = inputs
  • Accept state = completion pattern

This architecture ensures:

  • ✅ Single source of truth - All command patterns in one place
  • ✅ Maintainable - Update config file, not scattered Python code
  • ✅ Testable - Mock configurations without the binary
  • ✅ Self-documenting - Configuration IS the specification
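
The state-machine model above can be illustrated without the binary. The following is a minimal sketch, not the project's implementation: `Prompt`, `run_command`, and the `PdbStat>` completion prompt are hypothetical names chosen for illustration, and a scripted transcript stands in for the live process (so the configured timeouts are carried along but never exercised).

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Prompt:
    """One state: a prompt the binary is expected to print."""
    pattern: str
    timeout: float  # kept for completeness; unused with a scripted transcript


def run_command(prompts, inputs, transcript, completion_pattern):
    """Walk a scripted transcript the way a configured command would.

    `transcript` stands in for the binary's output, one chunk per step.
    Returns the list of inputs that were "sent".
    """
    sent = []
    chunks = iter(transcript)
    for answer in inputs:
        chunk = next(chunks, "")
        # The current state is valid when any configured prompt matches.
        if not any(re.search(p.pattern, chunk) for p in prompts):
            raise RuntimeError(f"unexpected output: {chunk!r}")
        sent.append(answer)  # transition: send the next input
    # Accept state: the completion pattern must appear after the last input.
    if not re.search(completion_pattern, next(chunks, "")):
        raise RuntimeError("command did not reach its accept state")
    return sent


# Prompt patterns taken from the contact example; the transcript is invented.
prompts = [Prompt(r"\?\s*[:_]", 10.0), Prompt(r"Please enter title", 5.0)]
transcript = ["Cutoff distance ? :", "Please enter title", "PdbStat> "]
sent = run_command(prompts, ["5.0", "Title"], transcript, r"PdbStat>")
print(sent)
```

Keeping the walker free of any process handling is what makes this style testable: the same configuration can be checked against recorded output.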

Example: Contact Command

# Configuration (in command_config.py)
{
    "name": "contact",
    "prompts": [
        {"pattern": r"\?\s*[:_]", "timeout": 10.0},  # Standard prompt
        {"pattern": r"Please enter title", "timeout": 5.0},  # Custom prompt
    ],
}

# Usage (in processing modules)
from pdbstat.integration.pdbstat_interactive import PDBStatInteractive

interactive = PDBStatInteractive(pdbstat_dir)
output = interactive.execute_command_configured(
    "contact",
    inputs=["coord", "1", "5.0", "heavy", "Title", "*"],
    load_pdb=pdb_file,
)
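
Because each command is plain data, a configuration entry can be checked on its own before it is ever run against the binary. The sketch below assumes only the dictionary shape shown in the contact example (`name`, `prompts`, `pattern`, `timeout`); `check_command_config` is a hypothetical helper, not part of the project.

```python
import re


def check_command_config(config):
    """Return a list of problems found in one command configuration dict."""
    problems = []
    if not config.get("name"):
        problems.append("missing name")
    for i, prompt in enumerate(config.get("prompts", [])):
        # Every prompt pattern must be a valid regular expression.
        try:
            re.compile(prompt["pattern"])
        except (KeyError, re.error) as exc:
            problems.append(f"prompt {i}: bad pattern ({exc})")
        # Every prompt needs a positive numeric timeout.
        timeout = prompt.get("timeout")
        if not isinstance(timeout, (int, float)) or timeout <= 0:
            problems.append(f"prompt {i}: timeout must be positive")
    return problems


contact = {
    "name": "contact",
    "prompts": [
        {"pattern": r"\?\s*[:_]", "timeout": 10.0},
        {"pattern": r"Please enter title", "timeout": 5.0},
    ],
}
print(check_command_config(contact))
```

A check like this could run in a unit test over every entry in the configuration file, so that a malformed pattern is caught without starting a session.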

Anti-Patterns to Avoid

❌ Never hard-code prompt patterns in pexpect calls ❌ Never hard-code timeouts ❌ Never bypass the configuration system

βœ… Always define commands in command_config.py βœ… Always use execute_command_configured()

See the architecture docs for complete details and visual state diagrams.


Integration: Preferred API (PDBStatFacade) 🔧

For GUI and application-level consumers, prefer PDBStatFacade as the canonical integration surface. It manages session lifecycle and provides convenience helpers (start_session(), get_interactive(), stop_session(), execute()) that simplify code and testing.

Quick migration example (GUI/application code):

from pathlib import Path
from pdbstat.integration.pdbstat_facade import PDBStatFacade

pdbstat_dir = Path('/path/to/PdbStat5X')
facade = PDBStatFacade(pdbstat_dir)
facade.start_session(timeout=30.0)
try:
    interactive = facade.get_interactive()
    output = interactive.send_command_with_inputs('info', inputs=['1'], load_pdb=Path('structure.pdb'))
    # process output
finally:
    facade.stop_session()

Notes:

  • Processing modules that require low-level, explicit control of the binary should continue to use PDBStatInteractive directly.
  • Tests that mock/patch PDBStatInteractive may either patch the symbol in modules that use it or patch PDBStatFacade when testing GUI-level code.
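
The start/try/finally pattern in the migration example can be wrapped in a context manager so that the session is always stopped. This is a sketch under the assumption that the facade exposes `start_session()`, `get_interactive()`, and `stop_session()` as described above; `pdbstat_session` and `FakeFacade` are hypothetical names, and the fake stands in for `PDBStatFacade` so the example runs without the binary.

```python
from contextlib import contextmanager


@contextmanager
def pdbstat_session(facade, timeout=30.0):
    """Start a facade session and always stop it, even on error."""
    facade.start_session(timeout=timeout)
    try:
        yield facade.get_interactive()
    finally:
        facade.stop_session()


class FakeFacade:
    """Stand-in for PDBStatFacade that records the calls it receives."""

    def __init__(self):
        self.calls = []

    def start_session(self, timeout):
        self.calls.append(("start", timeout))

    def get_interactive(self):
        self.calls.append(("get",))
        return "interactive-stub"

    def stop_session(self):
        self.calls.append(("stop",))


fake = FakeFacade()
with pdbstat_session(fake, timeout=5.0) as interactive:
    print(interactive)
print(fake.calls)
```

The same fake is also a convenient way to test GUI-level code: pass it wherever a facade is expected and assert on the recorded calls.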

Example Usage

GUI Application

python pdbstat_gui.py

Features:

  • Load PDB files (File → Open PDB File...)
  • Choose atom selection (CA or backbone)
  • Run analysis with Python or Legacy implementation
  • View statistics, plots, and comparisons
  • Debug console for interactive binary testing

3D Molecular Viewer

The GUI includes an interactive 3D molecular viewer powered by 3Dmol.js:

# Launch GUI and load a structure
python pdbstat_gui.py

# Click "🧬 View in 3D" button in the Analysis section

Features:

  • Interactive Controls: Rotate (left-click drag), zoom (scroll), pan (right-click drag)
  • Multiple Styles: Cartoon, stick, sphere, and line representations
  • Color Schemes: Spectrum, chain, secondary structure, and monochrome
  • Additional: Reset view, toggle auto-spin animation
  • No Dependencies: Uses system browser, no additional packages required

Browser Compatibility: Works with all modern browsers (Chrome, Firefox, Safari, Edge)

Command Line

# Basic analysis
python examples/analyze_nmr_ensemble.py structure.pdb

# With backbone atoms (recommended)
python examples/analyze_nmr_ensemble.py structure.pdb --backbone

# Example output:
# Models: 20
# Mean pairwise RMSD: 0.893 Å
# Well-defined residues: 72.2%

Python API

from pdbstat.io.pdb_parser import PDBParser
from pdbstat.analysis.ensemble import EnsembleAnalyzer

# Load structure
parser = PDBParser()
ensemble = parser.parse_file("data/pdb/2jqn.pdb")

# Analyze
analyzer = EnsembleAnalyzer(ensemble, parser=parser)
stats = analyzer.calculate_statistics(use_backbone=True)

# Display
print(f"Mean pairwise RMSD: {stats['mean_pairwise_rmsd']:.3f} Å")
print(f"RMSD to mean: {stats['rmsd_to_mean']:.3f} Å")
print(f"Well-defined: {stats['pct_well_defined']:.1f}%")
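
For readers unfamiliar with the statistic, mean pairwise RMSD is the average of the RMSD values over all pairs of models. The sketch below is a simplified pure-Python illustration, not the `EnsembleAnalyzer` implementation: it assumes the models are already superimposed and contain the same atoms in the same order, whereas a real ensemble analysis would normally superimpose each pair first.

```python
import math
from itertools import combinations


def rmsd(a, b):
    """RMSD between two equally sized lists of (x, y, z) coordinates."""
    if len(a) != len(b) or not a:
        raise ValueError("models must be non-empty and the same size")
    total = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(total / len(a))


def mean_pairwise_rmsd(models):
    """Average RMSD over every unordered pair of models."""
    pairs = list(combinations(models, 2))
    if not pairs:
        raise ValueError("need at least two models")
    return sum(rmsd(a, b) for a, b in pairs) / len(pairs)


# Three toy two-atom "models", each shifted along z by 1 Å.
models = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)],
    [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)],
]
value = mean_pairwise_rmsd(models)
print(f"{value:.3f}")
```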

Configuration

Control logging via environment variables:

# Enable debug logging to see all PDBStat operations
export PDBSTAT_LOG_LEVEL=DEBUG
python pdbstat_gui.py

# Log to file
export PDBSTAT_LOG_LEVEL=INFO
export PDBSTAT_LOG_FILE=/tmp/pdbstat.log
python pdbstat_gui.py

See docs/developer/binary-integration.md for complete debugging options.
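
As an illustration of how these two variables could drive Python's standard `logging` module, here is a minimal sketch. It is not the project's logging setup: `configure_logging_from_env`, the `pdbstat_sketch` logger name, and the `WARNING` default are assumptions made for the example.

```python
import logging
import os


def configure_logging_from_env(env=None):
    """Configure a logger from PDBSTAT_LOG_LEVEL and PDBSTAT_LOG_FILE."""
    env = os.environ if env is None else env

    # Unknown or missing level names fall back to WARNING (an assumption).
    level_name = env.get("PDBSTAT_LOG_LEVEL", "WARNING").upper()
    level = getattr(logging, level_name, None)
    if not isinstance(level, int):
        level = logging.WARNING

    logger = logging.getLogger("pdbstat_sketch")
    logger.setLevel(level)

    # Replace any handlers left over from an earlier call.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()

    log_file = env.get("PDBSTAT_LOG_FILE")
    handler = logging.FileHandler(log_file) if log_file else logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    logger.addHandler(handler)
    return logger


logger = configure_logging_from_env({"PDBSTAT_LOG_LEVEL": "debug"})
logger.debug("debug logging is enabled")
```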

Testing

Continuous Integration

Automated testing runs on every push and pull request via GitHub Actions:

  • ✅ ~1,300 tests run automatically on every commit
  • ✅ Tests on Python 3.10, 3.11, 3.12
  • ✅ Tests on Ubuntu (Linux) and macOS
  • ✅ Code quality checks (Black, isort, flake8)
  • ⏭️ 83 tests auto-skip when PDBStat binary unavailable (run locally)

View test status: GitHub Actions

Running Tests Locally

# Quick test (unit + integration, no binary needed)
pytest tests/unit/ tests/analysis/ tests/core/ tests/geometry/ \
       tests/io/ tests/processing/ tests/utils/ tests/edge_cases/ \
       tests/integration/ -v

# With coverage report
pytest --cov=pdbstat --cov-report=html
open htmlcov/index.html  # View coverage

# Functional tests (requires PDBStat binary)
export PDBSTAT_DIR=/path/to/PdbStat5X
pytest tests/functional/ -v

# All tests (1,436 tests)
pytest tests/ -v

Test Suite Overview

| Test Category        | Count   | Requires Binary | Runs in CI   |
|----------------------|---------|-----------------|--------------|
| Unit Tests           | ~160    | No              | ✅ Yes        |
| Analysis Tests       | ~45     | No              | ✅ Yes        |
| Core Tests           | ~40     | No              | ✅ Yes        |
| Geometry Tests       | ~30     | No              | ✅ Yes        |
| I/O Tests            | ~35     | No              | ✅ Yes        |
| Processing Tests     | ~40     | No              | ✅ Yes        |
| Edge Cases           | ~40     | No              | ✅ Yes        |
| Integration (mocked) | ~120    | No              | ✅ Yes        |
| Integration (binary) | ~60     | Yes             | ⏭️ Auto-skip |
| Functional Tests     | ~40     | Yes             | ⏭️ Ignored   |
| GUI Dialog Tests     | ~22     | No              | ⏭️ Auto-skip |
| Total                | ~1,927  | -               | ~1,900 in CI |

Coverage: ~59% overall, 90%+ for core analysis modules

Note: Tests requiring the PDBStat binary are automatically skipped in CI but run locally when PDBSTAT_DIR is set.
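
One way such an auto-skip can be expressed is a small availability check combined with a skip decorator. The sketch below uses only the standard library so that it runs anywhere (pytest also collects `unittest` test cases and honors their skips); the project's actual mechanism may differ, for example a `pytest.mark.skipif` marker. `pdbstat_binary_available` and `requires_binary` are hypothetical names.

```python
import os
import unittest
from pathlib import Path


def pdbstat_binary_available(env=None):
    """Return True when PDBSTAT_DIR is set and points at a directory."""
    env = os.environ if env is None else env
    pdbstat_dir = env.get("PDBSTAT_DIR")
    return bool(pdbstat_dir) and Path(pdbstat_dir).is_dir()


# Reusable decorator: skip the test unless the binary directory is present.
requires_binary = unittest.skipUnless(
    pdbstat_binary_available(),
    "PDBStat binary not available (set PDBSTAT_DIR)",
)


class TestAgainstRealBinary(unittest.TestCase):
    @requires_binary
    def test_binary_directory_exists(self):
        # Only runs when PDBSTAT_DIR is set; skipped otherwise.
        self.assertTrue(pdbstat_binary_available())
```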

See docs/architecture/FUNCTIONAL_TEST_COVERAGE.md for command-level details.

Working Directory Control

The cwd parameter allows you to control where PDBStat writes output files. This is useful for organizing analysis results, preventing directory clutter, and testing.

Basic Usage

from pathlib import Path
from pdbstat.processing.pdbstat_engine import PDBStatEngine

# Specify custom working directory
output_dir = Path("./analysis_output")
output_dir.mkdir(exist_ok=True)

# Create engine with custom working directory
engine = PDBStatEngine(
    pdbstat_dir=Path("PdbStat5X"),
    cwd=output_dir  # All PDBStat files written here
)

# Process PDB file
result = engine.process(Path("data/pdb/2jqn.pdb"))

Use Cases

  • Organize Output: Keep analysis results in dedicated directories
  • Prevent Clutter: Avoid filling current directory with output files
  • Isolate Runs: Separate different analysis runs into distinct folders
  • Testing: Control file locations for predictable test behavior

How It Works

The cwd parameter propagates through the processing stack:

User Code (cwd=Path("output"))
  ↓
PDBStatEngine(cwd=...)
  ↓
PDBStatFacade(cwd=...)
  ↓
PDBStatInteractive(cwd=...)
  ↓
subprocess.Popen(cwd=str(...))

The PDBStat binary subprocess runs in the specified directory, so all output files are created there.
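
The effect of the final step can be seen with any child process. In this sketch a short Python one-liner stands in for the PDBStat binary (an assumption made so the example runs anywhere), and `subprocess.run`, which is built on `subprocess.Popen`, receives the same `cwd` argument.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    output_dir = Path(tmp)

    # The child writes a file using a relative path, as a binary would.
    child_code = "open('result.txt', 'w').write('done')"
    subprocess.run(
        [sys.executable, "-c", child_code],
        cwd=str(output_dir),  # relative paths in the child resolve here
        check=True,
    )

    # The file lands in output_dir, not in the parent's working directory.
    created = sorted(p.name for p in output_dir.iterdir())

print(created)
```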

Validation

The cwd parameter is validated when PDBStatInteractive is initialized:

  • Must exist (raises FileNotFoundError if not)
  • Must be a directory (raises NotADirectoryError if file)
  • Can be None (uses process default)
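
The three rules above can be sketched as a small function. This is an illustration of the described behavior, not the code inside `PDBStatInteractive`; `validate_cwd` is a hypothetical name.

```python
from pathlib import Path


def validate_cwd(cwd):
    """Apply the documented rules and return a Path, or None if unset."""
    if cwd is None:
        return None  # fall back to the process default
    path = Path(cwd)
    if not path.exists():
        raise FileNotFoundError(f"Working directory does not exist: {path}")
    if not path.is_dir():
        raise NotADirectoryError(f"Working directory is not a directory: {path}")
    return path


print(validate_cwd(None))
print(validate_cwd("."))
```

Validating at initialization means a bad path fails immediately with a clear error, rather than later when the subprocess is started.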

Example: Batch Processing

from pathlib import Path
from pdbstat.processing.pdbstat_engine import PDBStatEngine

pdb_files = Path("data/pdb").glob("*.pdb")
pdbstat_dir = Path("PdbStat5X")

for pdb_file in pdb_files:
    # Create output directory for each PDB
    output_dir = Path("results") / pdb_file.stem
    output_dir.mkdir(parents=True, exist_ok=True)

    # Process with isolated output
    engine = PDBStatEngine(pdbstat_dir=pdbstat_dir, cwd=output_dir)
    result = engine.process(pdb_file)

    print(f"Processed {pdb_file.name} → {output_dir}")

For architectural details, see docs/COMMAND_CONFIGURATION_ARCHITECTURE.md.

Project Status

This project provides a Python GUI for the PDBStat C/C++ binary (actively maintained by Roberto Tejero), making its powerful features more accessible. It also includes some complementary Python-based analysis features.

Recently Completed ✅

  • 3D Molecular Viewer - Interactive browser-based visualization with 3Dmol.js (no dependencies)
  • GitHub Actions CI - Automated testing on every commit (1,400+ tests, Python 3.10-3.12, Linux/macOS)
  • Command Sequence System - 160 tests for all 18 PDBStat commands with full prompt handling
  • Phase 3 Refactoring - Extracted 4 manager classes, reduced main window complexity by 64%
  • Functional Test Suite - 40 tests validating real PDBStat binary command execution
  • Visibility Tools - Complete logging and debugging capabilities for PDBStat operations
  • DAOP Analysis - Dihedral Angle Order Parameters implementation
  • Order Parameters - Residue ordering analysis based on RMSD thresholds
  • Contact Maps - Residue-residue contact map generation and visualization
  • Nomenclature Conversion - IUPAC ↔ XPLOR atom naming conversion

Planned 📋

  • FindCore algorithm
  • Additional file formats (mmCIF)
  • Batch processing for multiple structures
  • Additional quality validation metrics

Citation

If you use pdbstat in your research, please cite:

PDBStat:

@article{tejero2013pdbstat,
  title={PDBStat: a universal restraint converter and restraint analysis software package for protein NMR},
  author={Tejero, Roberto and Snyder, David and Mao, Binchen and Aramini, James M and Montelione, Gaetano T},
  journal={Journal of Biomolecular NMR},
  volume={56},
  number={4},
  pages={337--351},
  year={2013},
  publisher={Springer}
}

Acknowledgments

  • PDBStat authors: Roberto Tejero, David Snyder, Binchen Mao, James M. Aramini, Gaetano T. Montelione
  • Northeast Structural Genomics Consortium (NESG)
  • Rutgers University Center for Advanced Biotechnology and Medicine

Status: 🚧 Beta - Feature Complete, Comprehensive Testing

Current Version: 0.4.0 (CI/CD & Command Sequence System)

Testing: ~1,900 automated tests with 59% coverage, CI on every commit (Python 3.10-3.12, Linux/macOS)

Architecture: Clean, modular design with manager pattern. Main window reduced from 4,020 to 1,442 lines (64% reduction).
