MolSAIC (Molecular/Materials Structure Assembly, I/O, and Composition) is a Python codebase for building, transforming, and validating atomistic structures using a small, stable library surface plus plain Python “workspaces” (simple run.py scripts) that produce reproducible artifacts.
The V4 philosophy is intentionally simple:
- Code-first orchestration (no plugin registries, no DAG engines, no required YAML).
- Determinism-first behavior (stable ordering, explicit working directories, validated outputs).
- Composable building blocks (structure model + parameter packages + thin wrappers for external tools).
MolSAIC is:
- A set of deterministic primitives for structure I/O and structure manipulation.
- A set of thin, testable wrappers around third-party executables (e.g., Packmol / MSI converters).
- A collection of workspace examples that demonstrate end-to-end pipelines with reproducible outputs.
MolSAIC is not:
- A workflow engine or registry-driven framework.
- A monolithic “do everything” chemistry stack.
- A replacement for full crystallography suites (those can be layered on top and converted into MolSAIC structures).
MolSAIC V4 is organized around a few intentionally small packages:
USM is the structure kernel: a dependency-light Python library that represents atoms + bonds + periodic cell, provides deterministic operations, and reads/writes common structure formats.
- In this repo, USM is included as a git submodule at
src/usm/. - Docs live in the USM submodule:
src/usm/docs/.
Key entry points (examples):
- CAR/MDF/PDB I/O:
src/usm/io/ - Deterministic ops (select/transform/replicate/merge/compose/...):
src/usm/ops/ - Core data model:
src/usm/core/model.py
MolSAIC provides wrappers for third-party tools with a consistent “result envelope”:
- deterministic CWD
- PATH augmentation for linker stability
- output existence + non-empty validation
- best-effort tool version capture
See:
- Contract:
docs/ADAPTER_CONTRACT.md - Helpers:
src/external/adapter.py - Wrappers:
src/external/packmol.py,src/external/msi2namd.py,src/external/msi2lmp.py
Workspaces are plain directories under workspaces/ containing:
run.py(the orchestrator)config.json(small, explicit config)outputs/(generated artifacts; typically containssummary.json)
This model keeps pipelines transparent and easy to fork without framework lock-in.
Start here (canonical):
- Quickstart (end-to-end example):
docs/QUICKSTART.md - Workspace contract + determinism conventions:
docs/WORKFLOWS.md - External wrapper contract:
docs/ADAPTER_CONTRACT.md - Acceptance criteria (AS2 hydration):
docs/ACCEPTANCE.md - Alumina “ions in water” workflows:
docs/ions-in-water-workflows.md - Manifest schema:
docs/manifest.v1.schema.json
Workspaces (entry points)
- Alumina:
workspaces/alumina/- AS2 hydration:
workspaces/alumina/alumina_AS2_hydration_v1/ - AS2/AS5/AS10/AS12 ions pipelines:
workspaces/alumina/*_ions_v1/ - Case studies:
workspaces/alumina/cases/
- AS2 hydration:
- MXenes:
workspaces/mxenes/- Analysis:
workspaces/mxenes/analysis/ - Fluorination and hydrated systems:
workspaces/mxenes/fluorinated/,workspaces/mxenes/hydrated/
- Analysis:
- NIST:
workspaces/NIST/- CIF import demo:
workspaces/NIST/nist_calf20_cif_import_v1/
- CIF import demo:
UPM is a standalone Python library + CLI for managing, validating, and exporting force-field parameter packages (starting with Materials Studio / BIOSYM-style *.frc used by msi2lmp).
- In this repo, UPM is included as a git submodule at
src/upm/with its ownpyproject.toml. - UPM is intentionally installed separately from the MolSAIC root package.
See: src/upm/README.md
If you’re cloning fresh, initialize submodules (USM + UPM):
git clone git@github.com:fl-sean03/MolSAICV4.git
cd MolSAICV4
git submodule update --init --recursiveFrom repository root:
python -m pip install -e .cd workspaces/alumina/alumina_AS2_hydration_v1
python run.py --config ./config.json- End-to-end guide:
docs/QUICKSTART.md - Deterministic conventions and step mapping:
docs/WORKFLOWS.md
UPM has its own packaging metadata and Python requirement (see src/upm/pyproject.toml).
From repository root:
python -m pip install -e ./src/upmMolSAIC V4 emphasizes determinism for both library operations and workflows:
- Stable table ordering and renumbering in structure operations.
- Seed capture where applicable (e.g., Packmol seed injection).
- Deterministic working directories for external tools (no implicit global state).
- Validation that required outputs are created and non-empty.
- Optional
outputs/summary.jsonmanifests for machine-consumable run summaries.
Key docs:
- External adapter contract:
docs/ADAPTER_CONTRACT.md - Manifest schema:
docs/manifest.v1.schema.json
src/src/molsaic/— small shared utilities (workspace helpers, etc.)src/usm/— USM submodule (structure model + I/O + ops)src/external/— external tool adapter contract + wrapperssrc/pm2mdfcar/— composition utilities used by hydration/ions workspacessrc/upm/— UPM standalone package (separate install)
docs/— MolSAIC V4 documentationworkspaces/— runnable pipelines and analysis scriptstests/— unit + integration tests
The intended extension points are straightforward:
- Add a new workspace
- Copy the workspace template and customize inputs/steps.
- Write artifacts under
outputs/and (optionally) a manifestsummary.json. - Document it in
docs/if it becomes a canonical workflow.
- Add a new external adapter
- Follow the envelope contract in
docs/ADAPTER_CONTRACT.md. - Implement
run(...)->dictreturning a deterministic result (argv/cwd/outputs/stderr/stdout/duration/version). - Add unit tests that validate schema keys and output validation behavior.
- Extend structure capabilities
- Extend USM (in the USM repo/submodule) for new I/O formats or deterministic operations.
- Keep project-specific policies in workspaces, not in USM core.
This repository is under active development. The V4 docs under docs/ reflect the current code-first architecture and are intended to be the canonical reference for MolSAIC V4.