A YAML-based hierarchical configuration parser for machine learning projects from 2018-2020,
demonstrating independent convergent evolution of design patterns later popularized by Hydra and OmegaConf.
Development of this configuration management system began in June 2018, fifteen months before the first public release of Hydra (September 2019) and two years before OmegaConf reached stability (November 2020). The codebase independently arrived at the same design and usage principles that would later define the industry-standard approach to ML experiment configuration.
In 2017-2018, the ML community was grappling with configuration management complexity as experiments became more sophisticated. Training runs required managing dozens of hyperparameters, nested model architectures, and dataset configurations. Engineers needed to:
- Run hundreds of experiments with slight variations
- Ensure reproducibility across team members and compute environments
- Override specific parameters without duplicating entire config files
- Maintain readability as configurations grew to hundreds of lines
Existing solutions were ad-hoc: custom scripts, flat INI files, or environment variables. There was no widely adopted standard for hierarchical, overridable configurations tailored to ML workflows.
This repository demonstrates convergent evolution in software engineering: when facing identical constraints and problems, independent efforts arrive at essentially the same solutions. This has implications for:
- Software Archaeology: Documents how ML tooling evolved in response to practical challenges of the "pre-MLOps" era
- Engineering Philosophy: Shows that widely adopted patterns (like Hydra's design) succeeded not just through innovation and first-mover advantage, but because they were optimal solutions to fundamental challenges felt universally across ML teams
- Prior Art & Independent Invention: Provides historical evidence that these configuration patterns emerged organically across the community, not from a single source
- Understanding Design Inevitability: Distinguishes patterns that follow from problem constraints (destined to be rediscovered) from those that depend on specific implementation choices or organizational context
Hydra (backed by Facebook AI Research) ultimately became the community standard through open-source availability, institutional backing (FAIR), extensive documentation and marketing, and strong ecosystem support. This codebase predates Hydra's first public announcement by fifteen months and was developed entirely within private organizations, without participating in public ML tooling discussions. Yet it arrived at nearly identical solutions and design patterns.
- June 18, 2018: Original development begins
- Early July 2018: Core functionality complete (hierarchical configs, CLI overrides, dot notation, validation)
- October 3, 2019: Hydra publicly announced by Facebook AI Research (official blog post)
- December 2019: This codebase extracted as standalone package
- June 2021: Hydra 1.1 + OmegaConf 2.1 released, establishing the modern standard for ML configuration
- October-November 2025: Historical preservation; git history reconstructed from original repositories with preserved timestamps and authorship
- February 2026: Public release as open-source historical artifact
This implementation and Hydra/OmegaConf share fundamental patterns that emerged from practical needs; a minimal sketch after the list illustrates how they compose:
- Default-as-schema: Using a default configuration file as both type schema and fallback values
- Hierarchical override semantics: Clear precedence rules (CLI arguments > Custom Config > Default Config)
- Dot notation access: Treating nested YAML as attribute-accessible objects (`config.model.learning_rate`)
- Partial overriding: Modifying specific nested values without redefining parent structures
- Type-aware validation: Catching configuration errors before expensive training runs
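
The following sketch shows how these patterns compose. The helper names (`DotDict`, `deep_merge`, `validate`) are hypothetical illustrations, not this library's actual internals:

```python
# Minimal sketch of the shared patterns; helper names are hypothetical,
# not this library's actual internals.

class DotDict(dict):
    """Dict exposing keys as attributes, so config.model.name works."""
    def __getattr__(self, name):
        try:
            value = self[name]
        except KeyError:
            raise AttributeError(name)
        return DotDict(value) if isinstance(value, dict) else value

def deep_merge(base, override):
    """Recursively merge override into base; override wins on conflicts,
    untouched branches of base survive (partial overriding)."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

def validate(default, candidate, path=""):
    """Default-as-schema: every override key must exist in the default
    config and carry a value of the same type (simplified check)."""
    for key, value in candidate.items():
        if key not in default:
            raise KeyError(f"unknown config key: {path}{key}")
        if isinstance(default[key], dict):
            validate(default[key], value, f"{path}{key}.")
        elif not isinstance(value, type(default[key])):
            raise TypeError(f"{path}{key}: expected "
                            f"{type(default[key]).__name__}, "
                            f"got {type(value).__name__}")

default = {"model": {"name": "resnet", "learning_rate": 0.01},
           "training": {"epochs": 10}}
custom = {"model": {"learning_rate": 0.001}}   # partial override
validate(default, custom)                      # fails fast, before training

config = DotDict(deep_merge(default, custom))
print(config.model.learning_rate)  # 0.001 -- custom config beats default
print(config.training.epochs)      # 10    -- default value as fallback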
| Feature | ArcheoML-Confparser (2018) | Hydra (2019+) |
|---|---|---|
| Hierarchical configs | ✅ | ✅ |
| CLI overrides | ✅ | ✅ |
| Dot notation access | ✅ | ✅ |
| Schema validation | ✅ | ✅ |
| Partial overriding | ✅ | ✅ |
| Multi-run support | ❌ | ✅ |
| Plugin system | ❌ | ✅ |
| Tab completion | ❌ | ✅ |
| Ecosystem | ❌ | ✅ Extensive |
For those interested in exploring this historical implementation:
```bash
pip install git+https://github.com/lospooky/archeoml-confparser.git
```

Or in editable/development mode:

```bash
git clone https://github.com/lospooky/archeoml-confparser.git
cd archeoml-confparser
pip install -e .
```

```python
from confparser import parse_configuration

# Load configuration with default values
config = parse_configuration("examples/default_config.yaml")

# Access nested config via dot notation
print(config.model.name)
print(config.training.learning_rate)
```

At the command line:

```bash
# Override with custom config and/or CLI arguments
python train.py --custom_config custom.yaml --training.learning_rate 0.001
```

See README.original.md for the original project documentation.
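
The dotted CLI flags above hint at how overrides can be folded back into a nested structure. A minimal sketch, assuming PyYAML for value coercion (the function `cli_overrides` is a hypothetical illustration, not this library's API):

```python
import yaml  # PyYAML; coerces "0.001" -> 0.001, "true" -> True, etc.

def cli_overrides(argv):
    """Turn pairs like ["--training.learning_rate", "0.001"]
    into a nested override dict."""
    overrides = {}
    for flag, raw in zip(argv[::2], argv[1::2]):
        keys = flag.lstrip("-").split(".")
        node = overrides
        for key in keys[:-1]:                 # walk/create intermediate dicts
            node = node.setdefault(key, {})
        node[keys[-1]] = yaml.safe_load(raw)  # YAML scalar parsing infers types
    return overrides

print(cli_overrides(["--training.learning_rate", "0.001"]))
# {'training': {'learning_rate': 0.001}}
```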
❌ Not actively maintained - This is a historical preservation project
❌ No new features planned - Code preserved as-is from 2018-2020
❌ For production use → Use Hydra instead
✅ For historical/academic interest - Feel free to explore!
This repository serves as a software archaeology artifact, preserved to document the independent evolution of ML configuration patterns. The preservation approach:
Git History Preservation: All commits have been preserved with their original timestamps and authorship intact. The codebase was extracted from two larger private repositories where the project evolved, and commit 45a436dbbf71c245c79966045230818387626a34 serves as the graft point to bridge these two original repositories into a unified history.
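
For readers unfamiliar with grafting, a hypothetical recipe along these lines (placeholder parent hash; not the exact commands used here) is how such a bridge is typically made permanent:

```bash
# Declare a parent for the extracted commit, bridging the two histories...
git replace --graft 45a436dbbf71c245c79966045230818387626a34 <tip-of-older-history>
# ...then rewrite so the grafted parentage becomes real, permanent history.
git filter-repo --force
```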
Code Integrity: The core implementation remains unchanged from its 2018-2020 state, maintaining historical accuracy. Only documentation and packaging have been updated to reflect archival/preservation status.
This codebase was developed during the course of professional work at Micropsi Industries (2018-2019) and Advertima (2019-2020). We are grateful for their explicit permission to preserve and open-source this implementation as a historical artifact demonstrating the independent evolution of ML configuration patterns.
Core Contributors:
- Simone Cirillo - Primary implementation and design
- Clemens Korndörfer - Loss gating mechanisms and architecture refinements
- Mathias Winther Madsen - Testing, documentation, and data field resolution
- Levani Tevdoradze - Bug fixes and example implementations
- Noorvir Aulakh - Early contributions
This work emerged from the practical needs of production ML systems and the collective problem-solving of teams facing configuration management challenges in the late 2010s. The fact that similar solutions emerged independently across the industry speaks to the universality of these challenges and the convergent nature of effective solutions.
MIT License - See LICENSE for details.
If referencing this work in academic contexts:
```bibtex
@software{cirillo2018confparser,
  author = {Cirillo, Simone},
  title  = {ArcheoML-Confparser: A Historical ML Configuration Parser},
  year   = {2018-2026},
  url    = {https://github.com/lospooky/archeoml-confparser},
  note   = {Historical artifact demonstrating independent convergent evolution
            of ML configuration patterns}
}
```