
ArcheoML-Confparser


A YAML-based hierarchical configuration parser for machine learning projects from 2018-2020,
demonstrating independent convergent evolution of design patterns later popularized by Hydra and OmegaConf.

Development of this configuration management system began in June 2018, fifteen months before the first public release of Hydra (September 2019) and two years before OmegaConf reached stability (November 2020). This codebase independently arrived at the same design and usage principles that would later define the industry-standard approach to ML experiment configuration.

Historical Note

The Context: Late 2010s ML Configuration Chaos

In 2017-2018, the ML community was grappling with configuration management complexity as experiments became more sophisticated. Training runs required managing dozens of hyperparameters, nested model architectures, and dataset configurations. Engineers needed to:

  • Run hundreds of experiments with slight variations
  • Ensure reproducibility across team members and compute environments
  • Override specific parameters without duplicating entire config files
  • Maintain readability as configurations grew to hundreds of lines

Existing solutions were ad-hoc: custom scripts, flat INI files, or environment variables. There was no widely-adopted standard for hierarchical, overrideable configurations tailored to ML workflows.

Why This Matters

This repository demonstrates convergent evolution in software engineering: when facing identical constraints and problems, independent efforts arrive at essentially the same solutions. This has implications for:

Software Archaeology: Documents how ML tooling evolved in response to practical challenges in the "pre-MLOps" era

Engineering Philosophy: Shows that widely adopted patterns (like Hydra's design) succeeded not only through innovation and first-mover advantage, but because they were optimal solutions to fundamental challenges felt universally across ML teams

Prior Art & Independent Invention: Provides historical evidence that these configuration patterns emerged organically across the community, not from a single source

Understanding Design Inevitability: Demonstrates which patterns emerge from problem constraints (destined to be rediscovered) versus which depend on specific implementation choices or organizational context

Historical Perspective

Hydra (backed by Facebook AI Research) ultimately became the community standard through open-source availability, institutional authority (FAIR), extensive documentation and promotion, and strong ecosystem support. Development of this codebase began fifteen months before Hydra's first public announcement, entirely within private organizations and with no participation in public ML tooling discussions. Yet it arrived at nearly identical solutions and design patterns.

Timeline

  • June 18, 2018: Original development begins
  • Early July 2018: Core functionality complete (hierarchical configs, CLI overrides, dot notation, validation)
  • October 3, 2019: Hydra publicly announced by Facebook AI Research (official blog post)
  • December 2019: This codebase extracted as standalone package
  • June 2021: Hydra 1.1 + OmegaConf 2.1 released, establishing the modern standard for ML configuration
  • October-November 2025: Historical preservation; git history reconstructed from original repositories with preserved timestamps and authorship
  • February 2026: Public release as open-source historical artifact

Core Design Patterns (Independently Discovered)

This implementation and Hydra/OmegaConf share fundamental patterns that emerged from practical needs:

  • Default-as-schema: Using a default configuration file as both type schema and fallback values
  • Hierarchical override semantics: Clear precedence rules (CLI arguments > Custom Config > Default Config)
  • Dot notation access: Treating nested YAML as attribute-accessible objects (config.model.learning_rate)
  • Partial overriding: Modifying specific nested values without redefining parent structures
  • Type-aware validation: Catching configuration errors before expensive training runs
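The shared patterns above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the original confparser API: DotDict and merge are hypothetical names, shown here only to make dot-notation access and "override wins" partial overriding concrete.

```python
# Hypothetical sketch of the shared patterns; not the original implementation.

class DotDict(dict):
    """Nested dict whose keys are also readable as attributes."""
    def __getattr__(self, key):
        value = self[key]
        # Wrap nested dicts so chained access (config.model.name) works.
        return DotDict(value) if isinstance(value, dict) else value

def merge(default, override):
    """Recursively apply override on top of default (partial overriding)."""
    merged = dict(default)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)   # descend, don't replace
        else:
            merged[key] = value                       # override wins
    return merged

# Default config acts as both schema and fallback values.
default = {"model": {"name": "mlp", "layers": 3},
           "training": {"learning_rate": 0.01}}
# A custom config only redefines what it changes.
custom = {"training": {"learning_rate": 0.001}}

config = DotDict(merge(default, custom))
print(config.model.name)              # "mlp" (untouched by the override)
print(config.training.learning_rate)  # 0.001 (overridden)
```

Note how the custom config leaves config.model intact: only the keys it names are replaced, which is exactly the partial-overriding semantics listed above.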

Feature Comparison

Feature                ArcheoML-Confparser (2018)   Hydra (2019+)
Hierarchical configs   ✅                            ✅
CLI overrides          ✅                            ✅
Dot notation access    ✅                            ✅
Schema validation      ✅                            ✅
Partial overriding     ✅                            ✅
Multi-run support      ❌                            ✅
Plugin system          ❌                            ✅
Tab completion         ❌                            ✅
Ecosystem              ❌                            ✅ Extensive

Installation

For those interested in exploring this historical implementation:

pip install git+https://github.com/lospooky/archeoml-confparser.git

Or in editable/development mode:

git clone https://github.com/lospooky/archeoml-confparser.git
cd archeoml-confparser
pip install -e .

Quick Start

from confparser import parse_configuration

# Load configuration with default values
config = parse_configuration("examples/default_config.yaml")

# Access nested config via dot notation
print(config.model.name)
print(config.training.learning_rate)

At the command line:

# Override with custom config and/or CLI arguments
python train.py --custom_config custom.yaml --training.learning_rate 0.001
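A dot-notation flag such as --training.learning_rate can be expanded into a nested override dict before merging it over the configs. The sketch below is a hypothetical illustration of that expansion (parse_cli_overrides is not the package's actual function); string values are coerced to int or float where possible, mirroring the type-aware behavior described above.

```python
# Hypothetical sketch: expand dot-notation CLI overrides into a nested dict.

def coerce(raw):
    """Best-effort typing: try int, then float, else keep the string."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw

def parse_cli_overrides(argv):
    """Turn ['--a.b', '1', '--c', 'x'] into {'a': {'b': 1}, 'c': 'x'}."""
    overrides = {}
    for flag, raw in zip(argv[::2], argv[1::2]):
        *parents, leaf = flag.lstrip("-").split(".")
        node = overrides
        for part in parents:
            node = node.setdefault(part, {})  # create nesting on demand
        node[leaf] = coerce(raw)
    return overrides

print(parse_cli_overrides(["--training.learning_rate", "0.001"]))
# {'training': {'learning_rate': 0.001}}
```

The resulting dict has the same shape as a config file, so the same recursive merge can apply CLI overrides at the highest precedence (CLI > custom config > default config).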

See README.original.md for the original project documentation.

Maintenance Status

  • Not actively maintained: this is a historical preservation project
  • No new features planned: code preserved as-is from 2018-2020
  • For production use: use Hydra instead
  • For historical/academic interest: feel free to explore!

Preservation Strategy

This repository serves as a software archaeology artifact, preserved to document the independent evolution of ML configuration patterns. The preservation approach:

Git History Preservation: All commits have been preserved with their original timestamps and authorship intact. The codebase was extracted from two larger private repositories where the project evolved, and commit 45a436dbbf71c245c79966045230818387626a34 serves as the graft point to bridge these two original repositories into a unified history.

Code Integrity: The core implementation remains unchanged from its 2018-2020 state, maintaining historical accuracy. Only documentation and packaging have been updated to reflect archival/preservation status.

Acknowledgements

This codebase was developed during the course of professional work at Micropsi Industries (2018-2019) and Advertima (2019-2020). We are grateful for their explicit permission to preserve and open-source this implementation as a historical artifact demonstrating the independent evolution of ML configuration patterns.

Core Contributors:

  • Simone Cirillo - Primary implementation and design
  • Clemens Korndörfer - Loss gating mechanisms and architecture refinements
  • Mathias Winther Madsen - Testing, documentation, and data field resolution
  • Levani Tevdoradze - Bug fixes and example implementations
  • Noorvir Aulakh - Early contributions

This work emerged from the practical needs of production ML systems and the collective problem-solving of teams facing configuration management challenges in the late 2010s. The fact that similar solutions emerged independently across the industry speaks to the universality of these challenges and the convergent nature of effective solutions.

License

MIT License - See LICENSE for details.

Citation

If referencing this work in academic contexts:

@software{cirillo2018confparser,
  author = {Cirillo, Simone},
  title = {ArcheoML-Confparser: A Historical ML Configuration Parser},
  year = {2018-2026},
  url = {https://github.com/lospooky/archeoml-confparser},
  note = {Historical artifact demonstrating independent convergent evolution 
          of ML configuration patterns}
}
