ICM is a five-component index that measures convergence across multiple epistemic methods -- how much independent models agree on a prediction, and whether that agreement is trustworthy enough to act on. Instead of picking the "best" model, ICM quantifies multi-model consensus through distributional agreement (A), directional consistency (D), uncertainty overlap (U), perturbation invariance (C), and a dependency penalty (Pi), all fused via a logistic sigmoid into a single [0, 1] score. A companion Conformal Risk Control (CRC) gating layer maps ICM scores to three-way decisions: ACT, DEFER, or AUDIT -- with finite-sample coverage guarantees.
```
ICM = sigma(scale * (w_A * A + w_D * D + w_U * U + w_C * C - lambda * Pi - shift))
```
| Component | What it measures | Default weight |
|---|---|---|
| A (Agreement) | Distributional similarity across models (Hellinger / Wasserstein / MMD) | 0.35 |
| D (Direction) | Sign / argmax consistency of predictions | 0.15 |
| U (Uncertainty) | Overlap of prediction intervals or top-K probabilities | 0.25 |
| C (Invariance) | Stability under input perturbation | 0.10 |
| Pi (Dependency) | Penalty for correlated residuals / shared features / gradient similarity | 0.15 |
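A minimal sketch of the fusion formula in plain Python, using the default weights from the table above. The `scale` and `shift` values here are illustrative assumptions, not the library's defaults:

```python
import math

# Default component weights from the table above
W = {"A": 0.35, "D": 0.15, "U": 0.25, "C": 0.10}
LAM = 0.15  # lambda: weight of the dependency penalty Pi

def icm_score(A, D, U, C, Pi, scale=4.0, shift=0.5):
    """Fuse the five components into a single [0, 1] score.

    scale/shift defaults here are illustrative placeholders,
    not the values shipped with the package.
    """
    z = W["A"] * A + W["D"] * D + W["U"] * U + W["C"] * C - LAM * Pi - shift
    return 1.0 / (1.0 + math.exp(-scale * z))

# Strong consensus with low dependency pushes the score well above 0.5;
# a high dependency penalty pulls it back toward 0.
print(round(icm_score(A=0.9, D=1.0, U=0.8, C=0.9, Pi=0.1), 3))
print(round(icm_score(A=0.9, D=1.0, U=0.8, C=0.9, Pi=1.0), 3))
```

Note that Pi enters with a negative sign: two models that agree only because they share features or training data should not inflate the consensus score.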
```bash
pip install os-multi-science
```

```python
import numpy as np

from framework.icm import compute_icm_from_predictions
from framework.config import ICMConfig

# Predictions from 3 independent models (probability distributions over 3 classes)
predictions = {
    "model_A": np.array([[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]]),
    "model_B": np.array([[0.65, 0.25, 0.1], [0.55, 0.35, 0.1]]),
    "model_C": np.array([[0.72, 0.18, 0.1], [0.58, 0.32, 0.1]]),
}

config = ICMConfig.wide_range_preset()
result = compute_icm_from_predictions(predictions, config=config)
print(f"ICM score: {result.icm:.3f}")  # High agreement -> score near 1.0
```

Evaluated on 22 UCI / OpenML datasets with 5-fold cross-validation, against 8 methods (including Deep Ensemble, Stacking, and Bagging):
| Metric | ICM-Weighted | ICM-Optimized | Deep Ensemble |
|---|---|---|---|
| Mean accuracy | 0.891 | 0.898 | -- |
| Friedman rank | 4.55 | 3.62 (2nd) | 3.45 (1st) |
| UQ set size | 1.26 | -- | -- |
| vs. RAPS set size | 55% smaller (1.26 vs 2.87) | -- | -- |
| C-component AUROC | 1.000 | -- | -- |
| Transfer attack AUROC | 1.000 | -- | -- |
Friedman test: chi2 = 29.191, p = 0.000134 (significant at alpha = 0.01). Critical difference = 2.348 (Nemenyi post-hoc). ICM-Optimized is not significantly different from Deep Ensemble.
ICM directly supports two key articles of the EU AI Act:
- Art. 14 (Human Oversight): CRC gating provides a principled ACT / DEFER / AUDIT mechanism. High-risk predictions (low ICM) are automatically routed to human review with finite-sample coverage guarantees.
- Art. 9 (Risk Management): The five-component decomposition provides an auditable breakdown of why a prediction is (or is not) trustworthy, enabling transparent risk documentation.
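The three-way gating can be sketched as a simple threshold rule over the ICM score. The thresholds below are hypothetical placeholders; in the actual CRC procedure they are calibrated on held-out data to obtain the finite-sample coverage guarantees:

```python
def crc_gate(icm, t_act=0.8, t_audit=0.4):
    """Map an ICM score in [0, 1] to a three-way decision.

    t_act and t_audit are illustrative; CRC calibrates these
    thresholds to meet a user-specified risk level.
    """
    if icm >= t_act:
        return "ACT"    # consensus is trustworthy: automate the decision
    if icm >= t_audit:
        return "DEFER"  # borderline convergence: route to human review
    return "AUDIT"      # low convergence: flag for full inspection

for score in (0.93, 0.61, 0.12):
    print(f"ICM = {score:.2f} -> {crc_gate(score)}")
```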
ICM generalizes beyond classical ML to evaluate convergence in multi-agent LLM systems -- treating each agent's output as one "epistemic method." This enables:
- Measuring agreement across multiple LLM agents on the same query
- Detecting hallucination divergence (low A, low D)
- Routing uncertain queries to human review via CRC gating
See examples/llm_convergence.py for a demonstration.
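To illustrate the idea without the full pipeline, here is a self-contained sketch of the A component's distance measure (Hellinger, per the component table) applied to three hypothetical agents' answer distributions over a 4-option query. The agent names and distributions are invented for illustration; a divergent agent shows up as a large pairwise distance:

```python
import numpy as np

# Hypothetical: three LLM agents answer the same multiple-choice query;
# each output is converted to a probability distribution over 4 options.
agents = {
    "agent_1": np.array([0.80, 0.10, 0.05, 0.05]),
    "agent_2": np.array([0.75, 0.15, 0.05, 0.05]),
    "agent_3": np.array([0.10, 0.05, 0.80, 0.05]),  # divergent outlier
}

def hellinger(p, q):
    """Hellinger distance between discrete distributions (0 = identical, 1 = disjoint)."""
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))

names = list(agents)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        d = hellinger(agents[names[i]], agents[names[j]])
        print(f"{names[i]} vs {names[j]}: H = {d:.3f}")
```

Agents 1 and 2 are close (small H), while agent 3 diverges sharply from both, the low-A pattern the framework flags as possible hallucination divergence.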
If you use ICM in your research, please cite:
```bibtex
@article{stanisljevic2026icm,
  title={Index of Convergence Multi-epistemic: A Five-Component Framework for
         Trustworthy Multi-Model Decision-Making},
  author={Stanisljevic, Luka},
  journal={arXiv preprint},
  year={2026}
}
```

This project is licensed under the MIT License. See pyproject.toml for details.