ICD-10 coding from medical chart text using Daimon + Neo4j + Qdrant.
The LLM searches a vector store of 46k ICD-10 codes, verifies each candidate against a Neo4j taxonomy graph, and returns only confirmed codes with ground-truth confidence scores.
Chart image / PDF / plain text
↓
Surya OCR (layout detection → region OCR → structured text)
↓
Daimon sidecar (LLM + auto-generated tools)
├─ icd10_search → Qdrant (semantic vector lookup)
└─ icd10_graph_* → Neo4j (ICD-10 taxonomy traversal)
↓
CodeAssignment[] { code, description, confidence, verified }
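The output record at the bottom of the diagram could be modeled roughly as follows. This is a minimal sketch, not the class in pipeline/coder.py: the field names come from the diagram, while the types, the `frozen=True` choice, and the `from_dict` helper are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeAssignment:
    """Hypothetical mirror of one coded result; field names follow the diagram."""

    code: str           # ICD-10 code, e.g. "J30.1"
    description: str    # human-readable description of the code
    confidence: float   # score derived from the graph lookup
    verified: bool      # whether the exact code was confirmed

    @classmethod
    def from_dict(cls, raw: dict) -> "CodeAssignment":
        # Coerce each field so malformed input fails here instead of downstream.
        return cls(
            code=str(raw["code"]),
            description=str(raw["description"]),
            confidence=float(raw["confidence"]),
            verified=bool(raw["verified"]),
        )
```

Freezing the dataclass keeps results hashable and guards against accidental mutation after verification; a mutable class would work just as well if downstream code needs to adjust scores.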
Confidence is computed from the graph, not from the model:
1.0: exact code confirmed in Neo4j
0.5: parent category exists, specific subcode does not
0.0: not found at all (hallucinated)
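The three-level scoring rule can be sketched in a few lines. This is an illustration under stated assumptions, not the project's implementation: an in-memory set stands in for the Neo4j lookup, the function name `graph_confidence` is hypothetical, and the parent category is assumed to be the part of the code before the dot.

```python
def graph_confidence(code: str, known_codes: set[str]) -> float:
    """Score a candidate code against a set of known codes.

    `known_codes` is a stand-in for the Neo4j taxonomy (assumption).
    """
    normalized = code.strip().upper()
    if normalized in known_codes:
        return 1.0  # exact code confirmed
    # Assumption: the parent category is everything before the dot.
    category = normalized.split(".", 1)[0]
    if category in known_codes:
        return 0.5  # parent category exists, specific subcode does not
    return 0.0  # not found at all


taxonomy = {"J30", "J30.1", "J45", "J45.20"}
scores = {c: graph_confidence(c, taxonomy) for c in ("J30.1", "J30.77", "Q99.9")}
```

Because the score depends only on what exists in the taxonomy, a code the model invents cannot score above 0.5 no matter how plausible it looks.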
    cp .env.example .env
    # edit .env and add ANTHROPIC_API_KEY
Fully local alternative: change the coder component in daimon-config/config.yaml from type: anthropic to type: llamacpp and pull qwen2.5:7b via Ollama. No API key required, but coding quality is lower.
    docker compose up -d
Starts Neo4j, Qdrant, and (if using local mode) Ollama.
Remove the deploy: GPU block from docker-compose.yml if you have no NVIDIA GPU.
    daimon serve --config daimon-config/config.yaml
Download the Daimon binary from releases.
    pip install -e ".[dev]"
Download the CMS 2025 ICD-10-CM tabular XML from:
https://www.cms.gov/files/zip/2025-code-tables-tabular-and-index.zip
Extract, then run:
    python -m icd10.loader data/icd10cm_tabular_2025.xml
This takes ~10 minutes (46k codes, each embedded into the vector store and inserted into the Neo4j graph).
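To give a sense of what the loader has to extract before embedding and graph insertion, here is a minimal sketch of walking the tabular XML. It is not the code in icd10/loader.py. It assumes the file nests `<diag>` elements (each with `<name>` and `<desc>` children) inside `<section>` elements, and the function name `iter_codes` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, Optional


def iter_codes(root: ET.Element) -> Iterator[tuple[str, str, Optional[str]]]:
    """Yield (code, description, parent_code) for every nested <diag>."""

    def walk(node: ET.Element, parent: Optional[str]):
        # findall("diag") matches direct children only, which preserves
        # the parent/child relationship needed for the graph edges.
        for diag in node.findall("diag"):
            code = diag.findtext("name", default="").strip()
            desc = diag.findtext("desc", default="").strip()
            if code:
                yield code, desc, parent
                yield from walk(diag, code)

    for section in root.iter("section"):
        yield from walk(section, None)


# Tiny hand-written sample in the assumed layout (not real CMS data).
SAMPLE = """
<ICD10CM.tabular><chapter><section>
  <diag><name>A00</name><desc>Cholera</desc>
    <diag><name>A00.9</name><desc>Cholera, unspecified</desc></diag>
  </diag>
</section></chapter></ICD10CM.tabular>
"""
rows = list(iter_codes(ET.fromstring(SAMPLE)))
```

Each yielded triple carries what both stores need: the code and description for the vector embedding, and the parent code for a parent-child edge in the graph.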
Download mtsamples.csv from Kaggle, then:
    python -m pipeline.ingest --csv data/mtsamples.csv --out data/coded.jsonl --limit 50

    from pipeline.coder import code_file

    codes = code_file("path/to/chart.png")
    for c in codes:
        print(c.code, f"{c.confidence:.0%}", c.description)
Each record in coded.jsonl:
{
"sample_name": "Allergic Rhinitis",
"specialty": "Allergy / Immunology",
"codes": [
{"code": "J30.1", "description": "Allergic rhinitis due to pollen", "confidence": 1.0, "verified": true},
{"code": "J45.20", "description": "Mild intermittent asthma, uncomplicated", "confidence": 1.0, "verified": true}
]
}
daimon-config/config.yaml   Daimon sidecar config (LLM + vector + graph)
docker-compose.yml          Neo4j + Qdrant + Ollama
ocr/layout.py               Surya hierarchical OCR pipeline
icd10/loader.py             CMS XML parser → Qdrant + Neo4j
pipeline/coder.py           Daimon client → CodeAssignment[]
pipeline/ingest.py          MTSamples batch processor with Rich console
data/                       Datasets (gitignored)
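Once a batch run has written data/coded.jsonl, the records can be reviewed with a few lines of standard-library Python. The sketch below assumes one JSON object per line with the `sample_name` and `codes` fields shown in the example record; the helper name `low_confidence_codes` and its `threshold` parameter are hypothetical.

```python
import json
from typing import Iterable, Iterator


def low_confidence_codes(
    lines: Iterable[str], threshold: float = 1.0
) -> Iterator[tuple[str, str, float]]:
    """Yield (sample_name, code, confidence) for codes scoring below threshold."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        for entry in record.get("codes", []):
            if entry["confidence"] < threshold:
                yield record["sample_name"], entry["code"], entry["confidence"]
```

Passing an open file handle works because file objects iterate line by line, for example `low_confidence_codes(open("data/coded.jsonl"))`. With the default threshold, only codes that were not confirmed exactly in the graph are reported.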
License: Apache 2.0