Author: Kameldip Singh Basra · kameldipbasra@gmail.com
Current paper: paper_v8.md (V8, 2026-05-29)
Concept DOI (always resolves to latest): 10.5281/zenodo.18598229
Last published DOI: V7 10.5281/zenodo.20198001
I am a software engineer and AI architect, not a historian, linguist, or medical historian. My background is in machine learning, NLP, and computational systems. I came to the Voynich Manuscript the same way I approach any pattern-recognition problem: build a decoder, measure it against external corpora, stress-test it against hostile alternatives, and document everything so someone else can reproduce or refute it.
This repository is a live research log. I started in February 2026 and have been publishing each stage of the work as it happened — hypothesis, test, revision, new finding, repeat. Each version is preserved unchanged so the progression is auditable.
What I used: Python for all analysis and testing, SQLite for the corpus database, and Anthropic Claude (Sonnet 4.6 / Claude Code) as a collaborative AI assistant throughout — for coding, statistical testing, cross-referencing medical texts, and drafting. I am naming this explicitly because it is part of the method and it would be dishonest not to. The AI did not generate the hypothesis or the evidence; it helped me execute tests quickly, catch errors in my reasoning, and work through large corpora I would not have been able to process manually alone.
What I am not claiming: I am not a Sinhala linguist. I am not a Sri Lankan medical historian. I am not a trained botanist. The paper is explicit about each of these gaps and identifies the specific specialists whose review it needs.
The Voynich Manuscript (Beinecke MS 408, carbon-dated 1404–1438 CE) is a 15th-century Sri Lankan Elu-Sinhala pharmaceutical text — a working pharmacist's compressed reference recording Ayurvedic drug preparations in a bespoke phonetic script.
Confidence summary (V8):
| Claim | Confidence |
|---|---|
| South Asian pharmaceutical text | ~97% |
| Sri Lankan provenance | ~90% |
| Sinhala/Elu specifically (vs Pali/Sanskrit sister) | ~90% |
| Pre-12c Elu chronolect | ~83% |
| Working-pharmacist register | ~98% |
| P(overall identification wrong) | ~7–10% |
- Rival-language tournament: 27 control corpora tested across 11 tradition families. Sārattha Saṃgaha (18th-century Sinhala Buddhist prose) scores 66.67% repeated locked-anchor overlap. All Unani, Tamil/Siddha, and European controls score ≤0.5%. 95× gap with structural typological exclusions. (Note: the "Sārārtha_SriLanka" file is the Buddhist Vinaya commentary Sārattha Saṃgaha, not Buddhadāsa's medical Sārārtha Saṃgrahaya; the statistical result still holds as a Sinhala-language discriminator.)
- Non-circular structure tests: Section classifier 61% accuracy vs 32.3% chance baseline (p=0.0099). KALPANA preparation-marker enrichment OR=8.11 (p=2×10⁻²²²). q-/ch- phonological allomorph distribution OR=32.81. These tests do not depend on any English glosses.
- BALNEO processing cluster, 0/41 European/Islamic transmission: none of 41 independent source texts (9th–19th c. CE) shows BALNEO co-occurrence. The dolāyantra + kāñjika + tridinam fermentation-and-suspension/steam-processing complex is absent from every tested European, Arabic, Tibetan, and Unani pharmacopoeia, but is documented from the Pali Vinaya (~3c BCE) through Rasaśāstra texts (~15c CE), a continuous attestation of roughly 1,800 years.
- 102 score-3+ recipe folio matches: All 30 RECIPE folios covered. First Score-6 match: f103v ↔ AH Grahaṇīdoṣa+Kāsa (6 shared ingredients: elā+rāsnā+āmalakī+bibhīṭakā+madhu+ghṛta). Triple-text confirmation (AH+BM+YRK) unique in corpus.
- Phonological convergence, temporal lock: V17 decoder frozen 2026-05-04. All five Wickremasinghe phonological laws of Old Sinhala (W1–W5) are independently required by V17. The Wijeratne dissertation was first read on 2026-05-09, five days after the freeze.
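The enrichment odds ratios and the above-chance classifier accuracy quoted in the structure tests are the kind of figures a 2×2 contingency test and a binomial tail test produce. The following is a minimal, dependency-free sketch of both calculations, not the repository's own test code: the function names are hypothetical, the counts are invented purely to exercise the functions, and the paper's actual tests may use a different procedure (for example a permutation baseline, or `scipy.stats.fisher_exact`).

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Sample odds ratio for the 2x2 table [[a, b], [c, d]].

    a: marker tokens inside the section, b: other tokens inside,
    c: marker tokens outside,            d: other tokens outside.
    """
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value: P(top-left cell >= a) with margins fixed."""
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    # Hypergeometric tail, summed with exact integers before dividing.
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / comb(n, col1)


def binomial_tail(successes, trials, chance):
    """P(X >= successes) for X ~ Binomial(trials, chance)."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


if __name__ == "__main__":
    # Hypothetical counts, NOT taken from the paper or the corpus DB.
    a, b, c, d = 30, 70, 10, 190
    print(f"OR = {odds_ratio(a, b, c, d):.2f}, p = {fisher_one_sided(a, b, c, d):.3g}")
    # Hypothetical: 61 of 100 held-out items correct against a 32.3% chance rate.
    print(f"p(accuracy above chance) = {binomial_tail(61, 100, 0.323):.3g}")
```

An odds ratio alone says nothing about sample size, which is why it is reported alongside a p-value; reproducing the published figures requires the real token counts from the corpus DB.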
| Version | Date | DOI | What changed |
|---|---|---|---|
| V1 | Feb 2026 | — | Initial decipherment hypothesis, primary statistical tests |
| V2 | 2026-05-04 | 10.5281/zenodo.20023733 | V17 decoder, Bowern engagement, hostile-reviewer test, falsification probes |
| V3 | 2026-05-07 | 10.5281/zenodo.20072618 | Full corpus expansion, VPNS 21 states, 25 BM formula clusters, COSMO architecture |
| V4 | 2026-05-09 | 10.5281/zenodo.20098162 | V21 meaning corrections, 81-folio plant table, 23-chapter BM/AH mapping, Team B suite |
| V5 | 2026-05-12 | 10.5281/zenodo.20138182 | 27-corpus rival tournament, BALNEO recharacterised, pharmacopoeia architecture |
| V6 | 2026-05-17 | 10.5281/zenodo.20165134 | Phonological convergence W1–W5, EZ acquisitions, 42 BM formula sources |
| V7 | 2026-05-20 | 10.5281/zenodo.20198001 | D-tier resolved (0%), 5 confirmed nakshatra, HERBAL opener grammar, V20 DB |
| V8 | 2026-05-29 | TBD (pre-release) | 98 sources / 572 sections / 102 recipe matches / Score-6 / 41-source EU exclusion / V8.116 tier audit (24%/65%/11%) / Varayogasāraya identified / 6 confirmed nakshatras (sara/#22 EZ-grounded) / 45 HERBAL CONFIRMED HIGH / geda=ghaṭa / seda CONFIRMED / BALNEO grammar complete |
paper_v8.md ← current paper (V8, 2026-05-29)
paper_v7.md / v5.md / … ← preserved earlier versions
canonical_plant_test.py ← DB-level plant ID verification (NOTE: stale against V8; do not use as release gate — see tests/ when updated)
recipe_coherence_test.py ← recipe folio match integrity
run_all.sh ← full validation gate (23/23 pass; extended legacy suite)
scripts/ ← decoders, statistical tests, corpus analysis
translation/
  voynich_v20_corpus.db ← canonical corpus DB (36,633 tokens; V8)
supplementary/ ← extended analysis writeups (60+ files)
  FORMULA_TRANSMISSION_EVIDENCE.md ← 98 sources, 572 sections (§§1–572)
  EUROPEAN_TRANSMISSION_EVIDENCE.md ← 41-source EU exclusion table
  DECODER_RULES_COMPLETE.md
  COSMO_NAKSHATRA_MAP.md
  HERBAL_PLANT_IDENTIFICATIONS.md
  …
teamb_rerun_d32bc5e_20260515/ ← Team B validation suite (19/19 PASS)
references/medical_corpus/ ← comparison corpora (BM, Caraka, AH, Sārattha Saṃgaha, …)
results/
  CHECKSUMS.sha256 ← runtime manifest (rewritten by run_all.sh)
  CHECKSUMS_RELEASE.sha256 ← immutable release manifest (do not overwrite)
All analysis runs on Python 3.8+ with no unusual dependencies (sqlite3, scipy, numpy). Clone the repo and:
# Full validation gate (extended legacy suite, 23 scripts)
./run_all.sh
# Recipe coherence test (canonical)
python3 recipe_coherence_test.py
# Team B suite (19 tests, all pass)
bash teamb_rerun_d32bc5e_20260515/run_current_db_suite.sh
# Formula null suite
python3 scripts/v20_full_validation.py
Note on canonical_plant_test.py: This test is currently stale against V8 plant/token revisions and fails 6 checks. Do not advertise it as a release gate until it is updated.
Current DB: translation/voynich_v20_corpus.db
SHA256: df50c831999efde19be5244f015fe555e0afd895ea9a650b0ad83791098e4732
Release checksums: results/CHECKSUMS_RELEASE.sha256
Runtime checksums: results/CHECKSUMS.sha256 (rewritten by run_all.sh)
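To confirm that a local copy of the corpus DB matches the hash listed above, something like the following sketch can be run from the repository root. The path and expected digest are copied from this README; the helper names `sha256_of` and `matches` are illustrative and are not part of the repository.

```python
import hashlib
from pathlib import Path

# Values copied from this README; update both if the DB is re-released.
DB_PATH = Path("translation/voynich_v20_corpus.db")
EXPECTED_SHA256 = "df50c831999efde19be5244f015fe555e0afd895ea9a650b0ad83791098e4732"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected):
    """True if the file's SHA-256 equals the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()


if __name__ == "__main__":
    if DB_PATH.exists():
        print("OK" if matches(DB_PATH, EXPECTED_SHA256) else "MISMATCH")
    else:
        print(f"{DB_PATH} not found; run this from the repository root")
```

A mismatch means the local DB differs from the one the V8 results were computed on, so reported figures should not be expected to reproduce exactly.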
- No Sinhala/Elu specialist review yet. The linguistic interpretation needs a philologist.
- No trained botanist review yet. Plant identifications are candidate-level (45 CONFIRMED HIGH + 1 MEDIUM-HIGH; 3 active species conflicts; 52 further proposals).
- Sister-language question remains open. Pali and Sinhala/Elu are closely related; the corpus discriminates well statistically but specialist review is the definitive test.
- Initial-sound gap (/b/, /v/ near-absent). Remains a noted decoder risk.
- Formula null suite is an internal consistency check. BM vocabulary informed meaning_fixed; the pre-V28 baseline was not significant (p≈0.27). The result confirms self-consistency, not independence.
- canonical_plant_test.py is stale against V8 revisions (6 failing checks). Not a release gate until updated.
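The caveat on the formula null suite can be made concrete with a generic label-shuffling null. The sketch below is an illustration only, not the repository's suite: the function names, the set-overlap score, and the pairing scheme are all assumptions. It estimates how often a random pairing of folios and formulas scores at least as well as the real pairing.

```python
import random


def mean_overlap(folios, formulas):
    """Mean number of shared items between paired folio and formula sets."""
    if not folios or len(folios) != len(formulas):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(len(a & b) for a, b in zip(folios, formulas)) / len(folios)


def permutation_p(folios, formulas, n_perm=2000, seed=0):
    """Estimate P(shuffled pairing scores >= observed pairing)."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    observed = mean_overlap(folios, formulas)
    shuffled = list(formulas)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mean_overlap(folios, shuffled) >= observed:
            hits += 1
    # Add-one correction: a finite number of shuffles cannot justify p = 0.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy ingredient sets, invented for illustration only.
    folios = [{"a", "b"}, {"c", "d"}, {"e", "f"}, {"g", "h"}]
    formulas = [{"a", "b"}, {"c", "x"}, {"e", "f"}, {"y", "z"}]
    print(f"p = {permutation_p(folios, formulas):.3f}")
```

A small p from a null like this shows only that the pairing beats a shuffled one. If the glosses were themselves tuned against the comparison vocabulary, the shuffled pairings inherit that tuning too, so the test measures self-consistency rather than independence.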
@misc{basra2026voynich,
title = {A Candidate Decipherment of the Voynich Manuscript:
Evidence for a Spoken Elu-Sinhala Pharmaceutical Register (V8)},
author = {Basra, Kameldip Singh},
year = {2026},
month = {May},
doi = {10.5281/zenodo.18598229},
url = {https://doi.org/10.5281/zenodo.18598229},
note = {Concept DOI resolves to latest version; V7 version DOI: 10.5281/zenodo.20198001}
}
Beinecke Rare Book and Manuscript Library for digital access to MS 408. The EVA transcription community (Stolfi, Takahashi, and contributors) for the foundational transcription. Daniel Gaskell for open-sourcing the random-forest Voynich classifier used in §4.13. The Buddhist medical traditions of Sri Lanka, whose pharmacopoeial literature forms the comparative backbone of this work. Anthropic Claude (Sonnet 4.6 / Claude Code) was used throughout as a collaborative AI assistant for coding, statistical testing, corpus cross-referencing, and drafting; this is stated explicitly as a matter of transparency.