kamb-code/Voynich

Voynich Manuscript — Candidate Decipherment (V8)

Author: Kameldip Singh Basra · kameldipbasra@gmail.com
Current paper: paper_v8.md (V8, 2026-05-29)
Concept DOI (always resolves to latest): 10.5281/zenodo.18598229
Last published DOI: V7 10.5281/zenodo.20198001


About this project

I am a software engineer and AI architect, not a historian, linguist, or medical historian. My background is in machine learning, NLP, and computational systems. I came to the Voynich Manuscript the same way I approach any pattern-recognition problem: build a decoder, measure it against external corpora, stress-test it against hostile alternatives, and document everything so someone else can reproduce or refute it.

This repository is a live research log. I started in February 2026 and have been publishing each stage of the work as it happened — hypothesis, test, revision, new finding, repeat. Each version is preserved unchanged so the progression is auditable.

What I used: Python for all analysis and testing, SQLite for the corpus database, and Anthropic Claude (Sonnet 4.6 / Claude Code) as a collaborative AI assistant throughout — for coding, statistical testing, cross-referencing medical texts, and drafting. I am naming this explicitly because it is part of the method and it would be dishonest not to. The AI did not generate the hypothesis or the evidence; it helped me execute tests quickly, catch errors in my reasoning, and work through large corpora I would not have been able to process manually alone.

What I am not claiming: I am not a Sinhala linguist. I am not a Sri Lankan medical historian. I am not a trained botanist. The paper is explicit about each of these gaps and identifies the specific specialists whose review it needs.


The hypothesis in one sentence

The Voynich Manuscript (Beinecke MS 408, carbon-dated 1404–1438 CE) is a 15th-century Sri Lankan Elu-Sinhala pharmaceutical text — a working pharmacist's compressed reference recording Ayurvedic drug preparations in a bespoke phonetic script.

Confidence summary (V8):

| Claim | Confidence |
|---|---|
| South Asian pharmaceutical text | ~97% |
| Sri Lankan provenance | ~90% |
| Sinhala/Elu specifically (vs Pali/Sanskrit sister) | ~90% |
| Pre-12c Elu chronolect | ~83% |
| Working-pharmacist register | ~98% |
| P(overall identification wrong) | ~7–10% |

Strongest evidence (non-circular, reproduced)

  1. Rival-language tournament: 27 control corpora tested across 11 tradition families. Sārattha Saṃgaha (18th-century Sinhala Buddhist prose) scores 66.67% repeated locked-anchor overlap. All Unani, Tamil/Siddha, and European controls score ≤0.5%. 95× gap with structural typological exclusions. (Note: the "Sārārtha_SriLanka" file is the Buddhist Vinaya commentary Sārattha Saṃgaha, not Buddhadāsa's medical Sārārtha Saṃgrahaya — the statistical result holds as a Sinhala-language discriminator.)

  2. Non-circular structure tests: Section classifier 61% accuracy vs 32.3% chance baseline (p=0.0099). KALPANA preparation-marker enrichment OR=8.11 (p=2×10⁻²²²). q-/ch- phonological allomorph distribution OR=32.81. These tests do not depend on any English glosses.

  3. BALNEO processing cluster (0/41 European/Islamic transmission): 41 independent source texts (9th–19th c. CE) show zero BALNEO co-occurrence. The dolāyantra + kāñjika + tridinam fermentation-and-suspension/steam-processing complex is absent from every tested European, Arabic, Tibetan, and Unani pharmacopoeia, yet it is documented from the Pali Vinaya (~3c BCE) through Rasaśāstra texts (~15c CE), a continuous attestation of roughly 1,800 years.

  4. 102 score-3+ recipe folio matches: All 30 RECIPE folios covered. First Score-6 match: f103v ↔ AH Grahaṇīdoṣa+Kāsa (6 shared ingredients: elā+rāsnā+āmalakī+bibhīṭakā+madhu+ghṛta). Triple-text confirmation (AH+BM+YRK) unique in corpus.

  5. Phonological convergence (temporal lock): The V17 decoder was frozen on 2026-05-04. All five Wickremasinghe phonological laws of Old Sinhala (W1–W5) are independently required by V17. The Wijeratne dissertation was first read on 2026-05-09, five days after the freeze.
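The effect sizes quoted above (classifier accuracy against a chance baseline, odds ratios for marker enrichment) are standard quantities. The following is a minimal sketch of how figures of this kind are commonly computed, using only the Python standard library. The function names and all counts are hypothetical placeholders, not the paper's data, and the paper may use a different test (for example a permutation test); the scripts under scripts/ remain the authoritative implementation.

```python
import math


def binomial_tail(k, n, p):
    """One-sided exact binomial p-value: P(X >= k) for X ~ Binomial(n, p).

    k = number of correct classifications, n = number of items,
    p = chance-level accuracy under the null hypothesis.
    """
    return sum(
        math.comb(n, i) * p**i * (1 - p) ** (n - i)
        for i in range(k, n + 1)
    )


def odds_ratio(a, b, c, d):
    """Odds ratio for the 2x2 table [[a, b], [c, d]] with a Woolf 95% CI.

    a = marker present in target section, b = marker absent in target section,
    c = marker present elsewhere,         d = marker absent elsewhere.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Woolf interval")
    estimate = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # std. error of log(OR)
    low = math.exp(math.log(estimate) - 1.96 * se)
    high = math.exp(math.log(estimate) + 1.96 * se)
    return estimate, low, high


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not taken from the paper).
    print(binomial_tail(k=61, n=100, p=0.323))
    print(odds_ratio(120, 80, 30, 160))
```

Reporting the confidence interval alongside the odds ratio makes it easier for a reviewer to judge how sensitive the enrichment claim is to small cell counts.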


Version history

| Version | Date | DOI | What changed |
|---|---|---|---|
| V1 | Feb 2026 | | Initial decipherment hypothesis, primary statistical tests |
| V2 | 2026-05-04 | 10.5281/zenodo.20023733 | V17 decoder, Bowern engagement, hostile-reviewer test, falsification probes |
| V3 | 2026-05-07 | 10.5281/zenodo.20072618 | Full corpus expansion, VPNS 21 states, 25 BM formula clusters, COSMO architecture |
| V4 | 2026-05-09 | 10.5281/zenodo.20098162 | V21 meaning corrections, 81-folio plant table, 23-chapter BM/AH mapping, Team B suite |
| V5 | 2026-05-12 | 10.5281/zenodo.20138182 | 27-corpus rival tournament, BALNEO recharacterised, pharmacopoeia architecture |
| V6 | 2026-05-17 | 10.5281/zenodo.20165134 | Phonological convergence W1–W5, EZ acquisitions, 42 BM formula sources |
| V7 | 2026-05-20 | 10.5281/zenodo.20198001 | D-tier resolved (0%), 5 confirmed nakshatra, HERBAL opener grammar, V20 DB |
| V8 | 2026-05-29 | TBD (pre-release) | 98 sources / 572 sections / 102 recipe matches / Score-6 / 41-source EU exclusion / V8.116 tier audit (24%/65%/11%) / Varayogasāraya identified / 6 confirmed nakshatras (sara/#22 EZ-grounded) / 45 HERBAL CONFIRMED HIGH / geda=ghaṭa / seda CONFIRMED / BALNEO grammar complete |

Repository layout

paper_v8.md                    ← current paper (V8, 2026-05-29)
paper_v7.md / v5.md / …        ← preserved earlier versions
canonical_plant_test.py        ← DB-level plant ID verification (NOTE: stale against V8; do not use as release gate — see tests/ when updated)
recipe_coherence_test.py       ← recipe folio match integrity
run_all.sh                     ← full validation gate (23/23 pass; extended legacy suite)

scripts/                       ← decoders, statistical tests, corpus analysis
translation/
  voynich_v20_corpus.db        ← canonical corpus DB (36,633 tokens; V8)
supplementary/                 ← extended analysis writeups (60+ files)
  FORMULA_TRANSMISSION_EVIDENCE.md  ← 98 sources, 572 sections (§§1–572)
  EUROPEAN_TRANSMISSION_EVIDENCE.md ← 41-source EU exclusion table
  DECODER_RULES_COMPLETE.md
  COSMO_NAKSHATRA_MAP.md
  HERBAL_PLANT_IDENTIFICATIONS.md
  … 
teamb_rerun_d32bc5e_20260515/  ← Team B validation suite (19/19 PASS)
references/medical_corpus/     ← comparison corpora (BM, Caraka, AH, Sārattha Saṃgaha, …)
results/
  CHECKSUMS.sha256             ← runtime manifest (rewritten by run_all.sh)
  CHECKSUMS_RELEASE.sha256     ← immutable release manifest (do not overwrite)
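For a first look at the corpus database listed in the layout above, here is a minimal sketch that inventories the tables in a SQLite file without assuming anything about its schema. The helper name `table_row_counts` is hypothetical and not part of the repository; the database is opened read-only so that inspection cannot alter the canonical file.

```python
import sqlite3
from pathlib import Path


def table_row_counts(db_path):
    """Return {table_name: row_count} for every user table in a SQLite file."""
    # Build a read-only URI so the canonical DB cannot be modified by accident.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
            )
        ]
        # Table names come from sqlite_master itself, so quoting them is enough.
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
    finally:
        conn.close()


if __name__ == "__main__":
    db = Path("translation/voynich_v20_corpus.db")
    if db.exists():
        for name, count in sorted(table_row_counts(db).items()):
            print(f"{name}: {count}")
```

Comparing the printed counts with the token total stated for the release is a quick sanity check before running the heavier validation scripts.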

Reproducing the key tests

All analysis runs on Python 3.8+ with no unusual dependencies (sqlite3 from the standard library, plus scipy and numpy). Clone the repo and run:

# Full validation gate (extended legacy suite, 23 scripts)
./run_all.sh

# Recipe coherence test (canonical)
python3 recipe_coherence_test.py

# Team B suite (19 tests, all pass)
bash teamb_rerun_d32bc5e_20260515/run_current_db_suite.sh

# Formula null suite
python3 scripts/v20_full_validation.py

Note on canonical_plant_test.py: This test is currently stale against the V8 plant/token revisions and fails 6 checks. Do not treat it as a release gate until it is updated.

Current DB: translation/voynich_v20_corpus.db
SHA256: df50c831999efde19be5244f015fe555e0afd895ea9a650b0ad83791098e4732
Release checksums: results/CHECKSUMS_RELEASE.sha256
Runtime checksums: results/CHECKSUMS.sha256 (rewritten by run_all.sh)
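Before running any test, it is worth confirming that the local database matches the SHA256 quoted above. A minimal sketch using `hashlib` follows; the helper name `sha256_of_file` is hypothetical, and the path and expected digest are the ones stated in this README.

```python
import hashlib
from pathlib import Path

# Digest quoted in this README for translation/voynich_v20_corpus.db.
EXPECTED_SHA256 = "df50c831999efde19be5244f015fe555e0afd895ea9a650b0ad83791098e4732"


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so large DBs need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    db = Path("translation/voynich_v20_corpus.db")
    if db.exists():
        actual = sha256_of_file(db)
        status = "OK" if actual == EXPECTED_SHA256 else "MISMATCH"
        print(f"{status}: {actual}")
```

A mismatch most likely means the checkout is at a different version than the one this README describes, so test results would not be comparable.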


Honest limitations

  1. No Sinhala/Elu specialist review yet. The linguistic interpretation needs a philologist.
  2. No trained botanist review yet. Plant identifications are candidate-level (45 CONFIRMED HIGH + 1 MEDIUM-HIGH; 3 active species conflicts; 52 further proposals).
  3. Sister-language question remains open. Pali and Sinhala/Elu are closely related; the corpus discriminates well statistically but specialist review is the definitive test.
  4. Initial-sound gap (/b/, /v/ near-absent). Remains a noted decoder risk.
  5. Formula null suite is an internal consistency check. BM vocabulary informed meaning_fixed; the pre-V28 baseline was not significant (p≈0.27). The result confirms self-consistency, not independence.
  6. canonical_plant_test.py is stale against V8 revisions — 6 failing checks. Not a release gate until updated.

Citation

@misc{basra2026voynich,
  title  = {A Candidate Decipherment of the Voynich Manuscript:
             Evidence for a Spoken Elu-Sinhala Pharmaceutical Register (V8)},
  author = {Basra, Kameldip Singh},
  year   = {2026},
  month  = {May},
  doi    = {10.5281/zenodo.18598229},
  url    = {https://doi.org/10.5281/zenodo.18598229},
  note   = {Concept DOI resolves to latest version; V7 version DOI: 10.5281/zenodo.20198001}
}

Acknowledgments

Beinecke Rare Book and Manuscript Library for digital access to MS 408. The EVA transcription community (Stolfi, Takahashi, and contributors) for the foundational transcription. Daniel Gaskell for open-sourcing the random-forest Voynich classifier used in §4.13. The Buddhist medical traditions of Sri Lanka, whose pharmacopoeial literature forms the comparative backbone of this work. Anthropic Claude (Sonnet 4.6 / Claude Code) was used throughout as a collaborative AI assistant for coding, statistical testing, corpus cross-referencing, and drafting — stated explicitly as a matter of transparency.