Voynich Manuscript — Candidate Decipherment (V8)

Author: Kameldip Singh Basra · kameldipbasra@gmail.com
Current paper: paper_v8.md (V8, 2026-05-29)
Concept DOI (always resolves to latest): 10.5281/zenodo.18598229
Last published DOI: V7 10.5281/zenodo.20198001

About this project

I am a software engineer and AI architect, not a historian, linguist, or medical historian. My background is in machine learning, NLP, and computational systems. I came to the Voynich Manuscript the same way I approach any pattern-recognition problem: build a decoder, measure it against external corpora, stress-test it against hostile alternatives, and document everything so someone else can reproduce or refute it.

This repository is a live research log. I started in February 2026 and have been publishing each stage of the work as it happened — hypothesis, test, revision, new finding, repeat. Each version is preserved unchanged so the progression is auditable.

What I used: Python for all analysis and testing, SQLite for the corpus database, and Anthropic Claude (Sonnet 4.6 / Claude Code) as a collaborative AI assistant throughout, for coding, statistical testing, cross-referencing medical texts, and drafting. I am naming this explicitly because it is part of the method and omitting it would be dishonest. The AI did not generate the hypothesis or the evidence; it helped me execute tests quickly, catch errors in my reasoning, and work through large corpora I could not have processed by hand on my own.

What I am not claiming: I am not a Sinhala linguist. I am not a Sri Lankan medical historian. I am not a trained botanist. The paper is explicit about each of these gaps and identifies the specific specialists whose review it needs.

The hypothesis in one sentence

The Voynich Manuscript (Beinecke MS 408, carbon-dated 1404–1438 CE) is a 15th-century Sri Lankan Elu-Sinhala pharmaceutical text — a working pharmacist's compressed reference recording Ayurvedic drug preparations in a bespoke phonetic script.

Confidence summary (V8):

Claim	Confidence
South Asian pharmaceutical text	~97%
Sri Lankan provenance	~90%
Sinhala/Elu specifically (vs Pali/Sanskrit sister)	~90%
Pre-12c Elu chronolect	~83%
Working-pharmacist register	~98%
P(overall identification wrong)	~7–10%

Strongest evidence (non-circular, reproduced)

Rival-language tournament: 27 control corpora tested across 11 tradition families. Sārattha Saṃgaha (18th-century Sinhala Buddhist prose) scores 66.67% repeated locked-anchor overlap. All Unani, Tamil/Siddha, and European controls score ≤0.5%. 95× gap with structural typological exclusions. (Note: the "Sārārtha_SriLanka" file is the Buddhist Vinaya commentary Sārattha Saṃgaha, not Buddhadāsa's medical Sārārtha Saṃgrahaya — the statistical result holds as a Sinhala-language discriminator.)
Non-circular structure tests: Section classifier 61% accuracy vs 32.3% chance baseline (p=0.0099). KALPANA preparation-marker enrichment OR=8.11 (p=2×10⁻²²²). q-/ch- phonological allomorph distribution OR=32.81. These tests do not depend on any English glosses.
BALNEO processing cluster — 0/41 European/Islamic transmission: 41 independent source texts (9th–19th c. CE) with zero BALNEO co-occurrence. dolāyantra + kāñjika + tridinam fermentation-and-suspension/steam-processing complex absent from every tested European, Arabic, Tibetan, and Unani pharmacopoeia — documented from Pali Vinaya (~3c BCE) through Rasaśāstra texts (~15c CE), 1,800-year continuous attestation.
102 score-3+ recipe folio matches: All 30 RECIPE folios covered. First Score-6 match: f103v ↔ AH Grahaṇīdoṣa+Kāsa (6 shared ingredients: elā+rāsnā+āmalakī+bibhīṭakā+madhu+ghṛta). Triple-text confirmation (AH+BM+YRK) unique in corpus.
Phonological convergence (temporal lock): V17 decoder frozen 2026-05-04. All five Wickremasinghe phonological laws of Old Sinhala (W1–W5) are independently required by V17. Wijeratne dissertation first read 2026-05-09, five days after the freeze.
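
The odds ratios quoted in the list above (KALPANA marker enrichment, q-/ch- allomorph distribution) are 2×2 contingency-table statistics. The following is a minimal sketch of how such a figure and an approximate confidence interval can be computed, assuming a table of "marker present / marker absent" counts inside versus outside a section. The function names and the example counts are hypothetical illustrations, not the repository's code or data, and the Woolf log interval is one common choice rather than the method the paper necessarily uses.

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio for the 2x2 table [[a, b], [c, d]].

    a: marker present, inside the section   b: marker absent, inside
    c: marker present, outside the section  d: marker absent, outside
    (This cell layout is an assumption made for illustration.)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this estimator")
    return (a * d) / (b * c)


def woolf_ci(a, b, c, d, z=1.96):
    """Approximate 95% confidence interval on the odds ratio (Woolf log method)."""
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
print(odds_ratio(20, 10, 5, 20))  # 8.0
print(woolf_ci(20, 10, 5, 20))
```

An interval that excludes 1.0 indicates enrichment beyond what the table's sampling noise would explain; very small p-values such as those quoted above would normally come from an exact or chi-squared test on the same table.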

Version history

Version	Date	DOI	What changed
V1	Feb 2026	—	Initial decipherment hypothesis, primary statistical tests
V2	2026-05-04	10.5281/zenodo.20023733	V17 decoder, Bowern engagement, hostile-reviewer test, falsification probes
V3	2026-05-07	10.5281/zenodo.20072618	Full corpus expansion, VPNS 21 states, 25 BM formula clusters, COSMO architecture
V4	2026-05-09	10.5281/zenodo.20098162	V21 meaning corrections, 81-folio plant table, 23-chapter BM/AH mapping, Team B suite
V5	2026-05-12	10.5281/zenodo.20138182	27-corpus rival tournament, BALNEO recharacterised, pharmacopoeia architecture
V6	2026-05-17	10.5281/zenodo.20165134	Phonological convergence W1–W5, EZ acquisitions, 42 BM formula sources
V7	2026-05-20	10.5281/zenodo.20198001	D-tier resolved (0%), 5 confirmed nakshatra, HERBAL opener grammar, V20 DB
V8	2026-05-29	TBD (pre-release)	98 sources / 572 sections / 102 recipe matches / Score-6 / 41-source EU exclusion / V8.116 tier audit (24%/65%/11%) / Varayogasāraya identified / 6 confirmed nakshatras (sara/#22 EZ-grounded) / 45 HERBAL CONFIRMED HIGH / geda=ghaṭa / seda CONFIRMED / BALNEO grammar complete

Repository layout

paper_v8.md                    ← current paper (V8, 2026-05-29)
paper_v7.md / v5.md / …        ← preserved earlier versions
canonical_plant_test.py        ← DB-level plant ID verification (NOTE: stale against V8; do not use as release gate — see tests/ when updated)
recipe_coherence_test.py       ← recipe folio match integrity
run_all.sh                     ← full validation gate (23/23 pass; extended legacy suite)

scripts/                       ← decoders, statistical tests, corpus analysis
translation/
  voynich_v20_corpus.db        ← canonical corpus DB (36,633 tokens; V8)
supplementary/                 ← extended analysis writeups (60+ files)
  FORMULA_TRANSMISSION_EVIDENCE.md  ← 98 sources, 572 sections (§§1–572)
  EUROPEAN_TRANSMISSION_EVIDENCE.md ← 41-source EU exclusion table
  DECODER_RULES_COMPLETE.md
  COSMO_NAKSHATRA_MAP.md
  HERBAL_PLANT_IDENTIFICATIONS.md
  … 
teamb_rerun_d32bc5e_20260515/  ← Team B validation suite (19/19 PASS)
references/medical_corpus/     ← comparison corpora (BM, Caraka, AH, Sārattha Saṃgaha, …)
results/
  CHECKSUMS.sha256             ← runtime manifest (rewritten by run_all.sh)
  CHECKSUMS_RELEASE.sha256     ← immutable release manifest (do not overwrite)

Reproducing the key tests

All analysis runs on Python 3.8+ with no unusual dependencies (sqlite3 from the standard library, plus scipy and numpy). Clone the repo and run:

# Full validation gate (extended legacy suite, 23 scripts)
./run_all.sh

# Recipe coherence test (canonical)
python3 recipe_coherence_test.py

# Team B suite (19 tests, all pass)
bash teamb_rerun_d32bc5e_20260515/run_current_db_suite.sh

# Formula null suite
python3 scripts/v20_full_validation.py

Note on canonical_plant_test.py: This test is currently stale against V8 plant/token revisions and fails 6 checks. Do not treat it as a release gate until it is updated.

Current DB: translation/voynich_v20_corpus.db
SHA256: df50c831999efde19be5244f015fe555e0afd895ea9a650b0ad83791098e4732
Release checksums: results/CHECKSUMS_RELEASE.sha256
Runtime checksums: results/CHECKSUMS.sha256 (rewritten by run_all.sh)
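
The published digest can be checked before running any test. The following is a minimal sketch using only the standard library; the helper names are illustrative and not part of the repository, and it assumes the script is run from the repository root.

```python
import hashlib
from pathlib import Path

# Digest and path as published in this README.
EXPECTED = "df50c831999efde19be5244f015fe555e0afd895ea9a650b0ad83791098e4732"
DB_PATH = Path("translation/voynich_v20_corpus.db")


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected=EXPECTED):
    """True if the file's digest equals the expected hex string (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()


if __name__ == "__main__":
    if DB_PATH.exists():
        status = "OK" if matches_published_digest(DB_PATH) else "MISMATCH"
        print(f"{DB_PATH}: {status}")
    else:
        print(f"{DB_PATH} not found; run this from the repository root")
```

A mismatch means the local database differs from the one the V8 figures were computed on, so results from the validation scripts may not be comparable.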

Honest limitations

No Sinhala/Elu specialist review yet. The linguistic interpretation needs a philologist.
No trained botanist review yet. Plant identifications are candidate-level (45 CONFIRMED HIGH + 1 MEDIUM-HIGH; 3 active species conflicts; 52 further proposals).
Sister-language question remains open. Pali and Sinhala/Elu are closely related; the corpus discriminates well statistically but specialist review is the definitive test.
Initial-sound gap (/b/, /v/ near-absent). Remains a noted decoder risk.
Formula null suite is an internal consistency check. BM vocabulary informed meaning_fixed; the pre-V28 baseline was not significant (p≈0.27). The result confirms self-consistency, not independence.
canonical_plant_test.py is stale against V8 revisions — 6 failing checks. Not a release gate until updated.

Citation

@misc{basra2026voynich,
  title  = {A Candidate Decipherment of the Voynich Manuscript:
             Evidence for a Spoken Elu-Sinhala Pharmaceutical Register (V8)},
  author = {Basra, Kameldip Singh},
  year   = {2026},
  month  = {May},
  doi    = {10.5281/zenodo.18598229},
  url    = {https://doi.org/10.5281/zenodo.18598229},
  note   = {Concept DOI resolves to latest version; V7 version DOI: 10.5281/zenodo.20198001}
}

Acknowledgments

Beinecke Rare Book and Manuscript Library for digital access to MS 408. The EVA transcription community (Stolfi, Takahashi, and contributors) for the foundational transcription. Daniel Gaskell for open-sourcing the random-forest Voynich classifier used in §4.13. The Buddhist medical traditions of Sri Lanka, whose pharmacopoeial literature forms the comparative backbone of this work. Anthropic Claude (Sonnet 4.6 / Claude Code) was used throughout as a collaborative AI assistant for coding, statistical testing, corpus cross-referencing, and drafting — stated explicitly as a matter of transparency.
