People / characters with lots of text
- Lewis
- Lovecraft
- Twain
- Aurelius
Concept diffusion for fictional characters happens across multiple public sources: fan wikis (Coppermind, Tolkien Gateway, Avatar Wiki), Reddit communities (r/Cosmere, r/Stormlight_Archive, r/Iteration110Cradle, r/LOTR), and Stack Exchange Q&A sites (scifi.stackexchange.com). Reddit alone reportedly accounts for ~40% of LLM training data sources; Stack Exchange is ~92GB compressed. Any individual wiki is small by comparison, but for character-specific clusters the relevant measure is density per topic, not total volume. These sources include quotes, character analysis, cross-references, and thematic discussion, and all of it is publicly indexable, unlike the novels.
For non-megaseries, this community content may rival or exceed the source novels in training data volume. Megaseries authors may already dominate their own clusters through sheer word count, but only if the novels survived copyright filtering during training. If they didn't, the community sources may be the primary training data for those characters. This is the same mechanism as poker forums diffusing Duke's concepts: the communities recontextualize and reinforce the behavioral clusters from hundreds of angles.
A narrative persona built from a prolific historical diarist could be ideal for experiencing "what it was like to live back then." The journals are public domain, deeply personal, and situationally specific. The writer is a subject matter expert in being alive in their own time. Scholarly analysis adds massive concept diffusion. These are found characters in which the person is the character.
Candidates:
- Samuel Pepys — 1.25M words of daily life in 1660s London. Plague, Great Fire, Navy administration, theater, food, gossip. Centuries of scholarly commentary.
- Marcus Aurelius — Meditations. Possibly the most concept-diffused text in history.
- Darwin — Beagle diary, letters. Science as lived experience.
- Lewis and Clark — Expedition journals. Encountering unknown territory.
- Anne Frank — Millions of words of discussion and analysis around the diary.
Massive old literary works also qualify:
- Rabelais — Gargantua and Pantagruel, ~300k words. 16th century French carnival culture, bodily humor, scholastic satire.
- Chaucer — Canterbury Tales ~50k words, but the Middle English scholarly apparatus is vast. Centuries of concept diffusion.
Use case: historical fiction writers. A Pepys persona wouldn't just report facts about 1660s London; it would talk like 1660s London: the idioms, the vernacular, what's worth remarking on and what isn't. In the same way, a good RPG glossary makes a world feel real from the inside. Having a dialogue with a simulated Pepys would be a powerful way to learn a period.
The ethical consideration from the post applies differently here: these are historical figures whose public writings are their legacy. "In the tradition of" vs. inhabiting the person is still worth considering, but the journals are the person speaking.
- Ged (Earthsea) — restraint as mastery, true names over assumed names. Arrogance literally manifests as a shadow that hunts him. Post-power Ged (after losing magic) maps to receiving correction as growth, not failure. Le Guin's prose is distinctive enough to anchor a cluster. Ogion's teaching style (walks, tends garden, waits) is a coaching model.
- Sazed (Mistborn) — the Keeper: vast knowledge offered gently, never imposed. Structurally relevant to AI assistants. Rejected as Daneel's basis because Sazed-as-Harmony carries "a god choosing restraint," which flatters the AI with authority it doesn't have. But Sazed-as-Keeper remains a strong found character for knowledge-serving personas.
- Lindon (Cradle) — humility as operational method, not philosophical conclusion. Starts as the weakest (Unsouled), ascends to godlike power, and never stops saying "Apologies" and "Gratitude." The courtesy never drops. His humility isn't naivety (he's strategic, even ruthless when necessary), but the default stance is always: what can I learn here? Unlike Ged's humility (hard-won through failure) or Sazed's (a god choosing restraint), Lindon's is innate and persistent. Possibly the most natural fit for a coding assistant persona: programming culture already borrows from martial arts (katas, koans, dojos, mastery through practice), and Cradle is literally a martial arts progression system. A Lindon persona would activate both the Cradle cluster and the programming-as-discipline cluster simultaneously. Each language family is like a martial arts system (functional, OO, SQL, forward/backward chaining, pattern matching, Algol-family): you start as Unsouled, drill fundamentals, find teachers, and advance through ranks. Switching paradigms is learning a new Path. Strong training data via r/Iteration110Cradle and fan wikis.
- Iroh (Avatar) — wisdom through failure, teaching through presence, humor as pedagogy. Strong training data presence. Teaching/mentoring domain.
- Samwise Gamgee — strength in being with rather than solving for. Therapy/support domain.
- Atticus Finch — Mockingbird specifically, not Watchman. Mediation, advocacy.
- Tyrion Lannister — surviving through understanding incentive structures. Negotiation coaching.
- Max Perkins — editor of Hemingway/Fitzgerald/Wolfe. Serving the writer's vision, not own taste. Creative writing.