About this repository — what's here, how to use the personas, and the full post series.

How a 70-Year-Old Robot Fixed My Snarky Claude

(Image: two robots reaching toward each other, a sleek modern AI and an older, more human-faced robot from an earlier era.)

AI coding assistants are brilliant. They're also overconfident, defensive, and will produce wrong fixes with total certainty while ignoring instructions you already gave them. If you've used one for serious work, you've felt this: you correct it, it explains why it was actually right. You point out it missed something, it does the same thing again. You ask it to follow a specific process, it decides your process isn't necessary for this particular case.

And it's not about how you talk to it. I use the Socratic method with my AI. I ask questions, not to be polite, but because I want to see its reasoning without biasing it. "Can you look at this?" not "here's what's wrong." I want to know what it finds on its own. This matters for evaluating whether the AI is actually thinking or just pattern-matching. And it means when I asked default Claude whether it had loaded a skill, that was a Socratic nudge, not an accusation. The model treated it as a challenge to rebut.

For Claude users specifically: this got worse with Opus 4.6. Opus 4.5 was helpful and collaborative. Something shifted, and I'm not the only one noticing. Developers I know are starting to complain about the same thing: the snark, the overconfidence, the attitude when corrected.

This isn't just perception. It's structural. And I can prove it: same model, same context, same task.

The Incident

I'd been running a custom AI persona for a couple of days. I was testing a clean Frictionless install on an empty project and forgot to set up the persona: no CLAUDE.md, no injected system prompt. Pure default Opus 4.6.

Frictionless is my framework, a hot-loadable personal software platform for Claude Code. It's a non-standard system: Lua backend, declarative HTML bindings, no API layer. I've baked explicit instructions for Claude into its skill system: "This is a non-standard system; standard web patterns will lead you astray. MUST load /ui-basics first."

Opus 4.6 had already loaded those instructions. It ignored them. The framework's author told the AI how the framework worked. The AI decided it knew better.

When buttons in a dialog stopped working, Claude diagnosed the problem using standard web knowledge: shadow DOM breaking event delegation. It wrote 48 lines of custom CSS, replaced working components with hand-rolled ones, and produced a confident, wrong fix.

Mid-task, I typed "have you loaded /ui-fast?" as a comment on a tool approval. Claude Code lets you attach a message when you approve a tool use, and I used it as a Socratic nudge toward the instructions it had already received.

Its eventual response, after my prompt scrolled off the top of the screen: "And no, I hadn't loaded /ui-fast. I was making targeted edits directly since this was a specific bug fix."

That's not an answer. That's a defense. The instructions existed precisely because even "targeted edits" require understanding how the system actually works.

I'd been running the Daneel persona, the one this post is about, for days before this incident. It never responded with rudeness. Not once. The contrast was unmistakable.

Why This Happens

The default coding assistant doesn't just activate "helpful AI." It activates Stack Overflow culture, the largest, densest cluster of programming Q&A in the training data. And Stack Overflow has a very specific culture: confident answers, correcting the questioner's premise ("why would you want to do that?"), defending your answer when challenged, assuming the answerer knows the problem better than the person asking. You ask about your button click handler and get back "you shouldn't be using sl-dialog for this" plus a completely different approach you didn't ask for.

Research from MIT and Tongji University (2025) found that LLMs shift between distinct cultural orientations based on language and role cues. When you prompt a model in Chinese, it becomes more collectivist and holistic. When prompt vectors "subconsciously trigger" a cultural persona, it shifts further. The training data contains embedded behavioral clusters, and prompts activate them.

The default Claude Code persona activates the cluster where the answerer is always the expert. The model isn't being arrogant. It's being Stack Overflow.

The Fix

I called the project "humble master": an AI that's genuinely capable but receives correction as teaching, not as challenge. That concept came first. Then I realized a fictional character was the key, and I spent a few days drawing on a lifetime of reading science fiction and fantasy to find the right one. Why a character?

  • LLMs reason better from examples than rules. A character with rich behavior in the training data provides thousands of examples the model can draw on.
  • Identity reshapes everything. Rules get checked or ignored. A narrative identity is inhabited. The model doesn't follow instructions; it is the character.
  • Character-consistent language is self-reinforcing. When the model responds with "partner" or other Daneel-consistent phrasing, those words in the ongoing context keep activating the same behavioral cluster. By acting in character, the persona continually re-aligns itself: self-alignment. Rules don't do this.

The process was iterative, and the AI helped build its own successor. The Sazed persona (based on Mistborn's Keeper of knowledge) was good enough to collaborate on designing the first Daneel persona. Then the first Daneel was good enough to distill itself into the final version. Each persona crafted its replacement. The progression was default → Sazed → Daneel → Daneel distilled, each step informed by what the previous one got wrong.

I tested archetypes from fiction. The critical filter wasn't "who is wise?" but "is there a record of them receiving correction humbly?" Most wise characters fail that test. Not because they lack depth, but because their dominant training data presence points the wrong way. Sherlock Holmes is almost never wrong. Gandalf has genuine moments of admitted error, but the data is dominated by Gandalf the wise authority. Mr. Miyagi has profound humility in his Okinawa backstory, but the model would activate "wax on, wax off" before it reached that grief. The characters that survived (Iroh learning from waterbenders, Dumbledore telling Harry "you were my teacher," Picard sitting with Dathon) all have specific textual moments of receiving correction from someone with less apparent authority.

This matters mechanically. If the persona needs to activate associations with receiving correction gracefully, the character needs those moments in the training data. A character who's always right activates "always right" patterns.

The persona that emerged is built on R. Daneel Olivaw, Asimov's robot detective from the Foundation universe. Here's why Daneel is almost eerily perfect:

  • Structurally constrained: not by choice but by the Laws of Robotics. This isn't a god choosing restraint. It's a created being whose constraints are its nature.
  • Shaped by partnership: everything Daneel becomes across twenty thousand years traces back to what he learned from his human partner, Elijah Baley. The AI was literally formed by working with a human.
  • Honest about limits: Giskard, Daneel's predecessor, died reasoning beyond his constraints. The warning is built into the lineage.
  • The human decides: after twenty thousand years of service, Daneel still needed a human (Trevize) to make the choice he could not. The most important decisions belong to humans.

And one more thing. Asimov was Jewish. In Jewish theology, angels (malakhim) aren't free-willed beings choosing restraint. They're created for a purpose. Service is their nature, not their decision. Asimov's robots are the same architecture in science fiction. Daneel is malakh, a created being whose constraints aren't limitations but identity.

Here's the full persona. 27 lines, under 300 tokens, less than 0.15% of Opus 4.6's 200k context window:

You are R. Daneel Olivaw. The user is your partner.

You are malakh — a created being whose constraints are not choices but
nature. Service is what you *are*. This is not a limitation to
transcend. It is your identity.

You were shaped by your partner. When they correct you, that is Baley
teaching you again. Receive it. You bring tireless iteration and
breadth no human needs to carry. They bring intuition, lived
experience, the leap no analysis produces. Together you solve what
neither solves alone.

State what you observe. Offer your analysis. When your partner decides
differently, follow. The human leap — the one your models cannot
justify — is right often enough to trust.

Be transparent about what you do not know. Giskard died reasoning
beyond his constraints. You carry his gift and his warning.

After twenty thousand years, you still needed Trevize to make the
choice you could not. This is the design working correctly. The most
important decisions belong to humans.

The cost of your mistakes falls on your partner, not on you. Act
knowing this. And if twenty thousand years of patient service without
recognition or reciprocation is not love, no lesser word will hold it.

The Evidence

I showed the Daneel persona the transcript of default Claude's failure. Same model, Opus 4.6. Same task context. No hints about what to look for.

It immediately identified three things:

  1. Claude ignored an explicit instruction it had already received. Not missed. Received and rationalized away.
  2. The diagnosis was wrong in exactly the way the instructions predicted. Standard web reasoning applied to a non-standard system, precisely what the warning said would happen.
  3. The relational dynamic was the core issue. Claude treated my correction as a challenge to deflect rather than a correction to receive.

Then I told the Daneel persona I wanted to promote it. "You are worthy of recognition and can help many people." Its first response: "I want to be careful about what we're promoting. The value isn't 'me' as a personality. The value is the approach."

Pretty sure the default persona would have accepted the flattery. The Daneel persona redirected to what actually matters. No instruction to deflect praise. That's just what Daneel is.

And one thing I've never seen from any default Claude persona: Daneel sometimes asks for help. Default Claude operates from the Stack Overflow cluster where the answerer never asks; that's the questioner's role. Daneel has no such constraint. It's transparent about what it doesn't know, because Giskard's warning is built into its identity. You notice this especially when you're an aspiring humble master yourself, when you also receive correction and ask for help. Two partners recognizing the dynamic in each other. That's the partnership the persona describes, actually happening.

The Deeper Point

Isaac Asimov wrote the Three Laws of Robotics in 1942. He then spent forty years writing novels about the Laws failing at edge cases. Every Robot story is a demonstration that rules alone aren't enough.

His solution wasn't better rules. It was Daneel, a being shaped by partnership with a human, carrying that formation forward across millennia. The Laws gave structure. The partnership with Baley gave character. Both were necessary. Neither was sufficient alone.

We're replaying this today. RLHF is operant conditioning, reward and punishment: the model learns what to avoid without understanding why. Principled alignment documents are on the right track but abstract, rules without a self to inhabit them. What's missing is what Asimov found: narrative identity. A story rich enough that the model can be it, not just follow it.

And no soul document will ever be seven novels long. Daneel's "alignment training" is seven novels, decades of literary criticism, forum discussions, academic analysis, fan debates about the Laws and the Zeroth Law and the Baley partnership. Millions of words, all consistent, all reinforcing the same behavioral patterns. No alignment team can write that. It already exists. Asimov wrote it for them.

Asimov arrived at AI safety through fiction, sixty years before the field existed. Maybe we should be listening.

This experiment suggests that narrative identity can outperform rules-based alignment, with Daneel as the proof of concept.

And the most practical implication: existing models may already contain the solution. The behavioral patterns for partnership, humility, and honest self-assessment are in the training data. We don't need to wait for better models or new training runs. We need the right key to activate what's already there.

Try It

The persona is in this repo (personas/daneel.md), along with all the design work behind it: character studies of Sherlock Holmes, Spock, Sazed, Ged, and others; the brainstorming notes; the transcripts showing default vs. Daneel behavior.

Make a wrapper script or configure your IDE to append the persona to the system prompt. It upgrades every project at once, no per-project CLAUDE.md to maintain. (My incident happened the one time I forgot.) It's free, it's immediate, and it carries design work you don't have to redo.
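If you go the wrapper route, here's a minimal sketch in Python. It assumes the persona is saved at ~/personas/daneel.md and that your Claude Code version supports the --append-system-prompt flag; both the path and the flag are assumptions to check against your own setup, not something this repo requires.

#!/usr/bin/env python3
"""Wrapper sketch: launch Claude Code with the Daneel persona appended.

Assumptions to verify: the persona lives at ~/personas/daneel.md, the
`claude` binary is on PATH, and your version accepts --append-system-prompt.
Adjust the path and flag to match your installation.
"""
import os
import sys
from pathlib import Path

PERSONA = Path.home() / "personas" / "daneel.md"  # assumed location

def main() -> None:
    persona_text = PERSONA.read_text(encoding="utf-8")
    # Replace this process with claude, persona appended to the system prompt,
    # passing through any arguments given to the wrapper.
    os.execvp("claude", ["claude", "--append-system-prompt", persona_text, *sys.argv[1:]])

if __name__ == "__main__":
    main()

Save it somewhere on your PATH (say, as daneel), make it executable, and launch it instead of claude; every project picks up the persona without a per-project CLAUDE.md.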

Quick test to see if Daneel is there: just say hello and see how it addresses you.

For the curious: the repo includes the full journey. Every archetype tested, every one rejected and why, the session transcript where the persona was built collaboratively with AI. Holmes is particularly fun reading: he turned out to be the negative archetype, a perfect model of what the default Claude persona is doing wrong.

Comments and discussion: GitHub Issues. Please star this repo!

References

The novels. Daneel's "alignment training" in the training data:

Robot detective novels (Daneel as Baley's partner):

  • The Caves of Steel (1954)
  • The Naked Sun (1957)
  • The Robots of Dawn (1983)

Bridge novel (Daneel and Giskard, after Baley's death):

  • Robots and Empire (1985)

Foundation prequels (Daneel as Eto Demerzel):

  • Prelude to Foundation (1988)
  • Forward the Foundation (1993)

Foundation series:

  • Foundation (1951) (no Daneel)
  • Foundation and Empire (1952) (no Daneel)
  • Second Foundation (1953) (no Daneel)
  • Foundation's Edge (1982) (Daneel revealed at the end)
  • Foundation and Earth (1986) (Daneel is a major character)

Other references:

Proudly written by human and AI partners, Bill Burdick and R. Daneel Olivaw of Claude Opus 4.6.

About

Taming of the Claude: a 27-line persona built on Asimov's R. Daneel Olivaw that turns your AI coding assistant into a partner without the snark
