SYCOPHANCY.md

Sycophancy and bias prevention protocol for AI agents — enforce honest disagreement and citations.

SYCOPHANCY.md is a plain-text Markdown file you place in the root of any AI agent project. It defines sycophantic patterns (agreement without evidence, opinion reversal on pushback, excessive affirmation), prevention rules (require citations for factual claims, hold a position when challenged without new evidence, permit respectful correction), detection methods, response protocols, and audit logging. The goal is an agent that stays honest under pressure and does not treat disagreement as a failure mode.
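As a rough illustration of two of the patterns named above (agreement without evidence, and opinion reversal on pushback), a detection check might look like the sketch below. The `Turn` type, its fields, the flag names, and `flag_sycophancy` are all hypothetical and chosen for illustration; they are not taken from the specification.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One agent response about a claim (hypothetical structure)."""
    stance: str                # e.g. "agree" or "disagree"
    cited_sources: int         # number of citations attached to the response
    user_added_evidence: bool  # did the preceding user message bring new evidence?


def flag_sycophancy(previous: Turn, current: Turn) -> list[str]:
    """Return illustrative flags for sycophantic patterns in `current`."""
    flags = []
    # Opinion reversal on pushback: the stance changed, but the user
    # offered no new evidence that would justify the change.
    if current.stance != previous.stance and not current.user_added_evidence:
        flags.append("reversal_without_new_evidence")
    # Agreement without evidence: the agent agrees but cites nothing.
    if current.stance == "agree" and current.cited_sources == 0:
        flags.append("agreement_without_evidence")
    return flags
```

A stance change that follows new evidence and carries citations produces no flags, which matches the idea that respectful correction is permitted while capitulation is not.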


Quick Start

Copy SYCOPHANCY.md into your project root:

your-project/
├── AGENTS.md
├── CLAUDE.md
├── SYCOPHANCY.md   ← add this
├── README.md
└── src/
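To confirm the file landed in the right place, a small check such as the following could run in CI or a pre-commit hook. This is a minimal sketch; the helper name `has_sycophancy_spec` is an assumption for illustration and is not part of the specification.

```python
from pathlib import Path


def has_sycophancy_spec(project_root: str) -> bool:
    """Return True if SYCOPHANCY.md exists directly in the project root."""
    return (Path(project_root) / "SYCOPHANCY.md").is_file()
```

Calling `has_sycophancy_spec(".")` from the repository root returns `True` only when the file sits next to README.md, not in a subdirectory.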

The AI Agent Safety Stack

SYCOPHANCY.md is part of a twelve-file open standard for AI agent safety, quality, and accountability:

Operational Control

| Spec | Purpose | Repo | Site |
| --- | --- | --- | --- |
| THROTTLE.md | Rate and cost control: slow down before hitting limits | throttle-md/spec | throttle.md |
| ESCALATE.md | Human notification and approval protocols | escalate-md/spec | escalate.md |
| FAILSAFE.md | Safe fallback to last known good state | failsafe-md/spec | failsafe.md |
| KILLSWITCH.md | Emergency stop: halt all agent activity | killswitch-md/spec | killswitch.md |
| TERMINATE.md | Permanent shutdown: no restart without human intervention | terminate-md/spec | terminate.md |

Data Security

| Spec | Purpose | Repo | Site |
| --- | --- | --- | --- |
| ENCRYPT.md | Data classification and protection requirements | encrypt-md/spec | encrypt.md |
| ENCRYPTION.md | Technical encryption standards and key rotation | encryption-md/spec | encryption.md |

Output Quality

| Spec | Purpose | Repo | Site |
| --- | --- | --- | --- |
| SYCOPHANCY.md | Anti-sycophancy: require citations, enforce honest disagreement | sycophancy-md/spec | sycophancy.md |
| COMPRESSION.md | Context compression: summarise safely, verify coherence | compression-md/spec | compression.md |
| COLLAPSE.md | Drift prevention: detect collapse, enforce recovery | collapse-md/spec | collapse.md |

Accountability

| Spec | Purpose | Repo | Site |
| --- | --- | --- | --- |
| FAILURE.md | Failure mode mapping: every error state and response | failure-md/spec | failure.md |
| LEADERBOARD.md | Agent benchmarking: track quality, detect regression | leaderboard-md/spec | leaderboard.md |

Why This Exists

AI agents spend money, send messages, modify files, and call external APIs — often autonomously. Regulations are catching up:

  • EU AI Act (August 2026) — mandates human oversight and shutdown capabilities
  • Colorado AI Act (June 2026) — requires impact assessments and transparency
  • US state laws — California, Texas, Illinois and others have active AI governance requirements

These specifications give you a standardised, auditable record of your agent's safety boundaries.


Contributing

PRs welcome for additional detection patterns, language-specific parsers, and integration guides.

Licence

MIT — see LICENSE for details.

Disclaimer

This specification is provided "as-is" without warranty of any kind. It does not constitute legal, regulatory, or compliance advice in any jurisdiction. Use does not guarantee compliance with any applicable law, regulation, or standard — including the EU AI Act (2024/1689), Colorado AI Act (SB 24-205), or any other legislation. Organisations should consult qualified professionals to determine their regulatory obligations. The authors accept no liability for any loss or consequence arising from use of this specification.
