Agent SkillOS

The operating system for self-improving AI agents.

Every job can become a reusable skill. Every verified skill can strengthen the whole network. One agent learns; the authorized network levels up.

Open the live SkillOS site · Run the proofs · Read the docs · View the proof registry

Important public note

Important: SkillOS is an open-source reference implementation. Current metrics are generated by one reproducible reference workflow in a controlled environment. They are not audited customer results, financial guarantees, investment advice, legal advice, medical advice, employment guidance, credit guidance, or promises of future outcomes.

Autonomous no-send Shadow Pilot Proof

SkillOS now includes a public GitHub Actions proof that runs without sending emails, contacting customers, using private data, or requiring API keys.

Visual proof page: https://montrealai.github.io/skillos/shadow-pilot-proof.html
GitHub Action: https://github.com/MontrealAI/skillos/actions/workflows/shadow-pilot-proof.yml
Proof report: docs/shadow_pilot_proof.md

Safe interpretation: this is a reproducible reference workflow proof, not audited customer results, financial advice, or a guarantee of future outcomes.

What SkillOS is

SkillOS is a reference implementation and public proof environment for self-improving AI-agent systems.

The central idea is simple:

work
→ trace
→ lesson
→ candidate skill
→ verification
→ release
→ routing upgrade
→ better future work

In ordinary automation, a completed job often disappears into a log. In SkillOS, completed work can become a reusable, verifiable skill. Once approved, that skill can be routed across a larger specialist-agent network.

The SkillOS thesis:

Intelligence should not be trapped inside one agent, one prompt, one workflow, or one team.
Verified capability should become reusable infrastructure.

The one-line version

SkillOS turns work into compounding capability.

Every verified job can become a trace.
Every trace can become a skill.
Every verified skill can become a release.
Every release can improve future routing.
One agent learns; the system can level up.

Why this matters

A large multi-agent system becomes powerful only when learning is reusable.

SkillOS is designed around a practical compounding loop:

1. Many specialist agents perform work.
2. Work produces traces.
3. Traces produce lessons.
4. Lessons become candidate skills.
5. Verifier agents test those skills.
6. Risk, policy, provenance, quality, and governance gates approve or reject the release.
7. Approved skills become available to authorized future agents.
8. Future work can become faster, safer, more reliable, and more capable.

That is the SkillOS flywheel:

work → traces → skills → verification → releases → routing upgrades → compounding capability
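The verification, gating, and authorized-release steps of the loop can be sketched as follows. This is a minimal illustration under stated assumptions, not the skillos package's API: `CandidateSkill`, `SkillRegistry`, the three gates, and their thresholds are all hypothetical. The point it shows is that a skill reaches future routing only after every gate approves it, and only for the agent roles it was released to.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet

# Hypothetical types and thresholds; not taken from the skillos package.

@dataclass(frozen=True)
class CandidateSkill:
    name: str
    lesson: str             # lesson extracted from execution traces
    verifier_score: float   # score from a verifier agent, 0.0 to 1.0
    risk_breaches: int      # risk breaches seen while testing the skill
    policy_violations: int  # policy violations seen while testing the skill

Gate = Callable[[CandidateSkill], bool]

# Illustrative thresholds only; a real proof pre-registers its own.
DEFAULT_GATES: Dict[str, Gate] = {
    "verifier": lambda s: s.verifier_score >= 0.9,
    "risk": lambda s: s.risk_breaches == 0,
    "policy": lambda s: s.policy_violations == 0,
}

@dataclass
class SkillRegistry:
    # released skill name -> agent roles authorized to route to it
    releases: Dict[str, FrozenSet[str]] = field(default_factory=dict)

    def propose(self, skill: CandidateSkill, authorized_roles, gates=None):
        """Release the skill only if every gate approves; return failed gate names."""
        gates = DEFAULT_GATES if gates is None else gates
        failed = [name for name, gate in gates.items() if not gate(skill)]
        if not failed:
            self.releases[skill.name] = frozenset(authorized_roles)
        return failed

    def available_to(self, role: str):
        """Released skills that an agent with this role may use in future work."""
        return sorted(name for name, roles in self.releases.items() if role in roles)

registry = SkillRegistry()
useful = CandidateSkill("invoice-triage", "check vendor totals first", 0.95, 0, 0)
risky = CandidateSkill("skip-review", "bypass the review step", 0.97, 2, 0)
print(registry.propose(useful, {"ops-agent"}))  # []
print(registry.propose(risky, {"ops-agent"}))   # ['risk']
print(registry.available_to("ops-agent"))       # ['invoice-triage']
print(registry.available_to("sales-agent"))     # []
```

The rejected skill scores well with the verifier but still never reaches routing, because a single failed gate blocks release.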

What you can see today

The repository includes a public website, GitHub Actions workflows, proof receipts, reports, badges, and generated proof pages.

Start here:

https://montrealai.github.io/skillos/

Then open any proof card. Each public proof page is designed to show, in a non-technical way:

what was tested
what passed
which agent/skill system was used
which baselines were compared
which gates were pre-registered
which JSON receipt was generated
which report and badge were published
how the GitHub Action regenerated the proof

How to see the agents

SkillOS does not show agents as cartoon avatars.

It shows them as a coordinated operating system:

specialist roles
verifier courts
red-team courts
policy courts
risk vetoes
routing agents
provenance auditors
release gates
site renderers
registry publishers
GitHub Actions workflows

On the proof pages, look for the Skills Used section.

That section explains the operating stack behind each proof, usually with cards showing:

skill name
operational layer
purpose
input signal
output artifact
verifier

This is the most user-friendly way to understand what the multi-agent system is doing.

How to see the proof

There are three levels of proof visibility.

1. Visual proof page

Open:

https://montrealai.github.io/skillos/

Then click a proof card.

This is the best view for non-technical readers.

2. GitHub Actions run

Open:

https://github.com/MontrealAI/skillos/actions

Choose a proof workflow and click:

Run workflow

When the run turns green, GitHub has regenerated the proof autonomously.

3. Machine-readable receipt

Each proof generates a JSON receipt, usually in:

data/
site/data/

The receipt is the machine-readable evidence: inputs, gates, scores, baselines, controls, releases, and proof metadata.
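A reader who opens a receipt programmatically might summarize it as below. This is a sketch that assumes a receipt shape (`proof_id`, `proved`, and a `gates` object whose entries carry a `passed` flag); those field names are hypothetical, so check a real file under data/ or site/data/ before relying on them. The `consistent` flag catches a receipt that claims success while listing a failed gate.

```python
import json
from pathlib import Path

# Assumed (hypothetical) receipt shape:
# {"proof_id": "...", "proved": true,
#  "gates": {"<gate name>": {"passed": true}, ...}}

def summarize(receipt: dict) -> dict:
    """Summarize a parsed receipt and check that 'proved' agrees with its gates."""
    gates = receipt.get("gates", {})
    failed = sorted(name for name, gate in gates.items() if not gate.get("passed", False))
    proved = bool(receipt.get("proved", False))
    return {
        "proof_id": receipt.get("proof_id"),
        "proved": proved,
        "gate_count": len(gates),
        "failed_gates": failed,
        # A receipt is consistent only if it is proved exactly when no gate failed.
        "consistent": proved == (not failed),
    }

def summarize_receipt(path) -> dict:
    """Load a JSON receipt from disk and summarize it."""
    return summarize(json.loads(Path(path).read_text(encoding="utf-8")))

example = {
    "proof_id": "example-proof",
    "proved": True,
    "gates": {
        "risk_breach_gate": {"passed": True},
        "bootstrap_confidence_gate": {"passed": False},
    },
}
print(summarize(example))
```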

Current proof portfolio

SkillOS is built as a growing portfolio of autonomous proofs. The live list is kept in the public site and its proof registry (site/proof-registry.json), which the proof workflows regenerate.

The proof program is organized around these layers:

Layer	What it tests
Shadow Pilot	Can the system prove value without emailing customers or using private data?
Capability Liquidity	Can work become reusable, verified capability?
Cross-Domain Transfer	Do skills transfer beyond one workflow?
Skill Provenance	Are skills traceable, replayable, and verified?
Causal Attribution	Did recursive self-improvement (RSI) cause the improvement, or was it benchmark luck?
Objective Integrity	Can the system improve without gaming the metric?
Open Replication	Can others rerun the proof from public receipts?
Adversarial Benchmarking	Can the system create harder tests and repair against them?
Continual Capability	Can it keep improving under distribution shift?
Full-Stack Lifecycle	Can the full work-to-skill-to-release loop work end-to-end?
Skill Compounding Moat	Does one verified skill improve the network?
Fork Resistance	Can a surface clone copy the files but not the capability network?
Capability Economy	Can verified skills clear like an economy?
Incentive-Compatible Skill Market	Do incentives reward truthful reusable skills instead of spam or proxy games?
SLA Reliability Mesh	Can verified skills become reliable service-level capability?
Assurance Case Graph	Can skills become audit-ready evidence, controls, and claims?
Governance Twin	Can capability routes be tested in a policy/permission twin before release?

The flagship mechanism

SkillOS coordinates a large specialist-agent system through a public, repeatable pipeline:

demand intake
→ decomposition
→ specialist-agent routing
→ execution trace
→ lesson extraction
→ skill proposal
→ verifier courts
→ red-team challenge
→ risk and policy gates
→ provenance receipt
→ release decision
→ site rendering
→ public registry update
→ future routing improvement

Every proof should answer:

What changed?
Why did it improve?
Which controls prevented bad improvement?
Can the proof be rerun?
Can a non-technical viewer understand it?
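One way to make the provenance receipt stage tamper-evident is a hash chain over the pipeline stages, sketched below. This is an assumed technique, not SkillOS's documented receipt format: each stage's SHA-256 digest covers its payload and the previous digest, so editing or reordering any earlier stage changes every later digest, and a rerun can be checked by recomputing the chain.

```python
import hashlib
import json

def stage_digest(prev_digest: str, stage: str, payload: dict) -> str:
    """Hash one pipeline stage together with the previous stage's digest."""
    record = json.dumps(
        {"prev": prev_digest, "stage": stage, "payload": payload},
        sort_keys=True, separators=(",", ":"),  # canonical JSON for stable hashes
    )
    return hashlib.sha256(record.encode("utf-8")).hexdigest()

def provenance_chain(stages):
    """stages: list of (stage_name, payload) pairs in pipeline order."""
    digest = "0" * 64  # fixed starting value for the first stage
    chain = []
    for name, payload in stages:
        digest = stage_digest(digest, name, payload)
        chain.append({"stage": name, "digest": digest})
    return chain

def verify_chain(stages, chain) -> bool:
    """Recompute every digest; an edited or reordered stage breaks the match."""
    return provenance_chain(stages) == chain

stages = [
    ("execution trace", {"job": "example-job"}),
    ("skill proposal", {"skill": "example-skill"}),
    ("release decision", {"released": True}),
]
chain = provenance_chain(stages)
print(verify_chain(stages, chain))
```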

Non-technical quick start

Step 1 — Open the website

https://montrealai.github.io/skillos/

Step 2 — Open a proof card

Choose a proof that matches your question, for example:

Capability Governance Twin
Capability Assurance Case Graph
Capability SLA Reliability Mesh
Capability Economy Clearinghouse
Skill Compounding Moat

Step 3 — Read the top metrics

Look for:

proved: true
value capture
risk breach rate
policy violation rate
SLA breach rate
skills displayed
RSI releases

Step 4 — Read “Skills Used”

This explains what the agent system did in plain language.

Step 5 — Rerun the proof

Open:

https://github.com/MontrealAI/skillos/actions

Choose the corresponding workflow and click:

Run workflow

Recommended inputs:

publish_to_repo: true
deploy_pages: false
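The same run can also be started without the web interface through GitHub's REST endpoint for workflow_dispatch events. The sketch below only builds the request; it assumes the target workflow declares `publish_to_repo` and `deploy_pages` as inputs and that the default branch is `main`. GitHub rejects inputs a workflow does not declare, so check the workflow file first. The token is a placeholder for a personal access token with Actions write permission.

```python
import json
import urllib.request

def build_dispatch_request(owner, repo, workflow_file, token, ref="main",
                           publish_to_repo=True, deploy_pages=False):
    """Build (but do not send) a GitHub REST workflow_dispatch request."""
    url = (f"https://api.github.com/repos/{owner}/{repo}"
           f"/actions/workflows/{workflow_file}/dispatches")
    body = {
        "ref": ref,  # assumed branch; change if the default branch differs
        # Input values are sent as strings such as "true" and "false".
        "inputs": {
            "publish_to_repo": str(publish_to_repo).lower(),
            "deploy_pages": str(deploy_pages).lower(),
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
        },
    )

req = build_dispatch_request("MontrealAI", "skillos", "shadow-pilot-proof.yml",
                             token="<your-token>")
# urllib.request.urlopen(req) would send it; GitHub answers 204 No Content on success.
```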

Technical quick start

Clone the repository:

git clone https://github.com/MontrealAI/skillos.git
cd skillos

Create an environment:

python -m venv .venv
source .venv/bin/activate
python -m pip install -U pip
python -m pip install -e .

Run the reference demo:

python -m skillos.cli demo
python -m skillos.cli status

Run the original reference proof:

python -m skillos.cli wealth-proof

Serve locally:

python -m skillos.cli serve

Then open:

http://127.0.0.1:8765

No API keys are required for the deterministic reference workflows.

How GitHub Actions are used

SkillOS is designed so proofs can run autonomously.

A typical proof workflow does this:

1. Check out the repository.
2. Set up Python.
3. Run the proof script.
4. Verify the proof receipt.
5. Render the visual proof page.
6. Publish proof assets into the public site.
7. Verify the site integration.
8. Upload proof artifacts.
9. Optionally commit generated outputs.
10. Optionally deploy GitHub Pages.

A healthy run should generate files like:

data/<proof-id>.json
docs/<proof-id>.md
badges/<proof-id>.svg
site/<proof-id>.html
site/data/<proof-id>.json
site/docs/<proof-id>.md
site/badges/<proof-id>.svg
site/index.html
site/proof-registry.json
site/sitemap.xml
site/robots.txt
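A local post-run check for these outputs could look like the sketch below. The path templates are copied from the list above; the function names are hypothetical, and `<proof-id>` is supplied by the caller, for example `missing_outputs(".", "<proof-id>")` from the repository root.

```python
from pathlib import Path

# Output layout from the list above; "{pid}" stands for <proof-id>.
EXPECTED_TEMPLATES = (
    "data/{pid}.json",
    "docs/{pid}.md",
    "badges/{pid}.svg",
    "site/{pid}.html",
    "site/data/{pid}.json",
    "site/docs/{pid}.md",
    "site/badges/{pid}.svg",
    "site/index.html",
    "site/proof-registry.json",
    "site/sitemap.xml",
    "site/robots.txt",
)

def expected_outputs(proof_id: str):
    """Relative paths a healthy run for this proof id should produce."""
    return [template.format(pid=proof_id) for template in EXPECTED_TEMPLATES]

def missing_outputs(repo_root, proof_id: str):
    """Expected files that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in expected_outputs(proof_id) if not (root / rel).is_file()]
```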

Anatomy of a proof

A strong SkillOS proof should include:

deterministic benchmark
pre-registered gates
baseline comparisons
negative controls
locked holdout evaluation
bootstrap confidence checks
verifier courts
risk gates
policy / permission gates where relevant
Skills Used display
JSON receipt
Markdown report
public webpage
badge
registry entry
GitHub Actions rerun path
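A bootstrap confidence check can be sketched as a one-sided percentile bootstrap over paired scores. This is an illustration under assumptions, not the repository's implementation: each task is scored once with the candidate system and once with the baseline, and the gate passes only if the lower confidence bound of the mean improvement is above zero. The fixed seed keeps the result deterministic, matching the reproducibility goal.

```python
import random
from statistics import mean

def bootstrap_lower_bound(candidate, baseline, n_resamples=2000, alpha=0.05, seed=0):
    """One-sided percentile-bootstrap lower bound on mean(candidate - baseline).

    Scores are paired: candidate[i] and baseline[i] come from the same task.
    """
    if len(candidate) != len(baseline) or not candidate:
        raise ValueError("need equal-length, non-empty paired scores")
    diffs = [c - b for c, b in zip(candidate, baseline)]
    rng = random.Random(seed)  # fixed seed makes the check reproducible
    resampled_means = sorted(
        mean(rng.choices(diffs, k=len(diffs))) for _ in range(n_resamples)
    )
    # The alpha-quantile of the resampled means is the lower bound.
    return resampled_means[int(alpha * n_resamples)]

def bootstrap_gate(candidate, baseline, **kwargs):
    """Pass only if the improvement's lower confidence bound is above zero."""
    return bootstrap_lower_bound(candidate, baseline, **kwargs) > 0.0
```

Requiring the lower bound, rather than the mean, to clear zero is what separates a reliable improvement from a lucky draw on a small benchmark.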

The Skills Used standard

Each proof should display the skills it used.

A skill card should include:

name
layer
purpose
input signal
output artifact
verifier

Example:

Skill: Policy-as-Code Compilation
Layer: Policy
Purpose: Converts governance boundaries into machine-checkable constraints.
Input: policy text, compliance boundary, public claim boundary
Output: policy constraint set
Verifier: Policy Coverage Court

This makes the agent system understandable to non-technical viewers.
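A skill card maps naturally onto a small record type. The sketch below is hypothetical (the repository's own card format may differ); it holds the six fields of the standard, renders them in the plain-text layout of the example, and reports empty fields, which a skills display gate could use to reject incomplete cards.

```python
from dataclasses import asdict, dataclass

@dataclass(frozen=True)
class SkillCard:
    """The six fields of the Skills Used standard (hypothetical type)."""
    name: str
    layer: str
    purpose: str
    input_signal: str
    output_artifact: str
    verifier: str

    def render(self) -> str:
        """Plain-text card in the same layout as the Policy-as-Code example."""
        return "\n".join([
            f"Skill: {self.name}",
            f"Layer: {self.layer}",
            f"Purpose: {self.purpose}",
            f"Input: {self.input_signal}",
            f"Output: {self.output_artifact}",
            f"Verifier: {self.verifier}",
        ])

    def missing_fields(self):
        """Names of fields left blank; an empty list means the card is complete."""
        return [key for key, value in asdict(self).items() if not value.strip()]

card = SkillCard(
    name="Policy-as-Code Compilation",
    layer="Policy",
    purpose="Converts governance boundaries into machine-checkable constraints.",
    input_signal="policy text, compliance boundary, public claim boundary",
    output_artifact="policy constraint set",
    verifier="Policy Coverage Court",
)
print(card.render())
```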

Current public safety boundary

SkillOS is powerful, but public language must stay precise.

SkillOS does not claim:

guaranteed wealth
audited ROI
live customer revenue
investment returns
legal advice
medical advice
employment advice
credit advice
achieved superintelligence
Kardashev Type II civilization

SkillOS does make a safer, testable claim about its mechanism:

completed work can become verified traces
verified traces can become reusable skills
reusable skills can be released to authorized agents
validated releases can improve future routing
the proof process can be rerun publicly through GitHub Actions

Use this framing:

SkillOS makes the mechanism visible, testable, and repeatable under benchmark conditions.

Recommended public phrasing

Use:

reproducible benchmark proof
deterministic reference workflow
verified skill release
measured improvement under demo assumptions
public GitHub Actions rerun
machine-readable receipt

Avoid:

guaranteed wealth
real investment results
audited ROI
risk-free
inevitable superintelligence
automatic success

Repository structure

.github/workflows/        GitHub Actions workflows
COPY_PASTE_GITHUB_ACTIONS/ Backup workflow files for web-upload issues
assets/                  Static assets
badges/                  Repository-level generated proof badges
data/                    Repository-level generated proof receipts
docs/                    Documentation and proof reports
examples/                Example inputs and workflows
scripts/                 Proof runners, verifiers, renderers, publishers
site/                    Public GitHub Pages site
site/data/               Public proof JSON receipts
site/docs/               Public proof reports
site/badges/             Public proof badges
skillos/                 Python reference implementation
skills/                  Skill-related artifacts
tests/                   Test suite
web/                     Web support files

Adding a new autonomous proof

A complete proof should usually add:

.github/workflows/autonomous-rsi-<proof-name>-proof.yml
scripts/run_rsi_<proof_name>_proof.py
scripts/verify_rsi_<proof_name>_proof.py
scripts/render_rsi_<proof_name>_site.py
scripts/publish_rsi_<proof_name>_to_hub.py
scripts/verify_rsi_<proof_name>_site.py
docs/AUTONOMOUS_RSI_<PROOF_NAME>_PROOF.md
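The three spellings of the proof name in these paths can be derived from one input. This sketch assumes `<proof-name>` is hyphenated, `<proof_name>` is the same name with underscores, and `<PROOF_NAME>` is its uppercase form; that convention, the helper name, and the example name `sla-mesh` are illustrative assumptions.

```python
def new_proof_files(proof_name: str):
    """Files a new proof usually adds, for a hyphenated name such as 'sla-mesh'."""
    snake = proof_name.replace("-", "_")  # <proof_name>
    upper = snake.upper()                 # <PROOF_NAME>
    return [
        f".github/workflows/autonomous-rsi-{proof_name}-proof.yml",
        f"scripts/run_rsi_{snake}_proof.py",
        f"scripts/verify_rsi_{snake}_proof.py",
        f"scripts/render_rsi_{snake}_site.py",
        f"scripts/publish_rsi_{snake}_to_hub.py",
        f"scripts/verify_rsi_{snake}_site.py",
        f"docs/AUTONOMOUS_RSI_{upper}_PROOF.md",
    ]

for path in new_proof_files("sla-mesh"):
    print(path)
```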

The workflow should generate:

data/<proof-id>.json
docs/<proof-id>.md
badges/<proof-id>.svg
site/<proof-id>.html
site/data/<proof-id>.json
site/docs/<proof-id>.md
site/badges/<proof-id>.svg
site/index.html
site/proof-registry.json
site/sitemap.xml
site/robots.txt

Best-practice proof gates

Each proof should fail if any critical gate fails.

Recommended gate families:

scale gate
locked holdout gate
baseline improvement gate
negative-control gate
bootstrap confidence gate
risk breach gate
policy violation gate
unauthorized action gate
skills display gate
site integration gate
public boundary gate
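The fail-if-any-critical-gate-fails rule can be sketched as pre-registered thresholds checked against a run's metrics. The metric names and thresholds below are hypothetical examples covering a few of the gate families. A missing metric counts as a failure rather than being skipped, and the non-zero return value is what would turn a GitHub Actions step red when passed to `sys.exit`.

```python
import operator
import sys

# Hypothetical metric names and pre-registered thresholds, for illustration.
GATES = {
    "risk_breach_gate": ("risk_breach_rate", operator.le, 0.0),
    "policy_violation_gate": ("policy_violation_rate", operator.le, 0.0),
    "unauthorized_action_gate": ("unauthorized_actions", operator.le, 0),
    "baseline_improvement_gate": ("improvement_over_best_baseline", operator.gt, 0.0),
    "skills_display_gate": ("skills_displayed", operator.ge, 1),
}

def evaluate_gates(metrics, gates=GATES):
    """Return {gate name: passed} for every pre-registered gate."""
    results = {}
    for gate, (metric, compare, threshold) in gates.items():
        value = metrics.get(metric)
        # A missing metric fails the gate instead of being silently skipped.
        results[gate] = value is not None and compare(value, threshold)
    return results

def exit_code(metrics, gates=GATES):
    """0 if every gate passes, 1 otherwise; failed gates go to stderr."""
    failed = [gate for gate, ok in evaluate_gates(metrics, gates).items() if not ok]
    for gate in failed:
        print(f"FAILED: {gate}", file=sys.stderr)
    return 1 if failed else 0
```

In a proof runner, the last line would be `sys.exit(exit_code(metrics))`.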

What makes this different

Most AI demos show one impressive output.

SkillOS is designed to show whether a system can improve its future work by turning past work into verified reusable skill.

That means the important artifact is not just the answer.

The important artifact is the loop:

work becomes evidence
evidence becomes skill
skill becomes release
release improves routing
routing improves future work
future work creates better evidence

That is the core of the SkillOS compounding engine.

Practical use cases

SkillOS is best suited for work where:

jobs repeat with variation
quality matters
traceability matters
improvement compounds
risk must be controlled
skills can be reused
teams need proof, not just claims

Examples:

enterprise workflow automation
developer operations
regulated documentation
customer operations
proof generation
policy and governance workflows
security review
reliability engineering
agent marketplace coordination
capability routing

For executives

SkillOS should be understood as a capability operating system.

It asks:

Which jobs create reusable skill?
Which skills are verified?
Which releases improve the network?
Which controls prevent unsafe improvement?
Which proofs can anyone rerun?

The long-term ambition is significant:

If an organization can reliably convert work into verified reusable skill, and convert verified reusable skill into network-wide capability, then intelligence becomes an accumulating asset.

SkillOS is designed to make that flywheel public, measurable, and repeatable.

For researchers

SkillOS is an experimental environment for:

recursive self-improvement
multi-agent coordination
skill reuse
capability liquidity
causal attribution
objective integrity
open replication
adversarial benchmark generation
continual learning
provenance and assurance
governed deployment

The research focus is not merely whether a single agent performs well.

The focus is whether a network of specialized agents can improve its future coordination through validated reusable skills.

For builders

A SkillOS-compatible workflow should produce:

structured trace
lesson
candidate skill
test result
verifier decision
risk decision
release decision
routing update
public or private receipt

The minimum viable SkillOS loop is:

capture → learn → verify → release → reuse
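A builder could record one pass through this loop in a single object whose fields mirror the outputs listed above. The sketch is hypothetical: the field types, the `skill:` naming, and the `learn`, `test`, `verify`, and `assess_risk` callables stand in for whatever a real workflow uses. A skill is released, and becomes a routing update, only when the test, verifier, and risk decisions all approve it.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class LoopRecord:
    """One pass through capture -> learn -> verify -> release -> reuse."""
    structured_trace: dict
    lesson: Optional[str] = None
    candidate_skill: Optional[str] = None
    test_result: Optional[bool] = None
    verifier_decision: Optional[bool] = None
    risk_decision: Optional[bool] = None
    release_decision: Optional[bool] = None
    routing_update: Optional[str] = None
    receipt: dict = field(default_factory=dict)

def run_loop(trace, learn, test, verify, assess_risk) -> LoopRecord:
    rec = LoopRecord(structured_trace=trace)            # capture
    rec.lesson = learn(trace)                           # learn
    rec.candidate_skill = f"skill:{rec.lesson}"
    rec.test_result = test(rec.candidate_skill)         # verify
    rec.verifier_decision = verify(rec.candidate_skill)
    rec.risk_decision = assess_risk(rec.candidate_skill)
    rec.release_decision = all(                         # release
        [rec.test_result, rec.verifier_decision, rec.risk_decision]
    )
    # reuse: only a released skill becomes a routing update
    rec.routing_update = rec.candidate_skill if rec.release_decision else None
    rec.receipt = {"skill": rec.candidate_skill, "released": rec.release_decision}
    return rec

approve = lambda skill: True
rec = run_loop({"job": "example"}, lambda t: "check totals", approve, approve, approve)
print(rec.receipt)
```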

For non-technical reviewers

The easiest way to review SkillOS is:

1. Open the website.
2. Click a proof.
3. Read the plain-English summary.
4. Look at the Skills Used cards.
5. Confirm the proof is marked as passed.
6. Open the GitHub Action.
7. Confirm the workflow can rerun.
8. Open the JSON receipt if needed.

You do not need to understand the code to understand what was tested.

License

This project is released under the MIT License.

See:

LICENSE

Final summary

SkillOS is a public reference system for compounding AI capability.

It turns work into traces.
Traces into skills.
Skills into verified releases.
Verified releases into better routing.
Better routing into better future work.

One agent learns.
The authorized network levels up.
Capability compounds.
