Arctiq

Self-hosted eval runner for AI agent skills.
Measure whether your skills actually work — on your own infrastructure.

Alpha software — Arctiq is in early development. Expect breaking changes. Contributors are welcome!

Quick Start · Features · Configuration · Development · Contributing · License

What is Arctiq?

Arctiq helps skill authors and teams evaluate AI agent skills by running structured test suites against LLM providers and scoring the results with a judge model. Import skills from Git repositories, run evals, and compare results, all without sending data to third-party services other than the LLM providers you configure.

Quick Start

# docker-compose.yml
services:
  api:
    image: ghcr.io/vdekercd/arctiq/api:latest
    volumes:
      - arctiq-data:/data
    environment:
      - ASPNETCORE_ENVIRONMENT=Production
      - ConnectionStrings__DefaultConnection=Data Source=/data/arctiq.db
      - Cors__AllowedOrigins__0=http://localhost:3000
      - ARCTIQ_MASTER_KEY=${ARCTIQ_MASTER_KEY}
    restart: unless-stopped

  ui:
    image: ghcr.io/vdekercd/arctiq/ui:latest
    ports:
      - "3000:80"
    depends_on:
      - api
    restart: unless-stopped

volumes:
  arctiq-data:

# Generate an encryption key and start Arctiq
export ARCTIQ_MASTER_KEY=$(openssl rand -hex 32)
docker compose up -d

Open http://localhost:3000 and you're ready to go.

Features

Skill Management: Import skills from Git repos (GitHub, GitLab, Bitbucket, Azure DevOps). Edit instructions in-browser with full version history and diffs.
Eval Execution: Run test suites against any supported LLM provider. Configure skill version, model, temperature, and a separate judge model for scoring assertions.
Run Comparisons: Compare runs side-by-side: skill v1 vs v2, skill-on vs baseline. View pass rates, cost breakdowns, and output diffs.
Multi-Provider: Supports OpenAI, Anthropic, Google Gemini, Mistral, Azure OpenAI, and Ollama.
Self-Hosted: Runs entirely on your infrastructure. API keys are encrypted at rest (AES-256-GCM). No data leaves your network except the requests sent to the LLM providers you configure.

Configuration

Environment variables for the API container:

| Variable | Description | Default |
| --- | --- | --- |
| `ConnectionStrings__DefaultConnection` | Database connection string | `Data Source=/data/arctiq.db` |
| `ARCTIQ_MASTER_KEY` | Hex-encoded 32-byte key (64 hex characters) used to encrypt API keys; generate with `openssl rand -hex 32` | (none) |
| `Cors__AllowedOrigins__0` | Allowed origin for CORS | `http://localhost:3000` |
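
Where `openssl` is not available, a key of the same shape can be produced with Python's standard library. This is a minimal sketch: the helper names `generate_master_key` and `is_valid_master_key` are hypothetical and not part of Arctiq, and the validity check (exactly 64 hex characters) is an assumption based on the description of `ARCTIQ_MASTER_KEY` above. Arctiq itself may validate the key differently.

```python
import secrets
import string


def generate_master_key() -> str:
    """Return 32 random bytes as a 64-character hex string,
    the same shape as the output of `openssl rand -hex 32`."""
    return secrets.token_hex(32)


def is_valid_master_key(value: str) -> bool:
    """Assumed check: the value is a hex-encoded 32-byte key."""
    return len(value) == 64 and all(c in string.hexdigits for c in value)


if __name__ == "__main__":
    # Print a fresh key, e.g. to paste into an .env file.
    print(generate_master_key())
```

Store the generated value somewhere durable (for example an `.env` file next to `docker-compose.yml`) so the same key is reused across restarts.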

Development

Prerequisites

The .NET 10 SDK (for the API) and Node.js with npm (for the UI).

Run locally

# API
cd src/Arctiq.API
dotnet run

# UI (separate terminal)
cd src/Arctiq.UI
npm install
npm run dev

The UI runs on http://localhost:3000 and proxies /api to the backend.

Tech Stack

| Layer | Technology |
| --- | --- |
| Backend | C# / .NET 10, FastEndpoints |
| Frontend | React, TypeScript, Vite |
| Database | SQLite |
| Deployment | Docker Compose |

Writing Evals for a Skill

Arctiq discovers evals from a file named evals/evals.json inside each skill's directory in the Git repository.

File location

your-skill-repo/
└── your-skill/
    ├── skill.md          # skill instructions
    └── evals/
        └── evals.json    # eval test cases
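
To check locally which skills in a repository would have evals picked up, the layout above can be scanned with a few lines of Python. This is only a sketch of the convention, not Arctiq's actual discovery code: `find_eval_files` is a hypothetical helper, and it assumes skill directories sit directly under the repository root, as in the tree shown.

```python
from pathlib import Path


def find_eval_files(repo_root: str) -> dict[str, Path]:
    """Map each skill directory name to its evals/evals.json, if present.

    Assumes one level of skill directories under repo_root.
    """
    root = Path(repo_root)
    found: dict[str, Path] = {}
    for evals_file in sorted(root.glob("*/evals/evals.json")):
        skill_dir = evals_file.parent.parent  # <skill>/evals/evals.json -> <skill>
        found[skill_dir.name] = evals_file
    return found


if __name__ == "__main__":
    for skill, path in find_eval_files(".").items():
        print(f"{skill}: {path}")
```

Skills without an `evals/evals.json` file are simply absent from the result.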

Format

{
  "evals": [
    {
      "id": 1,
      "name": "optional case name",
      "prompt": "The input prompt sent to the model",
      "assertions": [
        { "id": "a1", "text": "Response is concise and under 100 words", "weight": 1.0 },
        { "id": "a2", "text": "Answer mentions the correct library name", "weight": 2.0 }
      ]
    }
  ]
}

Fields

| Field | Required | Description |
| --- | --- | --- |
| `evals[].id` | Yes | Integer; unique identifier for the test case |
| `evals[].prompt` | Yes | The prompt sent to the model under test |
| `evals[].name` | No | Human-readable label shown in the UI |
| `evals[].assertions` | No | List of criteria judged by the LLM judge |
| `assertions[].id` | Yes | String; unique identifier for the assertion |
| `assertions[].text` | Yes | The criterion evaluated by the judge model (YES/NO) |
| `assertions[].weight` | No | Scoring weight (default `1.0`). Higher = more impact on the final score |
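
The field rules above can be checked before pushing an `evals.json` file. The following is a minimal sketch of such a check, not Arctiq's own validation: `validate_evals` is a hypothetical helper, the error messages are illustrative, and it only enforces the required fields and types listed in the table (plus uniqueness of `evals[].id`).

```python
def validate_evals(document: dict) -> list[str]:
    """Return a list of problems; an empty list means these checks passed."""
    problems: list[str] = []
    evals = document.get("evals") if isinstance(document, dict) else None
    if not isinstance(evals, list):
        return ['top-level "evals" must be a list']

    seen_ids: set[int] = set()
    for i, case in enumerate(evals):
        where = f"evals[{i}]"
        if not isinstance(case, dict):
            problems.append(f"{where} must be an object")
            continue

        case_id = case.get("id")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(case_id, bool) or not isinstance(case_id, int):
            problems.append(f"{where}.id must be an integer")
        elif case_id in seen_ids:
            problems.append(f"{where}.id {case_id} is duplicated")
        else:
            seen_ids.add(case_id)

        if not isinstance(case.get("prompt"), str):
            problems.append(f"{where}.prompt must be a string")

        assertions = case.get("assertions", [])  # optional field
        if not isinstance(assertions, list):
            problems.append(f"{where}.assertions must be a list")
            continue
        for j, assertion in enumerate(assertions):
            a_where = f"{where}.assertions[{j}]"
            if not isinstance(assertion, dict):
                problems.append(f"{a_where} must be an object")
                continue
            for field in ("id", "text"):
                if not isinstance(assertion.get(field), str):
                    problems.append(f"{a_where}.{field} must be a string")
            weight = assertion.get("weight", 1.0)  # documented default
            if isinstance(weight, bool) or not isinstance(weight, (int, float)):
                problems.append(f"{a_where}.weight must be a number")
    return problems
```

Load the file with `json.load` and pass the resulting dictionary to `validate_evals`; any returned strings point at the offending entry by index.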

Scoring

Each assertion is evaluated independently by a judge model using a strict YES/NO prompt. The final score is a weighted percentage:

score = sum(weight of passed assertions) / sum(weight of all assertions) × 100

An eval case with no assertions is recorded but produces no score.
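
The scoring rule can be reproduced in a few lines to sanity-check weights before a run. This is a sketch under stated assumptions: `weighted_score` is a hypothetical helper rather than Arctiq's implementation, the judge's verdicts are represented as a set of passed assertion ids, and returning `None` when all weights sum to zero is an assumption (the text above only covers the no-assertions case).

```python
from typing import Optional


def weighted_score(assertions: list[dict], passed_ids: set[str]) -> Optional[float]:
    """Weighted pass percentage, or None when no score can be computed."""
    # A missing weight falls back to the documented default of 1.0.
    total = sum(a.get("weight", 1.0) for a in assertions)
    if not assertions or total == 0:
        return None
    passed = sum(a.get("weight", 1.0) for a in assertions if a["id"] in passed_ids)
    return passed / total * 100


if __name__ == "__main__":
    assertions = [
        {"id": "s1", "text": "Summary is 2 to 3 sentences long", "weight": 1.0},
        {"id": "s2", "text": "No facts beyond the article", "weight": 2.0},
        {"id": "s3", "text": "Plain English"},  # weight defaults to 1.0
    ]
    # s1 and s2 pass, s3 fails: (1.0 + 2.0) / 4.0 * 100
    print(weighted_score(assertions, {"s1", "s2"}))  # 75.0
```

Because the heavier `s2` assertion counts double, failing it alone would drop the score to 50, while failing only `s3` leaves it at 75.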

Example

{
  "evals": [
    {
      "id": 1,
      "name": "Summarise a short article",
      "prompt": "Summarise the following article in 2–3 sentences:\n\nThe James Webb Space Telescope...",
      "assertions": [
        { "id": "s1", "text": "Summary is 2 to 3 sentences long", "weight": 1.0 },
        { "id": "s2", "text": "Summary does not introduce facts not present in the article", "weight": 2.0 },
        { "id": "s3", "text": "Summary is written in plain English", "weight": 1.0 }
      ]
    },
    {
      "id": 2,
      "name": "Empty input handling",
      "prompt": "Summarise the following article:\n\n",
      "assertions": [
        { "id": "e1", "text": "Model asks for or acknowledges missing input rather than hallucinating a summary", "weight": 1.0 }
      ]
    }
  ]
}

License

MIT

If Arctiq is useful to you, consider sponsoring the project.
