Merged
3 changes: 3 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -166,3 +166,6 @@ cython_debug/
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
uv.lock

# Test folders
src/ai4data/metadata/reviewer/tests
7 changes: 7 additions & 0 deletions docs/_toc.yml
@@ -19,6 +19,13 @@ parts:
  - file: docs/metadata-augmentation/index.md
  - file: docs/metadata-augmentation/methodology.md

- caption: Metadata Reviewer
  numbered: true
  chapters:
  - file: docs/metadata-reviewer/overview.md
  - file: docs/metadata-reviewer/agentic-approach.md
  - file: docs/metadata-reviewer/implementation.md

- caption: Data Discoverability
  numbered: true
  chapters:
67 changes: 67 additions & 0 deletions docs/metadata-reviewer/agentic-approach.md
@@ -0,0 +1,67 @@
# Agentic AI Approach

This chapter explains why a multi-agent pipeline is the appropriate architecture for metadata quality assurance at scale. It traces the progression from simple API-based LLM usage—which delivers genuine capability gains—through the structural limitations that motivate a more governed, agentic design.

---

## API-Based LLMs for Metadata Quality

API-based LLM usage means sending a metadata record to a hosted or on-premises language model—such as OpenAI, Anthropic, or Azure OpenAI—via an HTTP API call and receiving a natural-language assessment in return. No custom model training is required; the model is accessed as a service and instructed through a prompt.

In the metadata QA context, a typical call includes the full metadata record in the prompt and asks the model to identify issues: fields that are missing, descriptions that are inconsistent, titles that contain typos, or values that contradict one another. The response is a structured list of detected problems, often formatted as JSON for downstream processing. Because these models are trained on large and diverse corpora, they bring strong language understanding to the task without needing domain-specific fine-tuning.
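
Concretely, such a call amounts to embedding the record in an instruction prompt and parsing a JSON reply. The sketch below builds the prompt and parses a hard-coded sample response; the prompt wording and the example record are illustrative, and no actual API request is made:

```python
import json

def build_qa_prompt(record: dict) -> str:
    """Embed the full metadata record in a QA instruction prompt."""
    return (
        "Review the following metadata record. Identify fields that are "
        "missing, inconsistent, contradictory, or contain typos. Respond "
        "with a JSON array of objects with the keys 'detected_issue', "
        "'current_metadata', and 'suggested_metadata'.\n\n"
        + json.dumps(record, indent=2)
    )

record = {"title": "Houshold Survey 2021", "abstract": "Annual household survey."}
prompt = build_qa_prompt(record)

# A plausible model reply, hard-coded here in place of a real API call:
reply = json.dumps([{
    "detected_issue": "Typo in title",
    "current_metadata": "Houshold Survey 2021",
    "suggested_metadata": "Household Survey 2021",
}])
findings = json.loads(reply)  # structured list for downstream processing
```

In a real deployment the `reply` would come from the provider's chat-completion endpoint, with the prompt passed as the user message.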

This approach represents a genuine capability leap for metadata quality work at scale. A single API call can assess a rich, free-form metadata record in seconds, surface issues a rule-based system would miss, and do so across any language or schema that the model understands. For organisations with large metadata catalogs, API-based LLM access makes systematic quality checking feasible for the first time.

---

## Limitations of API-Only Approaches

However, API-based access alone does not address all organisational requirements. A single LLM call is episodic: it has no memory of previous assessments, no awareness of what has changed since the last review, and no record of decisions made by human reviewers. Metadata quality assurance also demands explicit governance—rules about which issue types to surface, which fields to exclude, and how findings should be categorised and prioritised—and these requirements cannot be reliably enforced through prompt instructions alone. Without structured orchestration, each call is independent, results are not comparable across runs, and audit trails do not exist.

**Table 1. Structural Limits of LLM-Only Approaches**

| Limitation | Why it matters for metadata QA |
|---|---|
| Episodic execution | Metadata QA requires longitudinal awareness: what changed, what was reviewed before. |
| Weak governance | Organisational rules cannot be reliably enforced through prompts alone. |
| Lack of auditability | Decisions must be traceable to evidence, versions, and reviewers. |
| Inconsistency over time | Model and prompt drift undermine comparability across releases. |
| Poor integration with rules | Schema checks, code lists, and regex rules must be deterministic. |
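
The last row deserves emphasis: rule-based checks belong outside the model entirely, because they must yield the same answer on every run. A minimal sketch, with an illustrative code list, regex rule, and field names that are assumptions rather than any particular schema:

```python
import re

ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")   # regex rule
ALLOWED_LANGUAGES = {"en", "fr", "es"}           # code list (illustrative)

def deterministic_checks(record: dict) -> list[str]:
    """Schema, code-list, and regex checks that never involve an LLM."""
    problems = []
    date = record.get("date")
    if date and not ISO_DATE.match(date):
        problems.append(f"date is not ISO 8601: {date!r}")
    lang = record.get("language")
    if lang and lang not in ALLOWED_LANGUAGES:
        problems.append(f"language not in code list: {lang!r}")
    return problems
```

A governed pipeline runs checks like these deterministically and reserves the LLM for the free-text judgements the rules cannot express.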

---

## What Agentic AI Frameworks Are

Agentic AI frameworks are software toolkits that embed large language models within structured systems designed to support multi-step reasoning, persistence, and governance. Rather than treating an LLM call as a one-off interaction, these frameworks provide the scaffolding needed to turn LLM reasoning into repeatable, controllable processes.

In practice, agentic AI frameworks wrap LLMs with several core capabilities: orchestration of multi-step workflows, memory and context persistence across executions, integration with external tools and data sources (such as APIs, code modules, and databases), explicit planning and execution logic, and monitoring, governance, and auditability mechanisms. Together, these capabilities transform LLMs from reactive text generators into components of systems that can act autonomously or semi-autonomously while remaining accountable and inspectable.

For metadata enhancement, this distinction is critical. Metadata quality work requires repeated application of logic, awareness of prior assessments, coordination among specialised checks, and traceability of outcomes. Agentic AI frameworks provide the structural foundation needed to meet these requirements.

**Table 2. Examples of Agentic AI Frameworks**

| Framework | Language(s) | What It Does | Good For |
|---|---|---|---|
| LangChain + LangGraph | Python, JavaScript | Orchestrates multi-agent workflows; integrates memory stores, tools, and structured task graphs | Custom pipelines, conditional workflows, data-driven actions |
| Microsoft AutoGen | Python, .NET | Built-in multi-agent conversation and orchestration framework with explicit role separation (e.g., planner, analyst, executor) | Complex distributed logic and enterprise workflows |
| OpenAI Agents SDK | Python, JavaScript | SDK for building agents with tool calling, planning, and execution loops around LLMs | Rapid development of agentic workflows using hosted LLM services |

---

## How Agentic Tools Fit into a Metadata Enhancement Workflow

For metadata enhancement, the goal is not simply to generate answers but to implement repeated, conditional, and auditable logic. Agentic AI tools enable this by structuring metadata quality work as a coordinated pipeline of specialised agents.

A typical metadata quality agent pipeline passes each record through the following stages:

- A **trigger mechanism** starts a run—such as a schedule or an event indicating that a dataset has been published or updated.
- A **detection agent** scans metadata and applies quality rules to identify potential issues.
- A **context or memory agent** integrates persistent storage—such as a vector database or relational store—to retain knowledge of previous flags, annotations, and decisions.
- A **classification agent** assigns issue categories, such as consistency, completeness, or semantic error.
- A **severity scoring agent** applies policy-informed logic that considers rules, historical context, and prior reviewer behaviour.
- An **action and escalation agent** determines whether an issue should be presented as a suggestion, routed for human review, or escalated to a subject matter expert.
- Finally, a **logging and audit layer** records decisions, agent versions, and execution traces.

This modular, agentic structure allows metadata enhancement to be both scalable and governable, ensuring that AI support strengthens—rather than undermines—human-centred metadata quality management.
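
The hand-off between stages can be sketched as a list of stage functions applied in order. The stage bodies below are trivial stand-ins for real agents, and the category and severity values are made up:

```python
def detect(record):
    """Detection stage: scan the record and emit candidate findings."""
    findings = []
    if not record.get("abstract"):
        findings.append({"detected_issue": "abstract is missing"})
    return findings

def classify(findings):
    """Classification stage: annotate each finding with a category."""
    for f in findings:
        f["issue_category"] = "completeness"
    return findings

def score(findings):
    """Severity stage: attach an impact-based score (1-5)."""
    for f in findings:
        f["issue_severity"] = 4
    return findings

PIPELINE = [detect, classify, score]

def run_pipeline(record):
    stage_output = record
    for stage in PIPELINE:
        stage_output = stage(stage_output)
    return stage_output

result = run_pipeline({"title": "Household Survey 2021", "abstract": ""})
```

An agentic framework replaces each function with an LLM-backed agent and adds the memory, routing, and audit layers around this core loop.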

`ai4data.metadata.reviewer` implements a focused subset of this pipeline. The **primary** and **secondary** agents serve as the detection layer. The **critic** enforces governance through explicit exclusion rules. The **categorizer** and **severity_scorer** handle classification and prioritisation. Persistent storage, escalation routing, and a logging layer are outside the current scope, which is designed as a stateless, per-submission quality check that can be embedded into a broader workflow. See [implementation.md](implementation.md) for the full pipeline details.

---

## References

- [ai4data Implementation](implementation.md) — Architecture and pipeline details for `ai4data.metadata.reviewer`
- [Microsoft AutoGen documentation](https://microsoft.github.io/autogen/) — Multi-agent framework
- [autogen-agentchat](https://pypi.org/project/autogen-agentchat/) — AutoGen conversation and team orchestration library
141 changes: 141 additions & 0 deletions docs/metadata-reviewer/implementation.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,141 @@
# Implementation of ai4data.metadata.reviewer

This page describes the internal architecture of `ai4data.metadata.reviewer`: how its two main classes divide responsibility, how the five-agent pipeline processes a metadata record, how jobs are managed, and how the design remains independent of any specific LLM provider.

The implementation is in [`src/ai4data/metadata/reviewer/`](../../src/ai4data/metadata/reviewer/).

---

## Architecture Overview

The implementation is split across two classes with distinct roles:

```
MetadataReviewerClient (public API)
MetadataReviewerCore (pipeline engine)
```

**`MetadataReviewerClient`** is the entry point for all external use. It handles job submission in both synchronous and asynchronous modes, tracks job state, and provides cancellation. It owns the `model_client` instance (the LLM connection) and passes it to `MetadataReviewerCore` at construction time.

**`MetadataReviewerCore`** manages the AutoGen agent sessions. On each `run()` call it loads the agents manifest, constructs the AutoGen agent objects, assembles the team, and runs the conversation pipeline against the input metadata. It holds no provider-specific logic; all LLM communication goes through the `model_client` reference it receives.

**`Job`** (in `jobs.py`) represents a single submitted request. It carries the job's `job_id`, current `status`, `result` (once complete), and `error` (on failure). Both submission modes return a `Job` immediately; the caller inspects or awaits it to retrieve the outcome.

---

## The Five-Agent Pipeline

Each run passes the metadata record through five agents in sequence. The table below summarises each agent's role.

| Agent | Receives | Outputs | Key behaviour |
|---|---|---|---|
| **primary** | Raw metadata record | JSON array of candidate issues (`detected_issue`, `current_metadata`, `suggested_metadata`) | Independent first-pass scan for all issue types: incorrect, inconsistent, contradictory, missing, duplicated, unclear, typos. Precision preferred; obvious issues must not be omitted. |
| **secondary** | Raw metadata record | JSON array of candidate issues (same schema) | Independent re-scan. Does NOT rely on primary's output. Surfaces issues the primary may have missed. |
| **critic** | Combined candidate list from primary and secondary | Filtered JSON array | Removes issues matching general, field-level, and data-state exclusion rules (see below). No speculation; only unambiguous issues pass. |
| **categorizer** | Critic's filtered list | Same array with `issue_category` added | Assigns one of six exact category strings. Does not add or remove findings; only annotates. Tie-breaker rules applied in fixed order. |
| **severity_scorer** | Categorizer's annotated list | Final array with `issue_severity` (integer 1–5) added | Assigns severity based on impact, not category alone. Down-weights issues matching exclusion classes to severity 1. Terminates with "DONE" to signal pipeline end. |

Candidates accumulate through the primary and secondary passes. The critic applies a defined filter that removes noise. The categorizer annotates surviving issues without altering them. The severity scorer closes the pipeline with an impact-based score and the termination signal.
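
Illustrated with a single finding (all values made up), the record that emerges from each stage looks like this:

```python
# Emitted by the primary or secondary agent:
candidate = {
    "detected_issue": "Title year contradicts the abstract",
    "current_metadata": "Survey 2019",
    "suggested_metadata": "Survey 2021",
}

# The critic passes it through unchanged (it matches no exclusion rule):
after_critic = dict(candidate)

# The categorizer only annotates; it never adds or removes findings:
after_categorizer = {**after_critic, "issue_category": "consistency"}

# The severity scorer adds the final integer score and ends the pipeline:
final = {**after_categorizer, "issue_severity": 4}
```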

---

## Agent Exclusion Rules (Critic and Severity Scorer)

The critic removes issues matching any of the three exclusion classes below. The severity scorer applies the same classes as down-weighting rules: any issue that passes the critic but still matches these conditions receives `issue_severity = 1` rather than being removed.

### General Exclusions

Issues of these types are removed entirely:

- capitalization-only
- spacing or whitespace
- style or stylistic preference
- CRLF or newline or blank-line or trailing-space
- formatting or encoding
- abbreviation
- code
- empty list
- missing fields
- schema or schema structure
- mixed-type objects reflecting structural variation
- URL structure

### Field-Level Exclusions

Issues involving any of the following metadata fields are removed:

`idno`, `proj_idno`, `version_statement`, `prod_date`, `version_date`, `changed`, `changed_by`, `contacts`, `topics`, `tags`, `database_id`, `visualization`

### Data-State Exclusions

Issues related to null or empty fields, empty lists, nested empty lists, or placeholder-only values with no semantic content are removed.
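
Taken together, the field-level and data-state classes amount to a filter like the sketch below. It assumes each finding carries a hypothetical `field` key naming the metadata field it concerns; the excluded-field list is the one given above:

```python
EXCLUDED_FIELDS = {
    "idno", "proj_idno", "version_statement", "prod_date", "version_date",
    "changed", "changed_by", "contacts", "topics", "tags",
    "database_id", "visualization",
}

def is_empty_state(value) -> bool:
    """Data-state exclusion: null/empty values, empty lists, nested empty lists."""
    if value is None or value in ("", []):
        return True
    if isinstance(value, list):
        return all(is_empty_state(v) for v in value)
    return False

def critic_filter(findings: list[dict]) -> list[dict]:
    """Drop findings that hit a field-level or data-state exclusion."""
    return [
        f for f in findings
        if f.get("field") not in EXCLUDED_FIELDS
        and not is_empty_state(f.get("current_metadata"))
    ]
```

In the actual package these rules are enforced through the critic's system message rather than code, so the LLM applies them as instructions; the sketch shows the equivalent deterministic logic.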

---

## Job Lifecycle

A submitted job moves through the following states:

```
pending → running → done
                  → failed
                  → cancelled
```

**Synchronous submission** (`client.submit(metadata)`) spawns a daemon thread with its own `asyncio.run()` event loop and returns a `Job` immediately. The pipeline runs in that background thread. This mode is safe to call from any context, including a REPL or a script with no existing event loop.

**Asynchronous submission** (`await client.submit_async(metadata)`) creates an asyncio `Task` in the caller's event loop and returns a `Job` immediately. This mode is appropriate when the caller is already running inside an async context.

**Waiting for results:**

- `job.wait_sync(timeout=...)` — blocks the calling thread until the job completes, fails, or the timeout expires.
- `await job.wait(timeout=...)` — async equivalent; suspends the coroutine instead of blocking the thread.

Both raise `RuntimeError` if the job ends in a `failed` or `cancelled` state.

**Cancellation** is handled through an `ExternalTermination` handle stored in the session. Calling `job.cancel()` sets the cancellation flag; the running pipeline observes it and stops at the next agent boundary. The job transitions to `cancelled`.
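
The lifecycle above can be sketched with standard-library primitives. Everything here is a simplified stand-in for the package's internals: a `threading.Event` plays the role of the `ExternalTermination` handle, and plain functions stand in for the agent pipeline:

```python
import asyncio
import threading

class Job:
    """Minimal stand-in for the package's Job (job_id omitted for brevity)."""
    def __init__(self):
        self.status = "pending"
        self.result = None
        self.error = None
        self._cancel = threading.Event()   # plays the ExternalTermination role
        self._done = threading.Event()

    def cancel(self):
        self._cancel.set()

    def wait_sync(self, timeout=None):
        if not self._done.wait(timeout):
            raise TimeoutError("job still running")
        if self.status in ("failed", "cancelled"):
            raise RuntimeError(f"job ended as {self.status}")
        return self.result

def submit(metadata, stages):
    """Synchronous submission: run the pipeline in a daemon thread that owns
    its own asyncio.run() event loop, returning the Job immediately."""
    job = Job()

    async def run_pipeline():
        job.status = "running"
        current = metadata
        for stage in stages:
            if job._cancel.is_set():       # observed at each agent boundary
                job.status = "cancelled"
                return
            current = stage(current)
        job.result = current
        job.status = "done"

    def worker():
        try:
            asyncio.run(run_pipeline())
        except Exception as exc:
            job.error, job.status = exc, "failed"
        finally:
            job._done.set()

    threading.Thread(target=worker, daemon=True).start()
    return job
```

A caller then does `job = submit(record, stages)` followed by `job.wait_sync(timeout=60)`, mirroring the synchronous path described above.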

---

## Team Presets

The team preset controls how AutoGen routes messages between agents. The default is `RoundRobinGroupChat`, which steps through agents in the order defined in the manifest. Alternative presets can be passed via the `team_preset` parameter on `submit()` or `submit_async()`.

| Preset | Routing mechanism | When to use |
|---|---|---|
| `RoundRobinGroupChat` (default) | Fixed sequential order | Standard pipeline execution; predictable, auditable turn order |
| `SelectorGroupChat` | LLM selects the next agent | Dynamic routing based on prior output; useful when agent order should vary by content |
| `MagenticOneGroupChat` | Dedicated orchestrator agent | Complex multi-step reasoning; orchestrator manages task decomposition |
| `Swarm` | Agent-to-agent handoff | Distributed, loosely coupled execution; agents decide their own successors |

---

## Custom Agents Manifest

The default manifest (`agents_manifest/default_agents_manifest.yml`) is bundled inside the package. To use a custom manifest, pass `assets_dir` to the `MetadataReviewerClient` constructor and `manifest_file` to `submit()` or `submit_async()`.

The YAML structure is a top-level `agents_manifest` list. Each entry has a `name` and a `system_message`. The `name` values determine agent identity within the pipeline; the `system_message` is passed directly to the AutoGen agent at construction.

Minimal custom manifest:

```yaml
agents_manifest:
- name: primary
  system_message: |
    Examine the metadata and list any issues that are incorrect, missing, or inconsistent.
    Output a JSON array using the standard schema.

- name: severity_scorer
  system_message: |
    Assign issue_severity (1–5) to each finding based on impact.
    Output a JSON array. Print a final line: DONE
```

To use it, pass `assets_dir` when constructing the client and `manifest_file` at submission:

```python
client = MetadataReviewerClient(assets_dir="/path/to/my/manifest/")
job = client.submit(metadata, manifest_file="custom_manifest.yml")
```

---

## References

- [`src/ai4data/metadata/reviewer/core.py`](../../src/ai4data/metadata/reviewer/core.py) — `MetadataReviewerCore` implementation
- [`src/ai4data/metadata/reviewer/client.py`](../../src/ai4data/metadata/reviewer/client.py) — `MetadataReviewerClient` implementation
- [`src/ai4data/metadata/reviewer/jobs.py`](../../src/ai4data/metadata/reviewer/jobs.py) — `Job` class
- [`src/ai4data/metadata/reviewer/agents_manifest/default_agents_manifest.yml`](../../src/ai4data/metadata/reviewer/agents_manifest/default_agents_manifest.yml) — Default agents manifest
- [Microsoft AutoGen documentation](https://microsoft.github.io/autogen/) — Multi-agent framework
- [autogen-agentchat](https://pypi.org/project/autogen-agentchat/) — AutoGen conversation and team orchestration library