19 changes: 18 additions & 1 deletion README.md
@@ -6,7 +6,7 @@ It is built to outgrow assistant-grade discovery: classical NLP, neural NLP, sem

## Product thesis

-Watson-style systems answer. Holmes investigates.
+Component NLP annotates. Holmes investigates.

Holmes is not a chatbot wrapper, a loose model zoo, or a domain NLP repo. It is the governed language layer above search, evidence, retrieval, casefiles, semantic graphs, tools, models, evals, and agents.

@@ -33,6 +33,23 @@ Holmes is not a chatbot wrapper, a loose model zoo, or a domain NLP repo. It is
8. Guardrails and governance: PII checks, source provenance, prompt-injection checks, policy gates, eval gates, factsheets, promotion records.
9. Agent and tool orchestration: tool contracts, agent identity, sessions, memory, MCP/A2A, execution traces, model routing.

## NLP component alignment

Holmes explicitly covers these component families:

- basic primitives;
- advanced primitives;
- rule techniques;
- classical ML;
- neural NLP;
- transformers;
- foundation-language services;
- retrieval and knowledge;
- guardrails and governance;
- agent and tool orchestration.

The alignment contract is documented in [`docs/NLP_COMPONENT_ALIGNMENT.md`](docs/NLP_COMPONENT_ALIGNMENT.md). That document is the lower-layer NLP map for Holmes, nlplab, Sherlock Search, and the platform runtime.

## Repo role

This repo is the Holmes product surface and integration spine.
95 changes: 95 additions & 0 deletions docs/NLP_COMPONENT_ALIGNMENT.md
@@ -0,0 +1,95 @@
# NLP Component Alignment

## Purpose

Holmes needs a disciplined NLP component map so it can support primitive analysis, task models, retrieval, evidence, semantic graph conversion, policy, and agentic investigation without becoming a loose model zoo.

This document records the lower-level NLP families Holmes must cover and how those families map across Holmes, nlplab, Sherlock Search, and prophet-platform.

## Component map

| Family | Holmes surface | Lab/runtime owner | Sherlock boundary |
| --- | --- | --- | --- |
| Basic primitives | `language.primitive.v1/Analyze` | `SociOS-Linux/nlplab` prototypes adapters; `prophet-platform` hosts stable services | Index primitive outputs only as pointer-backed evidence |
| Advanced primitives | dependency parsing, semantic role labeling, coreference, morphology extensions | `nlplab` evaluates parser, SRL, and coreference adapters | Search over parse, entity, and relation evidence without canonical-truth claims |
| Rule techniques | rule packs, gazetteers, dictionaries, regular expressions, table/header rules | `nlplab` keeps rule DSL experiments; Holmes promotes validated rule packs | Preserve rule version, policy decision, source, handling tags, and evidence refs |
| Classical ML | CRF, SVM/logistic/maxent, clustering, topic modeling, similarity baselines | `nlplab` benchmarks and calibrates classical models | Retrieve model outputs with corpus, model, and eval refs |
| Neural NLP | sequence/text models and embedding pipelines | `nlplab` handles PyTorch/ONNX experiments and benchmarks | Index spans, classes, and embedding metadata under evidence controls |
| Transformers | token classification, text classification, relation extraction, embeddings, reranking, translation, summarization, RAG | `nlplab` evaluates candidate models; Mycroft routes by cost, quality, privacy, and latency | Search and rerank evidence packets under policy ceilings |
| Task models | entities, numeric entities, PII, sentiment, target sentiment, categories, concepts, keywords, relations, emotion, tone | Holmes exposes stable contract families after eval and promotion | Sherlock indexes outputs with provenance and confidence |

## Architectural claim

A component NLP library can extract spans, tags, classes, relations, and task predictions. Holmes must do more:

- bind every output to corpus, model, policy, eval, and evidence references;
- route among rule, classical, neural, transformer, and foundation-language paths using explicit cost, latency, quality, and privacy constraints;
- preserve source provenance and rollback metadata;
- convert selected outputs into semantic graph candidates;
- support contradiction detection, claim extraction, and casefile assembly;
- keep retrieval and indexing separate from truth promotion;
- require promotion evidence before a pipeline becomes stable.

The target position is:

> Component NLP annotates. Holmes investigates, governs, retrieves, graphs, reasons, and promotes with evidence.
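
A minimal sketch of the binding and conversion bullets above, assuming hypothetical shapes for two records this document names later (`RelationMention`, `SemanticGraphCandidate`); every field name and the threshold are assumptions, not a published schema:

```python
from dataclasses import dataclass


@dataclass
class RelationMention:
    subject: str
    predicate: str
    obj: str
    confidence: float
    evidence_ref: str        # pointer back to the source span, never copied text


@dataclass
class SemanticGraphCandidate:
    """A proposed graph edge; candidate status, not promoted truth."""
    head: str
    relation: str
    tail: str
    provenance: dict         # corpus, model, policy, and eval references


def to_graph_candidate(m: RelationMention, refs: dict) -> SemanticGraphCandidate | None:
    # Keeping retrieval separate from truth promotion: weak mentions
    # never become graph candidates, let alone promoted edges.
    if m.confidence < 0.8:   # placeholder threshold, not a Holmes constant
        return None
    return SemanticGraphCandidate(
        head=m.subject,
        relation=m.predicate,
        tail=m.obj,
        provenance={**refs, "evidence": m.evidence_ref},
    )
```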

## Algorithm selection doctrine

Holmes should not default every task to transformers.

Use rules when the variation space is bounded, latency requirements are strict, labels are unavailable, or policy-sensitive patterns need deterministic inspection.

Use classical ML when training must be fast, features are strong, labels exist, and the workload is CPU-bound or latency-sensitive.

Use neural non-transformer models when higher quality is required but transformer runtime cost is unacceptable.

Use transformers and foundation-language services when multilinguality, semantic abstraction, long-context synthesis, or task quality justifies compute cost and governance overhead.

Use hybrid pipelines when deterministic guards, statistical extraction, retrieval grounding, and foundation-language synthesis must be composed under one evidence contract.
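
A minimal routing sketch of this doctrine, assuming a hypothetical constraint bundle (none of these field names or thresholds are defined by Holmes yet); the returned strings are the component-family names from this document:

```python
from dataclasses import dataclass


@dataclass
class TaskConstraints:
    """Illustrative constraint bundle; all field names are assumptions."""
    bounded_patterns: bool   # variation space is enumerable by rules
    has_labels: bool         # training labels exist
    max_latency_ms: float    # hard latency ceiling
    needs_semantics: bool    # multilinguality, abstraction, or long context
    policy_sensitive: bool   # output must be deterministically inspectable


def select_family(c: TaskConstraints) -> str:
    """Walk the doctrine above in order of increasing cost and governance overhead."""
    if c.policy_sensitive or (c.bounded_patterns and not c.has_labels):
        return "rule-techniques"
    if c.has_labels and c.max_latency_ms < 50:   # placeholder threshold
        return "classical-ml"
    if c.needs_semantics:
        return "transformers"   # or foundation-language services, routed by Mycroft
    return "neural-nlp"         # higher quality without transformer runtime cost
```

A production router would also consult cost ceilings and emit a routing receipt, and hybrid pipelines would compose several families under one evidence contract rather than returning a single winner.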

## Required executable proof

Holmes should not claim runtime superiority until `nlplab` produces benchmark receipts for:

1. primitive quality and speed;
2. entity, relation, and classification metrics;
3. PII and sensitive-context precision/recall;
4. retrieval impact through Sherlock evidence packets;
5. semantic graph conversion fidelity;
6. policy propagation and rollback coverage;
7. cost, latency, and memory profiles across CPU and GPU lanes.

## Required records

The next standards and runtime work should define or import these records:

- `LanguageAnalysisRecord`;
- `PrimitiveSpan`;
- `EntityMention`;
- `RelationMention`;
- `ClassificationDecision`;
- `TopicAssignment`;
- `SentimentDecision`;
- `KeywordCandidate`;
- `ClaimRecord`;
- `SemanticGraphCandidate`;
- `LanguagePipelineReceipt`;
- `HolmesEvidencePacket`.
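
As a sketch of the first two entries — field shapes are not standardized yet, so every field below is an assumption:

```python
from dataclasses import dataclass, field


@dataclass
class PrimitiveSpan:
    """One primitive annotation over the input text."""
    start: int                # character offset, inclusive
    end: int                  # character offset, exclusive
    label: str                # e.g. "token", "sentence", "lemma"
    text: str


@dataclass
class LanguageAnalysisRecord:
    """Top-level record binding primitive outputs to their governing references."""
    corpus_ref: str           # corpus the input came from
    pipeline_ref: str         # pipeline or model that produced the spans
    policy_ref: str           # guardrail policy in force at analysis time
    eval_ref: str             # evaluation record backing the pipeline
    spans: list[PrimitiveSpan] = field(default_factory=list)
```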

## Promotion rule

A Holmes NLP component graduates only when it has:

1. corpus reference;
2. pipeline or model reference;
3. algorithm family declaration;
4. task contract;
5. quality evaluation;
6. latency and footprint measurement;
7. guardrail policy result;
8. evidence receipt;
9. promotion record;
10. rollback reference.
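
A mechanical gate over this checklist might look like the sketch below; the keys mirror `requiredPromotionEvidence` in `examples/holmes-surface.json`, except `rollbackRef`, which is an assumed spelling for item 10:

```python
REQUIRED_PROMOTION_FIELDS = (
    "corpusRef",
    "pipelineOrModelRef",
    "algorithmFamily",
    "taskContract",
    "evalRecord",
    "latencyFootprintRecord",
    "guardrailPolicy",
    "evidenceReceipt",
    "promotionRecord",
    "rollbackRef",            # assumed key; the list above names only "rollback reference"
)


def missing_promotion_evidence(manifest: dict) -> list[str]:
    """Return absent or empty fields; an empty result means the component may graduate."""
    return [key for key in REQUIRED_PROMOTION_FIELDS if not manifest.get(key)]
```

A gate like this would sit in front of the promotion-record writer, so no component graduates with a silently absent receipt.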

This keeps local labs, governed platform services, SourceOS clients, and Sherlock retrieval connected without collapsing those layers into one monolith.
47 changes: 46 additions & 1 deletion examples/holmes-surface.json
@@ -7,7 +7,7 @@
  },
  "spec": {
    "product": "Holmes",
-    "tagline": "Watson-style systems answer. Holmes investigates.",
+    "tagline": "Component NLP annotates. Holmes investigates.",
    "components": [
      "sherlock-search",
      "221b",
@@ -17,6 +17,48 @@
"the-canon",
"deduction-engine"
],
"componentFamilies": [
"basic-primitives",
"advanced-primitives",
"rule-techniques",
"classical-ml",
"neural-nlp",
"transformers",
"foundation-language-services",
"retrieval-and-knowledge",
"guardrails-and-governance",
"agent-and-tool-orchestration"
],
"nlpTasks": [
"language-identification",
"sentence-segmentation",
"tokenization",
"lemmatization",
"part-of-speech-tagging",
"morphological-features",
"dependency-parsing",
"semantic-role-labeling",
"entity-extraction",
"numeric-entity-extraction",
"pii-extraction",
"coreference-resolution",
"relation-extraction",
"text-classification",
"zero-shot-classification",
"sentiment-classification",
"target-sentiment-extraction",
"keyword-extraction",
"category-classification",
"concept-linking",
"topic-modeling",
"topical-clustering",
"text-similarity",
"table-header-identification",
"claim-extraction",
"contradiction-detection",
"semantic-graph-conversion",
"evidence-governance"
],
"methodFamilies": [
"language.primitive.v1/Analyze",
"language.entity.v1/Extract",
@@ -33,7 +75,10 @@
"requiredPromotionEvidence": [
"corpusRef",
"pipelineOrModelRef",
"algorithmFamily",
"taskContract",
"evalRecord",
"latencyFootprintRecord",
"guardrailPolicy",
"evidenceReceipt",
"promotionRecord",
70 changes: 62 additions & 8 deletions tools/validate_holmes.py
@@ -16,10 +16,55 @@
"the-canon",
"deduction-engine",
}
REQUIRED_COMPONENT_FAMILIES = {
"basic-primitives",
"advanced-primitives",
"rule-techniques",
"classical-ml",
"neural-nlp",
"transformers",
"foundation-language-services",
"retrieval-and-knowledge",
"guardrails-and-governance",
"agent-and-tool-orchestration",
}
REQUIRED_NLP_TASKS = {
"language-identification",
"sentence-segmentation",
"tokenization",
"lemmatization",
"part-of-speech-tagging",
"morphological-features",
"dependency-parsing",
"semantic-role-labeling",
"entity-extraction",
"numeric-entity-extraction",
"pii-extraction",
"coreference-resolution",
"relation-extraction",
"text-classification",
"zero-shot-classification",
"sentiment-classification",
"target-sentiment-extraction",
"keyword-extraction",
"category-classification",
"concept-linking",
"topic-modeling",
"topical-clustering",
"text-similarity",
"table-header-identification",
"claim-extraction",
"contradiction-detection",
"semantic-graph-conversion",
"evidence-governance",
}
REQUIRED_EVIDENCE = {
"corpusRef",
"pipelineOrModelRef",
"algorithmFamily",
"taskContract",
"evalRecord",
"latencyFootprintRecord",
"guardrailPolicy",
"evidenceReceipt",
"promotionRecord",
@@ -32,6 +77,14 @@ def fail(message: str) -> int:
    return 1


def require_set(spec: dict, field: str, required: set[str]) -> int | None:
    observed = set(spec.get(field, []))
    missing = required - observed
    if missing:
        return fail(f"missing {field}: {sorted(missing)}")
    return None


def main() -> int:
    if not EXAMPLE.exists():
        return fail("missing examples/holmes-surface.json")
@@ -41,14 +94,15 @@ def main() -> int:
if data.get("kind") != "HolmesSurface":
return fail("wrong kind")
spec = data.get("spec", {})
components = set(spec.get("components", []))
missing_components = REQUIRED_COMPONENTS - components
if missing_components:
return fail(f"missing components: {sorted(missing_components)}")
evidence = set(spec.get("requiredPromotionEvidence", []))
missing_evidence = REQUIRED_EVIDENCE - evidence
if missing_evidence:
return fail(f"missing promotion evidence: {sorted(missing_evidence)}")
for field, required in [
("components", REQUIRED_COMPONENTS),
("componentFamilies", REQUIRED_COMPONENT_FAMILIES),
("nlpTasks", REQUIRED_NLP_TASKS),
("requiredPromotionEvidence", REQUIRED_EVIDENCE),
]:
result = require_set(spec, field, required)
if result is not None:
return result
integrations = spec.get("integrations", {})
for key in ["standards", "platform", "search", "lab", "sourceosCarry"]:
if key not in integrations: