feat(pattern, rig): dictionary & llm-driven identification by martsokha · Pull Request #26 · nvisycom/runtime

martsokha · 2026-02-24T14:03:58Z

No description provided.

…gories Ensure all dependency versions specify major.minor, add tracing-subscriber to workspace dependencies, sort members and internal crates alphabetically, and fix dependency category groupings. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Rename ServerConfig to Cli as top-level parser, extract ServerConfig into config/server.rs for network binding. Split server/ into listen.rs and shutdown.rs, add shutdown timeout with structured tracing, move init_tracing to Cli, and use anyhow::Result for error propagation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…crates

Reorganize nvisy-identify from modality-based layout (text/, image/) to detection-method-based layout (pattern/, ner/, llm/, vision/, audio/, fusion/) so the module structure mirrors identification strategies.

- Create nvisy-ocr crate: OcrBackend trait, config, parsing, PythonBridge
- Create nvisy-asr crate: TranscribeBackend trait, config, parsing, PythonBridge
- Add LlmBackend trait and parse_llm_entities to nvisy-rig
- Update nvisy-augment to import from nvisy-ocr/nvisy-asr
- Add LLM contextual detection layer (llm/detection.rs, llm/prompt.rs)
- Add OCR detection layer (vision/ocr.rs)
- Add audio transcript+NER composite layer (audio/transcript.rs)
- Add ensemble fusion with MaxConfidence/WeightedAverage/NoisyOr strategies
- Remove stale nvisy-object workspace references
- Sort workspace members, deps, Dockerfile crate lists, and changelog

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
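The three fusion strategies named above can be illustrated with a minimal sketch. The type and method names (`FusionStrategy`, `fuse`) and the choice to carry weights inside the `WeightedAverage` variant are assumptions made for illustration, not the crate's actual API; only the strategy names come from the commit message.

```rust
/// Hypothetical fusion strategies; only the variant names come from the commit.
#[derive(Debug, Clone, PartialEq)]
pub enum FusionStrategy {
    /// Keep the highest score reported by any detector.
    MaxConfidence,
    /// Average the scores, weighting each detector by the matching factor.
    WeightedAverage(Vec<f64>),
    /// Treat detectors as independent evidence: 1 - product of (1 - p_i).
    NoisyOr,
}

impl FusionStrategy {
    /// Fuses per-detector confidence scores, each expected in 0.0..=1.0.
    /// Returns `None` for empty input or unusable weights.
    pub fn fuse(&self, scores: &[f64]) -> Option<f64> {
        if scores.is_empty() {
            return None;
        }
        let fused = match self {
            FusionStrategy::MaxConfidence => {
                scores.iter().copied().fold(0.0_f64, f64::max)
            }
            FusionStrategy::WeightedAverage(weights) => {
                let total: f64 = weights.iter().sum();
                if weights.len() != scores.len() || total <= 0.0 {
                    return None;
                }
                scores.iter().zip(weights).map(|(s, w)| s * w).sum::<f64>() / total
            }
            FusionStrategy::NoisyOr => {
                1.0 - scores.iter().map(|s| 1.0 - s).product::<f64>()
            }
        };
        // Guard against floating-point drift outside the valid range.
        Some(fused.clamp(0.0, 1.0))
    }
}
```

With two detectors that each report 0.5, `NoisyOr` yields 0.75 while `MaxConfidence` stays at 0.5, which is why noisy-or suits cases where agreement between detectors should raise the score.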

…onfidence

- Narrow nvisy-pattern root exports to only externally-used types (PatternEngine, PatternEngineBuilder, PatternMatch, DetectionSource, ContextRule); move AllowList/DenyList/PatternEngineError/default_engine behind `pub mod engine` for opt-in access
- Add `column_confidence` to DictionaryPattern so CSV dictionary columns can have different confidence scores (e.g. full name vs short code)
- Track source column index in CsvDictionary via new Dictionary::columns()
- Apply column-specific confidence in PatternEngine::scan_dict
- Update currencies/cryptocurrencies/languages patterns with per-column confidence (full names 0.85, codes 0.55/0.45)
- Remove API Status link from root README

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Move confidence from a top-level JSON field into the match source objects so each source type owns its own scoring:

- RegexPattern gains a `confidence: f64` field (default 1.0)
- DictionaryPattern.confidence accepts a number (uniform) or array (per-column) via DictionaryConfidence enum
- Remove Pattern::confidence() from the trait; confidence is now read directly from the match source during engine compilation
- Remove top-level `confidence` from all 27 pattern JSON definitions
- Rename `column_confidence` to `confidence` in dictionary patterns

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
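The "number or array" shape described above might look roughly like the following. This is a sketch under assumptions: the variant names, the `for_column` helper, and the choice to return `None` for an out-of-range column are hypothetical, and the JSON deserialization the real enum would need is left out to keep the example dependency-free.

```rust
/// Hypothetical shape of a dictionary confidence setting:
/// either one score for every column or one score per CSV column.
#[derive(Debug, Clone, PartialEq)]
pub enum DictionaryConfidence {
    /// A single number in the pattern definition applies to all columns.
    Uniform(f64),
    /// An array in the pattern definition gives one score per column.
    PerColumn(Vec<f64>),
}

impl DictionaryConfidence {
    /// Returns the confidence for a match found in `column`,
    /// or `None` when a per-column list has no entry for it.
    pub fn for_column(&self, column: usize) -> Option<f64> {
        match self {
            DictionaryConfidence::Uniform(score) => Some(*score),
            DictionaryConfidence::PerColumn(scores) => scores.get(column).copied(),
        }
    }
}
```

For a currency dictionary with a full-name column and a code column, `PerColumn(vec![0.85, 0.55])` would score a match on the full name higher than a match on the short code, in line with the values quoted in the previous commit.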

Absorb small utility modules (error, retry, metrics, compact) into backend/ and rename structured/ to agent/, reducing module sprawl while keeping all public re-exports intact. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…r wrapper

Introduce layered agent architecture:

- BaseAgent<M> with builder handling rig-core's typestate for tools
- NerAgent<M> replacing StructuredAgent with NER-specific prompts
- OcrProvider/CvProvider traits in their respective agent modules
- ResponseParser as Cow<str> wrapper with extract_text constructor
- Stub modules for ocr, cv, and redactor agents

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…edactionMethod

Implement the three remaining stub agents in nvisy-rig:

- OcrAgent: VLM agent with OcrProvider-backed tool, extracts text from images and detects entities via OcrPromptBuilder
- CvAgent: VLM agent with CvProvider-backed tool, detects faces/plates/signatures via CvPromptBuilder
- RedactorAgent: pure LLM agent that recommends TextRedactionMethod for each detected entity via RedactorPromptBuilder

Ontology changes (nvisy-ontology):

- Rename spec/ to specification/
- Split mod.rs into input.rs (*RedactionInput enums + RedactorInput) and method.rs (TextRedactionMethod, ImageRedactionMethod, AudioRedactionMethod, RedactionMethod)

Rig structural changes (nvisy-rig):

- Rename agent dirs: ner→recognize, ocr→extract, cv→detect
- Flatten agent/mod.rs re-exports (no pub submodules)
- Add PromptBuilder structs for all agents (OcrPromptBuilder, CvPromptBuilder, RedactorPromptBuilder)
- Add base64 and thiserror dependencies
- Improve docs and tracing across all agents

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- BaseAgent.prompt_text() now uses agent.completion() instead of building raw requests from the model, so preamble/tools/config are preserved
- Remove model: Arc<M> from BaseAgent (agent owns it)
- Remove system: Option<&str> param from prompt methods (preamble is on the agent)
- Replace BaseAgentConfig field with context_window: Option<ContextWindow> since temperature/max_tokens are baked into the rig Agent at build time
- Split base.rs into base/{agent,builder,context}.rs
- Rename redactor/ → redact/ to match action-verb convention
- OcrProvider returns Vec<OcrTextRegion> with bbox support
- Add fn new() constructors to OcrRigTool and CvRigTool
- Add from_prompt error mapper for rig::PromptError
- Export OcrTextRegion from lib.rs and prelude

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Add BaseAgent.id (UUIDv7) for observability; expose id() on all specialized agents and include agent_id in tracing spans - Make RetryPolicy generic over any Req: Clone + Res instead of hardcoding DetectionRequest/DetectionResponse - Use : instead of — as doc separator - Use 0.0..=1.0 range notation in confidence docs Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…efactor Fix UTF-8 panics in split_to_fit/truncate_to_fit by snapping byte positions to char boundaries. Rewrite prompt_structured to use completion()+output_schema so usage is always recorded. Refactor RigBackend into generic ServiceBackend<S> wrapping any inner Tower service with usage tracking and tracing. Export BaseAgentConfig and ContextWindow for external consumers. Add Clone+PartialEq to all public output types. Restrict from_completion/from_prompt to pub(crate). Deduplicate ALL_TYPES_HINT. Remove dead parse_json_array. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
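The UTF-8 fix mentioned above relies on never slicing a `&str` in the middle of a multi-byte character. A minimal sketch of that idea follows; the helper name `snap_to_char_boundary` and the signature shown for `truncate_to_fit` are assumptions for illustration, since the commit does not show the real signatures.

```rust
/// Returns the largest byte index `<= index` that lies on a UTF-8
/// character boundary of `text` (hypothetical helper name).
pub fn snap_to_char_boundary(text: &str, index: usize) -> usize {
    if index >= text.len() {
        return text.len();
    }
    let mut i = index;
    // Index 0 is always a boundary, so this loop terminates.
    while !text.is_char_boundary(i) {
        i -= 1;
    }
    i
}

/// Truncates `text` to at most `max_bytes` bytes without splitting a
/// character. Slicing at an unsnapped index would panic instead.
pub fn truncate_to_fit(text: &str, max_bytes: usize) -> &str {
    &text[..snap_to_char_boundary(text, max_bytes)]
}
```

In `"héllo"` the `é` occupies bytes 1 and 2, so a budget of 2 bytes snaps back to byte 1 and keeps only `"h"`, whereas `&text[..2]` would panic.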

… remove RedactorAgent

- Add LLM-based compact() on ContextWindow and prompt_compact() on BaseAgent for summarizing text that exceeds the token budget
- Delete nvisy-ocr crate; move OcrBackend, OcrConfig, parse_ocr_entities, and PythonBridge impl into nvisy-rig/src/paddle module
- Update nvisy-identify and nvisy-augment to import from nvisy_rig::paddle
- Remove RedactorAgent, keeping NerAgent, OcrAgent, and CvAgent
- Clean up workspace Cargo.toml, Dockerfile, and all re-exports

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…paddle crate Move OCR backend code out of nvisy-rig/src/paddle/ into a new nvisy-paddle crate so nvisy-rig no longer depends on nvisy-python. Consumers (nvisy-identify, nvisy-augment) now import from nvisy_paddle. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

… src/error.rs Add a proper Error enum that implements From<CompletionError>, From<PromptError>, and Into<nvisy_core::Error>. Delete the old backend/error.rs helper functions and update all call sites. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
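The error-enum pattern described above can be sketched with the standard library alone. `CompletionError`, `PromptError`, and `CoreError` below are stand-ins for the rig and `nvisy_core` types, and the variant names, messages, and `run_prompt` helper are hypothetical; the real enum may well be derived with `thiserror`.

```rust
use std::fmt;

/// Stand-ins for rig's error types and for nvisy_core::Error (assumptions).
#[derive(Debug)]
pub struct CompletionError(pub String);
#[derive(Debug)]
pub struct PromptError(pub String);
#[derive(Debug, PartialEq)]
pub struct CoreError {
    pub message: String,
}

/// Crate-level error that wraps each upstream failure in its own variant.
#[derive(Debug)]
pub enum Error {
    Completion(CompletionError),
    Prompt(PromptError),
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::Completion(e) => write!(f, "completion failed: {}", e.0),
            Error::Prompt(e) => write!(f, "prompt failed: {}", e.0),
        }
    }
}

impl std::error::Error for Error {}

impl From<CompletionError> for Error {
    fn from(e: CompletionError) -> Self {
        Error::Completion(e)
    }
}

impl From<PromptError> for Error {
    fn from(e: PromptError) -> Self {
        Error::Prompt(e)
    }
}

/// Implementing `From<Error>` gives `Into<CoreError>` for free.
impl From<Error> for CoreError {
    fn from(e: Error) -> Self {
        CoreError { message: e.to_string() }
    }
}

/// Pretend upstream call that fails on demand.
fn fake_prompt(fail: bool) -> Result<String, PromptError> {
    if fail {
        Err(PromptError("timeout".to_string()))
    } else {
        Ok("ok".to_string())
    }
}

/// The `?` operator converts `PromptError` into `Error` via `From`,
/// so call sites need no hand-written mapping helpers.
pub fn run_prompt(fail: bool) -> Result<String, Error> {
    let text = fake_prompt(fail)?;
    Ok(text)
}
```

The benefit over free-standing helper functions is that conversions happen implicitly at every `?`, so a new call site cannot forget to map the error.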

…h plain connection params Replace all CompletionModel generics with a Provider enum holding connection parameters (api_key, base_url). Client construction is deferred to build time via ProviderClient. Agent and backend constructors now return Result to propagate client errors instead of panicking. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
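A rough sketch of the "plain connection parameters, client built later" design follows. Everything apart from the `Provider` name and the `api_key`/`base_url` fields is assumed for illustration: the variant names, the `Client` stand-in, `BuildError`, `build_client`, and the placeholder default endpoint are all hypothetical.

```rust
/// Placeholder endpoint used when no base URL is given (assumption).
const DEFAULT_ENDPOINT: &str = "https://api.example.com/v1";

/// Hypothetical provider variants holding only plain connection parameters.
#[derive(Debug, Clone, PartialEq)]
pub enum Provider {
    Hosted { api_key: String, base_url: Option<String> },
    Local { base_url: String },
}

/// Stand-in for a real HTTP client, created only at build time.
#[derive(Debug, PartialEq)]
pub struct Client {
    pub endpoint: String,
    pub api_key: Option<String>,
}

/// Errors surfaced to the caller instead of panicking.
#[derive(Debug, PartialEq)]
pub enum BuildError {
    MissingApiKey,
    InvalidBaseUrl(String),
}

/// Minimal URL sanity check for the sketch.
fn check_url(url: &str) -> Result<(), BuildError> {
    if url.starts_with("http://") || url.starts_with("https://") {
        Ok(())
    } else {
        Err(BuildError::InvalidBaseUrl(url.to_string()))
    }
}

impl Provider {
    /// Builds the client lazily and reports configuration problems
    /// through `Result`.
    pub fn build_client(&self) -> Result<Client, BuildError> {
        match self {
            Provider::Hosted { api_key, base_url } => {
                if api_key.trim().is_empty() {
                    return Err(BuildError::MissingApiKey);
                }
                let endpoint = base_url
                    .clone()
                    .unwrap_or_else(|| DEFAULT_ENDPOINT.to_string());
                check_url(&endpoint)?;
                Ok(Client { endpoint, api_key: Some(api_key.clone()) })
            }
            Provider::Local { base_url } => {
                check_url(base_url)?;
                Ok(Client { endpoint: base_url.clone(), api_key: None })
            }
        }
    }
}
```

Because the enum is plain data, it can be cloned, compared, and stored in configuration without touching the network, and a bad key or URL shows up as an `Err` when the agent is built.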

Replace Tower service layer with reqwest-middleware + reqwest-retry for transparent HTTP-level retries. Delete ServiceBackend, RigBackend, RetryPolicy, and dispatch_model! macro. Replace tower::Service bound in nvisy-identify with LlmBackend async trait. Rename agent submodules: detect→cv, extract→ocr, recognize→ner. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…aw* types, move compact to BaseAgent Extract max_retries from provider structs into standalone RetryConfig. Replace HttpClient type alias with ClientWithMiddleware directly. Rename entity types: RawEntity→NerEntity, RawCvEntity→CvEntity, RawOcrEntity→OcrEntity. Move compact logic from ContextWindow to BaseAgent::prompt_compact where it belongs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

… RetryConfig into BaseAgentConfig Fold client construction directly into Agents::build(), eliminating the ProviderClient intermediary. Move model_name from a separate parameter into Provider variants so each provider carries its full identity. Merge max_retries into BaseAgentConfig, removing the standalone RetryConfig struct. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ivial tests Move agent/base/* files (BaseAgent, BaseAgentBuilder, BaseAgentConfig, ContextWindow, Provider) into backend/ so the agent infrastructure lives alongside usage tracking and detection types. Make the agent module private (was pub(crate)) and re-export public types through backend/. Improve module and type documentation across the crate. Remove 9 trivial tests that only verified arithmetic or getters (23 → 14 tests). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

… delete EntityParser/vision/ontology

- Add reqwest-tracing middleware and 120s timeout to HTTP client
- Move base agent from backend/agent/ to agent/base/ module
- Delete EntityParser from nvisy-rig, inline logic in nvisy-identify
- Delete vision/ and ontology/ modules from nvisy-identify
- Make all internal modules private, re-export from parent mod.rs
- Remove nvisy-paddle dependency from nvisy-identify

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ema for tool args

- Fix nvisy_ontology::spec → nvisy_ontology::specification in engine test
- Replace hand-written json!() tool schemas with schemars::schema_for!()
- Add Debug, Clone, JsonSchema derives to CvToolArgs and OcrToolArgs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…y-server

- Add missing features = [] to reqwest-middleware, reqwest-retry, reqwest-tracing in workspace Cargo.toml
- Remove pub use re-exports (routes, ServiceState) from nvisy-server
- Update nvisy-cli to use full module paths

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ed offset resolution, and KnownNerEntity accumulation Move preamble into BaseAgentConfig so specialized agents set it via config. Redesign NerEntity with entity_id for coreference, optional category/entity_type/confidence, context snippet for deterministic offset resolution, and LLM-produced description. Add KnownNerEntity for lightweight cross-chunk context, NerContext with merge/set_text for accumulating surface forms and descriptions across calls, and ResolvedOffsets with type-safe resolve_offsets tied to the source NerContext. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
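One way to read "context snippet for deterministic offset resolution" is that the model returns a short surrounding snippet along with the entity text, and offsets are then computed by string search rather than trusted from the model. The sketch below illustrates that reading only; the free function `resolve_offsets`, its parameters, and the plain `ResolvedOffsets` struct are simplifications and do not reflect the type-safe link to `NerContext` that the commit describes.

```rust
/// Byte range of an entity inside the source text (simplified stand-in).
#[derive(Debug, Clone, PartialEq)]
pub struct ResolvedOffsets {
    pub start: usize,
    pub end: usize,
}

/// Locates `surface` in `text` by first finding the `context` snippet and
/// then the surface form inside that snippet. This disambiguates repeated
/// mentions that a plain search for `surface` would conflate.
/// Returns `None` when either lookup fails.
pub fn resolve_offsets(text: &str, context: &str, surface: &str) -> Option<ResolvedOffsets> {
    let context_start = text.find(context)?;
    let relative_start = context.find(surface)?;
    let start = context_start + relative_start;
    Some(ResolvedOffsets {
        start,
        end: start + surface.len(),
    })
}
```

For the text `"Call Ann. Later, Ann Lee signed."` with the snippet `"Later, Ann Lee signed"`, the surface form `"Ann"` resolves to bytes 17..20, whereas a plain `text.find("Ann")` would point at the first mention at byte 5.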

… adapters Delete the old detection modules that duplicated logic now provided by nvisy-rig and nvisy-pattern. Replace them with thin adapter structs in a new method/ module: NerMethod (wraps NerAgent), CvMethod (wraps CvAgent), and PatternDetection (migrated as-is). Remove nvisy-python and bytes deps that were only needed by the deleted code. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

martsokha and others added 2 commits February 24, 2026 14:13

martsokha self-assigned this Feb 24, 2026

martsokha added the docs (improvements, updates or additions to docs) and feat (request for or implementation of a new feature) labels Feb 24, 2026

martsokha and others added 21 commits February 24, 2026 20:50

refactor(rig): consolidate 7 top-level modules into 3

14f6233


martsokha changed the title ~~feat(identify): complete identification pipeline~~ feat(pattern, rig): complete pattern & llm-driven identification Feb 26, 2026

martsokha changed the title ~~feat(pattern, rig): complete pattern & llm-driven identification~~ feat(pattern, rig): dictionary & llm-driven identification Feb 26, 2026

martsokha merged commit e9cb484 into main Feb 26, 2026
5 checks passed

martsokha deleted the feature/identify branch February 26, 2026 21:23

martsokha merged 24 commits into main from feature/identify
