feat: Multi-source watcher framework for multi-tool AI log monitoring by huang-yi-dae · Pull Request #2674 · volcengine/OpenViking

huang-yi-dae · 2026-06-17T01:40:37Z

Summary

Multi-source AI tool log watcher framework: Background daemon monitors Claude Code, Aider, Cursor, Continue.dev, and generic JSONL logs via watchdog filesystem observers
SQLite DB watcher support: BasePollingWatcher extends the framework to poll-based sources (databases, APIs) using Thread + Event.wait() instead of watchdog Observer
CursorDBWatcher: Monitors Cursor IDE's dual-SQLite storage (global cursorDiskKV → bubbleId:* conversation entries)
Incremental ETL pipeline: watch → filter → reconstruct → LLM extract → deduplicate → route → viking:// storage
Watcher abstraction: BaseFileWatcher ABC + BasePollingWatcher ABC + registry-based factory pattern (@register_watcher decorator) for easy extension to new tools
Server integration: GET /api/v1/daemon/status endpoint, --with-daemon CLI flag, FastAPI lifespan-managed lifecycle
Web UI: Daemon status card on home dashboard with per-watcher metrics and auto-refresh
Configuration: DaemonConfig with WatcherConfig array, backward-compatible single watch_dir fallback, OV_DAEMON_* environment variables
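
As a rough illustration of the registry-based factory mentioned above, the following is a minimal sketch and not the PR's actual code. `register_watcher`, `BaseFileWatcher`, and `list_available_watchers` are names taken from this page; `_WATCHERS`, `create_watcher`, the constructor signature, and the `GenericJSONLWatcher` body are assumptions made for the example.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Optional, Type

# Hypothetical module-level registry mapping tool names to watcher classes.
_WATCHERS: Dict[str, Type["BaseFileWatcher"]] = {}


class BaseFileWatcher(ABC):
    """Stand-in for the PR's ABC; only what the sketch needs."""

    tool_name: str = ""

    def __init__(self, watch_dir: str) -> None:
        self.watch_dir = watch_dir

    @abstractmethod
    def normalize_event(self, raw: dict) -> Optional[dict]:
        """Map one tool-specific record to the shared event shape."""


def register_watcher(tool_name: str) -> Callable[[Type[BaseFileWatcher]], Type[BaseFileWatcher]]:
    """Class decorator that records the watcher under its tool name."""

    def decorator(cls: Type[BaseFileWatcher]) -> Type[BaseFileWatcher]:
        cls.tool_name = tool_name
        _WATCHERS[tool_name] = cls
        return cls

    return decorator


def create_watcher(tool_name: str, watch_dir: str) -> BaseFileWatcher:
    """Factory: look the class up by name and instantiate it."""
    if tool_name not in _WATCHERS:
        raise ValueError(f"unknown watcher: {tool_name!r}")
    return _WATCHERS[tool_name](watch_dir)


def list_available_watchers() -> List[str]:
    return sorted(_WATCHERS)


@register_watcher("generic_jsonl")
class GenericJSONLWatcher(BaseFileWatcher):
    def normalize_event(self, raw: dict) -> Optional[dict]:
        # Assumed flat schema for the generic case only.
        if "role" in raw and "content" in raw:
            return {"role": raw["role"], "content": raw["content"], "tool": self.tool_name}
        return None
```

With this shape, supporting a new tool means adding one decorated subclass; the factory and the status endpoint need no changes.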

Architecture

AI Tool Logs (JSONL/Markdown/JSON/SQLite)
    → File Watcher (watchdog Observer) OR DB Watcher (Thread polling)
    → Incremental read (SQLite cursor: byte offset for files, rowid for DBs)
    → Parse + Normalize events
    → Batch buffer (50 lines / 300s trigger)
    → BatchETLPipeline:
        1. LowValueFilter (regex noise removal)
        2. ConversationReconstructor (pair user/assistant)
        3. KnowledgeExtractor (VLM prompt, confidence ≥ 0.6)
        4. KnowledgeDeduplicator (MD5 hash)
    → KnowledgeRouter (viking:// URI by category)
    → VikingStorageAdapter (temp Markdown → ResourceService)
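
To make steps 3 and 4 of the pipeline concrete, here is a minimal sketch of a confidence filter followed by MD5-based deduplication. The 0.6 threshold and the use of MD5 come from the description above; the item shape (`content`, `confidence`), the whitespace and case normalization, and the function names are assumptions for illustration.

```python
import hashlib
from typing import Dict, Iterable, List, Set

MIN_CONFIDENCE = 0.6  # threshold quoted in the pipeline description


def content_key(text: str) -> str:
    """MD5 of whitespace-collapsed, lowercased text (normalization is an assumption)."""
    normalized = " ".join(text.split()).lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def filter_and_deduplicate(items: Iterable[Dict], seen: Set[str]) -> List[Dict]:
    """Drop low-confidence items, then drop items whose hash was already seen."""
    kept: List[Dict] = []
    for item in items:
        if item.get("confidence", 0.0) < MIN_CONFIDENCE:
            continue
        key = content_key(item["content"])
        if key in seen:
            continue
        seen.add(key)  # `seen` persists across batches so repeats stay filtered
        kept.append(item)
    return kept
```

An exact-hash approach like this only catches near-verbatim repeats; paraphrased duplicates would pass through.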

Watchers

Tool	Source	Base Class	Default Path
Claude Code	JSONL files	BaseFileWatcher	`~/.claude/projects/`
Aider	Markdown files	BaseFileWatcher	Configurable
Cursor (logs)	JSON files	BaseFileWatcher	Configurable
Cursor (DB)	SQLite `state.vscdb`	BasePollingWatcher	`%APPDATA%\Cursor\User`
Continue.dev	JSON files	BaseFileWatcher	`~/.continue/`
Generic JSONL	JSONL files	BaseFileWatcher	Configurable

BasePollingWatcher design

Uses daemon Thread + Event.wait(interval) for periodic polling (vs watchdog Observer for file watchers)
Cursor key = watch_dir (naturally distinct from file watchers' file_path keys in shared CursorManager)
Cursor advances on all raw events (including filtered) to prevent infinite re-query of filtered-out rows
BatchBuffer() constructed with no args; trigger values held as watcher instance attributes (matching BaseFileWatcher pattern)
Thread-safe: DaemonService._enqueue_batch already uses call_soon_threadsafe
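
The polling design above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's `BasePollingWatcher`: the class name, the `fetch`/`on_events` callables, and the `(rowid, text)` event shape are hypothetical. It shows the two points the list makes: a daemon thread paced by `Event.wait(interval)`, and a cursor that advances past every raw row, including rows the filter drops.

```python
import threading
from typing import Callable, List, Tuple

RawEvent = Tuple[int, str]  # (rowid, text); hypothetical shape for the sketch


class PollingWatcherSketch:
    def __init__(
        self,
        fetch: Callable[[int], List[RawEvent]],   # returns rows with rowid > cursor
        on_events: Callable[[List[str]], None],   # receives kept texts
        interval: float = 0.05,
    ) -> None:
        self._fetch = fetch
        self._on_events = on_events
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self.cursor = 0

    def poll_once(self) -> None:
        raw = self._fetch(self.cursor)
        if not raw:
            return
        kept = [text for _, text in raw if text.strip()]
        if kept:
            self._on_events(kept)
        # Advance past ALL raw rows, kept or filtered, so filtered rows
        # are not fetched again on every subsequent poll.
        self.cursor = max(rowid for rowid, _ in raw)

    def _run(self) -> None:
        while not self._stop.is_set():
            self.poll_once()
            # Event.wait doubles as an interruptible sleep: stop() wakes it at once.
            self._stop.wait(self._interval)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join(timeout=2)
```

Using `Event.wait` rather than `time.sleep` keeps shutdown latency independent of the polling interval.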

Test Plan

18 test files covering all watchers, ETL pipeline, config, routing, and integration
28 new tests: 12 BasePollingWatcher + 16 CursorDBWatcher
3 cursor_db integration tests (factory creation, reconstructor compat, filter compat)
E2E verified against real state.vscdb (20 bubbleId entries → 5 kept, cursor persistence, idempotent re-poll)
All 53 daemon tests passing, 0 regressions
Manual E2E: OV_DAEMON_ENABLED=true openviking serve --with-daemon → produce conversations → verify viking:// ingestion

github-actions · 2026-06-17T01:42:37Z

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🏅 Score: 85
🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Add watcher framework base classes and registry
Relevant files:
openviking/daemon/watchers/init.py
openviking/daemon/watchers/base_file_watcher.py
openviking/daemon/watchers/registry.py
tests/daemon/test_registry.py
tests/daemon/test_base_file_watcher.py

Sub-PR theme: Add individual tool watchers
Relevant files:
openviking/daemon/watchers/claude_code_watcher.py
openviking/daemon/watchers/generic_jsonl_watcher.py
openviking/daemon/watchers/aider_watcher.py
openviking/daemon/watchers/cursor_watcher.py
openviking/daemon/watchers/continue_dev_watcher.py
tests/daemon/test_claude_code_watcher.py
tests/daemon/test_generic_jsonl_watcher.py
tests/daemon/test_aider_watcher.py
tests/daemon/test_cursor_watcher.py
tests/daemon/test_continue_dev_watcher.py

Sub-PR theme: Add daemon service, config, and API integration
Relevant files:
openviking/daemon/service.py
openviking/server/app.py
openviking/server/config.py
openviking/server/routers/daemon.py
tests/daemon/test_service_multi.py
tests/daemon/test_config.py

⚡ Recommended focus areas for review

Bug: LLM JSON output uses raw json.loads instead of json-repair
The KnowledgeExtractor uses raw json.loads to parse LLM responses, which is fragile for malformed JSON (common with LLMs). Should use json-repair as per R4 to handle partial/invalid JSON gracefully.
return json.loads(text)

Missing License Header
New .py files in openviking/ must include the AGPL-3.0 license header at the top (R6). This file (and other new daemon files) are missing it.
"""

Missing License Header
New .py files in openviking/ must include the AGPL-3.0 license header at the top (R6).
"""

github-actions · 2026-06-17T01:44:25Z

PR Code Suggestions ✨

Explore these optional code suggestions:

Category: General

Add timeout and retries to LLM call

Add timeout and retry logic to the LLM API call. Use a timeout (e.g., 30 seconds)
and retries with exponential backoff for transient errors. This improves API
resilience.

openviking/daemon/knowledge_extractor.py [101-110]

+from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type
+# ... in KnowledgeExtractor class ...
+@retry(
+    stop=stop_after_attempt(3),
+    wait=wait_exponential(multiplier=1, min=2, max=10),
+    retry=retry_if_exception_type(Exception),
+)
 async def _call_llm(self, prompt: str) -> Optional[Dict]:
     try:
         llm = self._get_llm_service()
         response = await llm.complete(
             prompt=prompt,
             temperature=0.3,
             max_tokens=500,
+            timeout=30,  # Add timeout
         )

Suggestion importance [1-10]: 5

Why: Useful improvement to add resilience for transient LLM API errors, enhancing daemon reliability without breaking existing functionality.

Impact: Low

Category: Possible issue

Use Pydantic models for API response validation instead of raw dicts

Use Pydantic models for API response validation instead of raw dicts. This improves
type safety, documentation, and request/response validation.

1. Create Pydantic models for WatcherStatus and DaemonStatusResponse.
2. Update the endpoint return type annotation to use the model.
3. Construct and return the model instances instead of dicts.

openviking/server/routers/daemon.py [24-81]

+from pydantic import BaseModel, Field
+from typing import List, Optional, Dict, Any
+
+class WatcherStatus(BaseModel):
+    tool_name: str
+    watch_dir: Optional[str] = None
+    file_pattern: Optional[str] = None
+    enabled: bool
+    running: bool
+    cursor_count: int
+    batch_trigger_lines: Optional[int] = None
+    batch_trigger_seconds: Optional[int] = None
+
+class DaemonStatusResponse(BaseModel):
+    enabled: bool
+    running: bool
+    watchers: List[WatcherStatus] = Field(default_factory=list)
+    available_tools: List[str] = Field(default_factory=list)
+    db_path: Optional[str] = None
+
 @router.get("/status")
-async def get_daemon_status() -> Dict[str, Any]:
+async def get_daemon_status() -> DaemonStatusResponse:
     """
     Get multi-watcher daemon status.
-
-    Returns:
-        {
-            "enabled": bool,
-            "running": bool,
-            "watchers": [...],
-            "available_tools": [...],
-            "db_path": str | null
-        }
     """
     from openviking.daemon.watchers.registry import list_available_watchers
 
     if _daemon_service is None:
-        # Daemon not running — return config-based fallback
         from openviking.server.config import DaemonConfig
 
         config = DaemonConfig.from_env()
-        return {
-            "enabled": config.enabled,
-            "running": False,
-            "watchers": [],
-            "available_tools": list_available_watchers(),
-            "db_path": config.db_path,
-        }
+        return DaemonStatusResponse(
+            enabled=config.enabled,
+            running=False,
+            watchers=[],
+            available_tools=list_available_watchers(),
+            db_path=config.db_path,
+        )
 
     svc = _daemon_service
-    watcher_statuses: List[Dict[str, Any]] = []
+    watcher_statuses: List[WatcherStatus] = []
     for i, watcher in enumerate(svc.watchers):
         wc = svc._watcher_configs[i] if i < len(svc._watcher_configs) else None
         cursor_count = 0
         try:
             if svc.cursor_manager:
                 cursor_count = len(svc.cursor_manager.get_all_cursors())
         except Exception:
             pass
 
-        watcher_statuses.append({
-            "tool_name": watcher.tool_name,
-            "watch_dir": wc.watch_dir if wc else None,
-            "file_pattern": wc.file_pattern if wc else None,
-            "enabled": True,
-            "running": True,
-            "cursor_count": cursor_count,
-            "batch_trigger_lines": wc.batch_trigger_lines if wc else None,
-            "batch_trigger_seconds": wc.batch_trigger_seconds if wc else None,
-        })
+        watcher_statuses.append(WatcherStatus(
+            tool_name=watcher.tool_name,
+            watch_dir=wc.watch_dir if wc else None,
+            file_pattern=wc.file_pattern if wc else None,
+            enabled=True,
+            running=True,
+            cursor_count=cursor_count,
+            batch_trigger_lines=wc.batch_trigger_lines if wc else None,
+            batch_trigger_seconds=wc.batch_trigger_seconds if wc else None,
+        ))
 
-    return {
-        "enabled": True,
-        "running": svc.is_running,
-        "watchers": watcher_statuses,
-        "available_tools": list_available_watchers(),
-        "db_path": svc.db_path,
-    }
+    return DaemonStatusResponse(
+        enabled=True,
+        running=svc.is_running,
+        watchers=watcher_statuses,
+        available_tools=list_available_watchers(),
+        db_path=svc.db_path,
+    )

Suggestion importance [1-10]: 5

Why: This is a valid code quality improvement. Using Pydantic models enhances type safety, API documentation (via OpenAPI), and response validation. However, it's not a critical bug fix or security change, so it receives a moderate score.

Impact: Low

MaojiaSheng · 2026-06-17T12:59:34Z

@@ -0,0 +1,61 @@
+# OpenViking Active Daemon


This file can be placed in docs/zh/agent-integrations

MaojiaSheng · 2026-06-17T13:00:07Z

Thanks for the contribution! We'll review this soon.

t0saki · 2026-06-17T14:08:44Z

First off—thank you for this PR. The overall direction is genuinely solid.

Using a single daemon to watch multiple coding tools, normalize their outputs into a unified event schema, and feed them into a shared ETL → viking:// pipeline is exactly the right abstraction. The BaseFileWatcher + registry split is clean, and the boundary where "each watcher normalizes while the pipeline remains tool-agnostic" is highly valuable and worth preserving. I'd love to see this land.

Before merging, however, I checked out the branch and tested it against the real on-disk data of the tools I currently have installed (Claude Code, Cursor, Codex, OpenCode), and traced the call paths against the OpenViking codebase. I'm sharing my findings because none of these issues are caught by the green test suite. The fixtures seem hand-authored to match the parsers, meaning the suite confirms internal consistency but not compatibility with real tools. Everything outlined below is reproducible.

Blockers (Verified against the OpenViking codebase, tool-independent)

1. The LLM call path doesn't match any real interface — extraction silently no-ops.
knowledge_extractor.py calls get_service().llm, then await llm.complete(prompt=, temperature=, max_tokens=), and reads response.text.
However, OpenVikingService has no .llm attribute, there is no complete(...) method anywhere in openviking/models/ (the real interface is VLMBase.get_completion_async(prompt=...)), and VLMResponse exposes .content, not .text.
At runtime, this raises an AttributeError that gets swallowed by the except Exception block in extract(). As a result, every turn returns None, and zero knowledge is actually extracted.
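
The failure mode described here can be reproduced in isolation. The sketch below does not use the real OpenViking classes: `FakeService`, `extract_swallowing`, and `extract_narrow` are hypothetical stand-ins. It only illustrates why a broad `except Exception` turns an interface mismatch into a silent `None`, while a narrower handler lets the same bug surface immediately.

```python
import asyncio
import logging
from typing import Optional

logger = logging.getLogger("daemon.sketch")


class FakeService:
    """Stand-in with no `.llm` attribute, mirroring the mismatch described above."""


async def extract_swallowing(service: object, prompt: str) -> Optional[str]:
    """Broad handler: the AttributeError is caught, so every call returns None."""
    try:
        response = await service.llm.complete(prompt=prompt)
        return response.text
    except Exception:
        return None


async def extract_narrow(service: object, prompt: str) -> Optional[str]:
    """Only transient failures are treated as recoverable; programming errors propagate."""
    try:
        response = await service.llm.complete(prompt=prompt)
    except (asyncio.TimeoutError, ConnectionError):
        logger.warning("transient LLM failure", exc_info=True)
        return None
    return response.text
```

Which exception types count as transient depends on the real client; the tuple above is a placeholder.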

2. RequestContext is instantiated with non-existent fields — writes always fail.
service.py initializes RequestContext(user_id="daemon", session_id="daemon-session"), but RequestContext (in server/identity.py) strictly requires user: UserIdentifier and role: Role.
This raises a TypeError, which is also caught by the surrounding except block. Consequently, even if extraction succeeded, nothing would ever be written to storage.

Watcher Format Mismatches (Verified empirically on a real machine)

The normalization layer assumes a top-level, flat JSONL structure: {"role": ..., "content": ..., "type": "message"}. Real-world tools do not output this shape.

Claude Code (the core watcher) extracts 0 events from real logs.
Real Claude Code JSONL logs nest role and content inside a message object, with the top-level type set to "user" or "assistant" (it is never "message"). Because the watcher looks for top-level role/content and strictly requires type == "message", it skips every single line.
I ran the exact normalize_event logic against my own ~/.claude/projects: Out of 3,942 user/assistant messages across 25 sessions, exactly 0 had a top-level role (all 3,942 nested it under message). The currently running session yielded 0/155. This is a deterministic format bug, not an issue between completed vs. running sessions: every session uses the same nested schema.
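
A minimal sketch of normalization against the nested shape described above, assuming only what this comment states: the top-level `type` is "user" or "assistant", and `role` and `content` live inside `message`. The handling of list-valued `content` (joining blocks whose `type` is "text" and skipping others such as tool calls) is an assumption, as is the function name.

```python
import json
from typing import Dict, Optional


def normalize_claude_code_line(line: str) -> Optional[Dict[str, str]]:
    """Return {"role", "content"} for a user/assistant record, else None."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict) or record.get("type") not in ("user", "assistant"):
        return None

    message = record.get("message")
    if not isinstance(message, dict):
        return None

    role = message.get("role")
    content = message.get("content")
    if isinstance(content, list):
        # Assumed block shape: {"type": "text", "text": "..."}; other block
        # types (for example tool calls) are skipped in this sketch.
        content = "\n".join(
            block.get("text", "")
            for block in content
            if isinstance(block, dict) and block.get("type") == "text"
        )
    if not isinstance(role, str) or not isinstance(content, str) or not content.strip():
        return None
    return {"role": role, "content": content}
```

A flat record with a top-level `role` and `"type": "message"` returns `None` here, which is the inverse of the behavior reported for the current watcher.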

Cursor stores chats in SQLite, not in .log files.
In a real installation, the *.log files are just standard VS Code text logs (e.g., 2026-... [info] ...). The actual conversations reside in User/globalStorage/state.vscdb under composer.* keys. A watcher that relies on *.log + per-line json.loads matches nothing, and a file-append watcher is fundamentally incapable of reading SQLite databases.

Codex and OpenCode are not covered and don't fit the file-append model anyway.
Codex rollouts (~/.codex/sessions/.../rollout-*.jsonl) nest role and content inside a payload object (with a top-level type of session_meta, response_item, or event_msg). Recent versions of OpenCode store everything in an SQLite database (opencode.db, in tables message, part, and session). Neither tool has a working watcher here.

Aider and Continue.dev: Unverified locally.
I don't have these installed locally, so I can't definitively say they are broken. However, aider_watcher relies on literal #### user: / #### assistant: separator lines, and continue_dev_watcher uses per-line json.loads. Both approaches seem misaligned with those tools' documented formats. This is definitely worth double-checking against a real capture.

Other Real Issues

Time-based flush never fires. BatchBuffer.created_at defaults to 0.0 and is never explicitly assigned (and clear() also resets it to 0.0). Thus, age is always 0, rendering batch_trigger_seconds useless. Only the 50-line volume trigger works, meaning low-volume sessions will never flush.
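
One way to make the time trigger work is to stamp the buffer when its first line arrives. The sketch below is not the PR's `BatchBuffer`; the class name, the injectable `clock`, and the `should_flush` signature are assumptions. The 50-line and 300-second defaults are the values quoted in the PR description.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class BatchBufferSketch:
    lines: List[str] = field(default_factory=list)
    created_at: float = 0.0
    clock: Callable[[], float] = time.monotonic  # injectable for tests

    def add_line(self, line: str) -> None:
        if not self.lines:
            # Stamp the batch when its first line arrives.
            self.created_at = self.clock()
        self.lines.append(line)

    def should_flush(self, trigger_lines: int = 50, trigger_seconds: float = 300.0) -> bool:
        if not self.lines:
            return False  # an empty buffer never flushes, whatever its age
        age = self.clock() - self.created_at
        return len(self.lines) >= trigger_lines or age >= trigger_seconds

    def clear(self) -> None:
        self.lines.clear()
        self.created_at = 0.0  # safe: re-stamped by the next add_line
```

Note that a time trigger still needs something to evaluate it periodically; a buffer that is only checked when a new line arrives will not flush a session that has gone quiet.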

Cross-thread queue operations. _enqueue_batch runs on the watchdog thread but calls asyncio.Queue.put_nowait directly. This should be routed through loop.call_soon_threadsafe. The 5-second polling in _etl_loop mostly masks this issue, but touching the asyncio loop from a foreign thread remains undefined behavior.
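
For reference, a small self-contained sketch of the thread-safe hand-off using only the standard library. The function names are hypothetical; the only API relied on is `loop.call_soon_threadsafe`, which schedules `queue.put_nowait` to run on the event loop's own thread.

```python
import asyncio
import threading
from typing import List


async def demo_threadsafe_enqueue() -> List[str]:
    loop = asyncio.get_running_loop()
    queue: "asyncio.Queue[str]" = asyncio.Queue()

    def enqueue_from_thread(batch: str) -> None:
        # Runs on the worker thread: never touch the queue directly here.
        # Instead, ask the loop to perform the put on its own thread.
        loop.call_soon_threadsafe(queue.put_nowait, batch)

    def worker() -> None:
        for i in range(3):
            enqueue_from_thread(f"batch-{i}")

    thread = threading.Thread(target=worker)
    thread.start()

    # The consumer wakes as soon as each item is scheduled; no polling needed.
    received = [await asyncio.wait_for(queue.get(), timeout=2) for _ in range(3)]
    thread.join()
    return received
```

Because the put happens inside the loop, a consumer blocked on `queue.get()` is woken promptly, which removes the reliance on a periodic poll to notice new batches.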

Incremental reader flaws. Using text-mode f.seek(byte_offset) and then advancing the cursor via len(content.encode("utf-8")) will cause cursor drift on CRLF or non-ASCII characters. Furthermore, file_size <= cursor returns indefinitely upon file truncation/rotation without resetting the cursor.
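
A minimal sketch of a reader that avoids both problems, assuming a byte-offset cursor as in the PR. The function name and return shape are hypothetical. It opens the file in binary mode so offsets are exact regardless of CRLF or multi-byte characters, resets the cursor when the file shrinks, and only consumes complete lines.

```python
import os
from typing import List, Tuple


def read_new_lines(path: str, cursor: int) -> Tuple[List[str], int]:
    """Return (complete new lines, new byte cursor) for an append-only log."""
    size = os.path.getsize(path)
    if size < cursor:
        # File was truncated or replaced by a shorter one: start over.
        cursor = 0
    if size == cursor:
        return [], cursor

    with open(path, "rb") as f:  # binary mode: seek/tell are true byte offsets
        f.seek(cursor)
        data = f.read()

    last_newline = data.rfind(b"\n")
    if last_newline == -1:
        return [], cursor  # only a partial line so far; wait for the rest

    complete = data[: last_newline + 1]
    lines = [
        raw.decode("utf-8", errors="replace").rstrip("\r")
        for raw in complete.split(b"\n")[:-1]  # last element is empty
    ]
    # Advance by the number of bytes actually consumed, not by re-encoded text.
    return lines, cursor + len(complete)
```

A size check alone cannot detect rotation to a new file that is already larger than the old cursor; tracking the inode or a hash of the file's first bytes would be needed for that case.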

Code conventions. The newly added openviking/daemon/**.py files are missing the AGPL/SPDX headers present in every other module. Also, knowledge_extractor uses raw json.loads on LLM outputs, whereas the repo standardizes on json_repair.

The Primary Ask Before Merging

Please verify each introduced feature end-to-end against a real run of its respective tool, rather than relying solely on synthetic fixtures, and confirm it successfully writes actual output to viking://. Concretely: start the daemon, use the target tool organically (run a real Claude Code / Cursor / Aider / Continue.dev / Codex / OpenCode session), and demonstrate that at least one knowledge item lands in storage for each watcher you claim to support.

Currently, the test suite is green, yet the end-to-end pipeline yields zero output on a real machine. Instituting a "real-run check per feature" as a merging gate would have caught all the issues listed above.

Suggested Path Forward

These fixes are mostly minor, and the underlying architecture is robust enough to handle them easily:

Fix the two blockers (LLM interface + RequestContext). This is a quick fix and will immediately unblock the entire pipeline.
Update normalization schemas. Make the normalization layer parse each tool's real schema (e.g., extracting role/content from message for Claude Code). Replace the synthetic fixtures with real, captured log samples so the tests actually enforce real-world formats.
Scope the PR. Consider scoping this initial PR to just claude_code + generic_jsonl done flawlessly. Cursor and OpenCode can be added later in a separate PR as a distinct SQLite-polling watcher mode (which the design doc already anticipates), rather than trying to shoehorn them into the file-append model.

One quick housekeeping note: This branch currently includes the entire v1 daemon (meaning it overlaps with #2629) alongside some unrelated session/memory timestamp changes. Splitting these out would significantly streamline the review process.

I am more than happy to help verify any of these changes or pair-program on the format fixes. The core idea is fantastic, and I really want to see it merged once the pipeline produces reliable end-to-end output on real machines.


- Background daemon monitors AI coding tool logs (Claude Code, Aider, Cursor, Continue.dev, generic JSONL) via watchdog filesystem observers
- Incremental ETL pipeline: watch → filter → reconstruct → LLM extract → deduplicate → route → viking:// storage
- Watcher abstraction with BaseFileWatcher ABC and registry-based factory
- SQLite cursor persistence for incremental file reads across restarts
- Server integration: GET /api/v1/daemon/status, --with-daemon CLI flag
- Web UI: daemon status card on home dashboard with per-watcher metrics
- Full test suite (18 test files) and documentation

Root cause: BatchBuffer.created_at was never set in add_line(), so the time-based trigger never fired and events stayed buffered indefinitely.

Additional fixes:
- Thread-safe enqueue from watchdog thread via loop.call_soon_threadsafe()
- _flush_buffer() now calls callback before clearing buffer (prevents data loss)
- KnowledgeRouter uses valid viking://resources/ scope (skills/memories were invalid)
- Non-ASCII titles sanitized via sha256 hash to produce valid URI paths
- VLM extraction concurrency limited to 2 via semaphore
- ClaudeCodeWatcher rewrite: handle nested message.content (text blocks, tool_use)
- Project name derived from file path via _post_normalize hook
- All exception handlers now include exc_info=True for stack traces
- 148 tests passing
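
The concurrency limit mentioned in this commit can be illustrated with a standard `asyncio.Semaphore`. This is a sketch of the general technique, not the commit's code; `extract_all` and `extract_one` are hypothetical names, and the limit of 2 is the value quoted above.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence


async def extract_all(
    turns: Sequence[str],
    extract_one: Callable[[str], Awaitable[str]],
    limit: int = 2,
) -> List[str]:
    """Run extract_one over all turns with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(turn: str) -> str:
        async with semaphore:  # blocks while `limit` extractions are running
            return await extract_one(turn)

    # gather preserves input order in its result list.
    return list(await asyncio.gather(*(guarded(t) for t in turns)))
```

The semaphore bounds in-flight calls only; it does not add timeouts or retries.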

…ased AI tool monitoring

Extend daemon watcher framework to support SQLite database sources alongside existing JSONL file watchers. BasePollingWatcher uses Thread + Event.wait() polling instead of watchdog Observer, with cursor advancement on all raw events (including filtered) to avoid infinite re-query loops.

CursorDBWatcher monitors Cursor IDE's dual-SQLite storage:
- Global DB (cursorDiskKV) for bubbleId:* conversation entries
- type 1=user, 2=assistant; filters empty text (tool calls/streaming)

Includes 28 new tests (12 polling base + 16 cursor_db) and 3 integration tests.
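
A rough sketch of rowid-based polling over `cursorDiskKV`, under assumptions that should be checked against a real `state.vscdb`: the table is assumed to have `key` and `value` columns, and each `bubbleId:*` value is assumed to be a JSON object with `type` and `text` fields. The table name, key prefix, and the 1=user / 2=assistant mapping are taken from the commit message; `poll_bubbles` is a hypothetical name.

```python
import json
import sqlite3
from typing import Dict, List, Tuple

ROLE_BY_TYPE = {1: "user", 2: "assistant"}  # mapping quoted in the commit message


def poll_bubbles(conn: sqlite3.Connection, cursor_rowid: int) -> Tuple[List[Dict[str, str]], int]:
    """Return (normalized events, new cursor) for rows added after cursor_rowid."""
    rows = conn.execute(
        "SELECT rowid, value FROM cursorDiskKV "
        "WHERE key LIKE 'bubbleId:%' AND rowid > ? ORDER BY rowid",
        (cursor_rowid,),
    ).fetchall()

    events: List[Dict[str, str]] = []
    for rowid, value in rows:
        cursor_rowid = rowid  # advance even when the row is filtered out below
        try:
            bubble = json.loads(value)
        except (TypeError, ValueError):
            continue
        if not isinstance(bubble, dict):
            continue
        role = ROLE_BY_TYPE.get(bubble.get("type"))
        text = (bubble.get("text") or "").strip()
        if role and text:  # empty text is skipped (tool calls / streaming)
            events.append({"role": role, "content": text})
    return events, cursor_rowid
```

When reading a database that another process is writing, opening it read-only (for example with a `file:...?mode=ro` URI and `uri=True`) is a reasonable precaution.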

- Mark Phase 1-2 as complete (BasePollingWatcher + CursorDBWatcher)
- Add section 4.1: real data E2E validation results against state.vscdb
- Add section 4.2: cursor advancement bug fix documentation
- Document _discover_composer_ids real data finding (cursorDiskKV, not ItemTable)
- Update task checklist and file change list with status

github-project-automation Bot added this to OpenViking project Jun 17, 2026

github-project-automation Bot moved this to Backlog in OpenViking project Jun 17, 2026

github-actions Bot added the Review effort 4/5 label Jun 17, 2026

MaojiaSheng reviewed Jun 17, 2026


MaojiaSheng mentioned this pull request Jun 17, 2026

feat: OpenViking Active Daemon - Automatic Knowledge Extraction from Claude Code Logs #2629

Closed

huang-yi-dae force-pushed the feature/multi-watcher branch from bfd54ed to 54fe519 Compare June 17, 2026 15:06

huang-yi-dae force-pushed the feature/multi-watcher branch from 54fe519 to 52b0a34 Compare June 17, 2026 15:27

huang-yi-dae added 3 commits June 19, 2026 16:39

huang-yi-dae wants to merge 4 commits into volcengine:main from huang-yi-dae:feature/multi-watcher


3 participants
