feat: OpenViking Active Daemon - Automatic Knowledge Extraction from Claude Code Logs by huang-yi-dae · Pull Request #2629 · volcengine/OpenViking

huang-yi-dae · 2026-06-15T14:07:54Z

Summary

This PR implements the OpenViking Active Daemon, a background service that automatically monitors Claude Code JSONL logs and extracts valuable knowledge into viking:// storage.

Features

Core Functionality

File Watching: Monitors ~/.claude/projects/ for Claude Code session changes using watchdog
Incremental Processing: File cursor tracking ensures only new content is processed
Batch ETL Pipeline: Filters noise, reconstructs conversations, extracts knowledge via LLM, deduplicates
Smart Routing: Routes extracted knowledge to appropriate viking:// paths (skills/memories/resources)
ResourceService Integration: Writes through OpenViking's standard resource ingestion pipeline
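Incremental processing could work roughly as follows. This is a minimal sketch, not the PR's cursor_manager.py. The `CursorStore` class, the `cursors` table schema and `read_new_events` are hypothetical names, and the sketch assumes each file's cursor is a byte offset into the JSONL file. Only newline-terminated lines are consumed, so a line that Claude Code is still writing gets picked up on the next pass.

```python
import json
import os
import sqlite3
from typing import List


class CursorStore:
    """Hypothetical SQLite table mapping each JSONL path to a byte offset."""

    def __init__(self, db_path: str) -> None:
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cursors (path TEXT PRIMARY KEY, byte_offset INTEGER NOT NULL)"
        )
        self.conn.commit()

    def get(self, path: str) -> int:
        row = self.conn.execute(
            "SELECT byte_offset FROM cursors WHERE path = ?", (path,)
        ).fetchone()
        return row[0] if row else 0

    def set(self, path: str, offset: int) -> None:
        # INSERT OR REPLACE keeps one row per file; commit so the offset survives restarts.
        self.conn.execute(
            "INSERT OR REPLACE INTO cursors (path, byte_offset) VALUES (?, ?)", (path, offset)
        )
        self.conn.commit()


def read_new_events(store: CursorStore, path: str) -> List[dict]:
    """Return JSON objects from complete lines appended since the last call."""
    offset = store.get(path)
    if os.path.getsize(path) < offset:
        offset = 0  # file was truncated or replaced: start over
    events: List[dict] = []
    with open(path, "rb") as f:
        f.seek(offset)
        for line in f:
            if not line.endswith(b"\n"):
                break  # partially written line; retry on the next pass
            offset += len(line)
            if line.strip():
                events.append(json.loads(line))
    store.set(path, offset)
    return events
```

The offset is saved only after the whole read succeeds. If parsing fails partway through, the same lines are read again on the next pass instead of being silently skipped.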

Three Ways to Enable

ov.conf configuration (recommended):

```json
{
  "server": {
    "daemon": {
      "enabled": true,
      "watch_dir": "~/.claude/projects",
      "batch_trigger_lines": 50,
      "batch_trigger_seconds": 300
    }
  }
}
```
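One plausible reading of `batch_trigger_lines` and `batch_trigger_seconds` is that a batch is flushed as soon as either threshold is reached. The sketch below assumes that semantics. It also assumes the timer starts when the first line enters an empty buffer. `BatchBuffer` here is a hypothetical stand-in, not the class defined in models.py. The injectable `clock` makes the time trigger testable.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class BatchBuffer:
    """Hypothetical buffer that is ready to flush when either threshold is hit."""

    trigger_lines: int = 50          # mirrors batch_trigger_lines
    trigger_seconds: float = 300.0   # mirrors batch_trigger_seconds
    clock: Callable[[], float] = time.monotonic
    lines: List[str] = field(default_factory=list)
    first_added_at: Optional[float] = None

    def add(self, line: str) -> None:
        if not self.lines:
            self.first_added_at = self.clock()  # timer starts with the first buffered line
        self.lines.append(line)

    def should_flush(self) -> bool:
        if not self.lines:
            return False
        if len(self.lines) >= self.trigger_lines:
            return True
        return self.clock() - self.first_added_at >= self.trigger_seconds

    def drain(self) -> List[str]:
        """Hand the buffered lines to the ETL pipeline and reset the buffer."""
        batch, self.lines, self.first_added_at = self.lines, [], None
        return batch
```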

CLI argument:
```
openviking serve --with-daemon
```

Environment variables:

```
export OV_DAEMON_ENABLED=true
openviking serve
```

Web Studio Integration

Daemon Status Card on Home dashboard (/studio/home)
Real-time status display (running/enabled/disabled)
Shows watch directory, batch settings, tracked file count, last flush time
Auto-refreshes every 30 seconds

API Endpoint

GET /api/v1/daemon/status returns current daemon state

Architecture

```
Claude Code JSONL → watchdog → Batch Buffer → LowValueFilter → ConversationReconstructor
    → LLM Extractor → MD5 Deduplicator → KnowledgeRouter → VikingStorageAdapter → viking://
```
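The stages in the diagram can be illustrated with a toy, single-process version. Several parts are hypothetical:

- the noise regex;
- the simplified event shape (`{"role": ..., "text": ...}` rather than raw Claude Code JSONL);
- the `Knowledge` stand-in for ExtractedKnowledge;
- the viking:// prefixes.

The LLM extractor is passed in as a plain function so the sketch runs without a model.

```python
import hashlib
import re
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set, Tuple

# Hypothetical noise patterns; the PR's real LowValueFilter rules are not shown here.
LOW_VALUE = re.compile(r"^\s*(ok|okay|thanks|thank you|yes|no|continue)[.!]*\s*$", re.IGNORECASE)
# Hypothetical URI prefixes for the three destinations the PR names.
ROUTES = {"skill": "viking://skills/", "memory": "viking://memories/", "resource": "viking://resources/"}


@dataclass
class Turn:
    user: str
    assistant: str


@dataclass
class Knowledge:
    kind: str  # "skill" | "memory" | "resource"
    title: str
    content: str


def low_value_filter(events: Iterable[dict]) -> List[dict]:
    """Keep only non-empty, non-trivial user/assistant messages."""
    return [
        e for e in events
        if e.get("role") in ("user", "assistant")
        and e.get("text", "").strip()
        and not LOW_VALUE.match(e["text"])
    ]


def reconstruct(events: List[dict]) -> List[Turn]:
    """Pair each user message with the assistant messages that follow it."""
    turns: List[Turn] = []
    user, replies = None, []

    def flush() -> None:
        if user is not None and replies:
            turns.append(Turn(user, "\n".join(replies)))

    for e in events:
        if e["role"] == "user":
            flush()
            user, replies = e["text"], []
        elif user is not None:
            replies.append(e["text"])
    flush()
    return turns


def slug(title: str) -> str:
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def run_batch(
    events: Iterable[dict],
    extract: Callable[[Turn], List[Knowledge]],
    seen: Set[str],
) -> List[Tuple[str, Knowledge]]:
    """Filter -> reconstruct -> extract -> MD5 dedup -> route."""
    routed: List[Tuple[str, Knowledge]] = []
    for turn in reconstruct(low_value_filter(events)):
        for item in extract(turn):  # the real extractor calls an LLM
            digest = hashlib.md5(item.content.encode("utf-8")).hexdigest()
            if digest in seen:
                continue  # identical content was already routed
            seen.add(digest)
            prefix = ROUTES.get(item.kind, ROUTES["resource"])
            routed.append((prefix + slug(item.title), item))
    return routed
```

Because `seen` outlives a single call, re-running a batch routes nothing the second time. A long-running daemon would need to keep those hashes somewhere durable, which this sketch does not do.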

Files Changed

New Files (12)

openviking/daemon/models.py - Data models (FileCursor, BatchBuffer, ConversationTurn, ExtractedKnowledge)
openviking/daemon/cursor_manager.py - SQLite-based cursor persistence
openviking/daemon/deduplicator.py - MD5 content deduplication
openviking/daemon/watchers/claude_code_watcher.py - Watchdog-based file monitoring
openviking/daemon/filters.py - Rule-based low-value conversation filtering
openviking/daemon/conversation_reconstructor.py - User/assistant pairing logic
openviking/daemon/knowledge_extractor.py - LLM-based knowledge extraction
openviking/daemon/etl_pipeline.py - Batch ETL orchestration
openviking/daemon/knowledge_router.py - viking:// URI routing
openviking/daemon/storage_adapter.py - ResourceService write adapter
openviking/daemon/service.py - Main DaemonService with lifecycle management
tests/daemon/* (8 test files) - 45 unit/integration tests

Modified Files (5)

openviking/server/config.py - Added DaemonConfig nested in ServerConfig
openviking/server/bootstrap.py - Added --with-daemon CLI argument
openviking/server/app.py - Integrated daemon lifecycle in FastAPI lifespan
web-studio/src/routes/home/route.tsx - Added DaemonStatusCard component
web-studio/src/i18n/locales/* - Added daemon translations (en/zh-CN)

Testing

✅ 45 tests pass (unit + integration)
✅ TypeScript compilation verified
✅ API endpoint registration confirmed

Usage Example

Start server with daemon enabled:

```
export OV_DAEMON_ENABLED=true
openviking serve
```

Visit http://127.0.0.1:1933/studio/home to see the daemon status card.

Commits: 22 commits on feature/active-daemon branch
Test Coverage: 45 tests, all passing

The vector DB layer already stores timestamps for all memories, but the Markdown file layer (MEMORY_FIELDS metadata) never included them, so reading a memory file directly gave no indication of when it was created or last updated. Changes:
- memory_updater.py: _apply_upsert now injects created_at (preserved on updates, set to now for new files) and updated_at (always refreshed) into MEMORY_FIELDS after merging schema fields.
- memory_type_registry.py: initialize_memory_files now includes created_at/updated_at when creating the initial identity.md, soul.md, etc.
- Added TestUpsertTimestamps with two regression tests covering the new-file and update scenarios.

The serialization/deserialization infrastructure in memory_file_utils.py already supported these fields; the gap was only in the write path.

🤖 Generated with [Qoder](https://qoder.com)
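The timestamp rule described above fits in a few lines: preserve created_at on update, set it on creation, and always refresh updated_at. `apply_timestamps` is a hypothetical helper, not the actual `_apply_upsert`. The ISO 8601 UTC format is an assumption about how MEMORY_FIELDS stores times.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def apply_timestamps(
    existing: Optional[Dict[str, Any]],
    merged: Dict[str, Any],
    now: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Return merged fields with created_at preserved and updated_at refreshed."""
    stamp = (now or datetime.now(timezone.utc)).isoformat()  # assumed ISO 8601 UTC
    fields = dict(merged)
    if existing and existing.get("created_at"):
        fields["created_at"] = existing["created_at"]  # update: keep the original creation time
    else:
        fields["created_at"] = stamp  # new file: creation time is now
    fields["updated_at"] = stamp  # always refreshed
    return fields
```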

- Add i18n translations for daemon status in English and Chinese
- Create DaemonStatusCard component with real-time status polling
- Integrate daemon status card into home page layout
- Card displays enabled/running status, watch directory, batch settings, cursor count, and last flush time
- Auto-refreshes every 30 seconds with graceful error handling


github-actions · 2026-06-15T14:09:29Z

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🏅 Score: 80
🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes
- Sub-PR theme: Add created_at/updated_at timestamps to memory files. Relevant files: openviking/session/memory/memory_type_registry.py, openviking/session/memory/memory_updater.py, tests/session/memory/test_memory_updater.py
- Sub-PR theme: Add daemon status card to web studio. Relevant files: web-studio/src/i18n/locales/en.ts, web-studio/src/i18n/locales/zh-CN.ts, web-studio/src/routes/home/-components/daemon-status-card.tsx, web-studio/src/routes/home/route.tsx
- Sub-PR theme: Add daemon documentation and examples. Relevant files: docs/daemon/README.md, docs/daemon/configuration.md, examples/daemon/start-daemon.sh, examples/daemon/docker-compose-daemon.yml, examples/daemon/start-daemon.bat
⚡ Recommended focus areas for review
- Missing License Header (flagged 11 times across the new daemon files): the new daemon file is missing the required copyright and license header. Flagged line: `"""`
- JSON Parsing Without Repair: uses raw json.loads on LLM output without json-repair, which could fail on malformed JSON (e.g., trailing commas, extra text). Flagged line: `return json.loads(text)`
- Unsafe Integer Parsing in DaemonConfig: converts OV_DAEMON_BATCH_LINES and OV_DAEMON_BATCH_SECONDS to int directly without error handling; invalid values will raise ValueError.

```python
batch_trigger_lines=int(os.getenv("OV_DAEMON_BATCH_LINES", "50")),
batch_trigger_seconds=int(os.getenv("OV_DAEMON_BATCH_SECONDS", "300")),
```

- Placeholder Endpoint: /api/v1/daemon/status returns static placeholder data instead of the actual daemon state.

```python
# This is a placeholder — actual implementation needs access to DaemonService instance
# For now, return static config info
from openviking.server.config import DaemonConfig

daemon_config = DaemonConfig.from_env()

return {
    "enabled": daemon_config.enabled,
    "running": False,  # Would need to track actual state
    "watch_dir": daemon_config.watch_dir,
    "db_path": daemon_config.db_path,
    "batch_trigger_lines": daemon_config.batch_trigger_lines,
    "batch_trigger_seconds": daemon_config.batch_trigger_seconds,
    "cursor_count": 0,
    "last_flush_time": None,
}
```
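Here is a defensive alternative to the bare `int(os.getenv(...))` calls flagged above. `env_int` is a hypothetical helper. Two choices in it are assumptions, not behavior described in the PR: falling back to the default with a warning (instead of refusing to start), and the lower bound of 1.

```python
import logging
import os

logger = logging.getLogger(__name__)


def env_int(name: str, default: int, minimum: int = 1) -> int:
    """Read an integer env var, falling back to the default on missing or bad input."""
    raw = os.getenv(name)
    if raw is None or not raw.strip():
        return default
    try:
        value = int(raw)
    except ValueError:
        logger.warning("Ignoring %s=%r: not an integer, using %d", name, raw, default)
        return default
    if value < minimum:
        logger.warning("Ignoring %s=%d: must be >= %d, using %d", name, value, minimum, default)
        return default
    return value
```

Inside DaemonConfig.from_env, the call would be `batch_trigger_lines=env_int("OV_DAEMON_BATCH_LINES", 50)`. Failing fast on bad input is also a reasonable policy. Either way, the user gets a clear message instead of a raw ValueError traceback.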

github-actions · 2026-06-15T14:11:51Z

PR Code Suggestions ✨

Explore these optional code suggestions:

Category: Possible issue

Add timeout and retry logic for LLM calls

Add timeout and retry logic for LLM calls to improve API resilience.

openviking/daemon/knowledge_extractor.py [100-108]

````diff
+from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type
+
+class LLMCallError(Exception):
+    pass
+
 async def _call_llm(self, prompt: str) -> Optional[Dict]:
-    """Call LLM and parse JSON response."""
+    """Call LLM and parse JSON response with retries and timeout."""
+    @retry(
+        stop=stop_after_attempt(3),
+        wait=wait_exponential(multiplier=1, min=2, max=10),
+        retry=retry_if_exception_type((LLMCallError, TimeoutError)),
+    )
+    async def _attempt_call():
+        llm = self._get_llm_service()
+        try:
+            response = await asyncio.wait_for(
+                llm.complete(
+                    prompt=prompt,
+                    temperature=0.3,
+                    max_tokens=500,
+                ),
+                timeout=30,  # 30 seconds timeout
+            )
+            return response
+        except asyncio.TimeoutError:
+            raise TimeoutError("LLM call timed out")
+        except Exception as e:
+            raise LLMCallError(f"LLM call failed: {e}")
+
     try:
-        llm = self._get_llm_service()
-        response = await llm.complete(
-            prompt=prompt,
-            temperature=0.3,
-            max_tokens=500,
-        )
+        response = await _attempt_call()
+        text = response.text.strip()
+        # Strip markdown code block markers if present
+        text = re.sub(r"```json\s*", "", text)
+        text = re.sub(r"\s*```", "", text)
+        return json.loads(text)
+    except json.JSONDecodeError as e:
+        logger.warning("JSON parse error in LLM response: %s", e)
+        return None
+    except Exception as e:
+        logger.error("LLM call failed after retries: %s", e)
+        return None
````

Suggestion importance[1-10]: 6


Why: This improves resilience against LLM API failures and timeouts. The implementation uses reasonable retry and timeout logic, which is a practical enhancement for the daemon's reliability.

Impact: Low
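The reviewer guide's "JSON Parsing Without Repair" point is separate from retries, and it could be partly addressed with the standard library alone. The hypothetical `parse_llm_json` below first unwraps a fenced block if one is present. It then decodes the first JSON value it finds with `json.JSONDecoder.raw_decode`, ignoring any surrounding prose. It does not repair trailing commas; that is what a dedicated library such as json-repair would add.

```python
import json
import re
from typing import Any, Optional

# Matches a fenced block (three backticks, optional "json" tag) and captures its body.
FENCE = re.compile(r"`{3}(?:json)?\s*(.*?)\s*`{3}", re.DOTALL)


def parse_llm_json(text: str) -> Optional[Any]:
    """Return the first JSON object/array in an LLM reply, or None if none decodes."""
    m = FENCE.search(text)
    if m:
        text = m.group(1)
    decoder = json.JSONDecoder()
    for i, ch in enumerate(text):
        if ch in "{[":
            try:
                value, _end = decoder.raw_decode(text, i)  # stops at the end of the value
                return value
            except json.JSONDecodeError:
                continue  # not valid JSON from here; try the next opening bracket
    return None
```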

Category: General

Use Pydantic model for daemon status API response

Use a Pydantic model for the daemon status API response to ensure validation and a consistent schema.

openviking/server/routers/daemon.py [11-43]

```diff
+from pydantic import BaseModel, Field
+from typing import Optional
+
+class DaemonStatusResponse(BaseModel):
+    enabled: bool = Field(..., description="Whether the daemon is enabled")
+    running: bool = Field(..., description="Whether the daemon is currently running")
+    watch_dir: Optional[str] = Field(None, description="Directory being watched for logs")
+    db_path: Optional[str] = Field(None, description="Path to cursor database")
+    batch_trigger_lines: int = Field(..., description="Number of lines to trigger batch processing")
+    batch_trigger_seconds: int = Field(..., description="Seconds to trigger batch processing")
+    cursor_count: int = Field(..., description="Number of tracked files")
+    last_flush_time: Optional[str] = Field(None, description="Last time batch was flushed")
+
 @router.get("/status")
-async def get_daemon_status() -> Dict[str, Any]:
+async def get_daemon_status() -> DaemonStatusResponse:
     """
     Get the current status of the Active Daemon.
-
-    Returns:
-        {
-            "enabled": bool,
-            "running": bool,
-            "watch_dir": str | null,
-            "db_path": str | null,
-            "batch_trigger_lines": int,
-            "batch_trigger_seconds": int,
-            "cursor_count": int,
-            "last_flush_time": str | null,
-        }
     """
     # This is a placeholder — actual implementation needs access to DaemonService instance
-    # For now, return static config info
     from openviking.server.config import DaemonConfig

     daemon_config = DaemonConfig.from_env()

-    return {
-        "enabled": daemon_config.enabled,
-        "running": False,  # Would need to track actual state
-        "watch_dir": daemon_config.watch_dir,
-        "db_path": daemon_config.db_path,
-        "batch_trigger_lines": daemon_config.batch_trigger_lines,
-        "batch_trigger_seconds": daemon_config.batch_trigger_seconds,
-        "cursor_count": 0,
-        "last_flush_time": None,
-    }
+    return DaemonStatusResponse(
+        enabled=daemon_config.enabled,
+        running=False,  # Would need to track actual state
+        watch_dir=daemon_config.watch_dir,
+        db_path=daemon_config.db_path,
+        batch_trigger_lines=daemon_config.batch_trigger_lines,
+        batch_trigger_seconds=daemon_config.batch_trigger_seconds,
+        cursor_count=0,
+        last_flush_time=None,
+    )
```

Suggestion importance[1-10]: 5


Why: Using a Pydantic model ensures consistent response schema, adds validation, and improves API documentation. This is a good practice for FastAPI endpoints.

Impact: Low
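Both the reviewer guide and this suggestion leave `running`, `cursor_count` and `last_flush_time` hard-coded. One way to report live values is to have the daemon update a small shared state object that the endpoint reads. `DaemonState` and its methods are hypothetical. The lock is there on the assumption that watchdog callbacks update it from the observer thread while request handlers read it.

```python
import threading
from datetime import datetime, timezone
from typing import Any, Dict, Optional


class DaemonState:
    """Hypothetical live-status holder shared by the daemon and the status endpoint."""

    def __init__(self, enabled: bool, watch_dir: Optional[str], db_path: Optional[str],
                 batch_trigger_lines: int, batch_trigger_seconds: int) -> None:
        self._lock = threading.Lock()  # writers and readers may run on different threads
        self._static: Dict[str, Any] = {
            "enabled": enabled,
            "watch_dir": watch_dir,
            "db_path": db_path,
            "batch_trigger_lines": batch_trigger_lines,
            "batch_trigger_seconds": batch_trigger_seconds,
        }
        self._running = False
        self._cursor_count = 0
        self._last_flush: Optional[datetime] = None

    def set_running(self, running: bool) -> None:
        with self._lock:
            self._running = running

    def record_flush(self, cursor_count: int, when: Optional[datetime] = None) -> None:
        """Call after each batch flush with the number of tracked files."""
        with self._lock:
            self._cursor_count = cursor_count
            self._last_flush = when or datetime.now(timezone.utc)

    def snapshot(self) -> Dict[str, Any]:
        """Return a dict with the same keys as the status endpoint's response."""
        with self._lock:
            return {
                **self._static,
                "running": self._running,
                "cursor_count": self._cursor_count,
                "last_flush_time": self._last_flush.isoformat() if self._last_flush else None,
            }
```

The lifespan handler could store the instance on `app.state`. The endpoint would then return `DaemonStatusResponse(**request.app.state.daemon_state.snapshot())`, with the attribute name being an assumption.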

MaojiaSheng · 2026-06-17T13:06:12Z

Duplicate of #2674.

huang-yi-dae added 22 commits June 11, 2026 21:47

feat: add daemon data models

264556c

feat: add cursor manager with SQLite persistence

d1f5f3d

feat: add knowledge deduplicator with MD5 hashing

25c930f

feat: add Claude Code JSONL watcher with watchdog

195990a

feat: add low-value conversation filter with regex patterns

fc84495

feat: add conversation reconstructor from flat events

f07c29c

feat: add LLM-based knowledge extractor with prompt template

32982fb

feat: add batch ETL pipeline with filtering and deduplication

9cb2bb3

feat: add knowledge router with viking:// URI mapping

541dbe9

feat: add Viking storage adapter with ResourceService integration

97efb97

feat: add DaemonService with lifecycle management

4d2b8f6

feat: add DaemonConfig with environment variable support

12596ce

feat: integrate Active Daemon into server bootstrap

6ec1c44

test: add integration tests for Active Daemon

153947d

docs: add Active Daemon usage and configuration guides

4c5500f

examples: add daemon startup scripts and docker-compose

dfe41ad

feat: add --with-daemon CLI argument

7689f36

feat: add daemon status API endpoint

58f27bc

fix: move daemon lifecycle to FastAPI lifespan for proper service initialization

4a8b767

fix: use correct attribute name service.resources (plural)

2bf1f80

github-project-automation Bot added this to OpenViking project Jun 15, 2026

github-project-automation Bot moved this to Backlog in OpenViking project Jun 15, 2026

MaojiaSheng closed this Jun 17, 2026

github-project-automation Bot moved this from Backlog to Done in OpenViking project Jun 17, 2026

t0saki mentioned this pull request Jun 17, 2026

feat: Multi-source watcher framework for multi-tool AI log monitoring #2674

Open

6 tasks

huang-yi-dae wants to merge 22 commits into volcengine:main from huang-yi-dae:feature/active-daemon