
[FEATURE] Watch Mode for File-Based Knowledge Sources #26

@sajeerzeji

Is your feature request related to a problem? Please describe.

Currently, when using file-based knowledge sources (MarkdownSource, JSONSource, SQLiteSource), any changes to the source files require either:

  1. Manually calling kb.sync() to re-index
  2. Restarting the entire application

This creates a poor developer experience during development and prevents long-running agents from automatically updating their knowledge base when documentation or data files change. For example, a documentation chatbot cannot automatically pick up new articles without a restart.

Describe the solution you'd like

Implement the optional watch?(): AsyncIterable<ChunkUpdate> method, already declared on the KnowledgeSource interface, for all file-based sources. This enables automatic detection of source-file changes and incremental updates to the index instead of a full re-sync.
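A minimal sketch of the shapes involved, for discussion only. The Chunk fields, the discriminated-union form of ChunkUpdate, the load() signature and the supportsWatch() helper are assumptions made for illustration; the real definitions in interfaces.ts may differ.

```typescript
// Hypothetical shapes for illustration; see interfaces.ts for the real ones.
export interface Chunk {
  id: string;
  content: string;
  metadata?: Record<string, unknown>;
}

// One event per changed chunk: adds and updates carry the new chunk,
// deletes only need the ID of the chunk to drop.
export type ChunkUpdate =
  | { type: "add"; chunk: Chunk }
  | { type: "update"; chunk: Chunk }
  | { type: "delete"; chunkId: string };

export interface KnowledgeSource {
  load(): AsyncIterable<Chunk>; // assumed signature
  watch?(): AsyncIterable<ChunkUpdate>; // optional: omitted by sources that cannot watch
}

// Feature detection: only sources that define watch() get a watcher.
export function supportsWatch(
  source: KnowledgeSource,
): source is KnowledgeSource & { watch(): AsyncIterable<ChunkUpdate> } {
  return typeof source.watch === "function";
}
```

Keeping watch() optional means existing sources stay valid without changes; the consumer simply skips sources that do not define it.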

Implementation Details

1. MarkdownSource Watch Mode

File: packages/toolpack-knowledge/src/sources/markdown.ts

Implementation Steps:

  1. Add watch option to constructor - Add a watch?: boolean option to MarkdownSourceOptions interface

  2. Implement watch() method using fs.watch - Use Node.js native fs.watch to monitor directories containing matched files, detect file changes/renames, and emit appropriate ChunkUpdate events

  3. Add chunk tracking - Maintain a Map<string, Chunk[]> to track which chunks belong to which files for efficient updates

  4. Update load() to track chunks - Modify the existing load() method to populate the chunk tracking map during initial ingestion
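A rough sketch of steps 2 to 4, assuming the Chunk/ChunkUpdate shapes declared in the code, a hypothetical chunkMarkdown() that splits on "## " headings, and hypothetical helpers updatesForFile() and watchMarkdownDir(). It watches a single directory non-recursively and takes an AbortSignal so the loop can end even while it is waiting for the next event.

```typescript
import { watch } from "node:fs";
import { readFile } from "node:fs/promises";
import * as path from "node:path";

// Assumed shapes (hypothetical; the package's real types may differ).
export interface Chunk { id: string; content: string; source: string; }
export type ChunkUpdate =
  | { type: "add"; chunk: Chunk }
  | { type: "update"; chunk: Chunk }
  | { type: "delete"; chunkId: string };

// Hypothetical chunker: one chunk per "## " section, IDs from path + index.
export function chunkMarkdown(file: string, text: string): Chunk[] {
  return text
    .split(/^(?=## )/m)
    .map((section) => section.trim())
    .filter((section) => section.length > 0)
    .map((content, i) => ({ id: `${file}#${i}`, content, source: file }));
}

// Steps 3-4: `tracked` maps each file to the chunks indexed from it. On a
// change, the file's old chunks are deleted and its new chunks added;
// `next = null` means the file is gone.
export function updatesForFile(
  tracked: Map<string, Chunk[]>,
  file: string,
  next: Chunk[] | null,
): ChunkUpdate[] {
  const updates: ChunkUpdate[] = [];
  for (const old of tracked.get(file) ?? []) updates.push({ type: "delete", chunkId: old.id });
  for (const chunk of next ?? []) updates.push({ type: "add", chunk });
  if (next === null) tracked.delete(file);
  else tracked.set(file, next);
  return updates;
}

// Step 2: turn fs.watch callbacks into an async iterable of ChunkUpdates.
export async function* watchMarkdownDir(
  dir: string,
  tracked: Map<string, Chunk[]>,
  signal?: AbortSignal,
): AsyncGenerator<ChunkUpdate> {
  const pending = new Set<string>(); // collapses bursts of events for one file
  let wake: (() => void) | null = null;
  const watcher = watch(dir, (_event, filename) => {
    const name = filename?.toString();
    if (!name || !name.endsWith(".md")) return;
    pending.add(path.join(dir, name));
    wake?.();
  });
  const onAbort = () => wake?.();
  signal?.addEventListener("abort", onAbort, { once: true });
  try {
    while (!signal?.aborted) {
      const file = pending.values().next().value;
      if (file === undefined) {
        // Nothing queued: sleep until fs.watch or the abort signal wakes us.
        await new Promise<void>((resolve) => { wake = () => resolve(); });
        wake = null;
        continue;
      }
      pending.delete(file);
      let next: Chunk[] | null;
      try {
        next = chunkMarkdown(file, await readFile(file, "utf8"));
      } catch {
        next = null; // unreadable: treat as deleted or renamed away
      }
      yield* updatesForFile(tracked, file, next);
    }
  } finally {
    signal?.removeEventListener("abort", onAbort);
    watcher.close(); // release the OS watch handle
  }
}
```

Replacing a file's chunks wholesale is the simplest correct policy, at the cost of re-embedding the whole file on every save. Subdirectories are not covered, and fs.watch's recursive option is not available on every platform and Node.js version, so glob patterns spanning several directories would need one watcher per directory (or chokidar).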

2. JSONSource Watch Mode

File: packages/toolpack-knowledge/src/sources/json.ts

Implementation Steps:

Similar to MarkdownSource, but:

  • Watch single JSON file (not a glob pattern)
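  • Re-parse the entire file on change, with a 500ms debounce to coalesce rapid successive writes
  • Compare chunk IDs and content to determine what changed: new IDs are additions, missing IDs are deletions, and existing IDs with different content are updates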
  • Implement diffChunks() helper to efficiently compute differences
  • Emit appropriate ChunkUpdate events based on the diff
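A possible shape for the JSON watcher, sketched under assumptions: diffChunks() follows the bullet list above, while watchJsonFile(), its toChunks callback (standing in for whatever JSON-to-chunk mapping JSONSource already uses) and the debounceMs parameter are hypothetical.

```typescript
import { watch } from "node:fs";
import { readFile } from "node:fs/promises";
import * as path from "node:path";

// Assumed shapes (hypothetical; the package's real types may differ).
export interface Chunk { id: string; content: string; }
export type ChunkUpdate =
  | { type: "add"; chunk: Chunk }
  | { type: "update"; chunk: Chunk }
  | { type: "delete"; chunkId: string };

// IDs decide add/delete; content comparison decides update.
export function diffChunks(prev: Chunk[], next: Chunk[]): ChunkUpdate[] {
  const before = new Map(prev.map((c) => [c.id, c] as const));
  const updates: ChunkUpdate[] = [];
  for (const chunk of next) {
    const old = before.get(chunk.id);
    if (!old) updates.push({ type: "add", chunk });
    else if (old.content !== chunk.content) updates.push({ type: "update", chunk });
    before.delete(chunk.id);
  }
  // Whatever is left in `before` no longer exists in the file.
  for (const id of before.keys()) updates.push({ type: "delete", chunkId: id });
  return updates;
}

export async function* watchJsonFile(
  file: string,
  toChunks: (data: unknown) => Chunk[], // hypothetical JSON-to-chunk mapping
  initial: Chunk[], // chunks produced by the initial load
  signal?: AbortSignal,
  debounceMs = 500,
): AsyncGenerator<ChunkUpdate> {
  let current = initial;
  let dirty = false;
  let wake: (() => void) | null = null;
  let timer: ReturnType<typeof setTimeout> | undefined;
  // Watch the parent directory and filter by name (see the note below).
  const watcher = watch(path.dirname(file), (_event, name) => {
    if (name?.toString() !== path.basename(file)) return;
    clearTimeout(timer); // every event restarts the debounce window
    timer = setTimeout(() => {
      dirty = true;
      wake?.();
    }, debounceMs);
  });
  const onAbort = () => wake?.();
  signal?.addEventListener("abort", onAbort, { once: true });
  try {
    while (!signal?.aborted) {
      if (!dirty) {
        await new Promise<void>((resolve) => { wake = () => resolve(); });
        wake = null;
        continue;
      }
      dirty = false;
      let next: Chunk[];
      try {
        next = toChunks(JSON.parse(await readFile(file, "utf8")));
      } catch {
        continue; // missing, half-written or invalid JSON: keep the last good state
      }
      yield* diffChunks(current, next);
      current = next;
    }
  } finally {
    clearTimeout(timer);
    signal?.removeEventListener("abort", onAbort);
    watcher.close();
  }
}
```

Watching the parent directory instead of the file itself is a deliberate choice: many editors save by writing a temporary file and renaming it over the original, which can leave a watcher attached to the old file receiving nothing.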

3. SQLiteSource Watch Mode (Poll-based)

File: packages/toolpack-knowledge/src/sources/sqlite.ts

Implementation Steps:

SQLite offers no built-in way to notify other processes of changes, and watching the database file would not reveal which rows changed, so use poll-based change detection:

  1. Add watch configuration options - watch, pollInterval (default 5000ms), changeDetection strategy, and relevant column names

  2. Implement three change detection strategies:

    • Watermark mode: Query rows where ID > last seen ID (most efficient for append-only tables)
    • Timestamp mode: Query rows where updated_at > last check time (for tables with update timestamps)
    • Scan mode: Full table scan with content hash comparison (expensive, for legacy tables without suitable columns)
  3. Poll on interval - Continuously check for changes every pollInterval milliseconds

  4. Emit ChunkUpdate events - Convert detected changes to appropriate add/update events
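A sketch of the three strategies and the poll loop, assuming a driver-agnostic run(sql, params) adapter (hypothetical; it could wrap any SQLite client) and hypothetical option names mirroring the list above. One deliberate deviation: the timestamp strategy advances its cursor to the newest updated_at value actually returned rather than the wall-clock time of the last check, so clock differences between writer and poller cannot drop rows.

```typescript
import { createHash } from "node:crypto";
import { setTimeout as sleep } from "node:timers/promises";

// Hypothetical option names modelled on the issue's list.
export type ChangeDetection = "watermark" | "timestamp" | "scan";
export interface SqliteWatchOptions {
  table: string;
  changeDetection: ChangeDetection;
  idColumn: string;
  contentColumn: string;
  timestampColumn?: string; // required by "timestamp"
  pollInterval?: number; // milliseconds, defaults to 5000
}
export interface Row { id: string | number; content: string; ts?: string | number; }
export interface RowUpdate { type: "add" | "update"; row: Row; }

// SQLite identifier quoting: wrap in double quotes, double any embedded quote.
const ident = (name: string): string => `"${name.replace(/"/g, '""')}"`;
const hash = (text: string): string => createHash("sha256").update(text).digest("hex");

export function buildPollQuery(o: SqliteWatchOptions): string {
  const id = ident(o.idColumn);
  const select = `SELECT ${id} AS id, ${ident(o.contentColumn)} AS content`;
  const from = `FROM ${ident(o.table)}`;
  switch (o.changeDetection) {
    case "watermark": // append-only: everything above the highest ID seen
      return `${select} ${from} WHERE ${id} > ? ORDER BY ${id}`;
    case "timestamp": { // inserted or edited since the newest timestamp seen
      if (!o.timestampColumn) throw new Error("timestamp mode needs timestampColumn");
      const ts = ident(o.timestampColumn);
      return `${select}, ${ts} AS ts ${from} WHERE ${ts} > ? ORDER BY ${ts}`;
    }
    case "scan": // no usable column: read everything, compare content hashes
      return `${select} ${from}`;
    default:
      throw new Error(`unknown changeDetection: ${String(o.changeDetection)}`);
  }
}

// Scan mode: hash each row's content and compare with the previous pass.
export function scanChanges(seen: Map<string, string>, rows: Row[]): RowUpdate[] {
  const updates: RowUpdate[] = [];
  for (const row of rows) {
    const key = String(row.id);
    const digest = hash(row.content);
    const before = seen.get(key);
    if (before === undefined) updates.push({ type: "add", row });
    else if (before !== digest) updates.push({ type: "update", row });
    seen.set(key, digest);
  }
  return updates;
}

export async function* pollSqlite(
  o: SqliteWatchOptions,
  run: (sql: string, params: unknown[]) => Promise<Row[]>, // hypothetical adapter
  cursor: string | number, // highest ID / timestamp seen by the initial load
  signal?: AbortSignal,
): AsyncGenerator<RowUpdate> {
  const sql = buildPollQuery(o);
  const seen = new Map<string, string>();
  if (o.changeDetection === "scan") scanChanges(seen, await run(sql, [])); // silent baseline
  while (!signal?.aborted) {
    try {
      await sleep(o.pollInterval ?? 5000, undefined, { signal });
    } catch {
      return; // aborted while waiting for the next poll
    }
    if (o.changeDetection === "scan") {
      yield* scanChanges(seen, await run(sql, []));
      continue;
    }
    for (const row of await run(sql, [cursor])) {
      // Watermark rows are always new; timestamp rows may be new or edited,
      // so they are reported as updates (an upsert either way).
      yield { type: o.changeDetection === "watermark" ? "add" : "update", row };
      cursor = (o.changeDetection === "watermark" ? row.id : row.ts) ?? cursor;
    }
  }
}
```

Neither the watermark nor the timestamp strategy can see deleted rows, which matches the add/update-only events described above. A strict > comparison can also skip a row committed after a poll with the same timestamp as the newest row already seen; comparing with >= and de-duplicating by ID is a common workaround.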

4. Knowledge Class Integration

File: packages/toolpack-knowledge/src/knowledge.ts

Implementation Steps:

  1. Start watchers after initial sync - Call startWatchers() at the end of Knowledge.create() after the initial sync completes

  2. Implement watcher management:

    • Maintain array of active watchers
    • For each source with watch() method, start the watcher
    • Process updates in background using processWatchUpdates()
    • Handle add/update events by embedding content and adding to provider
    • Handle delete events by removing chunks from provider
    • Trigger onSync callbacks for monitoring
    • Respect onError handler for error handling
  3. Update stop() method - Properly clean up all watchers by calling their return() method and clearing the watchers array before closing the provider
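One way the watcher management could look, shown as a standalone class rather than inside Knowledge. Provider, the embed function, WatchHooks, WatchManager and its drained() helper are hypothetical stand-ins for the package's real provider and embedder APIs.

```typescript
// Assumed shapes (hypothetical; the package's real types may differ).
export interface Chunk { id: string; content: string; }
export type ChunkUpdate =
  | { type: "add"; chunk: Chunk }
  | { type: "update"; chunk: Chunk }
  | { type: "delete"; chunkId: string };
export interface KnowledgeSource { watch?(): AsyncIterable<ChunkUpdate>; }
export interface Provider {
  upsert(items: { chunk: Chunk; vector: number[] }[]): Promise<void>;
  remove(ids: string[]): Promise<void>;
  close(): Promise<void>;
}
export interface WatchHooks {
  onSync?: (update: ChunkUpdate) => void;
  onError?: (error: unknown) => void;
}

export class WatchManager {
  private watchers: AsyncIterator<ChunkUpdate>[] = [];
  private tasks: Promise<void>[] = [];
  private stopped = false;
  private readonly provider: Provider;
  private readonly embed: (text: string) => Promise<number[]>;
  private readonly hooks: WatchHooks;

  constructor(provider: Provider, embed: (text: string) => Promise<number[]>, hooks: WatchHooks = {}) {
    this.provider = provider;
    this.embed = embed;
    this.hooks = hooks;
  }

  // Called once, after the initial sync has completed.
  startWatchers(sources: KnowledgeSource[]): void {
    for (const source of sources) {
      if (!source.watch) continue; // source does not support watch mode
      const it = source.watch()[Symbol.asyncIterator]();
      this.watchers.push(it);
      this.tasks.push(this.processWatchUpdates(it)); // runs in the background
    }
  }

  // Resolves once every watcher has ended; mainly useful in tests.
  async drained(): Promise<void> {
    await Promise.all(this.tasks);
  }

  private async processWatchUpdates(it: AsyncIterator<ChunkUpdate>): Promise<void> {
    while (!this.stopped) {
      let step: IteratorResult<ChunkUpdate>;
      try {
        step = await it.next();
      } catch (error) {
        this.hooks.onError?.(error); // the watcher itself failed: give up on it
        return;
      }
      if (step.done || this.stopped) return;
      const update = step.value;
      try {
        if (update.type === "delete") {
          await this.provider.remove([update.chunkId]);
        } else {
          const vector = await this.embed(update.chunk.content);
          await this.provider.upsert([{ chunk: update.chunk, vector }]);
        }
        this.hooks.onSync?.(update);
      } catch (error) {
        this.hooks.onError?.(error); // one failed update does not stop the watcher
      }
    }
  }

  async stop(): Promise<void> {
    this.stopped = true;
    // Not awaited: see the note below on async generators and return().
    for (const it of this.watchers) void it.return?.().catch((e) => this.hooks.onError?.(e));
    this.watchers = [];
    await this.provider.close();
  }
}
```

stop() deliberately does not await return(): an async generator that is suspended on an await (for example, waiting for the next file event) only processes a return() request once it reaches its next yield, so awaiting it could hang shutdown. Sources may therefore need their own cancellation path so that a parked watcher, and its fs.watch handle, does not outlive kb.stop().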

5. Testing

File: packages/toolpack-knowledge/src/__tests__/watch-mode.test.ts

Create comprehensive tests covering:

  • New file detection - Create a new markdown file and verify it gets indexed automatically
  • File modification detection - Modify an existing file and verify the chunks are updated
  • File deletion detection - Delete a file and verify its chunks are removed from the knowledge base
  • Multiple sources - Test watch mode with multiple sources simultaneously
  • Error handling - Test behavior when file operations fail
  • Cleanup - Verify watchers are properly stopped when kb.stop() is called
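Watch-mode tests are inherently asynchronous, so a helper that collects the first N updates from a watcher and fails after a deadline keeps a missed event from hanging the suite. takeWithTimeout() below is a hypothetical helper, not an existing test utility.

```typescript
// Resolves with the first `count` items from `source` (fewer if it ends),
// or rejects if `timeoutMs` elapses first.
export async function takeWithTimeout<T>(
  source: AsyncIterable<T>,
  count: number,
  timeoutMs: number,
): Promise<T[]> {
  const it = source[Symbol.asyncIterator]();
  const items: T[] = [];
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`timed out after ${timeoutMs} ms with ${items.length}/${count} updates`)),
      timeoutMs,
    );
  });
  try {
    while (items.length < count) {
      const step = await Promise.race([it.next(), deadline]);
      if (step.done) break;
      items.push(step.value);
    }
    return items;
  } finally {
    clearTimeout(timer);
    // Fire-and-forget: a watcher parked on an await would never settle return().
    void it.return?.().catch(() => {});
  }
}
```

In a file-detection test, the pattern would be: start the watcher, create, modify or delete a file in a temporary directory, then await takeWithTimeout(...) and assert on the returned updates.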

Acceptance Criteria

  • MarkdownSource implements watch() using fs.watch
  • JSONSource implements watch() with file watching
  • SQLiteSource implements watch() with poll-based detection
  • All three change detection strategies work for SQLite (watermark, timestamp, scan)
  • Knowledge.create() automatically starts watchers for sources that support it
  • kb.stop() properly cleans up all watchers
  • File additions are detected and indexed
  • File modifications trigger re-indexing
  • File deletions remove chunks from the knowledge base
  • Watch mode works with PersistentKnowledgeProvider
  • Comprehensive tests cover all watch scenarios
  • Documentation updated with watch mode examples
  • No memory leaks from unclosed watchers

Describe alternatives you've considered

  1. Manual sync polling: Require users to call kb.sync() on a timer - rejected because it's inefficient and requires full re-indexing
  2. External file watcher service: Use a separate process to watch files - rejected because it adds deployment complexity
  3. Chokidar library: Use chokidar instead of native fs.watch - considered for better cross-platform support, but adds dependency

Additional context

  • Watch mode is optional and disabled by default to avoid unexpected behavior
  • Native fs.watch is sufficient for most use cases; can add chokidar later if needed
  • SQLite watch mode uses polling because SQLite doesn't support change notifications
  • This feature enables "live reload" workflows during development
  • Critical for long-running agents that need to stay up-to-date with changing documentation

Dependencies:

  • No new dependencies required (uses native fs.watch)
  • Optional: chokidar for better cross-platform file watching (future enhancement)

Related Issues:

  • Implements interface defined in packages/toolpack-knowledge/src/interfaces.ts:49
  • Complements reSync: false option in PersistentKnowledgeProvider

Metadata

Assignees: No one assigned

Labels: enhancement (New feature or request), high-priority (High priority issues), toolpack-knowledge (Issues related to toolpack-knowledge package)
