DOCSight Architecture

Documentation current as of v2026-03-07.1

This document describes the technical architecture of DOCSight.

Overview

DOCSight is built around a modular collector pattern that separates data collection, analysis, storage, and presentation into independent, testable components.

System Architecture

┌────────────────────────────────────────────────────────────────────┐
│                         Main Process (main.py)                      │
│                                                                      │
│  ┌──────────────────────────────────────────────────────────────┐  │
│  │              Collector Discovery & Registry                   │  │
│  │                                                               │  │
│  │  discover_collectors() →  ┌─────────────┐                    │  │
│  │                           │ Config Check │                    │  │
│  │                           └──────┬───────┘                    │  │
│  │                                  │                            │  │
│  │      ┌──────────┬──────────┬──────────┬──────────┬──────────┬──────────┐│  │
│  │      │          │          │          │          │          │          ││  │
│  │      ▼          ▼          ▼          ▼          ▼          ▼          ││  │
│  │ ┌──────────┐┌──────────┐┌──────────┐┌──────────┐┌──────────┐┌──────────┐│  │
│  │ │  Modem   ││   Demo   ││ Speedtest││   BQM    ││  BNetzA  ││  Backup  ││  │
│  │ │ Collect. ││ Collect. ││ Collect. ││ Collect. ││ Watcher  ││ Collect. ││  │
│  │ │          ││          ││          ││          ││          ││          ││  │
│  │ │Poll:900s ││Poll:900s ││Poll: 300s││Poll: 24h ││Poll: 300s││Poll: 24h ││  │
│  │ └────┬─────┘└────┬─────┘└────┬─────┘└────┬─────┘└────┬─────┘└────┬─────┘│  │
│  │      │           │           │           │           │           │      │  │
│  │      └───────────┴───────────┼───────────┴───────────┴───────────┘      │  │
│  │                                │                              │  │
│  └────────────────────────────────┼──────────────────────────────┘  │
│                                   │                                 │
│                                   ▼                                 │
│                      ┌─────────────────────────┐                    │
│                      │  Polling Loop (1s tick) │                    │
│                      │                         │                    │
│                      │  ThreadPoolExecutor:    │                    │
│                      │    submit all due       │                    │
│                      │    collectors in        │                    │
│                      │    parallel threads     │                    │
│                      └────────────┬────────────┘                    │
│                                   │                                 │
└───────────────────────────────────┼─────────────────────────────────┘
                                    │
         ┌──────────────────────────┴──────────────────────────┐
         │                                                      │
         ▼                                                      ▼
  ┌─────────────┐                                      ┌──────────────┐
  │   Analyzer  │                                      │    Event     │
  │             │                                      │   Detector   │
  │ DOCSIS data │                                      │              │
  │  → health   │                                      │  Anomaly     │
  │  assessment │                                      │  detection   │
  └──────┬──────┘                                      └──────┬───────┘
         │                                                    │
         └────────────────────┬───────────────────────────────┘
                              │
                              ▼
                  ┌────────────────────────┐
                  │  SQLite Storage        │
                  │                        │
                  │  • Snapshots           │
                  │  • Trends              │
                  │  • Events              │
                  │  • Speedtest cache     │
                  │  • Incident journal    │
                  └───────────┬────────────┘
                              │
         ┌────────────────────┼────────────────────┐
         │                    │                    │
         ▼                    ▼                    ▼
  ┌───────────┐      ┌──────────────┐    ┌───────────────┐
  │   MQTT    │      │  Flask Web   │    │ PDF Reports   │
  │ Publisher │      │  UI + API    │    │  (fpdf2)      │
  │           │      │              │    │               │
  │ Home      │      │ 11 views     │    │ Complaint     │
  │ Assistant │      │ + REST API   │    │ letters       │
  └───────────┘      └──────────────┘    └───────────────┘

Collector Pattern

Base Collector Class

All data collectors inherit from app/collectors/base.py:

class Collector(ABC):
    """Abstract base for all data collectors."""
    
    @property
    @abstractmethod
    def name(self) -> str:
        """Unique identifier."""
    
    @abstractmethod
    def collect(self) -> CollectorResult:
        """Fetch and return data."""
    
    def should_poll(self) -> bool:
        """True if enough time elapsed."""
    
    def record_success(self):
        """Reset penalty counter."""
    
    def record_failure(self):
        """Increment penalty counter."""

Collector Lifecycle

┌──────────────────────────────────────────────────────────────┐
│                    Collector Lifecycle                        │
│                                                               │
│  START                                                        │
│    │                                                          │
│    ▼                                                          │
│  ┌─────────────────────┐                                     │
│  │  should_poll()?     │ ──No──┐                             │
│  │  (time + penalty)   │       │                             │
│  └───────┬─────────────┘       │                             │
│          │ Yes                 │                             │
│          ▼                     │                             │
│  ┌─────────────────────┐       │                             │
│  │    collect()        │       │                             │
│  │  (fetch data)       │       │                             │
│  └───────┬─────────────┘       │                             │
│          │                     │                             │
│          ▼                     │                             │
│     Success?                   │                             │
│      /    \                    │                             │
│    Yes    No                   │                             │
│     │      │                   │                             │
│     │      ▼                   │                             │
│     │  ┌──────────────────┐    │                             │
│     │  │ record_failure() │    │                             │
│     │  │  • failures++    │    │                             │
│     │  │  • penalty 2^N   │    │                             │
│     │  │  • max 3600s     │    │                             │
│     │  └────────┬─────────┘    │                             │
│     │           │              │                             │
│     ▼           ▼              │                             │
│  ┌─────────────────────┐       │                             │
│  │ record_success()    │       │                             │
│  │  • failures = 0     │       │                             │
│  │  • penalty = 0      │       │                             │
│  └──────────┬──────────┘       │                             │
│             │                  │                             │
│             └──────────────────┘                             │
│             │                                                │
│             ▼                                                │
│        Wait 1 second                                         │
│             │                                                │
│             └──────────────────────────────────────┐         │
│                                                    │         │
│  ┌─────────────────────────────────────────────────┘         │
│  │ Auto-reset check:                                        │
│  │  if idle > 24h: failures = 0                             │
│  └──────────────────────────────────────────────────────────┘
│                                                               │
│  REPEAT                                                       │
└───────────────────────────────────────────────────────────────┘

Parallel Execution

Collectors run in parallel via concurrent.futures.ThreadPoolExecutor (one thread per collector). A blocking external call (e.g. Speedtest Tracker timeout) never delays the local modem poll.

# main.py (simplified)
with ThreadPoolExecutor(max_workers=len(collectors)) as executor:
    while not stop_event.is_set():
        futures = {}
        for c in collectors:
            if c.is_enabled() and c.should_poll():
                futures[executor.submit(_run_collector, c)] = c
        for future in as_completed(futures, timeout=120):
            collector = futures[future]
            result = future.result()
            # record_success() or record_failure()
        stop_event.wait(1)

Thread Safety

All shared state is protected by locks:

Lock	Location	Protects
`_lock`	`Collector` base	Scheduling state (`_last_poll`, `_consecutive_failures`)
`_collect_lock`	`Collector` base	Prevents concurrent `collect()` (manual poll vs auto-poll)
`_state_lock`	`web.py`	Shared `_state` dict (written by collectors, read by Flask)
`_lock`	`EventDetector`	Previous snapshot comparison (`_prev`)

SQLite uses WAL mode (PRAGMA journal_mode=WAL) for concurrent reads during writes.

Fail-Safe Mechanism

Exponential Backoff:

Failure #1:   30s  penalty
Failure #2:   60s  penalty
Failure #3:  120s  penalty
Failure #4:  240s  penalty
Failure #5:  480s  penalty
Failure #6:  960s  penalty
Failure #7: 1920s  penalty
Failure #8: 3600s  penalty (cap reached)
Failure #9: 3600s  (stays at cap)
...
After 24h idle: auto-reset to 0

Why: Prevents hammering external services during outages, with automatic recovery.

Implemented Collectors

ModemCollector (`app/collectors/modem.py`)

Purpose: Fetch DOCSIS channel data from cable modem/router Poll Interval: 900s (15 minutes, configurable) Data Source: Pluggable modem driver (see Driver Architecture below)

Pipeline:

Driver.get_docsis_data()
  → analyzer.analyze()
    → event_detector.check()
      → storage.save_snapshot()
        → mqtt_pub.publish_data()
          → web.update_state()

Output: CollectorResult with channel health assessment

SpeedtestCollector (`app/collectors/speedtest.py`)

Purpose: Fetch speed test results from Speedtest Tracker
Poll Interval: 300s (5 minutes)
Data Source: Speedtest Tracker REST API

Features:

Delta fetching (only new results since last poll)
Local SQLite caching for performance
Correlation with DOCSIS signal snapshots

DemoCollector (`app/collectors/demo.py`)

Purpose: Generate realistic DOCSIS data for testing without a real modem Poll Interval: Configurable (default 900s) Data Source: app/fixtures/demo_channels.json with per-poll random variation Activation: DEMO_MODE=true environment variable

Pipeline:

_generate_data() (base channels + variation)
  → analyzer.analyze() (real analysis!)
    → event_detector.check()
      → storage.save_snapshot()
        → mqtt_pub.publish_data()
          → web.update_state()

Features:

25 DS channels (24× DOCSIS 3.0 + 1× DOCSIS 3.1) and 4 US channels
Per-poll variation: ±0.3 dBmV power, ±0.5 dB SNR, slowly accumulating errors
Seeds 9 months (270 days) of historical snapshots retroactive from current date
Pre-populated event log, journal entries (12), incident groups (3), speedtest results (270 days), BQM graphs (30 days), and BNetzA measurement campaigns (9 monthly)
Time-based patterns: diurnal cycles, seasonal drift, periodic "bad periods"
All demo rows marked with is_demo=1 flag for clean separation from user data
Purge-before-seed on container rebuild prevents duplicate data
Device info: "DOCSight Demo Router", Connection: 250/40 Mbit/s Cable

Live Migration: Users can switch from demo to live mode via Settings UI or POST /api/demo/migrate. This purges all is_demo=1 rows while preserving user-created entries, disables demo mode, and restarts polling for real modem data.

BQMCollector (`app/collectors/bqm.py`)

Purpose: Download broadband quality graphs Poll Interval: 86400s (24 hours) Data Source: ThinkBroadband BQM service

Output: PNG graph image saved to storage

BnetzWatcherCollector (`app/collectors/bnetz_watcher.py`)

Purpose: Auto-import BNetzA measurement protocols from a watched directory Poll Interval: 300s (5 minutes) Data Source: Local filesystem (PDF and CSV files) Activation: BNETZ_WATCH_ENABLED=true + BNETZ_WATCH_DIR=/data/bnetz

Pipeline:

Scan watch_dir for new .pdf/.csv files (not in .imported marker)
  -> PDF: parse_bnetz_pdf(bytes) -> storage.save_bnetz_measurement(parsed, pdf_bytes, source="watcher")
  -> CSV: parse_bnetz_csv(content) -> storage.save_bnetz_measurement(parsed, None, source="csv_import")
    -> Move processed files to processed/ subdirectory
      -> Append filenames to .imported marker

Features:

Watches for .pdf and .csv files in configured directory
Tracks already-imported files via .imported marker (set of filenames)
Moves processed files to processed/ subdirectory
PDF parsing: reuses app/bnetz_parser.py (official BNetzA Messprotokoll format)
CSV parsing: app/bnetz_csv_parser.py (semicolon-separated, German locale numbers)
Partial failure handling: continues importing remaining files on individual errors
get_status() includes watch_dir and last_import_count for UI banner

Related files:

app/bnetz_parser.py - PDF parser for official BNetzA Messprotokolle
app/bnetz_csv_parser.py - CSV parser for BNetzA Desktop App exports
See wiki Example Compose Stacks for sidecar examples

BackupCollector (`app/collectors/backup.py`)

Purpose: Automatically create scheduled backups of all DOCSight data Poll Interval: Configurable via backup_interval_hours (default: 24h) Data Source: Local /data directory (SQLite DB, config, keys) Activation: backup_enabled=true + backup_path configured in Settings

Pipeline:

create_backup_to_file(data_dir, dest_dir)
  -> VACUUM INTO for atomic, consistent DB copy
    -> Strip demo data (is_demo=1) from copy
      -> Pack into .tar.gz with backup_meta.json
        -> cleanup_old_backups(dest_dir, keep=retention)

Features:

Atomic SQLite copy via VACUUM INTO (no WAL issues, consistent snapshot)
backup_meta.json with format version, timestamp, app version, table row counts
Configurable retention (number of backups to keep)
Server-side directory browser for selecting backup path
Manual download (in-memory .tar.gz) via Settings UI
Restore from setup wizard on fresh instances

Related files:

app/backup.py - Core backup/restore logic (no Flask dependency)
app/templates/setup.html - Restore option in setup wizard

Notifier (`app/notifier.py`)

Purpose: Route event notifications to external channels Trigger: Called by EventDetector when anomalies are detected Channels: Webhook (ntfy, Discord, Gotify, custom endpoints)

Features:

Severity filtering (minimum severity threshold)
Cooldown period to prevent notification spam
Template-based message formatting
Configurable via Settings UI (webhook URL, headers, severity)

Driver Architecture

Modem drivers live in app/drivers/ and implement the ModemDriver base class:

class ModemDriver(ABC):
    def __init__(self, url: str, user: str, password: str): ...

    @abstractmethod
    def login(self) -> None: ...

    @abstractmethod
    def get_docsis_data(self) -> dict: ...

    @abstractmethod
    def get_device_info(self) -> dict: ...

    @abstractmethod
    def get_connection_info(self) -> dict: ...

Supported Drivers

Driver	Module	Hardware	Auth
`fritzbox`	`fritzbox.py`	AVM FRITZ!Box	SID-based (data.lua)
`tc4400`	`tc4400.py`	Technicolor TC4400	SNMP
`ultrahub7`	`ultrahub7.py`	Vodafone Ultra Hub 7	Session cookie
`cm3500`	`cm3500.py`	Arris CM3500B	Form POST (IP-based session)
`connectbox`	`connectbox.py`	Unitymedia Connect Box (CH7465)	Session cookie
`vodafone_station`	`vodafone_station.py`	CGA6444VF, CGA4322DE, TG3442DE	Auto-detected (see below)
`cm3000`	`cm3000.py`	Netgear CM3000	HTTP Basic Auth
`surfboard`	`surfboard.py`	Arris SURFboard S33/S34/SB8200	HNAP1 HMAC-SHA256
`cm8200`	`cm8200.py`	Arris Touchstone CM8200A	Base64 query string
`generic`	`generic.py`	Generic Router (no DOCSIS)	None

Driver Registry (`app/drivers/init.py`)

Drivers are loaded by name via load_driver(modem_type, url, user, password). The registry maps type strings to fully qualified class paths for lazy importing.

Vodafone Station Auto-Detection

The Vodafone Station driver supports two hardware variants with different auth flows:

CGA (CGA6444VF / CGA4322DE): Double PBKDF2-SHA256 + JSON REST API
TG (TG3442DE): AES-CCM encrypted credentials + HTML/AJAX endpoints

Variant is auto-detected on first login: CGA is tried first, then TG on failure.

Data Flow

Modem Data Collection

┌──────────────────────────────────────────────────────────────┐
│ 1. Polling Loop (main.py)                                    │
│    if modem_collector.should_poll():                         │
│      result = modem_collector.collect()                      │
└────────────────────┬─────────────────────────────────────────┘
                     │
                     ▼
┌──────────────────────────────────────────────────────────────┐
│ 2. ModemCollector (collectors/modem.py)                      │
│    driver.login()                                            │
│    data = driver.get_docsis_data()                           │
│    analysis = analyzer.analyze(data)                         │
└────────────────────┬─────────────────────────────────────────┘
                     │
                     ▼
┌──────────────────────────────────────────────────────────────┐
│ 3. Analyzer (analyzer.py)                                    │
│    • Load thresholds from thresholds.json                    │
│    • Parse DS/US channels                                    │
│    • Assess power, SNR, errors per channel                   │
│    • Aggregate to overall health (good/tolerated/marginal/critical) │
│    • Return structured analysis dict                         │
└────────────────────┬─────────────────────────────────────────┘
                     │
                     ▼
┌──────────────────────────────────────────────────────────────┐
│ 4. Event Detector (event_detector.py)                        │
│    events = detector.check(analysis)                         │
│    • Compare to previous snapshot                            │
│    • Detect power shifts, SNR drops, modulation changes      │
│    • Generate event records with severity                    │
└────────────────────┬─────────────────────────────────────────┘
                     │
                     ├──────────────┬──────────────┬────────────┐
                     │              │              │            │
                     ▼              ▼              ▼            ▼
         ┌────────────────┐ ┌────────────┐ ┌──────────┐ ┌──────────┐
         │    Storage     │ │    MQTT    │ │   Web    │ │  Return  │
         │ save_snapshot()│ │  publish() │ │  update  │ │  Result  │
         │ save_events()  │ │  (HA)      │ │  _state()│ │          │
         └────────────────┘ └────────────┘ └──────────┘ └──────────┘

Web API Request Flow

User clicks refresh button
    │
    ▼
POST /api/poll  (web.py)
    │
    ├─ Rate limit check (10s cooldown)
    │
    ├─ Acquire _collect_lock (non-blocking)
    │   └─ If busy → 429 "Poll already in progress"
    │
    ▼
result = modem_collector.collect()
    │
    ├─ (same flow as automatic polling)
    │
    ▼
Return JSON { success: true, analysis: {...} }

Key: Manual refresh uses the same collector as automatic polling, ensuring consistent fail-safe behavior. The _collect_lock prevents collision with a parallel auto-poll.

Storage Layer

Database: SQLite (/data/docsis_history.db) with WAL mode for concurrent access

Timestamp convention: All timestamp, created_at, updated_at, and last_used_at columns store UTC with Z-suffix (YYYY-MM-DDTHH:MM:SSZ). Date-only columns (date, start_date, end_date) store calendar dates (YYYY-MM-DD) without timezone conversion. On first startup after upgrade, existing naive local timestamps are automatically migrated to UTC using the configured timezone. A safety backup (docsis_history.db.pre_utc_migration) is created before conversion.

Schema:

-- DOCSIS signal snapshots
CREATE TABLE snapshots (
    id INTEGER PRIMARY KEY,
    timestamp TEXT NOT NULL,  -- UTC with Z-suffix
    summary_json TEXT,
    ds_channels_json TEXT,
    us_channels_json TEXT,
    is_demo INTEGER NOT NULL DEFAULT 0
);

-- Speed test results (cached from Speedtest Tracker)
CREATE TABLE speedtest_results (
    id INTEGER PRIMARY KEY,
    timestamp TEXT NOT NULL,
    download_mbps REAL,
    upload_mbps REAL,
    ping_ms REAL,
    ...,
    is_demo INTEGER NOT NULL DEFAULT 0
);

-- Event log (anomaly detection)
CREATE TABLE events (
    id INTEGER PRIMARY KEY,
    timestamp TEXT NOT NULL,
    severity TEXT,  -- info|warning|critical
    event_type TEXT,  -- health_change, power_shift, snr_drop, etc.
    message TEXT,
    acknowledged INTEGER,
    is_demo INTEGER NOT NULL DEFAULT 0
);

-- Incident containers (groups)
CREATE TABLE incidents (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    description TEXT,
    status TEXT DEFAULT 'open',  -- open|resolved|escalated
    start_date TEXT,
    end_date TEXT,
    icon TEXT,
    created_at TEXT,
    updated_at TEXT,
    is_demo INTEGER NOT NULL DEFAULT 0
);

-- Journal entries (formerly "incidents")
CREATE TABLE journal_entries (
    id INTEGER PRIMARY KEY,
    date TEXT,
    title TEXT,
    description TEXT,
    icon TEXT,
    incident_id INTEGER,  -- FK to incidents.id, nullable
    created_at TEXT,
    updated_at TEXT,
    is_demo INTEGER NOT NULL DEFAULT 0
);

-- Journal attachments
CREATE TABLE journal_attachments (
    id INTEGER PRIMARY KEY,
    entry_id INTEGER,  -- FK to journal_entries.id
    filename TEXT,
    mime_type TEXT,
    data BLOB,
    created_at TEXT
);

-- BQM graphs
CREATE TABLE bqm_graphs (
    id INTEGER PRIMARY KEY,
    date TEXT UNIQUE,
    timestamp TEXT,
    image_blob BLOB,
    is_demo INTEGER NOT NULL DEFAULT 0
);

-- BNetzA broadband measurements
CREATE TABLE bnetz_measurements (
    id INTEGER PRIMARY KEY,
    date TEXT NOT NULL,
    timestamp TEXT NOT NULL,
    provider TEXT,
    tariff TEXT,
    download_max_tariff REAL,
    download_normal_tariff REAL,
    download_min_tariff REAL,
    upload_max_tariff REAL,
    upload_normal_tariff REAL,
    upload_min_tariff REAL,
    download_measured_avg REAL,
    upload_measured_avg REAL,
    measurement_count INTEGER,
    verdict_download TEXT,
    verdict_upload TEXT,
    measurements_json TEXT,
    pdf_blob BLOB,               -- NULL for CSV imports
    source TEXT DEFAULT 'upload'  -- 'upload', 'watcher', 'csv_import'
);

Retention: Configurable via history_days setting (default: 7 days)

Web Layer

Framework: Flask
Port: 8765 (configurable)
Auth: Optional password protection (bcrypt hashing) + API token authentication (Bearer tokens)

API Endpoints

Endpoint	Method	Purpose
`/`	GET	Main dashboard
`/settings`	GET	Configuration UI
`/api/poll`	POST	Manual data refresh
`/api/config`	POST	Save configuration
`/api/test-modem`	POST	Test modem connection
`/api/collectors/status`	GET	NEW: Collector health monitoring
`/api/snapshots`	GET	Available snapshot timestamps
`/api/trends`	GET	Trend data (day/week/month)
`/api/speedtest`	GET	Cached speedtest results
`/api/speedtest/<id>/signal`	GET	Correlated DOCSIS snapshot
`/api/events`	GET	Event log with filters
`/api/correlation`	GET	Cross-source timeline
`/api/journal`	GET/POST	Journal entries (list, create)
`/api/journal/<id>`	GET/PUT/DELETE	Single journal entry CRUD
`/api/journal/<id>/attachments`	POST	Upload entry attachments
`/api/journal/import/*`	POST	Excel/CSV import (preview + confirm)
`/api/incidents`	GET/POST	Incident containers (list, create)
`/api/incidents/<id>`	GET/PUT/DELETE	Single incident container CRUD
`/api/incidents/<id>/assign`	POST	Assign entries to incident
`/api/journal/export`	GET	Export journal entries (CSV, JSON, or Markdown)
`/api/tokens`	GET/POST	List or create API tokens (POST requires session auth)
`/api/tokens/<id>`	DELETE	Revoke an API token (requires session auth)
`/api/export`	GET	LLM-optimized report
`/api/report`	GET	PDF incident report
`/api/bnetz/upload`	POST	Upload BNetzA measurement (PDF or CSV)
`/api/bnetz/measurements`	GET	List BNetzA measurements
`/api/bnetz/pdf/<id>`	GET	Download original measurement PDF
`/api/bnetz/<id>`	DELETE	Delete a BNetzA measurement
`/api/demo/migrate`	POST	Switch from demo to live mode (purges demo data, preserves user data)
`/api/backup`	POST	Create and download backup (.tar.gz)
`/api/backup/scheduled`	POST	Create backup in configured backup path
`/api/backup/list`	GET	List existing backups in backup path
`/api/backup/<filename>`	DELETE	Delete a backup file
`/api/restore/validate`	POST	Validate uploaded backup archive
`/api/restore`	POST	Restore from uploaded backup archive
`/api/browse`	GET	Server-side directory browser for path selection

New in v2.0:

GET /api/collectors/status

[
  {
    "name": "modem",
    "enabled": true,
    "consecutive_failures": 0,
    "penalty_seconds": 0,
    "poll_interval": 900,
    "effective_interval": 900,
    "last_poll": 1771140669.27,
    "next_poll_in": 890
  },
  {
    "name": "speedtest",
    "enabled": true,
    "consecutive_failures": 2,
    "penalty_seconds": 60,
    "poll_interval": 300,
    "effective_interval": 360,
    "last_poll": 1771140670.19,
    "next_poll_in": 120
  },
  ...
]

Configuration

File: /data/config.json (AES-128 encrypted)
Format: JSON

Key Settings:

{
  "modem_type": "fritzbox",
  "modem_url": "http://192.168.178.1",
  "modem_user": "user",
  "modem_password": "<encrypted>",
  "poll_interval": 900,
  "history_days": 7,
  "mqtt_host": "localhost",
  "speedtest_tracker_url": "http://...",
  ...
}

Override: Environment variables take precedence over config.json

Demo Mode: Set DEMO_MODE=true to run without a real modem. The DemoCollector replaces the ModemCollector and generates 9 months of realistic simulated data. No modem password required, setup page is bypassed. All demo-seeded rows are tagged with is_demo=1 so they can be cleanly purged when switching to live mode via POST /api/demo/migrate.

Extending DOCSight

Adding a New Collector

Create collector class:

# app/collectors/my_collector.py
from .base import Collector, CollectorResult

class MyCollector(Collector):
    name = "my_source"
    
    def __init__(self, config_mgr, storage, poll_interval):
        super().__init__(poll_interval)
        self._config = config_mgr
        self._storage = storage
    
    def collect(self) -> CollectorResult:
        try:
            # Fetch data from external API
            data = self._fetch_data()
            
            # Store results
            self._storage.save_my_data(data)
            
            return CollectorResult.ok(self.name, data)
        except Exception as e:
            return CollectorResult.failure(self.name, str(e))
    
    def is_enabled(self) -> bool:
        # Check if configured
        return bool(self._config.get("my_source_url"))

Register in discovery:

# app/collectors/__init__.py
from .my_collector import MyCollector

COLLECTOR_REGISTRY = {
    "modem": ModemCollector,
    "demo": DemoCollector,
    "speedtest": SpeedtestCollector,
    "bqm": BQMCollector,
    "bnetz_watcher": BnetzWatcherCollector,
    "backup": BackupCollector,
    "my_source": MyCollector,  # Add here
}

def discover_collectors(...):
    collectors = []
    
    # ... existing collectors ...
    
    # My new collector
    if config_mgr.get("my_source_url"):
        collectors.append(MyCollector(
            config_mgr=config_mgr,
            storage=storage,
            poll_interval=3600,  # 1 hour
        ))
    
    return collectors

Add configuration UI:

Update app/templates/settings.html to include configuration fields for your collector.

Add tests:

# tests/test_my_collector.py
def test_my_collector_success():
    collector = MyCollector(...)
    result = collector.collect()
    assert result.success

That's it! The collector will:

Poll automatically at your specified interval
Apply fail-safe on errors
Report health via /api/collectors/status
Integrate with the rest of the system

Security

Password Storage:

Admin password: scrypt/pbkdf2 hashed
Modem password: AES-128 encrypted
MQTT password: AES-128 encrypted
Config file: chmod 600 (owner-only)

Session Management:

Flask sessions with HTTPOnly cookies
SameSite=Strict policy
Persistent session key in /data/.session_key

Rate Limiting:

Login: 5 attempts per 15 minutes per IP
Manual poll: 10 second cooldown

Headers:

HSTS (Strict-Transport-Security)
X-Frame-Options: DENY
X-Content-Type-Options: nosniff
CSP (Content Security Policy) for XSS protection

Testing

Framework: pytest Coverage: Extensive pytest suite across analyzers, collectors, storage, web, config, and integrations

Run tests:

python -m pytest tests/ -v

Test Categories:

Analyzer: DOCSIS threshold logic
Collectors: Data collection and fail-safe
Storage: Database operations
Web: API endpoints and auth
Event detection: Anomaly detection
Config: Configuration management
BNetzA: PDF/CSV parsing and file watcher
Demo mode: is_demo marking, purge, migration, idempotency
Backup: create, validate, restore, cleanup, directory browser
CM3500: HTML parsing, service flow parsing, HTTPS enforcement
Notifier: webhook delivery, severity filtering, cooldown
i18n: Translation completeness

Performance

Memory: ~50-100 MB typical (depends on history_days)
CPU: <1% average (spikes during polling)
Disk: ~1-5 MB per day (depends on poll_interval and enabled collectors)
Network: Minimal (only modem queries + optional external APIs)

Optimization:

Speedtest results cached locally (reduces API calls)
SQLite with indexes for fast queries
uPlot for lightweight charting (~50KB vs Chart.js 204KB)

Deployment

Recommended: Docker Compose

services:
  docsight:
    image: ghcr.io/itsdnns/docsight:latest
    container_name: docsight
    restart: unless-stopped
    ports:
      - "8765:8765"
    volumes:
      - docsight_data:/data
      - docsight_backup:/backup  # Optional: for scheduled backups
    environment:
      - TZ=Europe/Berlin

volumes:
  docsight_data:
  docsight_backup:

Data persistence: All data in /data volume

Troubleshooting

Check collector status:

curl http://localhost:8765/api/collectors/status | jq .

Check logs:

docker logs docsight

Common issues:

Modem collector failing:
- Check modem URL and credentials
- Verify modem is on same network
- Check /api/collectors/status for penalty state
Speedtest not updating:
- Verify Speedtest Tracker URL and token
- Check /api/collectors/status for errors
High penalty on collector:
- Auto-resets after 24h idle
- Fix underlying issue (credentials, network)
- Restart container to reset immediately

License

MIT

Uh oh!

FilesExpand file tree

ARCHITECTURE.md

Latest commit

History

ARCHITECTURE.md

File metadata and controls

DOCSight Architecture

Overview

System Architecture

Collector Pattern

Base Collector Class

Collector Lifecycle

Parallel Execution

Thread Safety

Fail-Safe Mechanism

Implemented Collectors

ModemCollector (app/collectors/modem.py)

SpeedtestCollector (app/collectors/speedtest.py)

DemoCollector (app/collectors/demo.py)

BQMCollector (app/collectors/bqm.py)

BnetzWatcherCollector (app/collectors/bnetz_watcher.py)

BackupCollector (app/collectors/backup.py)

Notifier (app/notifier.py)

Driver Architecture

Supported Drivers

Driver Registry (app/drivers/__init__.py)

Vodafone Station Auto-Detection

Data Flow

Modem Data Collection

Web API Request Flow

Storage Layer

Web Layer

API Endpoints

Configuration

Extending DOCSight

Adding a New Collector

Security

Testing

Performance

Deployment

Troubleshooting

License

Further Reading

ModemCollector (`app/collectors/modem.py`)

SpeedtestCollector (`app/collectors/speedtest.py`)

DemoCollector (`app/collectors/demo.py`)

BQMCollector (`app/collectors/bqm.py`)

BnetzWatcherCollector (`app/collectors/bnetz_watcher.py`)

BackupCollector (`app/collectors/backup.py`)

Notifier (`app/notifier.py`)

Driver Registry (`app/drivers/init.py`)