Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
30 changes: 30 additions & 0 deletions .github/workflows/actions.yml
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,36 @@ jobs:
uv pip install pre-commit
uv run pre-commit run -a

pyright:
runs-on: ubuntu-latest
timeout-minutes: 10
steps:
- name: Checkout
uses: actions/checkout@v4

- name: Install uv
uses: astral-sh/setup-uv@v7
with:
python-version: "3.13"

- name: Cache uv
uses: actions/cache@v4
with:
path: ~/.cache/uv
key: ${{ runner.os }}-uv-pyright-${{ hashFiles('requirements-dev.lock') }}
restore-keys: |
${{ runner.os }}-uv-pyright-
${{ runner.os }}-uv-

- name: Typecheck with Pyright
env:
SETUPTOOLS_SCM_WRITE_TO_SOURCE: "1"
run: |
uv venv
uv pip install -r requirements-dev.lock
uv pip install -e .
uv run pyright

test:
runs-on: ubuntu-latest
timeout-minutes: 15
Expand Down
6 changes: 4 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -129,7 +129,7 @@ python -m pytest --tb=short --cov=. --cov-report=term-missing --cov-fail-under=9

Coverage writes a local **`.coverage`** file (binary data used by `coverage.py`; safe to delete). It is listed in `.gitignore`.

**CI:** [`.github/workflows/actions.yml`](.github/workflows/actions.yml) runs pytest with `DATABASE_URL=postgres://postgres:postgres@127.0.0.1:5432/postgres` and the same `DJANGO_SETTINGS_MODULE=config.test_settings` as local.
**CI:** [`.github/workflows/actions.yml`](.github/workflows/actions.yml) runs three jobs on pushes/PRs (see the workflow for triggers): **`lint`** (pre-commit on all files), **`pyright`** (static analysis from `pyrightconfig.json`), and **`test`** (pytest with Postgres, `DATABASE_URL=postgres://postgres:postgres@127.0.0.1:5432/postgres`, `DJANGO_SETTINGS_MODULE=config.test_settings`, coverage, and `--cov-fail-under=90`).

6. Run a subset of tests (e.g. one app or one file):

Expand All @@ -140,6 +140,8 @@ python -m pytest github_activity_tracker/tests/test_sync_utils.py -v

CI runs pytest with coverage (`--cov`, HTML/XML reports). To match a **local** coverage gate, use **`--cov-fail-under=90`** (see step 5 above). If coverage fails locally or you need a fresh test DB schema after model changes, run once with `python -m pytest --create-db`.

**Pyright (local):** with dev dependencies installed (`uv pip install -r requirements-dev.lock`), run **`uv run pyright`** from the repo root to match the **`pyright`** CI job (`pyrightconfig.json` scopes `core`, `github_activity_tracker`, and `discord_activity_tracker`).

See [docs/Development_guideline.md](docs/Development_guideline.md#testing-workflow) for when to run tests during development.

## Project structure
Expand Down Expand Up @@ -202,7 +204,7 @@ Docs are organized **by topic** (one doc per concern: workflow, workspace, servi

- [Onboarding.md](docs/Onboarding.md) – First-day orientation for contributors (mental model, app roles, data flow).
- [docs/README.md](docs/README.md) – Per-topic index and how to find app-specific info.
- [Running tests](#running-tests) – How to run the test suite (pytest, coverage).
- [Running tests](#running-tests) – How to run the test suite (pytest, coverage) and **Pyright** (`uv run pyright`).
- [Celery](#celery) – How to start the Celery worker and Beat.
- [Celery_test.md](docs/Celery_test.md) – Testing the Celery task (run once, Beat, Redis).
- [operations/](docs/operations/README.md) – **Operations group:** shared I/O (GitHub, Discord, etc.); index and per-operation docs.
Expand Down
38 changes: 36 additions & 2 deletions core/collectors/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -17,11 +17,45 @@


class CollectorBase(_CollectorLifecycleMixin, ABC):
"""Base type for collectors run via management commands or YAML schedules."""
"""
Legacy abstract base for collectors run via management commands or YAML schedules.

**Purpose:** Provide a single orchestration hook, :meth:`run`, while sharing
Pinecone and structured logging behavior with :class:`AbstractCollector` through
:class:`_CollectorLifecycleMixin`.

**Subclasses must implement:** :meth:`run` only. For new collectors, prefer
:class:`AbstractCollector` (``name``, ``validate_config``, ``collect``); its
concrete :meth:`~AbstractCollector.run` calls those hooks so
:class:`~core.collectors.command_base.BaseCollectorCommand` stays unchanged.

**Lifecycle hooks (inherited):** :meth:`~_CollectorLifecycleMixin.sync_pinecone`
(optional post-run, default no-op) and :meth:`~_CollectorLifecycleMixin.handle_error`
(structured logging). When the collector is driven by
:class:`~core.collectors.command_base.BaseCollectorCommand`, attribute
``_error_phase`` is set to ``\"run\"`` or ``\"sync_pinecone\"`` for the duration
of each phase so logs include ``collector_phase``.

**Error handling contract:** Exceptions other than :class:`django.core.management.base.CommandError`
are passed to :meth:`~_CollectorLifecycleMixin.handle_error`, which uses
:func:`core.errors.classify_failure` and ``logger.exception`` with ``failure_category``
in ``extra``. The original exception is then re-raised. ``CommandError`` is logged
by the command layer without calling :meth:`~_CollectorLifecycleMixin.handle_error`.
"""

@abstractmethod
def run(self) -> None:
"""Execute the collector work unit."""
"""
Execute the collector work unit.

Returns:
None: Implementations should return implicitly after successful work.

Note:
Prefer :class:`AbstractCollector` when splitting validation from
collection improves clarity; it still satisfies this base's contract via
its concrete :meth:`~AbstractCollector.run`.
"""


class DjangoCommandCollector(CollectorBase):
Expand Down
86 changes: 75 additions & 11 deletions core/collectors/base_collector.py
Original file line number Diff line number Diff line change
Expand Up @@ -19,25 +19,56 @@

@runtime_checkable
class CollectorRunnable(Protocol):
"""Collector instance with ``run``, ``sync_pinecone``, and ``handle_error`` (see ``BaseCollectorCommand``)."""
"""
Structural type for objects executed by :class:`BaseCollectorCommand`.

def run(self) -> None: ...
Implementations are typically :class:`CollectorBase` or :class:`AbstractCollector`
subclasses. The command invokes :meth:`run`, then :meth:`sync_pinecone`, and
routes failures through :meth:`handle_error` (except :class:`~django.core.management.base.CommandError`).
"""

def sync_pinecone(self) -> None: ...
def run(self) -> None:
"""Main collection phase; see :class:`CollectorBase` or :class:`AbstractCollector`."""
...

def handle_error(self, exc: BaseException) -> None: ...
def sync_pinecone(self) -> None:
"""Optional post-run sync; may be a no-op."""
...

def handle_error(self, exc: BaseException) -> None:
"""Log *exc* with structured ``failure_category``; must not swallow *exc*."""
...


class _CollectorLifecycleMixin:
"""
Shared ``handle_error`` / ``sync_pinecone`` for legacy and structured collectors.

Uses :func:`core.errors.classify_failure` so logs align with
:class:`core.errors.CollectorFailureCategory`.
Uses :func:`core.errors.classify_failure` (not a method on
:class:`~core.errors.CollectorFailureCategory`) so log ``extra`` includes a stable
``failure_category`` enum value.

**``_error_phase``:** Set only by :class:`~core.collectors.command_base.BaseCollectorCommand`
around each phase; used when logging. If :func:`~core.errors.classify_failure`
or logging raises, the command's ``finally`` still clears ``_error_phase`` on the
collector instance.

**Intentional gaps:** Many domain or SDK exceptions map to ``unknown``. Override
:meth:`handle_error` when you need a different category or extra context.
"""

def handle_error(self, exc: BaseException) -> None:
"""Log failures with ``failure_category`` from :class:`CollectorFailureCategory`."""
"""
Log *exc* at exception level with structured fields for metrics and alerting.

Args:
exc: The exception from ``run`` or ``sync_pinecone`` (never a
:class:`~django.core.management.base.CommandError`; those are handled
in the command).

``logger.exception`` receives ``extra`` keys: ``collector``, ``collector_phase``,
``failure_category`` (string value of :class:`~core.errors.CollectorFailureCategory`).
"""
category = classify_failure(exc)
phase = getattr(self, "_error_phase", None) or "unknown"
collector_id = getattr(self, "name", None)
Expand All @@ -56,7 +87,12 @@ def handle_error(self, exc: BaseException) -> None:
)

def sync_pinecone(self) -> None:
"""Optional post-run Pinecone sync; default is no-op."""
"""
Optional post-run Pinecone sync; default is no-op.

Returns:
None
"""
return None


Expand All @@ -74,15 +110,43 @@ class AbstractCollector(_CollectorLifecycleMixin, ABC):
@property
@abstractmethod
def name(self) -> str:
"""Stable collector id for logs and metrics (e.g. app or command slug)."""
"""
Stable collector id for logs and metrics (e.g. app or command slug).

Returns:
Non-empty string used as the ``collector`` field in structured logs when
:meth:`handle_error` runs.
"""

@abstractmethod
def validate_config(self) -> None:
"""Raise or no-op before :meth:`collect`; keep side effects in services."""
"""
Validate settings and environment before :meth:`collect`.

Returns:
None

Raises:
Exception: Typically validation-related errors; should not perform
heavy I/O—keep that in :meth:`collect` via services.
"""

@abstractmethod
def collect(self) -> None:
"""Main collection work; DB writes belong in ``services.py``."""
"""
Main collection work (fetch, transform, persist).

Returns:
None

Raises:
Exception: Domain failures; propagate after logging when run under
:class:`~core.collectors.command_base.BaseCollectorCommand`.

Note:
DB writes should go through the app's ``services.py`` module per project
conventions.
"""

def run(self) -> None:
self.validate_config()
Expand Down
62 changes: 54 additions & 8 deletions core/collectors/command_base.py
Original file line number Diff line number Diff line change
@@ -1,9 +1,17 @@
"""Django management command base class for CollectorBase-backed collectors."""

# Design notes (review summary):
# - Template method: handle() -> get_collector(**options) -> phase(run) -> phase(sync_pinecone).
# - ABC: subclasses that omit get_collector() raise TypeError at instantiation, not at import.
# - Each _run_collector_phase uses try/except/finally; finally clears _error_phase even if
# handle_error, classify_failure, or logging raises (double fault cleanup).
# - classify_failure (core.errors) maps a core dependency surface; many SDK/DB errors stay
# unknown—override handle_error on the collector when you need a specific category.

from __future__ import annotations

import logging
from abc import abstractmethod
from abc import ABC, abstractmethod
from typing import Any

from django.core.management.base import BaseCommand, CommandError
Expand All @@ -13,12 +21,42 @@
logger = logging.getLogger(__name__)


class BaseCollectorCommand(BaseCommand):
"""Runs ``get_collector(**options).run()`` then ``sync_pinecone()`` with shared error handling."""
class BaseCollectorCommand(ABC, BaseCommand):
"""
Thin Django ``BaseCommand`` adapter using the template-method pattern.

**Flow:** :meth:`django.core.management.base.BaseCommand.handle` is implemented as
``get_collector(**options)``, then :meth:`_run_collector_phase` with ``collector.run``,
then :meth:`_run_collector_phase` with ``collector.sync_pinecone``.

**``get_collector`` contract:** Must return a :class:`CollectorRunnable`—any object
with ``run()``, ``sync_pinecone()``, and ``handle_error(exc)``. Typical implementations
return a :class:`~core.collectors.base.CollectorBase` or
:class:`~core.collectors.base_collector.AbstractCollector` instance. Subclasses that
do not implement :meth:`get_collector` cannot be instantiated (``TypeError`` from
``abc``), which surfaces as soon as the command object is constructed, usually when
Django loads the command.

**Errors:** :class:`~django.core.management.base.CommandError` is logged with
``failure_category`` set to ``\"command\"`` and re-raised without calling
``handle_error``. Any other :class:`Exception` is passed to ``collector.handle_error``
(which classifies via :func:`core.errors.classify_failure` and logs), then re-raised.
A ``finally`` block always removes ``collector._error_phase`` after each phase.
"""

@abstractmethod
def get_collector(self, **options: Any) -> CollectorRunnable:
"""Instantiate the collector from parsed CLI options."""
"""
Build the collector instance from parsed CLI options.

Args:
**options: Keyword arguments forwarded from :meth:`handle` (Django-parsed
command-line options and defaults).

Returns:
A :class:`CollectorRunnable` executed by :meth:`handle` (``run`` then
``sync_pinecone``).
"""

def handle(self, *args: Any, **options: Any) -> None:
collector = self.get_collector(**options)
Expand All @@ -31,11 +69,19 @@ def _run_collector_phase(
phase: Any,
) -> None:
"""
Run one collector phase (``run`` or ``sync_pinecone``).
Run a single zero-argument callable phase on *collector*.

Sets ``collector._error_phase`` to the callable's ``__name__`` (for example
``\"run\"`` or ``\"sync_pinecone\"``) before invoking *phase*, clears it in
``finally``, and routes failures per :class:`BaseCollectorCommand` error rules.

Args:
collector: Object providing ``handle_error`` for non-command failures.
phase: Bound method or callable with no arguments (typically
``collector.run`` or ``collector.sync_pinecone``).

On unexpected exceptions, :meth:`CollectorRunnable.handle_error` is invoked with
``collector._error_phase`` set to the phase name for structured logs
(``collector``, ``collector_phase``, ``failure_category`` on the log record).
Returns:
None
"""
phase_name = getattr(phase, "__name__", str(phase))
setattr(collector, "_error_phase", phase_name)
Expand Down
Loading
Loading