Skip to content

feat(discord): validate staging JSON with Pydantic; document schema; align exporter fields#198

Merged
wpak-ai merged 7 commits into
cppalliance:developfrom
leostar0412:feature/discord-tracker-schema-validation
May 13, 2026
Merged

feat(discord): validate staging JSON with Pydantic; document schema; align exporter fields#198
wpak-ai merged 7 commits into
cppalliance:developfrom
leostar0412:feature/discord-tracker-schema-validation

Conversation

@leostar0412
Copy link
Copy Markdown
Collaborator

@leostar0412 leostar0412 commented May 12, 2026

Summary

Adds a documented Discord staging JSON model, validates envelopes and normalized messages with Pydantic before DB writes in backfill and run_discord_activity_tracker, ships a generated JSON Schema (discord_staging_v1.json), and tightens exporter output (canonical timestamps, actor/source fields where applicable). Shared parsing and truncation live in core so Discord and GitHub trackers stay consistent.

Changes

  • Docs: docs/discord-tracker-schema.md — envelope, normalized message, reactions, limitations, JSON Schema vs runtime validation.
  • Discord: staging_schema.py, schemas/discord_staging_v1.json, scripts/write_staging_json_schema.py; validation wired in backfill + run command; chat_exporter / workspace iteration updates; tests for schema, workspace ordering, exporter, sync utils.
  • Core: format_instant_iso_z / parse_iso_datetime_lenient (and tests); truncate_content helper + tests.
  • GitHub: sync/utils.py uses core’s lenient datetime parser (dedupe).
  • Deps: pydantic in requirements.in with lockfile updates.

How to test

pytest discord_activity_tracker/tests/test_staging_schema.py \
  discord_activity_tracker/tests/test_sync_chat_exporter.py \
  discord_activity_tracker/tests/test_run_discord_activity_tracker_command.py \
  core/tests/test_datetime_parsing.py \
  core/tests/test_text_processing.py

Closes #188

Summary by CodeRabbit

  • New Features

    • Lenient ISO datetime parsing/formatting with safe fallbacks
    • Text truncation utility
    • Discord staging JSON Schema, runtime validation, and schema export script
  • Improvements

    • Consistent datetime normalization across syncs and earlier validation of incoming exports
    • Active-channel logic now derives activity from stored messages; per-channel timestamp fields removed
    • Deterministic ordering of message JSON processing; admin list display simplified
  • Tests

    • Expanded coverage for datetime parsing/formatting, truncation, conversion, staging-schema validation, and sync flows
  • Documentation

    • Added/updated Discord tracker schema and service API docs

Review Change Stack

@leostar0412 leostar0412 self-assigned this May 12, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 12, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds Pydantic staging models and a committed JSON Schema + writer; centralizes lenient ISO datetime parsing, canonical-Z formatting, and text truncation; integrates envelope/normalized-message validation into Discord sync/backfill/export; replaces channel timestamp fields with message-derived activity queries; updates tests, docs, and a migration.

Changes

Discord Staging Schema & Utility Consolidation

Layer / File(s) Summary
Core DateTime & Text Utilities
core/utils/datetime_parsing.py, core/utils/text_processing.py, core/tests/test_datetime_parsing.py, core/tests/test_text_processing.py
Adds parse_iso_datetime_lenient(), format_instant_iso_z(), CANONICAL_INSTANT_UTC_Z_PATTERN, module logger, and truncate_content() with tests covering empty/valid/invalid/edge cases.
Pydantic Schema Models & Validation
discord_activity_tracker/staging_schema.py
Pydantic DiscordChatExporterEnvelope and NormalizedDiscordMessage models, validators, _validation_error translation to ValueError, JSON-schema bundle builder, and write_staging_json_schema() helper.
Committed JSON Schema & Generation Script
discord_activity_tracker/schemas/discord_staging_v1.json, discord_activity_tracker/scripts/write_staging_json_schema.py, discord_activity_tracker/scripts/__init__.py
Committed draft 2020-12 JSON Schema for envelope and normalized message; script to write the schema from models and package docstring.
Chat Exporter Conversion & Normalization
discord_activity_tracker/sync/chat_exporter.py, discord_activity_tracker/tests/test_sync_chat_exporter.py
convert_exporter_message_to_dict() accepts server_id/channel_id, normalizes/coerces and filters reactions, normalizes timestamps, sets occurred_at for canonical timestamps, derives actor_id, and conditionally sets source_url; tests updated.
Sync Messages & Management Commands Integration
discord_activity_tracker/sync/messages.py, discord_activity_tracker/management/commands/*.py
Sync and management commands validate envelopes, use parse_iso_datetime_lenient() for timestamps, convert messages with explicit server/channel ids, validate normalized messages, and stop persisting per-channel last-activity/synced timestamps.
Services, Models & Migration
discord_activity_tracker/services.py, discord_activity_tracker/models.py, discord_activity_tracker/migrations/0006_remove_channel_last_timestamps.py
Removes DiscordChannel.last_activity_at/last_synced_at and index; adds get_channel_latest_message_at, queryset_channels_with_recent_messages, and updates get_active_channels to use message-derived activity; migration added; admin/docs updated.
Export, Workspace Ordering, Tests & Docs
discord_activity_tracker/sync/export.py, discord_activity_tracker/workspace.py, discord_activity_tracker/tests/*, docs/discord-tracker-schema.md
Export delegates active-channel selection to service helper; iter_existing_message_jsons() yields sorted filenames deterministically; adds tests for staging validation, normalized messages, reaction handling, datetime regex/formatting, truncation, and schema documentation.
GitHub Adapter & Requirements
github_activity_tracker/sync/utils.py, requirements.in
GitHub adapter re-exports parse_iso_datetime_lenient from core; pydantic>=2,<3 added to dependencies.
Misc: scripts & cleanup
discord_activity_tracker/scripts/*, discord_activity_tracker/sync/utils.py
Adds script entrypoint to write schema, module docstring, and removes local parse_datetime/truncate_content helpers in favour of core utilities.

Sequence Diagram(s)

sequenceDiagram
  participant Cmd as Management Command
  participant JSONFile as Discord JSON
  participant ValidateEnv as validate_envelope
  participant Convert as convert_exporter_message_to_dict
  participant ValidateMsg as validate_normalized_message
  participant Parser as parse_iso_datetime_lenient
  participant DB as Database

  Cmd->>JSONFile: Load JSON export
  JSONFile-->>Cmd: Raw envelope dict
  Cmd->>ValidateEnv: validate_envelope(raw, source)
  ValidateEnv-->>Cmd: DiscordChatExporterEnvelope model
  Cmd->>Convert: convert_exporter_message_to_dict(msg, server_id, channel_id)
  Convert->>Parser: parse created_at/edited_at -> datetime
  Parser-->>Convert: datetime (or None)
  Convert-->>Cmd: Normalized dict (actor_id, occurred_at, source_url)
  Cmd->>ValidateMsg: validate_normalized_message(normalized, source)
  ValidateMsg-->>Cmd: NormalizedDiscordMessage model
  Cmd->>DB: Persist validated message
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • jonathanMLDev
  • wpak-ai

"🐰 Hop hooray — schemas, dates, and text aligned!
I trimmed long tails and Z-times I signed,
Validators guard each normalized line,
Reactions clean, and source URLs shine,
Hop—tests pass and trackers hum in kind!"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 40.86% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title directly addresses the primary objective: it announces the addition of Pydantic-based staging JSON validation, schema documentation, and alignment of exporter fields to canonical form—exactly what the PR delivers.
Linked Issues check ✅ Passed All coding-related requirements from issue #188 are met: JSON schema documented in docs/discord-tracker-schema.md [#188], Pydantic validation models in staging_schema.py with validate_envelope and validate_normalized_message [#188], snake_case fields and ISO Z timestamps [#188], test coverage for well-formed messages/reactions and malformed rejection [#188], and deterministic workspace file ordering [#188].
Out of Scope Changes check ✅ Passed All changes align with issue #188 scope: core datetime/text utilities support the validation layer, admin/model/service/sync refactoring removes obsolete timestamp-tracking fields, migration removes deprecated columns, and tests validate the new schema/validation infrastructure.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
discord_activity_tracker/management/commands/run_discord_activity_tracker.py (1)

202-207: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Same validation-bypass bug here: consume validate_envelope(...) output.

validate_envelope is currently verification-only in this path. Its normalized/coerced values are discarded, so malformed-but-coercible payloads can still fail later when reading raw data.

Suggested fix
-            data = parse_exported_json(json_path)
-            validate_envelope(data, source=json_path.name)
-            guild_info = data.get("guild", {})
-            channel_info = data.get("channel", {})
-            messages = data.get("messages", [])
+            data = parse_exported_json(json_path)
+            envelope = validate_envelope(data, source=json_path.name)
+            guild_info = envelope.guild.model_dump(by_alias=True)
+            channel_info = envelope.channel.model_dump(by_alias=True)
+            messages = envelope.messages
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@discord_activity_tracker/management/commands/run_discord_activity_tracker.py`
around lines 202 - 207, validate_envelope(...) is being used only for
side-effect validation while the code continues to use the original raw `data`,
discarding normalized/coerced values; change the flow to use the function's
returned/normalized envelope (e.g. assign the result of
validate_envelope(json_path.name) to a variable or replace `data` with that
return value) and then read `guild_info`, `channel_info`, and `messages` from
that validated object (referencing parse_exported_json, validate_envelope, and
the variables `data`, `guild_info`, `channel_info`, `messages`) so you consume
the coerced values rather than the original raw payload.
🧹 Nitpick comments (2)
core/tests/test_text_processing.py (1)

73-86: ⚡ Quick win

Add a regression test for negative max_length behavior

These tests cover the positive-path boundaries well. Add one case for negative max_length to pin expected behavior after the utility guard is introduced.

Suggested test case
+import pytest
+
 def test_truncate_content_max_length_at_most_three():
     assert truncate_content("abcdef", max_length=2) == "ab"
+
+
+def test_truncate_content_negative_max_length_raises():
+    with pytest.raises(ValueError):
+        truncate_content("abcdef", max_length=-1)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/tests/test_text_processing.py` around lines 73 - 86, Add a regression
test for truncate_content when max_length is negative: create a new test (e.g.,
test_truncate_content_negative_max_length) that calls
truncate_content(some_string, max_length=-1) and asserts the expected behavior
(either returns the original string, an empty string, or clamps to 0 depending
on the implemented guard). Locate truncate_content usage in the test module
(functions like test_truncate_content_short_unchanged,
test_truncate_content_long_adds_ellipsis,
test_truncate_content_max_length_at_most_three) and add the negative-case
assertion consistent with the utility's guard.
discord_activity_tracker/tests/test_staging_schema.py (1)

25-35: ⚡ Quick win

Add a negative timestamp-format test to lock canonical Z behavior.

Current tests don’t assert that malformed/non-Z timestamps are rejected, so this contract can regress silently.

🧪 Suggested test addition
+def test_validate_normalized_rejects_non_canonical_timestamp():
+    raw = _minimal_exporter_message()
+    converted = convert_exporter_message_to_dict(raw, server_id=1, channel_id=2)
+    converted["created_at"] = "2026-01-01 12:00:00"  # not ISO Z
+
+    with pytest.raises(ValueError, match="Invalid normalized Discord message"):
+        validate_normalized_message(converted, source="unit")

Also applies to: 55-81

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@discord_activity_tracker/tests/test_staging_schema.py` around lines 25 - 35,
Add a negative test that ensures timestamps without the canonical trailing "Z"
are rejected: create a new test (e.g.,
test_validate_normalized_rejects_non_z_timestamp) that builds a message via
_minimal_exporter_message() and convert_exporter_message_to_dict(...), then
mutate converted["occurred_at"] to a non-`Z` form (e.g., replace trailing "Z"
with "+00:00" or drop the "Z"), and assert
validate_normalized_message(converted, source="test") raises the expected
validation error (use pytest.raises or the project’s ValidationError type).
Repeat the same style of negative assertion for the other related tests in the
55-81 area so the canonical `Z` contract is locked.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@core/utils/text_processing.py`:
- Around line 249-255: The function truncate_content currently allows negative
max_length which triggers Python negative slicing; add an explicit guard at the
start of truncate_content to handle max_length < 0 (e.g., raise a ValueError
with a clear message like "max_length must be non-negative" or otherwise
normalize negative values to 0) before the existing len/content checks so the
function has deterministic behavior for negative limits; keep the existing
branches (<= max_length, <= 3, and the "..."/trimming logic) intact.

In
`@discord_activity_tracker/management/commands/backfill_discord_activity_tracker.py`:
- Around line 83-89: validate_envelope is called but its normalized return value
is ignored, so downstream reads (guild_info/channel_info/messages) still use the
raw data and can be None (e.g. messages). Replace usage of raw data with the
validated envelope result: capture the return of validate_envelope(data,
source=rel) (e.g. validated = validate_envelope(...)) and then derive guild_info
= validated.get("guild", {}), channel_info = validated.get("channel", {}), and
messages = validated.get("messages", []) so the normalized values from
validate_envelope are used.

In `@discord_activity_tracker/schemas/discord_staging_v1.json`:
- Around line 131-135: The schema uses unresolved refs like
"#/$defs/DiscordExporterGuild" and "#/$defs/DiscordExporterChannel"; update
those $ref pointers to the actual locations where the defs live (e.g., change
"#/$defs/DiscordExporterGuild" and "#/$defs/DiscordExporterChannel" to
"#/discord_chat_exporter_envelope/$defs/DiscordExporterGuild" and
"#/discord_chat_exporter_envelope/$defs/DiscordExporterChannel"), and similarly
update refs that point into normalized_discord_message to
"#/normalized_discord_message/$defs/YourDefName" (replace YourDefName with the
exact symbols such as DiscordMessage, etc.) so all $ref values point to the
correct nested paths (use the exact symbol names DiscordExporterGuild,
DiscordExporterChannel, and the normalized_discord_message defs to locate and
fix each reference).
- Around line 233-249: The schema currently allows any non-empty string for
created_at, edited_at, and occurred_at; tighten these fields to enforce
canonical ISO-8601 UTC timestamps by replacing the loose string constraint with
a pattern that requires the full-dateTime with trailing Z (e.g., use a regex
like ^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z$). For created_at and
occurred_at set type: "string" plus the pattern; for edited_at keep the anyOf
allowing null but change the string branch to require the same pattern (and
remove or keep default null as appropriate). Update the schema entries for
created_at, edited_at, and occurred_at accordingly.

In `@discord_activity_tracker/sync/chat_exporter.py`:
- Around line 649-665: The reaction count parsing can raise ValueError when
external JSON supplies a non-numeric "count"; in the loop that builds
reactions_out from msg_data (look for the for reaction in
msg_data.get("reactions", []) block) wrap the conversion of
reaction.get("count", 0) to an int in a safe guard: validate the raw value
(e.g., check isinstance(...) for int/str digits or try/except
ValueError/TypeError) and on failure fall back to 0, then use max(0,
parsed_count) when appending to reactions_out so malformed exporter payloads do
not abort message conversion.

---

Outside diff comments:
In
`@discord_activity_tracker/management/commands/run_discord_activity_tracker.py`:
- Around line 202-207: validate_envelope(...) is being used only for side-effect
validation while the code continues to use the original raw `data`, discarding
normalized/coerced values; change the flow to use the function's
returned/normalized envelope (e.g. assign the result of
validate_envelope(json_path.name) to a variable or replace `data` with that
return value) and then read `guild_info`, `channel_info`, and `messages` from
that validated object (referencing parse_exported_json, validate_envelope, and
the variables `data`, `guild_info`, `channel_info`, `messages`) so you consume
the coerced values rather than the original raw payload.

---

Nitpick comments:
In `@core/tests/test_text_processing.py`:
- Around line 73-86: Add a regression test for truncate_content when max_length
is negative: create a new test (e.g., test_truncate_content_negative_max_length)
that calls truncate_content(some_string, max_length=-1) and asserts the expected
behavior (either returns the original string, an empty string, or clamps to 0
depending on the implemented guard). Locate truncate_content usage in the test
module (functions like test_truncate_content_short_unchanged,
test_truncate_content_long_adds_ellipsis,
test_truncate_content_max_length_at_most_three) and add the negative-case
assertion consistent with the utility's guard.

In `@discord_activity_tracker/tests/test_staging_schema.py`:
- Around line 25-35: Add a negative test that ensures timestamps without the
canonical trailing "Z" are rejected: create a new test (e.g.,
test_validate_normalized_rejects_non_z_timestamp) that builds a message via
_minimal_exporter_message() and convert_exporter_message_to_dict(...), then
mutate converted["occurred_at"] to a non-`Z` form (e.g., replace trailing "Z"
with "+00:00" or drop the "Z"), and assert
validate_normalized_message(converted, source="test") raises the expected
validation error (use pytest.raises or the project’s ValidationError type).
Repeat the same style of negative assertion for the other related tests in the
55-81 area so the canonical `Z` contract is locked.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f47ecc0a-1aeb-48c9-802a-7fdd433b38db

📥 Commits

Reviewing files that changed from the base of the PR and between 43679a0 and 064181b.

⛔ Files ignored due to path filters (2)
  • requirements-dev.lock is excluded by !**/*.lock
  • requirements.lock is excluded by !**/*.lock
📒 Files selected for processing (21)
  • core/tests/test_datetime_parsing.py
  • core/tests/test_text_processing.py
  • core/utils/datetime_parsing.py
  • core/utils/text_processing.py
  • discord_activity_tracker/management/commands/backfill_discord_activity_tracker.py
  • discord_activity_tracker/management/commands/run_discord_activity_tracker.py
  • discord_activity_tracker/schemas/discord_staging_v1.json
  • discord_activity_tracker/scripts/__init__.py
  • discord_activity_tracker/scripts/write_staging_json_schema.py
  • discord_activity_tracker/staging_schema.py
  • discord_activity_tracker/sync/chat_exporter.py
  • discord_activity_tracker/sync/messages.py
  • discord_activity_tracker/sync/utils.py
  • discord_activity_tracker/tests/test_staging_schema.py
  • discord_activity_tracker/tests/test_sync_chat_exporter.py
  • discord_activity_tracker/tests/test_sync_utils.py
  • discord_activity_tracker/tests/test_workspace.py
  • discord_activity_tracker/workspace.py
  • docs/discord-tracker-schema.md
  • github_activity_tracker/sync/utils.py
  • requirements.in

Comment thread core/utils/text_processing.py
Comment thread discord_activity_tracker/schemas/discord_staging_v1.json Outdated
Comment thread discord_activity_tracker/schemas/discord_staging_v1.json
Comment thread discord_activity_tracker/sync/chat_exporter.py
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@discord_activity_tracker/tests/test_staging_schema.py`:
- Around line 106-110: The test test_validate_envelope_rejects_non_list_messages
should use valid guild and channel shapes so the failure comes from messages not
being a list; update the test's call to validate_envelope to pass realistic
guild and channel dicts (matching shapes used elsewhere in tests) instead of
empty {} for "guild" and "channel" while keeping "messages": "nope" and the same
pytest.raises assertion so the error is triggered by the non-list messages.
- Around line 83-103: The test
test_validate_normalized_rejects_created_at_without_z_suffix currently builds a
full hand-crafted payload that may fail for reasons other than the timestamp;
instead, obtain a known-valid normalized message (e.g., from the existing valid
fixture or conversion helper used elsewhere in tests) and only mutate the
created_at field to "2026-01-01T00:00:00+00:00" before calling
validate_normalized_message(source="unit"), so the assertion specifically checks
the Z-suffix rejection and nothing else.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 24720052-20ac-44a4-9a10-072bc9679155

📥 Commits

Reviewing files that changed from the base of the PR and between 064181b and 1efb055.

📒 Files selected for processing (11)
  • core/tests/test_datetime_parsing.py
  • core/tests/test_text_processing.py
  • core/utils/datetime_parsing.py
  • core/utils/text_processing.py
  • discord_activity_tracker/management/commands/backfill_discord_activity_tracker.py
  • discord_activity_tracker/management/commands/run_discord_activity_tracker.py
  • discord_activity_tracker/schemas/discord_staging_v1.json
  • discord_activity_tracker/staging_schema.py
  • discord_activity_tracker/sync/chat_exporter.py
  • discord_activity_tracker/tests/test_staging_schema.py
  • discord_activity_tracker/tests/test_sync_chat_exporter.py
🚧 Files skipped from review as they are similar to previous changes (8)
  • core/tests/test_text_processing.py
  • discord_activity_tracker/tests/test_sync_chat_exporter.py
  • discord_activity_tracker/management/commands/backfill_discord_activity_tracker.py
  • discord_activity_tracker/schemas/discord_staging_v1.json
  • core/utils/datetime_parsing.py
  • discord_activity_tracker/management/commands/run_discord_activity_tracker.py
  • discord_activity_tracker/sync/chat_exporter.py
  • discord_activity_tracker/staging_schema.py

Comment thread discord_activity_tracker/tests/test_staging_schema.py
Comment thread discord_activity_tracker/tests/test_staging_schema.py
…for message creation and improve validation checks.
@leostar0412 leostar0412 requested a review from jonathanMLDev May 12, 2026 19:57
Comment thread discord_activity_tracker/management/commands/backfill_discord_activity_tracker.py Outdated
Comment thread discord_activity_tracker/sync/chat_exporter.py Outdated
Comment thread core/utils/datetime_parsing.py
Comment thread discord_activity_tracker/staging_schema.py Outdated
Comment thread discord_activity_tracker/tests/test_sync_utils.py Outdated
…st_synced_at fields from DiscordChannel model. Update related services and tests to utilize message timestamps instead. Enhance datetime parsing functions for improved error handling and normalization.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@discord_activity_tracker/management/commands/run_discord_activity_tracker.py`:
- Around line 201-204: validate_envelope invocation can raise validation errors
but current generic error handling unlinks (deletes) the input file; change the
flow so validation failures do not remove the source JSON. Wrap the call to
validate_envelope (and subsequent envelope.* access) in a specific try/except
for validation errors (the exception type raised by validate_envelope) and in
that handler preserve the file (e.g., leave it in place or move it to a "failed"
folder) instead of unlinking; keep the existing generic error path for
non-validation exceptions that still perform unlinking. Refer to
validate_envelope, envelope, and json_path to locate the change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 94aaa79b-d3df-43eb-81f3-e9ae929da71f

📥 Commits

Reviewing files that changed from the base of the PR and between 2165660 and af07fd7.

📒 Files selected for processing (18)
  • core/utils/datetime_parsing.py
  • discord_activity_tracker/admin.py
  • discord_activity_tracker/management/commands/backfill_discord_activity_tracker.py
  • discord_activity_tracker/management/commands/run_discord_activity_tracker.py
  • discord_activity_tracker/migrations/0006_remove_channel_last_timestamps.py
  • discord_activity_tracker/models.py
  • discord_activity_tracker/services.py
  • discord_activity_tracker/staging_schema.py
  • discord_activity_tracker/sync/chat_exporter.py
  • discord_activity_tracker/sync/export.py
  • discord_activity_tracker/sync/messages.py
  • discord_activity_tracker/tests/test_messages_more.py
  • discord_activity_tracker/tests/test_services_core.py
  • discord_activity_tracker/tests/test_services_extras.py
  • discord_activity_tracker/tests/test_sync_messages.py
  • discord_activity_tracker/tests/test_sync_utils.py
  • docs/Schema.md
  • docs/service_api/discord_activity_tracker.md
💤 Files with no reviewable changes (4)
  • discord_activity_tracker/admin.py
  • discord_activity_tracker/models.py
  • discord_activity_tracker/tests/test_services_extras.py
  • docs/Schema.md
🚧 Files skipped from review as they are similar to previous changes (2)
  • discord_activity_tracker/sync/chat_exporter.py
  • discord_activity_tracker/staging_schema.py

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
discord_activity_tracker/tests/test_run_discord_activity_tracker_command.py (1)

44-64: ⚡ Quick win

Add a command-level test for file retention on validation failures.

These tests only lock down the helper’s current string matching. The user-visible contract lives in task_discord_sync: staging validation failures should keep the JSON file, while unrelated failures should still delete it. A small command-level test for those two paths would catch regressions that this unit coverage won’t.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@discord_activity_tracker/tests/test_run_discord_activity_tracker_command.py`
around lines 44 - 64, Add a command-level test in
tests/test_run_discord_activity_tracker_command.py that exercises the
task_discord_sync command path (the user-visible contract) to assert that when a
staging validation failure occurs (use
_is_discord_staging_validation_error-friendly input or mock task_discord_sync to
raise that specific ValueError) the original JSON file is preserved on disk, and
that when an unrelated failure occurs the file is deleted; target the task entry
point (task_discord_sync or the command runner used by the tests) and use
temporary files/fixtures to create a JSON file, invoke the command, then assert
file existence for the validation-error case and file absence for the other
error case so regressions of the retention behaviour are caught.
discord_activity_tracker/management/commands/run_discord_activity_tracker.py (1)

376-382: The validation call doesn't cause data loss, but could be clearer.

validate_normalized_message(...) is called for error-checking side effects and its return value (a NormalizedDiscordMessage model) is discarded. The function does transform blank timestamp strings to None, but _prepare_message_data calls parse_iso_datetime_lenient() which already handles both empty strings and None gracefully, so no data is lost.

However, the intent here is unclear: either wire the validated model into converted for consistency, or make the validation call's purpose explicit with a comment (validate early to catch bad data).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@discord_activity_tracker/management/commands/run_discord_activity_tracker.py`
around lines 376 - 382, The call to validate_normalized_message(...) currently
discards its return value which makes intent unclear; either replace each
converted entry with the validated NormalizedDiscordMessage returned by
validate_normalized_message (i.e., assign converted[idx] =
validate_normalized_message(...)) so downstream uses the canonical model, or if
you only want early error checking, add a short comment above the loop stating
"validate only for early error checking; do not replace converted entries" and
keep the call for side-effects; reference validate_normalized_message,
convert_exporter_message_to_dict, _prepare_message_data,
parse_iso_datetime_lenient, converted, and _process_messages_in_batches when
making the change so reviewers can find the logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@discord_activity_tracker/management/commands/run_discord_activity_tracker.py`:
- Around line 60-67: Replace the brittle string-based check in
_is_discord_staging_validation_error with a concrete typed exception: add a new
StagingValidationError class in staging_schema.py and have validate_envelope()
and validate_normalized_message() raise StagingValidationError on validation
failures, then change the caller that currently uses
_is_discord_staging_validation_error (the delete-guard logic around the Discord
staging processing) to catch StagingValidationError directly instead of matching
ValueError text; remove or simplify _is_discord_staging_validation_error to use
isinstance(exc, staging_schema.StagingValidationError) (or inline the isinstance
check where the guard runs) so malformed staging payloads are reliably
preserved.

---

Nitpick comments:
In
`@discord_activity_tracker/management/commands/run_discord_activity_tracker.py`:
- Around line 376-382: The call to validate_normalized_message(...) currently
discards its return value which makes intent unclear; either replace each
converted entry with the validated NormalizedDiscordMessage returned by
validate_normalized_message (i.e., assign converted[idx] =
validate_normalized_message(...)) so downstream uses the canonical model, or if
you only want early error checking, add a short comment above the loop stating
"validate only for early error checking; do not replace converted entries" and
keep the call for side-effects; reference validate_normalized_message,
convert_exporter_message_to_dict, _prepare_message_data,
parse_iso_datetime_lenient, converted, and _process_messages_in_batches when
making the change so reviewers can find the logic.

In `@discord_activity_tracker/tests/test_run_discord_activity_tracker_command.py`:
- Around line 44-64: Add a command-level test in
tests/test_run_discord_activity_tracker_command.py that exercises the
task_discord_sync command path (the user-visible contract) to assert that when a
staging validation failure occurs (use
_is_discord_staging_validation_error-friendly input or mock task_discord_sync to
raise that specific ValueError) the original JSON file is preserved on disk, and
that when an unrelated failure occurs the file is deleted; target the task entry
point (task_discord_sync or the command runner used by the tests) and use
temporary files/fixtures to create a JSON file, invoke the command, then assert
file existence for the validation-error case and file absence for the other
error case so regressions of the retention behaviour are caught.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: eabbdbf5-b74a-454a-8bc7-87497d139ad3

📥 Commits

Reviewing files that changed from the base of the PR and between af07fd7 and b82b25f.

📒 Files selected for processing (2)
  • discord_activity_tracker/management/commands/run_discord_activity_tracker.py
  • discord_activity_tracker/tests/test_run_discord_activity_tracker_command.py

Comment thread discord_activity_tracker/management/commands/run_discord_activity_tracker.py Outdated
…idationError for validation failures in envelope and normalized message processing.
@leostar0412 leostar0412 requested a review from jonathanMLDev May 13, 2026 16:38
…ent tests and updating docstring references for clarity.
@leostar0412 leostar0412 requested a review from wpak-ai May 13, 2026 17:00
@wpak-ai wpak-ai merged commit 4027f51 into cppalliance:develop May 13, 2026
6 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Discord Activity Tracker JSON Design Alignment

3 participants