diff --git a/docs/designs/data-details-center.zh.html b/docs/designs/data-details-center.zh.html
new file mode 100644
index 0000000..4ab1067
--- /dev/null
+++ b/docs/designs/data-details-center.zh.html
@@ -0,0 +1,973 @@
+ Data Details Center Development Design
+
+
+
+

Keystone / Synapse Development Design

+

Data Details Center

+

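Add a unified episode details workbench under Data Ops. It takes over the "view, filter, and act on a single episode" needs previously served by the QA Center, the Cloud Sync Center, and the detail-records card in Data Production Statistics. The first version covers only the paginated detail list, filters, per-episode QA/sync actions, and history drawers. It does not include bulk actions, top-level summary cards, or export.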

+
+
+ Design status +
Design approved, pending implementation
+
Created: 2026-06-06
+
Scope: keystone, synapse
+
+
+ + + +
+

Background

+

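The QA Center, the Cloud Sync Center, and the detail-records card in Data Production Statistics all serve the same need: viewing and filtering episode-level data records. Maintaining a separate detail list in each of the three places duplicates fields, filters, pagination, action columns, and status semantics.

+

This design consolidates episode detail viewing into one new page, Data Details. QA and sync no longer exist as standalone list pages. Instead they become filter dimensions, status columns, and row-level actions inside Data Details.

+
+ Design boundary: this iteration does not cram all statistical analysis, QA processing, and sync-queue monitoring into one large page. Data Details is responsible only for episode detail queries and per-episode actions. The statistics page remains responsible for summaries, trends, and distributions. +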
+
+ +
+

Goals and Non-Goals

+
+
+

Goals

+
    +
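  • Add a Data Ops / Data Details page to the Synapse admin console.
  • +
  • Show all episodes by default, paginated in created_at DESC, id DESC order.
  • +
  • Support filtering by QA status, cloud sync status, time, scene, robot type, device ID, and data collector.
  • +
  • The table shows basic episode information, the latest QA result, cloud sync status, and creation time.
  • +
  • Row-level actions: details, preview, download, re-run QA, sync/resync/retry, QA history, and sync history.
  • +
  • Add a unified list API: GET /api/v1/data-ops/episodes.
  • +
  • Remove the QA Center and Cloud Sync Center page entries from the frontend.
  • +
  • Remove the detail-records card from Data Production Statistics.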
  • +
+
+
+

Non-Goals

+
    +
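  • No bulk QA in the first version.
  • +
  • No bulk sync in the first version.
  • +
  • No top-level statistics cards or status summary cards in the first version.
  • +
  • No CSV or filtered-result export in the first version.
  • +
  • The first version does not remove all legacy backend APIs in one step.
  • +
  • The first version does not support data_collector, display, or any other non-admin role.
  • +
  • The first version does not show long path fields such as MCAP, sidecar, or cloud destination in the main list.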
  • +
+
+
+
+ + + +
+

Filter Design

+

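Filter interactions follow DataProductionStatistics.vue: basic filters, advanced filters, and RemoteSelect multi-select. Users see display names, while the API receives stable IDs or business identifiers.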

+ +

Basic Filters

+
+
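+ Time range +

Reuse the statistics page's time presets and custom time input. By default, Data Details is not restricted to the last 7 days; it loads all data page by page.

+
+
+ Search / Reset +

Search resets to the first page; Reset clears all filters and restores the default sort order.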

+
+
+ +

Advanced Filters

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
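Filter | UI | API parameter | Rule
Scene | RemoteSelect multi-select | scene_id=12,13 | The dropdown shows scene names; the scene ID is sent.
Robot type | RemoteSelect multi-select | robot_type_id=3,5 | The dropdown shows the robot type name/model; the robot_type ID is sent.
Device ID | RemoteSelect multi-select | robot_device_id=robot-001,robot-002 | Exact match on robots.device_id.
Data collector | RemoteSelect multi-select | collector_operator_id=op001,op002 | Exact match on data_collectors.operator_id.
QA status | Multi-select | qa_status=failed,pending_qa | Reuses episodes.qa_status directly; comma-separated multi-select is supported.
Cloud sync status | Multi-select | sync_status=not_started,failed | Reuses the current sync statuses and adds not_started.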
+ +

QA Status Filter

+
+ All + Pending QA pending_qa + QA running qa_running + Failed failed + Needs inspection needs_inspection + Approved approved,inspector_approved + Rejected rejected +
+ +

Cloud Sync Status Filter

+
+ All + Not synced not_started + Queued pending + Syncing in_progress + Synced completed + Failed failed +
+
+ +
+

Table and Row-Level Actions

+

Table Columns

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
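Column | Content | Notes
Episode | episode_id + numeric id | Primary identifier; click to open details.
Task | task_public_id / task_id | Shows the task's business ID; falls back to the numeric ID when it is missing.
Scene | scene_name | The first version does not require showing the subscene.
Robot | robot_type + robot_device_id | Robot type and device ID shown on two stacked lines.
Collector | collector_operator_id | Shows the employee ID.
QA status | qa_status + quality_flag summary | Status badge plus a one-line quality note.
Latest QA | latest_qa_check | Shows the check name, pass/fail, time, and summary.
Cloud sync | sync_status + latest_sync_log | Statuses follow the Cloud Sync Center; not_started when there is no log.
Created at | created_at | Default sort field.
Actions | Pinned to the far right | Sticky action column, consistent with the other admin tables.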
+ +

Fields Excluded from the Main List

+
+ mcap_path + sidecar_path + checksum + cloud_mcap_path + cloud_sidecar_path + destination_path +
+ +

Action Column

+

Details, Preview, and More are shown inline. All other actions go into the More menu so the action column stays narrow.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
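Action | Location | Rule
Details | Inline button | Navigates to the existing EpisodeDetail.
Preview | Inline button | Disabled when qa_status=failed; the backend remains the final gate.
Download MCAP | More menu | Disabled when qa_status=failed.
Download JSON | More menu | Not affected by MCAP QA failures.
Re-run QA | More menu | Disabled when qa_status=qa_running; reuses the existing QA suite API.
Sync / Retry / Resync | More menu | The label is chosen from sync_status and cloud_synced; disabled when QA has not passed.
QA history | More menu | Shown in a right-side drawer on the current page.
Sync history | More menu | Shown in a right-side drawer on the current page.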
+
+ +
+

Backend API

+

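The first version adds a unified list API. Per-episode QA and sync actions temporarily reuse the existing APIs and are not migrated to the data-ops namespace in the same step.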

+ +

List Endpoint

+
GET /api/v1/data-ops/episodes
+  ?limit=20
+  &offset=0
+  &created_at_from=2026-06-01T00:00:00Z
+  &created_at_to=2026-06-06T23:59:59Z
+  &qa_status=failed,pending_qa
+  &sync_status=not_started,failed
+  &scene_id=12,13
+  &robot_type_id=3,5
+  &robot_device_id=robot-001,robot-002
+  &collector_operator_id=op001,op002
+ +
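For illustration, here is a minimal Go sketch of how a caller such as an integration test might assemble this query string. It assumes each multi-value filter is sent as a single comma-joined value, as shown above. `episodeListFilters` and `buildDataOpsEpisodesQuery` are hypothetical names, and the sketch covers only a subset of the parameters.

```go
package main

import (
	"net/url"
	"strconv"
	"strings"
)

// episodeListFilters is a hypothetical client-side subset of the list filters.
type episodeListFilters struct {
	Limit, Offset int
	QAStatuses    []string
	SyncStatuses  []string
	SceneIDs      []int64
}

// buildDataOpsEpisodesQuery joins multi-value filters with commas, matching
// the qa_status=failed,pending_qa style above. Empty filters are omitted.
func buildDataOpsEpisodesQuery(f episodeListFilters) string {
	v := url.Values{}
	v.Set("limit", strconv.Itoa(f.Limit))
	v.Set("offset", strconv.Itoa(f.Offset))
	if len(f.QAStatuses) > 0 {
		v.Set("qa_status", strings.Join(f.QAStatuses, ","))
	}
	if len(f.SyncStatuses) > 0 {
		v.Set("sync_status", strings.Join(f.SyncStatuses, ","))
	}
	if len(f.SceneIDs) > 0 {
		ids := make([]string, len(f.SceneIDs))
		for i, id := range f.SceneIDs {
			ids[i] = strconv.FormatInt(id, 10)
		}
		v.Set("scene_id", strings.Join(ids, ","))
	}
	return "/api/v1/data-ops/episodes?" + v.Encode()
}
```

`url.Values.Encode` sorts keys and percent-encodes each comma as `%2C`. Go's query parsing decodes these back, so the request is equivalent to the unescaped form above.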

Example Response

+
{
+  "items": [
+    {
+      "id": 123,
+      "episode_id": "ep_20260606_001",
+      "task_id": 88,
+      "task_public_id": "task_20260606_001",
+      "scene_name": "pick",
+      "robot_type_id": 3,
+      "robot_type": "arm_bot",
+      "robot_device_id": "robot-001",
+      "collector_operator_id": "op001",
+      "qa_status": "failed",
+      "quality_flag": "MCAP integrity check failed: tail magic mismatch",
+      "latest_qa_check": {
+        "check_name": "mcap_magic",
+        "passed": false,
+        "details": "MCAP integrity check failed: tail magic mismatch",
+        "checked_at": "2026-06-06T09:30:00Z"
+      },
+      "sync_status": "not_started",
+      "latest_sync_log": null,
+      "cloud_synced": false,
+      "duration_sec": 30.5,
+      "file_size_bytes": 123456789,
+      "labels": ["recalled_batch"],
+      "created_at": "2026-06-06T09:20:00Z"
+    }
+  ],
+  "total": 1234,
+  "limit": 20,
+  "offset": 0,
+  "hasNext": true,
+  "hasPrev": false
+}
+ +

Existing Per-Episode Capabilities Reused

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
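Capability | Endpoint | Notes
Re-run QA | POST /api/v1/qa/episodes/:id/run | Runs the full QA suite.
QA history | GET /api/v1/qa/episodes/:id/checks | Shown in the right-side drawer.
Sync | POST /api/v1/sync/episodes/:id | Used for unsynced episodes or to retry a failed sync.
Resync | POST /api/v1/sync/episodes/:id/resync | Used when the episode is already synced.
Sync history | GET /api/v1/sync/episodes/:id/logs | Shown in the right-side drawer.
Sync status | GET /api/v1/sync/episodes/:id/status | EpisodeDetail can keep using it; the Data Details list gets sync status from the unified API.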
+
+ +
+

Query Strategy and Database Load Control

+

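The unified API must not be implemented as one giant join. The backend should run a paginated main query and then supplementary status queries for the current page only, so that the QA and sync lookups are limited to the current page's episode IDs.

+
+
+ 1 + Query the current page +

Starting from episodes, apply the filters and sort order to fetch one page (20 or 50 episodes).

+
+
+ 2 + Collect IDs +

Extract the current page's episode IDs; all subsequent queries touch only these IDs.

+
+
+ 3 + Query latest QA +

Use episode_id IN (...) to fetch the latest QA record for each episode.

+
+
+ 4 + Query latest sync +

Use episode_id IN (...) to fetch the latest sync log for each episode.

+
+
+ 5 + Merge in memory +

Assemble latest_qa_check, sync_status, and latest_sync_log in Go.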

+
+
+ +

Sync Status Mapping

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Condition | sync_status | Display
No sync log | not_started | Not synced
latest sync log = pending | pending | Queued
latest sync log = in_progress | in_progress | Syncing
latest sync log = completed | completed | Synced
latest sync log = failed | failed | Failed
+ +
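The mapping above reduces to a single rule: if there is no log, the status is `not_started`; otherwise the latest log's status passes through unchanged. Here is a minimal Go sketch. `syncLog`, `syncStatusFor`, and `syncStatusLabel` are hypothetical names.

```go
package main

const syncStatusNotStarted = "not_started"

// syncLog is a hypothetical view of an episode's latest sync_logs row.
type syncLog struct {
	Status string // pending, in_progress, completed or failed
}

// syncStatusFor maps the latest log (nil when none exists) to sync_status.
func syncStatusFor(latest *syncLog) string {
	if latest == nil {
		return syncStatusNotStarted
	}
	return latest.Status
}

// syncStatusLabel holds the display text for each sync_status.
var syncStatusLabel = map[string]string{
	syncStatusNotStarted: "Not synced",
	"pending":            "Queued",
	"in_progress":        "Syncing",
	"completed":          "Synced",
	"failed":             "Failed",
}
```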

Index Recommendations

+ +
+ +
+

Frontend Implementation

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
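File/module | Action | Notes
src/views/admin/data-ops/DataDetails.vue | Add | Do not rename QACenter.vue and force-extend it; build the new page from an episode-wide perspective.
src/api/dataOps.js | Add | Wraps GET /data-ops/episodes.
src/components/layout/AdminSidebar.vue | Modify | Keep only Data Details under Data Ops.
src/router/index.js | Modify | Add /admin/data-details; remove the qa-center and cloud-sync routes.
src/views/admin/qa/QACenter.vue | Delete | The page is no longer kept.
src/views/admin/sync/CloudSyncCenter.vue | Delete | The page is no longer kept.
src/api/qa.js | Keep | DataDetails and EpisodeDetail still reuse the per-episode QA actions and history.
src/api/sync.js | Keep | DataDetails and EpisodeDetail still reuse the per-episode sync actions and history.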
+ +

Page Interactions

+ +
+ +
+

Data Production Statistics Changes

+

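Going forward, Data Production Statistics covers only summaries, trends, distributions, and statistics export. It no longer shows individual episode details.

+
+
+ Remove +

The detail-records card, detail table, detail pagination, detail API calls, and any state and formatters that serve only the detail table.

+
+
+ Keep +

The top filter bar, summary metrics, trend charts, dimension distributions, and statistics export.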

+
+
+
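+ Principle: the statistics page answers "how many, what trend, what distribution"; the Data Details page answers "which specific episodes exist, and what should be done with a given episode". +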
+
+ +
+

Test Plan

+
+
+

Keystone

+
    +
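  • GET /api/v1/data-ops/episodes returns all episodes, paginated, by default.
  • +
  • Supports multi-value qa_status filtering.
  • +
  • Supports multi-value sync_status filtering, including not_started.
  • +
  • Supports filtering by time range, scene_id, robot_type_id, robot_device_id, and collector_operator_id.
  • +
  • The list response includes the latest QA and latest sync for every episode on the current page.
  • +
  • Episodes without a sync log return sync_status=not_started.
  • +
  • total, hasNext, and hasPrev are correct under combined filters.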
  • +
+
+
+

Synapse

+
    +
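  • The Data Ops menu shows only Data Details.
  • +
  • The /admin/data-details page loads with pagination.
  • +
  • The filter UI follows the Data Production Statistics page and supports Search and Reset.
  • +
  • The action column is pinned to the far right.
  • +
  • Details, Preview, Download, Re-run QA, and Sync/Retry/Resync are enabled or disabled according to status.
  • +
  • The QA history and sync history drawers open and display data.
  • +
  • Data Production Statistics no longer shows the detail-records card.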
  • +
+
+
+
+ +
+

Confirmed Decisions

+ +
+ + +
+
diff --git a/docs/designs/data-ops-bulk-actions-api.zh.html b/docs/designs/data-ops-bulk-actions-api.zh.html
new file mode 100644
index 0000000..92ea960
--- /dev/null
+++ b/docs/designs/data-ops-bulk-actions-api.zh.html
@@ -0,0 +1,846 @@
+ Data Details Bulk QA and Cloud Sync API Design
+
+
+
+

Data Ops Bulk Actions

+

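Data Details Bulk QA and Cloud Sync API Design

+

+ Provides backend support for two Data Details actions: "bulk QA over the current filter results" and "bulk cloud sync over the current filter results". The first version provides a pre-confirmation preview and a lightweight asynchronous execution API. It does not include persistent batch jobs, progress queries, or cancellation. +

+
+
+ Implementation goals +
Preview API first
+
Fixed filter snapshot
+
Asynchronous processing in a background goroutine
+
No hard limit on batch size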
+
+
+ + + +
+

1. Scope

+
+
+

In scope

+
    +
  • /api/v1/data-ops 下新增两个 admin-only 预览接口和两个执行接口。
  • +
  • 请求体只接收逗号字符串形式的筛选条件。
  • +
  • 预览接口只计算命中和预计可操作统计,不启动后台任务。
  • +
  • 请求内解析筛选条件并查询固定 episode ID 快照。
  • +
  • 返回 202 Accepted 后由后台 goroutine 异步处理。
  • +
  • 补 Swagger 注释和后端 helper 单测。
  • +
+
+
+

Explicitly out of scope

+
    +
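  • No frontend page changes.
  • +
  • No new batch-job table, batch ID, progress query, or cancellation endpoint.
  • +
  • No final success, skipped, or failed counts are returned.
  • +
  • No hard limit on the number of matched episodes, and no code-level limit on the number of comma-separated filter values.
  • +
  • Bulk sync does not resync data that is already synced.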
  • +
+
+
+
+ +
+

2. Confirmed Decisions

+
+
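+ Async model + Both execution endpoints return 202 Accepted and keep processing in the background without waiting for the whole batch to finish. +
+
+ Scope + Applies to every episode matched by the current filters; pagination parameters are ignored. +
+
+ Filter snapshot + The frontend sends only the filter conditions. The backend queries a fixed ID snapshot within the request and then hands the ID list to the background goroutine. +
+
+ Pre-confirmation preview + Before execution, the client calls the preview endpoint, which returns the matched count, the estimated actionable count, and a summary of skip reasons. Preview does not require confirm and does not start background work. +
+
+ Confirmation guard + Both execution endpoints require "confirm": true and return 400 otherwise. Preview endpoints do not need confirm. +
+
+ Empty filters + { "filters": {} } is valid and means all non-deleted episodes. +
+
+ No hard limit + The first version limits neither the number of matched episodes nor the number of comma-separated filter values, and it still makes no guarantee that a whole batch completes reliably. +
+
+ matched_count + Only the total number of filter matches, not the number of episodes that are actually actionable. +
+
+ QA concurrency + Bulk QA runs in the background with a fixed concurrency of 4; each episode goes through the existing manual QA logic. +
+
+ Sync semantics + Bulk sync performs only a normal sync or a manual retry, never a bulk resync. +
+
+ Availability check + When the sync worker is not configured or not running, bulk sync returns 503 immediately.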
+
+
+ +
+

3. API Specification

+
+

3.1 Bulk Preview

+

POST /api/v1/data-ops/episodes/bulk-qa/preview

+

POST /api/v1/data-ops/episodes/bulk-sync/preview

+
{
+  "filters": {
+    "created_at_from": "2026-06-01T00:00:00Z",
+    "created_at_to": "2026-06-08T00:00:00Z",
+    "qa_status": "failed,pending_qa",
+    "sync_status": "not_started,failed",
+    "scene_id": "1,2",
+    "sop_id": "9,10",
+    "robot_type_id": "3",
+    "robot_device_id": "robot-001,robot-002",
+    "collector_operator_id": "op001"
+  }
+}
+
{
+  "status": "preview",
+  "action": "bulk_qa",
+  "matched_count": 123,
+  "eligible_count": 121,
+  "skipped_count": 2,
+  "protected_status_count": 8,
+  "skipped_breakdown": [
+    { "reason": "qa_running", "count": 2 }
+  ],
+  "warnings": [
+    "8 episodes are in protected manual QA statuses; checks can run but status will not be overwritten"
+  ]
+}
+
{
+  "status": "preview",
+  "action": "bulk_sync",
+  "matched_count": 42,
+  "eligible_count": 17,
+  "skipped_count": 25,
+  "sync_worker_running": true,
+  "skipped_breakdown": [
+    { "reason": "qa_not_approved", "count": 10 },
+    { "reason": "already_synced", "count": 8 },
+    { "reason": "sync_active", "count": 7 }
+  ],
+  "warnings": []
+}
+
+

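The preview result is a pre-confirmation estimate and does not lock the batch. The execution endpoint queries a fresh fixed ID snapshot using the filters it receives. So if the filters change after a preview, the frontend must invalidate that preview and require a new one.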

+
+ +

3.2 Bulk QA Execution

+

POST /api/v1/data-ops/episodes/bulk-qa

+
{
+  "confirm": true,
+  "filters": {
+    "created_at_from": "2026-06-01T00:00:00Z",
+    "created_at_to": "2026-06-08T00:00:00Z",
+    "qa_status": "failed,pending_qa",
+    "sync_status": "not_started,failed",
+    "scene_id": "1,2",
+    "sop_id": "9,10",
+    "robot_type_id": "3",
+    "robot_device_id": "robot-001,robot-002",
+    "collector_operator_id": "op001"
+  }
+}
+
{
+  "status": "accepted",
+  "matched_count": 123,
+  "message": "123 episodes accepted for bulk QA"
+}
+ +

3.3 Bulk Cloud Sync Execution

+

POST /api/v1/data-ops/episodes/bulk-sync

+
{
+  "confirm": true,
+  "filters": {
+    "qa_status": "approved,inspector_approved",
+    "sync_status": "not_started,failed"
+  }
+}
+
{
+  "status": "accepted",
+  "matched_count": 42,
+  "message": "42 episodes accepted for bulk cloud sync"
+}
+
+ +
+

3.4 Error Responses

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Scenario | HTTP | Response
confirm: true missing | 400 | { "error": "confirm must be true" }
Malformed filter time, invalid status, or invalid ID | 400 | { "error": "..." }
Database not configured | 503 | { "error": "database is not configured" }
Bulk QA dependency not configured | 503 | { "error": "qa service is not configured" }
Sync worker not configured or not running | 503 | { "error": "sync worker is not running" }
ID snapshot query failed | 500 | { "error": "failed to select data operation episodes" }
+
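Here is a minimal Go sketch of the body parsing behind the first row of this table, assuming the request shape from section 3. `bulkActionRequest`, `bulkFilters`, and `parseBulkAction` are hypothetical stand-ins for `DataOpsBulkEpisodeActionRequest` and its parser, and only a subset of the filter fields is modeled.

```go
package main

import (
	"encoding/json"
	"errors"
	"strings"
)

// bulkFilters models a subset of the filters object; every value is a
// comma-separated string.
type bulkFilters struct {
	CreatedAtFrom string `json:"created_at_from"`
	CreatedAtTo   string `json:"created_at_to"`
	QAStatus      string `json:"qa_status"`
	SyncStatus    string `json:"sync_status"`
	SceneID       string `json:"scene_id"`
}

// bulkActionRequest is a hypothetical stand-in for the request struct.
type bulkActionRequest struct {
	Confirm bool        `json:"confirm"`
	Filters bulkFilters `json:"filters"`
}

// parseBulkAction decodes the body. Execution requests must carry
// confirm: true; preview requests pass requireConfirm=false.
func parseBulkAction(body string, requireConfirm bool) (bulkActionRequest, error) {
	var req bulkActionRequest
	if err := json.NewDecoder(strings.NewReader(body)).Decode(&req); err != nil {
		return req, err
	}
	if requireConfirm && !req.Confirm {
		return req, errors.New("confirm must be true")
	}
	return req, nil
}
```

By default, `encoding/json` ignores unknown keys such as `limit`. That behavior matches the rule that pagination fields are ignored. Decoding a JSON array into a string field fails, so array-style filters are rejected with a 400.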
+
+ +
+

4. Filter Semantics

+
+

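Preview and execution endpoints share the full set of Data Details filter fields. The first version supports only comma-separated strings, not arrays.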

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
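Field | Type | Notes
created_at_from | RFC3339 string | Lower bound on episode creation time.
created_at_to | RFC3339 string | Upper bound on episode creation time; must be later than created_at_from.
qa_status | Comma-separated string | Supports pending_qa, qa_running, approved, needs_inspection, inspector_approved, rejected, and failed.
sync_status | Comma-separated string | Supports not_started, pending, in_progress, completed, and failed.
scene_id | Comma-separated string | List of scene IDs.
sop_id | Comma-separated string | List of SOP IDs; follows the existing fallback from the episode SOP to the task SOP.
robot_type_id | Comma-separated string | List of robot type IDs.
robot_device_id | Comma-separated string | List of device IDs.
collector_operator_id | Comma-separated string | List of data collector employee IDs.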
+
+

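The pagination fields limit and offset are not part of the bulk request semantics. If the frontend sends them by mistake, the backend should ignore them. Both preview and execution operate on the full filter result.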

+
+
+
+ +
+

5. Execution Model

+
+
+

Preview Path

+
    +
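  1. Check database availability; bulk sync preview additionally reports whether the sync worker is currently running.
  2. +
  3. Parse the JSON request body; preview does not require confirm.
  4. +
  5. Convert filters into a data-ops query.
  6. +
  7. Reuse the Data Details filter SQL to compute matched_count.
  8. +
  9. Compute eligible_count, skipped_count, and skipped_breakdown according to the action's rules.
  10. +
  11. Return HTTP 200 OK without querying an ID snapshot or starting a background goroutine.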
  12. +
+
+
+

Execution Request Path

+
    +
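  1. Check that the database and the QA handler or sync worker are available.
  2. +
  3. Parse the JSON request body and require confirm: true.
  4. +
  5. Convert filters into a data-ops query.
  6. +
  7. Query the ID snapshot with the same dataOpsEpisodeBaseFromSQL and buildDataOpsEpisodeWhere used by the list.
  8. +
  9. Return the IDs in the fixed order e.created_at DESC, e.id DESC.
  10. +
  11. Start the background goroutine and return HTTP 202 with matched_count.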
  12. +
+
+
+ +
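Here is a minimal Go sketch of the ID snapshot query. It shows how the query reuses the list page's FROM/WHERE fragments while dropping LIMIT/OFFSET. The `?` placeholders, the `snapshotQuery` and `buildWhere` names, and the single `qa_status` condition are assumptions. With a PostgreSQL driver, the query would need to be rebound, for example via sqlx's `Rebind`.

```go
package main

import "strings"

// snapshotQuery reuses the list page's FROM and WHERE fragments so the
// snapshot matches exactly what the page shows, but selects only IDs in a
// fixed order and applies no LIMIT/OFFSET.
func snapshotQuery(fromSQL, where string) string {
	return "SELECT e.id " + fromSQL + where + " ORDER BY e.created_at DESC, e.id DESC"
}

// buildWhere sketches the filter builder. With no filters, only the
// soft-delete condition remains, so {"filters": {}} selects every
// non-deleted episode.
func buildWhere(qaStatuses []string) (string, []any) {
	conds := []string{"e.deleted_at IS NULL"}
	var args []any
	if len(qaStatuses) > 0 {
		marks := make([]string, len(qaStatuses))
		for i, s := range qaStatuses {
			marks[i] = "?"
			args = append(args, s)
		}
		conds = append(conds, "e.qa_status IN ("+strings.Join(marks, ", ")+")")
	}
	return " WHERE " + strings.Join(conds, " AND "), args
}
```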
+

Background Processing

+
    +
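  1. Background work does not use the HTTP request context; it uses context.Background() instead.
  2. +
  3. Each QA run uses the same timeout policy as automatic QA.
  4. +
  5. Bulk QA uses a worker pool with a fixed concurrency of 4.
  6. +
  7. Bulk sync calls EnqueueEpisodeManual for each episode in turn.
  8. +
  9. A failure or skip on one episode does not stop the rest of the batch.
  10. +
  11. Log a summary at the start and end; log unexpected errors one by one.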
  12. +
+
+ +
+
+

QA Preview Counting Rules

+
    +
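  • eligible_count: the number of matched episodes that are not qa_running.
  • +
  • skipped_breakdown.qa_running: QA is already running; these episodes are skipped at execution time.
  • +
  • protected_status_count: the number in needs_inspection, inspector_approved, or rejected. Checks still run and are recorded for these episodes, but their manual decision status is not overwritten.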
  • +
+
+
+

Sync Preview Counting Rules

+
    +
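  • eligible_count: the number of episodes whose QA has passed, that are not cloud synced, and whose latest sync is not pending, in_progress, or completed.
  • +
  • An episode whose latest sync record is failed counts as eligible for a manual retry.
  • +
  • When sync_worker_running is false, preview still works, but the execution endpoint returns 503.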
  • +
+
+
+ +
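The counting rules above can be sketched in Go as follows. The sketch assumes that "QA passed" means `approved` or `inspector_approved` (the statuses used in the bulk sync example), and it borrows the reason names from the preview response examples. The types and function names are hypothetical.

```go
package main

// previewEpisode is a hypothetical per-episode view used only for counting.
type previewEpisode struct {
	QAStatus         string
	CloudSynced      bool
	LatestSyncStatus string // "" when the episode has no sync log
}

type previewCounts struct {
	Matched          int
	Eligible         int
	Skipped          int
	Protected        int
	SkippedBreakdown map[string]int
}

var protectedQAStatuses = map[string]bool{
	"needs_inspection": true, "inspector_approved": true, "rejected": true,
}

// previewQA: everything except qa_running is eligible; protected statuses
// stay eligible (checks still run) but are also counted separately.
func previewQA(eps []previewEpisode) previewCounts {
	p := previewCounts{Matched: len(eps), SkippedBreakdown: map[string]int{}}
	for _, e := range eps {
		if e.QAStatus == "qa_running" {
			p.Skipped++
			p.SkippedBreakdown["qa_running"]++
			continue
		}
		p.Eligible++
		if protectedQAStatuses[e.QAStatus] {
			p.Protected++
		}
	}
	return p
}

// syncSkipReason returns "" for episodes eligible for a normal sync or a
// manual retry (latest log failed). The case order decides which reason wins.
func syncSkipReason(e previewEpisode) string {
	switch {
	case e.QAStatus != "approved" && e.QAStatus != "inspector_approved":
		return "qa_not_approved"
	case e.CloudSynced || e.LatestSyncStatus == "completed":
		return "already_synced"
	case e.LatestSyncStatus == "pending" || e.LatestSyncStatus == "in_progress":
		return "sync_active"
	}
	return ""
}

func previewSync(eps []previewEpisode) previewCounts {
	p := previewCounts{Matched: len(eps), SkippedBreakdown: map[string]int{}}
	for _, e := range eps {
		if reason := syncSkipReason(e); reason != "" {
			p.Skipped++
			p.SkippedBreakdown[reason]++
		} else {
			p.Eligible++
		}
	}
	return p
}
```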
+
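+ QA status handling +

Matched episodes are not pre-marked as qa_running. Each episode is claimed, and its status updated, by the existing RunEpisodeQASuite only when its run actually starts.

+
+
+ Manual status protection +

needs_inspection, inspector_approved, and rejected inherit the existing manual QA protection rules, so manual decisions are never overwritten.

+
+
+ Sync skip rules +

Cases such as already synced, QA not passed, or sync in progress are rejected or skipped by the existing sync worker logic. The bulk goroutine only counts them and logs unexpected errors.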

+
+
+ +
+

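The first version has no persistent batch jobs. After a service restart, unfinished bulk QA work is lost. For bulk sync, jobs already written to sync_logs can be recovered by the worker, but jobs not yet written are not recovered.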

+
+
+ +
+

6. Suggested Backend Changes

+
+

Structural Changes

+
    +
  • Add QA handler and sync worker dependencies to DataOpsHandler.
  • +
  • NewDataOpsHandler accepts the new dependencies; server initialization passes in the existing qaHandler and syncWorker.
  • +
  • RegisterRoutes adds POST /episodes/bulk-qa/preview, POST /episodes/bulk-sync/preview, POST /episodes/bulk-qa, and POST /episodes/bulk-sync.
  • +
  • Add request/response structs: DataOpsBulkEpisodeActionRequest, DataOpsBulkEpisodePreviewResponse, and DataOpsBulkEpisodeActionResponse.
  • +
  • Add helpers to parse body filters, compute preview statistics, query the ID snapshot, start the QA worker pool, and start the sync enqueue goroutine.
  • +
+ +

Pseudocode

+
func (h *DataOpsHandler) PreviewBulkEpisodeQA(c *gin.Context) {
+  q, ok := h.parseBulkEpisodeFilters(c) // writes the 400 response itself on invalid filters
+  if !ok {
+    return
+  }
+  preview := h.previewBulkEpisodeQA(c.Request.Context(), q)
+  c.JSON(http.StatusOK, preview)
+}
+
+func (h *DataOpsHandler) PreviewBulkEpisodeSync(c *gin.Context) {
+  q, ok := h.parseBulkEpisodeFilters(c)
+  if !ok {
+    return
+  }
+  // Preview works even when the worker is down; the response reports sync_worker_running.
+  preview := h.previewBulkEpisodeSync(c.Request.Context(), q)
+  c.JSON(http.StatusOK, preview)
+}
+
+func (h *DataOpsHandler) BulkRunEpisodeQA(c *gin.Context) {
+  if h.qa == nil { ...503... }
+  _, q, ok := h.parseBulkEpisodeAction(c) // also enforces confirm: true
+  if !ok {
+    return
+  }
+  ids, err := h.selectBulkEpisodeIDs(c.Request.Context(), q)
+  if err != nil { ...500... }
+  go h.runBulkEpisodeQA(ids) // detached: uses context.Background(), not the request context
+  c.JSON(http.StatusAccepted, response("accepted", len(ids), "..."))
+}
+
+func (h *DataOpsHandler) BulkSyncEpisodes(c *gin.Context) {
+  if h.syncWorker == nil || !h.syncWorker.IsRunning() { ...503... }
+  _, q, ok := h.parseBulkEpisodeAction(c)
+  if !ok {
+    return
+  }
+  ids, err := h.selectBulkEpisodeIDs(c.Request.Context(), q)
+  if err != nil { ...500... }
+  go h.runBulkEpisodeSync(ids)
+  c.JSON(http.StatusAccepted, response("accepted", len(ids), "..."))
+}
+
+ +
+

7. Logging Strategy

+
+ + + + + + + + + + + + + + + + + + + + + + + + + +
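When | Log content
Bulk request accepted | [DATA_OPS] Bulk QA accepted: matched=123
Batch completed | [DATA_OPS] Bulk QA completed: matched=123, attempted=120, skipped=2, failed=1
Unexpected per-episode failure | Log each episode ID and error individually.
Expected skip | Not logged individually; only counted, to avoid flooding the logs.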
+
+
+ +
+

8. Test Plan

+
+

Required New Helper Unit Tests

+
    +
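  • Preview requests do not require confirm; execution requests require confirm: true.
  • +
  • Parsing a bulk execution request fails when confirm is missing or false.
  • +
  • Bulk request filters accept only comma-separated strings; the number of values has no code-level limit, but each value's format is still validated.
  • +
  • Bulk parsing ignores limit and offset.
  • +
  • QA preview statistics correctly compute matched_count, eligible_count, skipped_count, and protected_status_count.
  • +
  • Sync preview statistics correctly summarize skip reasons such as QA not passed, already synced, and sync in progress.
  • +
  • The ID snapshot SQL reuses the data-ops FROM/WHERE clauses and orders by e.created_at DESC, e.id DESC.
  • +
  • Empty filters are valid and produce a WHERE clause containing only e.deleted_at IS NULL.
  • +
  • Invalid qa_status, sync_status, ID lists, and time ranges still return parse errors.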
  • +
+ +

Optional Integration Tests

+
    +
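  • Use a fake QA runner to verify that the bulk QA goroutine invokes it once per ID.
  • +
  • Use a fake sync enqueuer to verify that 503 is returned when the worker is not running.
  • +
  • A route registration smoke test that checks all four POST endpoints (two preview, two execution) are mounted under the data-ops admin group.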
  • +
+
+
+ +
+

9. 实现清单

+
+
    +
  • 修改 keystone/internal/api/handlers/data_ops.go:新增 request/response、routes、handlers、helpers。
  • +
  • 修改 keystone/internal/server/server.go:构造 DataOpsHandler 时传入 QA handler 和 sync worker。
  • +
  • 必要时给 sync worker 暴露轻量 enqueuer 接口,降低 handler 对具体类型的耦合。
  • +
  • 补 Swagger 注释后运行 swag init -g internal/server/server.go -o docs
  • +
  • keystone/internal/api/handlers/data_ops_test.go 测试。
  • +
  • 运行 go test ./internal/api/handlers/... -v,最后按需要运行更大范围测试。
  • +
+
+
+ + +
+
diff --git a/docs/designs/episode-qa-checks-mcap-integrity.zh.html b/docs/designs/episode-qa-checks-mcap-integrity.zh.html
new file mode 100644
index 0000000..498953b
--- /dev/null
+++ b/docs/designs/episode-qa-checks-mcap-integrity.zh.html
@@ -0,0 +1,880 @@
+ Data Ops QA Center and Automatic Episode QA Development Design
+
+
+
+

Keystone / Synapse Development Design

+

Data Ops QA Center and Automatic Episode QA

+

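Add a QA Center to the "Data Ops" section of the admin console as a workbench where operations staff handle abnormal episodes. The first version provides a lightweight MCAP integrity check, automatic QA after episode creation, manual re-run QA, and a unified QA history. Per-robot_type QA configuration with Go/Python scripts will follow later.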

+
+
+ Design status +
MVP implemented; follow-up extensions to be planned
+
Updated: 2026-06-05
+
Scope: keystone, synapse
+
+
+ + + +
+

Background

+

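Some anomalies make an uploaded MCAP unplayable. For example, the player fails during initialization with: Expected MCAP magic '89 4d 43 41 50 30 0d 0a', found .... Such errors usually surface early in preview loading, without waiting for the full data to be parsed.

+

The previous version already showed the value of a lightweight MCAP head/tail magic check. The next version upgrades the single button on the detail page into an admin "QA Center". There, operations staff can see pending episodes in one place, re-trigger the full QA suite, and review the latest check result and historical evidence. Episodes should also enter the QA flow automatically after creation.

+
+ Boundary note: the head/tail magic check only shows that the MCAP file boundaries are basically intact; it cannot prove the file is playable. Internal chunks, schemas, compressed data, CRCs, or indexes may still be corrupt. The check exists to catch obviously broken files quickly, and later checks can cover the remaining cases. +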
+
+ +
+

Goals and Non-Goals

+
+
+

Goals

+
    +
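  • Add a QA Center page to the "Data Ops" section of the Synapse admin console.
  • +
  • In the first version, the QA Center is an operations workbench that shows actionable episodes (pending_qa, failed, needs_inspection, and similar) by default.
  • +
  • After an episode is created, Keystone triggers automatic QA asynchronously.
  • +
  • Manual entry points in the frontend always trigger the full QA suite, never an individual check.
  • +
  • In the first version, the default QA suite is fixed to ['mcap_magic'].
  • +
  • Every check is written to qa_checks; on failure, quality_flag is written and qa_status is set to failed.
  • +
  • When the full suite passes, episodes that are allowed to transition automatically may be set to approved.
  • +
  • qa_status=failed continues to block MCAP preview, MCAP download, and cloud sync.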
  • +
+
+
+

Non-Goals

+
    +
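  • No bulk QA or bulk sync in the first version.
  • +
  • No persistent QA job/queue table in the first version.
  • +
  • No robot_type QA configuration UI in the first version.
  • +
  • No Python script execution in the first version.
  • +
  • No new episode status fields; keep reusing the existing qa_status, quality_flag, and qa_checks.
  • +
  • POST /api/v1/episodes/:id/qa-checks is not the new primary UI endpoint; the UI switches to the full-suite endpoint.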
  • +
+
+
+
+ +
+

Data Model Reuse

+

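This design adds no fields. The episode's current quality status stays in episodes.qa_status, the quality summary shown to operations stays in episodes.quality_flag, and the evidence from every check keeps accumulating in qa_checks.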

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
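Field/table | Purpose | Rule
episodes.qa_status | The episode's current external quality status and the basis for gating | Set to qa_running while automatic or manual QA runs; set to failed if any check fails; when all checks pass, set to approved or left in its manual status according to the status protection rules
episodes.quality_flag | Quality note for researchers and operations staff | On failure, write a failure summary such as a head/tail magic mismatch; on pass, a failure summary written by automatic QA may be cleared
qa_checks | History and evidence for each check | Every check in every suite run inserts a new record; old records are never overwritten
qa_checks.check_name | Check identifier | Fixed to mcap_magic in the first version; may later include topic_required, duration_range, and python:<script_name>
qa_checks.check_metadata | Structured check details | Records data such as expected/head/tail/file_size for display in the QA Center drawer and for troubleshooting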
+ +

qa_checks.passedqa_status 的区别

+
+
+ qa_checks.passed +

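A factual record of one check in one run. It answers "did mcap_magic pass this time?".

+
+
+ qa_status +

The episode's current external status. It answers "can this episode be previewed, downloaded, synced, or moved downstream right now?".

+
+
+ Why both are needed +

A check may pass or fail many times over its history, but gating looks only at the episode's current status. Keeping the two separate preserves the evidence while still letting a re-run recover the episode.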

+
+
+
+ +
+

State Rules

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
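Scenario | Input status | Output status | Notes
Automatic QA starts | pending_qa | qa_running | qa_running acts as an episode-level mutex to prevent duplicate runs
Automatic QA: all checks pass | pending_qa or qa_running | approved | The user has confirmed that automatic approval is wanted
Automatic QA: any check fails | Any runnable status | failed | Every check failure maps to failed
Manual re-run: all checks pass | failed or qa_running | approved | Allows recovery from failure after the data is fixed or re-uploaded
Manual re-run: any check fails | Any runnable status | failed | Records the latest failure reason and keeps blocking MCAP-related actions
Manual final-status protection | rejected, needs_inspection, inspector_approved | Unchanged | Manual QA cannot overwrite a manual decision status, but still writes qa_checks history
QA already running | qa_running | qa_running | A new manual run returns 409; the frontend asks the user to retry later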
+
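+ Implementation constraint: when a manual run is triggered from a manual final status, it may only record check facts. It must not automatically change rejected, needs_inspection, or inspector_approved to approved or failed. These statuses represent human judgment and take precedence over automatic QA. +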
+
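Here is a minimal Go sketch of the state table. It assumes the decision is made from the status the episode had before the `qa_running` claim. `nextQAStatus` and `canStart` are hypothetical names. The sketch does not model how runs that start from protected statuses are serialized.

```go
package main

// manualDecisionStatuses are human decisions that automatic or manual QA
// must never overwrite.
var manualDecisionStatuses = map[string]bool{
	"rejected": true, "needs_inspection": true, "inspector_approved": true,
}

// canStart reports whether a new run may begin; a run while qa_running is
// rejected (409 for manual runs).
func canStart(current string) bool {
	return current != "qa_running"
}

// nextQAStatus returns the status to persist after a suite run that started
// from prevStatus. Check rows are written to qa_checks in every case; only
// the episode-level status is decided here.
func nextQAStatus(prevStatus string, allPassed bool) string {
	if manualDecisionStatuses[prevStatus] {
		return prevStatus // record facts only, keep the human decision
	}
	if allPassed {
		return "approved"
	}
	return "failed"
}
```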
+ +
+

API Design

+

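The new UI uses the /api/v1/qa namespace throughout. The old single-check endpoint is not the primary call path for the QA Center or the episode detail page; it can later be removed or kept only as an internal compatibility entry point.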

+ +

List QA Center episodes

+
GET /api/v1/qa/episodes?status=failed&robot_type=arm_bot&q=demo&page=1&page_size=20
+
{
+  "items": [
+    {
+      "id": 123,
+      "public_id": "ep_20260605_001",
+      "task_id": 88,
+      "robot_type": "arm_bot",
+      "qa_status": "failed",
+      "quality_flag": "MCAP integrity check failed: tail magic mismatch",
+      "created_at": "2026-06-05T10:20:00Z",
+      "latest_qa_check": {
+        "check_name": "mcap_magic",
+        "passed": false,
+        "details": "MCAP integrity check failed: tail magic mismatch",
+        "checked_at": "2026-06-05T10:30:00Z"
+      }
+    }
+  ],
+  "pagination": {
+    "page": 1,
+    "page_size": 20,
+    "total": 1
+  }
+}
+ +

Get an episode's QA history

+
GET /api/v1/qa/episodes/:id/checks
+
{
+  "items": [
+    {
+      "id": 456,
+      "episode_id": 123,
+      "check_name": "mcap_magic",
+      "passed": false,
+      "score": 0,
+      "details": "MCAP integrity check failed: tail magic mismatch",
+      "check_metadata": {
+        "expected_magic": "89 4d 43 41 50 30 0d 0a",
+        "found_head_magic": "89 4d 43 41 50 30 0d 0a",
+        "found_tail_magic": "8b ef b8 75 c6 97 96 61",
+        "file_size_bytes": 123456789
+      },
+      "checked_at": "2026-06-05T10:30:00Z"
+    }
+  ]
+}
+ +

Run the full QA suite for an episode

+
POST /api/v1/qa/episodes/:id/run
+Content-Type: application/json
+
+{
+  "mode": "manual"
+}
+
{
+  "episode_id": 123,
+  "qa_status": "approved",
+  "passed": true,
+  "checks": [
+    {
+      "check_name": "mcap_magic",
+      "passed": true,
+      "score": 1,
+      "details": "MCAP head and tail magic matched",
+      "checked_at": "2026-06-05T10:35:00Z"
+    }
+  ]
+}
+ +

Error Handling

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
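Scenario | Status code | Notes
Episode not found | 404 | Returns {"error":"episode not found"}
QA already running | 409 | Returns {"error":"qa already running"}
S3/MinIO unavailable | 502 or 503 | No failed QA record is written, so an infrastructure outage is not misclassified as corrupt data
File format check fails | 200 | Returns passed=false and persists the updated episode status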
+
+ +
+

Backend Implementation

+
+

Execution Flow

+
+
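+ 1 + Created or manually triggered +

After creation, an episode is pushed onto an in-memory queue automatically; the QA Center and the detail page call the run endpoint manually.

+
+
+ 2 + Acquire the mutex +

Set a runnable status to qa_running; if a run is already in progress, a manual request returns 409.

+
+
+ 3 + Load the suite +

The first version hard-codes ['mcap_magic']; later versions load the configuration by robot_type.

+
+
+ 4 + Write history +

Every check inserts a row into qa_checks, preserving the evidence from this run.

+
+
+ 5 + Update the gate +

Any failure sets failed; if all checks pass, set approved, subject to the status protection rules.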

+
+
+
+ +
+
+

Core Service Interface

+
type QARunMode string
+
+const (
+    QARunModeAuto QARunMode = "auto"
+    QARunModeManual QARunMode = "manual"
+)
+
+func (s *QAService) RunEpisodeQASuite(
+    ctx context.Context,
+    episodeID int64,
+    mode QARunMode,
+) (*QASuiteResult, error)
+
+
+

Checker Result Model

+
type QACheckResult struct {
+    CheckName string
+    Passed bool
+    Score float64
+    Details string
+    Metadata map[string]any
+}
+
+
+ +
+

Automatic QA Queue

+
    +
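  • After an episode is created and its transaction committed, push the episode ID onto a lightweight in-process queue in Keystone.
  • +
  • A queue worker calls RunEpisodeQASuite(ctx, episodeID, QARunModeAuto) asynchronously.
  • +
  • The first version neither persists jobs nor offers batch replay, so automatic QA tasks that have not run yet may be lost on a service restart. This trade-off is accepted for now.
  • +
  • The QA Center can still surface missed episodes by filtering on pending_qa and manually triggering Re-run QA.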
  • +
+
+ +
+

mcap_magic Check Details

+
    +
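  • Expected magic: 89 4d 43 41 50 30 0d 0a.
  • +
  • First get the object size with an S3 stat call.
  • +
  • If the object is smaller than 16 bytes, return passed=false immediately.
  • +
  • Read Range: bytes=0-7 as the head magic.
  • +
  • Read Range: bytes=(size-8)-(size-1) as the tail magic.
  • +
  • The check fails if either the head or the tail does not match.
  • +
  • Do not initialize an MCAP reader, load indexes, or decompress chunks.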
  • +
+
+ +
+

Gating Changes

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
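Action | Rule | Implementation point
MCAP preview URL | Refuse to sign when qa_status=failed | EpisodeHandler.GetEpisodePresignedURL
MCAP download URL | Refuse to sign when qa_status=failed | Presign requests with kind=mcap
JSON sidecar download | Not affected by MCAP integrity failures | kind=sidecar keeps its current behavior
Normal cloud sync | The existing approved requirement already blocks failed | SyncHandler.TriggerEpisodeSync
Resync | Blocked when qa_status=failed | SyncHandler.TriggerEpisodeResync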
+
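The presign and resync rules reduce to two small predicates. Here is a minimal Go sketch with hypothetical names (`checkPresignAllowed`, `checkResyncAllowed`, `errQAFailed`).

```go
package main

import "errors"

var errQAFailed = errors.New("episode failed QA")

// checkPresignAllowed refuses MCAP URLs (preview and download) for
// qa_status=failed; sidecar JSON is not affected by MCAP integrity failures.
func checkPresignAllowed(kind, qaStatus string) error {
	if kind == "mcap" && qaStatus == "failed" {
		return errQAFailed
	}
	return nil
}

// checkResyncAllowed blocks resync of failed episodes. Normal sync is already
// gated by the existing approved requirement.
func checkResyncAllowed(qaStatus string) error {
	if qaStatus == "failed" {
		return errQAFailed
	}
	return nil
}
```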
+
+ +
+

Frontend Implementation

+

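Synapse needs a new QA Center page, and the episode detail page should reuse the same "Re-run QA" action. The frontend exposes neither check selection nor a robot_type script configuration placeholder, which keeps the first version's scope contained.

+
+
+ Navigation entry +

Add QA Center under "Data Ops" in the admin sidebar.

+
+
+ Default list +

Filter by pending_qa, failed, and needs_inspection by default, so data that needs attention is shown first.

+
+
+ Unified action +

Both the QA Center row action and the episode detail page button are labeled Re-run QA and call the same run endpoint.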

+
+
+ +

Filters

+
+ All + pending_qa + failed + needs_inspection + approved / inspector_approved + rejected +
+ +

List Fields

+ + +

Interaction Rules

+ +
+ +
+

Test Plan

+
+
+

Keystone

+
    +
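  • After creation, an episode is enqueued automatically and mcap_magic runs.
  • +
  • When automatic QA passes, pending_qa or qa_running is updated to approved.
  • +
  • When automatic QA fails, qa_checks.passed=false and quality_flag are written, and qa_status is set to failed.
  • +
  • A manual re-run lets failed recover to approved when all checks pass.
  • +
  • A manual re-run cannot overwrite rejected, needs_inspection, or inspector_approved.
  • +
  • Another manual run while the episode is qa_running returns 409.
  • +
  • GET /api/v1/qa/episodes returns the latest QA result.
  • +
  • GET /api/v1/qa/episodes/:id/checks returns the full QA history.
  • +
  • presign?kind=mcap is refused for failed episodes, while presign?kind=sidecar is still allowed.
  • +
  • Sync and resync are refused for failed episodes.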
  • +
+
+
+

Synapse

+
    +
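  • The QA Center menu appears under Data Ops.
  • +
  • The QA Center loads actionable episodes by default.
  • +
  • Filters switch between All, Pending QA, Failed, Needs inspection, Approved, and Rejected.
  • +
  • The list shows the most recent QA result.
  • +
  • The QA history drawer shows the full qa_checks history.
  • +
  • Both the QA Center and the episode detail page call POST /api/v1/qa/episodes/:id/run through Re-run QA.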
  • +
+
+
+
+ +
+

Future robot_type and Python Script Extensions

+

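The next phase can add configuration to the QA Center. Each robot_type binds multiple check scripts, and an episode's QA run loads its suite based on the robot type. A check can be a built-in Go checker or a Python script. All checks reuse the same result model, persistence rules, and state transition rules.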

+
{
+  "robot_type": "arm_bot",
+  "checks": [
+    { "name": "mcap_magic", "runtime": "go" },
+    { "name": "topic_required", "runtime": "python", "script": "topic_required.py" },
+    { "name": "duration_range", "runtime": "python", "script": "duration_range.py" }
+  ]
+}
+
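+ Future auto-approval rule: if a robot type has multiple check scripts configured, an episode produced by that robot must pass every configured check before qa_status is automatically changed to approved; any failure changes it to failed. +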
+
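Here is a minimal Go sketch of loading one robot_type suite from the JSON shape above and applying the future auto-approval rule. The struct names, the validation rule that Python checks need a script, and treating an empty suite as "not passed" are all assumptions, not an agreed schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// qaCheckSpec and qaSuiteConfig mirror the example JSON (hypothetical shape).
type qaCheckSpec struct {
	Name    string `json:"name"`
	Runtime string `json:"runtime"`          // "go" or "python"
	Script  string `json:"script,omitempty"` // required for python checks
}

type qaSuiteConfig struct {
	RobotType string        `json:"robot_type"`
	Checks    []qaCheckSpec `json:"checks"`
}

// parseSuiteConfig decodes one robot_type suite and validates each check.
func parseSuiteConfig(data []byte) (qaSuiteConfig, error) {
	var cfg qaSuiteConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return cfg, err
	}
	for i, c := range cfg.Checks {
		switch c.Runtime {
		case "go":
		case "python":
			if c.Script == "" {
				return cfg, fmt.Errorf("checks[%d] %q: python check needs a script", i, c.Name)
			}
		default:
			return cfg, fmt.Errorf("checks[%d] %q: unknown runtime %q", i, c.Name, c.Runtime)
		}
	}
	return cfg, nil
}

// suitePassed applies the auto-approval rule: every configured check must
// have a passing result. An empty suite is treated as not passed (assumption).
func suitePassed(results map[string]bool, cfg qaSuiteConfig) bool {
	if len(cfg.Checks) == 0 {
		return false
	}
	for _, c := range cfg.Checks {
		if !results[c.Name] {
			return false
		}
	}
	return true
}
```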
+ sidecar_schema + duration_range + topic_required + python:custom_script +
+
+ +
+

Confirmed Decisions

+ +
+ + +
+
+
diff --git a/internal/api/handlers/data_ops.go b/internal/api/handlers/data_ops.go
new file mode 100644
index 0000000..03f4d79
--- /dev/null
+++ b/internal/api/handlers/data_ops.go
@@ -0,0 +1,569 @@
+// SPDX-FileCopyrightText: 2026 ArcheBase
+//
+// SPDX-License-Identifier: MulanPSL-2.0
+
+package handlers
+
+import (
+	"context"
+	"database/sql"
+	"fmt"
+	"net/http"
+	"strings"
+	"time"
+
+	"github.com/gin-gonic/gin"
+	"github.com/jmoiron/sqlx"
+
+	"archebase.com/keystone-edge/internal/logger"
+	"archebase.com/keystone-edge/internal/services"
+)
+
+const syncStatusNotStarted = "not_started"
+
+var validDataOpsSyncStatuses = map[string]struct{}{
+	syncStatusNotStarted: {},
+	"pending":            {},
+	"in_progress":        {},
+	"completed":          {},
+	"failed":             {},
+}
+
+// DataOpsHandler handles data operations APIs for the admin workbench.
+type DataOpsHandler struct {
+	db         *sqlx.DB
+	qa         *EpisodeQAHandler
+	syncWorker *services.SyncWorker
+}
+
+// NewDataOpsHandler creates a data operations handler.
+func NewDataOpsHandler(db *sqlx.DB) *DataOpsHandler {
+	return &DataOpsHandler{db: db}
+}
+
+// SetBulkActionDeps wires optional services used by data-ops bulk actions.
+func (h *DataOpsHandler) SetBulkActionDeps(qa *EpisodeQAHandler, syncWorker *services.SyncWorker) {
+	if h == nil {
+		return
+	}
+	h.qa = qa
+	h.syncWorker = syncWorker
+}
+
+// RegisterRoutes registers data operations routes under /data-ops.
+func (h *DataOpsHandler) RegisterRoutes(apiV1 *gin.RouterGroup) {
+	apiV1.GET("/episodes", h.ListEpisodes)
+	apiV1.POST("/episodes/bulk-qa/preview", h.PreviewBulkEpisodeQA)
+	apiV1.POST("/episodes/bulk-sync/preview", h.PreviewBulkEpisodeSync)
+	apiV1.POST("/episodes/bulk-qa", h.BulkRunEpisodeQA)
+	apiV1.POST("/episodes/bulk-sync", h.BulkSyncEpisodes)
+}
+
+type dataOpsEpisodeQuery struct {
+	Pagination           PaginationParams
+	CreatedAtFrom        time.Time
+	CreatedAtTo          time.Time
+	HasCreatedAtFrom     bool
+	HasCreatedAtTo       bool
+	Keyword              string
+	QAStatuses           []string
+	SyncStatuses         []string
+	SceneIDs             []int64
+	SOPIDs               []int64
+	RobotTypeIDs         []int64
+	RobotDeviceIDs       []string
+	CollectorOperatorIDs []string
+	Label                string
+}
+
+type dataOpsEpisodeRow struct {
+	ID                  int64           `db:"id"`
+	EpisodeID           string          `db:"episode_id"`
+	TaskID              int64           `db:"task_id"`
+	TaskPublicID        sql.NullString  `db:"task_public_id"`
+	SOPID               sql.NullInt64   `db:"sop_id"`
+	SOP                 sql.NullString  `db:"sop"`
+	SceneID             int64           `db:"scene_id"`
+	SceneName           sql.NullString  `db:"scene_name"`
+	RobotTypeID         sql.NullInt64   `db:"robot_type_id"`
+	RobotType           sql.NullString  `db:"robot_type"`
+	RobotDeviceID       sql.NullString  `db:"robot_device_id"`
+	CollectorOperatorID sql.NullString  `db:"collector_operator_id"`
+	CollectorName       sql.NullString  `db:"collector_name"`
+	QAStatus            string          `db:"qa_status"`
+	QualityFlag         sql.NullString  `db:"quality_flag"`
+	CloudSynced         bool            `db:"cloud_synced"`
+	DurationSec         sql.NullFloat64 `db:"duration_sec"`
+	FileSizeBytes       sql.NullInt64   `db:"file_size_bytes"`
+	LabelsJSON          sql.NullString  `db:"labels"`
+	CreatedAt           time.Time       `db:"created_at"`
+}
+
+// DataOpsEpisodeItemResponse describes one episode row in the data operations table.
+type DataOpsEpisodeItemResponse struct {
+	ID                  int64                         `json:"id"`
+	EpisodeID           string                        `json:"episode_id"`
+	TaskID              int64                         `json:"task_id"`
+	TaskPublicID        *string                       `json:"task_public_id,omitempty"`
+	SOPID               *int64                        `json:"sop_id,omitempty"`
+	SOP                 *string                       `json:"sop,omitempty"`
+	SceneID             int64                         `json:"scene_id"`
+	SceneName           *string                       `json:"scene_name,omitempty"`
+	RobotTypeID         *int64                        `json:"robot_type_id,omitempty"`
+	RobotType           *string                       `json:"robot_type,omitempty"`
+	RobotDeviceID       *string                       `json:"robot_device_id,omitempty"`
+	CollectorOperatorID *string                       `json:"collector_operator_id,omitempty"`
+	CollectorName       *string                       `json:"collector_name,omitempty"`
+	QAStatus            string                        `json:"qa_status"`
+	QualityFlag         *string                       `json:"quality_flag,omitempty"`
+	LatestQACheck       *EpisodeQACheckRecordResponse `json:"latest_qa_check,omitempty"`
+	SyncStatus          string                        `json:"sync_status"`
+	LatestSyncLog       *SyncJobResponse              `json:"latest_sync_log,omitempty"`
+	CloudSynced         bool                          `json:"cloud_synced"`
+	DurationSec         *float64                      `json:"duration_sec,omitempty"`
+	FileSizeBytes       *int64                        `json:"file_size_bytes,omitempty"`
+	Labels              []string                      `json:"labels"`
+	CreatedAt           string                        `json:"created_at"`
+}
+
+// DataOpsEpisodeListResponse contains paginated episode rows for data operations.
+type DataOpsEpisodeListResponse struct {
+	Items   []DataOpsEpisodeItemResponse `json:"items"`
+	Total   int                          `json:"total"`
+	Limit   int                          `json:"limit"`
+	Offset  int                          `json:"offset"`
+	HasNext bool                         `json:"hasNext,omitempty"`
+	HasPrev bool                         `json:"hasPrev,omitempty"`
+}
+
+// ListEpisodes returns unified episode detail rows for data operations.
+//
+// @Summary List data operation episodes
+// @Description Lists episode details with latest QA and cloud sync states.
+// @Tags data-ops
+// @Produce json
+// @Param limit query int false "Max results"
+// @Param offset query int false "Pagination offset"
+// @Param created_at_from query string false "created_at >= RFC3339"
+// @Param created_at_to query string false "created_at <= RFC3339"
+// @Param q query string false "Search episode/task/quality text"
+// @Param qa_status query string false "Comma-separated QA statuses"
+// @Param sync_status query string false "Comma-separated sync statuses: not_started,pending,in_progress,completed,failed"
+// @Param scene_id query string false "Comma-separated scene IDs"
+// @Param sop_id query string false "Comma-separated SOP IDs"
+// @Param robot_type_id query string false "Comma-separated robot type IDs"
+// @Param robot_device_id query string false "Comma-separated robot device IDs"
+// @Param collector_operator_id query string false "Comma-separated collector operator IDs"
+// @Param label query string false "Exact label"
+// @Success 200 {object} DataOpsEpisodeListResponse
+// @Failure 400 {object} map[string]string
+// @Failure 500 {object} map[string]string
+// @Router /data-ops/episodes [get]
+func (h *DataOpsHandler) ListEpisodes(c *gin.Context) {
+	if h.db == nil {
+		c.JSON(http.StatusServiceUnavailable, gin.H{"error": "database is not configured"})
+		return
+	}
+
+	q, err := parseDataOpsEpisodeQuery(c)
+	if err != nil {
+		c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
+		return
+	}
+
+	fromSQL := dataOpsEpisodeBaseFromSQL()
+	where, args := buildDataOpsEpisodeWhere(q)
+	countQuery := "SELECT COUNT(1) " + fromSQL + where
+
+	var total int
+	if err := h.db.Get(&total, countQuery, args...); err != nil {
+		logger.Printf("[DATA_OPS] episode count query failed: %v", err)
+		c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to count data operation episodes"})
+		return
+	}
+
+	query := dataOpsEpisodeListSQL(fromSQL, where)
+	queryArgs := append(append([]interface{}{}, args...), q.Pagination.Limit, q.Pagination.Offset)
+
+	var rows []dataOpsEpisodeRow
+	if err := h.db.Select(&rows, query, queryArgs...); err != nil {
+		logger.Printf("[DATA_OPS] episode list query failed: %v", err)
+		c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to list data operation episodes"})
+		return
+	}
+
+	episodeIDs := dataOpsEpisodeIDs(rows)
+	latestQAChecks, err := h.latestQAChecksByEpisode(c.Request.Context(), episodeIDs)
+	if err != nil {
+		logger.Printf("[DATA_OPS] latest QA query failed: %v", err)
+		c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to list data operation episodes"})
+		return
+	}
+	latestSyncLogs, err := h.latestSyncLogsByEpisode(c.Request.Context(), episodeIDs)
+	if err != nil {
+		logger.Printf("[DATA_OPS] latest sync query failed: %v", err)
+		c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to list data operation episodes"})
+		return
+	}
+
+	items := make([]DataOpsEpisodeItemResponse, 0, len(rows))
+	for _, row := range rows {
+		item := dataOpsEpisodeItemFromRow(row)
+		if qaCheck, ok := latestQAChecks[row.ID]; ok {
+			item.LatestQACheck = qaCheck
+		}
+		if syncLog, ok := latestSyncLogs[row.ID]; ok {
+			log := syncLog
+			item.LatestSyncLog = &log
+			item.SyncStatus = log.Status
+		}
+		items = append(items, item)
+	}
+
+	c.JSON(http.StatusOK, DataOpsEpisodeListResponse{
+		Items:   items,
+		Total:   total,
+		Limit:   q.Pagination.Limit,
+		Offset:  q.Pagination.Offset,
+		HasNext: q.Pagination.Offset+q.Pagination.Limit < total,
+		HasPrev: q.Pagination.Offset > 0,
+	})
+}
+
+func parseDataOpsEpisodeQuery(c *gin.Context) (dataOpsEpisodeQuery, error) {
+	pagination, err := ParsePagination(c)
+	if err != nil {
+		return dataOpsEpisodeQuery{}, err
+	}
+
+	qaStatuses, err := parseStatsStringListQuery(c, "qa_status")
+	if err != nil {
+		return dataOpsEpisodeQuery{}, err
+	}
+	for _, status := range qaStatuses {
+		if _, ok := validDataProductionQAStatuses[status]; !ok {
+			return dataOpsEpisodeQuery{}, fmt.Errorf("qa_status must be one of pending_qa, qa_running, approved, needs_inspection, inspector_approved, rejected, failed")
+		}
+	}
+
+	syncStatuses, err := parseStatsStringListQuery(c, "sync_status")
+	if err != nil {
+		return dataOpsEpisodeQuery{}, err
+	}
+	for _, status := range syncStatuses {
+		if _, ok := validDataOpsSyncStatuses[status]; !ok {
+			return dataOpsEpisodeQuery{}, fmt.Errorf("sync_status must be one of not_started, pending, in_progress, completed, failed")
+		}
+	}
+
+	sceneIDs, err := parsePositiveInt64List(c.Query("scene_id"), "scene_id")
+	if err != nil {
+		return dataOpsEpisodeQuery{}, err
+	}
+	sopIDs, err := parsePositiveInt64List(c.Query("sop_id"), "sop_id")
+	if err != nil {
+		return dataOpsEpisodeQuery{}, err
+	}
+	robotTypeIDs, err := parsePositiveInt64List(c.Query("robot_type_id"), "robot_type_id")
+	if err != nil {
+		return dataOpsEpisodeQuery{}, err
+	}
+	robotDeviceIDs, err := parseStatsStringListQuery(c, "robot_device_id")
+	if err != nil {
+		return dataOpsEpisodeQuery{}, err
+	}
+	collectorOperatorIDs, err := parseStatsStringListQuery(c, "collector_operator_id")
+	if err != nil {
+		return dataOpsEpisodeQuery{}, err
+	}
+
+	out := dataOpsEpisodeQuery{
+		Pagination:           pagination,
+		Keyword:              strings.TrimSpace(c.Query("q")),
+		QAStatuses:           qaStatuses,
+		SyncStatuses:         syncStatuses,
+		SceneIDs:             sceneIDs,
+		SOPIDs:               sopIDs,
+		RobotTypeIDs:         robotTypeIDs,
+		RobotDeviceIDs:       robotDeviceIDs,
+		CollectorOperatorIDs: collectorOperatorIDs,
+		Label:                strings.TrimSpace(c.Query("label")),
+	}
+
+	if raw := strings.TrimSpace(c.Query("created_at_from")); raw != "" {
+		parsed, err := parseEpisodeRFC3339(raw)
+		if err != nil {
+			return dataOpsEpisodeQuery{}, fmt.Errorf("invalid created_at_from")
+		}
+		out.CreatedAtFrom = parsed
+		out.HasCreatedAtFrom = true
+	}
+	if raw := strings.TrimSpace(c.Query("created_at_to")); raw != "" {
+		parsed, err := parseEpisodeRFC3339(raw)
+		if err != nil {
+			return dataOpsEpisodeQuery{}, fmt.Errorf("invalid created_at_to")
+		}
+		out.CreatedAtTo = parsed
+		out.HasCreatedAtTo = true
+	}
+	if out.HasCreatedAtFrom && out.HasCreatedAtTo && out.CreatedAtTo.Before(out.CreatedAtFrom) {
+		return dataOpsEpisodeQuery{}, fmt.Errorf("created_at_to must be after created_at_from")
+	}
+	if len(out.Label) > maxMultiValueFilterStringItemLength {
+		return dataOpsEpisodeQuery{}, fmt.Errorf("label contains a value longer than %d characters", maxMultiValueFilterStringItemLength)
+	}
+
+	return out, nil
+}
+
+func dataOpsEpisodeBaseFromSQL() string {
+	return `
+		FROM episodes e
+		LEFT JOIN tasks t ON t.id = e.task_id AND t.deleted_at IS NULL
+		LEFT JOIN scenes sc ON sc.id = e.scene_id AND sc.deleted_at IS NULL
+		LEFT JOIN workstations ws ON ws.id = COALESCE(e.workstation_id, t.workstation_id) AND ws.deleted_at IS NULL
+		LEFT JOIN robots r ON r.id = ws.robot_id AND r.deleted_at IS NULL
+		LEFT JOIN robot_types rt ON rt.id = r.robot_type_id AND rt.deleted_at IS NULL
+		LEFT JOIN data_collectors dc ON dc.id = ws.data_collector_id AND dc.deleted_at IS NULL
+		LEFT JOIN sops s ON s.id = COALESCE(e.sop_id, t.sop_id) AND s.deleted_at IS NULL
+	`
+}
+
+func buildDataOpsEpisodeWhere(q dataOpsEpisodeQuery) (string, []interface{}) {
+	where := " WHERE e.deleted_at IS NULL"
+	args := []interface{}{}
+
+	if q.HasCreatedAtFrom {
+		where += " AND e.created_at >= ?"
+		args = append(args, q.CreatedAtFrom)
+	}
+	if q.HasCreatedAtTo {
+		where += " AND e.created_at <= ?"
+		args = append(args, q.CreatedAtTo)
+	}
+
+	where, args = appendStringInFilter(where, args, "e.qa_status", q.QAStatuses)
+	where, args = appendInt64InFilter(where, args, "e.scene_id", q.SceneIDs)
+	where, args = appendInt64InFilter(where, args, "COALESCE(e.sop_id, t.sop_id)", q.SOPIDs)
+	where, args = appendInt64InFilter(where, args, "r.robot_type_id", q.RobotTypeIDs)
+	where, args = appendStringInFilter(where, args, "COALESCE(NULLIF(r.device_id, ''), NULLIF(ws.robot_serial, ''), '')", q.RobotDeviceIDs)
+	where, args = appendStringInFilter(where, args, "COALESCE(NULLIF(dc.operator_id, ''), NULLIF(ws.collector_operator_id, ''), '')", q.CollectorOperatorIDs)
+
+	if q.Keyword != "" {
+		where, args = appendKeywordSearch(where, args, q.Keyword, "e.episode_id", "t.task_id", "e.quality_flag")
+	}
+	if q.Label != "" {
+		where += " AND JSON_CONTAINS(COALESCE(e.labels, JSON_ARRAY()), JSON_QUOTE(?))"
+		args = append(args, q.Label)
+	}
+	if len(q.SyncStatuses) > 0 {
+		syncWhere, syncArgs := dataOpsSyncStatusWhere(q.SyncStatuses)
+		where += syncWhere
+		args = append(args, syncArgs...)
+	}
+
+	return where, args
+}
+
+func dataOpsSyncStatusWhere(statuses []string) (string, []interface{}) {
+	if len(statuses) == 0 {
+		return "", nil
+	}
+
+	hasNotStarted := false
+	latestStatuses := []string{}
+	for _, status := range statuses {
+		if status == syncStatusNotStarted {
+			hasNotStarted = true
+			continue
+		}
+		latestStatuses = append(latestStatuses, status)
+	}
+
+	parts := []string{}
+	args := []interface{}{}
+	if hasNotStarted {
+		parts = append(parts, "NOT EXISTS (SELECT 1 FROM sync_logs sl0 WHERE sl0.episode_id = e.id)")
+	}
+	if len(latestStatuses) > 0 {
+		placeholders := make([]string, 0, len(latestStatuses))
+		for _, status := range latestStatuses {
+			placeholders = append(placeholders, "?")
+			args = append(args, status)
+		}
+		parts = append(parts, `
+			EXISTS (
+				SELECT 1
+				FROM sync_logs sl_latest
+				WHERE sl_latest.episode_id = e.id
+					AND sl_latest.id = (
+						SELECT MAX(sl2.id)
+						FROM sync_logs sl2
+						WHERE sl2.episode_id = e.id
+					)
+					AND sl_latest.status IN (`+strings.Join(placeholders, ",")+`)
+			)
+		`)
+	}
+
+	return " AND (" + strings.Join(parts, " OR ") + ")", args
+}
+
+func dataOpsEpisodeListSQL(fromSQL string, where string) string {
+	return `
+		SELECT
+			e.id,
+			e.episode_id,
+			e.task_id,
+			t.task_id AS task_public_id,
+			COALESCE(e.sop_id, t.sop_id) AS sop_id,
+			CASE
+				WHEN NULLIF(s.slug, '') IS NULL THEN
+					CASE
+						WHEN COALESCE(e.sop_id, t.sop_id) IS NULL THEN ''
+						ELSE CONCAT('SOP #', CAST(COALESCE(e.sop_id, t.sop_id) AS CHAR))
+					END
+				WHEN NULLIF(s.version, '') IS NULL THEN s.slug
+				ELSE CONCAT(s.slug, ' @ ', s.version)
+			END AS sop,
+			e.scene_id,
+			COALESCE(NULLIF(e.scene_name, ''), NULLIF(t.scene_name, ''), NULLIF(sc.name, '')) AS scene_name,
+			r.robot_type_id,
+			COALESCE(NULLIF(rt.name, ''), NULLIF(rt.model, ''), NULLIF(ws.robot_name, '')) AS robot_type,
+			COALESCE(NULLIF(r.device_id, ''), NULLIF(ws.robot_serial, '')) AS robot_device_id,
+			COALESCE(NULLIF(dc.operator_id, ''), NULLIF(ws.collector_operator_id, '')) AS collector_operator_id,
+			COALESCE(NULLIF(dc.name, ''), NULLIF(ws.collector_name, '')) AS collector_name,
+			COALESCE(e.qa_status, '') AS qa_status,
+			e.quality_flag,
+			e.cloud_synced,
+			e.duration_sec,
+			e.file_size_bytes,
+			e.labels,
+			e.created_at
+	` + fromSQL + where + `
+		ORDER BY e.created_at DESC, e.id DESC
+		LIMIT ? OFFSET ?
+	`
+}
+
+func dataOpsEpisodeIDs(rows []dataOpsEpisodeRow) []int64 {
+	ids := make([]int64, 0, len(rows))
+	for _, row := range rows {
+		ids = append(ids, row.ID)
+	}
+	return ids
+}
+
+func (h *DataOpsHandler) latestQAChecksByEpisode(ctx context.Context, episodeIDs []int64) (map[int64]*EpisodeQACheckRecordResponse, error) {
+	out := make(map[int64]*EpisodeQACheckRecordResponse)
+	if len(episodeIDs) == 0 {
+		return out, nil
+	}
+
+	query, args := dataOpsLatestQAChecksSQL(episodeIDs)
+	var rows []episodeQACheckDBRow
+	if err := h.db.SelectContext(ctx, &rows, query, args...); err != nil {
+		return nil, err
+	}
+	for _, row := range rows {
+		record := qaCheckRecordFromDBRow(row)
+		out[row.EpisodeID] = &record
+	}
+	return out, nil
+}
+
+func dataOpsLatestQAChecksSQL(episodeIDs []int64) (string, []interface{}) {
+	placeholders, args := int64Placeholders(episodeIDs)
+	return `
+		SELECT qc.id, qc.episode_id, qc.check_name, qc.passed, qc.score, qc.details, qc.check_metadata, qc.checked_at
+		FROM qa_checks qc
+		INNER JOIN (
+			SELECT episode_id, MAX(id) AS latest_id
+			FROM qa_checks
+			WHERE episode_id IN (` + placeholders + `)
+			GROUP BY episode_id
+		) latest ON latest.episode_id = qc.episode_id AND latest.latest_id = qc.id
+	`, args
+}
+
+func (h *DataOpsHandler) latestSyncLogsByEpisode(ctx context.Context, episodeIDs []int64) (map[int64]SyncJobResponse, error) {
+	out := make(map[int64]SyncJobResponse)
+	if len(episodeIDs) == 0 {
+		return out, nil
+	}
+
+	query, args := dataOpsLatestSyncLogsSQL(episodeIDs)
+	var rows []syncLogRow
+	if err := h.db.SelectContext(ctx, &rows, query, args...); err != nil {
+		return nil, err
+	}
+	for _, row := range rows {
+		out[row.EpisodeID] = syncJobResponseFromRow(row)
+	}
+	return out, nil
+}
+
+func dataOpsLatestSyncLogsSQL(episodeIDs []int64) (string, []interface{}) {
+	placeholders, args := int64Placeholders(episodeIDs)
+	return `
+		SELECT
+			sl.id,
+			sl.episode_id,
+			e.episode_id AS episode_public_id,
+			sl.source_factory_id,
+			sl.source_path,
+			sl.destination_path,
+			sl.status,
+			sl.bytes_transferred,
+			sl.duration_sec,
+			sl.error_message,
+			COALESCE(sl.attempt_count, 0) AS attempt_count,
+			sl.next_retry_at,
+			sl.started_at,
+			sl.completed_at
+		FROM sync_logs sl
+		INNER JOIN (
+			SELECT episode_id, MAX(id) AS latest_id
+			FROM sync_logs
+			WHERE episode_id IN (` + placeholders + `)
+			GROUP BY episode_id
+		) latest ON latest.episode_id = sl.episode_id AND latest.latest_id = sl.id
+		LEFT JOIN episodes e ON e.id = sl.episode_id AND e.deleted_at IS NULL
+	`, args
+}
+
+func int64Placeholders(values []int64) (string, []interface{}) {
+	placeholders := make([]string, 0, len(values))
+	args := make([]interface{}, 0, len(values))
+	for _, value := range values {
+		placeholders = append(placeholders, "?")
+		args = append(args, value)
+	}
+	return strings.Join(placeholders, ","), args
+}
+
+func dataOpsEpisodeItemFromRow(row dataOpsEpisodeRow) DataOpsEpisodeItemResponse {
+	return DataOpsEpisodeItemResponse{
+		ID:                  row.ID,
+		EpisodeID:           row.EpisodeID,
+		TaskID:              row.TaskID,
+		TaskPublicID:        nullableString(row.TaskPublicID),
+		SOPID:               nullableInt64(row.SOPID),
+		SOP:                 nullableString(row.SOP),
+		SceneID:             row.SceneID,
+		SceneName:           nullableString(row.SceneName),
+		RobotTypeID:         nullableInt64(row.RobotTypeID),
+		RobotType:           nullableString(row.RobotType),
+		RobotDeviceID:       nullableString(row.RobotDeviceID),
+		CollectorOperatorID: nullableString(row.CollectorOperatorID),
+		CollectorName:       nullableString(row.CollectorName),
+		QAStatus:            row.QAStatus,
+		QualityFlag:         nullableString(row.QualityFlag),
+		SyncStatus:          syncStatusNotStarted,
+		CloudSynced:         row.CloudSynced,
+		DurationSec:         nullableFloat64(row.DurationSec),
+		FileSizeBytes:       nullableInt64(row.FileSizeBytes),
+		Labels:              episodeLabelsFromDB(row.LabelsJSON),
+		CreatedAt:           row.CreatedAt.UTC().Format(time.RFC3339),
+	}
+}
diff --git a/internal/api/handlers/data_ops_bulk.go b/internal/api/handlers/data_ops_bulk.go
new file mode 100644
index 0000000..aade406
--- /dev/null
+++ b/internal/api/handlers/data_ops_bulk.go
@@ -0,0 +1,682 @@
+// SPDX-FileCopyrightText: 2026 ArcheBase
+//
+// SPDX-License-Identifier: MulanPSL-2.0
+
+package handlers
+
+import (
+	"context"
+	"errors"
+	"fmt"
+	"io"
+	"net/http"
+	"strconv"
+	"strings"
+	"sync"
+	"sync/atomic"
+
+	"github.com/gin-gonic/gin"
+
+	"archebase.com/keystone-edge/internal/logger"
+	"archebase.com/keystone-edge/internal/services"
+)
+
+const dataOpsBulkQAConcurrency = 4
+
+// DataOpsBulkEpisodeFilters contains data-ops filters for bulk episode actions.
+type DataOpsBulkEpisodeFilters struct {
+	CreatedAtFrom       string `json:"created_at_from,omitempty"`
+	CreatedAtTo         string `json:"created_at_to,omitempty"`
+	Keyword             string `json:"q,omitempty"`
+	QAStatus            string `json:"qa_status,omitempty"`
+	SyncStatus          string `json:"sync_status,omitempty"`
+	SceneID             string `json:"scene_id,omitempty"`
+	SOPID               string `json:"sop_id,omitempty"`
+	RobotTypeID         string `json:"robot_type_id,omitempty"`
+	RobotDeviceID       string `json:"robot_device_id,omitempty"`
+	CollectorOperatorID string `json:"collector_operator_id,omitempty"`
+	Label               string `json:"label,omitempty"`
+	Limit               string `json:"limit,omitempty"`
+	Offset              string `json:"offset,omitempty"`
+}
+
+// DataOpsBulkEpisodeActionRequest is the request body for bulk preview and execute calls.
+type DataOpsBulkEpisodeActionRequest struct {
+	Confirm bool                      `json:"confirm,omitempty"`
+	Filters DataOpsBulkEpisodeFilters `json:"filters,omitempty"`
+}
+
+// DataOpsBulkSkippedBreakdownItem summarizes one skipped reason in a bulk preview.
+type DataOpsBulkSkippedBreakdownItem struct {
+	Reason string `json:"reason"`
+	Count  int    `json:"count"`
+}
+
+// DataOpsBulkEpisodePreviewResponse reports matched, eligible, and skipped counts before execution.
+type DataOpsBulkEpisodePreviewResponse struct {
+	Status               string                            `json:"status"`
+	Action               string                            `json:"action"`
+	MatchedCount         int                               `json:"matched_count"`
+	EligibleCount        int                               `json:"eligible_count"`
+	SkippedCount         int                               `json:"skipped_count"`
+	ProtectedStatusCount int                               `json:"protected_status_count,omitempty"`
+	SyncWorkerRunning    *bool                             `json:"sync_worker_running,omitempty"`
+	SkippedBreakdown     []DataOpsBulkSkippedBreakdownItem `json:"skipped_breakdown"`
+	Warnings             []string                          `json:"warnings"`
+}
+
+// DataOpsBulkEpisodeActionResponse acknowledges an accepted asynchronous bulk action.
+type DataOpsBulkEpisodeActionResponse struct {
+	Status       string `json:"status"`
+	MatchedCount int    `json:"matched_count"`
+	Message      string `json:"message"`
+}
+
+type dataOpsBulkQAPreviewRow struct {
+	MatchedCount         int64 `db:"matched_count"`
+	QARunningCount       int64 `db:"qa_running_count"`
+	ProtectedStatusCount int64 `db:"protected_status_count"`
+}
+
+type dataOpsBulkSyncPreviewRow struct {
+	MatchedCount          int64 `db:"matched_count"`
+	EligibleCount         int64 `db:"eligible_count"`
+	QANotApprovedCount    int64 `db:"qa_not_approved_count"`
+	AlreadySyncedCount    int64 `db:"already_synced_count"`
+	SyncActiveCount       int64 `db:"sync_active_count"`
+	UnsupportedSyncStatus int64 `db:"unsupported_sync_status_count"`
+}
+
+// PreviewBulkEpisodeQA previews a bulk QA run for the current data-ops filters.
+//
+// @Summary Preview bulk episode QA
+// @Description Previews matched, eligible, and skipped episode counts for a filtered bulk QA operation.
+// @Tags data-ops
+// @Accept json
+// @Produce json
+// @Param request body DataOpsBulkEpisodeActionRequest false "Bulk preview filters"
+// @Success 200 {object} DataOpsBulkEpisodePreviewResponse
+// @Failure 400 {object} map[string]string
+// @Failure 503 {object} map[string]string
+// @Failure 500 {object} map[string]string
+// @Router /data-ops/episodes/bulk-qa/preview [post]
+func (h *DataOpsHandler) PreviewBulkEpisodeQA(c *gin.Context) {
+	if !h.ensureDataOpsDatabase(c) {
+		return
+	}
+
+	_, q, ok := h.parseBulkEpisodeActionRequest(c, false)
+	if !ok {
+		return
+	}
+
+	preview, err := h.previewBulkEpisodeQA(c.Request.Context(), q)
+	if err != nil {
+		logger.Printf("[DATA_OPS] bulk QA preview failed: %v", err)
+		c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to preview bulk qa"})
+		return
+	}
+	c.JSON(http.StatusOK, preview)
+}
+
+// PreviewBulkEpisodeSync previews a bulk cloud sync run for the current data-ops filters.
+//
+// @Summary Preview bulk episode cloud sync
+// @Description Previews matched, eligible, and skipped episode counts for a filtered bulk cloud sync operation.
+// @Tags data-ops
+// @Accept json
+// @Produce json
+// @Param request body DataOpsBulkEpisodeActionRequest false "Bulk preview filters"
+// @Success 200 {object} DataOpsBulkEpisodePreviewResponse
+// @Failure 400 {object} map[string]string
+// @Failure 503 {object} map[string]string
+// @Failure 500 {object} map[string]string
+// @Router /data-ops/episodes/bulk-sync/preview [post]
+func (h *DataOpsHandler) PreviewBulkEpisodeSync(c *gin.Context) {
+	if !h.ensureDataOpsDatabase(c) {
+		return
+	}
+
+	_, q, ok := h.parseBulkEpisodeActionRequest(c, false)
+	if !ok {
+		return
+	}
+
+	preview, err := h.previewBulkEpisodeSync(c.Request.Context(), q)
+	if err != nil {
+		logger.Printf("[DATA_OPS] bulk sync preview failed: %v", err)
+		c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to preview bulk sync"})
+		return
+	}
+	c.JSON(http.StatusOK, preview)
+}
+
+// BulkRunEpisodeQA starts a filtered asynchronous bulk QA run.
+//
+// @Summary Run bulk episode QA
+// @Description Accepts a filtered episode snapshot and starts an asynchronous bulk QA run.
+// @Tags data-ops
+// @Accept json
+// @Produce json
+// @Param request body DataOpsBulkEpisodeActionRequest true "Bulk QA filters and confirmation"
+// @Success 202 {object} DataOpsBulkEpisodeActionResponse
+// @Failure 400 {object} map[string]string
+// @Failure 503 {object} map[string]string
+// @Failure 500 {object} map[string]string
+// @Router /data-ops/episodes/bulk-qa [post]
+func (h *DataOpsHandler) BulkRunEpisodeQA(c *gin.Context) {
+	if !h.ensureDataOpsDatabase(c) {
+		return
+	}
+	if !h.ensureBulkQAConfigured(c) {
+		return
+	}
+
+	_, q, ok := h.parseBulkEpisodeActionRequest(c, true)
+	if !ok {
+		return
+	}
+
+	ids, err := h.selectBulkEpisodeIDs(c.Request.Context(), q)
+	if err != nil {
+		logger.Printf("[DATA_OPS] bulk QA ID snapshot failed: %v", err)
+		c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to select data operation episodes"})
+		return
+	}
+
+	logger.Printf("[DATA_OPS] Bulk QA accepted: matched=%d", len(ids))
+	go h.runBulkEpisodeQA(ids)
+
+	c.JSON(http.StatusAccepted, DataOpsBulkEpisodeActionResponse{
+		Status:       "accepted",
+		MatchedCount: len(ids),
+		Message:      fmt.Sprintf("%d episodes accepted for bulk QA", len(ids)),
+	})
+}
+
+// BulkSyncEpisodes starts a filtered asynchronous bulk cloud sync run.
+//
+// @Summary Run bulk episode cloud sync
+// @Description Accepts a filtered episode snapshot and starts asynchronous cloud sync enqueues.
+// @Tags data-ops
+// @Accept json
+// @Produce json
+// @Param request body DataOpsBulkEpisodeActionRequest true "Bulk sync filters and confirmation"
+// @Success 202 {object} DataOpsBulkEpisodeActionResponse
+// @Failure 400 {object} map[string]string
+// @Failure 503 {object} map[string]string
+// @Failure 500 {object} map[string]string
+// @Router /data-ops/episodes/bulk-sync [post]
+func (h *DataOpsHandler) BulkSyncEpisodes(c *gin.Context) {
+	if !h.ensureDataOpsDatabase(c) {
+		return
+	}
+	if !h.ensureBulkSyncWorkerRunning(c) {
+		return
+	}
+
+	_, q, ok := h.parseBulkEpisodeActionRequest(c, true)
+	if !ok {
+		return
+	}
+
+	ids, err := h.selectBulkEpisodeIDs(c.Request.Context(), q)
+	if err != nil {
+		logger.Printf("[DATA_OPS] bulk sync ID snapshot failed: %v", err)
+		c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to select data operation episodes"})
+		return
+	}
+
+	logger.Printf("[DATA_OPS] Bulk sync accepted: matched=%d", len(ids))
+	go h.runBulkEpisodeSync(ids)
+
+	c.JSON(http.StatusAccepted, DataOpsBulkEpisodeActionResponse{
+		Status:       "accepted",
+		MatchedCount: len(ids),
+		Message:      fmt.Sprintf("%d episodes accepted for bulk cloud sync", len(ids)),
+	})
+}
+
+func (h *DataOpsHandler) ensureDataOpsDatabase(c *gin.Context) bool {
+	if h == nil || h.db == nil {
+		c.JSON(http.StatusServiceUnavailable, gin.H{"error": "database is not configured"})
+		return false
+	}
+	return true
+}
+
+func (h *DataOpsHandler) ensureBulkQAConfigured(c *gin.Context) bool {
+	if h.qa == nil || h.qa.db == nil || h.qa.s3 == nil {
+		c.JSON(http.StatusServiceUnavailable, gin.H{"error": "qa service is not configured"})
+		return false
+	}
+	return true
+}
+
+func (h *DataOpsHandler) ensureBulkSyncWorkerRunning(c *gin.Context) bool {
+	if h.syncWorker == nil {
+		c.JSON(http.StatusServiceUnavailable, gin.H{"error": "sync worker is not configured"})
+		return false
+	}
+	if !h.syncWorker.IsRunning() {
+		c.JSON(http.StatusServiceUnavailable, gin.H{"error": services.ErrSyncWorkerNotRunning.Error()})
+		return false
+	}
+	return true
+}
+
+func (h *DataOpsHandler) parseBulkEpisodeActionRequest(c *gin.Context, requireConfirm bool) (DataOpsBulkEpisodeActionRequest, dataOpsEpisodeQuery, bool) {
+	var req DataOpsBulkEpisodeActionRequest
+	if c.Request.Body != nil {
+		if err := c.ShouldBindJSON(&req); err != nil && !errors.Is(err, io.EOF) {
+			c.JSON(http.StatusBadRequest, gin.H{"error": "invalid bulk episode request"})
+			return req, dataOpsEpisodeQuery{}, false
+		}
+	}
+
+	if requireConfirm && !req.Confirm {
+		c.JSON(http.StatusBadRequest, gin.H{"error": "confirm must be true"})
+		return req, dataOpsEpisodeQuery{}, false
+	}
+
+	q, err := parseDataOpsBulkEpisodeFilters(req.Filters)
+	if err != nil {
+		c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
+		return req, dataOpsEpisodeQuery{}, false
+	}
+	return req, q, true
+}
+
+func parseDataOpsBulkEpisodeFilters(filters DataOpsBulkEpisodeFilters) (dataOpsEpisodeQuery, error) {
+	qaStatuses, err := parseDataOpsBulkStringList(filters.QAStatus, "qa_status")
+	if err != nil {
+		return dataOpsEpisodeQuery{}, err
+	}
+	for _, status := range qaStatuses {
+		if _, ok := validDataProductionQAStatuses[status]; !ok {
+			return dataOpsEpisodeQuery{}, fmt.Errorf("qa_status must be one of pending_qa, qa_running, approved, needs_inspection, inspector_approved, rejected, failed")
+		}
+	}
+
+	syncStatuses, err := parseDataOpsBulkStringList(filters.SyncStatus, "sync_status")
+	if err != nil {
+		return dataOpsEpisodeQuery{}, err
+	}
+	for _, status := range syncStatuses {
+		if _, ok := validDataOpsSyncStatuses[status]; !ok {
+			return dataOpsEpisodeQuery{}, fmt.Errorf("sync_status must be one of not_started, pending, in_progress, completed, failed")
+		}
+	}
+
+	sceneIDs, err := parseDataOpsBulkPositiveInt64List(filters.SceneID, "scene_id")
+	if err != nil {
+		return dataOpsEpisodeQuery{}, err
+	}
+	sopIDs, err := parseDataOpsBulkPositiveInt64List(filters.SOPID, "sop_id")
+	if err != nil {
+		return dataOpsEpisodeQuery{}, err
+	}
+	robotTypeIDs, err := parseDataOpsBulkPositiveInt64List(filters.RobotTypeID, "robot_type_id")
+	if err != nil {
+		return dataOpsEpisodeQuery{}, err
+	}
+	robotDeviceIDs, err := parseDataOpsBulkStringList(filters.RobotDeviceID, "robot_device_id")
+	if err != nil {
+		return dataOpsEpisodeQuery{}, err
+	}
+	collectorOperatorIDs, err := parseDataOpsBulkStringList(filters.CollectorOperatorID, "collector_operator_id")
+	if err != nil {
+		return dataOpsEpisodeQuery{}, err
+	}
+
+	out := dataOpsEpisodeQuery{
+		Keyword:              strings.TrimSpace(filters.Keyword),
+		QAStatuses:           qaStatuses,
+		SyncStatuses:         syncStatuses,
+		SceneIDs:             sceneIDs,
+		SOPIDs:               sopIDs,
+		RobotTypeIDs:         robotTypeIDs,
+		RobotDeviceIDs:       robotDeviceIDs,
+		CollectorOperatorIDs: collectorOperatorIDs,
+		Label:                strings.TrimSpace(filters.Label),
+	}
+
+	if raw := strings.TrimSpace(filters.CreatedAtFrom); raw != "" {
+		parsed, err := parseEpisodeRFC3339(raw)
+		if err != nil {
+			return dataOpsEpisodeQuery{}, fmt.Errorf("invalid created_at_from")
+		}
+		out.CreatedAtFrom = parsed
+		out.HasCreatedAtFrom = true
+	}
+	if raw := strings.TrimSpace(filters.CreatedAtTo); raw != "" {
+		parsed, err := parseEpisodeRFC3339(raw)
+		if err != nil {
+			return dataOpsEpisodeQuery{}, fmt.Errorf("invalid created_at_to")
+		}
+		out.CreatedAtTo = parsed
+		out.HasCreatedAtTo = true
+	}
+	if out.HasCreatedAtFrom && out.HasCreatedAtTo && out.CreatedAtTo.Before(out.CreatedAtFrom) {
+		return dataOpsEpisodeQuery{}, fmt.Errorf("created_at_to must be after created_at_from")
+	}
+	if len(out.Label) > maxMultiValueFilterStringItemLength {
+		return dataOpsEpisodeQuery{}, fmt.Errorf("label contains a value longer than %d characters", maxMultiValueFilterStringItemLength)
+	}
+
+	return out, nil
+}
+
+func parseDataOpsBulkPositiveInt64List(raw string, fieldName string) ([]int64, error) {
+	items, err := splitDataOpsBulkCommaItems(raw, fieldName, maxMultiValueFilterIntegerItemLength)
+	if err != nil {
+		return nil, err
+	}
+	if len(items) == 0 {
+		return nil, nil
+	}
+
+	seen := make(map[int64]struct{})
+	values := []int64{}
+	for _, item := range items {
+		parsed, err := strconv.ParseInt(item, 10, 64)
+		if err != nil || parsed <= 0 {
+			return nil, fmt.Errorf("invalid %s format", fieldName)
+		}
+		if _, ok := seen[parsed]; ok {
+			continue
+		}
+		seen[parsed] = struct{}{}
+		values = append(values, parsed)
+	}
+	return values, nil
+}
+
+func parseDataOpsBulkStringList(raw string, fieldName string) ([]string, error) {
+	items, err := splitDataOpsBulkCommaItems(raw, fieldName, maxMultiValueFilterStringItemLength)
+	if err != nil {
+		return nil, err
+	}
+	if len(items) == 0 {
+		return nil, nil
+	}
+
+	seen := make(map[string]struct{})
+	values := []string{}
+	for _, item := range items {
+		if _, ok := seen[item]; ok {
+			continue
+		}
+		seen[item] = struct{}{}
+		values = append(values, item)
+	}
+	return values, nil
+}
+
+func splitDataOpsBulkCommaItems(raw string, fieldName string, maxItemLength int) ([]string, error) {
+	raw = strings.TrimSpace(raw)
+	if raw == "" {
+		return nil, nil
+	}
+
+	items := []string{}
+	for _, item := range strings.Split(raw, ",") {
+		item = strings.TrimSpace(item)
+		if item == "" {
+			continue
+		}
+		if len(item) > maxItemLength {
+			return nil, fmt.Errorf("%s contains a value longer than %d characters", fieldName, maxItemLength)
+		}
+		items = append(items, item)
+	}
+	return items, nil
+}
+
+func (h *DataOpsHandler) previewBulkEpisodeQA(ctx context.Context, q dataOpsEpisodeQuery) (DataOpsBulkEpisodePreviewResponse, error) {
+	fromSQL := dataOpsEpisodeBaseFromSQL()
+	where, args := buildDataOpsEpisodeWhere(q)
+	query := dataOpsBulkQAPreviewSQL(fromSQL, where)
+
+	var row dataOpsBulkQAPreviewRow
+	if err := h.db.GetContext(ctx, &row, query, args...); err != nil {
+		return DataOpsBulkEpisodePreviewResponse{}, err
+	}
+
+	matched := int(row.MatchedCount)
+	qaRunning := int(row.QARunningCount)
+	protected := int(row.ProtectedStatusCount)
+	eligible := matched - qaRunning
+	if eligible < 0 {
+		eligible = 0
+	}
+
+	breakdown := []DataOpsBulkSkippedBreakdownItem{}
+	if qaRunning > 0 {
+		breakdown = append(breakdown, DataOpsBulkSkippedBreakdownItem{Reason: "qa_running", Count: qaRunning})
+	}
+	warnings := []string{}
+	if protected > 0 {
+		warnings = append(warnings, fmt.Sprintf("%d episodes are in protected manual QA statuses; checks can run but status will not be overwritten", protected))
+	}
+
+	return DataOpsBulkEpisodePreviewResponse{
+		Status:               "preview",
+		Action:               "bulk_qa",
+		MatchedCount:         matched,
+		EligibleCount:        eligible,
+		SkippedCount:         qaRunning,
+		ProtectedStatusCount: protected,
+		SkippedBreakdown:     breakdown,
+		Warnings:             warnings,
+	}, nil
+}
+
+func dataOpsBulkQAPreviewSQL(fromSQL string, where string) string {
+	return `
+		SELECT
+			COUNT(1) AS matched_count,
+			COALESCE(SUM(CASE WHEN COALESCE(e.qa_status, '') = 'qa_running' THEN 1 ELSE 0 END), 0) AS qa_running_count,
+			COALESCE(SUM(CASE WHEN COALESCE(e.qa_status, '') IN ('needs_inspection', 'inspector_approved', 'rejected') THEN 1 ELSE 0 END), 0) AS protected_status_count
+	` + fromSQL + where
+}
+
+func (h *DataOpsHandler) previewBulkEpisodeSync(ctx context.Context, q dataOpsEpisodeQuery) (DataOpsBulkEpisodePreviewResponse, error) {
+	fromSQL := dataOpsEpisodeBaseFromSQL() + dataOpsLatestSyncPreviewJoinSQL()
+	where, args := buildDataOpsEpisodeWhere(q)
+	query := dataOpsBulkSyncPreviewSQL(fromSQL, where)
+
+	var row dataOpsBulkSyncPreviewRow
+	if err := h.db.GetContext(ctx, &row, query, args...); err != nil {
+		return DataOpsBulkEpisodePreviewResponse{}, err
+	}
+
+	matched := int(row.MatchedCount)
+	eligible := int(row.EligibleCount)
+	skipped := matched - eligible
+	if skipped < 0 {
+		skipped = 0
+	}
+	workerRunning := h.syncWorker != nil && h.syncWorker.IsRunning()
+	breakdown := []DataOpsBulkSkippedBreakdownItem{}
+	appendBreakdown := func(reason string, count int64) {
+		if count > 0 {
+			breakdown = append(breakdown, DataOpsBulkSkippedBreakdownItem{Reason: reason, Count: int(count)})
+		}
+	}
+	appendBreakdown("qa_not_approved", row.QANotApprovedCount)
+	appendBreakdown("already_synced", row.AlreadySyncedCount)
+	appendBreakdown("sync_active", row.SyncActiveCount)
+	appendBreakdown("unsupported_sync_status", row.UnsupportedSyncStatus)
+
+	warnings := []string{}
+	if !workerRunning {
+		warnings = append(warnings, "sync worker is not running; execution will be rejected until the worker starts")
+	}
+
+	return DataOpsBulkEpisodePreviewResponse{
+		Status:            "preview",
+		Action:            "bulk_sync",
+		MatchedCount:      matched,
+		EligibleCount:     eligible,
+		SkippedCount:      skipped,
+		SyncWorkerRunning: &workerRunning,
+		SkippedBreakdown:  breakdown,
+		Warnings:          warnings,
+	}, nil
+}
+
+func dataOpsLatestSyncPreviewJoinSQL() string {
+	return `
+		LEFT JOIN (
+			SELECT sl.episode_id, sl.status
+			FROM sync_logs sl
+			INNER JOIN (
+				SELECT episode_id, MAX(id) AS latest_id
+				FROM sync_logs
+				GROUP BY episode_id
+			) latest ON latest.episode_id = sl.episode_id AND latest.latest_id = sl.id
+		) latest_sync ON latest_sync.episode_id = e.id
+	`
+}
+
+func dataOpsBulkSyncPreviewSQL(fromSQL string, where string) string {
+	approved := "(COALESCE(e.qa_status, '') IN ('approved', 'inspector_approved'))"
+	latestStatus := "COALESCE(latest_sync.status, '')"
+	synced := "(e.cloud_synced = TRUE OR " + latestStatus + " = 'completed')"
+	active := "(" + latestStatus + " IN ('pending', 'in_progress'))"
+	eligible := approved + " AND NOT " + synced + " AND (latest_sync.status IS NULL OR latest_sync.status = 'failed')"
+	unsupported := approved + " AND NOT " + synced + " AND latest_sync.status IS NOT NULL AND latest_sync.status NOT IN ('pending', 'in_progress', 'completed', 'failed')"
+
+	return `
+		SELECT
+			COUNT(1) AS matched_count,
+			COALESCE(SUM(CASE WHEN ` + eligible + ` THEN 1 ELSE 0 END), 0) AS eligible_count,
+			COALESCE(SUM(CASE WHEN NOT ` + approved + ` THEN 1 ELSE 0 END), 0) AS qa_not_approved_count,
+			COALESCE(SUM(CASE WHEN ` + approved + ` AND ` + synced + ` THEN 1 ELSE 0 END), 0) AS already_synced_count,
+			COALESCE(SUM(CASE WHEN ` + approved + ` AND NOT ` + synced + ` AND ` + active + ` THEN 1 ELSE 0 END), 0) AS sync_active_count,
+			COALESCE(SUM(CASE WHEN ` + unsupported + ` THEN 1 ELSE 0 END), 0) AS unsupported_sync_status_count
+	` + fromSQL + where
+}
+
+func (h *DataOpsHandler) selectBulkEpisodeIDs(ctx context.Context, q dataOpsEpisodeQuery) ([]int64, error) {
+	fromSQL := dataOpsEpisodeBaseFromSQL()
+	where, args := buildDataOpsEpisodeWhere(q)
+	query := dataOpsEpisodeIDSnapshotSQL(fromSQL, where)
+
+	ids := []int64{}
+	if err := h.db.SelectContext(ctx, &ids, query, args...); err != nil {
+		return nil, err
+	}
+	return ids, nil
+}
+
+func dataOpsEpisodeIDSnapshotSQL(fromSQL string, where string) string {
+	return `
+		SELECT e.id
+	` + fromSQL + where + `
+		ORDER BY e.created_at DESC, e.id DESC
+	`
+}
+
+func (h *DataOpsHandler) runBulkEpisodeQA(ids []int64) {
+	matched := int64(len(ids))
+	if matched == 0 {
+		logger.Printf("[DATA_OPS] Bulk QA completed: matched=0, attempted=0, skipped=0, failed=0")
+		return
+	}
+
+	workerCount := dataOpsBulkQAConcurrency
+	if len(ids) < workerCount {
+		workerCount = len(ids)
+	}
+
+	var attempted int64
+	var skipped int64
+	var failed int64
+	jobs := make(chan int64)
+	var wg sync.WaitGroup
+
+	for i := 0; i < workerCount; i++ {
+		wg.Add(1)
+		go func() {
+			defer wg.Done()
+			for episodeID := range jobs {
+				ctx, cancel := context.WithTimeout(context.Background(), defaultEpisodeQATimeout)
+				_, err := h.qa.RunEpisodeQASuite(ctx, episodeID, qaRunModeManual)
+				cancel()
+				if err != nil {
+					if isBulkQASkippedError(err) {
+						atomic.AddInt64(&skipped, 1)
+						continue
+					}
+					atomic.AddInt64(&failed, 1)
+					logger.Printf("[DATA_OPS] Bulk QA failed: episode=%d, err=%v", episodeID, err)
+					continue
+				}
+				atomic.AddInt64(&attempted, 1)
+			}
+		}()
+	}
+
+	for _, episodeID := range ids {
+		jobs <- episodeID
+	}
+	close(jobs)
+	wg.Wait()
+
+	logger.Printf(
+		"[DATA_OPS] Bulk QA completed: matched=%d, attempted=%d, skipped=%d, failed=%d",
+		matched,
+		attempted,
+		skipped,
+		failed,
+	)
+}
+
+func isBulkQASkippedError(err error) bool {
+	return errors.Is(err, errEpisodeQAAlreadyRunning) ||
+		errors.Is(err, errEpisodeQANotFound) ||
+		errors.Is(err, errEpisodeQAAutoSkipped)
+}
+
+func (h *DataOpsHandler) runBulkEpisodeSync(ids []int64) {
+	matched := int64(len(ids))
+	var attempted int64
+	var skipped int64
+	var failed int64
+
+	for _, episodeID := range ids {
+		err := h.syncWorker.EnqueueEpisodeManual(context.Background(), episodeID)
+		if err != nil {
+			if isBulkSyncSkippedError(err) {
+				skipped++
+				continue
+			}
+			failed++
+			logger.Printf("[DATA_OPS] Bulk sync enqueue failed: episode=%d, err=%v", episodeID, err)
+			continue
+		}
+		attempted++
+	}
+
+	logger.Printf(
+		"[DATA_OPS] Bulk sync completed: matched=%d, attempted=%d, skipped=%d, failed=%d",
+		matched,
+		attempted,
+		skipped,
+		failed,
+	)
+}
+
+func isBulkSyncSkippedError(err error) bool {
+	if errors.Is(err, services.ErrEpisodeAlreadyEnqueued) ||
+		errors.Is(err, services.ErrSyncAlreadyInProgress) {
+		return true
+	}
+
+	msg := strings.ToLower(err.Error())
+	return strings.Contains(msg, "already synced") ||
+		strings.Contains(msg, "qa_status") ||
+		strings.Contains(msg, "sync already completed")
+}
diff --git a/internal/api/handlers/data_ops_test.go b/internal/api/handlers/data_ops_test.go
new file mode 100644
index 0000000..e2537bb
--- /dev/null
+++ b/internal/api/handlers/data_ops_test.go
@@ -0,0 +1,329 @@
+// SPDX-FileCopyrightText: 2026 ArcheBase
+//
+// SPDX-License-Identifier: MulanPSL-2.0
+
+package handlers
+
+import (
+	"bytes"
+	"context"
+	"net/http"
+	"net/http/httptest"
+	"strings"
+	"testing"
+
+	"github.com/gin-gonic/gin"
+	"github.com/jmoiron/sqlx"
+	_ "modernc.org/sqlite"
+)
+
+func TestParseDataOpsEpisodeQuery(t *testing.T) {
+	gin.SetMode(gin.TestMode)
+
+	c, _ := gin.CreateTestContext(httptest.NewRecorder())
+	c.Request = httptest.NewRequest(http.MethodGet, "/data-ops/episodes?limit=20&offset=40&created_at_from=2026-06-01T00:00:00Z&created_at_to=2026-06-06T00:00:00Z&q=ep&qa_status=failed,pending_qa&sync_status=not_started,failed&scene_id=1,2&sop_id=9,10&robot_type_id=3&robot_device_id=robot-001,robot-002&collector_operator_id=op001&label=recalled_batch", nil)
+
+	got, err := parseDataOpsEpisodeQuery(c)
+	if err != nil {
+		t.Fatalf("parseDataOpsEpisodeQuery returned error: %v", err)
+	}
+	if got.Pagination.Limit != 20 || got.Pagination.Offset != 40 {
+		t.Fatalf("unexpected pagination: %+v", got.Pagination)
+	}
+	if !got.HasCreatedAtFrom || !got.HasCreatedAtTo || got.Keyword != "ep" || got.Label != "recalled_batch" {
+		t.Fatalf("unexpected scalar filters: %+v", got)
+	}
+	if strings.Join(got.QAStatuses, ",") != "failed,pending_qa" {
+		t.Fatalf("unexpected qa statuses: %#v", got.QAStatuses)
+	}
+	if strings.Join(got.SyncStatuses, ",") != "not_started,failed" {
+		t.Fatalf("unexpected sync statuses: %#v", got.SyncStatuses)
+	}
+	if len(got.SceneIDs) != 2 || got.SceneIDs[0] != 1 || got.SceneIDs[1] != 2 {
+		t.Fatalf("unexpected scene ids: %#v", got.SceneIDs)
+	}
+	if len(got.SOPIDs) != 2 || got.SOPIDs[0] != 9 || got.SOPIDs[1] != 10 {
+		t.Fatalf("unexpected sop ids: %#v", got.SOPIDs)
+	}
+	if len(got.RobotTypeIDs) != 1 || got.RobotTypeIDs[0] != 3 {
+		t.Fatalf("unexpected robot type ids: %#v", got.RobotTypeIDs)
+	}
+	if strings.Join(got.RobotDeviceIDs, ",") != "robot-001,robot-002" || strings.Join(got.CollectorOperatorIDs, ",") != "op001" {
+		t.Fatalf("unexpected string filters: %+v", got)
+	}
+}
+
+func TestDataOpsEpisodeWhereIncludesSOPFilter(t *testing.T) {
+	sql, args := buildDataOpsEpisodeWhere(dataOpsEpisodeQuery{SOPIDs: []int64{9, 10}})
+	if !strings.Contains(sql, "COALESCE(e.sop_id, t.sop_id) IN (?,?)") {
+		t.Fatalf("SOP filter SQL should use episode/task SOP fallback: %s", sql)
+	}
+	if len(args) != 2 || args[0] != int64(9) || args[1] != int64(10) {
+		t.Fatalf("unexpected args: %#v", args)
+	}
+}
+
+func
TestDataOpsEpisodeListSQLIncludesSOPColumns(t *testing.T) { + sql := dataOpsEpisodeListSQL(dataOpsEpisodeBaseFromSQL(), " WHERE e.deleted_at IS NULL") + for _, want := range []string{ + "COALESCE(e.sop_id, t.sop_id) AS sop_id", + "LEFT JOIN sops s ON s.id = COALESCE(e.sop_id, t.sop_id)", + "CONCAT('SOP #', CAST(COALESCE(e.sop_id, t.sop_id) AS CHAR))", + "ELSE CONCAT(s.slug, ' @ ', s.version)", + "COALESCE(NULLIF(dc.name, ''), NULLIF(ws.collector_name, '')) AS collector_name", + } { + if !strings.Contains(sql, want) { + t.Fatalf("data ops SQL should include %q: %s", want, sql) + } + } +} + +func TestDataOpsSyncStatusWhereSupportsNotStartedAndLatestStatus(t *testing.T) { + sql, args := dataOpsSyncStatusWhere([]string{"not_started", "failed"}) + if !strings.Contains(sql, "NOT EXISTS") { + t.Fatalf("sync status SQL should include not_started branch: %s", sql) + } + if !strings.Contains(sql, "MAX(sl2.id)") || !strings.Contains(sql, "sl_latest.status IN (?)") { + t.Fatalf("sync status SQL should filter latest sync log status: %s", sql) + } + if len(args) != 1 || args[0] != "failed" { + t.Fatalf("unexpected args: %#v", args) + } +} + +func TestDataOpsLatestQueriesOnlyUsePageEpisodeIDs(t *testing.T) { + qaSQL, qaArgs := dataOpsLatestQAChecksSQL([]int64{10, 20}) + if !strings.Contains(qaSQL, "WHERE episode_id IN (?,?)") { + t.Fatalf("latest QA SQL should constrain page episode IDs: %s", qaSQL) + } + if len(qaArgs) != 2 { + t.Fatalf("latest QA args = %#v", qaArgs) + } + + syncSQL, syncArgs := dataOpsLatestSyncLogsSQL([]int64{10, 20}) + if !strings.Contains(syncSQL, "WHERE episode_id IN (?,?)") { + t.Fatalf("latest sync SQL should constrain page episode IDs: %s", syncSQL) + } + if len(syncArgs) != 2 { + t.Fatalf("latest sync args = %#v", syncArgs) + } +} + +func TestParseDataOpsBulkEpisodeFilters(t *testing.T) { + got, err := parseDataOpsBulkEpisodeFilters(DataOpsBulkEpisodeFilters{ + CreatedAtFrom: "2026-06-01T00:00:00Z", + CreatedAtTo: "2026-06-06T00:00:00Z", + Keyword: 
"ep", + QAStatus: "failed,pending_qa", + SyncStatus: "not_started,failed", + SceneID: "1,2", + SOPID: "9,10", + RobotTypeID: "3", + RobotDeviceID: "robot-001,robot-002", + CollectorOperatorID: "op001", + Label: "recalled_batch", + Limit: "20", + Offset: "40", + }) + if err != nil { + t.Fatalf("parseDataOpsBulkEpisodeFilters returned error: %v", err) + } + if got.Pagination.Limit != 0 || got.Pagination.Offset != 0 { + t.Fatalf("bulk filters should ignore pagination: %+v", got.Pagination) + } + if !got.HasCreatedAtFrom || !got.HasCreatedAtTo || got.Keyword != "ep" || got.Label != "recalled_batch" { + t.Fatalf("unexpected scalar filters: %+v", got) + } + if strings.Join(got.QAStatuses, ",") != "failed,pending_qa" { + t.Fatalf("unexpected qa statuses: %#v", got.QAStatuses) + } + if strings.Join(got.SyncStatuses, ",") != "not_started,failed" { + t.Fatalf("unexpected sync statuses: %#v", got.SyncStatuses) + } + if len(got.SceneIDs) != 2 || got.SceneIDs[0] != 1 || got.SceneIDs[1] != 2 { + t.Fatalf("unexpected scene ids: %#v", got.SceneIDs) + } + if len(got.SOPIDs) != 2 || got.SOPIDs[0] != 9 || got.SOPIDs[1] != 10 { + t.Fatalf("unexpected sop ids: %#v", got.SOPIDs) + } + if len(got.RobotTypeIDs) != 1 || got.RobotTypeIDs[0] != 3 { + t.Fatalf("unexpected robot type ids: %#v", got.RobotTypeIDs) + } + if strings.Join(got.RobotDeviceIDs, ",") != "robot-001,robot-002" || strings.Join(got.CollectorOperatorIDs, ",") != "op001" { + t.Fatalf("unexpected string filters: %+v", got) + } +} + +func TestParseDataOpsBulkEpisodeFiltersDoesNotCapMultiValueCount(t *testing.T) { + got, err := parseDataOpsBulkEpisodeFilters(DataOpsBulkEpisodeFilters{ + SceneID: joinedNumberList(maxMultiValueFilterItems + 1), + RobotDeviceID: joinedStringList("robot-", maxMultiValueFilterItems+1), + }) + if err != nil { + t.Fatalf("parseDataOpsBulkEpisodeFilters returned error: %v", err) + } + if len(got.SceneIDs) != maxMultiValueFilterItems+1 { + t.Fatalf("scene id count = %d, want %d", len(got.SceneIDs), 
maxMultiValueFilterItems+1) + } + if len(got.RobotDeviceIDs) != maxMultiValueFilterItems+1 { + t.Fatalf("robot device id count = %d, want %d", len(got.RobotDeviceIDs), maxMultiValueFilterItems+1) + } +} + +func TestParseDataOpsBulkEpisodeRequestConfirmGuard(t *testing.T) { + gin.SetMode(gin.TestMode) + + recorder := httptest.NewRecorder() + c, _ := gin.CreateTestContext(recorder) + c.Request = httptest.NewRequest(http.MethodPost, "/data-ops/episodes/bulk-qa", bytes.NewBufferString(`{"filters":{}}`)) + c.Request.Header.Set("Content-Type", "application/json") + + h := &DataOpsHandler{} + if _, _, ok := h.parseBulkEpisodeActionRequest(c, true); ok { + t.Fatal("bulk execute request without confirm should fail") + } + if recorder.Code != http.StatusBadRequest { + t.Fatalf("status = %d, want 400", recorder.Code) + } +} + +func TestParseDataOpsBulkEpisodeRequestPreviewDoesNotRequireConfirm(t *testing.T) { + gin.SetMode(gin.TestMode) + + c, _ := gin.CreateTestContext(httptest.NewRecorder()) + c.Request = httptest.NewRequest(http.MethodPost, "/data-ops/episodes/bulk-qa/preview", bytes.NewBufferString(`{"filters":{"qa_status":"failed"}}`)) + c.Request.Header.Set("Content-Type", "application/json") + + h := &DataOpsHandler{} + _, q, ok := h.parseBulkEpisodeActionRequest(c, false) + if !ok { + t.Fatal("bulk preview request should not require confirm") + } + if strings.Join(q.QAStatuses, ",") != "failed" { + t.Fatalf("unexpected qa statuses: %#v", q.QAStatuses) + } +} + +func TestDataOpsEpisodeIDSnapshotSQLUsesDataOpsOrdering(t *testing.T) { + sql := dataOpsEpisodeIDSnapshotSQL(dataOpsEpisodeBaseFromSQL(), " WHERE e.deleted_at IS NULL") + for _, want := range []string{ + "SELECT e.id", + "FROM episodes e", + "ORDER BY e.created_at DESC, e.id DESC", + } { + if !strings.Contains(sql, want) { + t.Fatalf("ID snapshot SQL should include %q: %s", want, sql) + } + } +} + +func TestDataOpsBulkPreviewSQLs(t *testing.T) { + qaSQL := dataOpsBulkQAPreviewSQL(dataOpsEpisodeBaseFromSQL(), " 
WHERE e.deleted_at IS NULL") + for _, want := range []string{"matched_count", "qa_running_count", "protected_status_count"} { + if !strings.Contains(qaSQL, want) { + t.Fatalf("QA preview SQL should include %q: %s", want, qaSQL) + } + } + + syncSQL := dataOpsBulkSyncPreviewSQL(dataOpsEpisodeBaseFromSQL()+dataOpsLatestSyncPreviewJoinSQL(), " WHERE e.deleted_at IS NULL") + for _, want := range []string{"latest_sync", "eligible_count", "qa_not_approved_count", "already_synced_count", "sync_active_count"} { + if !strings.Contains(syncSQL, want) { + t.Fatalf("sync preview SQL should include %q: %s", want, syncSQL) + } + } +} + +func TestPreviewBulkEpisodeSyncTreatsMissingSyncLogAsEligible(t *testing.T) { + db := setupDataOpsBulkPreviewTestDB(t) + h := &DataOpsHandler{db: db} + + for id := int64(1); id <= 11; id++ { + if _, err := db.Exec(` + INSERT INTO episodes (id, episode_id, task_id, scene_id, qa_status, cloud_synced, deleted_at, created_at) + VALUES (?, ?, 0, 0, 'approved', 0, NULL, '2026-06-01T00:00:00Z') + `, id, "episode"); err != nil { + t.Fatalf("insert episode %d: %v", id, err) + } + } + if _, err := db.Exec(` + INSERT INTO sync_logs (id, episode_id, status) + VALUES (1, 1, 'failed') + `); err != nil { + t.Fatalf("insert failed sync log: %v", err) + } + + preview, err := h.previewBulkEpisodeSync(context.Background(), dataOpsEpisodeQuery{ + QAStatuses: []string{"approved"}, + }) + if err != nil { + t.Fatalf("previewBulkEpisodeSync returned error: %v", err) + } + + if preview.MatchedCount != 11 || preview.EligibleCount != 11 || preview.SkippedCount != 0 { + t.Fatalf("preview counts = matched %d eligible %d skipped %d, want 11/11/0", preview.MatchedCount, preview.EligibleCount, preview.SkippedCount) + } + if len(preview.SkippedBreakdown) != 0 { + t.Fatalf("unexpected skipped breakdown: %#v", preview.SkippedBreakdown) + } +} + +func setupDataOpsBulkPreviewTestDB(t *testing.T) *sqlx.DB { + t.Helper() + + db, err := sqlx.Open("sqlite", ":memory:") + if err != nil { 
+ t.Fatalf("open sqlite: %v", err) + } + t.Cleanup(func() { + if err := db.Close(); err != nil { + t.Fatalf("close sqlite: %v", err) + } + }) + + schema := []string{ + `CREATE TABLE episodes ( + id INTEGER PRIMARY KEY, + episode_id TEXT NOT NULL, + task_id INTEGER NOT NULL, + scene_id INTEGER NOT NULL, + workstation_id INTEGER, + sop_id INTEGER, + qa_status TEXT, + cloud_synced BOOLEAN NOT NULL DEFAULT 0, + deleted_at TEXT, + created_at TEXT NOT NULL + )`, + `CREATE TABLE tasks ( + id INTEGER PRIMARY KEY, + sop_id INTEGER, + workstation_id INTEGER, + deleted_at TEXT + )`, + `CREATE TABLE scenes (id INTEGER PRIMARY KEY, deleted_at TEXT)`, + `CREATE TABLE workstations ( + id INTEGER PRIMARY KEY, + robot_id INTEGER, + data_collector_id INTEGER, + deleted_at TEXT + )`, + `CREATE TABLE robots ( + id INTEGER PRIMARY KEY, + robot_type_id INTEGER, + deleted_at TEXT + )`, + `CREATE TABLE robot_types (id INTEGER PRIMARY KEY, deleted_at TEXT)`, + `CREATE TABLE data_collectors (id INTEGER PRIMARY KEY, deleted_at TEXT)`, + `CREATE TABLE sops (id INTEGER PRIMARY KEY, deleted_at TEXT)`, + `CREATE TABLE sync_logs ( + id INTEGER PRIMARY KEY, + episode_id INTEGER NOT NULL, + status TEXT NOT NULL + )`, + } + for _, stmt := range schema { + if _, err := db.Exec(stmt); err != nil { + t.Fatalf("create schema: %v", err) + } + } + return db +} diff --git a/internal/api/handlers/data_production_statistics.go b/internal/api/handlers/data_production_statistics.go index dc76b51..68f53b0 100644 --- a/internal/api/handlers/data_production_statistics.go +++ b/internal/api/handlers/data_production_statistics.go @@ -211,6 +211,7 @@ type breakdownStatsRow struct { SuccessCount sql.NullInt64 `db:"success_count"` FailedCount sql.NullInt64 `db:"failed_count"` ProcessingCount sql.NullInt64 `db:"processing_count"` + TotalDurationMs sql.NullFloat64 `db:"total_duration_ms"` AvgDurationMs sql.NullFloat64 `db:"avg_duration_ms"` MaxDurationMs sql.NullFloat64 `db:"max_duration_ms"` TotalBytes sql.NullInt64 
`db:"total_bytes"` @@ -510,6 +511,7 @@ func dataProductionBreakdownSQL(idExpr string, nameExpr string, baseSQL string) COALESCE(SUM(CASE WHEN status = 'success' THEN count_value ELSE 0 END), 0) AS success_count, COALESCE(SUM(CASE WHEN status IN ('failed', 'cancelled') THEN count_value ELSE 0 END), 0) AS failed_count, COALESCE(SUM(CASE WHEN status = 'processing' THEN count_value ELSE 0 END), 0) AS processing_count, + SUM(duration_ms) AS total_duration_ms, AVG(duration_ms) AS avg_duration_ms, MAX(duration_ms) AS max_duration_ms, COALESCE(SUM(COALESCE(size_bytes, 0)), 0) AS total_bytes, @@ -971,8 +973,9 @@ func breakdownRowToItem(row breakdownStatsRow) dataProductionBreakdownItem { SuccessRate: rate(success, total), }, Duration: statsDurationMetrics{ - AvgMs: roundNullFloat(row.AvgDurationMs), - MaxMs: roundNullFloat(row.MaxDurationMs), + TotalMs: roundNullFloat(row.TotalDurationMs), + AvgMs: roundNullFloat(row.AvgDurationMs), + MaxMs: roundNullFloat(row.MaxDurationMs), }, Size: statsSizeMetrics{ TotalBytes: nullInt64(row.TotalBytes), diff --git a/internal/api/handlers/data_production_statistics_test.go b/internal/api/handlers/data_production_statistics_test.go index 4614cf5..f35409b 100644 --- a/internal/api/handlers/data_production_statistics_test.go +++ b/internal/api/handlers/data_production_statistics_test.go @@ -131,6 +131,9 @@ func TestDataProductionBreakdownSQLGroupsByDimensionExpression(t *testing.T) { } querySQL := dataProductionBreakdownSQL("robot_device_id", "robot_device_id", "SELECT 1") + if !strings.Contains(querySQL, "SUM(duration_ms) AS total_duration_ms") { + t.Fatalf("breakdown SQL should select total duration: %s", querySQL) + } if !strings.Contains(querySQL, "GROUP BY robot_device_id") { t.Fatalf("breakdown SQL should group by the dimension expression: %s", querySQL) } @@ -139,6 +142,21 @@ func TestDataProductionBreakdownSQLGroupsByDimensionExpression(t *testing.T) { } } +func TestBreakdownRowToItemIncludesTotalDuration(t *testing.T) { + item := 
breakdownRowToItem(breakdownStatsRow{ + TotalDurationMs: sql.NullFloat64{Float64: 1234.4, Valid: true}, + AvgDurationMs: sql.NullFloat64{Float64: 617.2, Valid: true}, + MaxDurationMs: sql.NullFloat64{Float64: 900.6, Valid: true}, + }) + + if item.Duration.TotalMs != 1234 { + t.Fatalf("total duration = %d, want 1234", item.Duration.TotalMs) + } + if item.Duration.AvgMs != 617 || item.Duration.MaxMs != 901 { + t.Fatalf("unexpected duration metrics: %+v", item.Duration) + } +} + func TestStatsBreakdownExpressionsSupportsEpisodeDimensions(t *testing.T) { tests := []struct { dimension string diff --git a/internal/api/handlers/episode.go b/internal/api/handlers/episode.go index 38024b6..26bce51 100644 --- a/internal/api/handlers/episode.go +++ b/internal/api/handlers/episode.go @@ -94,6 +94,7 @@ type episodeRow struct { DurationSec sql.NullFloat64 `db:"duration_sec"` QaStatus string `db:"qa_status"` QaScore sql.NullFloat64 `db:"qa_score"` + QualityFlag sql.NullString `db:"quality_flag"` AutoApproved bool `db:"auto_approved"` InspectorID sql.NullString `db:"inspector_id"` InspectionDecision sql.NullString `db:"inspection_decision"` @@ -124,6 +125,7 @@ type Episode struct { DurationSec *float64 `json:"duration_sec"` QaStatus string `json:"qa_status"` QaScore *float64 `json:"qa_score"` + QualityFlag *string `json:"quality_flag,omitempty"` AutoApproved bool `json:"auto_approved"` InspectorID *string `json:"inspector_id"` InspectionDecision *string `json:"inspection_decision"` @@ -275,6 +277,7 @@ func (h *EpisodeHandler) ListEpisodes(c *gin.Context) { e.duration_sec, COALESCE(e.qa_status, '') as qa_status, e.qa_score, + e.quality_flag, e.auto_approved, e.cloud_synced, e.cloud_processed, @@ -445,6 +448,7 @@ func (h *EpisodeHandler) ListEpisodes(c *gin.Context) { DurationSec: nullableFloat64(r.DurationSec), QaStatus: r.QaStatus, QaScore: nullableFloat64(r.QaScore), + QualityFlag: nullableString(r.QualityFlag), AutoApproved: r.AutoApproved, InspectorID: 
nullableString(r.InspectorID), InspectionDecision: nullableString(r.InspectionDecision), @@ -537,10 +541,12 @@ func (h *EpisodeHandler) GetEpisodePresignedURL(c *gin.Context) { } var row struct { - McapPath string `db:"mcap_path"` - SidecarPath string `db:"sidecar_path"` + McapPath string `db:"mcap_path"` + SidecarPath string `db:"sidecar_path"` + QaStatus string `db:"qa_status"` + QualityFlag sql.NullString `db:"quality_flag"` } - err := h.db.Get(&row, "SELECT mcap_path, sidecar_path FROM episodes WHERE id = ? AND deleted_at IS NULL LIMIT 1", episodeID) + err := h.db.Get(&row, "SELECT mcap_path, sidecar_path, COALESCE(qa_status, '') AS qa_status, quality_flag FROM episodes WHERE id = ? AND deleted_at IS NULL LIMIT 1", episodeID) if err == sql.ErrNoRows { c.JSON(http.StatusNotFound, gin.H{"error": "episode not found"}) return @@ -557,6 +563,14 @@ func (h *EpisodeHandler) GetEpisodePresignedURL(c *gin.Context) { selectedPath = row.SidecarPath fieldName = "sidecar_path" } + if kind == "mcap" && row.QaStatus == "failed" { + resp := gin.H{"error": "episode mcap is blocked by failed qa status"} + if row.QualityFlag.Valid && strings.TrimSpace(row.QualityFlag.String) != "" { + resp["quality_flag"] = row.QualityFlag.String + } + c.JSON(http.StatusConflict, resp) + return + } bucket, objectName, ok := resolveEpisodeMcapLocation(h.bucket, selectedPath) if !ok { @@ -609,6 +623,7 @@ func (h *EpisodeHandler) GetEpisode(c *gin.Context) { e.duration_sec, COALESCE(e.qa_status, '') AS qa_status, e.qa_score, + e.quality_flag, e.auto_approved, CASE WHEN i.inspector_id IS NULL THEN NULL ELSE ins.inspector_id END AS inspector_id, CASE WHEN i.decision IS NULL THEN NULL ELSE i.decision END AS inspection_decision, @@ -660,6 +675,7 @@ func (h *EpisodeHandler) GetEpisode(c *gin.Context) { DurationSec: nullableFloat64(row.DurationSec), QaStatus: row.QaStatus, QaScore: nullableFloat64(row.QaScore), + QualityFlag: nullableString(row.QualityFlag), AutoApproved: row.AutoApproved, InspectorID: 
nullableString(row.InspectorID), InspectionDecision: nullableString(row.InspectionDecision), diff --git a/internal/api/handlers/episode_qa_check.go b/internal/api/handlers/episode_qa_check.go new file mode 100644 index 0000000..e4fd95a --- /dev/null +++ b/internal/api/handlers/episode_qa_check.go @@ -0,0 +1,1046 @@ +// SPDX-FileCopyrightText: 2026 ArcheBase +// +// SPDX-License-Identifier: MulanPSL-2.0 + +package handlers + +import ( + "bytes" + "context" + "database/sql" + "encoding/json" + "errors" + "fmt" + "io" + "net/http" + "strconv" + "strings" + "time" + + "github.com/gin-gonic/gin" + "github.com/jmoiron/sqlx" + "github.com/minio/minio-go/v7" + + "archebase.com/keystone-edge/internal/auth" + "archebase.com/keystone-edge/internal/config" + "archebase.com/keystone-edge/internal/logger" + "archebase.com/keystone-edge/internal/storage/s3" +) + +const ( + episodeQACheckMcapMagic = "mcap_magic" + + qaRunModeAuto QARunMode = "auto" + qaRunModeManual QARunMode = "manual" + + qaStatusPendingQA = "pending_qa" + qaStatusRunning = "qa_running" + qaStatusApproved = "approved" + qaStatusNeedsInspection = "needs_inspection" + qaStatusInspectorApproved = "inspector_approved" + qaStatusRejected = "rejected" + qaStatusFailed = "failed" + + defaultEpisodeQAQueueSize = 256 + defaultEpisodeQATimeout = 2 * time.Minute +) + +var ( + mcapMagicBytes = []byte{0x89, 0x4d, 0x43, 0x41, 0x50, 0x30, 0x0d, 0x0a} + + errEpisodeQANotFound = errors.New("episode not found") + errEpisodeQAAlreadyRunning = errors.New("episode qa already running") + errEpisodeQAAutoSkipped = errors.New("episode auto qa skipped") +) + +// QARunMode identifies whether a suite run was triggered automatically or manually. +type QARunMode string + +// EpisodeQAHandler handles QA center APIs and lightweight automatic QA execution. 
+type EpisodeQAHandler struct { + db *sqlx.DB + s3 *s3.Client + bucket string + authCfg *config.AuthConfig + queue chan int64 +} + +// EpisodeQARunRequest is the request body for running an episode QA suite. +type EpisodeQARunRequest struct { + Mode QARunMode `json:"mode,omitempty" example:"manual"` +} + +// EpisodeQACheckRecordResponse is one persisted QA check result. +type EpisodeQACheckRecordResponse struct { + ID int64 `json:"id,omitempty"` + EpisodeID int64 `json:"episode_id,omitempty"` + CheckName string `json:"check_name"` + Passed bool `json:"passed"` + Score float64 `json:"score"` + Details string `json:"details"` + CheckMetadata map[string]any `json:"check_metadata,omitempty"` + CheckedAt string `json:"checked_at"` +} + +// EpisodeQASuiteResponse is the response for a full episode QA suite run. +type EpisodeQASuiteResponse struct { + EpisodeID int64 `json:"episode_id"` + QAStatus string `json:"qa_status"` + Passed bool `json:"passed"` + Mode QARunMode `json:"mode"` + Checks []EpisodeQACheckRecordResponse `json:"checks"` +} + +// EpisodeQAEpisodeResponse is one row in the QA center episode list. +type EpisodeQAEpisodeResponse struct { + ID int64 `json:"id"` + PublicID string `json:"public_id"` + EpisodeID string `json:"episode_id"` + TaskID int64 `json:"task_id"` + TaskPublicID *string `json:"task_public_id,omitempty"` + RobotType *string `json:"robot_type,omitempty"` + QAStatus string `json:"qa_status"` + QualityFlag *string `json:"quality_flag,omitempty"` + CreatedAt string `json:"created_at"` + LatestQACheck *EpisodeQACheckRecordResponse `json:"latest_qa_check,omitempty"` +} + +// EpisodeQAPaginationResponse describes page-based QA center pagination. +type EpisodeQAPaginationResponse struct { + Page int `json:"page"` + PageSize int `json:"page_size"` + Total int `json:"total"` +} + +// EpisodeQAListResponse is the QA center episode list response. 
+type EpisodeQAListResponse struct { + Items []EpisodeQAEpisodeResponse `json:"items"` + Pagination EpisodeQAPaginationResponse `json:"pagination"` + Total int `json:"total"` + Limit int `json:"limit"` + Offset int `json:"offset"` + HasNext bool `json:"hasNext,omitempty"` + HasPrev bool `json:"hasPrev,omitempty"` +} + +type episodeQACheckOutcome struct { + CheckName string + Passed bool + Score float64 + Details string + Metadata map[string]any +} + +type episodeQACheckRow struct { + ID int64 `db:"id"` + McapPath string `db:"mcap_path"` + QAStatus string `db:"qa_status"` + Quality sql.NullString `db:"quality_flag"` +} + +type episodeQARunClaim struct { + EpisodeID int64 + OriginalStatus string + MutableStatus bool +} + +type episodeQACheckDBRow struct { + ID int64 `db:"id"` + EpisodeID int64 `db:"episode_id"` + CheckName string `db:"check_name"` + Passed bool `db:"passed"` + Score float64 `db:"score"` + Details sql.NullString `db:"details"` + CheckMetadata sql.NullString `db:"check_metadata"` + CheckedAt sql.NullTime `db:"checked_at"` +} + +type episodeQAListRow struct { + ID int64 `db:"id"` + EpisodeID string `db:"episode_id"` + TaskID int64 `db:"task_id"` + TaskPublicID sql.NullString `db:"task_public_id"` + RobotType sql.NullString `db:"robot_type"` + QAStatus string `db:"qa_status"` + QualityFlag sql.NullString `db:"quality_flag"` + CreatedAt time.Time `db:"created_at"` + LatestCheckID sql.NullInt64 `db:"latest_check_id"` + LatestCheckName sql.NullString `db:"latest_check_name"` + LatestCheckPassed sql.NullBool `db:"latest_check_passed"` + LatestCheckScore sql.NullFloat64 `db:"latest_check_score"` + LatestCheckDetails sql.NullString `db:"latest_check_details"` + LatestCheckMetadata sql.NullString `db:"latest_check_metadata"` + LatestCheckCheckedAt sql.NullTime `db:"latest_check_checked_at"` +} + +// NewEpisodeQAHandler creates the QA handler and starts the in-memory auto-QA worker. 
+func NewEpisodeQAHandler(db *sqlx.DB, s3Client *s3.Client, bucket string, authCfg *config.AuthConfig) *EpisodeQAHandler { + h := &EpisodeQAHandler{ + db: db, + s3: s3Client, + bucket: strings.TrimSpace(bucket), + authCfg: authCfg, + queue: make(chan int64, defaultEpisodeQAQueueSize), + } + if db != nil { + go h.runAutoWorker() + } + return h +} + +// RegisterRoutes registers QA center routes under /api/v1/qa. +func (h *EpisodeQAHandler) RegisterRoutes(apiV1 *gin.RouterGroup) { + qa := apiV1.Group("/qa") + qa.GET("/episodes", h.ListQAEpisodes) + qa.GET("/episodes/:id/checks", h.ListEpisodeQAChecks) + qa.POST("/episodes/:id/run", h.RunEpisodeQASuiteHTTP) +} + +// EnqueueEpisode schedules lightweight automatic QA for a newly created episode. +func (h *EpisodeQAHandler) EnqueueEpisode(episodeID int64) { + if h == nil || h.queue == nil || episodeID <= 0 { + return + } + select { + case h.queue <- episodeID: + default: + logger.Printf("[EPISODE-QA] Auto QA queue full, dropped episode=%d", episodeID) + } +} + +func (h *EpisodeQAHandler) runAutoWorker() { + for episodeID := range h.queue { + ctx, cancel := context.WithTimeout(context.Background(), defaultEpisodeQATimeout) + if _, err := h.RunEpisodeQASuite(ctx, episodeID, qaRunModeAuto); err != nil && !errors.Is(err, errEpisodeQAAutoSkipped) { + logger.Printf("[EPISODE-QA] Auto QA failed: episode=%d, err=%v", episodeID, err) + } + cancel() + } +} + +func (h *EpisodeQAHandler) requireBearerJWT(c *gin.Context) bool { + if h.authCfg == nil { + return true + } + authHeader := strings.TrimSpace(c.GetHeader("Authorization")) + parts := strings.SplitN(authHeader, " ", 2) + if len(parts) != 2 || !strings.EqualFold(parts[0], "Bearer") || strings.TrimSpace(parts[1]) == "" { + c.JSON(http.StatusUnauthorized, gin.H{"error": "missing or invalid authorization header"}) + return false + } + if _, err := auth.ParseToken(parts[1], h.authCfg); err != nil { + c.JSON(http.StatusUnauthorized, gin.H{"error": "invalid or expired token"}) + 
return false + } + return true +} + +// ListQAEpisodes lists episodes for the QA center. +// +// @Summary List QA center episodes +// @Description Lists episodes with latest QA check. Defaults to actionable statuses. +// @Tags qa +// @Produce json +// @Param status query string false "QA status filter: all, pending_qa, failed, needs_inspection, approved, rejected" +// @Param robot_type query string false "Robot type name or model" +// @Param q query string false "Search episode/task/quality text" +// @Param page query int false "Page number, default 1" +// @Param page_size query int false "Page size, default 20" +// @Success 200 {object} EpisodeQAListResponse +// @Failure 400 {object} map[string]string +// @Failure 500 {object} map[string]string +// @Router /qa/episodes [get] +func (h *EpisodeQAHandler) ListQAEpisodes(c *gin.Context) { + if h.db == nil { + c.JSON(http.StatusServiceUnavailable, gin.H{"error": "database is not configured"}) + return + } + if !h.requireBearerJWT(c) { + return + } + + page := parsePositiveIntQuery(c, "page", 1) + pageSize := parsePositiveIntQuery(c, "page_size", 20) + if pageSize > 100 { + pageSize = 100 + } + offset := (page - 1) * pageSize + + statuses, err := qaEpisodeStatusFilter(c.DefaultQuery("status", "")) + if err != nil { + c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()}) + return + } + + where, args := buildQAEpisodeListWhere(statuses, c.Query("robot_type"), c.Query("q")) + countQuery := ` + SELECT COUNT(1) + FROM episodes e + LEFT JOIN tasks t ON t.id = e.task_id AND t.deleted_at IS NULL + LEFT JOIN workstations ws ON ws.id = e.workstation_id AND ws.deleted_at IS NULL + LEFT JOIN robots r ON r.id = ws.robot_id AND r.deleted_at IS NULL + LEFT JOIN robot_types rt ON rt.id = r.robot_type_id AND rt.deleted_at IS NULL + ` + where + + var total int + if err := h.db.Get(&total, countQuery, args...); err != nil { + logger.Printf("[EPISODE-QA] Failed to count QA episodes: %v", err) + c.JSON(http.StatusInternalServerError, 
gin.H{"error": "failed to count qa episodes"}) + return + } + + query := ` + SELECT + e.id, + e.episode_id, + e.task_id, + t.task_id AS task_public_id, + COALESCE(rt.name, rt.model, '') AS robot_type, + COALESCE(e.qa_status, '') AS qa_status, + e.quality_flag, + e.created_at, + qc.id AS latest_check_id, + qc.check_name AS latest_check_name, + qc.passed AS latest_check_passed, + qc.score AS latest_check_score, + qc.details AS latest_check_details, + qc.check_metadata AS latest_check_metadata, + qc.checked_at AS latest_check_checked_at + FROM episodes e + LEFT JOIN tasks t ON t.id = e.task_id AND t.deleted_at IS NULL + LEFT JOIN workstations ws ON ws.id = e.workstation_id AND ws.deleted_at IS NULL + LEFT JOIN robots r ON r.id = ws.robot_id AND r.deleted_at IS NULL + LEFT JOIN robot_types rt ON rt.id = r.robot_type_id AND rt.deleted_at IS NULL + LEFT JOIN qa_checks qc ON qc.id = ( + SELECT qc2.id + FROM qa_checks qc2 + WHERE qc2.episode_id = e.id + ORDER BY qc2.checked_at DESC, qc2.id DESC + LIMIT 1 + ) + ` + where + ` + ORDER BY e.created_at DESC, e.id DESC + LIMIT ? OFFSET ? 
+ ` + queryArgs := append(append([]interface{}{}, args...), pageSize, offset) + + var rows []episodeQAListRow + if err := h.db.Select(&rows, query, queryArgs...); err != nil { + logger.Printf("[EPISODE-QA] Failed to query QA episodes: %v", err) + c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to query qa episodes"}) + return + } + + items := make([]EpisodeQAEpisodeResponse, 0, len(rows)) + for _, row := range rows { + item := EpisodeQAEpisodeResponse{ + ID: row.ID, + PublicID: row.EpisodeID, + EpisodeID: row.EpisodeID, + TaskID: row.TaskID, + TaskPublicID: nullableString(row.TaskPublicID), + RobotType: nullableString(row.RobotType), + QAStatus: row.QAStatus, + QualityFlag: nullableString(row.QualityFlag), + CreatedAt: row.CreatedAt.UTC().Format(time.RFC3339), + } + if row.LatestCheckID.Valid { + item.LatestQACheck = latestQACheckFromListRow(row) + } + items = append(items, item) + } + + hasNext := offset+pageSize < total + c.JSON(http.StatusOK, EpisodeQAListResponse{ + Items: items, + Pagination: EpisodeQAPaginationResponse{ + Page: page, + PageSize: pageSize, + Total: total, + }, + Total: total, + Limit: pageSize, + Offset: offset, + HasNext: hasNext, + HasPrev: page > 1, + }) +} + +// ListEpisodeQAChecks lists all QA check records for one episode. +// +// @Summary List episode QA checks +// @Description Lists persisted QA check history for one episode. 
+// @Tags qa
+// @Produce json
+// @Param id path int true "Episode ID"
+// @Success 200 {object} map[string][]EpisodeQACheckRecordResponse
+// @Failure 404 {object} map[string]string
+// @Failure 500 {object} map[string]string
+// @Router /qa/episodes/{id}/checks [get]
+func (h *EpisodeQAHandler) ListEpisodeQAChecks(c *gin.Context) {
+	if h.db == nil {
+		c.JSON(http.StatusServiceUnavailable, gin.H{"error": "database is not configured"})
+		return
+	}
+	if !h.requireBearerJWT(c) {
+		return
+	}
+
+	episodeID, ok := parseEpisodeIDParam(c)
+	if !ok {
+		return
+	}
+	if err := h.ensureEpisodeExists(c.Request.Context(), episodeID); err != nil {
+		if errors.Is(err, errEpisodeQANotFound) {
+			c.JSON(http.StatusNotFound, gin.H{"error": "episode not found"})
+			return
+		}
+		logger.Printf("[EPISODE-QA] Failed to query episode %d: %v", episodeID, err)
+		c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to query episode"})
+		return
+	}
+
+	var rows []episodeQACheckDBRow
+	if err := h.db.SelectContext(c.Request.Context(), &rows, `
+		SELECT id, episode_id, check_name, passed, score, details, check_metadata, checked_at
+		FROM qa_checks
+		WHERE episode_id = ?
+		ORDER BY checked_at DESC, id DESC
+	`, episodeID); err != nil {
+		logger.Printf("[EPISODE-QA] Failed to query QA checks: episode=%d, err=%v", episodeID, err)
+		c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to query qa checks"})
+		return
+	}
+
+	items := make([]EpisodeQACheckRecordResponse, 0, len(rows))
+	for _, row := range rows {
+		items = append(items, qaCheckRecordFromDBRow(row))
+	}
+	c.JSON(http.StatusOK, gin.H{"items": items})
+}
+
+// RunEpisodeQASuiteHTTP runs the full QA suite for one episode.
+//
+// @Summary Run episode QA suite
+// @Description Runs the configured QA suite for one episode. The MVP suite is mcap_magic.
+// @Tags qa
+// @Accept json
+// @Produce json
+// @Param id path int true "Episode ID"
+// @Param request body EpisodeQARunRequest false "QA run request"
+// @Success 200 {object} EpisodeQASuiteResponse
+// @Failure 404 {object} map[string]string
+// @Failure 409 {object} map[string]string
+// @Failure 500 {object} map[string]string
+// @Failure 502 {object} map[string]string
+// @Router /qa/episodes/{id}/run [post]
+func (h *EpisodeQAHandler) RunEpisodeQASuiteHTTP(c *gin.Context) {
+	if h.db == nil {
+		c.JSON(http.StatusServiceUnavailable, gin.H{"error": "database is not configured"})
+		return
+	}
+	if !h.requireBearerJWT(c) {
+		return
+	}
+
+	episodeID, ok := parseEpisodeIDParam(c)
+	if !ok {
+		return
+	}
+
+	var req EpisodeQARunRequest
+	if c.Request.Body != nil {
+		if err := c.ShouldBindJSON(&req); err != nil && !errors.Is(err, io.EOF) {
+			c.JSON(http.StatusBadRequest, gin.H{"error": "invalid qa run request"})
+			return
+		}
+	}
+	mode := req.Mode
+	if mode == "" {
+		mode = qaRunModeManual
+	}
+	if mode != qaRunModeManual && mode != qaRunModeAuto {
+		c.JSON(http.StatusBadRequest, gin.H{"error": "mode must be manual or auto"})
+		return
+	}
+
+	result, err := h.RunEpisodeQASuite(c.Request.Context(), episodeID, mode)
+	if err != nil {
+		switch {
+		case errors.Is(err, errEpisodeQANotFound):
+			c.JSON(http.StatusNotFound, gin.H{"error": "episode not found"})
+		case errors.Is(err, errEpisodeQAAlreadyRunning):
+			c.JSON(http.StatusConflict, gin.H{"error": "qa already running"})
+		default:
+			logger.Printf("[EPISODE-QA] Suite failed: episode=%d, mode=%s, err=%v", episodeID, mode, err)
+			c.JSON(http.StatusBadGateway, gin.H{"error": "failed to run qa suite"})
+		}
+		return
+	}
+
+	c.JSON(http.StatusOK, result)
+}
+
+// RunEpisodeQASuite executes and persists the configured QA suite for one episode.
+func (h *EpisodeQAHandler) RunEpisodeQASuite(ctx context.Context, episodeID int64, mode QARunMode) (*EpisodeQASuiteResponse, error) {
+	if h == nil || h.db == nil {
+		return nil, fmt.Errorf("database is not configured")
+	}
+	if mode == "" {
+		mode = qaRunModeManual
+	}
+
+	row, err := h.loadEpisodeForQACheck(ctx, episodeID)
+	if err != nil {
+		return nil, err
+	}
+
+	claim, err := h.claimEpisodeQARun(ctx, row, mode)
+	if err != nil {
+		return nil, err
+	}
+
+	checks := defaultEpisodeQASuite(row)
+	outcomes := make([]episodeQACheckOutcome, 0, len(checks))
+	checkedAt := time.Now().UTC()
+	for _, checkName := range checks {
+		outcome, err := h.runEpisodeQACheck(ctx, checkName, row)
+		if err != nil {
+			h.releaseEpisodeQARun(ctx, claim)
+			return nil, err
+		}
+		outcomes = append(outcomes, outcome)
+	}
+
+	result, err := h.persistEpisodeQASuiteResult(ctx, claim, mode, outcomes, checkedAt)
+	if err != nil {
+		h.releaseEpisodeQARun(ctx, claim)
+	}
+	return result, err
+}
+
+func defaultEpisodeQASuite(_ episodeQACheckRow) []string {
+	return []string{episodeQACheckMcapMagic}
+}
+
+func normalizeEpisodeQACheckName(raw string) string {
+	return strings.TrimSpace(strings.ToLower(raw))
+}
+
+func isSupportedEpisodeQACheckName(checkName string) bool {
+	switch checkName {
+	case episodeQACheckMcapMagic:
+		return true
+	default:
+		return false
+	}
+}
+
+func (h *EpisodeQAHandler) loadEpisodeForQACheck(ctx context.Context, episodeID int64) (episodeQACheckRow, error) {
+	var row episodeQACheckRow
+	err := h.db.GetContext(ctx, &row, `
+		SELECT id, mcap_path, COALESCE(qa_status, '') AS qa_status, quality_flag
+		FROM episodes
+		WHERE id = ? AND deleted_at IS NULL
+		LIMIT 1
+	`, episodeID)
+	if err == sql.ErrNoRows {
+		return row, errEpisodeQANotFound
+	}
+	if err != nil {
+		return row, fmt.Errorf("query episode: %w", err)
+	}
+	return row, nil
+}
+
+func (h *EpisodeQAHandler) ensureEpisodeExists(ctx context.Context, episodeID int64) error {
+	var exists int
+	err := h.db.GetContext(ctx, &exists, `
+		SELECT 1
+		FROM episodes
+		WHERE id = ? AND deleted_at IS NULL
+		LIMIT 1
+	`, episodeID)
+	if err == sql.ErrNoRows {
+		return errEpisodeQANotFound
+	}
+	if err != nil {
+		return fmt.Errorf("query episode: %w", err)
+	}
+	return nil
+}
+
+func (h *EpisodeQAHandler) claimEpisodeQARun(ctx context.Context, row episodeQACheckRow, mode QARunMode) (episodeQARunClaim, error) {
+	claim := episodeQARunClaim{
+		EpisodeID:      row.ID,
+		OriginalStatus: row.QAStatus,
+	}
+
+	if row.QAStatus == qaStatusRunning {
+		return claim, errEpisodeQAAlreadyRunning
+	}
+	if mode == qaRunModeAuto && row.QAStatus != qaStatusPendingQA {
+		return claim, errEpisodeQAAutoSkipped
+	}
+	if mode == qaRunModeManual && isManualQAProtectedStatus(row.QAStatus) {
+		return claim, nil
+	}
+
+	// #nosec G701 -- static SQL with placeholder-bound status and episode values.
+	res, err := h.db.ExecContext(ctx, `
+		UPDATE episodes
+		SET qa_status = ?
+		WHERE id = ? AND deleted_at IS NULL AND COALESCE(qa_status, '') = ?
+	`, qaStatusRunning, row.ID, row.QAStatus)
+	if err != nil {
+		return claim, fmt.Errorf("claim episode qa run: %w", err)
+	}
+	affected, err := res.RowsAffected()
+	if err != nil {
+		return claim, fmt.Errorf("read claim rows affected: %w", err)
+	}
+	if affected == 0 {
+		fresh, err := h.loadEpisodeForQACheck(ctx, row.ID)
+		if err != nil {
+			return claim, err
+		}
+		if fresh.QAStatus == qaStatusRunning {
+			return claim, errEpisodeQAAlreadyRunning
+		}
+		if mode == qaRunModeAuto {
+			return claim, errEpisodeQAAutoSkipped
+		}
+		return claim, fmt.Errorf("episode qa status changed from %q to %q", row.QAStatus, fresh.QAStatus)
+	}
+
+	claim.MutableStatus = true
+	return claim, nil
+}
+
+func (h *EpisodeQAHandler) releaseEpisodeQARun(ctx context.Context, claim episodeQARunClaim) {
+	if h == nil || h.db == nil || !claim.MutableStatus {
+		return
+	}
+	// #nosec G701 -- static SQL with placeholder-bound status and episode values.
+	if _, err := h.db.ExecContext(ctx, `
+		UPDATE episodes
+		SET qa_status = ?
+		WHERE id = ? AND deleted_at IS NULL AND qa_status = ?
+	`, claim.OriginalStatus, claim.EpisodeID, qaStatusRunning); err != nil {
+		logger.Printf("[EPISODE-QA] Failed to release QA run: episode=%d, err=%v", claim.EpisodeID, err)
+	}
+}
+
+func isManualQAProtectedStatus(status string) bool {
+	switch status {
+	case qaStatusRejected, qaStatusNeedsInspection, qaStatusInspectorApproved:
+		return true
+	default:
+		return false
+	}
+}
+
+func (h *EpisodeQAHandler) runEpisodeQACheck(ctx context.Context, checkName string, row episodeQACheckRow) (episodeQACheckOutcome, error) {
+	checkName = normalizeEpisodeQACheckName(checkName)
+	if !isSupportedEpisodeQACheckName(checkName) {
+		return episodeQACheckOutcome{}, fmt.Errorf("unsupported qa check %q", checkName)
+	}
+	switch checkName {
+	case episodeQACheckMcapMagic:
+		return h.runMcapMagicQACheck(ctx, row)
+	default:
+		return episodeQACheckOutcome{}, fmt.Errorf("unsupported qa check %q", checkName)
+	}
+}
+
+func (h *EpisodeQAHandler) runMcapMagicQACheck(ctx context.Context, row episodeQACheckRow) (episodeQACheckOutcome, error) {
+	if h.s3 == nil {
+		return episodeQACheckOutcome{}, fmt.Errorf("storage is not configured")
+	}
+
+	bucket, objectName, ok := resolveEpisodeMcapLocation(h.bucket, row.McapPath)
+	if !ok {
+		return evaluateMcapMagicCheck(0, nil, nil, "invalid mcap_path"), nil
+	}
+
+	stat, err := h.s3.StatObject(ctx, bucket, objectName, minio.StatObjectOptions{})
+	if err != nil {
+		if isS3NotFound(err) {
+			return mcapMagicFailure("MCAP integrity check failed: object not found", map[string]any{
+				"bucket": bucket,
+				"object": objectName,
+			}), nil
+		}
+		return episodeQACheckOutcome{}, fmt.Errorf("stat mcap object: %w", err)
+	}
+
+	size := stat.Size
+	if size < int64(len(mcapMagicBytes)*2) {
+		return evaluateMcapMagicCheck(size, nil, nil, "file is smaller than 16 bytes"), nil
+	}
+
+	head, err := h.readS3ObjectRange(ctx, bucket, objectName, 0, int64(len(mcapMagicBytes)-1))
+	if err != nil {
+		if isS3NotFound(err) {
+			return mcapMagicFailure("MCAP integrity check failed: object not found", map[string]any{
+				"bucket":          bucket,
+				"object":          objectName,
+				"file_size_bytes": size,
+			}), nil
+		}
+		return episodeQACheckOutcome{}, fmt.Errorf("read mcap head: %w", err)
+	}
+
+	tailStart := size - int64(len(mcapMagicBytes))
+	tail, err := h.readS3ObjectRange(ctx, bucket, objectName, tailStart, size-1)
+	if err != nil {
+		if isS3NotFound(err) {
+			return mcapMagicFailure("MCAP integrity check failed: object not found", map[string]any{
+				"bucket":          bucket,
+				"object":          objectName,
+				"file_size_bytes": size,
+			}), nil
+		}
+		return episodeQACheckOutcome{}, fmt.Errorf("read mcap tail: %w", err)
+	}
+
+	return evaluateMcapMagicCheck(size, head, tail, ""), nil
+}
+
+func (h *EpisodeQAHandler) readS3ObjectRange(ctx context.Context, bucket, objectName string, start, end int64) ([]byte, error) {
+	var opts minio.GetObjectOptions
+	if err := opts.SetRange(start, end); err != nil {
+		return nil, fmt.Errorf("set range %d-%d: %w", start, end, err)
+	}
+
+	obj, err := h.s3.GetObject(ctx, bucket, objectName, opts)
+	if err != nil {
+		return nil, err
+	}
+	defer func() {
+		if err := obj.Close(); err != nil {
+			logger.Printf("[EPISODE-QA] S3 object close failed: bucket=%s, object=%s, err=%v", bucket, objectName, err)
+		}
+	}()
+
+	return io.ReadAll(obj)
+}
+
+func evaluateMcapMagicCheck(fileSize int64, head, tail []byte, explicitReason string) episodeQACheckOutcome {
+	metadata := map[string]any{
+		"expected_magic":   spacedHex(mcapMagicBytes),
+		"found_head_magic": spacedHex(head),
+		"found_tail_magic": spacedHex(tail),
+		"file_size_bytes":  fileSize,
+	}
+
+	if explicitReason != "" {
+		return mcapMagicFailure("MCAP integrity check failed: "+explicitReason, metadata)
+	}
+
+	headOK := bytes.Equal(head, mcapMagicBytes)
+	tailOK := bytes.Equal(tail, mcapMagicBytes)
+	if headOK && tailOK {
+		return episodeQACheckOutcome{
+			CheckName: episodeQACheckMcapMagic,
+			Passed:    true,
+			Score:     1,
+			Details:   "MCAP head and tail magic matched",
+			Metadata:  metadata,
+		}
+	}
+
+	reason := "head and tail magic mismatch"
+	if headOK {
+		reason = "tail magic mismatch"
+	} else if tailOK {
+		reason = "head magic mismatch"
+	}
+	return mcapMagicFailure("MCAP integrity check failed: "+reason, metadata)
+}
+
+func mcapMagicFailure(details string, metadata map[string]any) episodeQACheckOutcome {
+	base := map[string]any{
+		"expected_magic":   spacedHex(mcapMagicBytes),
+		"found_head_magic": "",
+		"found_tail_magic": "",
+	}
+	for k, v := range metadata {
+		base[k] = v
+	}
+	return episodeQACheckOutcome{
+		CheckName: episodeQACheckMcapMagic,
+		Passed:    false,
+		Score:     0,
+		Details:   details,
+		Metadata:  base,
+	}
+}
+
+func isS3NotFound(err error) bool {
+	errResp := minio.ToErrorResponse(err)
+	return errResp.Code == "NoSuchKey" || errResp.StatusCode == http.StatusNotFound
+}
+
+func spacedHex(data []byte) string {
+	if len(data) == 0 {
+		return ""
+	}
+	parts := make([]string, len(data))
+	for i, b := range data {
+		parts[i] = fmt.Sprintf("%02x", b)
+	}
+	return strings.Join(parts, " ")
+}
+
+func (h *EpisodeQAHandler) persistEpisodeQASuiteResult(ctx context.Context, claim episodeQARunClaim, mode QARunMode, outcomes []episodeQACheckOutcome, checkedAt time.Time) (*EpisodeQASuiteResponse, error) {
+	if h.db == nil {
+		return nil, fmt.Errorf("database is not configured")
+	}
+
+	tx, err := h.db.BeginTxx(ctx, nil)
+	if err != nil {
+		return nil, fmt.Errorf("begin qa check transaction: %w", err)
+	}
+	defer func() { _ = tx.Rollback() }()
+
+	checks := make([]EpisodeQACheckRecordResponse, 0, len(outcomes))
+	allPassed := true
+	scoreSum := 0.0
+	failureDetails := ""
+	for _, outcome := range outcomes {
+		if !outcome.Passed {
+			allPassed = false
+			if failureDetails == "" {
+				failureDetails = outcome.Details
+			}
+		}
+		scoreSum += outcome.Score
+
+		metadataJSON, err := json.Marshal(outcome.Metadata)
+		if err != nil {
+			return nil, fmt.Errorf("marshal qa check metadata: %w", err)
+		}
+
+		// #nosec G701 -- static SQL with placeholder-bound QA check values.
+		res, err := tx.ExecContext(ctx, `
+			INSERT INTO qa_checks (episode_id, check_name, passed, score, details, check_metadata, checked_at)
+			VALUES (?, ?, ?, ?, ?, ?, ?)
+		`, claim.EpisodeID, outcome.CheckName, outcome.Passed, outcome.Score, outcome.Details, string(metadataJSON), checkedAt)
+		if err != nil {
+			return nil, fmt.Errorf("insert qa_check: %w", err)
+		}
+		id, err := res.LastInsertId()
+		if err != nil {
+			return nil, fmt.Errorf("read qa_check insert id: %w", err)
+		}
+		checks = append(checks, EpisodeQACheckRecordResponse{
+			ID:            id,
+			EpisodeID:     claim.EpisodeID,
+			CheckName:     outcome.CheckName,
+			Passed:        outcome.Passed,
+			Score:         outcome.Score,
+			Details:       outcome.Details,
+			CheckMetadata: outcome.Metadata,
+			CheckedAt:     checkedAt.Format(time.RFC3339),
+		})
+	}
+
+	score := 0.0
+	if len(outcomes) > 0 {
+		score = scoreSum / float64(len(outcomes))
+	}
+
+	finalStatus := claim.OriginalStatus
+	if allPassed {
+		finalStatus = qaStatusApproved
+	} else if failureDetails != "" {
+		finalStatus = qaStatusFailed
+	}
+
+	if claim.MutableStatus {
+		if allPassed {
+			if mode == qaRunModeAuto {
+				// #nosec G701 -- static SQL with placeholder-bound episode QA values.
+				if _, err := tx.ExecContext(ctx, `
+					UPDATE episodes
+					SET qa_status = ?, qa_score = ?, quality_flag = NULL, auto_approved = ?
+					WHERE id = ? AND deleted_at IS NULL AND qa_status = ?
+				`, qaStatusApproved, score, 1, claim.EpisodeID, qaStatusRunning); err != nil {
+					return nil, fmt.Errorf("mark episode qa approved: %w", err)
+				}
+			} else {
+				// #nosec G701 -- static SQL with placeholder-bound episode QA values.
+				if _, err := tx.ExecContext(ctx, `
+					UPDATE episodes
+					SET qa_status = ?, qa_score = ?, quality_flag = NULL
+					WHERE id = ? AND deleted_at IS NULL AND qa_status = ?
+				`, qaStatusApproved, score, claim.EpisodeID, qaStatusRunning); err != nil {
+					return nil, fmt.Errorf("mark episode qa approved: %w", err)
+				}
+			}
+		} else {
+			// #nosec G701 -- static SQL with placeholder-bound episode QA values.
+			if _, err := tx.ExecContext(ctx, `
+				UPDATE episodes
+				SET qa_status = ?, qa_score = ?, quality_flag = ?
+				WHERE id = ? AND deleted_at IS NULL AND qa_status = ?
+			`, qaStatusFailed, score, failureDetails, claim.EpisodeID, qaStatusRunning); err != nil {
+				return nil, fmt.Errorf("mark episode qa failed: %w", err)
+			}
+		}
+	} else {
+		if failureDetails != "" {
+			// #nosec G701 -- static SQL with placeholder-bound episode QA values.
+			if _, err := tx.ExecContext(ctx, `
+				UPDATE episodes
+				SET qa_score = ?, quality_flag = ?
+				WHERE id = ? AND deleted_at IS NULL
+			`, score, failureDetails, claim.EpisodeID); err != nil {
+				return nil, fmt.Errorf("write protected episode qa failure details: %w", err)
+			}
+		}
+		finalStatus = claim.OriginalStatus
+	}
+
+	if err := tx.Commit(); err != nil {
+		return nil, fmt.Errorf("commit qa check transaction: %w", err)
+	}
+
+	return &EpisodeQASuiteResponse{
+		EpisodeID: claim.EpisodeID,
+		QAStatus:  finalStatus,
+		Passed:    allPassed,
+		Mode:      mode,
+		Checks:    checks,
+	}, nil
+}
+
+func parsePositiveIntQuery(c *gin.Context, key string, fallback int) int {
+	raw := strings.TrimSpace(c.Query(key))
+	if raw == "" {
+		return fallback
+	}
+	value, err := strconv.Atoi(raw)
+	if err != nil || value <= 0 {
+		return fallback
+	}
+	return value
+}
+
+func qaEpisodeStatusFilter(raw string) ([]string, error) {
+	status := strings.TrimSpace(strings.ToLower(raw))
+	if status == "" {
+		return []string{qaStatusPendingQA, qaStatusFailed, qaStatusNeedsInspection}, nil
+	}
+	if status == "all" {
+		return nil, nil
+	}
+	if status == qaStatusApproved {
+		return []string{qaStatusApproved, qaStatusInspectorApproved}, nil
+	}
+
+	parts := strings.Split(status, ",")
+	out := make([]string, 0, len(parts))
+	for _, part := range parts {
+		s := strings.TrimSpace(part)
+		if s == "" {
+			continue
+		}
+		switch s {
+		case qaStatusPendingQA, qaStatusRunning, qaStatusFailed, qaStatusNeedsInspection, qaStatusInspectorApproved, qaStatusRejected:
+			out = append(out, s)
+		default:
+			return nil, fmt.Errorf("unsupported qa status %q", s)
+		}
+	}
+	return out, nil
+}
+
+func buildQAEpisodeListWhere(statuses []string, robotType, keyword string) (string, []interface{}) {
+	where := " WHERE e.deleted_at IS NULL"
+	args := []interface{}{}
+
+	if len(statuses) > 0 {
+		placeholders := make([]string, len(statuses))
+		for i, status := range statuses {
+			placeholders[i] = "?"
+			args = append(args, status)
+		}
+		where += " AND e.qa_status IN (" + strings.Join(placeholders, ",") + ")"
+	}
+
+	rt := strings.TrimSpace(robotType)
+	if rt != "" {
+		where += " AND (rt.name = ? OR rt.model = ?)"
+		args = append(args, rt, rt)
+	}
+
+	q := strings.TrimSpace(keyword)
+	if q != "" {
+		like := "%" + q + "%"
+		where += " AND (e.episode_id LIKE ? OR t.task_id LIKE ? OR e.quality_flag LIKE ?)"
+		args = append(args, like, like, like)
+	}
+
+	return where, args
+}
+
+func latestQACheckFromListRow(row episodeQAListRow) *EpisodeQACheckRecordResponse {
+	checkedAt := ""
+	if row.LatestCheckCheckedAt.Valid {
+		checkedAt = row.LatestCheckCheckedAt.Time.UTC().Format(time.RFC3339)
+	}
+	return &EpisodeQACheckRecordResponse{
+		ID:            row.LatestCheckID.Int64,
+		EpisodeID:     row.ID,
+		CheckName:     row.LatestCheckName.String,
+		Passed:        row.LatestCheckPassed.Valid && row.LatestCheckPassed.Bool,
+		Score:         nullFloat64Value(row.LatestCheckScore),
+		Details:       nullStringValue(row.LatestCheckDetails),
+		CheckMetadata: parseQACheckMetadata(row.LatestCheckMetadata),
+		CheckedAt:     checkedAt,
+	}
+}
+
+func qaCheckRecordFromDBRow(row episodeQACheckDBRow) EpisodeQACheckRecordResponse {
+	checkedAt := ""
+	if row.CheckedAt.Valid {
+		checkedAt = row.CheckedAt.Time.UTC().Format(time.RFC3339)
+	}
+	return EpisodeQACheckRecordResponse{
+		ID:            row.ID,
+		EpisodeID:     row.EpisodeID,
+		CheckName:     row.CheckName,
+		Passed:        row.Passed,
+		Score:         row.Score,
+		Details:       nullStringValue(row.Details),
+		CheckMetadata: parseQACheckMetadata(row.CheckMetadata),
+		CheckedAt:     checkedAt,
+	}
+}
+
+func parseQACheckMetadata(raw sql.NullString) map[string]any {
+	if !raw.Valid || strings.TrimSpace(raw.String) == "" {
+		return nil
+	}
+	var out map[string]any
+	if err := json.Unmarshal([]byte(raw.String), &out); err != nil {
+		return nil
+	}
+	return out
+}
+
+func nullStringValue(value sql.NullString) string {
+	if !value.Valid {
+		return ""
+	}
+	return value.String
+}
+
+func nullFloat64Value(value sql.NullFloat64) float64 {
+	if !value.Valid {
+		return 0
+	}
+	return value.Float64
+}
diff --git a/internal/api/handlers/episode_qa_check_test.go b/internal/api/handlers/episode_qa_check_test.go
new file mode 100644
index 0000000..0d3eff1
--- /dev/null
+++ b/internal/api/handlers/episode_qa_check_test.go
@@ -0,0 +1,339 @@
+// SPDX-FileCopyrightText: 2026 ArcheBase
+//
+// SPDX-License-Identifier: MulanPSL-2.0
+
+package handlers
+
+import (
+	"context"
+	"database/sql"
+	"testing"
+	"time"
+
+	"github.com/jmoiron/sqlx"
+	_ "modernc.org/sqlite"
+)
+
+func TestEvaluateMcapMagicCheck(t *testing.T) {
+	valid := append([]byte(nil), mcapMagicBytes...)
+	bad := []byte{0x8b, 0xef, 0xb8, 0x75, 0xc6, 0x97, 0x96, 0x61}
+
+	tests := []struct {
+		name       string
+		head       []byte
+		tail       []byte
+		wantPassed bool
+		wantDetail string
+	}{
+		{
+			name:       "head and tail match",
+			head:       valid,
+			tail:       valid,
+			wantPassed: true,
+			wantDetail: "MCAP head and tail magic matched",
+		},
+		{
+			name:       "head mismatch",
+			head:       bad,
+			tail:       valid,
+			wantPassed: false,
+			wantDetail: "MCAP integrity check failed: head magic mismatch",
+		},
+		{
+			name:       "tail mismatch",
+			head:       valid,
+			tail:       bad,
+			wantPassed: false,
+			wantDetail: "MCAP integrity check failed: tail magic mismatch",
+		},
+		{
+			name:       "both mismatch",
+			head:       bad,
+			tail:       bad,
+			wantPassed: false,
+			wantDetail: "MCAP integrity check failed: head and tail magic mismatch",
+		},
+	}
+
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			got := evaluateMcapMagicCheck(1024, tt.head, tt.tail, "")
+			if got.Passed != tt.wantPassed {
+				t.Fatalf("passed = %v, want %v", got.Passed, tt.wantPassed)
+			}
+			if got.Details != tt.wantDetail {
+				t.Fatalf("details = %q, want %q", got.Details, tt.wantDetail)
+			}
+			if got.Metadata["expected_magic"] != "89 4d 43 41 50 30 0d 0a" {
+				t.Fatalf("expected_magic metadata = %v", got.Metadata["expected_magic"])
+			}
+		})
+	}
+}
+
+func TestPersistEpisodeQACheckFailureMarksEpisodeFailed(t *testing.T) {
+	db := setupEpisodeQACheckTestDB(t)
+	handler := &EpisodeQAHandler{db: db}
+
+	_, err := db.Exec(`
+		INSERT INTO episodes (id, qa_status, quality_flag, deleted_at)
+		VALUES (1, 'qa_running', NULL, NULL)
+	`)
+	if err != nil {
+		t.Fatalf("insert episode: %v", err)
+	}
+
+	outcome := episodeQACheckOutcome{
+		CheckName: episodeQACheckMcapMagic,
+		Passed:    false,
+		Score:     0,
+		Details:   "MCAP integrity check failed: tail magic mismatch",
+		Metadata: map[string]any{
+			"expected_magic":   "89 4d 43 41 50 30 0d 0a",
+			"found_tail_magic": "8b ef b8 75 c6 97 96 61",
+		},
+	}
+	claim := episodeQARunClaim{
+		EpisodeID:      1,
+		OriginalStatus: qaStatusApproved,
+		MutableStatus:  true,
+	}
+	result, err := handler.persistEpisodeQASuiteResult(context.Background(), claim, qaRunModeManual, []episodeQACheckOutcome{outcome}, time.Now().UTC())
+	if err != nil {
+		t.Fatalf("persist qa check: %v", err)
+	}
+	if result.QAStatus != qaStatusFailed {
+		t.Fatalf("result qa_status = %q, want failed", result.QAStatus)
+	}
+
+	var episode struct {
+		QaStatus    string `db:"qa_status"`
+		QualityFlag string `db:"quality_flag"`
+	}
+	if err := db.Get(&episode, "SELECT qa_status, quality_flag FROM episodes WHERE id = 1"); err != nil {
+		t.Fatalf("query episode: %v", err)
+	}
+	if episode.QaStatus != "failed" {
+		t.Fatalf("qa_status = %q, want failed", episode.QaStatus)
+	}
+	if episode.QualityFlag != outcome.Details {
+		t.Fatalf("quality_flag = %q, want %q", episode.QualityFlag, outcome.Details)
+	}
+
+	var count int
+	if err := db.Get(&count, "SELECT COUNT(1) FROM qa_checks WHERE episode_id = 1 AND check_name = 'mcap_magic' AND passed = FALSE"); err != nil {
+		t.Fatalf("count qa_checks: %v", err)
+	}
+	if count != 1 {
+		t.Fatalf("failed qa_check count = %d, want 1", count)
+	}
+}
+
+func TestPersistEpisodeQACheckManualSuccessRestoresFailedEpisode(t *testing.T) {
+	db := setupEpisodeQACheckTestDB(t)
+	handler := &EpisodeQAHandler{db: db}
+
+	_, err := db.Exec(`
+		INSERT INTO episodes (id, qa_status, quality_flag, deleted_at)
+		VALUES (1, 'qa_running', 'previous failure', NULL)
+	`)
+	if err != nil {
+		t.Fatalf("insert episode: %v", err)
+	}
+
+	outcome := episodeQACheckOutcome{
+		CheckName: episodeQACheckMcapMagic,
+		Passed:    true,
+		Score:     1,
+		Details:   "MCAP head and tail magic matched",
+		Metadata: map[string]any{
+			"expected_magic": "89 4d 43 41 50 30 0d 0a",
+		},
+	}
+	claim := episodeQARunClaim{
+		EpisodeID:      1,
+		OriginalStatus: qaStatusFailed,
+		MutableStatus:  true,
+	}
+	result, err := handler.persistEpisodeQASuiteResult(context.Background(), claim, qaRunModeManual, []episodeQACheckOutcome{outcome}, time.Now().UTC())
+	if err != nil {
+		t.Fatalf("persist qa check: %v", err)
+	}
+	if result.QAStatus != qaStatusApproved {
+		t.Fatalf("result qa_status = %q, want approved", result.QAStatus)
+	}
+
+	var episode struct {
+		QaStatus    string         `db:"qa_status"`
+		QualityFlag sql.NullString `db:"quality_flag"`
+	}
+	if err := db.Get(&episode, "SELECT qa_status, quality_flag FROM episodes WHERE id = 1"); err != nil {
+		t.Fatalf("query episode: %v", err)
+	}
+	if episode.QaStatus != "approved" {
+		t.Fatalf("qa_status = %q, want approved", episode.QaStatus)
+	}
+	if episode.QualityFlag.Valid {
+		t.Fatalf("quality_flag = %q, want NULL", episode.QualityFlag.String)
+	}
+}
+
+func TestPersistEpisodeQACheckAutoSuccessAutoApprovesEpisode(t *testing.T) {
+	db := setupEpisodeQACheckTestDB(t)
+	handler := &EpisodeQAHandler{db: db}
+
+	_, err := db.Exec(`
+		INSERT INTO episodes (id, qa_status, quality_flag, auto_approved, deleted_at)
+		VALUES (1, 'qa_running', NULL, 0, NULL)
+	`)
+	if err != nil {
+		t.Fatalf("insert episode: %v", err)
+	}
+
+	outcome := episodeQACheckOutcome{
+		CheckName: episodeQACheckMcapMagic,
+		Passed:    true,
+		Score:     1,
+		Details:   "MCAP head and tail magic matched",
+		Metadata: map[string]any{
+			"expected_magic": "89 4d 43 41 50 30 0d 0a",
+		},
+	}
+	claim := episodeQARunClaim{
+		EpisodeID:      1,
+		OriginalStatus: qaStatusPendingQA,
+		MutableStatus:  true,
+	}
+	result, err := handler.persistEpisodeQASuiteResult(context.Background(), claim, qaRunModeAuto, []episodeQACheckOutcome{outcome}, time.Now().UTC())
+	if err != nil {
+		t.Fatalf("persist qa check: %v", err)
+	}
+	if result.QAStatus != qaStatusApproved || !result.Passed {
+		t.Fatalf("unexpected result: %+v", result)
+	}
+
+	var episode struct {
+		QaStatus     string         `db:"qa_status"`
+		QualityFlag  sql.NullString `db:"quality_flag"`
+		AutoApproved bool           `db:"auto_approved"`
+	}
+	if err := db.Get(&episode, "SELECT qa_status, quality_flag, auto_approved FROM episodes WHERE id = 1"); err != nil {
+		t.Fatalf("query episode: %v", err)
+	}
+	if episode.QaStatus != qaStatusApproved {
+		t.Fatalf("qa_status = %q, want approved", episode.QaStatus)
+	}
+	if !episode.AutoApproved {
+		t.Fatalf("auto_approved = false, want true")
+	}
+	if episode.QualityFlag.Valid {
+		t.Fatalf("quality_flag = %q, want NULL", episode.QualityFlag.String)
+	}
+}
+
+func TestPersistEpisodeQACheckDoesNotOverrideProtectedManualStatus(t *testing.T) {
+	db := setupEpisodeQACheckTestDB(t)
+	handler := &EpisodeQAHandler{db: db}
+
+	_, err := db.Exec(`
+		INSERT INTO episodes (id, qa_status, quality_flag, deleted_at)
+		VALUES (1, 'needs_inspection', NULL, NULL)
+	`)
+	if err != nil {
+		t.Fatalf("insert episode: %v", err)
+	}
+
+	outcome := episodeQACheckOutcome{
+		CheckName: episodeQACheckMcapMagic,
+		Passed:    false,
+		Score:     0,
+		Details:   "MCAP integrity check failed: tail magic mismatch",
+		Metadata: map[string]any{
+			"expected_magic": "89 4d 43 41 50 30 0d 0a",
+		},
+	}
+	claim := episodeQARunClaim{
+		EpisodeID:      1,
+		OriginalStatus: qaStatusNeedsInspection,
+		MutableStatus:  false,
+	}
+	if _, err := handler.persistEpisodeQASuiteResult(context.Background(), claim, qaRunModeManual, []episodeQACheckOutcome{outcome}, time.Now().UTC()); err != nil {
+		t.Fatalf("persist qa check: %v", err)
+	}
+
+	var episode struct {
+		QaStatus    string `db:"qa_status"`
+		QualityFlag string `db:"quality_flag"`
+	}
+	if err := db.Get(&episode, "SELECT qa_status, quality_flag FROM episodes WHERE id = 1"); err != nil {
+		t.Fatalf("query episode: %v", err)
+	}
+	if episode.QaStatus != "needs_inspection" {
+		t.Fatalf("qa_status = %q, want needs_inspection", episode.QaStatus)
+	}
+	if episode.QualityFlag != outcome.Details {
+		t.Fatalf("quality_flag = %q, want %q", episode.QualityFlag, outcome.Details)
+	}
+}
+
+func TestClaimEpisodeQARunReturnsConflictWhenRunning(t *testing.T) {
+	db := setupEpisodeQACheckTestDB(t)
+	handler := &EpisodeQAHandler{db: db}
+
+	_, err := db.Exec(`
+		INSERT INTO episodes (id, mcap_path, qa_status, quality_flag, deleted_at)
+		VALUES (1, 'bucket/path.mcap', 'qa_running', NULL, NULL)
+	`)
+	if err != nil {
+		t.Fatalf("insert episode: %v", err)
+	}
+
+	row, err := handler.loadEpisodeForQACheck(context.Background(), 1)
+	if err != nil {
+		t.Fatalf("load episode: %v", err)
+	}
+	if _, err := handler.claimEpisodeQARun(context.Background(), row, qaRunModeManual); err != errEpisodeQAAlreadyRunning {
+		t.Fatalf("claim error = %v, want errEpisodeQAAlreadyRunning", err)
+	}
+}
+
+func setupEpisodeQACheckTestDB(t *testing.T) *sqlx.DB {
+	t.Helper()
+
+	db, err := sqlx.Open("sqlite", ":memory:")
+	if err != nil {
+		t.Fatalf("open sqlite: %v", err)
+	}
+	t.Cleanup(func() {
+		if err := db.Close(); err != nil {
+			t.Fatalf("close sqlite: %v", err)
+		}
+	})
+
+	_, err = db.Exec(`
+		CREATE TABLE episodes (
+			id INTEGER PRIMARY KEY,
+			mcap_path TEXT,
+			qa_status TEXT,
+			qa_score REAL,
+			auto_approved BOOLEAN,
+			quality_flag TEXT,
+			deleted_at TIMESTAMP NULL
+		);
+		CREATE TABLE qa_checks (
+			id INTEGER PRIMARY KEY AUTOINCREMENT,
+			episode_id INTEGER NOT NULL,
+			check_name TEXT NOT NULL,
+			passed BOOLEAN NOT NULL,
+			score REAL NOT NULL,
+			details TEXT,
+			check_metadata TEXT,
+			checked_at TIMESTAMP
+		);
+	`)
+	if err != nil {
+		t.Fatalf("create schema: %v", err)
+	}
+
+	return db
+}
diff --git a/internal/api/handlers/transfer.go b/internal/api/handlers/transfer.go
index ac9112a..1c677ff 100644
--- a/internal/api/handlers/transfer.go
+++ b/internal/api/handlers/transfer.go
@@ -31,6 +31,10 @@ import (
 	"archebase.com/keystone-edge/internal/storage/s3"
 )
 
+type episodeQAEnqueuer interface {
+	EnqueueEpisode(episodeID int64)
+}
+
 // TransferHandler handles WebSocket connections and REST API for Transfer Service
 type TransferHandler struct {
 	hub                *services.TransferHub
@@ -45,6 +49,7 @@ type TransferHandler struct {
 	recorderHub        *services.RecorderHub
 	recorderRPCTimeout time.Duration
 	stateBroker        *services.DeviceStateBroker
+	qaEnqueuer         episodeQAEnqueuer
 }
 
 // NewTransferHandler creates a new TransferHandler.
@@ -74,6 +79,14 @@ func (h *TransferHandler) SetDeviceStateBroker(broker *services.DeviceStateBroke
 	h.stateBroker = broker
 }
 
+// SetEpisodeQAEnqueuer enables best-effort automatic QA after an episode is created.
+func (h *TransferHandler) SetEpisodeQAEnqueuer(enqueuer episodeQAEnqueuer) {
+	if h == nil {
+		return
+	}
+	h.qaEnqueuer = enqueuer
+}
+
 // RegisterRoutes registers all transfer-related REST routes
 func (h *TransferHandler) RegisterRoutes(apiV1 *gin.RouterGroup) {
 	// Note: apiV1 is already /transfer (set by server.go)
@@ -543,6 +556,7 @@ func (h *TransferHandler) onUploadComplete(ctx context.Context, dc *services.Tra
 	sc := readSidecarFromS3(ctx, h.s3, h.bucket, jsonKey)
 
 	// Step 2: Insert into episodes table
+	var createdEpisodePK int64
 	tx, err := h.db.BeginTx(ctx, nil)
 	if err != nil {
 		// #nosec G706 -- Set aside for now
@@ -675,7 +689,7 @@ func (h *TransferHandler) onUploadComplete(ctx context.Context, dc *services.Tra
 	}
 
 	episodeMetadata := assetIDSnapshotMetadata(ctx, tx, taskRow.WorkstationID)
-	_, dbErr := tx.ExecContext(ctx,
+	insertRes, dbErr := tx.ExecContext(ctx,
 		`INSERT INTO episodes (
 			episode_id,
 			task_id,
@@ -710,7 +724,7 @@ func (h *TransferHandler) onUploadComplete(ctx context.Context, dc *services.Tra
 		durationSec,
 		fileSizeBytes,
 		checksum,
-		"approved",
+		qaStatusPendingQA,
 		episodeMetadata,
 	)
 	if dbErr != nil {
@@ -718,6 +732,12 @@ func (h *TransferHandler) onUploadComplete(ctx context.Context, dc *services.Tra
 		logger.Printf("%s DB insert failed: %v", transferTaskLogPrefix(dc.DeviceID, taskID), dbErr)
 		return
 	}
+	createdEpisodePK, dbErr = insertRes.LastInsertId()
+	if dbErr != nil {
+		// #nosec G706 -- Set aside for now
+		logger.Printf("%s DB insert id read failed: %v", transferTaskLogPrefix(dc.DeviceID, taskID), dbErr)
+		return
+	}
 
 	// Write-time maintenance for batch episode_count.
 		if _, dbErr := tx.ExecContext(ctx, `
@@ -738,6 +758,9 @@ func (h *TransferHandler) onUploadComplete(ctx context.Context, dc *services.Tra
 		logger.Printf("%s DB commit error: %v", transferTaskLogPrefix(dc.DeviceID, taskID), err)
 		return
 	}
+	if createdEpisodePK > 0 && h.qaEnqueuer != nil {
+		h.qaEnqueuer.EnqueueEpisode(createdEpisodePK)
+	}
 
 	// Step 3: Send upload_ack
 	ackMsg := map[string]interface{}{
diff --git a/internal/server/server.go b/internal/server/server.go
index 30c65ed..3f27e69 100644
--- a/internal/server/server.go
+++ b/internal/server/server.go
@@ -40,6 +40,7 @@ type Server struct {
 	recorder            *handlers.RecorderHandler
 	deviceState         *handlers.DeviceStateHandler
 	episode             *handlers.EpisodeHandler
+	qa                  *handlers.EpisodeQAHandler
 	task                *handlers.TaskHandler
 	batch               *handlers.BatchHandler
 	robotType           *handlers.RobotTypeHandler
@@ -55,6 +56,7 @@ type Server struct {
 	scene               *handlers.SceneHandler
 	subscene            *handlers.SubsceneHandler
 	order               *handlers.OrderHandler
+	dataOps             *handlers.DataOpsHandler
 	dataStats           *handlers.DataProductionStatisticsHandler
 	productionDashboard *handlers.ProductionDashboardHandler
 	syncHandler         *handlers.SyncHandler
@@ -110,6 +112,8 @@ func New(cfg *config.Config, db *sqlx.DB, s3Client *s3.Client, syncWorker *servi
 
 	// Create EpisodeHandler for episode listing
 	episodeHandler := handlers.NewEpisodeHandler(db, s3Client, cfg.Storage.Bucket, &cfg.Auth)
+	qaHandler := handlers.NewEpisodeQAHandler(db, s3Client, cfg.Storage.Bucket, &cfg.Auth)
+	transferHandler.SetEpisodeQAEnqueuer(qaHandler)
 
 	transferWriteTimeout := axonTransferWriteTimeout(&cfg.AxonTransfer)
 
@@ -132,6 +136,7 @@ func New(cfg *config.Config, db *sqlx.DB, s3Client *s3.Client, syncWorker *servi
 		sceneHandler               *handlers.SceneHandler
 		subsceneHandler            *handlers.SubsceneHandler
 		orderHandler               *handlers.OrderHandler
+		dataOpsHandler             *handlers.DataOpsHandler
 		dataStatsHandler           *handlers.DataProductionStatisticsHandler
 		productionDashboardHandler *handlers.ProductionDashboardHandler
 	)
@@ -150,6 +155,8 @@ func New(cfg *config.Config, db *sqlx.DB, s3Client *s3.Client, syncWorker *servi
 		sceneHandler = handlers.NewSceneHandler(db)
 		subsceneHandler = handlers.NewSubsceneHandler(db)
 		orderHandler = handlers.NewOrderHandler(db, recorderHub, recorderRPCTimeout)
+		dataOpsHandler = handlers.NewDataOpsHandler(db)
+		dataOpsHandler.SetBulkActionDeps(qaHandler, syncWorker)
 		dataStatsHandler = handlers.NewDataProductionStatisticsHandler(db)
 		productionDashboardHandler = handlers.NewProductionDashboardHandler(db, recorderHub, transferHub)
 	}
@@ -169,6 +176,7 @@ func New(cfg *config.Config, db *sqlx.DB, s3Client *s3.Client, syncWorker *servi
 		recorder:            recorderHandler,
 		deviceState:         deviceStateHandler,
 		episode:             episodeHandler,
+		qa:                  qaHandler,
 		task:                taskHandler,
 		batch:               batchHandler,
 		robotType:           robotTypeHandler,
@@ -184,6 +192,7 @@ func New(cfg *config.Config, db *sqlx.DB, s3Client *s3.Client, syncWorker *servi
 		scene:               sceneHandler,
 		subscene:            subsceneHandler,
 		order:               orderHandler,
+		dataOps:             dataOpsHandler,
 		dataStats:           dataStatsHandler,
 		productionDashboard: productionDashboardHandler,
 		syncHandler:         syncHandler,
@@ -259,6 +268,9 @@ func (s *Server) buildRoutes() http.Handler {
 	// Episodes API
 	v1Episodes := v1Routes.Group("/episodes")
 	s.episode.RegisterRoutes(v1Episodes)
+	if s.qa != nil {
+		s.qa.RegisterRoutes(v1Routes)
+	}
 
 	// Tasks API
 	v1Tasks := v1Routes.Group("")
@@ -317,6 +329,11 @@ func (s *Server) buildRoutes() http.Handler {
 		adminStats := v1Routes.Group("/admin/statistics/data-production", jwtMw, middleware.RequireRole("admin"))
 		s.dataStats.RegisterRoutes(adminStats)
 	}
+	if s.dataOps != nil {
+		jwtMw := middleware.JWTAuth(&s.cfg.Auth)
+		adminDataOps := v1Routes.Group("/data-ops", jwtMw, middleware.RequireRole("admin"))
+		s.dataOps.RegisterRoutes(adminDataOps)
+	}
 	if s.productionDashboard != nil {
 		dashboard := v1Routes.Group("/production/dashboard", middleware.DashboardAuth(&s.cfg.Auth), middleware.RequireAnyRole("admin", "data_collector", "display"))
 		s.productionDashboard.RegisterRoutes(dashboard)
diff --git a/internal/services/sync_worker.go b/internal/services/sync_worker.go
index 52bc7cd..373b25a 100644
--- a/internal/services/sync_worker.go
+++ b/internal/services/sync_worker.go
@@ -307,11 +307,12 @@ func (w *SyncWorker) persistPendingSyncLog(ctx context.Context, episodeID int64,
 	lockClause := txLockClause(tx)
 
 	var episode struct {
-		ID          int64 `db:"id"`
-		CloudSynced bool  `db:"cloud_synced"`
+		ID          int64  `db:"id"`
+		CloudSynced bool   `db:"cloud_synced"`
+		QaStatus    string `db:"qa_status"`
 	}
 	if err := tx.GetContext(ctx, &episode, `
-		SELECT id, cloud_synced
+		SELECT id, cloud_synced, COALESCE(qa_status, '') AS qa_status
 		FROM episodes
 		WHERE id = ? AND deleted_at IS NULL
 	`+lockClause, episodeID); err != nil {
@@ -323,6 +324,9 @@ func (w *SyncWorker) persistPendingSyncLog(ctx context.Context, episodeID int64,
 	if episode.CloudSynced && !allowSynced {
 		return fmt.Errorf("episode %d already synced", episodeID)
 	}
+	if episode.QaStatus != "approved" && episode.QaStatus != "inspector_approved" {
+		return fmt.Errorf("episode %d qa_status is %q, must be approved or inspector_approved", episodeID, episode.QaStatus)
+	}
 
 	var activeCount int
 	if err := tx.GetContext(ctx, &activeCount, `
@@ -405,11 +409,12 @@ func (w *SyncWorker) persistResyncSyncLog(ctx context.Context, episodeID int64)
 	lockClause := txLockClause(tx)
 
 	var episode struct {
-		ID          int64 `db:"id"`
-		CloudSynced bool  `db:"cloud_synced"`
+		ID          int64  `db:"id"`
+		CloudSynced bool   `db:"cloud_synced"`
+		QaStatus    string `db:"qa_status"`
 	}
 	if err := tx.GetContext(ctx, &episode, `
-		SELECT id, cloud_synced
+		SELECT id, cloud_synced, COALESCE(qa_status, '') AS qa_status
 		FROM episodes
 		WHERE id = ? AND deleted_at IS NULL
 	`+lockClause, episodeID); err != nil {
@@ -421,6 +426,9 @@ func (w *SyncWorker) persistResyncSyncLog(ctx context.Context, episodeID int64)
 	if !episode.CloudSynced {
 		return fmt.Errorf("episode %d has not completed cloud sync", episodeID)
 	}
+	if episode.QaStatus != "approved" && episode.QaStatus != "inspector_approved" {
+		return fmt.Errorf("episode %d qa_status is %q, must be approved or inspector_approved", episodeID, episode.QaStatus)
+	}
 
 	var activeCount int
 	if err := tx.GetContext(ctx, &activeCount, `