22 changes: 16 additions & 6 deletions docs/roadmap/README.md
Original file line number Diff line number Diff line change
@@ -30,12 +30,22 @@ Build trusted, scalable AI capabilities that help people discover gospel content

### Content Discovery

| ID | Feature | Owner | Priority | Start | Days | Status |
| ------------------------------------------------------------------------- | ------------------------------------- | ----- | -------- | ------ | ---- | ----------- |
| [feat-009](content-discovery/feat-009-pgvector-embedding-indexing.md) | pgvector Setup and Embedding Indexing | nisal | P0 | Apr 7 | 14 | not-started |
| [feat-010](content-discovery/feat-010-semantic-search-api.md) | Semantic Search API | nisal | P0 | Apr 14 | 21 | not-started |
| [feat-011](content-discovery/feat-011-search-ui-web.md) | Search UI — Web | urim | P0 | Apr 14 | 21 | not-started |
| [feat-012](content-discovery/feat-012-search-ui-mobile.md) | Search UI — Mobile | urim | P0 | Apr 14 | 21 | not-started |
| [feat-037](content-discovery/feat-037-video-content-vectorization.md) | Video Content Vectorization for Recs | nisal | P1 | Apr 21 | 42 | not-started |
| [feat-038](content-discovery/feat-038-video-vectorization-data-audit.md) | Vectorization — Data Audit | nisal | P1 | Apr 21 | 3 | not-started |
| [feat-039](content-discovery/feat-039-chapter-based-scene-boundaries.md) | Vectorization — Scene Boundaries | nisal | P1 | Apr 24 | 7 | not-started |
| [feat-040](content-discovery/feat-040-multimodal-scene-descriptions.md) | Vectorization — Scene Descriptions | nisal | P1 | May 1 | 10 | not-started |
| [feat-041](content-discovery/feat-041-scene-embeddings-table.md) | Vectorization — Embeddings Table | nisal | P1 | May 11 | 7 | not-started |
| [feat-042](content-discovery/feat-042-backfill-worker.md) | Vectorization — English Backfill | nisal | P1 | May 18 | 10 | not-started |
| [feat-043](content-discovery/feat-043-visual-shot-detection-fusion.md) | Vectorization — Visual Shot Fusion | nisal | P2 | May 28 | 10 | not-started |
| [feat-044](content-discovery/feat-044-recommendation-query-api.md) | Vectorization — Recommendation API | nisal | P1 | May 28 | 7 | not-started |
| [feat-045](content-discovery/feat-045-pipeline-integration.md) | Vectorization — Pipeline Integration | nisal | P1 | Jun 4 | 7 | not-started |
| [feat-046](content-discovery/feat-046-recommendations-demo-experience.md) | Vectorization — Recommendations Demo | nisal | P1 | Jun 4 | 7 | not-started |

### Topic Experiences

Expand Up @@ -10,6 +10,7 @@ depends_on:
- "feat-002"
blocks:
- "feat-010"
- "feat-037"
tags:
- "cms"
- "pgvector"
235 changes: 235 additions & 0 deletions docs/roadmap/content-discovery/feat-037-video-content-vectorization.md
@@ -0,0 +1,235 @@
---
id: "feat-037"
title: "Video Content Vectorization for Recommendations"
owner: "nisal"
priority: "P1"
status: "not-started"
start_date: "2026-04-21"
duration: 42
depends_on:
- "feat-009"
- "feat-031"
blocks:
- "feat-038"
tags:
- "cms"
- "pgvector"
- "ai-pipeline"
- "search"
- "manager"
---

## Problem

Current recommendations are metadata-driven — "you watched Film X, here it is in 1,500 other languages." Transcript embeddings (feat-009/010) capture what was said, but miss what was shown. Visual scene embeddings enable cross-film recommendations based on visual setting, actions, emotional tone, and mood.

**Phase 1 (this feature)**: English, Spanish, and French videos. Three languages are required to verify locale-aware deduplication — a user watching in Spanish must never see the same film recommended in English. Prove recommendation quality at an estimated cost of ~$130-$400. Phase 2 (full 50K+ catalog) is a separate funding decision.

## Entry Points — Read These First

1. `apps/manager/src/services/chapters.ts` — existing scene-like segmentation: `Chapter { title, startSeconds, endSeconds, summary }`. This is the baseline for R1a.
2. `apps/manager/src/services/embeddings.ts` — existing text embedding pipeline using `text-embedding-3-small` (1536 dims). Scene descriptions will be embedded through the same model.
3. `apps/manager/src/workflows/videoEnrichment.ts` — enrichment workflow with parallel steps. R6 adds scene vectorization as a new branch.
4. `apps/manager/src/services/storage.ts` — S3 artifact storage pattern (`{assetId}/{type}.json`).
5. `apps/cms/src/api/video/content-types/video/schema.json` — Video content type with `coreId`, `label` enum, `variants` relation.
6. `apps/cms/src/api/video-variant/content-types/video-variant/schema.json` — VideoVariant with `language` and `muxVideo` relations.
7. `apps/cms/src/api/mux-video/content-types/mux-video/schema.json` — MuxVideo with `assetId` and `playbackId` for frame extraction.
8. `docs/brainstorms/2026-04-02-video-content-vectorization-requirements.md` — full requirements doc with storage schema, cost model, and rollout strategy.

## Grep These

- `chapters` in `apps/manager/src/` — existing chapter/scene segmentation
- `getOpenrouter` in `apps/manager/src/` — AI model client (text-only; needs multimodal extension)
- `text-embedding-3-small` in `apps/manager/src/` — embedding model
- `strapi.db.connection.raw` in `apps/cms/src/` — raw SQL patterns for pgvector
- `muxAssetId` in `apps/manager/src/` — Mux asset references for frame extraction
- `playbackId` in `apps/cms/src/` — Mux playback IDs for thumbnail URLs
- `label` in `apps/cms/src/api/video/` — video type enum (featureFilm, shortFilm, etc.)

## What To Build

### R0. Data Audit (first task)

Query CMS to determine English video landscape:

```sql
-- Video count by label type
SELECT label, COUNT(*) FROM videos GROUP BY label;

-- Duration distribution
SELECT v.label,
       COUNT(*) as count,
       AVG(vv.duration) as avg_duration,
       MAX(vv.duration) as max_duration
FROM videos v
JOIN video_variants vv ON vv.video_id = v.id
JOIN languages l ON vv.language_id = l.id
WHERE l.bcp47 = 'en'
GROUP BY v.label;

-- Chapter metadata coverage
SELECT COUNT(DISTINCT ej.mux_asset_id)
FROM enrichment_jobs ej
WHERE ej.step_statuses->>'chapters' = 'completed';
```

### R1. Scene Segmentation

**R1a — Transcript-based (extend chapters.ts)**:

- For each English video, use existing chapter output as scene boundaries
- Short clips (single chapter) → treat as one scene
- Store chapter boundaries as scene candidates

**R1b — Visual fusion (feature films only)**:

- Extract frames at chapter boundaries using Mux thumbnail API: `https://image.mux.com/{PLAYBACK_ID}/thumbnail.jpg?time={SECONDS}`
- Feed frame sequences + transcript to multimodal LLM to refine/merge chapter boundaries into narrative scenes
- Research: evaluate PySceneDetect for shot boundary detection to augment
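
The thumbnail extraction above can be sketched as a URL builder. `boundaryFrameUrls` and the `Chapter` shape are illustrative, modeled on chapters.ts rather than quoted from it:

```typescript
// Sketch: build Mux thumbnail URLs at chapter boundaries. One frame per chapter
// start, plus the final chapter's end frame, feeds the multimodal fusion pass.
type Chapter = { title: string; startSeconds: number; endSeconds: number }

export function boundaryFrameUrls(playbackId: string, chapters: Chapter[]): string[] {
  const times = chapters.map((c) => c.startSeconds)
  const last = chapters[chapters.length - 1]
  if (last) times.push(last.endSeconds)
  return times.map(
    (t) => `https://image.mux.com/${playbackId}/thumbnail.jpg?time=${t}`,
  )
}
```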

### R2. Scene Content Description

New service: `apps/manager/src/services/sceneDescription.ts`

```typescript
type SceneAnalysis = {
  sceneIndex: number
  startSeconds: number
  endSeconds: number | null
  description: string // concatenated extraction (all signals) — this is what gets embedded
  themes: string[] // felt needs: ["forgiveness", "redemption", "grief", "hope"]
  bibleVerses: string[] // ["Matthew 6:14-15", "Ephesians 4:32"]
  demographics: string[] // ["youth", "student"] — empty if not extractable
  chapterTitle: string | null
}

export async function analyzeScene(
  muxAssetId: string,
  playbackId: string,
  startSeconds: number,
  endSeconds: number | null,
  transcript: string,
  metadata: { bibleVerses?: string[]; videoLabel: string },
  chapterTitle: string | null,
): Promise<SceneAnalysis>
```

- Send **actual video segment** (not stills) to Gemini 2.5 Flash via its native video input, alongside transcript chunk and CMS metadata
- LLM extracts structured signals (ordered by importance):
1. **Felt needs/themes** (MOST IMPORTANT): forgiveness, hope, grief, loneliness, identity, redemption, belonging, purpose, healing, doubt, courage
2. **Bible verses**: from CMS metadata where available + LLM-identified additional references
3. **Content**: narrative summary, dialogue, message being communicated
4. **Emotional tone**: contemplative, joyful, grieving, urgent, peaceful, hopeful
5. **Demographics** (where extractable): age group, life stage, cultural context
- `description` concatenates all signals into a single text block for embedding, with themes/needs weighted first
- Structured fields stored as arrays for filtering and display
- **Requires new multimodal client** — existing OpenRouter client is text-only and cannot process video

### R3. Scene Embedding + Storage

Create `scene_embeddings` table via bootstrap SQL (same pattern as feat-009):

```sql
CREATE TABLE IF NOT EXISTS scene_embeddings (
  id SERIAL PRIMARY KEY,
  video_id INTEGER NOT NULL,
  core_id TEXT,
  mux_asset_id TEXT NOT NULL,
  playback_id TEXT NOT NULL,
  scene_index INTEGER NOT NULL,
  start_seconds FLOAT NOT NULL,
  end_seconds FLOAT,
  description TEXT NOT NULL,        -- concatenated extraction (all signals) — embedded
  themes TEXT[] DEFAULT '{}',       -- felt needs: {"forgiveness","redemption","grief"}
  bible_verses TEXT[] DEFAULT '{}', -- {"Matthew 6:14-15","Ephesians 4:32"}
  demographics TEXT[] DEFAULT '{}', -- {"youth","student"} — may be empty
  chapter_title TEXT,
  embedding vector(1536) NOT NULL,
  model TEXT NOT NULL DEFAULT 'text-embedding-3-small',
  language TEXT NOT NULL DEFAULT 'en',
  created_at TIMESTAMPTZ DEFAULT NOW(),
  UNIQUE(video_id, scene_index)
);

CREATE INDEX IF NOT EXISTS scene_embeddings_hnsw
  ON scene_embeddings USING hnsw (embedding vector_cosine_ops);
CREATE INDEX IF NOT EXISTS scene_embeddings_video_id
  ON scene_embeddings(video_id);
CREATE INDEX IF NOT EXISTS scene_embeddings_language
  ON scene_embeddings(language);
```

Indexing service: `apps/cms/src/api/scene-embedding/services/indexer.ts`

```typescript
export async function indexSceneEmbeddings(
  videoId: number,
  scenes: SceneDescription[],
  embeddings: number[][],
  meta: {
    coreId: string
    muxAssetId: string
    playbackId: string
    language: string
  },
): Promise<{ scenesIndexed: number }>
```
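
Since Strapi's ORM cannot write vector columns, the indexer has to go through raw SQL. A minimal sketch, assuming a knex-style `raw(sql, bindings)` connection as in feat-009; `upsertSceneRow` and `toVectorLiteral` are illustrative names, and the real insert also supplies `core_id`, `mux_asset_id`, `playback_id`, and `language`:

```typescript
type RawConn = { raw: (sql: string, bindings: unknown[]) => Promise<unknown> }

// pgvector accepts a '[x,y,...]' string literal cast with ::vector.
export function toVectorLiteral(embedding: number[]): string {
  return `[${embedding.join(",")}]`
}

// UNIQUE(video_id, scene_index) makes re-runs idempotent via ON CONFLICT.
// (Columns trimmed for brevity; see the table DDL above for the full set.)
export async function upsertSceneRow(
  db: RawConn,
  videoId: number,
  sceneIndex: number,
  description: string,
  embedding: number[],
): Promise<void> {
  await db.raw(
    `INSERT INTO scene_embeddings (video_id, scene_index, description, embedding)
     VALUES (?, ?, ?, ?::vector)
     ON CONFLICT (video_id, scene_index)
     DO UPDATE SET description = EXCLUDED.description, embedding = EXCLUDED.embedding`,
    [videoId, sceneIndex, description, toVectorLiteral(embedding)],
  )
}
```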

### R4. Cross-film Recommendation Query

```sql
-- Locale-aware: only return videos available in the user's language
SELECT se.video_id, se.scene_index, se.description, se.start_seconds,
       1 - (se.embedding <=> $1) AS similarity
FROM scene_embeddings se
JOIN video_variants vv ON vv.video_id = se.video_id
JOIN languages l ON vv.language_id = l.id
WHERE se.video_id != $2
  AND l.bcp47 = $3 -- user's locale
  AND se.language IN ('en', 'es', 'fr')
ORDER BY se.embedding <=> $1
LIMIT 10;
```

Expose as CMS service or API endpoint for web/mobile consumption. API accepts optional `rerank` parameter (no-op in Phase 1, reserved for user-driven scoring).
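
A hypothetical service wrapper over that query; names are illustrative, and Phase 1 accepts `rerank` only to reserve the parameter shape:

```typescript
// Phase 1 locales are enforced at the API edge so locale bleed is impossible.
const PHASE1_LOCALES = new Set(["en", "es", "fr"])

export function assertPhase1Locale(locale: string): string {
  if (!PHASE1_LOCALES.has(locale)) {
    throw new Error(`Locale ${locale} is outside Phase 1 (en, es, fr)`)
  }
  return locale
}

export async function similarScenes(
  db: { raw: (sql: string, bindings: unknown[]) => Promise<{ rows: unknown[] }> },
  queryEmbedding: string, // pgvector literal, e.g. "[0.1,0.2,...]"
  excludeVideoId: number,
  locale: string,
  _opts: { rerank?: boolean } = {}, // accepted but a no-op in Phase 1
) {
  const { rows } = await db.raw(
    `SELECT se.video_id, se.scene_index, se.description, se.start_seconds,
            1 - (se.embedding <=> ?::vector) AS similarity
     FROM scene_embeddings se
     JOIN video_variants vv ON vv.video_id = se.video_id
     JOIN languages l ON vv.language_id = l.id
     WHERE se.video_id != ? AND l.bcp47 = ?
     ORDER BY se.embedding <=> ?::vector
     LIMIT 10`,
    [queryEmbedding, excludeVideoId, assertPhase1Locale(locale), queryEmbedding],
  )
  return rows
}
```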

### R5. Backfill Worker

Dedicated Railway service (or separate entry point in manager) for one-time English catalog processing:

- Queue-based: iterate English videos, process each through R1 → R2 → R3
- Resumable: track processed video IDs, skip on restart
- Cost controls: configurable batch size, rate limits, cost tracking per video, auto-pause at threshold
- Dry-run mode: estimate cost without LLM calls
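
The bullets above amount to a small control loop. A sketch under the assumption that per-video cost is returned by the processing step (a dry run would pass a processor that estimates instead of calling the LLM); the checkpoint shape and names are assumptions, not existing code:

```typescript
type BackfillState = { processed: Set<number>; spentUsd: number }

export async function runBackfill(
  videoIds: number[],
  processOne: (id: number) => Promise<number>, // runs R1 → R2 → R3, returns cost in USD
  state: BackfillState,
  capUsd: number,
): Promise<"done" | "paused"> {
  for (const id of videoIds) {
    if (state.processed.has(id)) continue // resumable: skip checkpointed IDs
    if (state.spentUsd >= capUsd) return "paused" // auto-pause at cost threshold
    state.spentUsd += await processOne(id)
    state.processed.add(id) // checkpoint after success
  }
  return "done"
}
```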

### R6. Pipeline Integration

Add scene vectorization to `videoEnrichment.ts` as an independent branch:

- Runs after transcription completes (needs transcript)
- Also needs muxAssetId/playbackId (for frames) — different input than other parallel steps
- Triggers R1a → R2 → R3 for the new video
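
The gating logic can be sketched as a readiness check. The `transcription` step name mirrors the `step_statuses` JSON from R0's audit query and is an assumption:

```typescript
// The scene-vectorization branch starts only once the transcript exists and
// the Mux identifiers needed for frame extraction are available.
export function sceneVectorizationReady(
  stepStatuses: Record<string, string>,
  muxAssetId: string | null,
  playbackId: string | null,
): boolean {
  return (
    stepStatuses["transcription"] === "completed" &&
    muxAssetId !== null &&
    playbackId !== null
  )
}
```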

## Constraints

- **Phase 1 languages: en, es, fr** — filter by language in all queries and processing. `language` column enables future expansion.
- **No locale bleed** — recommendations are locale-aware. A user's locale determines which results they see. Never recommend the same video in a different language.
- **No human tags** — existing CMS tags are unreliable. All semantic signal comes from LLM-generated scene descriptions only.
- **Pure vector similarity scoring** — no user feedback loop in Phase 1. API accepts optional `rerank` parameter (no-op) to prepare for user-driven scoring in Phase 2.
- **Separate table from `video_embeddings`** — different columns, different query patterns. Do not extend feat-009's table.
- **Do NOT use a Strapi content type** for scene embeddings — pgvector columns don't work with Strapi ORM. Use raw SQL (same pattern as feat-009).
- **Embed once per Video, not per VideoVariant** — language variants share visual content. Dedup by `video_id`.
- **Cost cap** — backfill worker must auto-pause if cumulative cost exceeds configurable threshold.
- **Mux thumbnail API** for frame extraction — do not download full videos. Confirm during planning that the thumbnail API supports arbitrary timestamps.

## Verification

1. **Data audit complete**: know English video count by label, duration distribution, chapter coverage
2. **Scene segmentation**: sample 10 feature films, verify scene boundaries align with narrative scenes (not just shot cuts)
3. **Scene descriptions**: sample 20 scenes, verify descriptions capture visual content, not just transcript paraphrasing
4. **Embeddings indexed**: `SELECT COUNT(*) FROM scene_embeddings WHERE language IN ('en', 'es', 'fr')` matches expected scene count
5. **Recommendation quality**: for 50 seed videos, top-10 similar scenes include at least 3 relevant cross-film results for 80% of seeds
6. **No locale bleed**: query recommendations for a Spanish video with locale=es → results are all videos with Spanish variants. Repeat for en and fr. No cross-locale contamination.
7. **Deduplication**: recommendations never surface the same video (different variant) as the input
8. **Cost tracking**: backfill worker logs cumulative cost, stays within budget
9. **Pipeline integration**: upload a new video in en/es/fr → scene embeddings appear in `scene_embeddings` table automatically
@@ -0,0 +1,110 @@
---
id: "feat-038"
title: "Video Vectorization — Data Audit"
owner: "nisal"
priority: "P1"
status: "not-started"
start_date: "2026-04-21"
duration: 3
depends_on:
- "feat-037"
blocks:
- "feat-039"
- "feat-042"
tags:
- "cms"
- "pgvector"
---

## Problem

Before building the scene vectorization pipeline, we need to know the shape of the English, Spanish, and French video catalog: how many videos by type, the duration distribution, existing chapter coverage, and, critically, whether language variants share a Video parent (the dedup model). This gates all downstream sizing, cost estimates, and architecture decisions.

## Entry Points — Read These First

1. `apps/cms/src/api/video/content-types/video/schema.json` — Video schema with `label` enum
2. `apps/cms/src/api/video-variant/content-types/video-variant/schema.json` — VideoVariant with language relation
3. `apps/cms/src/api/enrichment-job/content-types/enrichment-job/schema.json` — tracks chapter completion status
4. `docs/brainstorms/2026-04-02-video-content-vectorization-requirements.md` — R0 requirements

## Grep These

- `label` in `apps/cms/src/api/video/` — video type enum values
- `bcp47` in `apps/cms/src/` — language code field for filtering English

## What To Build

Run diagnostic queries against the CMS database:

```sql
-- Video count by label for Phase 1 languages (en, es, fr)
SELECT v.label, l.bcp47, COUNT(*) as count
FROM videos v
JOIN video_variants vv ON vv.video_id = v.id
JOIN languages l ON vv.language_id = l.id
WHERE l.bcp47 IN ('en', 'es', 'fr')
GROUP BY v.label, l.bcp47 ORDER BY v.label, l.bcp47;

-- Unique Video count (deduped across languages) — this is what we actually process
SELECT COUNT(DISTINCT v.id) as unique_videos
FROM videos v
JOIN video_variants vv ON vv.video_id = v.id
JOIN languages l ON vv.language_id = l.id
WHERE l.bcp47 IN ('en', 'es', 'fr');

-- Duration distribution for Phase 1 languages
SELECT v.label,
       COUNT(*) as count,
       ROUND(AVG(vv.duration)) as avg_duration_sec,
       MAX(vv.duration) as max_duration_sec
FROM videos v
JOIN video_variants vv ON vv.video_id = v.id
JOIN languages l ON vv.language_id = l.id
WHERE l.bcp47 IN ('en', 'es', 'fr')
GROUP BY v.label;

-- Chapter metadata coverage
SELECT COUNT(DISTINCT ej.mux_asset_id)
FROM enrichment_jobs ej
WHERE ej.step_statuses->>'chapters' = 'completed';

-- CRITICAL: Confirm Video → VideoVariant dedup model
-- Do en/es/fr variants of the same film share a Video parent?
SELECT v.id, v.label,
       COUNT(vv.id) as variant_count,
       ARRAY_AGG(DISTINCT l.bcp47) as languages
FROM videos v
JOIN video_variants vv ON vv.video_id = v.id
JOIN languages l ON vv.language_id = l.id
WHERE l.bcp47 IN ('en', 'es', 'fr')
GROUP BY v.id, v.label
HAVING COUNT(DISTINCT l.bcp47) > 1
ORDER BY variant_count DESC LIMIT 20;

-- How many Videos have variants in multiple Phase 1 languages?
-- (high overlap = dedup model works, low overlap = mostly unique per language)
SELECT multi_lang_count, COUNT(*) as video_count FROM (
  SELECT v.id, COUNT(DISTINCT l.bcp47) as multi_lang_count
  FROM videos v
  JOIN video_variants vv ON vv.video_id = v.id
  JOIN languages l ON vv.language_id = l.id
  WHERE l.bcp47 IN ('en', 'es', 'fr')
  GROUP BY v.id
) sub GROUP BY multi_lang_count ORDER BY multi_lang_count;
```

Deliverable: update the brainstorm doc cost model with actual numbers. Confirm or revise the ~$130-$400 Phase 1 estimate. **If the dedup model is broken (same film = separate Video records per language), flag immediately — the entire dedup strategy must be revised.**

## Constraints

- Read-only queries — do not modify production data
- Use `strapi.db.connection.raw()` pattern or direct DB access
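
Since the audit must stay read-only, a cheap guard in the audit script can reject anything that is not a SELECT before it reaches `strapi.db.connection.raw()`. This helper is a sketch, not existing code:

```typescript
// Strip line comments, then require the first keyword to be SELECT (or WITH,
// for CTEs) so a stray UPDATE/DELETE can never reach production data.
export function assertReadOnly(sql: string): string {
  const firstWord = sql.replace(/--[^\n]*/g, "").trim().split(/\s+/)[0]?.toUpperCase()
  if (firstWord !== "SELECT" && firstWord !== "WITH") {
    throw new Error(`Refusing non-SELECT statement: ${firstWord}`)
  }
  return sql
}
```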

## Verification

- Know exact video count by label type for en, es, fr
- Know how many unique Video entities span multiple Phase 1 languages (dedup model validation)
- Know duration distribution (what % are short clips vs feature films)
- Know chapter coverage (what % already have scene-like metadata)
- Cost model in brainstorm doc updated with real numbers
- **Dedup model confirmed or red-flagged**: en/es/fr variants of the same film share a Video parent