22 changes: 16 additions & 6 deletions docs/roadmap/README.md
Original file line number Diff line number Diff line change
@@ -30,12 +30,22 @@ Build trusted, scalable AI capabilities that help people discover gospel content

### Content Discovery

| ID | Feature | Owner | Priority | Start | Days | Status |
| ------------------------------------------------------------------------- | ------------------------------------- | ----- | -------- | ------ | ---- | ----------- |
| [feat-009](content-discovery/feat-009-pgvector-embedding-indexing.md) | pgvector Setup and Embedding Indexing | nisal | P0 | Apr 7 | 14 | not-started |
| [feat-010](content-discovery/feat-010-semantic-search-api.md) | Semantic Search API | nisal | P0 | Apr 14 | 21 | not-started |
| [feat-011](content-discovery/feat-011-search-ui-web.md) | Search UI — Web | urim | P0 | Apr 14 | 21 | not-started |
| [feat-012](content-discovery/feat-012-search-ui-mobile.md) | Search UI — Mobile | urim | P0 | Apr 14 | 21 | not-started |
| [feat-037](content-discovery/feat-037-video-content-vectorization.md) | Video Content Vectorization for Recs | nisal | P1 | Apr 21 | 42 | not-started |
| [feat-038](content-discovery/feat-038-video-vectorization-data-audit.md) | Vectorization — Data Audit | nisal | P1 | Apr 21 | 3 | not-started |
| [feat-039](content-discovery/feat-039-chapter-based-scene-boundaries.md) | Vectorization — Scene Boundaries | nisal | P1 | Apr 24 | 7 | not-started |
| [feat-040](content-discovery/feat-040-multimodal-scene-descriptions.md) | Vectorization — Scene Descriptions | nisal | P1 | May 1 | 10 | not-started |
| [feat-041](content-discovery/feat-041-scene-embeddings-table.md) | Vectorization — Embeddings Table | nisal | P1 | May 11 | 7 | not-started |
| [feat-042](content-discovery/feat-042-backfill-worker.md) | Vectorization — English Backfill | nisal | P1 | May 18 | 10 | not-started |
| [feat-043](content-discovery/feat-043-visual-shot-detection-fusion.md) | Vectorization — Visual Shot Fusion | nisal | P2 | May 28 | 10 | not-started |
| [feat-044](content-discovery/feat-044-recommendation-query-api.md) | Vectorization — Recommendation API | nisal | P1 | May 28 | 7 | not-started |
| [feat-045](content-discovery/feat-045-pipeline-integration.md) | Vectorization — Pipeline Integration | nisal | P1 | Jun 4 | 7 | not-started |
| [feat-046](content-discovery/feat-046-recommendations-demo-experience.md) | Vectorization — Recommendations Demo | nisal | P1 | Jun 4 | 7 | not-started |

### Topic Experiences

Expand Up @@ -10,6 +10,7 @@ depends_on:
- "feat-002"
blocks:
- "feat-010"
- "feat-037"
tags:
- "cms"
- "pgvector"
235 changes: 235 additions & 0 deletions docs/roadmap/content-discovery/feat-037-video-content-vectorization.md
@@ -0,0 +1,235 @@
---
id: "feat-037"
title: "Video Content Vectorization for Recommendations"
owner: "nisal"
priority: "P1"
status: "not-started"
start_date: "2026-04-21"
duration: 42
depends_on:
- "feat-009"
- "feat-031"
blocks:
- "feat-038"
tags:
- "cms"
- "pgvector"
- "ai-pipeline"
- "search"
- "manager"
---

## Problem

Current recommendations are metadata-driven — "you watched Film X, here it is in 1,500 other languages." Transcript embeddings (feat-009/010) capture what was said, but miss what was shown. Visual scene embeddings enable cross-film recommendations based on visual setting, actions, emotional tone, and mood.

**Phase 1 (this feature)**: English, Spanish, and French videos. Three languages are required to verify locale-aware deduplication — a user watching in Spanish must never see the same film recommended in English. Prove recommendation quality at an estimated cost of ~$130-$400. Phase 2 (full 50K+ catalog) is a separate funding decision.

## Entry Points — Read These First

1. `apps/manager/src/services/chapters.ts` — existing scene-like segmentation: `Chapter { title, startSeconds, endSeconds, summary }`. This is the baseline for R1a.
2. `apps/manager/src/services/embeddings.ts` — existing text embedding pipeline using `text-embedding-3-small` (1536 dims). Scene descriptions will be embedded through the same model.
3. `apps/manager/src/workflows/videoEnrichment.ts` — enrichment workflow with parallel steps. R6 adds scene vectorization as a new branch.
4. `apps/manager/src/services/storage.ts` — S3 artifact storage pattern (`{assetId}/{type}.json`).
5. `apps/cms/src/api/video/content-types/video/schema.json` — Video content type with `coreId`, `label` enum, `variants` relation.
6. `apps/cms/src/api/video-variant/content-types/video-variant/schema.json` — VideoVariant with `language` and `muxVideo` relations.
7. `apps/cms/src/api/mux-video/content-types/mux-video/schema.json` — MuxVideo with `assetId` and `playbackId` for frame extraction.
8. `docs/brainstorms/2026-04-02-video-content-vectorization-requirements.md` — full requirements doc with storage schema, cost model, and rollout strategy.

## Grep These

- `chapters` in `apps/manager/src/` — existing chapter/scene segmentation
- `getOpenrouter` in `apps/manager/src/` — AI model client (text-only; needs multimodal extension)
- `text-embedding-3-small` in `apps/manager/src/` — embedding model
- `strapi.db.connection.raw` in `apps/cms/src/` — raw SQL patterns for pgvector
- `muxAssetId` in `apps/manager/src/` — Mux asset references for frame extraction
- `playbackId` in `apps/cms/src/` — Mux playback IDs for thumbnail URLs
- `label` in `apps/cms/src/api/video/` — video type enum (featureFilm, shortFilm, etc.)

## What To Build

### R0. Data Audit (first task)

Query CMS to determine English video landscape:

```sql
-- Video count by label type
SELECT label, COUNT(*) FROM videos GROUP BY label;

-- Duration distribution
SELECT v.label,
       COUNT(*) as count,
       AVG(vv.duration) as avg_duration,
       MAX(vv.duration) as max_duration
FROM videos v
JOIN video_variants vv ON vv.video_id = v.id
JOIN languages l ON vv.language_id = l.id
WHERE l.bcp47 = 'en'
GROUP BY v.label;

-- Chapter metadata coverage
SELECT COUNT(DISTINCT ej.mux_asset_id)
FROM enrichment_jobs ej
WHERE ej.step_statuses->>'chapters' = 'completed';
```

### R1. Scene Segmentation

**R1a — Transcript-based (extend chapters.ts)**:

- For each English video, use existing chapter output as scene boundaries
- Short clips (single chapter) → treat as one scene
- Store chapter boundaries as scene candidates

**R1b — Visual fusion (feature films only)**:

- Extract frames at chapter boundaries using Mux thumbnail API: `https://image.mux.com/{PLAYBACK_ID}/thumbnail.jpg?time={SECONDS}`
- Feed frame sequences + transcript to multimodal LLM to refine/merge chapter boundaries into narrative scenes
- Research: evaluate PySceneDetect for shot boundary detection to augment
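
The thumbnail extraction above can be sketched as a URL builder. `boundaryFrameUrls` and the `Chapter` shape are illustrative, modeled on chapters.ts rather than quoted from it:

```typescript
// Sketch: build Mux thumbnail URLs at chapter boundaries. One frame per chapter
// start, plus the final chapter's end frame, feeds the multimodal fusion pass.
type Chapter = { title: string; startSeconds: number; endSeconds: number }

export function boundaryFrameUrls(playbackId: string, chapters: Chapter[]): string[] {
  const times = chapters.map((c) => c.startSeconds)
  const last = chapters[chapters.length - 1]
  if (last) times.push(last.endSeconds)
  return times.map(
    (t) => `https://image.mux.com/${playbackId}/thumbnail.jpg?time=${t}`,
  )
}
```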

### R2. Scene Content Description

New service: `apps/manager/src/services/sceneDescription.ts`

```typescript
type SceneAnalysis = {
  sceneIndex: number
  startSeconds: number
  endSeconds: number | null
  description: string // concatenated extraction (all signals) — this is what gets embedded
  themes: string[] // felt needs: ["forgiveness", "redemption", "grief", "hope"]
  bibleVerses: string[] // ["Matthew 6:14-15", "Ephesians 4:32"]
  demographics: string[] // ["youth", "student"] — empty if not extractable
  chapterTitle: string | null
}

export async function analyzeScene(
  muxAssetId: string,
  playbackId: string,
  startSeconds: number,
  endSeconds: number | null,
  transcript: string,
  metadata: { bibleVerses?: string[]; videoLabel: string },
  chapterTitle: string | null,
): Promise<SceneAnalysis>
```

- Send **actual video segment** (not stills) to Gemini 2.5 Flash via its native video input, alongside transcript chunk and CMS metadata
- LLM extracts structured signals (ordered by importance):
1. **Felt needs/themes** (MOST IMPORTANT): forgiveness, hope, grief, loneliness, identity, redemption, belonging, purpose, healing, doubt, courage
2. **Bible verses**: from CMS metadata where available + LLM-identified additional references
3. **Content**: narrative summary, dialogue, message being communicated
4. **Emotional tone**: contemplative, joyful, grieving, urgent, peaceful, hopeful
5. **Demographics** (where extractable): age group, life stage, cultural context
- `description` concatenates all signals into a single text block for embedding, with themes/needs weighted first
- Structured fields stored as arrays for filtering and display
- **Requires new multimodal client** — existing OpenRouter client is text-only and cannot process video

### R3. Scene Embedding + Storage

Create `scene_embeddings` table via bootstrap SQL (same pattern as feat-009):

```sql
CREATE TABLE IF NOT EXISTS scene_embeddings (
  id SERIAL PRIMARY KEY,
  video_id INTEGER NOT NULL,
  core_id TEXT,
  mux_asset_id TEXT NOT NULL,
  playback_id TEXT NOT NULL,
  scene_index INTEGER NOT NULL,
  start_seconds FLOAT NOT NULL,
  end_seconds FLOAT,
  description TEXT NOT NULL,        -- concatenated extraction (all signals) — embedded
  themes TEXT[] DEFAULT '{}',       -- felt needs: {"forgiveness","redemption","grief"}
  bible_verses TEXT[] DEFAULT '{}', -- {"Matthew 6:14-15","Ephesians 4:32"}
  demographics TEXT[] DEFAULT '{}', -- {"youth","student"} — may be empty
  chapter_title TEXT,
  embedding vector(1536) NOT NULL,
  model TEXT NOT NULL DEFAULT 'text-embedding-3-small',
  language TEXT NOT NULL DEFAULT 'en',
  created_at TIMESTAMPTZ DEFAULT NOW(),
  UNIQUE(video_id, scene_index)
);

CREATE INDEX IF NOT EXISTS scene_embeddings_hnsw
  ON scene_embeddings USING hnsw (embedding vector_cosine_ops);
CREATE INDEX IF NOT EXISTS scene_embeddings_video_id
  ON scene_embeddings(video_id);
CREATE INDEX IF NOT EXISTS scene_embeddings_language
  ON scene_embeddings(language);
```

Indexing service: `apps/cms/src/api/scene-embedding/services/indexer.ts`

```typescript
export async function indexSceneEmbeddings(
  videoId: number,
  scenes: SceneDescription[],
  embeddings: number[][],
  meta: {
    coreId: string
    muxAssetId: string
    playbackId: string
    language: string
  },
): Promise<{ scenesIndexed: number }>
```
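
Since Strapi's ORM cannot write vector columns, the indexer has to go through raw SQL. A minimal sketch, assuming a knex-style `raw(sql, bindings)` connection as in feat-009; `upsertSceneRow` and `toVectorLiteral` are illustrative names, and the real insert also supplies `core_id`, `mux_asset_id`, `playback_id`, and `language`:

```typescript
type RawConn = { raw: (sql: string, bindings: unknown[]) => Promise<unknown> }

// pgvector accepts a '[x,y,...]' string literal cast with ::vector.
export function toVectorLiteral(embedding: number[]): string {
  return `[${embedding.join(",")}]`
}

// UNIQUE(video_id, scene_index) makes re-runs idempotent via ON CONFLICT.
// (Columns trimmed for brevity; see the table DDL above for the full set.)
export async function upsertSceneRow(
  db: RawConn,
  videoId: number,
  sceneIndex: number,
  description: string,
  embedding: number[],
): Promise<void> {
  await db.raw(
    `INSERT INTO scene_embeddings (video_id, scene_index, description, embedding)
     VALUES (?, ?, ?, ?::vector)
     ON CONFLICT (video_id, scene_index)
     DO UPDATE SET description = EXCLUDED.description, embedding = EXCLUDED.embedding`,
    [videoId, sceneIndex, description, toVectorLiteral(embedding)],
  )
}
```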

### R4. Cross-film Recommendation Query

```sql
-- Locale-aware: only return videos available in the user's language
SELECT se.video_id, se.scene_index, se.description, se.start_seconds,
       1 - (se.embedding <=> $1) AS similarity
FROM scene_embeddings se
JOIN video_variants vv ON vv.video_id = se.video_id
JOIN languages l ON vv.language_id = l.id
WHERE se.video_id != $2
  AND l.bcp47 = $3 -- user's locale
  AND se.language IN ('en', 'es', 'fr')
ORDER BY se.embedding <=> $1
LIMIT 10;
```

Expose as CMS service or API endpoint for web/mobile consumption. API accepts optional `rerank` parameter (no-op in Phase 1, reserved for user-driven scoring).
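
A hypothetical service wrapper over that query; names are illustrative, and Phase 1 accepts `rerank` only to reserve the parameter shape:

```typescript
// Phase 1 locales are enforced at the API edge so locale bleed is impossible.
const PHASE1_LOCALES = new Set(["en", "es", "fr"])

export function assertPhase1Locale(locale: string): string {
  if (!PHASE1_LOCALES.has(locale)) {
    throw new Error(`Locale ${locale} is outside Phase 1 (en, es, fr)`)
  }
  return locale
}

export async function similarScenes(
  db: { raw: (sql: string, bindings: unknown[]) => Promise<{ rows: unknown[] }> },
  queryEmbedding: string, // pgvector literal, e.g. "[0.1,0.2,...]"
  excludeVideoId: number,
  locale: string,
  _opts: { rerank?: boolean } = {}, // accepted but a no-op in Phase 1
) {
  const { rows } = await db.raw(
    `SELECT se.video_id, se.scene_index, se.description, se.start_seconds,
            1 - (se.embedding <=> ?::vector) AS similarity
     FROM scene_embeddings se
     JOIN video_variants vv ON vv.video_id = se.video_id
     JOIN languages l ON vv.language_id = l.id
     WHERE se.video_id != ? AND l.bcp47 = ?
     ORDER BY se.embedding <=> ?::vector
     LIMIT 10`,
    [queryEmbedding, excludeVideoId, assertPhase1Locale(locale), queryEmbedding],
  )
  return rows
}
```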

### R5. Backfill Worker

Dedicated Railway service (or separate entry point in manager) for one-time English catalog processing:

- Queue-based: iterate English videos, process each through R1 → R2 → R3
- Resumable: track processed video IDs, skip on restart
- Cost controls: configurable batch size, rate limits, cost tracking per video, auto-pause at threshold
- Dry-run mode: estimate cost without LLM calls
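
The bullets above amount to a small control loop. A sketch under the assumption that per-video cost is returned by the processing step (a dry run would pass a processor that estimates instead of calling the LLM); the checkpoint shape and names are assumptions, not existing code:

```typescript
type BackfillState = { processed: Set<number>; spentUsd: number }

export async function runBackfill(
  videoIds: number[],
  processOne: (id: number) => Promise<number>, // runs R1 → R2 → R3, returns cost in USD
  state: BackfillState,
  capUsd: number,
): Promise<"done" | "paused"> {
  for (const id of videoIds) {
    if (state.processed.has(id)) continue // resumable: skip checkpointed IDs
    if (state.spentUsd >= capUsd) return "paused" // auto-pause at cost threshold
    state.spentUsd += await processOne(id)
    state.processed.add(id) // checkpoint after success
  }
  return "done"
}
```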

### R6. Pipeline Integration

Add scene vectorization to `videoEnrichment.ts` as an independent branch:

- Runs after transcription completes (needs transcript)
- Also needs muxAssetId/playbackId (for frames) — different input than other parallel steps
- Triggers R1a → R2 → R3 for the new video
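
The gating logic can be sketched as a readiness check. The `transcription` step name mirrors the `step_statuses` JSON from R0's audit query and is an assumption:

```typescript
// The scene-vectorization branch starts only once the transcript exists and
// the Mux identifiers needed for frame extraction are available.
export function sceneVectorizationReady(
  stepStatuses: Record<string, string>,
  muxAssetId: string | null,
  playbackId: string | null,
): boolean {
  return (
    stepStatuses["transcription"] === "completed" &&
    muxAssetId !== null &&
    playbackId !== null
  )
}
```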

## Constraints

- **Phase 1 languages: en, es, fr** — filter by language in all queries and processing. `language` column enables future expansion.
- **No locale bleed** — recommendations are locale-aware. A user's locale determines which results they see. Never recommend the same video in a different language.
- **No human tags** — existing CMS tags are unreliable. All semantic signal comes from LLM-generated scene descriptions only.
- **Pure vector similarity scoring** — no user feedback loop in Phase 1. API accepts optional `rerank` parameter (no-op) to prepare for user-driven scoring in Phase 2.
- **Separate table from `video_embeddings`** — different columns, different query patterns. Do not extend feat-009's table.
- **Do NOT use a Strapi content type** for scene embeddings — pgvector columns don't work with Strapi ORM. Use raw SQL (same pattern as feat-009).
- **Embed once per Video, not per VideoVariant** — language variants share visual content. Dedup by `video_id`.
- **Cost cap** — backfill worker must auto-pause if cumulative cost exceeds configurable threshold.
- **Mux thumbnail API** for frame extraction — do not download full videos. Confirm during planning that the thumbnail API supports arbitrary timestamps.

## Verification

1. **Data audit complete**: know English video count by label, duration distribution, chapter coverage
2. **Scene segmentation**: sample 10 feature films, verify scene boundaries align with narrative scenes (not just shot cuts)
3. **Scene descriptions**: sample 20 scenes, verify descriptions capture visual content, not just transcript paraphrasing
4. **Embeddings indexed**: `SELECT COUNT(*) FROM scene_embeddings WHERE language IN ('en', 'es', 'fr')` matches expected scene count
5. **Recommendation quality**: for 50 seed videos, top-10 similar scenes include at least 3 relevant cross-film results for 80% of seeds
6. **No locale bleed**: query recommendations for a Spanish video with locale=es → results are all videos with Spanish variants. Repeat for en and fr. No cross-locale contamination.
7. **Deduplication**: recommendations never surface the same video (different variant) as the input
8. **Cost tracking**: backfill worker logs cumulative cost, stays within budget
9. **Pipeline integration**: upload a new video in en/es/fr → scene embeddings appear in `scene_embeddings` table automatically
@@ -0,0 +1,110 @@
---
id: "feat-038"
title: "Video Vectorization — Data Audit"
owner: "nisal"
priority: "P1"
status: "not-started"
start_date: "2026-04-21"
duration: 3
depends_on:
- "feat-037"
blocks:
- "feat-039"
- "feat-042"
tags:
- "cms"
- "pgvector"
---

## Problem

Before building the scene vectorization pipeline, we need to know the shape of the English, Spanish, and French video catalog: how many videos by type, the duration distribution, existing chapter coverage, and, critically, whether language variants share a Video parent (the dedup model). This gates all downstream sizing, cost estimates, and architecture decisions.

## Entry Points — Read These First

1. `apps/cms/src/api/video/content-types/video/schema.json` — Video schema with `label` enum
2. `apps/cms/src/api/video-variant/content-types/video-variant/schema.json` — VideoVariant with language relation
3. `apps/cms/src/api/enrichment-job/content-types/enrichment-job/schema.json` — tracks chapter completion status
4. `docs/brainstorms/2026-04-02-video-content-vectorization-requirements.md` — R0 requirements

## Grep These

- `label` in `apps/cms/src/api/video/` — video type enum values
- `bcp47` in `apps/cms/src/` — language code field for filtering English

## What To Build

Run diagnostic queries against the CMS database:

```sql
-- Video count by label for Phase 1 languages (en, es, fr)
SELECT v.label, l.bcp47, COUNT(*) as count
FROM videos v
JOIN video_variants vv ON vv.video_id = v.id
JOIN languages l ON vv.language_id = l.id
WHERE l.bcp47 IN ('en', 'es', 'fr')
GROUP BY v.label, l.bcp47 ORDER BY v.label, l.bcp47;

-- Unique Video count (deduped across languages) — this is what we actually process
SELECT COUNT(DISTINCT v.id) as unique_videos
FROM videos v
JOIN video_variants vv ON vv.video_id = v.id
JOIN languages l ON vv.language_id = l.id
WHERE l.bcp47 IN ('en', 'es', 'fr');

-- Duration distribution for Phase 1 languages
SELECT v.label,
       COUNT(*) as count,
       ROUND(AVG(vv.duration)) as avg_duration_sec,
       MAX(vv.duration) as max_duration_sec
FROM videos v
JOIN video_variants vv ON vv.video_id = v.id
JOIN languages l ON vv.language_id = l.id
WHERE l.bcp47 IN ('en', 'es', 'fr')
GROUP BY v.label;

-- Chapter metadata coverage
SELECT COUNT(DISTINCT ej.mux_asset_id)
FROM enrichment_jobs ej
WHERE ej.step_statuses->>'chapters' = 'completed';

-- CRITICAL: Confirm Video → VideoVariant dedup model
-- Do en/es/fr variants of the same film share a Video parent?
SELECT v.id, v.label,
       COUNT(vv.id) as variant_count,
       ARRAY_AGG(DISTINCT l.bcp47) as languages
FROM videos v
JOIN video_variants vv ON vv.video_id = v.id
JOIN languages l ON vv.language_id = l.id
WHERE l.bcp47 IN ('en', 'es', 'fr')
GROUP BY v.id, v.label
HAVING COUNT(DISTINCT l.bcp47) > 1
ORDER BY variant_count DESC LIMIT 20;

-- How many Videos have variants in multiple Phase 1 languages?
-- (high overlap = dedup model works, low overlap = mostly unique per language)
SELECT multi_lang_count, COUNT(*) as video_count FROM (
  SELECT v.id, COUNT(DISTINCT l.bcp47) as multi_lang_count
  FROM videos v
  JOIN video_variants vv ON vv.video_id = v.id
  JOIN languages l ON vv.language_id = l.id
  WHERE l.bcp47 IN ('en', 'es', 'fr')
  GROUP BY v.id
) sub GROUP BY multi_lang_count ORDER BY multi_lang_count;
```

Deliverable: update the brainstorm doc cost model with actual numbers. Confirm or revise the ~$130-$400 Phase 1 estimate. **If the dedup model is broken (same film = separate Video records per language), flag immediately — the entire dedup strategy must be revised.**

## Constraints

- Read-only queries — do not modify production data
- Use `strapi.db.connection.raw()` pattern or direct DB access
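
Since the audit must stay read-only, a cheap guard in the audit script can reject anything that is not a SELECT before it reaches `strapi.db.connection.raw()`. This helper is a sketch, not existing code:

```typescript
// Strip line comments, then require the first keyword to be SELECT (or WITH,
// for CTEs) so a stray UPDATE/DELETE can never reach production data.
export function assertReadOnly(sql: string): string {
  const firstWord = sql.replace(/--[^\n]*/g, "").trim().split(/\s+/)[0]?.toUpperCase()
  if (firstWord !== "SELECT" && firstWord !== "WITH") {
    throw new Error(`Refusing non-SELECT statement: ${firstWord}`)
  }
  return sql
}
```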

## Verification

- Know exact video count by label type for en, es, fr
- Know how many unique Video entities span multiple Phase 1 languages (dedup model validation)
- Know duration distribution (what % are short clips vs feature films)
- Know chapter coverage (what % already have scene-like metadata)
- Cost model in brainstorm doc updated with real numbers
- **Dedup model confirmed or red-flagged**: en/es/fr variants of the same film share a Video parent