feat(analytics): video watch percentage and drop-off tracking#1903
feat(analytics): video watch percentage and drop-off tracking#1903ManthanNimodiya wants to merge 8 commits into
Conversation
369013b to
9c5aca3
Compare
| const raf = requestAnimationFrame(() => attach()); | ||
| return () => cancelAnimationFrame(raf); |
There was a problem hiding this comment.
RAF cleanup discards event-listener detachment
When playerRef.current is null at effect mount time, a requestAnimationFrame is scheduled and the effect returns () => cancelAnimationFrame(raf) as its cleanup. When the RAF fires and attach() successfully adds the timeupdate listener, the returned cleanup closure is thrown away — React's effect cleanup only cancels the RAF (a no-op if it already ran), so the listener is never removed on unmount. This leaves a dangling timeupdate callback that continues firing and sending beacons after the component is gone.
Prompt To Fix With AI
This is a comment left during a code review.
Path: apps/web/app/s/[videoId]/Share.tsx
Line: 397-398
Comment:
**RAF cleanup discards event-listener detachment**
When `playerRef.current` is `null` at effect mount time, a `requestAnimationFrame` is scheduled and the effect returns `() => cancelAnimationFrame(raf)` as its cleanup. When the RAF fires and `attach()` successfully adds the `timeupdate` listener, the returned cleanup closure is thrown away — React's effect cleanup only cancels the RAF (a no-op if it already ran), so the listener is never removed on unmount. This leaves a dangling `timeupdate` callback that continues firing and sending beacons after the component is gone.
How can I resolve this? If you propose a fix, please make it concise.| if (videoRecord && userId === videoRecord.ownerId) return; | ||
|
|
||
| const tinybird = yield* Tinybird; | ||
| yield* tinybird.appendEvents([ | ||
| { | ||
| timestamp: new Date().toISOString(), | ||
| action: "video_progress", | ||
| version: "1.0", | ||
| session_id: sessionId ?? "anon", | ||
| video_id: body.videoId, |
There was a problem hiding this comment.
Non-existent video IDs written to Tinybird
When the DB lookup fails (caught by orElseSucceed) or the video simply doesn't exist, videoRecord is undefined. The combined guard if (videoRecord && userId === videoRecord.ownerId) return evaluates to false, so execution falls through and the event is appended to Tinybird with whatever arbitrary string was in body.videoId. An unauthenticated caller can pollute Tinybird with events for non-existent videos. Add an explicit if (!videoRecord) return before the owner check to prevent writes for unknown video IDs.
Prompt To Fix With AI
This is a comment left during a code review.
Path: apps/web/app/api/analytics/track/route.ts
Line: 121-130
Comment:
**Non-existent video IDs written to Tinybird**
When the DB lookup fails (caught by `orElseSucceed`) or the video simply doesn't exist, `videoRecord` is `undefined`. The combined guard `if (videoRecord && userId === videoRecord.ownerId) return` evaluates to `false`, so execution falls through and the event is appended to Tinybird with whatever arbitrary string was in `body.videoId`. An unauthenticated caller can pollute Tinybird with events for non-existent videos. Add an explicit `if (!videoRecord) return` before the owner check to prevent writes for unknown video IDs.
How can I resolve this? If you propose a fix, please make it concise.| timestamp: new Date().toISOString(), | ||
| action: "video_progress", | ||
| version: "1.0", | ||
| session_id: sessionId ?? "anon", |
There was a problem hiding this comment.
"anon" collapses all sessionless viewers into one ClickHouse row
When sessionId is null, every event is stored with session_id = 'anon'. The aggregation query in getVideoEngagement does GROUP BY session_id, so all these events merge into a single session row. For a video with many anonymous viewers this makes count() (total sessions) report 1 and per-milestone counts wildly wrong. Using crypto.randomUUID() as the fallback would give each request a unique session and keep the per-session aggregation correct.
Prompt To Fix With AI
This is a comment left during a code review.
Path: apps/web/app/api/analytics/track/route.ts
Line: 129
Comment:
**`"anon"` collapses all sessionless viewers into one ClickHouse row**
When `sessionId` is `null`, every event is stored with `session_id = 'anon'`. The aggregation query in `getVideoEngagement` does `GROUP BY session_id`, so all these events merge into a single session row. For a video with many anonymous viewers this makes `count()` (total sessions) report 1 and per-milestone counts wildly wrong. Using `crypto.randomUUID()` as the fallback would give each request a unique session and keep the per-session aggregation correct.
How can I resolve this? If you propose a fix, please make it concise.…ClickHouse safety
|
Superagent didn't find any vulnerabilities or security issues in this PR. |
|
@greptileai please re-review |
|
Want your agent to iterate on Greptile's feedback? Try greploops. |
…n session collision
…g tinybird query errors
|
@greptileai please re-review |
| if (action === "video_progress") { | ||
| const sessionId = | ||
| typeof body.sessionId === "string" | ||
| ? body.sessionId.trim().slice(0, 128) || null | ||
| : null; | ||
| const percentWatched = | ||
| typeof body.percentWatched === "number" && | ||
| body.percentWatched >= 0 && | ||
| body.percentWatched <= 100 | ||
| ? Math.round(body.percentWatched) | ||
| : null; | ||
|
|
||
| if (percentWatched !== null) { | ||
| await runPromise( | ||
| Effect.gen(function* () { | ||
| const maybeUser = yield* Effect.serviceOption(CurrentUser); | ||
| const userId = Option.match(maybeUser, { | ||
| onNone: () => null as string | null, | ||
| onSome: (user) => (user as { id: string }).id, | ||
| }); | ||
|
|
||
| const [videoRecord] = yield* Effect.tryPromise(() => | ||
| db() | ||
| .select({ ownerId: videos.ownerId }) | ||
| .from(videos) | ||
| .where(eq(videos.id, Video.VideoId.make(body.videoId))) | ||
| .limit(1), | ||
| ).pipe(Effect.orElseSucceed(() => [] as { ownerId: string }[])); | ||
|
|
||
| if (!videoRecord || userId === videoRecord.ownerId) return; | ||
|
|
||
| const tinybird = yield* Tinybird; | ||
| yield* tinybird.appendEvents([ | ||
| { | ||
| timestamp: new Date().toISOString(), | ||
| action: "video_progress", | ||
| version: "1.0", | ||
| session_id: sessionId ?? randomUUID(), | ||
| video_id: body.videoId, | ||
| percent_watched: percentWatched, | ||
| }, | ||
| ]); | ||
| }).pipe(provideOptionalAuth), | ||
| ); | ||
| } | ||
| return Response.json({ success: true }); |
There was a problem hiding this comment.
video_progress branch re-parses sessionId and shadows the already-sanitized sessionId above (including allowing the literal "anonymous" back in). That can collapse engagement sessions for clients where storage fails. Also seems better to use the already-computed timestamp.
| if (action === "video_progress") { | |
| const sessionId = | |
| typeof body.sessionId === "string" | |
| ? body.sessionId.trim().slice(0, 128) || null | |
| : null; | |
| const percentWatched = | |
| typeof body.percentWatched === "number" && | |
| body.percentWatched >= 0 && | |
| body.percentWatched <= 100 | |
| ? Math.round(body.percentWatched) | |
| : null; | |
| if (percentWatched !== null) { | |
| await runPromise( | |
| Effect.gen(function* () { | |
| const maybeUser = yield* Effect.serviceOption(CurrentUser); | |
| const userId = Option.match(maybeUser, { | |
| onNone: () => null as string | null, | |
| onSome: (user) => (user as { id: string }).id, | |
| }); | |
| const [videoRecord] = yield* Effect.tryPromise(() => | |
| db() | |
| .select({ ownerId: videos.ownerId }) | |
| .from(videos) | |
| .where(eq(videos.id, Video.VideoId.make(body.videoId))) | |
| .limit(1), | |
| ).pipe(Effect.orElseSucceed(() => [] as { ownerId: string }[])); | |
| if (!videoRecord || userId === videoRecord.ownerId) return; | |
| const tinybird = yield* Tinybird; | |
| yield* tinybird.appendEvents([ | |
| { | |
| timestamp: new Date().toISOString(), | |
| action: "video_progress", | |
| version: "1.0", | |
| session_id: sessionId ?? randomUUID(), | |
| video_id: body.videoId, | |
| percent_watched: percentWatched, | |
| }, | |
| ]); | |
| }).pipe(provideOptionalAuth), | |
| ); | |
| } | |
| return Response.json({ success: true }); | |
| if (action === "video_progress") { | |
| const percentWatched = | |
| typeof body.percentWatched === "number" && | |
| body.percentWatched >= 0 && | |
| body.percentWatched <= 100 | |
| ? Math.round(body.percentWatched) | |
| : null; | |
| if (percentWatched !== null) { | |
| await runPromise( | |
| Effect.gen(function* () { | |
| const maybeUser = yield* Effect.serviceOption(CurrentUser); | |
| const userId = Option.match(maybeUser, { | |
| onNone: () => null as string | null, | |
| onSome: (user) => (user as { id: string }).id, | |
| }); | |
| const [videoRecord] = yield* Effect.tryPromise(() => | |
| db() | |
| .select({ ownerId: videos.ownerId }) | |
| .from(videos) | |
| .where(eq(videos.id, Video.VideoId.make(body.videoId))) | |
| .limit(1), | |
| ).pipe(Effect.orElseSucceed(() => [] as { ownerId: string }[])); | |
| if (!videoRecord || userId === videoRecord.ownerId) return; | |
| const tinybird = yield* Tinybird; | |
| yield* tinybird.appendEvents([ | |
| { | |
| timestamp: timestamp.toISOString(), | |
| action: "video_progress", | |
| version: "1.0", | |
| session_id: sessionId ?? randomUUID(), | |
| video_id: body.videoId, | |
| percent_watched: percentWatched, | |
| }, | |
| ]); | |
| }).pipe(provideOptionalAuth), | |
| ); | |
| } | |
| return Response.json({ success: true }); | |
| } |
There was a problem hiding this comment.
Resolved in df58819 — the video_progress branch no longer re-parses sessionId; it reuses the sanitized sessionId (with randomUUID() fallback instead of letting "anonymous" back in) and the already-computed timestamp. tenant_id was re-added in 6f7c8b7.
|
|
||
| if (!video || video.ownerId !== user.id) throw new Error("Unauthorized"); | ||
|
|
||
| const safeId = videoId; |
There was a problem hiding this comment.
Minor: since this ends up interpolated into SQL, it’d be a bit more future-proof/consistent to reuse the existing literal escaping helper.
| const safeId = videoId; | |
| const safeId = escapeLiteral(videoId); |
There was a problem hiding this comment.
Resolved in df58819 — now uses the shared escapeLiteral helper, and videoId is also regex-validated (/^[0-9a-zA-Z_-]+$/) before it reaches the query.
…ogress, escape video id literal
|
@tembo please re-review |
re-reviewed pr #1903 at the latest commit (df58819). all four previously-flagged items are now fixed:
one item still open: the requestAnimationFrame path in Share.tsx leaks the timeupdate listener (the cleanup returned by attach() inside the raf is discarded). low real-world impact but i left an inline suggestion with a concrete fix. good to merge once that's handled. posted as a review on the pr. |
|
@greptileai please re-review |
There was a problem hiding this comment.
Re-review
Thanks for the iterations — everything I (and Greptile) flagged previously is now addressed:
- ✅ Non-existent / failed-lookup video IDs no longer written to Tinybird —
if (!videoRecord || userId === videoRecord.ownerId) return(route.ts:118). - ✅
sessionIdshadow no longer re-admits the literal"anonymous"— the inner re-declaration is gone; thevideo_progressbranch uses the outer sanitizedsessionId ?? randomUUID()(route.ts:126), giving each sessionless viewer a distinct id so the per-sessionGROUP BYingetVideoEngagementstays correct. - ✅ Reuses the existing
timestamp(route.ts:123). - ✅
getVideoEngagementis owner-gated and SQL-safe — auth + ownership check, regex^[0-9a-zA-Z_-]+$validation, andescapeLiteral(videoId)before interpolation. - ✅ Tinybird
querySqlrefactor is correct — moving theerror-field check outside theJSON.parsetry/catch means a Tinybird error object now propagates instead of silently falling through to TSV parsing.
One item still open
⚠️ RAF path leaks thetimeupdatelistener inShare.tsx(inline suggestion above). Low real-world impact sinceplayerRefis usually populated synchronously, but worth the small fix to avoid duplicate beacons after unmount / dep changes.
Looks good to merge once the listener cleanup is handled.
| const attach = () => { | ||
| const video = playerRef.current; | ||
| if (!video) return null; | ||
| video.addEventListener("timeupdate", onTimeUpdate); | ||
| return () => video.removeEventListener("timeupdate", onTimeUpdate); | ||
| }; | ||
|
|
||
| const detach = attach(); | ||
| if (detach) return detach; | ||
|
|
||
| const raf = requestAnimationFrame(() => attach()); | ||
| return () => cancelAnimationFrame(raf); |
There was a problem hiding this comment.
RAF cleanup still discards the listener detachment (open from prior review).
When playerRef.current is null at effect-mount time, a requestAnimationFrame is scheduled and the effect returns () => cancelAnimationFrame(raf). Once the RAF fires and attach() succeeds, the cleanup closure it returns is thrown away — React's cleanup only cancels the already-run RAF (a no-op), so the timeupdate listener is never removed. On unmount or when the deps (data.id, viewerId) change, the old listener keeps firing and sending beacons, and effect re-runs can stack duplicate listeners on the same element.
Capture the detach from whichever path attaches and call it in the cleanup:
| const attach = () => { | |
| const video = playerRef.current; | |
| if (!video) return null; | |
| video.addEventListener("timeupdate", onTimeUpdate); | |
| return () => video.removeEventListener("timeupdate", onTimeUpdate); | |
| }; | |
| const detach = attach(); | |
| if (detach) return detach; | |
| const raf = requestAnimationFrame(() => attach()); | |
| return () => cancelAnimationFrame(raf); | |
| let cleanup: (() => void) | null = null; | |
| const attach = () => { | |
| const video = playerRef.current; | |
| if (!video) return false; | |
| video.addEventListener("timeupdate", onTimeUpdate); | |
| cleanup = () => video.removeEventListener("timeupdate", onTimeUpdate); | |
| return true; | |
| }; | |
| let raf = 0; | |
| if (!attach()) raf = requestAnimationFrame(() => attach()); | |
| return () => { | |
| cancelAnimationFrame(raf); | |
| cleanup?.(); | |
| }; |
There was a problem hiding this comment.
Resolved in 19b779c. The current effect captures cleanup from whichever path attaches (sync or RAF-deferred) and invokes it in the returned cleanup, so the timeupdate listener is now detached on unmount/dep-change.
One residual caveat (non-blocking): when playerRef.current is still null after the single RAF retry — e.g. the <video> mounts late after processing — attach() gives up and milestones for that view are missed. ShareVideo exposes the element via useImperativeHandle(ref, ..., []) (empty deps), so playerRef.current is a one-time snapshot and won't repopulate later. Pre-existing pattern, fine to defer.
|
@tembo @greptileai please re-review |
re-reviewed pr #1903 at the latest commit (19b779c) and posted findings: #1903 (comment) quick summary: ✅ resolved since last review
🔴 still blocking
🟡 minor
recommendation: approve once the datasource schema change lands with this code. (the auth.rs greptile note is outside this pr's diff — ignored it.) |
Re-review (Tembo)Re-reviewed at ✅ Resolved since the previous review
🔴 Blocking — still unaddressed
Net: the feature is dead-on-arrival until the column is added. Add it to the (Use 🟡 Minor (non-blocking)
The Recommendation: approve once the |
| `SELECT count() as total, countIf(max_percent >= 25) as reached_25, countIf(max_percent >= 50) as reached_50, countIf(max_percent >= 75) as reached_75, countIf(max_percent >= 95) as reached_95, round(avg(max_percent)) as avg_percent FROM (SELECT session_id, max(toFloat32(percent_watched)) as max_percent FROM analytics_events WHERE action = 'video_progress' AND video_id = '${safeId}' GROUP BY session_id)`, | ||
| ) | ||
| .pipe( | ||
| Effect.catchAll(() => |
There was a problem hiding this comment.
Catching everything here and returning an empty dataset makes schema/query issues (e.g. missing percent_watched) invisible and hard to debug. Consider logging the error like getVideoAnalytics does.
| Effect.catchAll(() => | |
| .pipe( | |
| Effect.catchAll((error) => { | |
| console.error("tinybird engagement query error", error); | |
| return Effect.succeed({ | |
| data: [] as { | |
| total: number; | |
| reached_25: number; | |
| reached_50: number; | |
| reached_75: number; | |
| reached_95: number; | |
| avg_percent: number; | |
| }[], | |
| }); | |
| }), | |
| ); |
There was a problem hiding this comment.
Resolved in 6f7c8b7 — the catchAll now logs via console.error("tinybird engagement query error", e) before returning the empty dataset, matching getVideoAnalytics.
…t query errors, tag video_progress with tenant_id
|
@tembo please re-review |
re-reviewed pr #1903 at the latest commit (6f7c8b7) and posted a review on github. verdict: looks good, no blocking issues. every previously-flagged item is now resolved:
i also verified auth gating, the drop-off bar math, and the new two minor non-blocking notes left inline: (1) the localStorage-failure fallback can inflate session counts since each milestone beacon gets a fresh uuid, and (2) a late-mounting fyi the |
There was a problem hiding this comment.
Re-review — looks good ✅
Re-reviewed at 6f7c8b7. Every issue from the prior round is addressed:
| Concern | Status |
|---|---|
RAF cleanup discarded the timeupdate detach (P1, greptile + tembo) |
✅ 19b779c — cleanup captured from sync/RAF path and called on unmount |
| Non-existent video IDs written to Tinybird (P1) | ✅ 6fb2399/e99a8bf — explicit !videoRecord guard |
"anon" collapsed all sessionless viewers into one session (P1) |
✅ randomUUID() fallback |
video_progress re-parsed/shadowed sanitized sessionId + recomputed timestamp |
✅ df58819 — reuses sanitized sessionId and existing timestamp |
videoId SQL safety in getVideoEngagement |
✅ regex-validated and escapeLiteral |
| Engagement query swallowed errors silently | ✅ 6f7c8b7 — logs before returning empty |
Tinybird querySql mis-classified API errors as TSV |
✅ JSON.parse separated from the error-field check; errors now propagate |
Verified end-to-end
- Auth:
getVideoEngagementthrowsUnauthorizedfor non-owners and unauthenticated callers; owner self-views excluded from both the write path and the query. - UI:
DropOffBarmath is correct; section is owner-gated and hidden untiltotal > 0, so theavg(empty)NaN case never renders. - Schema:
percent_watched UInt8 DEFAULT 0added to the datasource +FORWARD_QUERY;page_hitevents sendpercent_watched: null→ resolves to the column default.
Minor, non-blocking (fine to defer)
- Sessionless fallback inflates session counts. When the client can't persist a session id (localStorage failure →
"anonymous"→ serverrandomUUID()), each of the 4 milestone beacons gets a distinct UUID, so one such viewer shows up as up to 4 sessions. Strictly better than the old"anon"collapse, and only hits the storage-failure edge case — just noting the trade-off. - Late-mounting
<video>misses milestones.ShareVideoexposes the element viauseImperativeHandle(ref, …, [])(empty deps), soplayerRef.currentis a one-time snapshot. If the video element mounts after the analytics effect runs (e.g. processing → ready), the single RAF retry can't recover it. Pre-existing pattern; doesn't affect already-ready videos.
The auth.rs:107 item in greptile's "Comments Outside Diff" is not part of this PR (the 6 changed files are all analytics) — it's base-branch divergence noise.
No blocking issues. 👍
Summary
Adds watch engagement analytics to the video share page, requested by the community.
Security
Greptile Summary
This PR adds video watch-engagement analytics (25/50/75/95% milestones) end-to-end: client-side
sendBeacontracking, a dedicated/api/analytics/trackbranch forvideo_progressevents that skips owner self-views and non-existent videos, an owner-onlygetVideoEngagement()server action querying per-session max milestones via a ClickHouse subquery, and an in-Analytics-tab drop-off funnel UI. The same diff also fixes a pre-existingquerySqlbug in Tinybird where API-level errors were silently swallowed by the TSV fallback path instead of propagating.route.ts:video_progresspath adds a!videoRecordguard to block writes for unknown video IDs and usesrandomUUID()as a sessionless fallback (replacing the previous"anon"literal that collapsed all anonymous viewers into one ClickHouse row).get-analytics.ts:getVideoEngagement()is auth-gated server-side (throwsUnauthorizedfor non-owners), validatesvideoIdvia regex before SQL interpolation, and aggregates the inner subquery correctly.Tinybird/index.ts: Addspercent_watchedto the event schema and separatesJSON.parsefrom the error-field check so Tinybird query errors are no longer mis-classified as TSV and silently turned into garbage data.Confidence Score: 4/5
Safe to merge once the dangling-listener concern raised in the open Share.tsx thread is confirmed resolved — all other previously flagged issues (non-existent-video writes, anon-session collapsing) are addressed in the current code.
The route and server-action logic are defensively written: the !videoRecord guard, owner-exclusion check, and randomUUID() fallback are all present and correct. The Tinybird querySql refactoring properly moves Tinybird error propagation outside the try-catch, a genuine correctness improvement. The main outstanding item is the Share.tsx RAF/cleanup pattern flagged in a prior review thread — examining the code, the cleanup variable IS read through its closure so the listener SHOULD be removed on unmount, but this warrants explicit confirmation before the thread is resolved.
apps/web/app/s/[videoId]/Share.tsx — the RAF-deferred event-listener attach/cleanup pattern has an open review thread that should be resolved before merging.
Important Files Changed
Comments Outside Diff (1)
apps/desktop/src-tauri/src/auth.rs, line 107 (link)organizations_updated_atset unconditionally outside the matchThe unconditional assignment on line 107 is now outside both arms of the
match, so it runs on both success (redundant, since theOkarm already sets it at line 101) and on every error. The original code intentionally skipped updating the timestamp on error whenauth.organizationswas non-empty, so that a subsequent call would still attempt a refresh. With this change, a transient network failure immediately marks the cache as "just checked", silencing retries for the full backoff window even when the stale data should be re-fetched.Prompt To Fix With AI
Reviews (5): Last reviewed commit: "fix(analytics): add percent_watched to t..." | Re-trigger Greptile