MediaRef's video decoders (`PyAVVideoDecoder` and `TorchCodecVideoDecoder`) follow unified playback semantics for frame retrieval. This ensures consistent behavior regardless of which decoder backend you use.
When you query a frame at a specific timestamp, MediaRef returns the frame that would be displayed on screen at that moment—not the closest frame, not the next frame, but the frame currently being shown.
`frame[i].pts <= query_time < frame[i+1].pts` → Returns `frame[i]`
This model is based on how video players work: a frame is displayed starting at its presentation timestamp (PTS) until the next frame's PTS.
For a 25 FPS video where frames have PTS values at 0.00s, 0.04s, 0.08s, ...:
| Query Time | Returned Frame | Explanation |
|---|---|---|
| 0.00s | Frame 0 | Exactly at frame 0's PTS |
| 0.02s | Frame 0 | Between frame 0 and frame 1 |
| 0.04s | Frame 1 | Exactly at frame 1's PTS |
| 0.05s | Frame 1 | Between frame 1 and frame 2 |
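The selection rule can be sketched as a binary search over sorted PTS values. This is a minimal illustration, not MediaRef's actual implementation: `frame_index_at` and the `pts` list are hypothetical names, and the PTS values are stored as exact fractions so that floating-point rounding does not blur the frame boundaries.

```python
from bisect import bisect_right
from fractions import Fraction


def frame_index_at(pts, query_time):
    """Return the index i such that pts[i] <= query_time < pts[i+1].

    Hypothetical helper for illustration only. `pts` must be sorted in
    ascending order; stream bounds are not checked here.
    """
    # bisect_right counts the frames whose PTS is <= query_time;
    # the last of those is the frame currently on screen.
    return bisect_right(pts, query_time) - 1


# 25 FPS: frames at 0.00s, 0.04s, 0.08s, ...
pts = [Fraction(i, 25) for i in range(5)]

for query in ("0.00", "0.02", "0.04", "0.05"):
    print(f"{query}s -> frame {frame_index_at(pts, Fraction(query))}")
```

Using `bisect_right` rather than `bisect_left` is what makes a query exactly at a frame's PTS resolve to that frame instead of the previous one.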
| Condition | Behavior |
|---|---|
| `timestamp < begin_stream_seconds` | `ValueError` |
| `timestamp >= end_stream_seconds` | `ValueError` |
| `timestamp == frame[i].pts` | Returns `frame[i]` |
| `frame[i].pts < timestamp < frame[i+1].pts` | Returns `frame[i]` |
- `begin_stream_seconds`: The first frame's PTS. This is often `0.0`, but not always: some videos start at non-zero timestamps.
- `end_stream_seconds`: Calculated as `begin_stream_seconds + duration_seconds`. Queries at or beyond this point raise `ValueError`.
The last frame's valid query range is [last_frame.pts, end_stream_seconds).
For example, if the last frame has PTS = 9.96s and end_stream_seconds = 10.0s:
- Query `9.96s` → Returns last frame ✓
- Query `9.99s` → Returns last frame ✓
- Query `10.0s` → `ValueError` ✗
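Combining the bounds check with the selection rule gives the following sketch. It assumes a sorted list of PTS values and a known stream duration; `select_frame` and its parameters are hypothetical names used for illustration, not MediaRef's API.

```python
from bisect import bisect_right


def select_frame(pts, duration_seconds, timestamp):
    """Return the index of the frame displayed at `timestamp`.

    Hypothetical helper: `pts` is a sorted list of frame PTS values in
    seconds and `duration_seconds` is the stream duration.
    """
    begin_stream_seconds = pts[0]
    end_stream_seconds = begin_stream_seconds + duration_seconds

    # The valid range is half-open: [begin_stream_seconds, end_stream_seconds).
    if not (begin_stream_seconds <= timestamp < end_stream_seconds):
        raise ValueError(
            f"timestamp {timestamp} is outside "
            f"[{begin_stream_seconds}, {end_stream_seconds})"
        )

    # The last frame with PTS <= timestamp is the one on screen.
    return bisect_right(pts, timestamp) - 1


# A 10-second, 25 FPS stream starting at 0.0; the last frame has PTS 9.96.
pts = [i / 25 for i in range(250)]

print(select_frame(pts, 10.0, 9.96))  # index of the last frame
print(select_frame(pts, 10.0, 9.99))  # still the last frame

try:
    select_frame(pts, 10.0, 10.0)
except ValueError as exc:
    print("ValueError:", exc)
```

Because the last frame has no successor, `end_stream_seconds` plays the role of the "next PTS" that closes its display interval.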
These semantics are adopted from TorchCodec, which explains the reasoning:
> We look at nextPts for a frame, and not its pts or duration. Our abstract player displays frames starting at the pts for that frame until the pts for the next frame. There are two consequences:
>
> 1. We ignore the duration for a frame. A frame is played until the next frame replaces it. This model is robust to durations being 0 or incorrect; our source of truth is the pts for frames.
> 2. In order to establish if the start of an interval maps to a particular frame, we need to figure out if it is ordered after the frame's pts, but before the next frame's pts.
By adopting this model, MediaRef ensures that:
- Both `PyAVVideoDecoder` and `TorchCodecVideoDecoder` return identical frames for the same query
- Frame selection is deterministic and matches real video player behavior
- Edge cases (sparse keyframes, variable frame rates) are handled consistently