Detect AV1 keyframes logic #239

Open
olegokunevych wants to merge 1 commit into elixir-webrtc:master from olegokunevych:av1_keyframe_detect
Conversation

@olegokunevych
In our media server, which uses ex_webrtc, we need to send AV1 RTP packets to clients over WebRTC/WHEP. To do that, we need a way to identify AV1 keyframes, which is why we submitted this pull request.
This solution has worked for us, so we would appreciate it if you could review this PR and suggest any necessary changes.

@sgfn (Member) left a comment

Sorry for the long delay and the radio silence, and thanks for the contribution.

We can't accept this PR in its current form, but we're open to discussing whether our proposed changes would work in your case.

Comment on lines +16 to +26
According to the [AV1 RTP spec](https://aomediacodec.github.io/av1-rtp-spec/v1.0.0.html) §4.4,
the RTP aggregation header's N bit marks the start of a new coded video sequence (CVS).
A CVS must contain a sequence header and the first frame must be a KEY_FRAME as defined
by ISO/IEC 23094-1 §6.8:
- `show_existing_frame` = 0 (a new frame, not a reference reuse)
- `frame_type` = KEY_FRAME (0)
- `show_frame` = 1 (displayed frame)

Some encoders repeat sequence headers in non-key frames, therefore the
presence of a sequence header alone is not considered sufficient for keyframe
detection.
Member

  1. Please leave only the first sentence in @doc and change the rest to regular # comments
  2. I'm not sure how ISO/IEC 23094-1 is relevant here -- it defines the EVC standard and has no section 6.8. Did you mean AV1 spec, sec. 6.8?

Author

Totally agree, I will commit the corresponding change.

Comment on lines +46 to +47
(av1_payload.z == 0 and check_keyframe_in_payload(av1_payload.payload))

@sgfn (Member), Mar 2, 2026

There's a problem with this approach. The AV1 RTP spec allows each OBU to be sent in a separate RTP packet with W=1. In the simplest case, the bitstream SEQ_HDR FRAME can be packetized into two packets: [ SEQ_HDR ] with N=1, followed by [ FRAME ] with N=0.

AV1.keyframe? will return true for both packets. If we're looking for the right place to switch simulcast layers, both of these packets will be considered equally valid. If the first packet never got delivered due to packet loss, we're going to switch layers without changing params.
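
To make the double-detection scenario concrete, here is a minimal sketch. It assumes the predicate under review behaves like "N=1, or Z=0 and a keyframe is found in the payload"; the module name, the map shape, and the `:seq_hdr` / `:key_frame` atoms are hypothetical stand-ins for illustration and are not ex_webrtc code.

```elixir
defmodule DoubleDetectionSketch do
  # Hypothetical stand-in for the predicate discussed in this thread.
  # Assumption: a packet is flagged when N=1, or when Z=0 and the payload
  # contains a key frame. Packets are modelled as plain maps.
  def keyframe?(%{n: 1}), do: true
  def keyframe?(%{z: 0, obus: obus}), do: Enum.any?(obus, &(&1 == :key_frame))
  def keyframe?(_packet), do: false
end

# The bitstream SEQ_HDR FRAME, packetized as one OBU per packet (W=1).
packets = [
  %{z: 0, n: 1, w: 1, obus: [:seq_hdr]},
  %{z: 0, n: 0, w: 1, obus: [:key_frame]}
]

# Both packets are flagged, so either one looks like a valid switch point.
flags = Enum.map(packets, &DoubleDetectionSketch.keyframe?/1)

# If the first packet is lost, the second one is still flagged on its own.
flag_after_loss = DoubleDetectionSketch.keyframe?(List.last(packets))
```

In this model `flags` is `[true, true]` and `flag_after_loss` is `true`, which is the case where a layer switch would happen without the sequence header having arrived.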

For H264, we decided that the occasional freeze which will trigger a PLI feedback (and, eventually, a new keyframe) is preferable to the green pixelated glitchy mess the end user will be seeing in the alternate case. You can refer to lib/ex_webrtc/rtp/h264.ex for more info and further reading (source code of existing SFU implementations).

I'd opt for a simple N=1 check, even though 1) it will falsely flag SEQ_HDR repeats as keyframes, and 2) it's not technically the same thing as checking for the start of a CVS, or even a keyframe?. The optimistic approach was found to work well in our previous experiments.
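
For reference, a simple N=1 check could look roughly like the sketch below. It relies only on the aggregation header layout from the AV1 RTP spec (Z, Y, W, N, then three reserved bits); the module and function names are hypothetical and do not reflect the existing ex_webrtc API.

```elixir
defmodule AggregationHeaderSketch do
  # Hypothetical module for illustration, not part of ex_webrtc.

  # AV1 RTP aggregation header (first byte of the RTP payload):
  # Z (1 bit), Y (1 bit), W (2 bits), N (1 bit), 3 reserved bits.
  def parse(<<z::1, y::1, w::2, n::1, _reserved::3, rest::binary>>) do
    {:ok, %{z: z, y: y, w: w, n: n, payload: rest}}
  end

  def parse(_other), do: {:error, :invalid_packet}

  # Optimistic check: N=1 marks the first packet of a coded video sequence.
  def new_cvs?(rtp_payload) do
    case parse(rtp_payload) do
      {:ok, %{n: 1}} -> true
      _ -> false
    end
  end
end
```

As noted above, this flags any packet with N=1, including repeated sequence headers, and it says nothing about streams where the payloader never sets N.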

@olegokunevych (Author), Mar 4, 2026

@sgfn Thank you for the detailed response. Here are a few points we found regarding keyframe detection:

  1. N=1 alone is insufficient: in our tests with the AOM AV1 encoder in OBS, N=1 was never set. This makes an N=1-only check completely broken, not just suboptimal. The function is used beyond simulcast (initial stream setup, first-keyframe detection), so missing keyframes is unacceptable.
  2. Double-detection is a narrow edge case: it requires that (a) the encoder splits SEQ_HDR and FRAME into separate packets, AND (b) the SEQ_HDR packet is lost while the FRAME packet arrives. This is a subset of normal packet loss.
  3. Even in the double-detection scenario, the outcome is acceptable: if the SEQ_HDR packet was lost, the stream is already degraded regardless of whether we flag the FRAME packet as a keyframe. The simulcast switching layer should handle incomplete keyframe data gracefully (the same way it handles any packet loss).
  4. The H264 analogy doesn't fully apply: H264's SPS is reliably present as a distinct NAL unit type. AV1's N bit is an RTP-layer signal that depends on the payloader implementation, which varies. The payload-level check is more robust because it inspects the actual bitstream content.
  5. Chromium's own depacketizer inspects OBU content: the codebase references Chromium implementations (leb128.ex, payloader.ex), which also perform content inspection rather than relying solely on aggregation header flags.

Let me know if these points make sense. We are open to further discussion :)
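
As an illustration of what a payload-level check involves, the sketch below inspects a single, complete OBU and tests the three conditions from the docstring (`show_existing_frame` = 0, `frame_type` = KEY_FRAME, `show_frame` = 1). It is not the code from this PR: the module and function names are hypothetical, it assumes `reduced_still_picture_header` = 0 so that these are the first four bits of the frame header, and it ignores OBU fragmentation across packets (the Z and Y bits).

```elixir
defmodule ObuKeyframeSketch do
  # Hypothetical module for illustration, not the code from this PR.
  # Assumption: reduced_still_picture_header = 0, so the frame header
  # starts with show_existing_frame (1 bit), frame_type (2 bits),
  # show_frame (1 bit).

  @obu_frame_header 3
  @obu_frame 6

  # Takes one complete OBU, header included. OBU header layout:
  # forbidden (1), obu_type (4), extension_flag (1), has_size_field (1), reserved (1).
  def keyframe_obu?(<<0::1, type::4, ext::1, has_size::1, _::1, rest::binary>>)
      when type in [@obu_frame_header, @obu_frame] do
    with {:ok, rest} <- skip_extension(ext, rest),
         {:ok, rest} <- skip_size(has_size, rest) do
      # show_existing_frame = 0, frame_type = KEY_FRAME (0), show_frame = 1
      match?(<<0::1, 0::2, 1::1, _::bitstring>>, rest)
    else
      _ -> false
    end
  end

  def keyframe_obu?(_other), do: false

  # The optional OBU extension header is a single byte.
  defp skip_extension(0, rest), do: {:ok, rest}
  defp skip_extension(1, <<_ext::8, rest::binary>>), do: {:ok, rest}
  defp skip_extension(_flag, _rest), do: :error

  # The optional obu_size field is LEB128-encoded.
  defp skip_size(0, rest), do: {:ok, rest}
  defp skip_size(1, rest), do: skip_leb128(rest)

  # Skip bytes while the continuation bit (MSB) is set.
  defp skip_leb128(<<0::1, _::7, rest::binary>>), do: {:ok, rest}
  defp skip_leb128(<<1::1, _::7, rest::binary>>), do: skip_leb128(rest)
  defp skip_leb128(_other), do: :error
end
```

For example, `ObuKeyframeSketch.keyframe_obu?(<<0x30, 0x10>>)` describes an OBU_FRAME without a size field whose header starts with the bits `0 00 1`, while a sequence header OBU on its own (`<<0x08, ...>>`) is never reported as a keyframe.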
