Draft
34 commits
4c58bff
Add commands for screencasting
lutien Feb 6, 2026
6bdfd0e
Await on the promise returned from "capture a browser tab".
lutien Feb 9, 2026
55bcbcc
Create MediaRecorder and Blob in navigable's window realm.
lutien Feb 9, 2026
b0d2460
Remove "facingMode" and "backgroundBlur" from "streamOptnios" since t…
lutien Feb 9, 2026
2da09c5
Update index.bs
lutien Feb 13, 2026
28d45b4
Limit "browsingContext.MediaTrackConstraints".
lutien Feb 13, 2026
00a0094
Update index.bs
lutien Feb 20, 2026
f8c150c
Use "capture a browsing context viewport" algorithm instead "capture …
lutien Feb 24, 2026
d60f8bc
Add "size" to the "browsingContext.stopScreencast" return result.
lutien Feb 24, 2026
e779925
Write to the file while recording the screencast
lutien Feb 24, 2026
b2d6c41
Stop screencast media recorders during a session cleanup
lutien Feb 24, 2026
47af858
Use linkable definition in "screencast recordings map".
lutien Feb 24, 2026
96a9c40
Fix "screencast recording" struct.
lutien Feb 25, 2026
324013f
Fix remaining "screencast recording" items linking.
lutien Feb 25, 2026
7f881a8
Add "no such screencast" error to CDDL
lutien Mar 3, 2026
5b5859b
Add a command parameter for MediaRecorder timeslice.
lutien Mar 4, 2026
8322c7d
Return an error from "browsingContext.stopScreencast" when writing to…
lutien Mar 4, 2026
916d8bb
Update index.bs
lutien Mar 9, 2026
02a4a03
Use map instead of struct for media recorder options.
lutien Mar 9, 2026
7e39e5a
Use ECMA script Call when calling MediaRecorder methods.
lutien Mar 9, 2026
1e3db0d
Update the description from the "browsingContext.startScreencast" com…
lutien Mar 9, 2026
b2e1488
Remove "Add event listener".
lutien Mar 10, 2026
0f0be03
React to promise instead of awaiting.
lutien Mar 10, 2026
eae5706
Create sandbox for MediaRecorder.
lutien Mar 10, 2026
ed19972
Run "Prepare to run script" to initialize the JS for screencast recor…
lutien Mar 23, 2026
cb35b0c
Use "the specification execution environment" realm to work with Medi…
lutien Mar 23, 2026
3d63dce
Remove unneeded spec imports
lutien Apr 14, 2026
7902a5a
Add back "is type supported" import
lutien Apr 14, 2026
d500592
Use the correct syntax for an autolink to the "abstract-op"-type defi…
lutien Apr 15, 2026
21ab432
Don't allow video parameter being disabled.
lutien Apr 23, 2026
a8a90a0
Don't validate mimeType
lutien Apr 23, 2026
d77a6d6
Do not expose "timeslice" as a parameter
lutien May 4, 2026
0518ea8
Add "frameRate" parameter to "browsingContext.startScreencast".
lutien May 4, 2026
b8241ec
Make "mimeType" parameter optional
lutien May 4, 2026
223 changes: 222 additions & 1 deletion index.bs
@@ -308,7 +308,7 @@ spec: SELECTORS4; urlPrefix: https://drafts.csswg.org/selectors-4/
spec: WEB-IDL; urlPrefix: https://webidl.spec.whatwg.org/
type: dfn
text: DOMException; url: #idl-DOMException
text: SyntaxError; url:#syntaxerror
text: SyntaxError; url: #syntaxerror
spec: UNICODE; urlPrefix: https://www.unicode.org/versions/Unicode15.0.0/
type: dfn
text: Unicode Default Case Conversion algorithm; url: ch03.pdf#G34944
@@ -319,6 +319,12 @@ spec: ACCNAME; urlPrefix:https://www.w3.org/TR/accname-1.2
spec: CORE-AAM; urlPrefix:https://www.w3.org/TR/core-aam-1.2
type: dfn
text: computed role; url: /#roleMappingComputedRole
spec: MEDIACAPTURE-RECORD; urlPrefix: https://w3c.github.io/mediacapture-record/
type: dfn
text: fire a blob event; url: #fire-a-blob-event
spec: MEDIACAPTURE-VIEWPORT; urlPrefix: https://w3c.github.io/mediacapture-viewport/
type: dfn
text: capture a browsing context viewport; url: #dfn-capture-a-browsing-context-viewport
spec: MEDIAQUERIES4; urlPrefix: https://drafts.csswg.org/mediaqueries-4/
type: dfn
text: resolution media feature; url: #resolution
@@ -671,6 +677,9 @@ with the following additional codes:
<dt><dfn for=errors export>no such request</dfn>
<dd>Tried to continue an unknown [=/request=].

<dt><dfn for=errors export>no such screencast</dfn>
<dd>Tried to stop an unknown screencast recording.

<dt><dfn for=errors export>no such script</dfn>
<dd>Tried to remove an unknown [=preload script=].

@@ -716,6 +725,7 @@ ErrorCode = "invalid argument" /
"no such network data" /
"no such node" /
"no such request" /
"no such screencast" /
"no such script" /
"no such storage partition" /
"no such user context" /
@@ -1693,6 +1703,12 @@ To <dfn>cleanup the session</dfn> given |session|:
1. For each |collected data| in [=collected network data=], [=remove collector from data=]
with |collected data| and |collector id|.

1. For each |screencast recording| in |session|'s [=screencast recordings map=]:

1. Let |media recorder| be |screencast recording|["<code>mediaRecorder</code>"].

1. [=Call=]({{MediaRecorder/stop}}, |media recorder|).

1. If [=active sessions=] is [=list/empty=], [=cleanup remote end state=].

1. Perform any implementation-specific cleanup steps.
@@ -3146,6 +3162,8 @@ BrowsingContextCommand = (
browsingContext.Print //
browsingContext.Reload //
browsingContext.SetViewport //
browsingContext.StartScreencast //
browsingContext.StopScreencast //
browsingContext.TraverseHistory
)
</pre>
@@ -3165,6 +3183,8 @@ BrowsingContextResult = (
browsingContext.PrintResult /
browsingContext.ReloadResult /
browsingContext.SetViewportResult /
browsingContext.StartScreencastResult /
browsingContext.StopScreencastResult /
browsingContext.TraverseHistoryResult
)

@@ -3234,6 +3254,13 @@ weak map between [=user context|user contexts=] and [=unhandled prompt behavior
A [=remote end=] has a <dfn>scripting enabled overrides map</dfn> which is a weak
map between [=/navigables=] or [=user context|user contexts=] and boolean.

A [=BiDi session=] has a <dfn>screencast recordings map</dfn> which is a [=/map=] in
which the keys are [[!RFC9562|UUID]]s, and the values are <dfn>screencast recording</dfn>, which is a [=struct=] with
an [=struct/item=] named <dfn for="screencast recording">mediaRecorder</dfn>, which is a {{MediaRecorder}},
an [=struct/item=] named <dfn for="screencast recording">path</dfn>, which is a string,
an [=struct/item=] named <dfn for="screencast recording">size</dfn>, which is a number,
an [=struct/item=] named <dfn for="screencast recording">writeError</dfn>, which is a string.
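
As an illustration only, the map and struct defined above could be mirrored in an implementation roughly as follows. This is a minimal sketch: the Python names (`ScreencastRecording`, `register_recording`) are hypothetical, the `media_recorder` field merely stands in for a {{MediaRecorder}}, and using a version 4 UUID for the key is an assumption.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ScreencastRecording:
    """Illustrative mirror of the "screencast recording" struct."""
    media_recorder: object             # stands in for a MediaRecorder
    path: str                          # file the recording is written to
    size: int = 0                      # bytes written so far
    write_error: Optional[str] = None  # set only after a failed write


# Keys are UUID strings, as in the "screencast recordings map".
ScreencastRecordingsMap = Dict[str, ScreencastRecording]


def register_recording(recordings: ScreencastRecordingsMap,
                       recording: ScreencastRecording) -> str:
    """Store a recording under a fresh UUID and return that key."""
    screencast_id = str(uuid.uuid4())  # assumption: any UUID version works
    recordings[screencast_id] = recording
    return screencast_id
```

Keeping `size` and `write_error` on the record lets the stop command report them later without consulting the recorder itself.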

### Types ### {#module-browsingcontext-types}

#### The browsingContext.BrowsingContext Type #### {#type-browsingContext-Browsingcontext}
@@ -5086,6 +5113,200 @@ The [=remote end steps=] with |command parameters| are:

</div>

#### The browsingContext.startScreencast Command #### {#command-browsingContext-startScreencast}

The <dfn export for=commands>browsingContext.startScreencast</dfn> command
starts the screencast of a given navigable and writes it to a file.

Note: The [=remote end=] creates and writes the screencast file, but does not delete it.
Cleaning up the file is left to the [=local end=]. In some configurations this might not be
possible — for example, if the [=remote end=] has read/write access to the filesystem but
the [=local end=] has only read-only access.
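
A local end that does share a writable filesystem with the remote end might clean up along these lines. This is a hedged sketch, not part of the protocol: the helper name `cleanup_screencast_file` is hypothetical, and it assumes the returned path is directly reachable from the local end.

```python
import os


def cleanup_screencast_file(path: str) -> bool:
    """Try to delete a finished recording; report whether it is gone.

    Returns False instead of raising when the file cannot be removed,
    for example because the local end only has read-only access.
    """
    try:
        os.remove(path)
    except FileNotFoundError:
        return True   # already gone, nothing left to clean up
    except OSError:
        return False  # e.g. read-only filesystem or missing permission
    return True
```

Returning a boolean rather than raising lets the caller decide whether a leftover file is worth reporting.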

<dl>
<dt>Command Type</dt>
<dd>
<pre class="cddl" data-cddl-module="remote-cddl">
browsingContext.StartScreencast = (
method: "browsingContext.startScreencast",
params: browsingContext.StartScreencastParameters
)

browsingContext.StartScreencastParameters = {
context: browsingContext.BrowsingContext,
? mimeType: text,
? streamOptions: browsingContext.MediaStreamOptions
}

browsingContext.MediaStreamOptions = {
? video: true / browsingContext.MediaTrackConstraints,
? audio: bool .default false,
}

browsingContext.MediaTrackConstraints = {
? width: js-uint,
Contributor

How are the width / height parameters used by the underlying spec? Is it something to scale the output to, or is it a min/max constraint for the stream selection? I do not seem to find mentions of it (same for timeslice).

Member Author

width and height are supposed to be used for scaling. You can see more info here: https://www.w3.org/TR/screen-capture/#constrainable-properties. timeslice is used as a parameter for MediaRecorder.start(timeslice) (spec: https://w3c.github.io/mediacapture-record/#dom-mediarecorder-start)

? height: js-uint,
? frameRate: js-uint,
}

</pre>
</dd>
<dt>Return Type</dt>
<dd>
<pre class="cddl" data-cddl-module="local-cddl">
browsingContext.StartScreencastResult = {
screencast: browsingContext.Screencast,
path: text
}

browsingContext.Screencast = text
</pre>
</dd>
</dl>
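
For illustration, a local end could assemble a command matching the CDDL above as shown below. This is a sketch under assumptions: `build_start_screencast` and the context id are hypothetical, and the top-level `id` field is taken from the general BiDi command envelope rather than from this section.

```python
import json
from typing import Any, Dict, Optional


def build_start_screencast(command_id: int,
                           context: str,
                           mime_type: Optional[str] = None,
                           width: Optional[int] = None,
                           height: Optional[int] = None,
                           frame_rate: Optional[int] = None,
                           audio: bool = False) -> Dict[str, Any]:
    """Build a browsingContext.startScreencast command as a plain dict."""
    # Only include the constraints the caller actually set.
    constraints = {
        name: value
        for name, value in (("width", width),
                            ("height", height),
                            ("frameRate", frame_rate))
        if value is not None
    }
    params: Dict[str, Any] = {
        "context": context,
        "streamOptions": {
            # `video` is either true or a constraints map; it is never false.
            "video": constraints or True,
            "audio": audio,
        },
    }
    if mime_type is not None:
        params["mimeType"] = mime_type  # optional; remote end has a default
    return {
        "id": command_id,  # assumed BiDi command envelope field
        "method": "browsingContext.startScreencast",
        "params": params,
    }


message = build_start_screencast(1, "example-context-id", width=1280, height=720)
print(json.dumps(message, indent=2))
```

Omitting `mimeType` leaves the format choice to the remote end, which matches the parameter being optional in the CDDL.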

<div algorithm="remote end steps for browsingContext.startScreencast">
The [=remote end steps=] with |command parameters| are:

1. Let |navigable id| be |command parameters|["<code>context</code>"].

1. Let |navigable| be the result of [=trying=] to [=get a navigable=]
with |navigable id|.

1. If |navigable| is not a [=/top-level traversable=], return [=error=] with
[=error code=] [=invalid argument=].

1. If |command parameters| [=map/contains=] the <code>mimeType</code> field:

1. Let |mime type| be |command parameters|["<code>mimeType</code>"].

1. Otherwise, let |mime type| be the implementation-defined default format.

1. If the implementation is unable to record a screencast of |navigable| for any
reason then return [=error=] with [=error code=] [=unsupported operation=].

1. Let |environment settings| be the [=environment settings object=] representing
a specification execution environment.

Issue: The specification execution environment has to be better defined.

1. [=Prepare to run script=] with |environment settings|.
Contributor

Can we specify it without requiring the implementations to run scripts?

Member

The problem we have is that we want to call existing platform APIs that assume in their specification that they're being called from a WebIDL interface which can only be invoked when running script. So they do things like use promises, whose semantics are undefined outside of the context of JS execution. If we don't use that model we have to reimplement the entire API with different semantics.

My plan is to have a "specification agent" which allows us to run script in a way that's invisible to the content process, and is basically only a specification formalism. Implementations will be free to not actually run script as long as the observable behaviour is correct.


1. Let |promise| be the result of [=trying=] to [=capture a browsing context viewport=] with
Member

This still feels a bit sketchy to me; the "capture a browsing context viewport" algorithm creates a promise without any JS running and without specifying which realm the promise is in.

We are also passing in an Infra Map as options, whereas the algorithm expects a WebIDL dictionary. I think this is OK because WebIDL suggests that dictionary instances are Maps, but I'm not 100% sure.

Member Author

I am not sure if we can fix something in the BiDi spec, but I guess I could update w3c/mediacapture-viewport#34 to create a promise in the realm of the passed browsing context. What do you think?

Member

It's not just the promise but the returned MediaStream object and its embedded MediaStreamTrack objects for video and audio we need to consider. All four are JS objects given no realm to live in.

From handle an incoming message it seems remote end steps all run in parallel:

Image

So there's no JS set up here yet AFAIU. How are you planning on using these objects? From JS or c++?

Member Author

Ok, so I've added "Prepare to run script" to set up JS in the specification environment (something like the Firefox parent process).

|navigable| and |command parameters|["<code>streamOptions</code>"].

1. [=React=] to |promise|:

1. If |promise| was rejected, return [=error=] with [=error code=] [=unknown error=].

1. If |promise| was fulfilled with value |media stream|, then:

1. Let |path| be an implementation-defined file path where the recording will be stored.

1. Let |media recorder options| be a new [=/map=] with the <code>mimeType</code> field
set to |mime type|.

1. Let |screencast| be the string representation of a [[!RFC9562|UUID]].

1. Let |realm| be |environment settings|' [=realm execution context=]'s Realm component.

1. Let |media recorder| be a new {{MediaRecorder}} in |realm|
with {{MediaRecorder/stream}} |media stream| and
{{MediaRecorderOptions}} |media recorder options|.

1. Let |recording| be a new [=screencast recording=] with
[=screencast recording/mediaRecorder=] |media recorder|,
Contributor

Can we save objects like |media recorder| across navigations? I thought media recorder would be bound to a specific realm?

Contributor

@jgraham @lutien do you think we can rewrite this spec proposal without using promises / IDL / requiring a JS execution context? it looks like in the media viewport the relevant portion of the spec is https://w3c.github.io/mediacapture-viewport/#dom-mediadevices-getviewportmedia (step 10.3)? And then it just becomes a MediaStream from IDL? Perhaps we could link to the constraints defined by the spec and eventually make a WebDriver BiDi stream out of it re-using the stream spec algorithms.

The provided media MUST include precisely one video track, which MUST be a live-capture of the [browser](https://www.w3.org/TR/screen-capture/#dfn-browser) [display surface](https://www.w3.org/TR/screen-capture/#dfn-display-surface) of the [relevant global object](https://html.spec.whatwg.org/multipage/webappapis.html#concept-relevant-global)'s [associated Document](https://html.spec.whatwg.org/multipage/nav-history-apis.html#concept-document-window)'s [top-level browsing context](https://html.spec.whatwg.org/multipage/document-sequences.html#top-level-browsing-context)'s [viewport](https://html.spec.whatwg.org/multipage/#viewport).

The provided media MUST include at most one audio track, which, if provided, MUST be the combined audio produced by the sum of documents that consist of the [relevant global object](https://html.spec.whatwg.org/multipage/webappapis.html#concept-relevant-global)'s [associated Document](https://html.spec.whatwg.org/multipage/nav-history-apis.html#concept-document-window)'s [top-level browsing context](https://html.spec.whatwg.org/multipage/document-sequences.html#top-level-browsing-context)'s [active document](https://html.spec.whatwg.org/multipage/document-sequences.html#nav-document), and all [active documents](https://html.spec.whatwg.org/multipage/document-sequences.html#nav-document) in nested [browsing context](https://html.spec.whatwg.org/multipage/document-sequences.html#browsing-context)s of the [relevant global object](https://html.spec.whatwg.org/multipage/webappapis.html#concept-relevant-global)'s [associated Document](https://html.spec.whatwg.org/multipage/nav-history-apis.html#concept-document-window)'s [top-level browsing context](https://html.spec.whatwg.org/multipage/document-sequences.html#top-level-browsing-context). This audio track MUST NOT be included if audio was not specified in requestedMediaTypes, or if it was specified as false.

The source of a [MediaStreamTrack](https://www.w3.org/TR/mediacapture-streams/#dom-mediastreamtrack) MUST NOT change.

If the result of the request is "[granted](https://www.w3.org/TR/permissions/#dom-permissionstate-granted)", then for each device that is sourcing the provided media, using a stable and private id for the device, deviceId, set [[devicesLiveMap]][deviceId] to true, if it isn’t already true, and set the [[devicesAccessibleMap]][deviceId] to true, if it isn’t already true.

The user agent MUST NOT store a "[granted](https://www.w3.org/TR/permissions/#dom-permissionstate-granted)" permission entry.

Member Author

It looks like MediaStream is supposed to live in some JS execution context as well (see comment here: #1069 (comment)). I've created a draft PR with an attempt to sketch something without requiring JS: #1113. I've used some wording from https://w3c.github.io/mediacapture-viewport/, which we could potentially share, but mostly it's just an abstract description, since I couldn't really find a way to share more with other specs.

Contributor

Thanks! I guess if we had #1061 (comment) specified without JS execution realms we could just use WebDriver BiDi's own stream instead of MediaStream?

Member Author

From what I've understood (tagging @jgraham to correct me/clarify), it still might be an issue for working with the streams, because the stream algorithms firmly assume that they are running in JS.

Contributor

Do we have a conclusion here? I think it would be nice to avoid JS for specifying this and I think maybe we can specify our own stream behavior that also does not rely on running JS?

Member Author

So to unblock things, we decided to focus on #1113, but only for saving screencasting to the file for now. And then come back to the streaming option after we're more certain about the generic streaming API. We were planning to talk about it on Wednesday, but we can discuss it now if you have any feedback already.

Contributor

That sounds good to me, should I review https://github.com/w3c/webdriver-bidi/pull/1113/changes?

Member Author

Sure (I thought it would be good to review internally first, to save you the trouble. But if you have time, I guess there is no point in delaying 🙂 ).

[=screencast recording/path=] |path|,
[=screencast recording/size=] 0.

1. Whenever the implementation is going to [=fire a blob event=] named {{MediaRecorder/dataavailable}}
at |media recorder| with |blob|, run the following steps:
Comment on lines +5212 to +5222
Member

Is there a precedent for this approach in some other command?

I'm not super familiar with WebDriver-BiDi and sandbox realms, but it's surprising to me to see JS-facing APIs used in this way in parallel. I'd normally associate this with data races. These JS objects are being created on the navigable being captured? What thread do their constructors run on?

Is this the long-term plan?

  • If yes, passing in the realm to the capture algorithm earlier might work
  • If no, might adding automation steps to https://w3c.github.io/mediacapture-record be another approach to ultimately try to pass around non-JS objects here?

Member

> Is there a precedent for this approach in some other command?

I think the answer is "not really".

Historically WebDriver has managed to call into algorithms that are written entirely in terms of abstract spec objects. In browser terms this is roughly equivalent to native code (i.e. C++ or similar).

However as we add more modern platform features we're more frequently running into cases where the spec itself assumes that it's being called from executing script, and is written in terms of operations on JS objects. That obviously makes sense if the only entry point is via scripting interfaces defined in WebIDL, but a WebDriver endpoint is not that.

A current idea I have is to create an agent/environment settings object/realm that's defined in the WebDriver spec and is only used for running spec-internal code, somewhat similar to parent process JS in Firefox. I asked about this idea on matrix, but so far no one gave any feedback on whether that was a silly idea…

Member Author

So the plan is to have two options for this command:

  • save the MediaStream to a file (what this PR is about now);
  • stream the MediaStream over the websocket to the client (after we introduce the general streaming support with Add io module for streams #1061).

So I think we still need to get at least the MediaStream out of capture a browsing context viewport. I guess we could try to resolve the promise on the capture a browsing context viewport side, but I'm not sure if it would make it easier.


1. Let |bytes| be |blob|'s underlying [=byte sequence=].

1. Append |bytes| to the file at |path|. If this fails:

1. Set |recording|'s [=screencast recording/writeError=] to an
implementation-defined string describing the write failure.

1. [=Call=]({{MediaRecorder/stop}}, |media recorder|).

1. Otherwise, set |recording|'s [=screencast recording/size=] to |recording|'s [=screencast recording/size=] + |bytes|'s length.

1. Let |timeslice| be an implementation-defined value.

1. [=Call=]({{MediaRecorder/start}}, |media recorder|, |timeslice|).

1. [=Clean up after running script=] with |environment settings|.

1. Set [=screencast recordings map=][|screencast|] to |recording|.

1. Return a new [=/map=] matching the <code>browsingContext.StartScreencastResult</code>
with the <code>screencast</code> field set to |screencast| and <code>path</code> field
set to |path|.

</div>
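
The per-chunk write handling in the steps above can be sketched as follows. This is an illustration under assumptions, not the specified behaviour: `FakeRecorder` is a hypothetical stand-in for {{MediaRecorder}}, the recording is modelled as a plain dict keyed by the struct's item names, and the error string format is arbitrary.

```python
class FakeRecorder:
    """Hypothetical stand-in for MediaRecorder; it only tracks stop()."""

    def __init__(self) -> None:
        self.stopped = False

    def stop(self) -> None:
        self.stopped = True


def on_data_available(recording: dict, chunk: bytes) -> None:
    """Append one recorded chunk to the file and keep the size in sync."""
    try:
        # Append mode keeps earlier chunks and creates the file if needed.
        with open(recording["path"], "ab") as output:
            output.write(chunk)
    except OSError as exc:
        # Remember the failure so a later stop command can report it,
        # and stop recording since further chunks cannot be stored.
        recording["writeError"] = str(exc)
        recording["mediaRecorder"].stop()
    else:
        recording["size"] += len(chunk)
```

The size is only increased after a successful write, so the value later returned by the stop command reflects the bytes actually stored.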

#### The browsingContext.stopScreencast Command #### {#command-browsingContext-stopScreencast}

The <dfn export for=commands>browsingContext.stopScreencast</dfn> command
stops the screencast.

<dl>
<dt>Command Type</dt>
<dd>
<pre class="cddl" data-cddl-module="remote-cddl">
browsingContext.StopScreencast = (
method: "browsingContext.stopScreencast",
params: browsingContext.StopScreencastParameters
)

browsingContext.StopScreencastParameters = {
screencast: browsingContext.Screencast
}
</pre>
</dd>
<dt>Return Type</dt>
<dd>
<pre class="cddl" data-cddl-module="local-cddl">
browsingContext.StopScreencastResult = {
path: text,
size: js-uint
}
</pre>
</dd>
</dl>

<div algorithm="remote end steps for browsingContext.stopScreencast">
The [=remote end steps=] with |command parameters| are:

1. Let |screencast| be the value of the "<code>screencast</code>" field in |command
parameters|.

1. If [=screencast recordings map=] does not <a for=map>contain</a> |screencast|, return
[=error=] with [=error code=] [=no such screencast=].

1. Let |screencast recording| be [=screencast recordings map=][|screencast|].

1. If |screencast recording| contains [=screencast recording/writeError=], return
[=error=] with [=error code=] [=unknown error=].

1. Let |media recorder| be |screencast recording|'s [=screencast recording/mediaRecorder=].

1. Let |path| be |screencast recording|'s [=screencast recording/path=].

1. [=Call=]({{MediaRecorder/stop}}, |media recorder|).

1. Wait until |media recorder|'s {{MediaRecorder/stop}} [=fire an event|event fires=].

1. Let |size| be |screencast recording|'s [=screencast recording/size=].

1. [=map/Remove=] |screencast| from [=screencast recordings map=].

1. Return a new [=/map=] matching the <code>browsingContext.StopScreencastResult</code>
with the <code>path</code> field set to |path| and <code>size</code> field set to |size|.

</div>
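
The stop steps above could be approximated like this. It is a rough sketch under stated assumptions: `BidiError` and `FakeRecorder` are hypothetical helpers, the recording is a plain dict keyed by the struct's item names, and waiting for the recorder's stop event is reduced to a comment because the fake recorder stops synchronously.

```python
from typing import Any, Dict


class BidiError(Exception):
    """Hypothetical error type carrying a WebDriver BiDi error code."""

    def __init__(self, code: str, message: str = "") -> None:
        super().__init__(message or code)
        self.code = code


class FakeRecorder:
    """Hypothetical stand-in for MediaRecorder."""

    def __init__(self) -> None:
        self.state = "recording"

    def stop(self) -> None:
        self.state = "inactive"


def stop_screencast(recordings: Dict[str, dict], screencast: str) -> Dict[str, Any]:
    """Stop a recording and return its path and final size."""
    if screencast not in recordings:
        raise BidiError("no such screencast")
    recording = recordings[screencast]
    if recording.get("writeError") is not None:
        # As in the steps above, the entry stays in the map in this case.
        raise BidiError("unknown error", recording["writeError"])
    recording["mediaRecorder"].stop()
    # A real remote end would wait here for the recorder's "stop" event,
    # so that the final chunk is written before the size is read.
    size = recording["size"]
    del recordings[screencast]
    return {"path": recording["path"], "size": size}
```

Reading `size` only after the recorder has stopped matters because the last chunk is delivered as part of stopping.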

#### The browsingContext.traverseHistory Command #### {#command-browsingContext-traverseHistory}

The <dfn export for=commands>browsingContext.traverseHistory</dfn> command