Add commands for screencasting#1069
Conversation
|
|
||
| 1. Let |media recorder options| be a struct with <code>mimeType</code> |mime type|. | ||
|
|
||
| 1. Let |media recorder| be a new {{MediaRecorder}} with {{MediaRecorder/stream}} |media stream| and |
There was a problem hiding this comment.
This feels somewhat sketchy in that it is invoking a lot of JS machinery without being clear about which realm it's operating it.
There was a problem hiding this comment.
Ok, I've added the logic to create MediaRecorder and Blob in the navigable's realm. But since I couldn't really find a similar case in the BiDi spec, I had to improvise with the wording :) So feel free to correct me
|
Command versus events... From client side perspective it doesn't make sense to just So instead of (pseudo programming language): We can do the following: For a client |
d0b4832 to
b0d2460
Compare
|
@nvborisenko, the current idea for the screencast API is to start with commands to record the video and save it to a file (sorry, it was probably not very clear, I've updated the PR description now). And for this interface, we need start and stop commands. The next step would be to introduce the streaming option. In that case we probably will just pass a different |
Co-authored-by: Maksim Sadym <69349599+sadym-chromium@users.noreply.github.com>
juliandescottes
left a comment
There was a problem hiding this comment.
I can only provide limited feedback. From my point of view it sounds reasonable.
Co-authored-by: Julian Descottes <jdescottes@mozilla.com>
|
For the visibility, in the PR for the |
| } | ||
|
|
||
| browsingContext.MediaStreamOptions = { | ||
| ? video: bool / browsingContext.MediaTrackConstraints .default true; |
There was a problem hiding this comment.
It is hard for strongly typed languages. boolean or object? I propose to avoid it early. Kind of #1039, but in this case it is even harder.
There was a problem hiding this comment.
This is the type structure from https://w3c.github.io/mediacapture-main/#dom-mediastreamconstraints, and we use this spec as a base for these APis. I understand that it doesn't look very elegant, but I'm not sure why it's difficult to implement for strongly typed languages. We have the same type definitions for, e.g., getDisplayMedia in our C++ code. For #1039 I can see that some languages don't have the mechanism to distinguish between null and undefined, but as far as I know, usually the languages have some way to specify the union type.
Is there a reason why we only want to allow recording of a screencast for a top-level browsing context? We do not have this restriction for capturing a screenshot and I can see use cases when consumers only want to record a frame. |
There is no support for recording a video of a frame in |
It is unlikely that Chromium will support nested browsing contexts' screencasts. I would make a decision based on the following:
|
|
All comments should be addressed. The new link for the external hook PR is w3c/mediacapture-viewport#34 (also updated in the PR description). |
jan-ivar
left a comment
There was a problem hiding this comment.
Thanks for working on this!
I have some questions to aid review in w3c/mediacapture-viewport#34.
| 1. Let <var>realm</var> be the result of [=trying=] to [=get a realm from a navigable=] | ||
| with |navigable id| and null. | ||
|
|
||
| 1. Let |promise| be the result of [=trying=] to [=capture a browsing context viewport=] with |
There was a problem hiding this comment.
It's not just the promise but the returned MediaStream object and its embedded MediaStreamTrack objects for video and audio we need to consider. All four are JS objects given no realm to live in.
From handle an incoming message it seems remote end steps all run in parallel:
So there's no JS set up here yet AFAIU. How are you planning on using these objects? From JS or c++?
| 1. Let |realm| be result of [=trying=] to [=get or create a sandbox realm=] given | ||
| |screencast| and |navigable|. |
There was a problem hiding this comment.
Is this the destination realm for the promise, its stream, and tracks? If so, perhaps we can pass it into the capture algorithm earlier?
| 1. Let |media recorder| be a new {{MediaRecorder}} in |realm| | ||
| with {{MediaRecorder/stream}} |media stream| and | ||
| {{MediaRecorderOptions}} |media recorder options|. | ||
|
|
||
| 1. Let |recording| be a new [=screencast recording=] with | ||
| [=screencast recording/mediaRecorder=] |media recorder|, | ||
| [=screencast recording/path=] |path|, | ||
| [=screencast recording/size=] 0. | ||
|
|
||
| 1. Whenever the implementation is going to [=fire a blob event=] named {{MediaRecorder/dataavailable}} | ||
| at |media recorder| with |blob|, run the following steps: |
There was a problem hiding this comment.
Is there a precedent for this approach in some other command?
I'm not super familiar with WebDriver-BiDi and sandbox realms, but it's surprising to me to see JS-facing APIs used in this way in parallel. I'd normally associate this with data races. These JS objects are being created on the navigable being captured? What thread do their constructors run on?
Is this the long-term plan?
- If yes, passing in the realm to the capture algorithm earlier might work
- If no, might adding automation steps to https://w3c.github.io/mediacapture-record be another approach to ultimately try to pass around non-JS objects here?
There was a problem hiding this comment.
Is there a precedent for this approach in some other command?
I think the answer is "not really".
Historically WebDriver has managed to call into algorithms that are written entirely in terms of abstract spec objects. In browser terms this is roughly equivalent to native code (i.e. C++ or similar).
However as we add more modern platform features we're more frequently running into cases where the spec itself assumes that it's being called from executing script, and is written in terms of operations on JS objects. That obviously makes sense if the only entry point is via scripting interfaces defined in WebIDL, but a WebDriver endpoint is not that.
A current idea I have is to create a agent/environment settings object/realm that's defined in the WebDriver spec and is only used for running spec-internal code, somewhat similar to parent process JS in Firefox. I asked about this idea on matrix, but so far no one gave any feedback on whether that was a silly idea…
There was a problem hiding this comment.
So the plan is to have two options for this command:
- save the MediaStream into the file (what is this PR about now);
- stream the MediaStream over the websocket to the client (after we introduce the general streaming support with Add io module for streams #1061).
So I think we still need to get out of me capture a browsing context viewport at least the MediaStream. I guess we could try to resolve the promise on the capture a browsing context viewport side, but I'm not sure if it would make it easier.
…ding in the specification execution environment.
| 1. Let |media recorder| be a new {{MediaRecorder}} in |realm| | ||
| with {{MediaRecorder/stream}} |media stream| and | ||
| {{MediaRecorderOptions}} |media recorder options|. | ||
|
|
||
| 1. Let |recording| be a new [=screencast recording=] with | ||
| [=screencast recording/mediaRecorder=] |media recorder|, | ||
| [=screencast recording/path=] |path|, | ||
| [=screencast recording/size=] 0. | ||
|
|
||
| 1. Whenever the implementation is going to [=fire a blob event=] named {{MediaRecorder/dataavailable}} | ||
| at |media recorder| with |blob|, run the following steps: |
There was a problem hiding this comment.
Is there a precedent for this approach in some other command?
I think the answer is "not really".
Historically WebDriver has managed to call into algorithms that are written entirely in terms of abstract spec objects. In browser terms this is roughly equivalent to native code (i.e. C++ or similar).
However as we add more modern platform features we're more frequently running into cases where the spec itself assumes that it's being called from executing script, and is written in terms of operations on JS objects. That obviously makes sense if the only entry point is via scripting interfaces defined in WebIDL, but a WebDriver endpoint is not that.
A current idea I have is to create a agent/environment settings object/realm that's defined in the WebDriver spec and is only used for running spec-internal code, somewhat similar to parent process JS in Firefox. I asked about this idea on matrix, but so far no one gave any feedback on whether that was a silly idea…
…aRecorder and add the issue.
|
|
||
| Issue: The specification execution environment has to be better defined. | ||
|
|
||
| 1. [=Prepare to run script=] with |environment settings|. |
There was a problem hiding this comment.
Can we specify it without requiring the implementations to run scripts?
There was a problem hiding this comment.
The problem we have is that we want to call existing platform APIs that assume in their specification that they're being called from a WebIDL interface which can only be invoked when running script. So they do things like use promises, whose sematics are undefined outside of the context of JS execution. If we don't use that model we have to reimplement the entire API with different semantics
My plan is to have a "specification agent" which allows us to run script in a way that's invisible to the content process, and is basically only a specification formalism. Implementations will be free to not actually run script as long as the observable behaviour is correct.
| {{MediaRecorderOptions}} |media recorder options|. | ||
|
|
||
| 1. Let |recording| be a new [=screencast recording=] with | ||
| [=screencast recording/mediaRecorder=] |media recorder|, |
There was a problem hiding this comment.
Can we save objects like |media recorder| across navigations? I thought media recorder would be bound to a specific realm?
There was a problem hiding this comment.
@jgraham @lutien do you think we can rewrite this spec proposal without using promises / IDL / requiring a JS execution context? it looks like in the media viewport the relevant portion of the spec is https://w3c.github.io/mediacapture-viewport/#dom-mediadevices-getviewportmedia (step 10.3)? And then it just becomes a MediaStream from IDL? Perhaps we could link to the constraints defined by the spec and eventually make a WebDriver BiDi stream out of it re-using the stream spec algorithms.
The provided media MUST include precisely one video track, which MUST be a live-capture of the [browser](https://www.w3.org/TR/screen-capture/#dfn-browser) [display surface](https://www.w3.org/TR/screen-capture/#dfn-display-surface) of the [relevant global object](https://html.spec.whatwg.org/multipage/webappapis.html#concept-relevant-global)'s [associated Document](https://html.spec.whatwg.org/multipage/nav-history-apis.html#concept-document-window)'s [top-level browsing context](https://html.spec.whatwg.org/multipage/document-sequences.html#top-level-browsing-context)'s [viewport](https://html.spec.whatwg.org/multipage/#viewport).
The provided media MUST include at most one audio track, which, if provided, MUST be the combined audio produced by the sum of documents that consist of the [relevant global object](https://html.spec.whatwg.org/multipage/webappapis.html#concept-relevant-global)'s [associated Document](https://html.spec.whatwg.org/multipage/nav-history-apis.html#concept-document-window)'s [top-level browsing context](https://html.spec.whatwg.org/multipage/document-sequences.html#top-level-browsing-context)'s [active document](https://html.spec.whatwg.org/multipage/document-sequences.html#nav-document), and all [active documents](https://html.spec.whatwg.org/multipage/document-sequences.html#nav-document) in nested [browsing context](https://html.spec.whatwg.org/multipage/document-sequences.html#browsing-context)s of the [relevant global object](https://html.spec.whatwg.org/multipage/webappapis.html#concept-relevant-global)'s [associated Document](https://html.spec.whatwg.org/multipage/nav-history-apis.html#concept-document-window)'s [top-level browsing context](https://html.spec.whatwg.org/multipage/document-sequences.html#top-level-browsing-context). This audio track MUST NOT be included if audio was not specified in requestedMediaTypes, or if it was specified as false.
The source of a [MediaStreamTrack](https://www.w3.org/TR/mediacapture-streams/#dom-mediastreamtrack) MUST NOT change.
If the result of the request is "[granted](https://www.w3.org/TR/permissions/#dom-permissionstate-granted)", then for each device that is sourcing the provided media, using a stable and private id for the device, deviceId, set [[devicesLiveMap]][deviceId] to true, if it isn’t already true, and set the [[devicesAccessibleMap]][deviceId] to true, if it isn’t already true.
The user agent MUST NOT store a "[granted](https://www.w3.org/TR/permissions/#dom-permissionstate-granted)" permission entry.
There was a problem hiding this comment.
It looks like MediaStream is supposed to live in some JS execution context as well (see comment here: #1069 (comment)). I've created a draft PR with an attempt to sketch something without requiring JS: #1113. I've used some working from https://w3c.github.io/mediacapture-viewport/, which we could potentially share, but mostly it's just abstract describing, since I couldn't really find the way to share more with other specs.
There was a problem hiding this comment.
Thanks! I guess if we had #1061 (comment) specified without JS execution realms we could just use WebDriver BiDi's own stream instead of MediaStream?
There was a problem hiding this comment.
From what I've understood (tagging @jgraham to correct me/clarify), it still might be an issue for working with the streams, because the stream algorithms are really determined that they are running JS.
There was a problem hiding this comment.
Do we have a conclusion here? I think it would be nice to avoid JS for specifying this and I think maybe we can specify our own stream behavior that also does not rely on running JS?
There was a problem hiding this comment.
So to unblock things, we decided to focus on #1113, but only for saving screencasting to the file for now. And then come back to the streaming option after we're more certain about the generic streaming API. We were planning to talk about it on Wednesday, but we can discuss it now if you have any feedback already.
There was a problem hiding this comment.
That sounds good to me, should I review https://github.com/w3c/webdriver-bidi/pull/1113/changes?
There was a problem hiding this comment.
Sure (I thought it would be good to review internally first, to save you the trouble. But if you have time, I guess there is no point in delaying 🙂 ).
|
The Browser Testing and Tools Working Group just discussed The full IRC log of that discussion<AutomatedTester> Topic: Reminder to review the PR for screencasting API<AutomatedTester> github: https://github.com//pull/1069 <orkon> q+ <AutomatedTester> jdescottes: This is a reminder to review the PR from sasha. Sasha is on PTO at the moment, please raise questions and we will answer to the best of our ability <AutomatedTester> ack next <jgraham> q+ <AutomatedTester> orkon: I've been trying to prototype this and there web content API has been quite hard to use and there is nothing about stitching things so would need to be invited <AutomatedTester> ... but building it off web apis isn't doable <AutomatedTester> ... are there restrictions to what format needs to be output or is that up to the implementation <AutomatedTester> ack next <AutomatedTester> jgraham: I don't know what web rtc has but that format is likely the best <AutomatedTester> ... the other question has... <AutomatedTester> ... with web apis there is an idea that you are calling it from a IDL context which assumes javascript etc <AutomatedTester> ... we have a similar issue with the streams API <AutomatedTester> ... but when I suggested using something not using JS on streams the editor told me that wouldn't work as it would need a way of resolving promises <AutomatedTester> ... we would likely need to have a "specification realm" that is not a content realm and is something else <AutomatedTester> ... and it spec "fiction". It's like running script but not exactly like running a script <gives an example> <AutomatedTester> ... in Firefox this is easier and we know that doesn't mean much as in other browsers it could be a C++ type call <AutomatedTester> ... that's my best idea but if we don't do that then we are in a difficult position <orkon> q+ <AutomatedTester> ... it then becomes a very handwavy way doing "capture this and then output that with web rtc" <AutomatedTester> ... and it becomes unclear what realms this should run in <AutomatedTester> q? <AutomatedTester> ack next <AutomatedTester> orkon: thanks for the explaination. THis seems to work well for streams but I am not sure that works for screencast <jgraham> q+ <AutomatedTester> ... but I might not have fully read those spec. The main thing is that we don't depend of the side effects of javascript execution other than the algorithm itself <AutomatedTester> ack next <AutomatedTester> jgraham: I need to look to see if this would work. parts of it works well but as a whole maybe not. THere might be a possiblity is that this is very complicated. We also have a way in the future that we can stream the browser not the file what we are currently thinging about <AutomatedTester> q? |
Co-authored-by: François Daoust <fd@tidoust.net>
cb1f040 to
7902a5a
Compare
| } | ||
|
|
||
| browsingContext.MediaTrackConstraints = { | ||
| ? width: js-uint, |
There was a problem hiding this comment.
how are width / height parameters used by the underlying spec? Is it something to scale the output to or is it mix/max constraint for the stream selection? I do not seem to find mentions of it (same for timeslice).
There was a problem hiding this comment.
width and height are supposed to be used for scaling. You can see more info here: https://www.w3.org/TR/screen-capture/#constrainable-properties. timeslice is used as a parameter for MediaRecorder.start(timeslice) (spec: https://w3c.github.io/mediacapture-record/#dom-mediarecorder-start)
| context: browsingContext.BrowsingContext, | ||
| mimeType: text, | ||
| ? streamOptions: browsingContext.MediaStreamOptions, | ||
| ? timeslice: js-uint .default 1000 |
There was a problem hiding this comment.
to confirm: the timeslice only controls the size of the chunks but does not affect the framerate? Do we need to expose timeslice in this case? (we just write to file right and with the stream API the client would probably determine the size of the chunks).
There was a problem hiding this comment.
Yes, that's right. It's only required to define when to write to a file (if we want to write while recording), and I agree we don't really have to expose it.
There was a problem hiding this comment.
Since we do not have the dedicated event for this, I think clients would have difficulty to extract chunks of the right timeslice duration from the file. I would suggest we remove timeslice and leave it to be implementation -defined and instead we add a framerate that would be very useful for clients?
There was a problem hiding this comment.
Sounds good. I've updated the PR to make timeslice an implementation-defined and added frameRate to the video constraints.
|
|
||
| browsingContext.StartScreencastParameters = { | ||
| context: browsingContext.BrowsingContext, | ||
| mimeType: text, |
There was a problem hiding this comment.
wdyt to make mimeType optional to mean implementation-defined default mime type?
There was a problem hiding this comment.
I think that should be fine. Updated the PR ✅
e327cc0 to
b8241ec
Compare
Closes #636.
Introduce the interface to record a video of the screen of the provided navigable and save it to a file.
The PR for the hook in the Screen Capture specification: w3c/mediacapture-screen-share#328.The PR for the hook in the Viewport Capture specification: w3c/mediacapture-viewport#34.
Preview | Diff