Validate spec_urls based on webref ids#23958
I'd say that the good news is that, in most cases, it seems that "something else is going on" ;) Main categories of errors I see:
And then there are actual broken links in BCD, such as https://tc39.es/proposal-temporal/#sec-get-temporal.zoneddatetime.prototype.timezone. There are also "outdated" URLs, such as https://tc39.es/ecma262/multipage/additional-ecmascript-features-for-web-browsers.html#sec-object.prototype.__defineGetter__, which redirects to https://tc39.es/ecma262/multipage/fundamental-objects.html#sec-object.prototype.__defineGetter__ that appears in Webref. There may be a few other error cases to dig into.
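To make the "broken" versus "outdated" distinction above concrete, here is a minimal sketch, not what the test in this PR actually does. `FragmentIndex` and `classify` are hypothetical names, the index is assumed to map each spec page URL to the fragment ids known on that page, and "same origin" is used as a rough stand-in for "same spec".

```typescript
// Hypothetical index: spec page URL (no fragment) -> fragment ids known on that page.
type FragmentIndex = Map<string, Set<string>>;

type Verdict = "ok" | "outdated" | "broken";

// Classify a spec_url against the index (illustrative only).
function classify(specUrl: string, index: FragmentIndex): Verdict {
  const url = new URL(specUrl);
  const fragment = decodeURIComponent(url.hash.slice(1));
  url.hash = ""; // drop the fragment to get the page URL
  const page = url.href;

  // The fragment is known on the page that the URL points to.
  if (index.get(page)?.has(fragment)) return "ok";

  // The fragment is known on another page of the same origin: the link
  // probably only works thanks to a redirect (assumption: origin ~ spec).
  let movedElsewhere = false;
  index.forEach((ids, otherPage) => {
    if (new URL(otherPage).origin === url.origin && ids.has(fragment)) {
      movedElsewhere = true;
    }
  });
  return movedElsewhere ? "outdated" : "broken";
}
```

With an index that only knows `sec-object.prototype.__defineGetter__` on the `fundamental-objects.html` page, the `additional-ecmascript-features-for-web-browsers.html` URL above would come out as `"outdated"` and the Temporal URL as `"broken"`.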
Fantastique François!! 🎉 What I see now:
Something I would like you to take a look at:
As far as I can tell, all of them are examples of what I called outdated links: they work, but that's because the HTML spec has logic in place to redirect past fragments to their new page. Each time, the content referenced by the link moved to another page of the HTML spec and would better be targeted using the new fragment to avoid a redirect. For example, clicking on https://html.spec.whatwg.org/multipage/browsing-the-web.html#dom-beforeunloadevent-returnvalue makes you load the
Via mdn/browser-compat-data#23958 (comment)
Any specific lookup rule requires shipping web-specs within the package, which is what this update does. To make sure that the list matches the data, the list of specs is built from the crawl index instead.
The `lookup()` function may now take an optional object with lookup options as its second argument. The following options are supported:
- `standing`: only keep results from specs that have the specific standing. Standing should be `"good"`, `"pending"` or `"discontinued"` (most discontinued specs aren't crawled, but some still are for historical reasons).
- `version`: only keep results from the nightly or release version of the specs. Must be `"nightly"` or `"release"`.
- `series`: accept series URLs. Internally, the URL will only match if the fragment exists in the spec that is known to be the current spec in the series.
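As a rough illustration of how the `standing` and `version` options narrow results, here is a sketch under stated assumptions. It is not the package's `lookup()` implementation: `IndexedId` and `filterByOptions` are hypothetical names, and the record shape is invented for the example. The `series` option is left out because its value type is not stated above.

```typescript
type Standing = "good" | "pending" | "discontinued";
type Version = "nightly" | "release";

// Mirrors the two options described above; `series` is intentionally omitted.
interface LookupOptions {
  standing?: Standing;
  version?: Version;
}

// Hypothetical shape of one result before filtering.
interface IndexedId {
  url: string;
  standing: Standing;
  version: Version;
}

// Keep only the results that satisfy every option that was provided.
function filterByOptions(
  results: IndexedId[],
  options: LookupOptions = {}
): IndexedId[] {
  return results.filter(
    (r) =>
      (options.standing === undefined || r.standing === options.standing) &&
      (options.version === undefined || r.version === options.version)
  );
}
```

An option that is not provided does not restrict anything, so calling `filterByOptions(results)` with no options returns every result unchanged.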
tidoust left a comment
Testing locally with the newly released v1.2.0 and the suggested inline update (and re-including CSS specs), I end up with 253 errors. They seem to fall into different buckets:
- URLs that indeed need fixing.
- URLs that use "old" IDs that the spec still has (typically, in an empty span) but that are not captured in Webref. Transitioning away from old IDs would seem a good thing to do from a BCD perspective, though.
- A few SVG URLs for terms, because the spec does not yet follow the appropriate conventions.
- A few URLs that target value definitions in CSS that are also references to a construct defined in yet another CSS spec. Webref does not record them on purpose to avoid creating duplicate definitions in the xref database. I can understand why BCD might still want to link to them, though.
- A few RFC URLs, because there are new redirects in place on www.rfc-editor.org since... yesterday, apparently.
This pull request has merge conflicts that must be resolved before it can be merged.
a8f927d to 5261425
With #29809 applied, I'm down to 76 issues. Lots of CSS specs that I still need to look into. Plus some missing dfns/headings from RFCs (mostly rfc9842).
Quick note that, for this one, browser-specs continues to prefer the
1d3fc19 to 9aa29d9
30 more to go.
OK, I think we're almost at a stage where this PR will pass tests. Final steps:
Draft testing PR for @tidoust :)
Based on w3c/webref#1198 (comment), I wrote a quick test to see if webref ids could be used to (deeply) validate BCD's spec_urls (that is, we want to check that the fragment ids are valid as well, not just the spec hosts).
It spits out a lot of errors, and I would be interested to hear if BCD should be using different fragment ids, or if webref is missing these fragment ids, or if something else is going on. Please see the CI failure for the results.
(This is a draft PR that removes our dependency on web-specs and instead fetches raw webref JSON files. We might not want to fetch the data this way, so consider this PR just a test for now.)
Fixes #29065.
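For readers who want a feel for the first half of such a test, here is a minimal sketch of gathering every `spec_url` from a BCD-like tree before checking its fragment. It is an illustration, not the code in this PR: `collectSpecUrls` is a hypothetical helper, and it assumes that `spec_url` sits under a `__compat` key and is either a string or an array of strings.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

interface FoundUrl {
  path: string; // dotted feature path, e.g. "api.Foo.bar"
  url: string;
}

// Walk a BCD-like tree and return each spec_url with its feature path.
function collectSpecUrls(node: Json, path: string[] = []): FoundUrl[] {
  if (node === null || typeof node !== "object" || Array.isArray(node)) {
    return [];
  }
  const found: FoundUrl[] = [];
  for (const [key, value] of Object.entries(node)) {
    if (
      key === "__compat" &&
      value !== null &&
      typeof value === "object" &&
      !Array.isArray(value)
    ) {
      // Assumption: spec_url is a string or an array of strings.
      const specUrl = value["spec_url"];
      const urls =
        specUrl === undefined ? [] : Array.isArray(specUrl) ? specUrl : [specUrl];
      for (const url of urls) {
        if (typeof url === "string") found.push({ path: path.join("."), url });
      }
    } else {
      // Any other key is treated as a nested feature.
      found.push(...collectSpecUrls(value, [...path, key]));
    }
  }
  return found;
}
```

Each returned URL can then be split into page and fragment and compared against the ids that webref knows for that page, with the feature path kept around for error reporting.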