Publish list of known fragment identifiers #1198

foolip · 2024-03-27T17:08:55Z

This would be helpful for web-platform-dx/web-features#84, to be able to create a spec URL validator that checks if a URL like https://w3c.github.io/webrtc-pc/#dom-datachannel-binarytype is a good spec URL.

A similar problem is solved in Bikeshed by downloading the data directly from GitHub:
https://github.com/speced/bikeshed/blob/584813e6380533a19c6656594c810bf974854e68/bikeshed/update/updateCrossRefs.py#L236

For something that should go into a CI check, that's not good though, since the build could break at any time.

tidoust · 2024-03-28T10:50:11Z

> This would be helpful for web-platform-dx/web-features#84, to be able to create a spec URL validator that checks if a URL like https://w3c.github.io/webrtc-pc/#dom-datachannel-binarytype is a good spec URL.

The list of fragment identifiers appears in the ids extracts. For example, the URL you give as an example appears in the WebRTC ids extract.

> For something that should go into a CI check, that's not good though, since the build could break at any time.

We could create an NPM package but I'm wondering how that would solve "could break at any time". Could you clarify?

If we go ahead with a package, I wonder about the frequency of releases and about guarantees. We don't do any data curation on fragment identifiers (and if we could avoid doing additional curation, I think we wouldn't mind ;)). We could automate the publication of the package but the list of fragment identifiers changes frequently. Should we publish a package one or more times per day? Or should we restrict publications to, say, once per week?

foolip · 2024-03-28T12:22:08Z

These are good questions. The important part to avoid sudden breakage of CI is that the IDs are pinned in some way. An NPM package makes that easy and lets us rely on Dependabot for updates. But it can also be done by pointing to a specific webref commit, perhaps using it as a submodule.

The release cadence is a good question. I guess roughly weekly would be OK. And I agree that it would be fantastic to not have to review changes to identifiers at all or make many guarantees, just expose the same stuff that Bikeshed uses.

This isn't urgent at all BTW, it's a nice-to-have.

tidoust · 2024-03-28T13:12:31Z

It suddenly occurs to me that looking at the full list of fragment identifiers is probably not a good idea in any case: the "pinning" mechanism you describe is also the sort of stability that specs need when they reference some other spec. This is what led to exported definitions. Ideally, features would only link to exported definitions... and likely section headings. In any case, links to internal definitions and other IDs should be discouraged.

The data's already in Webref too, in dfns and headings extracts.

We have tools in place that detect broken links (w3c/strudy) from Webref data and report them automatically. We could also detect changes earlier on in Webref. In the end, we could perhaps create a package that contains stable fragment identifiers (exported definitions and section ids), and use some semver logic to report breaking changes:

patch increment: new fragment identifiers added
minor increment: some fragment identifiers disappeared
major increment: major data structure change
(or major increment for any fragment identifier change)
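For what it's worth, the first variant of that rule could be sketched roughly as follows in Python. This is only an illustration of the proposal, not existing Webref tooling: `bump_kind`, `next_version` and the `structure_changed` flag are hypothetical names, and the inputs are assumed to be plain sets of fragment URLs taken from two successive snapshots.

```python
from typing import Literal

Bump = Literal["none", "patch", "minor", "major"]


def bump_kind(old_ids: set[str], new_ids: set[str],
              structure_changed: bool = False) -> Bump:
    """Classify a data update following the rule proposed in this thread."""
    if structure_changed:
        # Major data structure change.
        return "major"
    if old_ids - new_ids:
        # Some fragment identifiers disappeared.
        return "minor"
    if new_ids - old_ids:
        # Only new fragment identifiers were added.
        return "patch"
    return "none"


def next_version(version: str, bump: Bump) -> str:
    """Apply the bump to a plain "MAJOR.MINOR.PATCH" version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    if bump == "patch":
        return f"{major}.{minor}.{patch + 1}"
    return version
```

The alternative in the last line of the list (a major increment for any fragment identifier change) would amount to returning "major" whenever the two sets differ.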

foolip · 2024-03-28T16:32:56Z

Good point about not all IDs being good feature links, I didn't even consider other linkable things like examples and whatnot.

I strongly suspect that doing this will reveal lots of things that aren't exported but should be, and that it will be a bit of a slog.

But I like the approach!

Elchi3 · 2024-07-18T14:50:17Z

I was just looking to see whether something like this exists! :)

My use case: In mdn/browser-compat-data, we'd like to remove status.standards_track (mdn/browser-compat-data#1531) and only refer to spec_urls. I think in order for this to happen it would be good if there was a BCD linter that checked if all of our spec_urls are actually valid, including their fragment id.

tidoust · 2024-07-25T17:36:31Z

> My use case: In mdn/browser-compat-data, we'd like to remove status.standards_track (mdn/browser-compat-data#1531) and only refer to spec_urls. I think in order for this to happen it would be good if there was a BCD linter that checked if all of our spec_urls are actually valid, including their fragment id.

Are you looking for an actual NPM package? Or are you more looking for a way to validate URLs with fragments, which could live in BCD?

I'm asking the question because, per the discussion above, the data that is needed is already in Webref. You may validate URLs with fragments in one of two ways:

If you're not too worried about stability of the IDs and just want to know whether the ID exists, you could look at the ids extracts.
If you'd like to enforce some sort of stability, you might want to restrict to terms that specs actually export. For that, you could look at the dfns extracts (possibly filtering on definitions that have an "access": "public" property), and at the headings extracts for links to sections. The dfns and ids extracts contain additional information about the fragment, which might perhaps prove useful later on in MDN as well, e.g., to label the links?
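To make the two options more concrete, here is a minimal Python sketch. It assumes that an ids extract has an "ids" array of absolute URLs, and that dfns and headings extracts have "dfns" and "headings" arrays whose entries carry an "href". The helper names and the sample data are made up for illustration, so please check them against the actual extracts.

```python
def existing_ids(ids_extract: dict) -> set[str]:
    """Option 1: every fragment URL that exists in the spec."""
    return set(ids_extract.get("ids", []))


def stable_ids(dfns_extract: dict, headings_extract: dict) -> set[str]:
    """Option 2: only exported definitions and section headings."""
    exported = {
        dfn["href"]
        for dfn in dfns_extract.get("dfns", [])
        if dfn.get("access") == "public"
    }
    sections = {h["href"] for h in headings_extract.get("headings", [])}
    return exported | sections


# Illustrative data only, not copied from the real WebRTC extracts.
SAMPLE_IDS = {"ids": [
    "https://w3c.github.io/webrtc-pc/#dom-datachannel-binarytype",
    "https://w3c.github.io/webrtc-pc/#example-1",
]}
SAMPLE_DFNS = {"dfns": [
    {"href": "https://w3c.github.io/webrtc-pc/#dom-datachannel-binarytype",
     "access": "public"},
    {"href": "https://w3c.github.io/webrtc-pc/#internal-term",
     "access": "private"},
]}
SAMPLE_HEADINGS = {"headings": [
    {"href": "https://w3c.github.io/webrtc-pc/#intro"},
]}

loose = existing_ids(SAMPLE_IDS)
strict = stable_ids(SAMPLE_DFNS, SAMPLE_HEADINGS)
```

With this sample data, the example anchor is in `loose` but not in `strict`, which is the kind of link that the stricter option would flag.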

An NPM package would provide some pinning ability, but a side effect of that pinning is that it also means the data will often be somewhat outdated: the dfns data gets updated every 6 hours but it does not make a lot of sense to publish an NPM package that frequently. The other NPM packages for Webref also contain somewhat outdated data, of course, but the content is the result of data curation and manual review, performed once in a while.

For the problem at hand, there's no good reason to choose a particular commit to pin the data. Perhaps what we need is an NPM package that only contains a validateUrl function that retrieves the latest data from Webref by default and can take a commit ID as parameter to retrieve Webref data at that particular commit if you need the function to return a stable result?
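A rough sketch of what such a function could look like, written in Python for brevity rather than as the NPM package itself. Everything here is an assumption to be checked: the raw.githubusercontent.com path layout, the "main" default branch, the "ids" key in the extract, and the names `extract_url`, `fetch_json` and `validate_url`, none of which exist in Webref today.

```python
import json
from typing import Callable, Optional
from urllib.request import urlopen

# Assumed location of raw Webref data on GitHub.
RAW_BASE = "https://raw.githubusercontent.com/w3c/webref"


def extract_url(shortname: str, commit: Optional[str] = None) -> str:
    """Build the URL of a spec's ids extract, pinned to a commit if given."""
    ref = commit or "main"  # assumed default branch for the latest data
    return f"{RAW_BASE}/{ref}/ed/ids/{shortname}.json"


def fetch_json(url: str) -> dict:
    """Default loader: download and parse a JSON document."""
    with urlopen(url) as response:
        return json.load(response)


def validate_url(url: str, shortname: str, commit: Optional[str] = None,
                 load: Callable[[str], dict] = fetch_json) -> bool:
    """Return True if `url` is listed in the ids extract of `shortname`.

    Without `commit`, the latest data is used, so the result may change
    over time. With `commit`, the result is stable for that snapshot.
    """
    extract = load(extract_url(shortname, commit))
    return url in extract.get("ids", [])
```

The `load` parameter is there so that a CI job can plug in a cached or vendored copy of the data instead of hitting the network on every run.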

foolip mentioned this issue Mar 27, 2024

Validate any valid spec URL web-platform-dx/web-features#84

Open

3 tasks

Elchi3 mentioned this issue Jul 26, 2024

Validate spec_urls based on webref ids mdn/browser-compat-data#23958

Draft

tidoust mentioned this issue Sep 18, 2024

Consistent guidelines spec links, especially in CSS web-platform-dx/web-features#1785

Open
