reduce complexity of get strongest metadata by Uzlopak · Pull Request #151 · w3c/webappsec-subresource-integrity

Uzlopak · 2025-07-09T14:37:23Z

I want to bring this to the discussion. Instead of storing a struct and calling indexOf over and over, we keep it "simpler".

I know, i know... this is a highly opinionated suggestion.

Also the step 4.1 seems to be redundant. I assume that all implementations will parse the metadata and pass it directly to getStrongestMetadata(), to filter the relevant metadata. So we dont need to actually assert again if the solution does not allow to pass arbitrary data to getStrongestMetadata.

Preview | Diff

mozfreddyb · 2025-07-09T14:48:30Z

Can you expand on the actual issue you see and how the problem should be solved first?

As an aside: I just introduced new Pull Request & issue templates in #149, did you skip them or did they not show up for you?

Uzlopak · 2025-07-09T15:47:36Z

Great I wrote so much and then the battery was empty.

Uzlopak · 2025-07-09T15:59:01Z

Basically strongest is not the strongest metadata but actually the first strongest one. We actually need only the index of the currentAlgorithm. A null check is afaik faster in any programming language compared to checking the length of an array. Also what do you do in C++ where you have to initialize an array with a specific size?

ACtually we dont need to store currentAlgorithm, because we have it already in strongest.algo.

Because we know that getStrongestMetadata is called in doBytesMatchMetadataList, we dont need to assert again in getStrongestMetadata. Also why do we throw in getStrongestMetadata and in parseMetadata we just ignore it.
So we can avoid that assert.

If we determine the index of the algorithm, we can actually check if it is -1 and then throw the error or just skip.

So this is why i suggest these changes.

If it was up to me, I would just mutate the metadataList. like this in js.

/**
 * @param {MetadataList} metadataList
 * @returns {MetadataList} The strongest hash algorithm from the metadata list.
 */
function getStrongestMetadata (metadataList) {
  // 1. Let currentAlgorithmIndex be -1.
  let currentAlgorithmIndex = -1

  // 2. Let newAlgorithmIndex be -1.
  let newAlgorithmIndex = -1
  // 3. Let offset be 0.
  let offset = 0

  // 4. For each item in set:
  for (let i = 0; i < metadataList.length; i++) {
    // 1. Let item be the current item.
    const item = metadataList[i]

    // 2. Let newAlgorithmIndex be the index of item["alg"] in the
    // valid SRI hash algorithm token set.
    newAlgorithmIndex = validSRIHashAlgorithmTokenSet.indexOf(item.alg)

    // 3. If newAlgorithmIndex is -1, then throw an AssertionError
    if (newAlgorithmIndex === -1) {
      assert(false, 'Invalid SRI hash algorithm token')
    // 4. Otherwise, if newAlgorithmIndex is smaller than
    // currentAlgorithmIndex, then continue to the next item.
    } else if (newAlgorithmIndex < currentAlgorithmIndex) {
      continue

    // 5. Otherwise, if newAlgorithmIndex is greater than
    // currentAlgorithmIndex:
    } else if (newAlgorithmIndex > currentAlgorithmIndex) {
      // 1. Set offset to 0.
      offset = 0
      // 2. Set currentAlgorithmIndex to newAlgorithmIndex.
      currentAlgorithmIndex = newAlgorithmIndex
    }
      
    // 6. Assign item to result[offset] if 
    metadataList[offset++] = item
  }

  // 5. Shorten metadataList to the length of offset.
  metadataList.length = offset

  // 6. Return filtered metadataList.
  return metadataList
}

This would pass my tests and I guess, I would have small memory footprint.

If you are using c++ or so, then I guess, having the offset as length. Then maybe it would make sense not to have getStrongestMetadata as a separate function but actually inline it, so that i can use it in the next for loop of doBytesMatchMetadataList as length value, currently step 4. in doBytesMatchMetadataList.

annevk · 2025-07-09T16:03:24Z

I haven't evaluated this change, but I wanted to comment on some of things written here. Generally we don't write algorithms in specifications with specific memory or performance targets in mind. The main goal is clarity. You can read more about this at https://infra.spec.whatwg.org/#algorithm-conformance.

Uzlopak · 2025-07-09T17:07:23Z

@annevk
I added my ideal solution in javascript for this problem to show you that I am not trying to achieve a specific performance or memory target with my PR.

But: infra is basically targetting Javascript. And it states just in the notice box in the bottom of the paragraph, which you linked, that performance should be a field for competition. So the example for eating an orange is instructing us to peel the orange, it does not instruct us how we should peel it. Using a knife? Your fingernails?

And this specification is not giving any interpretation room. We must implement it like it is defined in this spec. E.g. It is defined that the order of elements in the valid SRI hash algorithm token set is meaningful. From a spec implementer perspective I would not expect such a strict instruction. If I am deviating from the spec, because I can be faster, than I cant claim that I follow the spec to the point.

E.g. You could also define getStrongestMetadata like this:

  1.  Determine strongest algorithm of |set|
  2.  Let |result| be the empty set.
  3.  For each |item| in |set|:
      1.  If |item| is not of strongest algorithm, [=iteration/continue=].
      2.  Otherwise, [=set/Append=] |item| to |result|. 
  4.  Return |result|.

How I determine which of the algorithms is the strongest in the set is up to me. E.g. I know that |item|["algo"] can only be sha256, sha386 and sha512. So I need only to compare the fourth character of the algorithm. "2" < "3" < "5". I dont need to check the indexOf of the "valid SRI hash algorithm token set". And if I already have determined that sha512 is there, I dont need to iterate the whole |set|.

You could define Determine strongest algorithm of |set| like this:

Determine strongest algorithm of |set| is a function which takes a medataList as parameter and returns the strongest algorithm. Currently three hash algorithms are considered valid, "sha256", "sha386" and "sha512". If the set contains a medata with algo being "sha512" it returns "sha512". Otherwise if the set contains a metadata with algo being "sha386" is returns "sha386", otherwise "sha256".

We just expect that it behaves like we defined it in the text. How you determine it, is up to you. You want to use the order of the valid SRI hash algorithm token set and call indexOf all the time? Do it. You have a more different approach? Well, why not.

Just my 200 cents. :D

mozfreddyb · 2025-07-10T11:30:45Z

I understand what you're saying, but I am not convinced this is a problem in the standard that needs to be addressed by changing the standard. Am I missing something, @Uzlopak?

mozfreddyb · 2025-07-10T11:31:04Z

@Uzlopak quick question

As an aside: I just introduced new Pull Request & issue templates in #149, did you skip them or did they not show up for you?

Uzlopak · 2025-08-12T22:10:43Z

@mozfreddyb
I saw now the template

Uzlopak added 2 commits July 9, 2025 16:32

reduce complexity of get strongest metadata

10778a6

easier

befb2bb

Uzlopak mentioned this pull request Aug 12, 2025

sri: better algorithms for sri nodejs/undici#4409

Closed

7 tasks

mozfreddyb mentioned this pull request Aug 18, 2025

feat: better bytesMatch algorithm #153

Closed

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

reduce complexity of get strongest metadata#151

reduce complexity of get strongest metadata#151
Uzlopak wants to merge 2 commits into
w3c:mainfrom
Uzlopak:improve-get-strongest

Uzlopak commented Jul 9, 2025 •

edited by pr-preview Bot

Loading

Uh oh!

mozfreddyb commented Jul 9, 2025

Uh oh!

Uzlopak commented Jul 9, 2025

Uh oh!

Uzlopak commented Jul 9, 2025

Uh oh!

annevk commented Jul 9, 2025

Uh oh!

Uzlopak commented Jul 9, 2025

Uh oh!

mozfreddyb commented Jul 10, 2025

Uh oh!

mozfreddyb commented Jul 10, 2025

Uh oh!

Uzlopak commented Aug 12, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

Uzlopak commented Jul 9, 2025 • edited by pr-preview Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mozfreddyb commented Jul 9, 2025

Uh oh!

Uzlopak commented Jul 9, 2025

Uh oh!

Uzlopak commented Jul 9, 2025

Uh oh!

annevk commented Jul 9, 2025

Uh oh!

Uzlopak commented Jul 9, 2025

Uh oh!

mozfreddyb commented Jul 10, 2025

Uh oh!

mozfreddyb commented Jul 10, 2025

Uh oh!

Uzlopak commented Aug 12, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uzlopak commented Jul 9, 2025 •

edited by pr-preview Bot

Loading