Skip to content

reduce complexity of get strongest metadata#151

Open
Uzlopak wants to merge 2 commits into
w3c:mainfrom
Uzlopak:improve-get-strongest
Open

reduce complexity of get strongest metadata#151
Uzlopak wants to merge 2 commits into
w3c:mainfrom
Uzlopak:improve-get-strongest

Conversation

@Uzlopak
Copy link
Copy Markdown
Contributor

@Uzlopak Uzlopak commented Jul 9, 2025

I want to bring this to the discussion. Instead of storing a struct and calling indexOf over and over, we keep it "simpler".

I know, i know... this is a highly opinionated suggestion.

Also the step 4.1 seems to be redundant. I assume that all implementations will parse the metadata and pass it directly to getStrongestMetadata(), to filter the relevant metadata. So we dont need to actually assert again if the solution does not allow to pass arbitrary data to getStrongestMetadata.


Preview | Diff

@mozfreddyb
Copy link
Copy Markdown
Collaborator

Can you expand on the actual issue you see and how the problem should be solved first?

As an aside: I just introduced new Pull Request & issue templates in #149, did you skip them or did they not show up for you?

@Uzlopak
Copy link
Copy Markdown
Contributor Author

Uzlopak commented Jul 9, 2025

Great I wrote so much and then the battery was empty.

@Uzlopak
Copy link
Copy Markdown
Contributor Author

Uzlopak commented Jul 9, 2025

Basically strongest is not the strongest metadata but actually the first strongest one. We actually need only the index of the currentAlgorithm. A null check is afaik faster in any programming language compared to checking the length of an array. Also what do you do in C++ where you have to initialize an array with a specific size?

ACtually we dont need to store currentAlgorithm, because we have it already in strongest.algo.

Because we know that getStrongestMetadata is called in doBytesMatchMetadataList, we dont need to assert again in getStrongestMetadata. Also why do we throw in getStrongestMetadata and in parseMetadata we just ignore it.
So we can avoid that assert.

If we determine the index of the algorithm, we can actually check if it is -1 and then throw the error or just skip.

So this is why i suggest these changes.

If it was up to me, I would just mutate the metadataList. like this in js.

/**
 * @param {MetadataList} metadataList
 * @returns {MetadataList} The strongest hash algorithm from the metadata list.
 */
function getStrongestMetadata (metadataList) {
  // 1. Let currentAlgorithmIndex be -1.
  let currentAlgorithmIndex = -1

  // 2. Let newAlgorithmIndex be -1.
  let newAlgorithmIndex = -1
  // 3. Let offset be 0.
  let offset = 0

  // 4. For each item in set:
  for (let i = 0; i < metadataList.length; i++) {
    // 1. Let item be the current item.
    const item = metadataList[i]

    // 2. Let newAlgorithmIndex be the index of item["alg"] in the
    // valid SRI hash algorithm token set.
    newAlgorithmIndex = validSRIHashAlgorithmTokenSet.indexOf(item.alg)

    // 3. If newAlgorithmIndex is -1, then throw an AssertionError
    if (newAlgorithmIndex === -1) {
      assert(false, 'Invalid SRI hash algorithm token')
    // 4. Otherwise, if newAlgorithmIndex is smaller than
    // currentAlgorithmIndex, then continue to the next item.
    } else if (newAlgorithmIndex < currentAlgorithmIndex) {
      continue

    // 5. Otherwise, if newAlgorithmIndex is greater than
    // currentAlgorithmIndex:
    } else if (newAlgorithmIndex > currentAlgorithmIndex) {
      // 1. Set offset to 0.
      offset = 0
      // 2. Set currentAlgorithmIndex to newAlgorithmIndex.
      currentAlgorithmIndex = newAlgorithmIndex
    }
      
    // 6. Assign item to result[offset] if 
    metadataList[offset++] = item
  }

  // 5. Shorten metadataList to the length of offset.
  metadataList.length = offset

  // 6. Return filtered metadataList.
  return metadataList
}

This would pass my tests and I guess, I would have small memory footprint.

If you are using c++ or so, then I guess, having the offset as length. Then maybe it would make sense not to have getStrongestMetadata as a separate function but actually inline it, so that i can use it in the next for loop of doBytesMatchMetadataList as length value, currently step 4. in doBytesMatchMetadataList.

@annevk
Copy link
Copy Markdown
Member

annevk commented Jul 9, 2025

I haven't evaluated this change, but I wanted to comment on some of things written here. Generally we don't write algorithms in specifications with specific memory or performance targets in mind. The main goal is clarity. You can read more about this at https://infra.spec.whatwg.org/#algorithm-conformance.

@Uzlopak
Copy link
Copy Markdown
Contributor Author

Uzlopak commented Jul 9, 2025

@annevk
I added my ideal solution in javascript for this problem to show you that I am not trying to achieve a specific performance or memory target with my PR.

But: infra is basically targetting Javascript. And it states just in the notice box in the bottom of the paragraph, which you linked, that performance should be a field for competition. So the example for eating an orange is instructing us to peel the orange, it does not instruct us how we should peel it. Using a knife? Your fingernails?

And this specification is not giving any interpretation room. We must implement it like it is defined in this spec. E.g. It is defined that the order of elements in the valid SRI hash algorithm token set is meaningful. From a spec implementer perspective I would not expect such a strict instruction. If I am deviating from the spec, because I can be faster, than I cant claim that I follow the spec to the point.

E.g. You could also define getStrongestMetadata like this:

  1.  Determine strongest algorithm of |set|
  2.  Let |result| be the empty set.
  3.  For each |item| in |set|:
      1.  If |item| is not of strongest algorithm, [=iteration/continue=].
      2.  Otherwise, [=set/Append=] |item| to |result|. 
  4.  Return |result|.

How I determine which of the algorithms is the strongest in the set is up to me. E.g. I know that |item|["algo"] can only be sha256, sha386 and sha512. So I need only to compare the fourth character of the algorithm. "2" < "3" < "5". I dont need to check the indexOf of the "valid SRI hash algorithm token set". And if I already have determined that sha512 is there, I dont need to iterate the whole |set|.

You could define Determine strongest algorithm of |set| like this:

Determine strongest algorithm of |set| is a function which takes a medataList as parameter and returns the strongest algorithm. Currently three hash algorithms are considered valid, "sha256", "sha386" and "sha512". If the set contains a medata with algo being "sha512" it returns "sha512". Otherwise if the set contains a metadata with algo being "sha386" is returns "sha386", otherwise "sha256".

We just expect that it behaves like we defined it in the text. How you determine it, is up to you. You want to use the order of the valid SRI hash algorithm token set and call indexOf all the time? Do it. You have a more different approach? Well, why not.

Just my 200 cents. :D

@mozfreddyb
Copy link
Copy Markdown
Collaborator

I understand what you're saying, but I am not convinced this is a problem in the standard that needs to be addressed by changing the standard. Am I missing something, @Uzlopak?

@mozfreddyb
Copy link
Copy Markdown
Collaborator

@Uzlopak quick question

As an aside: I just introduced new Pull Request & issue templates in #149, did you skip them or did they not show up for you?

@Uzlopak
Copy link
Copy Markdown
Contributor Author

Uzlopak commented Aug 12, 2025

@mozfreddyb
I saw now the template

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants