reduce complexity of get strongest metadata#151
Conversation
|
Can you expand on the actual issue you see and how the problem should be solved first? As an aside: I just introduced new Pull Request & issue templates in #149, did you skip them or did they not show up for you? |
|
Great I wrote so much and then the battery was empty. |
|
Basically strongest is not the strongest metadata but actually the first strongest one. We actually need only the index of the currentAlgorithm. A null check is afaik faster in any programming language compared to checking the length of an array. Also what do you do in C++ where you have to initialize an array with a specific size? ACtually we dont need to store currentAlgorithm, because we have it already in strongest.algo. Because we know that getStrongestMetadata is called in doBytesMatchMetadataList, we dont need to assert again in getStrongestMetadata. Also why do we throw in getStrongestMetadata and in parseMetadata we just ignore it. If we determine the index of the algorithm, we can actually check if it is -1 and then throw the error or just skip. So this is why i suggest these changes. If it was up to me, I would just mutate the metadataList. like this in js. /**
* @param {MetadataList} metadataList
* @returns {MetadataList} The strongest hash algorithm from the metadata list.
*/
function getStrongestMetadata (metadataList) {
// 1. Let currentAlgorithmIndex be -1.
let currentAlgorithmIndex = -1
// 2. Let newAlgorithmIndex be -1.
let newAlgorithmIndex = -1
// 3. Let offset be 0.
let offset = 0
// 4. For each item in set:
for (let i = 0; i < metadataList.length; i++) {
// 1. Let item be the current item.
const item = metadataList[i]
// 2. Let newAlgorithmIndex be the index of item["alg"] in the
// valid SRI hash algorithm token set.
newAlgorithmIndex = validSRIHashAlgorithmTokenSet.indexOf(item.alg)
// 3. If newAlgorithmIndex is -1, then throw an AssertionError
if (newAlgorithmIndex === -1) {
assert(false, 'Invalid SRI hash algorithm token')
// 4. Otherwise, if newAlgorithmIndex is smaller than
// currentAlgorithmIndex, then continue to the next item.
} else if (newAlgorithmIndex < currentAlgorithmIndex) {
continue
// 5. Otherwise, if newAlgorithmIndex is greater than
// currentAlgorithmIndex:
} else if (newAlgorithmIndex > currentAlgorithmIndex) {
// 1. Set offset to 0.
offset = 0
// 2. Set currentAlgorithmIndex to newAlgorithmIndex.
currentAlgorithmIndex = newAlgorithmIndex
}
// 6. Assign item to result[offset] if
metadataList[offset++] = item
}
// 5. Shorten metadataList to the length of offset.
metadataList.length = offset
// 6. Return filtered metadataList.
return metadataList
}This would pass my tests and I guess, I would have small memory footprint. If you are using c++ or so, then I guess, having the offset as length. Then maybe it would make sense not to have getStrongestMetadata as a separate function but actually inline it, so that i can use it in the next for loop of doBytesMatchMetadataList as length value, currently step 4. in doBytesMatchMetadataList. |
|
I haven't evaluated this change, but I wanted to comment on some of things written here. Generally we don't write algorithms in specifications with specific memory or performance targets in mind. The main goal is clarity. You can read more about this at https://infra.spec.whatwg.org/#algorithm-conformance. |
|
@annevk But: infra is basically targetting Javascript. And it states just in the notice box in the bottom of the paragraph, which you linked, that performance should be a field for competition. So the example for eating an orange is instructing us to peel the orange, it does not instruct us how we should peel it. Using a knife? Your fingernails? And this specification is not giving any interpretation room. We must implement it like it is defined in this spec. E.g. It is defined that the order of elements in the valid SRI hash algorithm token set is meaningful. From a spec implementer perspective I would not expect such a strict instruction. If I am deviating from the spec, because I can be faster, than I cant claim that I follow the spec to the point. E.g. You could also define getStrongestMetadata like this: How I determine which of the algorithms is the strongest in the set is up to me. E.g. I know that |item|["algo"] can only be sha256, sha386 and sha512. So I need only to compare the fourth character of the algorithm. "2" < "3" < "5". I dont need to check the indexOf of the "valid SRI hash algorithm token set". And if I already have determined that sha512 is there, I dont need to iterate the whole |set|. You could define
We just expect that it behaves like we defined it in the text. How you determine it, is up to you. You want to use the order of the valid SRI hash algorithm token set and call indexOf all the time? Do it. You have a more different approach? Well, why not. Just my 200 cents. :D |
|
I understand what you're saying, but I am not convinced this is a problem in the standard that needs to be addressed by changing the standard. Am I missing something, @Uzlopak? |
|
@mozfreddyb |
I want to bring this to the discussion. Instead of storing a struct and calling indexOf over and over, we keep it "simpler".
I know, i know... this is a highly opinionated suggestion.
Also the step 4.1 seems to be redundant. I assume that all implementations will parse the metadata and pass it directly to
getStrongestMetadata(), to filter the relevant metadata. So we dont need to actually assert again if the solution does not allow to pass arbitrary data to getStrongestMetadata.Preview | Diff