Description
For the concept caption validation task, we set a threshold theta_k and then choose k so that the top-k locations are exactly those with a value greater (or smaller) than theta_k. As intended, this results in different values of k.
However, the total number of datapoints stays the same. We can easily compute the expected top-k score of a random ordering (the score itself follows a distribution); I think it's something like k / n_points * 100. For example, for top-10 and n_points = 100, you would expect a random-ordering baseline of 10% accuracy.
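A quick way to sanity-check the k / n_points * 100 guess is a small simulation. This is only a sketch under an assumed definition of the score (the percentage of the k thresholded locations that appear in the top-k of the ordering); the function names `random_topk_baseline` and `simulate_random_topk` are hypothetical and not from the codebase.

```python
import random


def random_topk_baseline(k, n_points):
    """Closed-form guess from above: expected top-k score (in %) of a random ordering."""
    return 100.0 * k / n_points


def simulate_random_topk(k, n_points, trials=2000, seed=0):
    """Monte Carlo estimate of the same quantity.

    Assumption: the score is the percentage of the k 'relevant' locations
    (those passing the threshold) that land in the top-k of the ordering.
    """
    rng = random.Random(seed)
    relevant = set(range(k))  # hypothetical: indices of the thresholded locations
    total = 0.0
    for _ in range(trials):
        # The top-k of a random ordering is a uniform sample of k distinct locations.
        picked = rng.sample(range(n_points), k)
        total += sum(1 for i in picked if i in relevant) / k
    return 100.0 * total / trials


if __name__ == "__main__":
    print(random_topk_baseline(10, 100))   # closed form
    print(simulate_random_topk(10, 100))   # simulated estimate, should be close
```

If the actual metric is defined differently, the closed form may need adjusting, but the same simulation pattern can be used to check it.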
So the random baseline depends on k, which varies across aux columns, and on n_points, which is fixed. To compare the percentage values of two different columns, we therefore need to know their k-dependent baselines.
I would suggest that we (additionally) rescale each top-dynamic-k score with a simple min/max rescaling: top_k_norm = (top_k - top_k_baseline) / (100 - top_k_baseline). This puts every column on a common scale, where 0 is the random baseline and 1 is a perfect score, so the values can be compared directly.
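The rescaling could look roughly like the sketch below. It assumes top_k is given in percent and that the baseline is k / n_points * 100 as guessed above; `normalize_topk` is a hypothetical name. Note that a column scoring below its random baseline gets a negative value, so the result is only in [0, 1] for scores at or above baseline.

```python
def normalize_topk(top_k, k, n_points):
    """Rescale a top-k score (in %) so that 0 = random baseline and 1 = perfect.

    Assumption: the random baseline is k / n_points * 100.
    """
    baseline = 100.0 * k / n_points
    if baseline >= 100.0:
        # k == n_points: every ordering is perfect, so the rescaling is undefined.
        raise ValueError("baseline is 100%; normalized score is undefined")
    return (top_k - baseline) / (100.0 - baseline)


if __name__ == "__main__":
    # Two hypothetical columns with different k but the same n_points.
    print(normalize_topk(40.0, k=10, n_points=100))  # baseline 10%
    print(normalize_topk(60.0, k=50, n_points=100))  # baseline 50%
```

In this made-up example the second column has the higher raw percentage, but the first one is further above its own baseline after rescaling.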