For example, here are a couple of cases from Snapshot Serengeti: Computer Vision's recent beta
Subject 1925326 (formerly ASG0018whg in Serengeti)
The crowd opinions are:

(collectively there are 55 annotations)
but this aggregates to

which still has 17 clusters!
Compare this to a better example, subject 1714356 (formerly ASG001c9b4):

(57 clusters)
which aggregates to

which has 2 clusters, exactly matching the number of animals in the picture. Perfect! This is very usable and exactly what we want from the output.
For this project, the science team wants to use the aggregate answer to determine which images to crop out and use for training the animal-detection computer vision system.
Answers like the first can't be used, and they also can't easily be pulled out and separated from the "good" answers, even if we did find some way to handle them manually.
I realise that overlapping animals/regions is a hard problem, but right now the code seems to give up, and it doesn't even mark that the image has been handled differently.
Also, there are some cases where the aggregation seems to do a poor job and lose important data.
Here is another example:
Subject 1925354 (formerly ASG001e0wq)
The crowd opinions look like this:

It's quite clear that almost everyone in the crowd agrees on the approximate locations of the three zebras and on their three different directions.
The aggregation result is, in light of this, not useful at all, and is missing important detail on the presence of three animals and their positions/directions. Note that the left and right animals are lost, and only a middle animal is shown (wrongly):

In summary, I think we need to:
- Improve aggregation answers for overlapping entities
- Where a good answer is not easy/possible, add some tagging or metadata to that image's results to show that it is suboptimal/needs further attention.
The second point is the most important of all, I think.
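To make the tagging suggestion concrete, here is a minimal sketch of one possible heuristic: compare the number of clusters the aggregation produced against the typical number of marks an individual volunteer made, and flag the result when they disagree badly. This is my own illustrative code, not part of any existing aggregation library, and the function name, threshold, and output keys are all assumptions.

```python
from statistics import median

def flag_suboptimal(marks_per_volunteer, n_clusters, tolerance=1):
    """Tag an aggregation result as needing attention.

    marks_per_volunteer: list of how many marks each volunteer made
                         on this subject (hypothetical input format).
    n_clusters: number of clusters the aggregation produced.
    tolerance: allowed deviation before the result is flagged.
    """
    # A single volunteer's count is a rough proxy for the true number
    # of animals; the median is robust to a few outlier volunteers.
    expected = median(marks_per_volunteer)
    needs_attention = abs(n_clusters - expected) > tolerance
    return {
        "n_clusters": n_clusters,
        "expected": expected,
        "needs_attention": needs_attention,
    }

# Illustrative values only: a case like subject 1925326, where most
# volunteers marked ~3 animals but aggregation produced 17 clusters,
# would be flagged, while a clean 2-cluster result would not.
bad = flag_suboptimal([3, 3, 4, 3, 3], 17)   # flagged
good = flag_suboptimal([2, 2, 2, 3, 2], 2)   # not flagged
print(bad["needs_attention"], good["needs_attention"])
```

Even a crude flag like this would let the science team filter out the suspect subjects automatically, which matters more than perfecting the clustering itself.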
BTW, for a sense of the spread of this problem: 23 of the 66 images from the beta are subject to these issues -> about a third. We definitely need to address this if aggregation is to be useful.