This repository was archived by the owner on Aug 27, 2019. It is now read-only.

Aggregation doesn't give usable/useful results when marked entities overlap #144

@alexbfree

For example, here are a couple of cases from the recent beta of Snapshot Serengeti: Computer Vision.

Subject 1925326 (formerly ASG0018whg in Serengeti)

The crowd opinions are (55 annotations in total):
[image: asg0018whg, crowd annotations]

but this aggregates to the following, which still has 17 clusters!
[image: asg0018whg, aggregated result]

Compare this to a better example, subject 1714356 (formerly ASG001c9b4), with 57 clusters:
[image: asg001c9b4, crowd annotations]

which aggregates to:
[image: asg001c9b4, aggregated result]

This has 2 clusters, exactly matching the number of animals in the picture. Perfect! It is very usable and exactly what we want from the output.

For this project, the science team wants to use the aggregated answer to decide which images to crop out and use for training the animal-detection computer vision system.
Answers like the first one can't be used, nor can they easily be picked out and separated from the "good" answers, even if we found some way to handle them manually.

I realise that overlapping animals/regions are a hard problem, but right now the code seems to give up, and it doesn't even mark that the image has been handled differently.

There are also cases where the aggregation seems to do a poor job and loses important data.
Here is another example:

Subject 1925354 (formerly ASG001e0wq)
The crowd opinions look like this:
[image: asg001e0wq, crowd annotations]
It's quite clear that almost everyone in the crowd agrees on the approximate locations of the three zebras and on their three different directions.
In light of this, the aggregation result is not at all useful: it misses the important detail that three animals are present, along with their positions and directions. Note that the left and right animals are lost, and only a middle animal is shown (wrongly):
[image: asg001e0wq, aggregated result]

In summary, I think we need to:

  1. Improve aggregation results for overlapping entities.
  2. Where a good answer is not easy or possible to produce, add some tagging or metadata to that image's results to show that it is suboptimal or needs further attention.

The second is most important of all, I think.
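As one possible way to produce the metadata in point 2, here is a minimal sketch of a post-aggregation check. It is a hypothetical heuristic, not how the existing aggregation code works: it assumes each aggregated cluster records how many volunteer marks were merged into it, and that we know how many marks each volunteer made on the subject. The names `Cluster`, `SubjectResult`, `flag_suspect_aggregation`, and the threshold values are all illustrative. A subject is flagged when its cluster count disagrees with the median volunteer's count, or when some clusters are backed by only a small fraction of volunteers; both are symptoms of the 17-cluster case above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from statistics import median


@dataclass
class Cluster:
    # Hypothetical field: number of volunteer marks merged into this cluster.
    members: int


@dataclass
class SubjectResult:
    subject_id: int
    clusters: list[Cluster]
    # Hypothetical input: how many marks each volunteer made on this subject.
    marks_per_volunteer: list[int]
    # Review flags attached by the check below; empty means "looks fine".
    flags: list[str] = field(default_factory=list)


def flag_suspect_aggregation(result: SubjectResult,
                             min_support: float = 0.5,
                             count_tolerance: int = 1) -> SubjectResult:
    """Attach review flags to one subject's aggregated result (heuristic sketch)."""
    n_volunteers = len(result.marks_per_volunteer)
    # Treat the median volunteer's mark count as the expected number of animals.
    expected = median(result.marks_per_volunteer) if n_volunteers else 0

    # Flag 1: the aggregate has far more (or fewer) clusters than a typical volunteer marked.
    if abs(len(result.clusters) - expected) > count_tolerance:
        result.flags.append("cluster_count_mismatch")

    # Flag 2: some clusters are supported by less than min_support of the volunteers.
    weak = [c for c in result.clusters if c.members < min_support * n_volunteers]
    if weak:
        result.flags.append("low_consensus_clusters")

    return result
```

With made-up inputs, a subject where ten volunteers each marked two animals and the aggregate has two well-supported clusters gets no flags, while a subject with 17 mostly single-mark clusters gets both flags. Downstream users (e.g. the science team's cropping step) could then skip or manually review any subject whose `flags` list is non-empty.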

BTW, for a sense of how widespread this problem is: 23 of the 66 images from the beta are affected by these issues, i.e. about a third. We definitely need to address this if aggregation is to be useful.
