For example, here are a couple of cases from Snapshot Serengeti: Computer Vision's recent beta
Subject 1925326 (formerly ASG0018whg in Serengeti)
The crowd opinions are:

(collectively there are 55 annotations)
but this aggregates to

which still has 17 clusters!
Compare this to a better example, subject 1714356 (formerly ASG001c9b4):

(57 clusters)
which aggregates to

which has 2 clusters, exactly matching the number of animals in the picture. Perfect! This is very usable and exactly what we want from the output.
For this project, the science team wants to use the aggregate answer to determine which images to crop out and use for training the animal-detection computer vision system.
Answers like the first can't be used, and they also can't easily be pulled out and separated from the "good" answers, even if we did find some way to handle them manually.
I realise that overlapping animals/regions is a hard problem, but right now the code seems to give up, and it doesn't even mark that the image has been handled differently.
Also, there are some cases where the aggregation seems to do a poor job and lose important data.
Here is another example:
Subject 1925354 (formerly ASG001e0wq)
The crowd opinions look like this:

It's quite clear that almost everyone in the crowd agrees on the approximate locations of the three zebras and on their three different directions.
The aggregation result is, in light of this, not useful at all, and is missing important detail on the presence of three animals and their positions/directions. Note that the left and right animals are lost, and only a middle animal is shown (wrongly):

In summary, I think we need to:
- Improve aggregation answers for overlapping entities
- Where a good answer is not easy/possible, add some tagging or metadata to that image's results to show that it is suboptimal/needs further attention.
The second point is the most important of all, I think.
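To make the tagging suggestion concrete, here is a minimal sketch of one possible heuristic: compare the number of clusters the aggregation produced against the typical number of marks an individual volunteer made, and flag the result when they disagree badly. This is my own illustrative code, not part of any existing aggregation library, and the function name, threshold, and output keys are all assumptions.

```python
from statistics import median

def flag_suboptimal(marks_per_volunteer, n_clusters, tolerance=1):
    """Tag an aggregation result as needing attention.

    marks_per_volunteer: list of how many marks each volunteer made
                         on this subject (hypothetical input format).
    n_clusters: number of clusters the aggregation produced.
    tolerance: allowed deviation before the result is flagged.
    """
    # A single volunteer's count is a rough proxy for the true number
    # of animals; the median is robust to a few outlier volunteers.
    expected = median(marks_per_volunteer)
    needs_attention = abs(n_clusters - expected) > tolerance
    return {
        "n_clusters": n_clusters,
        "expected": expected,
        "needs_attention": needs_attention,
    }

# Illustrative values only: a case like subject 1925326, where most
# volunteers marked ~3 animals but aggregation produced 17 clusters,
# would be flagged, while a clean 2-cluster result would not.
bad = flag_suboptimal([3, 3, 4, 3, 3], 17)   # flagged
good = flag_suboptimal([2, 2, 2, 3, 2], 2)   # not flagged
print(bad["needs_attention"], good["needs_attention"])
```

Even a crude flag like this would let the science team filter out the suspect subjects automatically, which matters more than perfecting the clustering itself.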
BTW, for a sense of the spread of this problem: 23 of the 66 images from the beta are subject to these issues -> about a third. We definitely need to address this if aggregation is to be useful.