This repository was archived by the owner on Aug 27, 2019. It is now read-only.
https://developer.zooniverse.org/projects/aggregation/en/latest/point_aggregation.html says the agglomerative clustering method decides when to stop combining clusters, at least in part, by refusing to merge two clusters that both contain markings from the same user, on the principle that a user will (mostly) not mark the same feature twice.
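To make the failure mode concrete, here's a minimal sketch of that stopping rule (this is illustrative pseudocode in spirit, not the aggregation code's actual implementation): single-linkage agglomeration over 1-D marks tagged with a user name, where a merge is refused whenever the two clusters share a user. A duplicated classification from one user then forces that user's feature to stay split in two clusters forever.

```python
# Illustrative sketch of agglomerative clustering with a "no two marks from
# the same user in one cluster" merge constraint. Not the real aggregation code.
import itertools


def agglomerate(marks, max_dist):
    """marks: list of (x, user) tuples. Returns a list of clusters,
    each cluster a list of (x, user) marks."""
    clusters = [[m] for m in marks]
    while True:
        best = None
        for i, j in itertools.combinations(range(len(clusters)), 2):
            a, b = clusters[i], clusters[j]
            # Stopping rule: never merge clusters that share a user.
            if {u for _, u in a} & {u for _, u in b}:
                continue
            # Single-linkage distance between the two clusters.
            d = min(abs(x1 - x2) for x1, _ in a for x2, _ in b)
            if d <= max_dist and (best is None or d < best[0]):
                best = (d, i, j)
        if best is None:
            return clusters
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
```

With marks `[(0.0, 'A'), (0.1, 'B')]` this yields one cluster, but adding a duplicate mark from user A at the same feature, `[(0.0, 'A'), (0.1, 'B'), (0.05, 'A')]`, yields two clusters even though all three marks sit on one feature: the two A-marks can never end up in the same cluster, so the merge stops early.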
That would be fine if our duplication rate were 0%, but it's not. Immediately after Panoptes was released the duplication rate was much higher than normal, and even now that those bugs have been squashed it's non-zero for reasons we can't control. If a user gets the same subject twice and classifies it the same way, every single feature they marked will be split into 2 clusters even if the agglomeration method works perfectly. I think this could explain some of the weird behavior in #144 and #165: the presence of duplicates could make the agglomeration stop before it should, so even a single duplication might lead to detecting >>2 clusters per actual feature.
Do the aggregations throw out duplicate classifications? If not, we should constrain the agglomeration so that no cluster contains 2 marks from the same classification_id, rather than from the same user_name or even the same user_name+created_at pair.
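The other option raised above, throwing out duplicate classifications before clustering runs at all, could look something like this sketch. The dict field names (`classification_id`, `user_name`, `subject_id`) are assumptions about the export format, used here purely for illustration:

```python
# Hedged sketch: keep only the first classification per (user_name, subject_id)
# pair, so a user who was served the same subject twice contributes marks from
# only one classification. Field names are illustrative, not the real schema.

def dedupe(classifications):
    """classifications: list of dicts, each one classification.
    Returns the list with duplicate (user_name, subject_id) entries dropped,
    keeping the earliest-seen one."""
    seen = set()
    kept = []
    for c in classifications:
        key = (c["user_name"], c["subject_id"])
        if key not in seen:
            seen.add(key)
            kept.append(c)
    return kept
```

Either approach (filtering up front, or keying the merge constraint on classification_id instead of user_name) avoids the forced split described above; filtering up front has the advantage of also fixing any downstream statistics that count classifications.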