New Algorithm: DBSCAN clustering

I like the clustering approach, but I don't like that k-means makes you say up front how many clusters there will be (I'm discovering things too; it's a new day and I don't know yet how many topics there are, right?). I want to experiment with other clustering algorithms that make different assumptions and trade-offs about the data.

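[DBSCAN](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html) seems interesting because it finds clusters based on density instead. You still have to say what the expected density is, i.e. the threshold that defines a cluster. In scikit-learn that's `eps` (how close two posts must be to count as neighbours) and `min_samples` (how many neighbours, including the post itself, a post needs to be a "core" point of a cluster).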
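To make the density rule concrete, here is a minimal pure-Python sketch of DBSCAN (not scikit-learn's implementation). It assumes posts are already embedded as vectors and uses cosine distance, which is an assumption about what suits the embeddings. Core points have at least `min_samples` neighbours within `eps`; clusters grow outward from core points; anything unreachable is labelled `-1` (noise), the same convention scikit-learn uses. In the real code, `DBSCAN(eps=..., min_samples=..., metric="cosine").fit_predict(X)` from `sklearn.cluster` should do the same job far more efficiently.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
NOISE = -1       # same label scikit-learn uses for noise
UNVISITED = -2   # internal marker for points not yet processed


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity. Assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def dbscan(points: List[Vector], eps: float, min_samples: int,
           dist: Callable[[Vector, Vector], float] = cosine_distance) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    n = len(points)
    labels = [UNVISITED] * n
    # Each neighbourhood includes the point itself (O(n^2); fine for a sketch).
    neighbours = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                  for i in range(n)]
    cluster_id = 0
    for i in range(n):
        if labels[i] != UNVISITED:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = NOISE  # may be claimed later as a border point
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins but does not expand
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster_id
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # j is also core: keep expanding
        cluster_id += 1
    return labels


# Toy 2-D "embeddings": two tight groups plus one outlier.
toy = [(1.0, 0.0), (0.99, 0.1), (0.98, 0.15),
       (0.0, 1.0), (0.1, 0.99), (0.15, 0.98),
       (-1.0, -1.0)]
print(dbscan(toy, eps=0.05, min_samples=2))  # [0, 0, 0, 1, 1, 1, -1]
```

Note that the number of clusters (two here) falls out of `eps` and `min_samples`; it is never passed in.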

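I expect it will take a lot of tweaking of those parameters to make it work for a given embedding model, but once it's tuned it should be a lot more dynamic and robust, since it adapts to however many topics show up that day instead of forcing a fixed number of clusters.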
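One common heuristic for tuning `eps` (not something this issue prescribes) is the sorted k-distance curve. The sketch below computes, for each point, the smallest `eps` at which it would become a core point; sorting those values and looking for the "knee" where the curve flattens gives a starting `eps`. The helper name `core_distances` is hypothetical, and it defaults to Euclidean distance via `math.dist`; for embeddings you would likely pass a cosine distance instead.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def core_distances(points: List[Vector], min_samples: int,
                   dist: Callable[[Vector, Vector], float] = math.dist) -> List[float]:
    """Smallest eps at which each point becomes a DBSCAN core point.

    Counting the point itself (as scikit-learn's min_samples does), a point is
    core once its (min_samples - 1)-th nearest *other* point is within eps.
    Assumes len(points) >= min_samples.
    """
    if min_samples <= 1:
        return [0.0] * len(points)  # every point is trivially core
    result = []
    for i, p in enumerate(points):
        others = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(others[min_samples - 2])
    return result


pts = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0), (10.0, 0.0), (10.0, 1.0), (50.0, 50.0)]
# Sort descending: outliers sit at the steep left end, dense points at the flat tail.
curve = sorted(core_distances(pts, min_samples=3), reverse=True)
print([round(d, 2) for d in curve])  # [64.03, 10.0, 10.0, 2.0, 2.0, 1.0]
```

On real embeddings the curve is usually smoother, so this only narrows the search; expect to re-check the choice whenever the embedding model changes.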

_Note:_ DBSCAN doesn't assign every post to a cluster (posts in sparse regions are labelled as noise), so you might not be able to use `toot_clusters.html` on its own. You'll probably need an offshoot of it. Feel free to skip this part on the first pass of the PR; we might even be able to get someone else to do it.
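Whatever the template ends up looking like, it will need the noise posts separated from the real clusters. A minimal sketch, assuming labels follow scikit-learn's convention of `-1` for noise; `split_clusters_and_noise` is a hypothetical helper, and the shape of data `toot_clusters.html` actually expects is not known here.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
NOISE = -1  # scikit-learn's DBSCAN labels noise points as -1


def split_clusters_and_noise(items: Sequence[T],
                             labels: Sequence[int]) -> Tuple[Dict[int, List[T]], List[T]]:
    """Group items by cluster label, collecting noise items separately."""
    if len(items) != len(labels):
        raise ValueError("items and labels must have the same length")
    clusters: Dict[int, List[T]] = defaultdict(list)
    noise: List[T] = []
    for item, label in zip(items, labels):
        if label == NOISE:
            noise.append(item)
        else:
            clusters[label].append(item)
    return dict(clusters), noise


clusters, unclustered = split_clusters_and_noise(["a", "b", "c", "d"], [0, -1, 1, 0])
print(clusters)     # {0: ['a', 'd'], 1: ['c']}
print(unclustered)  # ['b']
```

An offshoot template could then render `clusters` as usual and show the leftover posts in a separate "unclustered" section.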

New Algorithm: DBSCAN clustering #16