Problem
As of 12/28/20, I came across three articles with the same title. The only difference is that some titles have the source name appended, e.g.:
- In the new year, take a new look at immigration – starting with DACA
- In the new year, take a new look at immigration – starting with DACA | Charlotte Observer
- In the new year, take a new look at immigration – starting with DACA | Raleigh News & Observer
In this step of the article pipeline, I implement a function that checks whether an article with the same title was published within 15 days before or after.
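A minimal sketch of that window check, assuming a plain-Python `Article` record with `title` and `published` fields (hypothetical names; the pipeline's real model may differ). In a Django model the same date window could likely be expressed with a `published__range=(start, end)` filter. The comparison here is an exact, case-insensitive match, so it only catches the unsuffixed copy of the title.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Article:
    # Hypothetical stand-in for the pipeline's article record.
    title: str
    published: date


def same_title_in_window(new, existing, window_days=15):
    """Return articles from `existing` whose title matches `new.title`
    (ignoring case and surrounding whitespace) and whose publish date falls
    within `window_days` days before or after `new.published` (inclusive)."""
    start = new.published - timedelta(days=window_days)
    end = new.published + timedelta(days=window_days)
    target = new.title.strip().casefold()
    return [
        a
        for a in existing
        if start <= a.published <= end and a.title.strip().casefold() == target
    ]


if __name__ == "__main__":
    title = "In the new year, take a new look at immigration – starting with DACA"
    existing = [
        Article(title, date(2020, 12, 28)),
        Article(title + " | Charlotte Observer", date(2020, 12, 28)),
        Article(title, date(2020, 11, 1)),  # outside the 15-day window
    ]
    new = Article(title, date(2020, 12, 30))
    print(len(same_title_in_window(new, existing)))  # 1
```

Only the first article is returned: the `| Charlotte Observer` variant slips past an exact comparison, which is the duplicate this issue is about.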
Some solutions:
- Use Python's fuzzywuzzy package
- Use a regex filter in the Django ORM, possibly stripping the text around separator characters such as `|` and `-`
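A minimal sketch combining both ideas, assuming the source name is always appended after a `|` separator, as in the three example titles. It strips that trailing segment with a regex and then scores the normalized titles with the standard library's `difflib.SequenceMatcher`, used here as a dependency-free stand-in for fuzzywuzzy (with fuzzywuzzy installed, `fuzz.ratio(a, b)` gives a comparable 0-100 score). The function names and the threshold of 90 are hypothetical choices, not tuned values.

```python
import re
from difflib import SequenceMatcher

# Hypothetical pattern: a trailing "| Source Name" segment (last pipe only).
_SOURCE_SUFFIX = re.compile(r"\s*\|\s*[^|]*$")


def normalize_title(title):
    """Drop a trailing '| Source' suffix, collapse whitespace, and casefold."""
    stripped = _SOURCE_SUFFIX.sub("", title)
    return " ".join(stripped.split()).casefold()


def similarity(a, b):
    """Return a 0-100 similarity score for the normalized titles.

    difflib is a stand-in here for fuzzywuzzy's fuzz.ratio."""
    ratio = SequenceMatcher(None, normalize_title(a), normalize_title(b)).ratio()
    return round(ratio * 100)


def is_duplicate_title(a, b, threshold=90):
    """Treat two titles as duplicates when their score reaches `threshold`."""
    return similarity(a, b) >= threshold


if __name__ == "__main__":
    base = "In the new year, take a new look at immigration – starting with DACA"
    print(normalize_title(base + " | Raleigh News & Observer"))
    print(is_duplicate_title(base, base + " | Charlotte Observer"))  # True
```

Splitting only on `|` is deliberate: the example title itself contains a dash (`immigration – starting`), so stripping at `-` or `–` would cut into the title rather than remove the source name.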