Clip Searcher is a tool that helps journalists and researchers discover hidden connections between people and organizations in the news. Give it a topic, and it will find recent articles, extract the names of people and organizations mentioned, and draw an interactive map showing how they're all connected.
One of the most tedious parts of investigative reporting is the "clip search"—combing through dozens of news articles to understand who the key players are in a story and how they relate to each other. Journalists typically track this information in spreadsheets, but those flat lists don't capture the web of relationships that often matter most.
This tool automates that process. It reads news articles and figures out which people and organizations appear together in the same sentences. When two names show up together repeatedly, that's usually a sign they're connected in some meaningful way—maybe one works for the other, or they're opponents in a policy debate, or they're frequently quoted together as experts.
The result is an interactive diagram where you can see, at a glance, the network of relationships in any topic you're researching.
The tool is organized into five Jupyter notebooks, each handling a different step of the process. You can run them individually to understand each piece, or use the final orchestration notebook to run everything at once.
This notebook finds and downloads news articles about your topic. It uses DuckDuckGo's news search to find relevant URLs, then downloads the full text of each article. The articles are saved locally as .txt files.
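The saving step might look roughly like the sketch below. It covers only writing already-downloaded text to .txt files, not the search or download itself. The function names, the `title` and `text` keys, and the numbered file-name scheme are all assumptions made for illustration, not the notebook's actual code.

```python
import re
from pathlib import Path


def slugify(title, max_length=60):
    """Reduce a headline to a short, filesystem-safe name (hypothetical helper)."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug[:max_length].strip("-") or "article"


def save_articles(articles, out_dir):
    """Write each article's text to its own .txt file and return the paths.

    `articles` is assumed to be a list of dicts with "title" and "text" keys.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, article in enumerate(articles, start=1):
        # A numeric prefix keeps files in download order and avoids name clashes.
        file_path = out_path / f"{index:03d}-{slugify(article['title'])}.txt"
        file_path.write_text(article["text"], encoding="utf-8")
        saved.append(file_path)
    return saved
```

Calling `save_articles(articles, "data/raw")` would place the files in the project's data folder, assuming that is where the notebook writes them.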
This is where the tool reads through each article and extracts the people and organizations mentioned, using spaCy's entity extraction.
The notebook also performs "coreference resolution," meaning it will identify when different words refer to the same person. For example, if an article says "President Biden announced the policy. He said it would help families," the tool understands that "He" refers to "President Biden." This helps capture connections that would otherwise be missed.
Once we know which people and organizations appear in each article, this notebook figures out the connections between them. The rule is simple: if two entities (or their coreferents) appear in the same sentence, they're probably related somehow.
The notebook creates "edges"—lines connecting related entities—and keeps track of what type of connection each one represents (person-to-person, person-to-organization, etc.).
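The same-sentence rule can be sketched in a few lines of plain Python. This is a minimal illustration, not the notebook's implementation: the input layout (one list of `(name, label)` pairs per sentence, with coreferences already resolved), the `build_edges` name, and the edge fields are all assumptions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: one list per sentence, holding (name, label) pairs for
# the entities found there, with pronouns already replaced by their referents.
sentences = [
    [("Jane Doe", "PERSON"), ("Acme Corp", "ORG")],
    [("Jane Doe", "PERSON"), ("John Roe", "PERSON"), ("Acme Corp", "ORG")],
    [("John Roe", "PERSON")],
]


def build_edges(sentences):
    """Count how often each pair of distinct entities shares a sentence."""
    weights = Counter()
    for entities in sentences:
        # De-duplicate and sort so each pair is always stored in the same order.
        unique = sorted(set(entities))
        for a, b in combinations(unique, 2):
            weights[(a, b)] += 1

    edges = []
    for (a, b), weight in sorted(weights.items()):
        edges.append({
            "source": a[0],
            "target": b[0],
            # Sort the labels so ORG-PERSON and PERSON-ORG count as one type.
            "type": "-".join(sorted([a[1], b[1]])),
            "weight": weight,
        })
    return edges


edges = build_edges(sentences)
```

Keeping a weight on each edge makes it possible to tell pairs that appear together repeatedly from one-off mentions.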
This notebook takes all those connections and draws them as an interactive network diagram. People are shown in green, organizations in blue, and you can click and drag nodes around to explore the relationships. Hovering over a node shows you more details about that person or organization.
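As a rough sketch of how entities could be prepared for drawing, the snippet below assigns each one a color and some hover text. Only the green and blue colors come from the description above; the label names, the gray fallback, the field names, and the `build_nodes` function are assumptions, and the graph library that actually draws the diagram is left out.

```python
# Colors follow the description above; the label names are assumed.
NODE_COLORS = {"PERSON": "green", "ORG": "blue"}


def build_nodes(entities):
    """Turn (name, label) pairs into node records with a color and hover text."""
    nodes = {}
    for name, label in entities:
        node = nodes.setdefault(name, {
            "id": name,
            "label": label,
            # Gray is an assumed fallback for any other entity type.
            "color": NODE_COLORS.get(label, "gray"),
            "mentions": 0,
        })
        node["mentions"] += 1

    for node in nodes.values():
        # Text a renderer could show when the user hovers over the node.
        node["title"] = f"{node['id']} ({node['label']}), {node['mentions']} mention(s)"
    return list(nodes.values())


nodes = build_nodes([
    ("Jane Doe", "PERSON"),
    ("Acme Corp", "ORG"),
    ("Jane Doe", "PERSON"),
])
```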
This notebook runs the entire process from start to finish with a single click. Enter your search topic and the number of articles you want to analyze, and it handles the rest. It also includes an optional web interface (using a tool called Gradio) that makes it easy to run searches without touching any code.
This project uses uv for dependency management. To set it up:
# Install dependencies
uv sync
# Install required language models (run after each uv sync)
uv pip install "en_core_web_lg @ https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.5.0/en_core_web_lg-3.5.0.tar.gz"
uv pip install "https://github.com/richardpaulhudson/coreferee/raw/master/models/coreferee_model_en.zip"

├── analysis/ # Jupyter notebooks for each pipeline step
├── data/raw/ # Downloaded articles and generated outputs
├── Earlier Versions/ # Original implementation files
├── pyproject.toml # Project dependencies
└── README.md # This file
The original code for this project was written by Ari Sen. It was subsequently refactored by Claude Opus 4.5 (Anthropic) in January 2026 to use modern Python packaging with uv, reorganize the code into modular Jupyter notebooks and replace browser-based scraping with the simpler DuckDuckGo search API.
This README was also mostly written by Claude Opus 4.5.