alxvar/podfact

PodFact

Agentic tool for podcast fact-checking. PodFact transcribes podcast audio, extracts verifiable claims, researches them via web search, and produces fact-check verdicts.

How it works

PodFact runs a sequential pipeline:

  1. Transcribe — Audio is transcribed using Mistral Voxtral Mini with word-level timestamps. Long files are automatically chunked at silence points.
  2. Segment — The transcript is grouped into topically coherent segments using an LLM agent.
  3. Extract claims — Verifiable factual claims are extracted from each segment (filtering out opinions, predictions, and common knowledge).
  4. Research — Each claim is researched via web search (Tavily) using an LLM agent that generates targeted queries.
  5. Verify — An LLM agent evaluates the evidence and assigns a verdict: true, false, misleading, or unverified, with a confidence score and explanation. If more evidence is needed, additional research rounds run automatically (up to 3).
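The research and verify steps form a bounded loop, which can be sketched as below. This is a minimal illustration, not PodFact's actual code: the `Verdict` dataclass, the `check_claim` function, and the `research`/`verify` callables are hypothetical names, and the sketch assumes the limit of 3 applies to the total number of research rounds per claim.

```python
from dataclasses import dataclass
from typing import Callable, List

# Verdict labels listed in the README.
VERDICTS = ("true", "false", "misleading", "unverified")
MAX_ROUNDS = 3  # assumption: the cap counts total research rounds


@dataclass
class Verdict:
    """Hypothetical result type; field names are illustrative."""
    label: str
    confidence: float
    explanation: str
    needs_more_evidence: bool = False


def check_claim(
    claim: str,
    research: Callable[[str, List[str]], List[str]],
    verify: Callable[[str, List[str]], Verdict],
    max_rounds: int = MAX_ROUNDS,
) -> Verdict:
    """Alternate research and verification until the verifier is satisfied
    or the round limit is reached."""
    evidence: List[str] = []
    verdict = Verdict("unverified", 0.0, "no research performed")
    for _ in range(max_rounds):
        # Each round may use the evidence gathered so far to refine queries.
        evidence.extend(research(claim, evidence))
        verdict = verify(claim, evidence)
        if verdict.label not in VERDICTS:
            raise ValueError(f"unexpected verdict label: {verdict.label!r}")
        if not verdict.needs_more_evidence:
            break
    return verdict
```

In a real run, `research` would wrap the web-search agent and `verify` the verdict agent; here any two functions with matching signatures can be plugged in for testing.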

Every pipeline stage persists its output to SQLite and is idempotent, so a failed job can be re-run and will resume from where it left off.
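One common way to get this resume behaviour is to key each stage's stored output by job and stage name and skip any stage that already has a row. The sketch below illustrates that pattern with Python's built-in `sqlite3` module. The `stage_results` table, its columns, and the `open_store`/`run_stage` helpers are assumptions made for illustration and may not match PodFact's real schema.

```python
import sqlite3
from typing import Callable


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the (hypothetical) results table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS stage_results (
            job_id TEXT NOT NULL,
            stage  TEXT NOT NULL,
            output TEXT NOT NULL,
            PRIMARY KEY (job_id, stage)
        )
        """
    )
    return conn


def run_stage(
    conn: sqlite3.Connection,
    job_id: str,
    stage: str,
    compute: Callable[[], str],
) -> str:
    """Return the stored output if the stage already ran; otherwise compute
    it, persist it, and return it."""
    row = conn.execute(
        "SELECT output FROM stage_results WHERE job_id = ? AND stage = ?",
        (job_id, stage),
    ).fetchone()
    if row is not None:
        return row[0]  # stage finished in an earlier run, skip the work
    output = compute()
    with conn:  # commits on success, rolls back if the insert fails
        conn.execute(
            "INSERT INTO stage_results (job_id, stage, output) VALUES (?, ?, ?)",
            (job_id, stage, output),
        )
    return output
```

Calling `run_stage` twice with the same job and stage runs `compute` only once. If a stage raises, nothing is stored for it, so the next run retries exactly that stage.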

Setup

Requires Python 3.13+ and uv.

uv sync

Create a .env file with your API keys:

MISTRAL_API_KEY=your-key
TAVILY_API_KEY=your-key
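Before submitting a job, it can help to confirm that both keys are actually visible to the process. The helper below is a hypothetical stand-alone check, not part of PodFact. It only inspects an environment mapping and does not parse the .env file itself.

```python
import os
from typing import List, Mapping, Optional

# Key names taken from the README's .env example.
REQUIRED_KEYS = ("MISTRAL_API_KEY", "TAVILY_API_KEY")


def missing_keys(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required key names that are unset or empty."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]
```

An empty list means both keys are set. Passing a plain dict instead of relying on `os.environ` makes the check easy to unit test.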

Usage

# Submit a podcast for fact-checking
uv run podfact submit "https://example.com/podcast.mp3" --wait

# Check job status
uv run podfact status <job-id>

# View results
uv run podfact results <job-id>

Use --debug-dir <path> with submit to write agent trace files for inspection.

Development

# Run tests
uv run pytest

# Run with verbose output
uv run pytest -v
