Dossier

License: AGPL-3.0 · Python 3.12+ · Flask · PostgreSQL 18 · Docker · HTMX · Ollama

A news reader that puts quality and clarity first.

Most news interfaces are designed for engagement, not comprehension. They are cluttered, dense, and built to keep you scrolling rather than to leave you informed. Raw feeds mix sources of varying quality, introduce translation artefacts, and reproduce the noise of the original publication without filtering any of it.

This project places an LLM between raw news and the reader, merging sources on the same event into a single well-written article and presenting it in a clean, distraction-free interface.


What It Does

  • Fetches news from RSS feeds and open publishers on a schedule
  • Merges multiple sources on the same event and rewrites the result via LLM: correct spelling and grammar, no mixed languages, configurable tone
  • Groups news by topic sections (politics, society, culture, etc.) — click a section to filter the feed
  • Presents content in a clean, accessible interface with large fonts, high contrast, and large touch targets
  • Supports multiple users with independent profiles and preferences
  • Provides text-to-speech when the browser supports it
  • Shows a daily digest — content is ready when you open the app, no waiting

Who It's For

Anyone who wants to read the news without the noise. Users configure their own feed; a family member or caregiver can optionally set up an account on behalf of someone else.

Neither the end user nor anyone acting on their behalf ever touches the codebase. They access the web app only.


Quick Start

Prerequisites

  • Docker and Docker Compose (the commands below use the docker compose plugin syntax)

Setup

The Ollama service is optional in Compose (profile local-llm), but clustering, embeddings, and rewrites all need a reachable Ollama instance. .env.example sets COMPOSE_PROFILES=local-llm, so a copied .env starts Ollama in Docker and runs ollama-init once per docker compose up. ollama-init pulls qwen2.5:7b, qwen2.5:3b, and nomic-embed-text into the ollama_data volume, and the worker waits for it to finish before running. Models are pulled at container start, not during docker build.

With Ollama in Docker (typical):

git clone https://github.com/etorhub/dossier.git
cd dossier
cp .env.example .env

# GPU (default ollama image expects NVIDIA)
docker compose up --build -d

# No NVIDIA GPU: add the CPU override for the ollama service
# docker compose -f docker-compose.yml -f docker-compose.cpu.yml up --build -d

Without a .env, pass the profile explicitly: docker compose --profile local-llm up --build -d.

Ollama on the host instead (e.g. Windows app or ollama serve in WSL): remove or comment out COMPOSE_PROFILES=local-llm in .env, pull the same model tags on the host (ollama pull …), and set OLLAMA_HOST for the worker (for example http://host.docker.internal:11434 on Docker Desktop / WSL). See .env.example.
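When Ollama runs on the host, it can help to confirm that the three model tags named above are actually installed before starting the worker. The following is a minimal sketch, not part of the project: it assumes Ollama's HTTP endpoint /api/tags is reachable at the URL you pass in, and the helper names missing_models and fetch_tags are made up for illustration.

```python
import json
import urllib.request

# Model tags named in the Setup section of this README.
REQUIRED_MODELS = ("qwen2.5:7b", "qwen2.5:3b", "nomic-embed-text")


def missing_models(tags_payload, required=REQUIRED_MODELS):
    """Return the required tags that are absent from an /api/tags payload."""
    installed = {m.get("name", "") for m in tags_payload.get("models", [])}
    # Ollama lists untagged pulls as "<name>:latest", so accept both spellings.
    installed |= {name.removesuffix(":latest") for name in installed}
    return [tag for tag in required if tag not in installed]


def fetch_tags(base_url):
    """GET <base_url>/api/tags and decode the JSON body (hypothetical helper)."""
    url = f"{base_url.rstrip('/')}/api/tags"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


# Example use from the host (the URL is an assumption, adjust to your setup):
#   gaps = missing_models(fetch_tags("http://localhost:11434"))
#   print("all models present" if not gaps else f"missing: {', '.join(gaps)}")
```

Any tag reported as missing can then be pulled with ollama pull on the host. Note that the URL to check from the host usually differs from the OLLAMA_HOST value the worker container uses.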

Wait for services to be healthy (web at http://localhost:5000, worker running, ollama healthy if you use the profile). Then populate with news:
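If you want to script the wait instead of watching the containers by hand, a small polling loop is one option. This sketch only checks that the web port answers HTTP requests; it does not read Docker health status, and the function names is_up and wait_until are illustrative, not part of the project.

```python
import time
import urllib.error
import urllib.request


def is_up(url, timeout=3.0):
    """Return True if the URL answers with any HTTP response at all."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status, so it is at least listening.
        return True
    except OSError:
        # Connection refused, DNS failure, or timeout: not reachable yet.
        return False


def wait_until(check, attempts=30, delay=2.0, sleep=time.sleep):
    """Call check() until it returns True; return the attempt number or None."""
    for attempt in range(1, attempts + 1):
        if check():
            return attempt
        if attempt < attempts:
            sleep(delay)
    return None


# Example use, assuming the web service is published on port 5000 as above:
#   n = wait_until(lambda: is_up("http://localhost:5000"))
#   print(f"web is up after {n} attempt(s)" if n else "web did not come up")
```

Once the loop reports success, running ./scripts/fetch-news.sh is the next step. The worker and Ollama still need their own checks, for example with docker compose ps.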

./scripts/fetch-news.sh

The script fetches feeds, extracts full text, clusters articles, and rewrites them. When it finishes, the app has real content.
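To give a feel for what "clustering articles" can mean, here is a toy sketch that groups embedding vectors by cosine similarity. It is an illustration only: this README does not describe the project's actual clustering algorithm, and the greedy strategy, the 0.8 threshold, and the function names are all assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster(embeddings, threshold=0.8):
    """Greedy single pass: join the first cluster whose seed is similar enough."""
    clusters = []  # each cluster is a list of indices into embeddings
    for i, vec in enumerate(embeddings):
        for members in clusters:
            # Compare against the cluster's first member (its seed).
            if cosine(vec, embeddings[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            # No existing cluster matched, so this article starts a new one.
            clusters.append([i])
    return clusters


# Two articles about one event, two about another (made-up 2D vectors).
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
groups = cluster(vectors)
```

In a real pipeline the vectors would come from an embedding model such as nomic-embed-text, and each resulting group would be handed to the rewrite step as one story.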

Ops dashboard

Operators can monitor the pipeline at the ops dashboard: http://localhost:5001. It shows job runs, feed health, source availability, articles, stories, and user activity. No authentication by default (restrict access at the network level).

docker compose up -d ops

Admin account

A default admin is ready to use: admin@admin.com / admin. Log in to access the app.

To grant admin privileges to another user (for future use):

docker compose exec web flask make-admin your@email.com

See docs/ADMIN_DASHBOARD.md for ops dashboard documentation.

Manual pipeline control

The scheduler runs jobs on a schedule. To run them manually:

| Command | Where | Description |
| --- | --- | --- |
| flask seed-sources | Web | Load sources from config/sources.yaml (auto-run on startup) |
| python -m app.worker_cli fetch-feeds | Worker | Fetch all due RSS feeds |
| python -m app.worker_cli enrich-articles | Worker | Extract full article content for pending articles |
| python -m app.worker_cli cluster-articles | Worker | Embed and cluster today's articles |
| python -m app.worker_cli rewrite-articles | Worker | Rewrite articles for all user profiles |
| python -m app.worker_cli run-pipeline | Worker | Full pipeline once (seed → fetch → enrich → cluster → rewrite) |

With Docker:

docker compose exec worker python -m app.worker_cli run-pipeline

Or use ./scripts/fetch-news.sh for the same result.
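If you prefer to run the stages one at a time, for example to stop at the first failure and inspect it, the individual commands can be driven from a short script. This is a sketch under assumptions: it presumes the Compose services are named web and worker as in the commands above, and it does not claim to reproduce what run-pipeline does internally. The helpers compose_exec and run_stages are hypothetical.

```python
import subprocess

# Stage order as listed for run-pipeline: seed, fetch, enrich, cluster, rewrite.
STAGES = [
    ("web", ["flask", "seed-sources"]),
    ("worker", ["python", "-m", "app.worker_cli", "fetch-feeds"]),
    ("worker", ["python", "-m", "app.worker_cli", "enrich-articles"]),
    ("worker", ["python", "-m", "app.worker_cli", "cluster-articles"]),
    ("worker", ["python", "-m", "app.worker_cli", "rewrite-articles"]),
]


def compose_exec(service, command):
    """Build the argv for running a command inside a Compose service."""
    # -T disables pseudo-TTY allocation, so this also works from cron or CI.
    return ["docker", "compose", "exec", "-T", service, *command]


def run_stages(stages=STAGES, runner=subprocess.run):
    """Run each stage in order, stopping at the first one that fails."""
    for service, command in stages:
        argv = compose_exec(service, command)
        print("running:", " ".join(argv))
        # check=True raises CalledProcessError on a non-zero exit status.
        runner(argv, check=True)


# Example use from the repository root, with the stack already up:
#   run_stages()
```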

Running locally (without Docker)

# Requires Python 3.12+ and a running PostgreSQL instance
pip install -r requirements.txt
flask run

Tech Stack

| Layer | Technology |
| --- | --- |
| Backend | Python 3.12+ / Flask |
| Database | PostgreSQL 18 |
| LLM | Ollama (local, no API key) |
| Frontend | HTML + CSS + HTMX (no JavaScript frameworks) |
| Scheduling | APScheduler (worker container) |
| Packaging | Docker + docker-compose |

See docs/TECH_STACK.md for full details.


Documentation

| Document | Description |
| --- | --- |
| CONTRIBUTING.md | How to contribute: setup, code standards, commits, PRs |
| CODE_OF_CONDUCT.md | Community standards and enforcement |
| SECURITY.md | Security policy and vulnerability reporting |
| CLAUDE.md | AI assistant context (Claude Code): coding rules, architecture constraints, design principles |
| .cursor/rules/ | Cursor IDE rules: same context via project-context.mdc (always apply) plus architecture, accessibility, LLM, news-source-discovery |
| docs/TECH_STACK.md | Tech stack, project structure, dependencies, Docker setup |
| docs/ARCHITECTURE.md | System architecture, database schema, component map, request lifecycle |
| docs/ADMIN_DASHBOARD.md | Ops dashboard: pipeline monitoring, job history, source availability, user activity, incidents |
| docs/I18N.md | Internationalization: locale selection, translation catalogs, updating strings |
| docs/MVP_PLAN.md | Phased MVP plan with tasks and success criteria |
| docs/news_source_discovery_agent.md | News source discovery pipeline specification |

Accessibility

Accessibility is a constraint, not a feature. Good defaults benefit all users:

  • Minimum 48x48px touch targets on all interactive elements
  • Base font size 22px, line height 1.6
  • WCAG AA contrast minimum (4.5:1), AAA target (7:1) in high-contrast mode
  • One article at a time — no infinite scroll
  • Text-to-speech via Web Speech API (hidden when not supported)
  • Semantic HTML throughout
  • No hover-only interactions, no timed content
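The contrast figures in the list above (4.5:1 for AA, 7:1 for AAA) can be checked for any colour pair with the WCAG 2.x relative-luminance formula. The sketch below is a standalone illustration for contributors choosing colours. It is not taken from the project's code, and the function names are made up.

```python
def _linear(channel):
    """Convert one 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) tuple with 0-255 channels."""
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg, bg):
    """WCAG contrast ratio, always >= 1 regardless of argument order."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


def wcag_level(ratio):
    """Classify a ratio against the normal-size text thresholds quoted above."""
    if ratio >= 7:
        return "AAA"
    if ratio >= 4.5:
        return "AA"
    return "fail"


# Black text on a white background is the maximum possible contrast.
print(wcag_level(contrast_ratio((0, 0, 0), (255, 255, 255))))
```

Because the ratio is symmetric, swapping foreground and background gives the same result.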

License

AGPL-3.0. See LICENSE for details.

The project is a reading aid, not a republisher. Every article links to and credits the original source. Copyright remains with the publisher.
