NewsGraph

An agentic newsletter pipeline built with LangGraph and local LLMs. Give it a topic and a list of URLs — it crawls the pages, summarises the content, and writes a newsletter. No external API keys needed.

How it works

START
  ↓
Crawl Node      — fetches each URL via the MCP web-crawl tool,
                  prunes noise, and returns semantic text chunks
  ↓
Research Node   — grounds an LLM summary in the crawled content
  ↓
Writer Node     — turns the summary into a newsletter draft
  ↓
END

Crawling runs in a separate MCP server (newsgraph-mcp), so other agents can reuse it and it can be inspected independently. The agent (newsgraph-agent) connects to it over SSE.
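The three-node flow in the diagram above can be sketched in plain Python. This is only an illustration of how state passes from node to node, not the project's code: it does not use LangGraph, the node bodies are stand-ins for the real crawl and LLM calls, and every state field other than `topic` and `urls` (that is, `chunks`, `summary`, `newsletter`) is an assumed name.

```python
from typing import Callable, TypedDict


class NewsState(TypedDict, total=False):
    # "topic" and "urls" come from the README; the other keys are assumptions.
    topic: str
    urls: list[str]
    chunks: list[str]
    summary: str
    newsletter: str


def crawl_node(state: NewsState) -> NewsState:
    # Stand-in for the MCP web-crawl call: one fake chunk per URL.
    return {"chunks": [f"text from {url}" for url in state["urls"]]}


def research_node(state: NewsState) -> NewsState:
    # Stand-in for the LLM summary grounded in the crawled chunks.
    return {"summary": f"{len(state['chunks'])} sources on {state['topic']}"}


def writer_node(state: NewsState) -> NewsState:
    # Stand-in for the LLM call that drafts the newsletter.
    return {"newsletter": f"# {state['topic']}\n\n{state['summary']}"}


def run_pipeline(
    state: NewsState, nodes: list[Callable[[NewsState], NewsState]]
) -> NewsState:
    # Each node returns a partial update that is merged into the shared state.
    for node in nodes:
        state = {**state, **node(state)}
    return state


result = run_pipeline(
    {
        "topic": "Indian Startup Ecosystem",
        "urls": ["https://en.wikipedia.org/wiki/Startup_India"],
    },
    [crawl_node, research_node, writer_node],
)
```

Returning partial updates rather than mutating the state keeps each node easy to test on its own, which is the same reason the diagram separates crawling, research, and writing.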

Project layout

newsgraph/
├── newsgraph-mcp/        MCP server — web crawling + markdown chunking
│   └── src/
│       ├── server.py
│       ├── tools/
│       │   └── web_crawl_tool.py
│       └── utils/
│           └── markdown_cleaner.py
├── newsgraph-agent/      LangGraph agent
│   └── src/
│       ├── main.py       entry point — set your topic and URLs here
│       ├── graph.py
│       ├── nodes.py
│       ├── state.py
│       └── llm.py
└── start.sh              starts both services

Requirements

uv — Python package manager
Ollama running locally with phi3:mini pulled

ollama pull phi3:mini

Setup

Each sub-project manages its own virtual environment. Run these once:

# MCP server
cd newsgraph-mcp
uv sync
uv run crawl4ai-setup   # downloads Playwright/Chromium, needed once

# Agent
cd ../newsgraph-agent
uv sync

Running

From the project root:

./start.sh

This starts the MCP server in the background, waits for it to be ready, runs the agent, then shuts everything down when done. Server logs go to mcp-server.log.
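The "waits for it to be ready" step could look roughly like the following. start.sh itself is a shell script and its actual readiness check is not shown here, so this is a hypothetical Python equivalent: the function name `wait_for_port` and its parameters are made up for illustration, and it only checks that the TCP port accepts connections.

```python
import socket
import time


def wait_for_port(
    host: str, port: int, timeout: float = 30.0, interval: float = 0.5
) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or connect timed out): pause, then retry.
            time.sleep(interval)
    return False
```

For example, `wait_for_port("localhost", 8000)` would block until the MCP server accepts connections or 30 seconds pass.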

To change the topic or URLs, edit newsgraph-agent/src/main.py:

result = graph.invoke(
    {
        "topic": "Indian Startup Ecosystem",
        "urls": [
            "https://en.wikipedia.org/wiki/Startup_India",
            "https://en.wikipedia.org/wiki/Shark_Tank_India",
        ],
    }
)

To run the MCP server on a port other than the default 8000:

PORT=9000 ./start.sh
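How the server and agent read `PORT` is not shown in this README. One plausible approach, sketched here under that assumption, is to read the variable with a fallback of 8000 and derive the SSE URL from it. The helper name `sse_url` is hypothetical.

```python
import os
from typing import Mapping


def sse_url(env: Mapping[str, str] = os.environ, default_port: int = 8000) -> str:
    """Build the MCP server's SSE URL from the PORT environment variable."""
    # Fall back to the default when PORT is unset.
    port = int(env.get("PORT", default_port))
    if not 1 <= port <= 65535:
        raise ValueError(f"PORT out of range: {port}")
    return f"http://localhost:{port}/sse"
```

With `PORT=9000` set, this would yield `http://localhost:9000/sse`, so the agent and the server agree on the port as long as both read the same variable.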

MCP server

The MCP server exposes one tool:

| Tool | Description |
|------|-------------|
| `web_crawl(url)` | Fetches a page, prunes noise with `crawl4ai`, and returns chunked markdown ready for embedding |
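To give a feel for what "chunked markdown" means, here is a minimal standard-library sketch that splits markdown at headings and then at paragraph breaks. It is a stand-in only: the project lists `unstructured` for chunking, and the function name `chunk_markdown` and the `max_chars` limit are assumptions made for this example.

```python
import re


def chunk_markdown(markdown: str, max_chars: int = 500) -> list[str]:
    """Split markdown into heading-led chunks, aiming to stay under max_chars.

    A single paragraph longer than max_chars is kept whole.
    """
    # Split before every line that starts with 1-6 '#' characters and a space.
    sections = re.split(r"(?m)^(?=#{1,6} )", markdown)
    chunks: list[str] = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        # Oversized sections are cut further on blank lines (paragraph breaks).
        current = ""
        for para in re.split(r"\n\s*\n", section):
            candidate = f"{current}\n\n{para}" if current else para
            if current and len(candidate) > max_chars:
                chunks.append(current)
                current = para
            else:
                current = candidate
        chunks.append(current)
    return chunks
```

Starting each chunk at a heading keeps a section's title together with its opening text, which tends to help when the chunks are later embedded or passed to an LLM.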

You can inspect it independently with MCP Inspector:

cd newsgraph-mcp
uv run python -m src.server   # terminal 1

npx @modelcontextprotocol/inspector   # terminal 2

Open http://localhost:6274, set the transport to SSE and the URL to http://localhost:8000/sse, then connect.

Tech stack

| Component | Library |
|-----------|---------|
| Agent graph | LangGraph |
| Local LLM | Ollama via `langchain-ollama` |
| Web crawling | crawl4ai |
| Markdown chunking | unstructured |
| MCP transport | Model Context Protocol Python SDK |
