github-hc/newsgraph


NewsGraph

An agentic newsletter pipeline built with LangGraph and local LLMs. Give it a topic and a list of URLs — it crawls the pages, summarises the content, and writes a newsletter. No external API keys needed.

How it works

START
  ↓
Crawl Node      — fetches each URL via the MCP web-crawl tool,
                  prunes noise, and returns semantic text chunks
  ↓
Research Node   — grounds an LLM summary in the crawled content
  ↓
Writer Node     — turns the summary into a newsletter draft
  ↓
END
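
The flow above can be sketched without any dependencies as three functions that pass a shared state dict along. This is a minimal stand-in, not the project's code: the real graph is built with LangGraph, and the state keys `chunks`, `summary`, and `newsletter`, the `run_pipeline` helper, and the injected `fetch` and `llm` callables are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

# "topic" and "urls" appear in the README; every other key is hypothetical.
State = Dict[str, object]


def crawl_node(state: State, fetch: Callable[[str], List[str]]) -> State:
    """Collect text chunks for every URL using the supplied fetch function."""
    chunks: List[str] = []
    for url in state["urls"]:
        chunks.extend(fetch(url))
    return {**state, "chunks": chunks}


def research_node(state: State, llm: Callable[[str], str]) -> State:
    """Ask the LLM for a summary grounded in the crawled chunks."""
    material = "\n\n".join(state["chunks"])
    prompt = f"Summarise this material about {state['topic']}:\n\n{material}"
    return {**state, "summary": llm(prompt)}


def writer_node(state: State, llm: Callable[[str], str]) -> State:
    """Turn the summary into a newsletter draft."""
    prompt = (
        f"Write a newsletter on {state['topic']} "
        f"from this summary:\n\n{state['summary']}"
    )
    return {**state, "newsletter": llm(prompt)}


def run_pipeline(
    topic: str,
    urls: List[str],
    fetch: Callable[[str], List[str]],
    llm: Callable[[str], str],
) -> State:
    """Run crawl -> research -> writer in order, mirroring START to END."""
    state: State = {"topic": topic, "urls": urls}
    state = crawl_node(state, fetch)
    state = research_node(state, llm)
    return writer_node(state, llm)
```

Passing `fetch` and `llm` in as arguments keeps the sketch runnable with stubs, so the node order can be checked without a crawler or a model.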

The crawling runs as a separate MCP server (newsgraph-mcp) so it can be reused by other agents or inspected independently. The agent (newsgraph-agent) connects to it over SSE.

Project layout

newsgraph/
├── newsgraph-mcp/        MCP server — web crawling + markdown chunking
│   └── src/
│       ├── server.py
│       ├── tools/
│       │   └── web_crawl_tool.py
│       └── utils/
│           └── markdown_cleaner.py
├── newsgraph-agent/      LangGraph agent
│   └── src/
│       ├── main.py       entry point — set your topic and URLs here
│       ├── graph.py
│       ├── nodes.py
│       ├── state.py
│       └── llm.py
└── start.sh              starts both services

Requirements

  • uv: Python package manager
  • Ollama running locally with phi3:mini pulled:

ollama pull phi3:mini

Setup

Each sub-project manages its own virtual environment. Run these once:

# MCP server
cd newsgraph-mcp
uv sync
uv run crawl4ai-setup   # downloads Playwright/Chromium, needed once

# Agent
cd ../newsgraph-agent
uv sync

Running

From the project root:

./start.sh

This starts the MCP server in the background, waits for it to be ready, runs the agent, then shuts everything down when done. Server logs go to mcp-server.log.
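
The "waits for it to be ready" step usually amounts to polling the server's port until it accepts connections. The sketch below shows that idea in Python. start.sh is a shell script whose contents are not shown here, so this is not its implementation; the function name `wait_for_port` and the timeout values are assumptions.

```python
import socket
import time


def wait_for_port(
    host: str, port: int, timeout: float = 30.0, interval: float = 0.5
) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or unreachable): wait briefly and retry.
            time.sleep(interval)
    return False
```

A caller would start the server process, call `wait_for_port("localhost", 8000)`, and only launch the agent if it returns True.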

To change the topic or URLs, edit newsgraph-agent/src/main.py:

result = graph.invoke(
    {
        "topic": "Indian Startup Ecosystem",
        "urls": [
            "https://en.wikipedia.org/wiki/Startup_India",
            "https://en.wikipedia.org/wiki/Shark_Tank_India",
        ],
    }
)

To run on a different port:

PORT=9000 ./start.sh
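
For a custom port to work, the agent has to connect to the same port the server listens on. One way to do that is to derive the SSE endpoint from the same PORT variable, as sketched below. The default of 8000 and the /sse path match the Inspector URL given in the MCP server section; the helper name `sse_url` is hypothetical, and the project's actual lookup may differ.

```python
import os
from typing import Mapping, Optional


def sse_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Build the MCP server's SSE endpoint from PORT, defaulting to 8000."""
    env = os.environ if env is None else env
    # int() rejects non-numeric values early instead of producing a bad URL.
    port = int(env.get("PORT", "8000"))
    return f"http://localhost:{port}/sse"
```

With `PORT=9000` set, `sse_url()` yields `http://localhost:9000/sse`; with PORT unset it falls back to port 8000.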

MCP server

The MCP server exposes one tool:

Tool             Description
web_crawl(url)   Fetches a page, prunes noise with crawl4ai, and returns chunked markdown ready for embedding
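
To make "chunked markdown" concrete, here is a dependency-free sketch that splits markdown at headings and then groups paragraphs up to a size limit. It is a stand-in for illustration only: the project relies on crawl4ai and unstructured for this step, and the function name `chunk_markdown` and the `max_chars` limit are assumptions.

```python
import re
from typing import List


def chunk_markdown(markdown: str, max_chars: int = 1000) -> List[str]:
    """Split markdown into heading-led chunks, aiming for at most max_chars each.

    A single paragraph longer than max_chars is kept whole rather than cut.
    """
    # Split just before every ATX heading so each section keeps its heading line.
    sections = re.split(r"(?m)^(?=#{1,6} )", markdown)
    chunks: List[str] = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        current = ""
        # Group paragraphs until adding one more would exceed the limit.
        for paragraph in section.split("\n\n"):
            candidate = f"{current}\n\n{paragraph}" if current else paragraph
            if current and len(candidate) > max_chars:
                chunks.append(current)
                current = paragraph
            else:
                current = candidate
        if current:
            chunks.append(current)
    return chunks
```

Splitting at headings first means a chunk never spans two sections, which tends to keep each chunk on a single topic.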

You can inspect it independently with MCP Inspector:

cd newsgraph-mcp
uv run python -m src.server   # terminal 1

npx @modelcontextprotocol/inspector   # terminal 2

Open http://localhost:6274, set the transport to SSE and the URL to http://localhost:8000/sse, then connect.

Tech stack

Component           Library
Agent graph         LangGraph
Local LLM           Ollama via langchain-ollama
Web crawling        crawl4ai
Markdown chunking   unstructured
MCP transport       Model Context Protocol Python SDK

About

A state-driven AI newsletter agent leveraging LangGraph and SLMs. Features async multi-source scraping, automated content curation planning, and robust human-in-the-loop validation checkpoints
