ROBROICH/bundestag-rag-public

🏛️ Bundestag AI Lens: Chat with Your Parliament

An open-source AI application that makes German parliamentary documents accessible, searchable, and understandable for everyone.

Note: This project is for educational and research purposes. Please respect the Bundestag DIP API terms of service. The code was developed with GenAI support and is not intended for production use without review.

Live Demo

⏳ Cold start: The demo runs on Azure Container Apps with scale-to-zero enabled to minimize costs. The first request after inactivity may take 30 to 60 seconds while the container starts up.


💡 What This Does

Citizens, journalists, and researchers can ask questions in plain language about German parliamentary activity and receive:

  • AI-powered summaries of 50+ page legal documents, explained the way a journalist would
  • Citizen impact analysis: what does this law mean for everyday people?
  • External media coverage: AI-curated news links from major German outlets (optional, via OpenAI web search)
  • Structured search results across Vorgänge (procedures), Drucksachen (documents), and Plenarprotokolle (transcripts)
  • Bilingual support: German and English interface with LLM-based title translation
  • Full transparency: see which tools the AI calls, what data it fetches, and how long each step takes

👥 Who Benefits

| Audience | Value |
| --- | --- |
| 🧑‍🤝‍🧑 Citizens | Understand complex legislation without legal expertise. Direct access to government decisions. |
| 📰 Journalists | Quickly find and summarize parliamentary activity across legislative periods. |
| 🎓 Researchers | Search structured parliamentary data with filters for party, document type, and date ranges. |
| 🏛️ Government | Demonstrate transparent, digital-first citizen engagement. |

🏗️ Architecture

┌──────────────────────────────────────────────────────────┐
│                    Browser (SPA)                         │
│  Vanilla JS · SSE Streaming · Reasoning Panel · i18n     │
└──────────────────────┬───────────────────────────────────┘
                       │ POST /api/chat/stream
┌──────────────────────▼───────────────────────────────────┐
│              FastAPI Backend (Python)                    │
│                                                          │
│  ┌─────────────┐  ┌──────────────┐  ┌────────────────┐   │
│  │ Phase 1:    │  │ Tool         │  │ Phase 2:       │   │
│  │ LLM decides │─▶│ Dispatcher   │─▶│ LLM generates  │   │
│  │ which tools │  │ (10 tools)   │  │ answer         │   │
│  └─────────────┘  └──────┬───────┘  └───────┬────────┘   │
│                          │                  │            │
│  ┌───────────────────────▼──────────────────▼────────┐   │
│  │ Security: Rate Limit · Input Validation · CSP     │   │
│  │           Security Headers · CORS · HSTS (opt)    │   │
│  └───────────────────────────────────────────────────┘   │
└──────────────────────┬───────────────────────────────────┘
          ┌────────────┼────────────┐
          ▼            ▼            ▼
┌──────────────────┐ ┌───────────┐ ┌───────────────────────┐
│ OpenAI GPT-5     │ │ OpenAI    │ │ Bundestag DIP API     │
│ mini (400K ctx)  │ │ Web Search│ │ search.dip.bundestag  │
│ Function calling │ │ (optional)│ │ .de/api/v1            │
└──────────────────┘ └───────────┘ └───────────────────────┘
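The Tool Dispatcher box in the diagram can be pictured as a small registry that maps tool names to async handlers and runs whatever Phase 1 requested. The following is only a minimal sketch under stated assumptions: the registry, the `tool` decorator, the `dispatch` function, and the shape of a tool call (`name` plus a JSON `arguments` string) are hypothetical and not taken from the repository, and the handlers are stand-ins that do not call the DIP API.

```python
import asyncio
import json
from typing import Any, Awaitable, Callable

# Hypothetical registry: tool name -> async handler.
# The real app exposes 10 tools; two stand-ins are registered here.
ToolHandler = Callable[..., Awaitable[dict[str, Any]]]
TOOLS: dict[str, ToolHandler] = {}


def tool(name: str) -> Callable[[ToolHandler], ToolHandler]:
    """Register an async function under a tool name."""
    def register(fn: ToolHandler) -> ToolHandler:
        TOOLS[name] = fn
        return fn
    return register


@tool("search_vorgaenge")
async def search_vorgaenge(query: str, **filters: Any) -> dict[str, Any]:
    # Stand-in for an HTTP request to the DIP API.
    return {"tool": "search_vorgaenge", "query": query, "filters": filters}


@tool("get_vorgang")
async def get_vorgang(vorgang_id: int) -> dict[str, Any]:
    # Stand-in for a metadata lookup by ID.
    return {"tool": "get_vorgang", "id": vorgang_id}


async def dispatch(tool_calls: list[dict[str, str]]) -> list[dict[str, Any]]:
    """Run all tool calls requested by Phase 1 concurrently."""
    async def run_one(call: dict[str, str]) -> dict[str, Any]:
        handler = TOOLS.get(call["name"])
        if handler is None:
            return {"error": f"unknown tool: {call['name']}"}
        # Assumption: arguments arrive as a JSON-encoded string.
        args = json.loads(call["arguments"])
        return await handler(**args)

    return list(await asyncio.gather(*(run_one(c) for c in tool_calls)))
```

Unknown tool names produce an error entry instead of raising, so one bad call from the model does not abort the other lookups.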

Key Design Decisions

| Decision | Rationale |
| --- | --- |
| Single-file HTML UI | Zero build tools, instant reload, no npm/webpack complexity |
| SSE streaming | Real-time token-by-token display with reasoning transparency |
| Server-side table formatting | Search results bypass the LLM entirely, which is faster and cheaper |
| 2-phase LLM approach | Phase 1 (compact prompt) picks tools; Phase 2 (task-specific prompt) generates the answer |
| MCP + REST dual interface | Same tools exposed both as OpenAI functions and via the MCP protocol |
| OpenAI Web Search | Optional news coverage via the Responses API; runs in parallel with Phase 2, so it adds little or no latency |
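To make the SSE streaming decision more concrete, here is a minimal sketch of how a backend might serialize events into Server-Sent Event frames. The helper names `sse_event` and `stream_search_reply` and the payload fields are hypothetical. Carrying the event name in the `event:` field is also an assumption, since the app may instead put a type inside the JSON payload. The event names themselves are the ones this README lists in the data flow.

```python
import json
from typing import Any, Iterator


def sse_event(event: str, data: dict[str, Any]) -> str:
    """Serialize one Server-Sent Event frame.

    json.dumps escapes newlines, so the payload always fits on a single
    "data:" line. The blank line at the end terminates the frame.
    """
    payload = json.dumps(data, ensure_ascii=False)
    return f"event: {event}\ndata: {payload}\n\n"


def stream_search_reply(tool_name: str, table_md: str) -> Iterator[str]:
    """Yield the frames for a search-only reply, in order."""
    yield sse_event("model_thinking", {"phase": 1})
    yield sse_event("tool_call", {"name": tool_name})
    yield sse_event("tool_result", {"name": tool_name, "ok": True})
    yield sse_event("content", {"text": table_md})
    yield sse_event("done", {})
```

In FastAPI, a generator like this could be wrapped in `StreamingResponse(..., media_type="text/event-stream")`.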

🔄 Data Flow

User types: "Welche Klimaschutzgesetze wurden 2026 verabschiedet?" ("Which climate protection laws were passed in 2026?")
  │
  ▼
1. POST /api/chat/stream { message, history, language }
  │
  ▼
2. Phase 1 LLM (GPT-5 mini, compact prompt)
   → Decides: call search_vorgaenge(query="Klimaschutz", vorgangstyp="Gesetzgebung", date_from="2026-01-01")
  │
  ▼
3. Tool Dispatcher executes async DIP API calls
   → GET /vorgang?f.suche=Klimaschutz&f.vorgangstyp=Gesetzgebung&f.datum.start=2026-01-01
   → Results cached (SHA-256 key, 1h TTL, 256 entries)
  │
  ▼
4. Results formatted as Markdown table (search-only fast path)
   → No Phase 2 LLM call needed for pure searches
   → Each row has: [📄 DIP] [🤖 AI Summary] [👤 Citizen Impact] links
  │
  ▼
5. SSE events streamed to browser:
   → "model_thinking" → "tool_call" → "tool_result" → "content" → "done"
   → Reasoning panel shows each step with live timers
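The result cache mentioned in step 3 (SHA-256 key, 1h TTL, 256 entries) could look roughly like the sketch below. The class name `TTLCache`, the way the key is derived from endpoint and parameters, and the least-recently-used eviction policy are assumptions made for illustration. The README only states the key hash, the TTL, and the size limit.

```python
import hashlib
import json
import time
from collections import OrderedDict
from typing import Any, Optional


class TTLCache:
    """In-memory cache with a time-to-live and a bounded number of entries."""

    def __init__(self, max_entries: int = 256, ttl_seconds: float = 3600.0) -> None:
        self.max_entries = max_entries
        self.ttl_seconds = ttl_seconds
        self._store = OrderedDict()  # key -> (stored_at, value)

    @staticmethod
    def make_key(endpoint: str, params: dict[str, Any]) -> str:
        # Sorting keys makes the hash independent of parameter order.
        canonical = json.dumps({"endpoint": endpoint, "params": params}, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def get(self, key: str, now: Optional[float] = None) -> Any:
        now = time.monotonic() if now is None else now
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if now - stored_at > self.ttl_seconds:
            del self._store[key]  # expired
            return None
        self._store.move_to_end(key)  # mark as recently used
        return value

    def put(self, key: str, value: Any, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self._store[key] = (now, value)
        self._store.move_to_end(key)
        while len(self._store) > self.max_entries:
            self._store.popitem(last=False)  # evict the least recently used entry
```

The optional `now` argument exists only to make expiry easy to exercise without waiting. Callers would normally omit it.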

When the user clicks "🤖 AI Summary":

6. New chat message: "Fasse den Vorgang 332067 zusammen (ID:332067)" ("Summarize procedure 332067")
  │
  ▼
7. Phase 1 → calls get_vorgang_details(332067) + fetches Drucksache/Plenarprotokoll text
  │
  ▼
8. Phase 2 (Summary prompt) → Journalistic explanation of the law's substance:
   problem it solves, real-world significance, key actors, financial impact
   ║
   ║ (parallel)  Web Search → OpenAI Responses API with domain-filtered
   ║              news search → appends "📰 External Media Coverage" section
   ▼
9. Complete response with AI summary + optional news links
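The parallel branch in step 8 can be sketched with `asyncio` tasks: the news search starts at the same time as the Phase 2 summary and is only awaited once the summary is ready. Everything here is illustrative. `generate_summary` and `search_news` are hypothetical stand-ins that sleep instead of calling OpenAI, and `example.org` is a placeholder URL.

```python
import asyncio


async def generate_summary(document_text: str) -> str:
    # Stand-in for the Phase 2 LLM call.
    await asyncio.sleep(0.05)
    return f"Summary of {len(document_text)} characters of source text."


async def search_news(topic: str) -> list[str]:
    # Stand-in for the optional web search call.
    await asyncio.sleep(0.05)
    return [f"https://example.org/news?q={topic}"]


async def summarize_with_news(document_text: str, topic: str, web_search_enabled: bool) -> str:
    # Start both calls before awaiting either, so they overlap in time.
    summary_task = asyncio.create_task(generate_summary(document_text))
    news_task = asyncio.create_task(search_news(topic)) if web_search_enabled else None

    summary = await summary_task
    if news_task is None:
        return summary
    try:
        links = await news_task
    except Exception:
        return summary  # news coverage is optional; never fail the summary over it
    if not links:
        return summary
    section = "\n".join(f"- {url}" for url in links)
    return f"{summary}\n\n📰 External Media Coverage\n{section}"
```

With this shape, the search adds latency only when it takes longer than the summary itself.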

🛡️ Security

| Layer | Implementation |
| --- | --- |
| Rate Limiting | 30 requests/minute per IP (in-memory, sliding window) |
| Input Validation | Message: max 10,000 chars · History: max 50 messages · Language: de or en only |
| Security Headers | X-Frame-Options: DENY · X-Content-Type-Options: nosniff · Referrer-Policy · Permissions-Policy |
| CSP | default-src 'self' · frame-ancestors 'none' · img-src 'self' data: https: |
| HSTS | Opt-in via ENABLE_HSTS=true (recommended for production) |
| XSS Prevention | No inline onclick handlers; delegated event listeners with data-* attributes · URL scheme validation blocks javascript: and data: URLs |
| CORS | Configurable origins (no wildcard) · Defaults to localhost |
| Error Sanitization | Internal errors are never exposed to the client; generic messages only |
| XSRF Protection | Enabled in Streamlit config |
| Docker | Non-root appuser · Minimal base image (python:3.11-slim) |
| Azure Secrets | Managed identity for ACR · @secure() Bicep parameters · No admin credentials |
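As an illustration of the rate limiting row, an in-memory sliding-window limiter with the stated 30 requests per minute per IP might be sketched as follows. The class name and method signature are hypothetical, and this is not the repository's implementation.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowRateLimiter:
    """Allow at most max_requests per client within the last window_seconds."""

    def __init__(self, max_requests: int = 30, window_seconds: float = 60.0) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client IP -> timestamps of accepted requests

    def allow(self, client_ip: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[client_ip]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

Because the state lives in process memory, a limiter like this counts per container instance. Behind several replicas, each replica would keep its own counts.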

🚀 Quick Start

Prerequisites

Local Development (Chat App)

# 1. Clone and setup
git clone https://github.com/ROBROICH/bundestag-rag-public.git
cd bundestag-rag-public

# 2. Create virtual environment
python -m venv .venv
.\.venv\Scripts\Activate.ps1        # Windows PowerShell; on macOS/Linux use: source .venv/bin/activate
pip install -r requirements-mcp.txt

# 3. Configure environment
cp env.template .env
# Edit .env with your API keys

# 4. Start the Chat App
python -m uvicorn src.chat.app:app --host 127.0.0.1 --port 8000 --reload

Access at: http://localhost:8000

Docker

docker build -f deployment/docker/Dockerfile.mcp -t bundestag-chat .
docker run -p 8000:8000 --env-file .env bundestag-chat

Azure Container Apps

.\deployment\azure\deploy-container-apps.ps1 `
    -ResourceGroup "rg-bundestag" `
    -Location "westeurope"

See the Azure Deployment Docs for the full guide.


🔧 Environment Variables

| Variable | Required | Description |
| --- | --- | --- |
| OPENAI_API_KEY | ✅ | OpenAI API key with GPT-5 mini access |
| BUNDESTAG_API_KEY | ✅ | DIP API key (free) |
| ALLOWED_ORIGINS | ❌ | CORS origins (default: localhost:8000) |
| ENABLE_HSTS | ❌ | Enable Strict-Transport-Security (default: false) |
| ENABLE_WEB_SEARCH | ❌ | Enable news coverage via OpenAI web search in summaries (default: false) |
| MCP_HOST | ❌ | Bind address (default: 127.0.0.1) |
| LOG_LEVEL | ❌ | Logging level (default: INFO) |
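For orientation, the variables above could be read into a typed settings object along these lines. This is a sketch and not the project's `config/settings.py`: the `Settings` dataclass and `load_settings` function are hypothetical. The comma-separated format for `ALLOWED_ORIGINS`, the `http://` scheme on its default, and the accepted spellings of boolean values are also assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _as_bool(value: str) -> bool:
    # Assumption: these spellings count as "enabled".
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    bundestag_api_key: str
    allowed_origins: tuple[str, ...]
    enable_hsts: bool
    enable_web_search: bool
    mcp_host: str
    log_level: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, failing fast on missing keys."""
    missing = [n for n in ("OPENAI_API_KEY", "BUNDESTAG_API_KEY") if not env.get(n)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    # Assumption: multiple origins are given as a comma-separated list.
    origins = env.get("ALLOWED_ORIGINS", "http://localhost:8000")
    return Settings(
        openai_api_key=env["OPENAI_API_KEY"],
        bundestag_api_key=env["BUNDESTAG_API_KEY"],
        allowed_origins=tuple(o.strip() for o in origins.split(",") if o.strip()),
        enable_hsts=_as_bool(env.get("ENABLE_HSTS", "false")),
        enable_web_search=_as_bool(env.get("ENABLE_WEB_SEARCH", "false")),
        mcp_host=env.get("MCP_HOST", "127.0.0.1"),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )
```

Passing a plain dict instead of `os.environ` keeps the loader easy to exercise in isolation.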

📂 Project Structure

bundestag-rag-api/
├── src/
│   ├── chat/
│   │   ├── app.py              # FastAPI backend: endpoints, tools, prompts, security
│   │   └── static/
│   │       └── index.html      # Single-file chat UI (vanilla JS, SSE streaming)
│   ├── mcp/
│   │   └── server.py           # MCP server (10 tools via FastMCP SSE transport)
│   └── web/
│       └── openai_config.py    # Model configuration (GPT-5 mini, token limits)
├── config/
│   └── settings.py             # API URLs, timeouts, cache settings
├── deployment/
│   ├── docker/                 # Dockerfile, Dockerfile.mcp, Dockerfile.optimized
│   ├── azure/                  # Bicep template + deploy script for Container Apps
│   └── local/                  # docker-compose.yml for local dev
├── main.py                     # Entry point (chat/mcp/streamlit modes)
├── requirements.txt            # Full dependencies (pinned versions)
├── requirements-mcp.txt        # Minimal production dependencies
└── env.template                # Environment variable template

🛠️ API Tools

The LLM has access to 10 tools that query the official Bundestag DIP API:

| Tool | Description |
| --- | --- |
| search_vorgaenge | Search parliamentary procedures (filter by party, type, date, legislative period) |
| search_drucksachen | Search official printed documents |
| search_plenarprotokolle | Search plenary session transcripts |
| get_vorgang | Get procedure metadata by ID |
| get_vorgang_details | Get full procedure with linked Drucksache text |
| get_drucksache | Get document metadata |
| get_drucksache_text | Get full document text (up to 250K chars) |
| get_plenarprotokoll_text | Get full plenary transcript text |
| search_personen | Search Bundestag members by name |
| search_aktivitaeten | Search parliamentary activities (speeches, votes, motions) |

These same tools are exposed via MCP protocol at /mcp for integration with other LLM clients.
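To show how a tool call might translate into a DIP request, here is a small sketch that builds the URL for `search_vorgaenge`. The function name `build_vorgang_search_url` is hypothetical. The base URL and the `f.suche`, `f.vorgangstyp`, and `f.datum.start` parameters are the ones shown earlier in this README. Authentication with `BUNDESTAG_API_KEY` is deliberately left out.

```python
from typing import Optional
from urllib.parse import urlencode

# Host and path as shown in the architecture diagram.
DIP_BASE_URL = "https://search.dip.bundestag.de/api/v1"


def build_vorgang_search_url(
    query: str,
    vorgangstyp: Optional[str] = None,
    date_from: Optional[str] = None,
) -> str:
    """Map search_vorgaenge arguments to a DIP query URL (API key not included)."""
    params = {"f.suche": query}
    if vorgangstyp:
        params["f.vorgangstyp"] = vorgangstyp
    if date_from:
        params["f.datum.start"] = date_from  # ISO date, e.g. "2026-01-01"
    return f"{DIP_BASE_URL}/vorgang?{urlencode(params)}"
```

`urlencode` percent-encodes umlauts and spaces, so German search terms can be passed through unchanged.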


🎯 UI Features

| Feature | Description |
| --- | --- |
| 🌍 Language Toggle | German / English with LLM-based table translation |
| 🧠 Reasoning Panel | Real-time display of tool calls, timing, and intermediate results |
| 📊 Search Tables | Server-formatted Markdown tables with DIP links, PDF links, AI Summary & Citizen Impact actions |
| 🗺️ Guided Search | Topic cards → Party filter → Document type → Auto-generated query |
| 📅 Wahlperiode Slider | Filter by legislative period (WP 1–21, 1949–2026+) |
| 💬 Streaming Responses | Token-by-token display with live progress indicators |
| 📎 Action Tiles | Follow-up buttons: 📄 DIP / 🤖 AI Summary / 👤 Citizen Impact |
| 📰 Media Coverage | Optional news links from curated German outlets appended to AI summaries |

🔗 Resources


📄 License

This project is licensed under the MIT License.


For deployment issues, see the Azure Deployment Documentation troubleshooting section.
