Production-ready RSS feed monitoring service with automated keyword tracking, web management interface, and REST API endpoints.
RSS News Monitor continuously scans multiple RSS feeds, identifies articles matching specified keywords, and provides both web and programmatic access to discovered content. Built for reliability with comprehensive logging and error handling.
- Automated RSS Monitoring: Continuous background scanning every 30 minutes
- Keyword Detection: Word-boundary matching for precise results
- Content Cleaning: Automatic HTML tag removal from article descriptions
- Duplicate Prevention: Link-based deduplication of articles
- Web Dashboard: Complete feed and keyword management via browser
- RSS Source Control: Add, activate, deactivate, and remove RSS feeds
- Keyword Management: Dynamic keyword addition and status control
- Real-time Monitoring: Start/stop monitoring with live status display
- REST Endpoints: Full programmatic access to all data and functions
- JSON Responses: Structured data for integration with external systems
- Status Monitoring: Real-time system health and statistics
- SQLite Database: Persistent storage with automatic schema management
- Comprehensive Logging: File and console logging with detailed operation tracking
- Error Handling: Robust exception management with continued operation
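Content Cleaning is listed as automatic HTML tag removal from article descriptions. One way to do this uses only the standard library, shown in the sketch below. It is an assumption, not the code in rss_monitor.py, and `clean_description` and `_TextExtractor` are hypothetical names.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Hypothetical helper: keeps text nodes, drops every tag."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def clean_description(raw_html):
    """Strip HTML tags from an RSS description and collapse whitespace."""
    if not raw_html:
        return ""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()  # flushes any text still buffered by the parser
    # Join the text fragments, then normalise runs of whitespace to one space
    return " ".join("".join(parser.parts).split())
```

This sketch keeps the text inside `<script>` or `<style>` elements and does not insert spaces between adjacent block elements. A feed with heavy markup might need extra handling for both.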
git clone https://github.com/dbkarashev/rss_monitor.git
cd rss_monitor
python3 -m venv venv
source venv/bin/activate
pip3 install -r requirements.txt
python3 rss_monitor.py
Access Points:
- Web Interface: http://localhost:5001
- API Base URL: http://localhost:5001/api
- Article Feed: View discovered articles with source attribution and keyword highlighting
- Monitoring Controls: Start/stop automated scanning with status indicators
- RSS Management: Add new feeds, toggle active status, remove sources
- Keyword Configuration: Add search terms, enable/disable specific keywords
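The start/stop controls and the 30-minute background scan can be modelled as a worker thread that waits on a `threading.Event`, as in the sketch below. The `Monitor` class and its `scan` callback are hypothetical, and rss_monitor.py may structure its loop differently. Waiting on the event instead of calling `time.sleep` lets `stop()` take effect immediately rather than after the rest of the interval.

```python
import logging
import threading

log = logging.getLogger("rss_monitor")  # hypothetical logger name


class Monitor:
    """Hypothetical: runs `scan` every `interval` seconds until stopped."""

    def __init__(self, scan, interval=30 * 60):
        self._scan = scan          # callable doing one pass over all active feeds
        self._interval = interval  # 30 minutes by default, as described above
        self._stop = threading.Event()
        self._thread = None

    @property
    def running(self):
        return self._thread is not None and self._thread.is_alive()

    def start(self):
        if self.running:
            return  # ignore a second start request while already monitoring
        self._stop.clear()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def stop(self, timeout=None):
        self._stop.set()  # wakes the waiting loop right away
        if self._thread is not None:
            self._thread.join(timeout)

    def _loop(self):
        while not self._stop.is_set():
            try:
                self._scan()
            except Exception:
                # Log and keep going so one failing pass does not end monitoring
                log.exception("scan failed")
            # Returns early (True) as soon as stop() sets the event
            self._stop.wait(self._interval)
```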
The system initializes with curated RSS sources and keywords:
RSS Sources:
- TechCrunch (Technology News)
- The Verge (Tech & Culture)
- Ars Technica (Science & Technology)
- Hacker News (Developer Community)
- VentureBeat (AI & Startup News)
Keywords:
- AI, artificial intelligence, technology, tech, programming, Python, software, digital
| Method | Endpoint | Description | Response |
|---|---|---|---|
| GET | /api/news | Retrieve found articles | Array of article objects |
| GET | /api/feeds | List RSS feed sources | Array of feed objects |
| GET | /api/keywords | List search keywords | Array of keyword objects |
| GET | /api/status | System status and statistics | Status object |
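The sketch below is a small standard-library client for these endpoints. The field names (`feed_name`, `keywords_matched`) come from the example responses in this README. Treating `keywords_matched` as a comma-separated string is an assumption based on that single example, and `get_json`, `keywords_of` and `count_by_feed` are hypothetical helpers.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5001/api"  # default from this README; adjust per deployment


def get_json(path, base_url=BASE_URL, timeout=10):
    """GET an endpoint such as '/news' or '/status' and decode its JSON body."""
    with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
        return json.load(resp)


def keywords_of(article):
    """Split the comma-separated `keywords_matched` string into a list."""
    raw = article.get("keywords_matched") or ""
    return [kw.strip() for kw in raw.split(",") if kw.strip()]


def count_by_feed(articles):
    """Count articles per source using the `feed_name` field."""
    counts = {}
    for article in articles:
        name = article.get("feed_name", "unknown")
        counts[name] = counts.get(name, 0) + 1
    return counts
```

With the service running, `count_by_feed(get_json("/news"))` gives a per-source tally, and `get_json("/status")["monitoring"]` reports whether scanning is active.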
Get Latest Articles:
curl http://localhost:5001/api/news
Response:
[
{
"title": "AI Breakthrough in Machine Learning",
"description": "Researchers announce significant advances...",
"link": "https://example.com/article",
"feed_name": "TechCrunch",
"keywords_matched": "AI, technology",
"published_date": "2025-06-05T10:30:00",
"found_at": "2025-06-05T10:35:22"
}
]
System Status:
curl http://localhost:5001/api/status
Response:
{
"monitoring": true,
"active_feeds": 4,
"active_keywords": 8,
"total_articles": 127
}
Adding RSS feeds via the web interface: navigate to the RSS Sources section and enter the feed name and URL.
Supported Formats:
- RSS 2.0
- Atom 1.0
- RSS 1.0/RDF
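The three supported formats can be told apart by the feed's root element. The sketch below illustrates this with `xml.etree.ElementTree`. It only shows how the formats differ and is not how rss_monitor.py parses feeds; `detect_feed_format` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"
RDF_NS = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"


def detect_feed_format(xml_text):
    """Classify a feed document by its (namespaced) root element."""
    root = ET.fromstring(xml_text)
    if root.tag == "rss":
        # RSS 2.0 (and older 0.9x) use <rss version="...">
        return "RSS " + root.get("version", "unknown")
    if root.tag == ATOM_NS + "feed":
        return "Atom 1.0"
    if root.tag == RDF_NS + "RDF":
        return "RSS 1.0/RDF"
    return "unknown"
```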
- Word Boundaries: Uses word-boundary matching to prevent false positives
- Case Insensitive: All keyword matching is performed case-insensitively
- Phrase Support: Multi-word phrases supported (e.g., "machine learning")
- Unicode Support: Full UTF-8 support for international keywords
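The four matching rules above map directly onto Python's `re` module, as the sketch below shows. `\b` anchors give word boundaries, `re.IGNORECASE` gives case-insensitivity, and `re.escape` keeps multi-word phrases literal. `str` patterns use Unicode matching by default, so `\b` and case folding also work for non-ASCII keywords. This is an illustration under those assumptions, not the service's own code, and `compile_keywords`/`match_keywords` are hypothetical names.

```python
import re


def compile_keywords(keywords):
    """Build one case-insensitive, word-bounded pattern per keyword."""
    # re.escape keeps characters such as '.' literal and lets a phrase like
    # "machine learning" match its words separated by a single space
    return {
        kw: re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE)
        for kw in keywords
    }


def match_keywords(text, patterns):
    """Return the keywords whose pattern occurs in `text`, in insertion order."""
    return [kw for kw, pattern in patterns.items() if pattern.search(text)]
```

With word boundaries, "AI" does not fire inside "said" and "tech" does not fire inside "technology". Keywords that begin or end with a non-word character (for example "C++") need a different boundary rule, because `\b` requires a word character on one side.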
rss_monitor/
├── rss_monitor.py # Main application
├── requirements.txt # Python dependencies
├── README.md # Documentation
├── LICENSE # MIT license
├── .gitignore # Git exclusions
├── rss_monitor.db # SQLite database (auto-created)
└── rss_monitor.log # Application logs (auto-created)
Tables:
- rss_feeds: RSS source management
- keywords: Search term configuration
- found_news: Discovered articles with metadata
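Link-based deduplication fits naturally into SQLite as a `UNIQUE` constraint on the article link combined with `INSERT OR IGNORE`, as the sketch below shows. Only the table names come from this README. Every column name is an assumption: the `found_news` columns mirror the fields of the `/api/news` response, and `SCHEMA` and `save_article` are hypothetical.

```python
import sqlite3

# Hypothetical schema: table names from this README, column names assumed
SCHEMA = """
CREATE TABLE IF NOT EXISTS rss_feeds (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    url       TEXT NOT NULL UNIQUE,
    is_active INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE IF NOT EXISTS keywords (
    id        INTEGER PRIMARY KEY,
    keyword   TEXT NOT NULL UNIQUE,
    is_active INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE IF NOT EXISTS found_news (
    id               INTEGER PRIMARY KEY,
    title            TEXT NOT NULL,
    description      TEXT,
    link             TEXT NOT NULL UNIQUE,  -- link-based deduplication
    feed_name        TEXT,
    keywords_matched TEXT,
    published_date   TEXT,
    found_at         TEXT DEFAULT CURRENT_TIMESTAMP
);
"""


def save_article(conn, article):
    """Insert an article; return True if stored, False if its link already exists."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO found_news "
        "(title, description, link, feed_name, keywords_matched, published_date) "
        "VALUES (:title, :description, :link, :feed_name, :keywords_matched, :published_date)",
        article,
    )
    conn.commit()
    return cur.rowcount == 1  # 0 when the UNIQUE(link) conflict was ignored
```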
All operations are logged to both console and rss_monitor.log:
- Feed parsing operations and results
- Article discovery with matched keywords
- Configuration changes (feeds/keywords)
- Error conditions and recovery actions
- System startup and shutdown events
Log Levels: INFO for normal operations, WARNING for recoverable issues, ERROR for critical problems
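Logging to the console and to rss_monitor.log at the same time takes two handlers on one logger, as in the sketch below. The logger name and message format are assumptions, not necessarily what rss_monitor.py uses, and `setup_logging` is a hypothetical function.

```python
import logging


def setup_logging(log_path="rss_monitor.log", level=logging.INFO):
    """Send the same records to the console and to a log file."""
    logger = logging.getLogger("rss_monitor")  # hypothetical logger name
    logger.setLevel(level)
    if logger.handlers:
        return logger  # already configured; avoid duplicate output
    fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    handlers = (logging.StreamHandler(), logging.FileHandler(log_path, encoding="utf-8"))
    for handler in handlers:
        handler.setFormatter(fmt)
        logger.addHandler(handler)
    return logger
```

To rotate the file instead of relying on an external tool, replace `FileHandler` with `logging.handlers.RotatingFileHandler(log_path, maxBytes=..., backupCount=...)` from the standard library.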
- Python: 3.6 or higher
- Dependencies: listed in requirements.txt
- Storage: ~50MB for database and logs
- Network: Internet access for RSS feed retrieval
- Use a WSGI server (gunicorn, uWSGI) instead of the built-in Flask development server
- Put a reverse proxy (such as nginx) in front of the application for external access
- Set up log rotation for rss_monitor.log
- Monitor disk space as the database grows
- Configure firewall rules for the service port
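The environment variables shown below could be read at startup as in the following sketch. This README does not show how rss_monitor.py consumes them. The fallback values are assumptions: port 5001 and the database filename come from this README, and the 127.0.0.1 host is a guess. `Settings` and `load_settings` are hypothetical names.

```python
import os
from typing import NamedTuple


class Settings(NamedTuple):
    host: str
    port: int
    db_path: str


def load_settings(env=None):
    """Read RSS_MONITOR_* variables, falling back to assumed defaults."""
    env = os.environ if env is None else env
    return Settings(
        host=env.get("RSS_MONITOR_HOST", "127.0.0.1"),        # assumed default
        port=int(env.get("RSS_MONITOR_PORT", "5001")),         # port from this README
        db_path=env.get("RSS_MONITOR_DB_PATH", "rss_monitor.db"),
    )
```

`typing.NamedTuple` is used instead of a dataclass so the sketch still runs on Python 3.6, the minimum version listed above.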
export RSS_MONITOR_PORT=5001
export RSS_MONITOR_HOST=0.0.0.0
export RSS_MONITOR_DB_PATH=/var/lib/rss_monitor/rss_monitor.db
Port 5001 in use:
lsof -ti:5001 | xargs kill -9
Database corruption:
rm rss_monitor.db
python3 rss_monitor.py # Will recreate with default data
No articles found:
- Verify RSS feeds are accessible
- Check keyword configuration
- Review logs for parsing errors
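A quick way to verify that a feed is accessible is to fetch it and confirm that the body is well-formed XML, as in the sketch below. `check_feed` and the User-Agent string are hypothetical. Some servers reject Python's default User-Agent, which is why the sketch sets its own. The function also accepts `file://` URLs, which is handy for testing against a saved copy of a feed.

```python
import urllib.request
import xml.etree.ElementTree as ET


def check_feed(url, timeout=10):
    """Return (ok, detail) after fetching `url` and parsing it as XML."""
    req = urllib.request.Request(url, headers={"User-Agent": "rss-monitor-check/0.1"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            body = resp.read()
    except OSError as exc:  # URLError, HTTPError and timeouts all derive from OSError
        return False, f"fetch failed: {exc}"
    try:
        root = ET.fromstring(body)
    except ET.ParseError as exc:
        return False, f"not well-formed XML: {exc}"
    return True, f"root element <{root.tag}>"
```

Running `check_feed` over each URL in the RSS Sources list separates network or HTTP problems ("fetch failed") from feeds that return HTML or broken XML ("not well-formed XML").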
MIT License - see LICENSE file for details
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'feat: add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request