Scalable News Analysis & Incident Visualizer
News Intelligence is a hybrid AI system that pairs a vectorized ML classifier for high-throughput stream filtering with LLM reasoning (Groq/Ollama) for contextual analysis. It auto-discovers RSS feeds, extracts structured event insights, geocodes extracted addresses automatically, and renders interactive maps for real-time security intelligence.
```python
# Fast batch classification is fully vectorized
from src.news_binary_classifier import NewsBinaryClassifier

classifier = NewsBinaryClassifier(model_path='models/')
df_classified = classifier.predict(news_df)  # news_df: pandas DataFrame of articles
# df_classified['prediction'] contains 'incident' or 'non-incident'
```

- ⚡ ML Engine: Logistic Regression for high-throughput, low-latency filtering (handles batches of 10,000+ items).
- 🧠 LLM Brain: Deep reasoning (Groq/Ollama) generates structured impact analysis and contextual summaries for filtered incidents only.
- 🔗 RSS/Atom Feed Discovery: Auto-discovers feeds and scrapes relevant news articles in parallel via Trafilatura.
- 🗺️ Geospatial View: Automatic address/site extraction and geocoding, rendered on interactive Folium maps.
- 🔒 Parallel Processing: Multi-threaded scraping and LLM invocation for rapid dataset updates.
- 🐼 LLM Caching: A built-in LRU cache for LLM queries saves tokens and speeds up the pipeline.
- 🛡️ Multi-backend Support: Run completely free with Ollama (local) or scale via the Groq (cloud) API.
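The hybrid split (cheap vectorized classifier first, LLM only on flagged incidents) can be sketched with pandas. This is a minimal sketch assuming only that the classifier output carries the `prediction` column with `'incident'` / `'non-incident'` values; `run_llm_stage` and `analyze_fn` are hypothetical names, and `analyze_fn` stands in for whichever Groq or Ollama call the project actually makes.

```python
from typing import Callable

import pandas as pd


def run_llm_stage(
    df_classified: pd.DataFrame,
    analyze_fn: Callable[[str], str],
    text_col: str = "title",
) -> pd.DataFrame:
    """Call the (expensive) LLM only on rows the classifier flagged as incidents.

    Hypothetical helper: `analyze_fn` is a placeholder for the real LLM call.
    """
    # Boolean mask computed in one vectorized step from the classifier output
    is_incident = df_classified["prediction"] == "incident"
    incidents = df_classified.loc[is_incident].copy()
    # Only the filtered subset reaches the LLM, so token cost scales with
    # the number of incidents rather than the size of the whole feed
    incidents["analysis"] = incidents[text_col].map(analyze_fn)
    return incidents


if __name__ == "__main__":
    news = pd.DataFrame({
        "title": ["Explosion reported near port", "Local team wins cup"],
        "prediction": ["incident", "non-incident"],
    })
    print(run_llm_stage(news, analyze_fn=lambda t: f"analysis of: {t}"))
```

The design choice is that the classifier acts as a gate: the LLM never sees non-incident rows, which is where the throughput and cost savings come from.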
```bash
pip install -r requirements.txt
```

To use the Groq cloud backend, create a `.env` file in the root directory and add your API key:

```
GROQ_API_KEY="your_api_key_here"
```

You can generate a new Groq API key from the Groq Console.
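One way to choose between the two backends at startup is sketched below. It assumes the key is already in the process environment (for example, loaded from `.env` by a tool such as python-dotenv, which this README does not confirm) and that Ollama is the fallback when no Groq key is present; `select_backend` is a hypothetical name, not part of the project's API.

```python
import os
from typing import Mapping, Optional


def select_backend(env: Optional[Mapping[str, str]] = None) -> str:
    """Return 'groq' if a non-empty GROQ_API_KEY is set, else 'ollama'.

    Hypothetical helper illustrating the Groq (cloud) / Ollama (local) split.
    """
    env = os.environ if env is None else env
    api_key = env.get("GROQ_API_KEY", "").strip()
    # Fall back to the free local backend when no cloud key is configured
    return "groq" if api_key else "ollama"


if __name__ == "__main__":
    print(f"Using LLM backend: {select_backend()}")
```

If the project does use python-dotenv, calling `load_dotenv()` before `select_backend()` would make the value from `.env` visible through `os.environ`.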
Ensure your trained model weights exist in the models/ directory:
```
models/
├── Logistic Regression_62k.pkl
├── label_encoder_transformer.pkl
└── word_vectorizer.pkl
```
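A small preflight check can fail fast when an artifact is missing. This sketch assumes only that the three file names listed above must be present in `models/`; `missing_model_files` is a hypothetical helper, and it deliberately does not load the files, since the README does not say whether they were written with `pickle` or `joblib`.

```python
from pathlib import Path
from typing import List, Union

# File names as listed in this README
REQUIRED_MODEL_FILES = (
    "Logistic Regression_62k.pkl",
    "label_encoder_transformer.pkl",
    "word_vectorizer.pkl",
)


def missing_model_files(model_dir: Union[str, Path] = "models/") -> List[str]:
    """Return the required model artifacts that are absent from `model_dir`."""
    model_dir = Path(model_dir)
    return [name for name in REQUIRED_MODEL_FILES if not (model_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files()
    print("Missing model files:", ", ".join(missing) if missing else "none")
```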
Run the main Streamlit application:
```bash
streamlit run src/news_intilligence.py
```

The `src/news_binary_classifier.py` module provides optimized text vectorization for separating relevant incidents from general news headlines.
| Method | Speed | Input Richness | Use Case |
|---|---|---|---|
| `predict(df)` | Fast | High | Balance of accuracy and throughput (title + description) |
| `predict_from_headlines(df)` | 🚀 Fastest | Medium | Speed is critical (title only) |
| `predict_with_probability(df)` | Fast | High | Thresholding by confidence |
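Confidence thresholding on top of `predict_with_probability(df)` might look like the sketch below. The README does not document that method's output columns, so the `probability` column name (assumed to hold the model's confidence in the predicted label) and the `filter_by_confidence` helper are hypothetical; adjust `prob_col` to whatever the real method returns.

```python
import pandas as pd


def filter_by_confidence(
    df: pd.DataFrame,
    threshold: float = 0.8,
    prob_col: str = "probability",
    label_col: str = "prediction",
) -> pd.DataFrame:
    """Keep rows predicted as 'incident' with confidence >= threshold.

    Hypothetical: assumes a numeric confidence column named `prob_col`.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be in [0, 1]")
    mask = (df[label_col] == "incident") & (df[prob_col] >= threshold)
    return df.loc[mask]


if __name__ == "__main__":
    scored = pd.DataFrame({
        "title": ["Bomb threat at station", "Market rally", "Protest downtown"],
        "prediction": ["incident", "non-incident", "incident"],
        "probability": [0.93, 0.88, 0.61],
    })
    print(filter_by_confidence(scored, threshold=0.8))
```

Raising the threshold sends fewer, higher-confidence items to the LLM stage; lowering it trades tokens for recall.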
Usage snippet:
```python
from src.news_binary_classifier import classify_news_batch

# df: pandas DataFrame of scraped articles
# Returns a DataFrame with a 'prediction' column
results = classify_news_batch(
    df,
    model_path='models/',
    use_headlines_only=False,  # set True to classify from titles only
)
```

Once filtered, articles pass into an LLM analysis chain that returns the following structured insights from a single inference call:
- Incident Type: Criminality, drugs, protest, bombing, maritime, etc.
- Impact Level: High (fatalities), Medium (injuries), Low (minor).
- Locations: Extracts country, city, and address/landmark automatically.
- Summary: A concise, factual five-sentence report.
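Because all four fields come back from a single inference call, it helps to validate the reply before it reaches the map. This is a minimal sketch assuming the LLM is prompted to answer with one JSON object; the key names (`incident_type`, `impact_level`, `location`, `summary`) and the `IncidentInsight` / `parse_insight` names are hypothetical, not the project's actual schema.

```python
import json
from dataclasses import dataclass

IMPACT_LEVELS = {"High", "Medium", "Low"}  # fatalities / injuries / minor


@dataclass
class IncidentInsight:
    incident_type: str
    impact_level: str
    country: str = ""
    city: str = ""
    address: str = ""
    summary: str = ""


def parse_insight(raw: str) -> IncidentInsight:
    """Parse one JSON object returned by the LLM into an IncidentInsight.

    Hypothetical schema: match the keys to the real prompt's output format.
    Raises ValueError on malformed JSON or an unknown impact level.
    """
    data = json.loads(raw)
    # Normalize casing so "medium" and "MEDIUM" both map to "Medium"
    level = str(data.get("impact_level", "")).strip().capitalize()
    if level not in IMPACT_LEVELS:
        raise ValueError(f"unexpected impact level: {level!r}")
    location = data.get("location") or {}
    return IncidentInsight(
        incident_type=str(data.get("incident_type", "")).strip(),
        impact_level=level,
        country=location.get("country", ""),
        city=location.get("city", ""),
        address=location.get("address", ""),
        summary=str(data.get("summary", "")).strip(),
    )
```

Rejecting unknown impact levels early keeps a single malformed reply from silently landing on the map with the wrong severity.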
Extracted locations are geocoded through the rate-limited Nominatim service and plotted as interactive circle markers with Folium directly in the dashboard view.
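A sketch of the geocode-then-plot step, assuming geopy for Nominatim access and folium for the map (the README names Nominatim and Folium but not how they are wired together). `impact_style`, `geocode_all`, `build_map`, and the colour/radius scheme are hypothetical; the one-second delay follows Nominatim's public usage policy of at most one request per second. Both libraries are imported inside the functions that need them so the styling helper works without them installed.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical styling: larger, redder circles for more severe incidents
IMPACT_STYLE: Dict[str, Tuple[str, int]] = {
    "High": ("red", 3000),
    "Medium": ("orange", 2000),
    "Low": ("green", 1000),
}


def impact_style(level: str) -> Tuple[str, int]:
    """Return (color, radius in metres) for an impact level; grey for unknowns."""
    return IMPACT_STYLE.get(level, ("gray", 500))


def geocode_all(
    addresses: Iterable[str], user_agent: str = "news-intelligence-demo"
) -> List[Optional[Tuple[float, float]]]:
    """Geocode addresses via Nominatim, at most one request per second."""
    from geopy.extra.rate_limiter import RateLimiter
    from geopy.geocoders import Nominatim

    geocode = RateLimiter(Nominatim(user_agent=user_agent).geocode, min_delay_seconds=1)
    results: List[Optional[Tuple[float, float]]] = []
    for address in addresses:
        loc = geocode(address)  # None when the address cannot be resolved
        results.append((loc.latitude, loc.longitude) if loc else None)
    return results


def build_map(points: Iterable[Tuple[float, float, str, str]]):
    """points: (lat, lon, impact_level, popup_text) tuples -> folium.Map."""
    import folium

    points = list(points)
    center = [points[0][0], points[0][1]] if points else [0.0, 0.0]
    fmap = folium.Map(location=center, zoom_start=5)
    for lat, lon, level, text in points:
        color, radius = impact_style(level)
        folium.Circle(
            location=[lat, lon], radius=radius, color=color,
            fill=True, fill_opacity=0.4, popup=text,
        ).add_to(fmap)
    return fmap
```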
- Speed: tqdm progress bars were intentionally removed from the vectorized text cleanup to avoid their per-item overhead.
- Rate-Limiting: Parallel processing wraps cloud API calls in fallback wrappers that respect vendor rate limits.
- Token Efficiency: Responses are cached under a hash string of the input, so re-processing previously seen article titles reuses the cached result instead of spending tokens on a new LLM call.
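The rate-limiting and caching notes above can be sketched with the standard library alone. This assumes nothing about the project's actual wrappers: `title_key`, `with_retries`, `CachedAnalyzer`, and `analyze_parallel` are hypothetical names, retrying on every exception is a simplification (a real wrapper would catch only the vendor's rate-limit or transient errors), and SHA-256 over a stripped, lower-cased title is just one choice of hash string.

```python
import hashlib
import random
import threading
import time
from collections import OrderedDict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def title_key(title: str) -> str:
    """Stable hash of a normalized title, used as the cache key."""
    return hashlib.sha256(title.strip().lower().encode("utf-8")).hexdigest()


def with_retries(
    fn: Callable[[str], str], max_attempts: int = 4, base_delay: float = 1.0
) -> Callable[[str], str]:
    """Wrap `fn` with exponential backoff plus jitter (retries on any Exception)."""
    def wrapped(prompt: str) -> str:
        for attempt in range(max_attempts):
            try:
                return fn(prompt)
            except Exception:
                if attempt == max_attempts - 1:
                    raise  # out of attempts: surface the last error
                time.sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
        raise RuntimeError("max_attempts must be >= 1")
    return wrapped


class CachedAnalyzer:
    """Thread-safe LRU cache keyed by title hash in front of an LLM call."""

    def __init__(self, llm_call: Callable[[str], str], maxsize: int = 1024):
        self._call = llm_call
        self._cache: "OrderedDict[str, str]" = OrderedDict()
        self._lock = threading.Lock()
        self._maxsize = maxsize

    def __call__(self, title: str) -> str:
        key = title_key(title)
        with self._lock:
            if key in self._cache:
                self._cache.move_to_end(key)  # mark as recently used
                return self._cache[key]
        result = self._call(title)  # slow network call runs outside the lock
        with self._lock:
            self._cache[key] = result
            self._cache.move_to_end(key)
            if len(self._cache) > self._maxsize:
                self._cache.popitem(last=False)  # evict least recently used
        return result


def analyze_parallel(
    titles: Sequence[str], analyzer: Callable[[str], str], max_workers: int = 4
) -> List[str]:
    """Run the analyzer over titles in a thread pool; output order matches input."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(analyzer, titles))
```

A typical composition would be `CachedAnalyzer(with_retries(call_llm))`, where `call_llm` is a placeholder for the real backend call. Because the cache lookup and the LLM call are not held under one lock, two threads that see the same new title at the same moment may both call the API; the cache bounds memory and avoids repeat calls across batches rather than guaranteeing exactly one call per title.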
- Check out `News_Intelligence_Notebook.ipynb` for step-by-step pipeline construction.
- See `SKILL.md` for quick setup recipes and workflow code.
Issues and feature suggestions are welcome.
MIT License.
Dr. Yasser Mustafa
AI & Data Science Specialist | Theoretical Physics PhD
- 🎓 PhD in Theoretical Nuclear Physics
- 💼 10+ years in production AI/ML systems
- 📍 Based in Newcastle Upon Tyne, UK
- ✉️ yasser.mustafan@gmail.com
- 🔗 LinkedIn | GitHub
Built with ❤️ for rapid threat intelligence situational awareness