Automated detection and alerting system for fertility medication offers in Hebrew Telegram communities. MedAlert uses a hybrid heuristic + LLM approach to efficiently identify legitimate medication offers (Gonal, Cetrotide, Menopur, Ovitrelle) with minimal API costs.
Israeli fertility communities use Telegram to share and offer medications. Manually monitoring these channels for medication availability is time-consuming. MedAlert automates this with intelligent filtering and LLM classification.
- Telegram Monitoring: Listens to configured channels in real-time
- Heuristic Filtering: Pre-filters messages using medication regex and intent patterns (~70% reduction in LLM calls)
- LLM Classification: GPT-4 classification for messages that pass heuristic checks
- Message Deduplication: SHA-256 hashing prevents duplicate alerts
- Cost Optimization: Only calls LLM on ~30% of messages, reducing API costs by 70%+
- Alert Notifications: Prints formatted alerts to console
Telegram Channels (4 target groups)
↓
[Telegram Ingestor]
↓
[Message Received]
↓
[Text Processor] → normalize text, remove emojis
↓
[Heuristic Filter] → medication regex + intent keywords
├→ Not relevant → Skip
└→ Relevant → Continue
↓
[Deduplication] → Check message hash in database
├→ Already seen → Touch last_seen, Skip LLM
└→ New message → Continue
↓
[LLM Classifier] → GPT-4: "offer" or "not_offer"
├→ not_offer → Cache, Done
└→ offer → Continue
↓
[Notifier] → Print alert to console
↓
[Database] → Cache message for future deduplication
| Component | Purpose | File |
|---|---|---|
| Telegram Ingestor | Connects to Telegram, receives messages, orchestrates pipeline | telegram_ingestor.py |
| Text Processor | Normalizes text, applies heuristic filters | text_processor.py |
| LLM Classifier | Calls OpenAI GPT-4 for classification | llm_classifier.py |
| Database | Caches messages using SHA-256 deduplication | db.py |
| Notifier | Sends alerts when offers are detected | notifier.py |
| Config | Environment variables and target channels | config.py |
- Python 3.10+
- Telegram API credentials (get from my.telegram.org)
- OpenAI API key
git clone https://github.com/yourusername/MedAlert.git
cd MedAlert
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install dependencies
pip install -r requirements.txtCreate a .env file in the project root:
TELEGRAM_API_ID=your_api_id
TELEGRAM_API_HASH=your_api_hash
OPENAI_API_KEY=your_openai_keyTo get Telegram credentials:
- Go to my.telegram.org/apps
- Create a new application
- Copy
API_IDandAPI_HASH
python -m telegram_ingestorYou'll see alerts printed to console when medication offers are detected:
[ALERT] Offer detected!
Channel: קבוצת מסירה ולא מכירה
Sender: Sarah Cohen
Time (UTC): 2026-02-10 12:30:45
Message: יש לי גונל F-75 עם מחט
LLM Result: offer
Edit config.py to add/remove Telegram channels to monitor:
TARGET_CHANNELS = {
-1001406308981: 'קבוצת מסירה ולא מכירה',
-1001486624364: 'מחיר עלות 💗',
-1001359530267: 'זו לזו - נתינה ומסירה באהבה',
-1002314434225: 'Secret Machine Learning Jobs', # Test channel
}In llm_classifier.py, change the model:
response = await asyncio.to_thread(
openai.ChatCompletion.create,
model="gpt-4", # Change to "gpt-3.5-turbo" for cost savings
temperature=0,
max_tokens=1,
messages=[...]
)Run the test suite:
# Test database functions
python -m unittest tests.DBTest -v
# Test text processing
python -m unittest tests.TextProcessungTest -v
# Test notifier
python -m unittest tests.NotifierTest -v
# Run all tests
python -m unittest discover tests -vApplies lightweight rules before calling expensive LLM:
# Matches medication names (Hebrew + English)
MED_REGEX = r"(גונל|gonal|צטרוטייד|cetrotide|...)"
# Matches selling/giving intent
GIVING_SELLING_INTENT = ["למסירה", "למכירה", "יש לי", "נותנת", ...]
# Filters out questions
QUESTION_WORDS = ["מישהי", "יודעת", "איפה", ...]Result: ~70% of messages filtered before LLM, reducing API calls and costs.
Uses SHA-256 hashing to prevent duplicate alerts:
message_hash = sha256(normalized_text.encode()).hexdigest()
cached = get_cached_message(conn, message_hash)
if cached:
touch_message(conn, cached.id) # Update last_seen, skip LLM
else:
insert_message(conn, message_hash, json_data) # New message, proceed
result = await classify_with_llm(text)Classifies messages as "offer" or "not_offer":
response = openai.ChatCompletion.create(
model="gpt-4",
messages=[
{
"role": "system",
"content": "You are a classifier. Respond only with: offer or not_offer"
},
{"role": "user", "content": text}
],
temperature=0,
max_tokens=1
)Sends alerts when offers are detected:
async def send_alert(text, channel, sender, timestamp, llm_result):
alert_msg = f"[ALERT] Offer detected!\n..."
print(alert_msg)- Message Processing: O(1) heuristic filter + O(1) hash lookup + O(1) LLM call (if needed)
- Deduplication: ~80% of messages already in cache (no LLM call)
- Heuristic Filter: ~70% messages filtered (no LLM call)
- Total LLM calls: ~20% of all messages
| Component | Cost |
|---|---|
| GPT-4 (200 calls @ $0.03/1k) | $0.006 |
| Database operations | Free (local SQLite) |
| Telegram API | Free |
| Total | $0.006 per 1000 messages |
MedAlert/
├── README.md # This file
├── ROADMAP.md # 3-month development roadmap
├── config.py # Configuration & target channels
├── db.py # SQLite deduplication cache
├── telegram_ingestor.py # Main message ingestion pipeline
├── text_processor.py # Heuristic filtering
├── llm_classifier.py # OpenAI GPT-4 classification
├── notifier.py # Alert notification
├── pyproject.toml # Project metadata
├── requirements.txt # Python dependencies
├── .env # Environment variables (not in repo)
├── .env.example # Environment template
├── .gitignore # Git ignore rules
├── tests/
│ ├── DBTest.py # Database tests
│ ├── TextProcessungTest.py # Text processor tests
│ └── NotifierTest.py # Notifier tests
└── *.session # Telegram session files (not in repo)
- Credentials: Store API keys in
.envfile (never commit) - Database: Uses local SQLite with WAL mode for reliability
- Telegram: Session files are not committed to git
See .env.example for required configuration.
- Set up
.envwith your credentials - Add target channels to
config.py - Run
python -m telegram_ingestor - Monitor console for alerts
- Read ROADMAP.md for 3-month development plan
- Run tests:
python -m unittest discover tests -v - Explore the codebase and extend as needed
This project showcases:
- LLM Integration: Production-grade OpenAI API usage with cost optimization
- System Design: Well-architected pipeline with clear separation of concerns
- Cost Optimization: 70%+ reduction through intelligent heuristic filtering
- Production Patterns: Caching, deduplication, error handling
- Async Architecture: Non-blocking message processing
- Testing: Comprehensive unit test coverage
- Database Design: SQLite with efficient query patterns
Interested in extending this project? See ROADMAP.md for potential enhancements including:
- Confidence scoring and structured LLM outputs
- Retrieval Augmented Generation (RAG) for context-aware classification
- Multi-channel notifications (email, webhooks, Telegram bot)
- Performance monitoring and observability
MIT License - see LICENSE file
Built as a portfolio project demonstrating GenAI integration, system design, and production-grade code practices.
pip install -r requirements.txt- Check
.envfile exists and has correct values - Verify credentials from my.telegram.org
- The system should handle this gracefully
- Check
GPT_API_KEYis valid in.env - Consider using GPT-3.5-turbo for higher rate limits
SQLite is configured with WAL mode to handle concurrent access. If issues persist:
rm -f messages_cache.db messages_cache.db-wal messages_cache.db-shmOpen an issue on GitHub or review ROADMAP.md for more details about the project direction.