Fully autonomous local real estate monitoring system
Scrapes, analyzes, and notifies you about apartment listings from Yad2, Madlan, and Facebook Marketplace.
- Quick Start (5 Minutes)
- Features
- Configuration Templates
- Telegram Setup
- Using the Dashboard
- How Deal Scoring Works
- Testing
- Project Structure
- Troubleshooting
- Advanced Features
- Pro Tips
IMPORTANT: The scraper now runs in Persistent Browser Mode to handle anti-bot protections. You must start Chrome with remote debugging enabled before running the application.
# macOS — open a new terminal and run:
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
--remote-debugging-port=9222 \
--user-data-dir="$HOME/chrome_bot_profile"

# Linux — open a new terminal and run:
google-chrome \
--remote-debugging-port=9222 \
--user-data-dir="$HOME/chrome_bot_profile"

# Windows — open PowerShell or Command Prompt and run:
"C:\Program Files\Google\Chrome\Application\chrome.exe" ^
--remote-debugging-port=9222 ^
--user-data-dir="%USERPROFILE%\chrome_bot_profile"

Configuration: You can customize these settings in your .env file:
CHROME_DEBUG_PORT=9222 # Change the debug port
CHROME_USER_DATA_DIR=~/chrome_bot_profile # Change the profile directory
HEADLESS=false # Set to true for headless mode (not recommended for CAPTCHA)
CAPTCHA_CHECK_INTERVAL=30 # Seconds between checking if CAPTCHA is solved
CAPTCHA_TIMEOUT_MINUTES=30 # Max wait time for CAPTCHA resolution
SCRAPER_POLLING_INTERVAL=15 # Minutes between scrape runs

What this does:
- Opens Chrome with remote debugging on port 9222 (configurable via CHROME_DEBUG_PORT)
- Uses a separate profile (chrome_bot_profile) to avoid conflicts with your regular Chrome
- Allows the scraper to connect to this browser instead of launching new instances
- Keep this Chrome window open while the scraper is running
Benefits:
- ✅ Manual CAPTCHA solving when needed
- ✅ Browser stays open between scraping runs
- ✅ Better anti-bot evasion
- ✅ Persistent cookies and session
- ✅ Fully configurable via environment variables
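Under the hood, attaching to that already-running Chrome looks roughly like this — a sketch using Playwright's `connect_over_cdp`; the helper names are illustrative, not the project's actual code:

```python
import os

def cdp_endpoint() -> str:
    """CDP URL for the already-running Chrome; mirrors CHROME_DEBUG_PORT in .env."""
    port = os.environ.get("CHROME_DEBUG_PORT", "9222")
    return f"http://127.0.0.1:{port}"

def open_tab_titles() -> list:
    """Attach to the persistent browser and list the titles of its open tabs."""
    # Imported lazily so cdp_endpoint() stays usable without Playwright installed.
    from playwright.sync_api import sync_playwright
    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp(cdp_endpoint())
        context = browser.contexts[0]  # reuse the profile's cookies and session
        return [page.title() for page in context.pages]
```

Because the scraper attaches instead of launching its own browser, closing the script leaves the Chrome window (and its session) intact for the next run.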
# Clone the repository
git clone https://github.com/yourusername/Real-Estate-Monitor.git
cd Real-Estate-Monitor
# Run automated setup
python3 setup_project.py
# Activate virtual environment
source venv/bin/activate # On Windows: venv\Scripts\activate

# Copy configuration template
cp .env.example .env
# Edit with your preferences
nano .env # Or: code .env, vim .env

Choose a configuration below and paste into your .env file:
CITIES=תל אביב-יפו
MAX_PRICE=5000
MIN_ROOMS=2
MIN_SIZE_SQM=50
EXCLUDE_GROUND_FLOOR=false
REQUIRE_PARKING=false

CITIES=תל אביב-יפו,רמת גן,גבעתיים
MAX_PRICE=7000
MIN_ROOMS=2.5
MIN_SIZE_SQM=65
EXCLUDE_GROUND_FLOOR=true
REQUIRE_ELEVATOR_ABOVE_FLOOR=2
HIGH_PRIORITY_NEIGHBORHOODS=רמת אביב,בבלי,נווה צדק

CITIES=תל אביב-יפו,רמת גן,גבעתיים,הרצליה
MAX_PRICE=10000
MIN_ROOMS=3.5
MIN_SIZE_SQM=85
EXCLUDE_GROUND_FLOOR=true
REQUIRE_ELEVATOR_ABOVE_FLOOR=1
REQUIRE_PARKING=true
REQUIRE_MAMAD=true

CITIES=תל אביב-יפו,רמת גן,גבעתיים
MAX_PRICE=2500000
MIN_ROOMS=3
MIN_SIZE_SQM=75
EXCLUDE_GROUND_FLOOR=true
REQUIRE_ELEVATOR_ABOVE_FLOOR=2
PREFER_PARKING=true
PREFER_MAMAD=true

python main.py

Open your browser: http://127.0.0.1:8000
The system is now:
- ✅ Scraping listings every 15 minutes
- ✅ Calculating deal scores
- ✅ Detecting price drops
- ✅ Available via web dashboard
# View real-time logs
tail -f real_estate_monitor.log
# Check scraper status
curl http://127.0.0.1:8000/health
# View database statistics
curl http://127.0.0.1:8000/api/db-stats
# View database directly
sqlite3 real_estate.db "SELECT COUNT(*) as total FROM listings;"
sqlite3 real_estate.db "SELECT source, COUNT(*) as count FROM listings GROUP BY source;"
sqlite3 real_estate.db "SELECT city, COUNT(*) as count FROM listings GROUP BY city;"
# Add test data to see the dashboard in action
python add_test_listings.py

# Press Ctrl+C in the terminal
# The application will shut down gracefully within 5 seconds
# If it takes longer, press Ctrl+C again to force quit

- ✅ Automated Scraping from Yad2, Madlan & Facebook Marketplace
- ✅ Intelligent Deal Scoring (0-100 based on price, features, recency)
- ✅ Price Drop Detection - Automatically re-surfaces good deals
- ✅ Cross-Site Duplicate Detection - Avoid seeing the same listing twice
- ✅ Telegram Notifications - Get instant alerts for hot deals
- ✅ Web Dashboard - Beautiful UI to browse and manage listings
- ✅ Smart Filtering - Must-have, nice-to-have, and deal-breakers
- ✅ Neighborhood Analytics - Compare prices to local averages
- ✅ 24/7 Local Operation - No cloud, no subscriptions
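The cross-site duplicate detection above can be sketched as a simple heuristic. This is illustrative only — the project pairs phone matching with fuzzy text matching via fuzzywuzzy; the sketch below uses the stdlib `difflib` so it runs without extra dependencies, and the field names are assumptions:

```python
from difflib import SequenceMatcher

def looks_like_duplicate(a: dict, b: dict, threshold: float = 0.85) -> bool:
    """Heuristic check that two listings from different sites are the same flat."""
    # A matching normalized phone number is the strongest signal.
    if a.get("phone") and a.get("phone") == b.get("phone"):
        return True
    same_area = a.get("city") == b.get("city") and a.get("rooms") == b.get("rooms")
    # Prices within 2% of each other count as "the same".
    close_price = abs(a.get("price", 0) - b.get("price", 0)) <= 0.02 * max(a.get("price", 0), 1)
    title_sim = SequenceMatcher(None, a.get("title", ""), b.get("title", "")).ratio()
    return same_area and close_price and title_sim >= threshold
```

A listing that matches on phone, or on city + rooms + a near-identical price and title, is shown only once in the dashboard.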
- 📊 Real-time statistics (new today, high scores)
- 🔍 Advanced filtering (city, neighborhood, score, price)
- ❤️ Like/Hide/Contacted status tracking
- 📈 Price history charts
- 💬 One-click WhatsApp contact
- 🎨 Responsive Bootstrap UI
See .env.example for complete configuration options with detailed comments.
CITIES=תל אביב-יפו
MAX_PRICE=5000
MIN_ROOMS=2
MIN_SIZE_SQM=50
EXCLUDE_GROUND_FLOOR=false
REQUIRE_ELEVATOR_ABOVE_FLOOR=0
REQUIRE_PARKING=false
SCRAPING_INTERVAL_MINUTES=15

CITIES=תל אביב-יפו,רמת גן,גבעתיים
MAX_PRICE=7000
MIN_ROOMS=2.5
MIN_SIZE_SQM=65
EXCLUDE_GROUND_FLOOR=true
REQUIRE_ELEVATOR_ABOVE_FLOOR=2
REQUIRE_PARKING=false
HIGH_PRIORITY_NEIGHBORHOODS=רמת אביב,בבלי,נווה צדק,יד אליהו
PREFER_BALCONY=true
PREFER_PARKING=true
SCRAPING_INTERVAL_MINUTES=15

CITIES=תל אביב-יפו,רמת גן,גבעתיים,הרצליה,רמת השרון
MAX_PRICE=10000
MIN_ROOMS=3.5
MIN_SIZE_SQM=85
EXCLUDE_GROUND_FLOOR=true
REQUIRE_ELEVATOR_ABOVE_FLOOR=1
REQUIRE_PARKING=true
REQUIRE_MAMAD=true
PREFER_BALCONY=true
PREFER_TOP_FLOORS=true
SCRAPING_INTERVAL_MINUTES=15

CITIES=תל אביב-יפו,רמת גן,גבעתיים
MAX_PRICE=2000000
MIN_ROOMS=2.5
MIN_SIZE_SQM=65
EXCLUDE_GROUND_FLOOR=true
REQUIRE_ELEVATOR_ABOVE_FLOOR=2
PREFER_PARKING=true
PREFER_BALCONY=true
PREFER_MAMAD=true
SCRAPING_INTERVAL_MINUTES=20

CITIES=תל אביב-יפו,רמת גן,גבעתיים,הרצליה,רמת השרון
MAX_PRICE=3500000
MIN_ROOMS=4
MIN_SIZE_SQM=100
EXCLUDE_GROUND_FLOOR=true
REQUIRE_ELEVATOR_ABOVE_FLOOR=1
REQUIRE_PARKING=true
REQUIRE_MAMAD=true
PREFER_BALCONY=true
PREFER_TOP_FLOORS=true
SCRAPING_INTERVAL_MINUTES=20

Get instant alerts on your phone for hot deals!
# On Telegram app:
# 1. Search for @BotFather
# 2. Start a chat and send: /newbot
# 3. Choose a name for your bot (e.g., "My Real Estate Monitor")
# 4. Choose a username (must end in 'bot', e.g., "my_realestate_bot")
# 5. Copy the bot token (looks like: 1234567890:ABCdefGHIjklMNOpqrsTUVwxyz)

# 1. Message your new bot (send any message like "hello")
# 2. Open this URL in your browser (replace <YOUR_BOT_TOKEN>):
https://api.telegram.org/bot<YOUR_BOT_TOKEN>/getUpdates
# 3. Look for "chat":{"id":123456789} in the JSON response
# 4. Copy the number (your chat_id)

Edit your .env file and add:
TELEGRAM_BOT_TOKEN=1234567890:ABCdefGHIjklMNOpqrsTUVwxyz
TELEGRAM_CHAT_ID=123456789

# Restart the application
python main.py
# Or test without restarting:
python -c "import asyncio; from app.core.database import init_db; from app.services.telegram_notifier import send_test_notification; from app.core.config import settings; asyncio.run(send_test_notification(init_db(settings.database_url)[1]()))"

You should receive a test message! 📱
You'll receive notifications when:
- New listing with deal score ≥ 80
- New listing in high-priority neighborhood
- Price drop ≥ 3%
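If the one-liner above gives you trouble, you can verify your token and chat ID directly against the Telegram Bot API with nothing but the standard library (a sketch; the token and chat ID are placeholders):

```python
import json
import urllib.request

API_BASE = "https://api.telegram.org"

def build_send_message_request(token: str, chat_id: str, text: str) -> urllib.request.Request:
    """Build a sendMessage POST request for the Telegram Bot API."""
    url = f"{API_BASE}/bot{token}/sendMessage"
    payload = json.dumps({"chat_id": chat_id, "text": text}).encode()
    return urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
```

Pass the request to `urllib.request.urlopen()`; the JSON reply's `ok` field is true when the message was delivered.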
Open http://127.0.0.1:8000 in your browser.
- Filter by: City, neighborhood, status, minimum score, price range
- Sort by: Deal score, price, date
- Status: All, Unseen, Liked, Hidden, Contacted
- ❤️ Like - Mark as interested
- ❌ Hide - Mark as not interested
- 📞 Contacted - Mark as contacted
- 👁️ View - Open listing on source site
- 💬 WhatsApp - Direct contact with seller
- Total listings in database
- New listings today
- High-score listings (≥80)
- Active filters indicator
- Click any listing to see:
- Full description and images
- Price history chart
- Neighborhood price comparison
- All features and contact info
Each listing gets a score from 0-100 based on four factors:
Compared to neighborhood average:
- 30%+ below average = 40 points
- 20% below average = 35 points
- 10% below average = 30 points
- At average = 25 points
- 10% above average = 15 points
- 20%+ above average = 5 points
Based on your preferences:
- Parking (if preferred): 10 points
- Balcony (if preferred): 8 points
- Elevator (if preferred): 7 points
- Mamad/safe room (if preferred): 8 points
- Top floor (if preferred): 5 points
How fresh the listing is:
- Today: 15 points
- 1-2 days: 12 points
- 3-5 days: 9 points
- 6-10 days: 6 points
- 11-20 days: 3 points
- 20+ days: 1 point
Price change history:
- 10%+ price drop: 15 points
- 5-10% drop: 12 points
- 2-5% drop: 9 points
- Any drop: 7 points
- No change: 5 points
- Price increase: 2 points
- 80-100: 🔥 Excellent deal - Act immediately!
- 60-79: 👍 Good listing - Worth considering
- 40-59: 😐 Average - Meets basic criteria
- Below 40: 👎 Below expectations
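A compact sketch of how the four factor tables combine — the point values mirror the tables above, but the function names and the exact boundary handling are assumptions, not the project's actual deal_score.py:

```python
# Preferred-feature points from the table above.
FEATURE_POINTS = {"parking": 10, "balcony": 8, "elevator": 7,
                  "mamad": 8, "top_floor": 5}

def price_points(price: float, area_avg: float) -> int:
    """Points for price vs. the neighborhood average."""
    ratio = price / area_avg
    if ratio <= 0.70:
        return 40   # 30%+ below average
    if ratio <= 0.80:
        return 35   # 20% below
    if ratio <= 0.90:
        return 30   # 10% below
    if ratio <= 1.00:
        return 25   # at average
    if ratio <= 1.10:
        return 15   # 10% above
    return 5        # 20%+ above

def feature_points(features: set, preferred: set) -> int:
    """Points for preferred features the listing actually has."""
    return sum(FEATURE_POINTS.get(f, 0) for f in features & preferred)

def recency_points(days: int) -> int:
    """Points for listing freshness."""
    for limit, pts in [(0, 15), (2, 12), (5, 9), (10, 6), (20, 3)]:
        if days <= limit:
            return pts
    return 1

def trend_points(change_pct: float) -> int:
    """Points for price history; change_pct < 0 means the price dropped."""
    drop = -change_pct
    if drop >= 10:
        return 15
    if drop >= 5:
        return 12
    if drop >= 2:
        return 9
    if drop > 0:
        return 7
    if drop == 0:
        return 5
    return 2  # price increased

def deal_score(price, area_avg, days, change_pct, features, preferred) -> int:
    total = (price_points(price, area_avg) + feature_points(features, preferred)
             + recency_points(days) + trend_points(change_pct))
    return min(total, 100)  # the four factors can sum past 100, so cap it
```

For example, a listing 20% under the area average (35) with preferred parking (10), listed today (15), after a 12% price drop (15) scores 75 — a "good listing" in the bands above.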
# Run all tests with colored output and emojis
pytest tests/
# Run with coverage report
pytest tests/ --cov=app --cov-report=html
open htmlcov/index.html
# Run specific test file
pytest tests/unit/test_processor.py -v
# Run tests matching a pattern
pytest tests/ -k "deal_score" -v

169 tests total - 159 passing (94% pass rate)
tests/
├── conftest.py # Shared fixtures & mocks
├── unit/
│ ├── test_processor.py # Deal scoring & processing (40+ tests)
│ ├── test_parsers.py # Hebrew parsing (50+ tests)
│ ├── test_duplicate_detector.py # Duplicate detection (12 tests)
│ ├── test_listing_filter.py # Filtering logic (17 tests)
│ ├── test_config.py # Configuration (18 tests)
│ └── test_database.py # Database models (10 tests)
└── mocked_scrapers/
└── test_yad2_parser.py # Yad2 scraper (22 tests)
- ✅ 100%: config.py, duplicate_detector.py
- ✅ 98%: listing_filter.py
- ✅ 93%: phone_normalizer.py
- ✅ 92%: database.py
- ✅ 87%: yad2_scraper.py
- 📊 37% overall (services not tested yet)
- No Browser Required - All tests use mocked DrissionPage
- Colored Output - ✅ Green for pass, ❌ Red for fail
- Parametrized Tests - Comprehensive coverage with minimal code
- Fast Execution - ~90 seconds for full suite
- CI/CD Ready - Runs automatically on every push/PR
Tests run automatically on:
- Every push to main, master, develop
- Every pull request
- Python 3.9, 3.10, 3.11 matrix
- Coverage threshold: 35%
See tests/README.md for detailed examples and best practices.
Real-Estate-Monitor/
├── main.py # Application entry point
├── app/ # Main application package
│ ├── core/ # Core business logic
│ │ ├── config.py # Configuration management
│ │ ├── database.py # Database models & ORM
│ │ ├── deal_score.py # Scoring algorithm
│ │ └── listing_processor.py # Listing processing
│ ├── services/ # Application services
│ │ ├── scheduler.py # Job scheduling
│ │ ├── dashboard.py # Web dashboard (FastAPI)
│ │ └── telegram_notifier.py # Notifications
│ ├── scrapers/ # Web scrapers
│ │ ├── base_scraper.py # Base scraper class
│ │ ├── yad2_scraper.py # Yad2 scraper
│ │ ├── madlan_scraper.py # Madlan scraper
│ │ └── facebook_scraper.py # Facebook scraper
│ └── utils/ # Utility modules
│ ├── phone_normalizer.py # Phone normalization
│ ├── duplicate_detector.py # Duplicate detection
│ └── listing_filter.py # Listing filtering
├── templates/ # HTML templates
│ ├── index.html # Main dashboard
│ └── listing_detail.html # Listing detail page
├── .env # Your configuration (create from .env.example)
├── .env.example # Configuration template
├── requirements.txt # Python dependencies
├── setup_project.py # Automated setup script
├── test_setup.py # System tests
└── add_test_listings.py # Test data generator
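As an example of what a utility like phone_normalizer.py does, here is a hypothetical sketch that normalizes Israeli mobile numbers to E.164 form (the project's exact rules may differ):

```python
import re
from typing import Optional

def normalize_phone(raw: str) -> Optional[str]:
    """Normalize an Israeli mobile number to E.164 (+9725XXXXXXXX) form."""
    digits = re.sub(r"\D", "", raw)   # keep digits only
    if digits.startswith("972"):      # country code already present
        digits = digits[3:]
    if digits.startswith("0"):        # strip the domestic trunk prefix
        digits = digits[1:]
    if len(digits) == 9 and digits.startswith("5"):
        return "+972" + digits
    return None  # not a recognizable Israeli mobile number
```

Normalizing first means "054-123-4567" and "+972 54 123 4567" compare equal, which is what makes phone-based duplicate detection across sites work.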
If you see an error like "Action Required: Start Chrome with remote debugging port 9222":
# 1. Make sure Chrome is running with debug mode (macOS):
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
--remote-debugging-port=9222 \
--user-data-dir="$HOME/chrome_bot_profile"
# 2. Check if the port is in use:
lsof -i :9222
# 3. If you want to use a different port, update .env:
CHROME_DEBUG_PORT=9223
CHROME_USER_DATA_DIR=~/chrome_bot_profile
# Then restart Chrome with the new port (macOS):
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
--remote-debugging-port=9223 \
--user-data-dir="$HOME/chrome_bot_profile"

When the scraper detects a CAPTCHA or anti-bot protection:
- Dashboard shows a warning banner: "⚠️ Scraper Paused: CAPTCHA Detected"
- Check the open Chrome window: you'll see the CAPTCHA or security check page
- Solve it manually: Complete the CAPTCHA in the browser
- Scraper auto-resumes: Once solved, scraping continues automatically
- Timeout: If not solved within the configured timeout, the scraper aborts
Configure CAPTCHA behavior in .env:
CAPTCHA_CHECK_INTERVAL=30 # How often to check if solved (seconds)
CAPTCHA_TIMEOUT_MINUTES=30 # Maximum wait time before aborting (minutes)

Detected anti-bot systems:
- PerimeterX
- ShieldSquare
- Cloudflare
- reCAPTCHA
- Hebrew security checks ("אבטחת אתר")
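The pause-and-resume behavior boils down to a polling loop like this — a sketch where `page_is_blocked` stands in for whatever check the scraper actually runs against the open tab, and the defaults mirror CAPTCHA_CHECK_INTERVAL and CAPTCHA_TIMEOUT_MINUTES:

```python
import time

def wait_for_captcha(page_is_blocked, check_interval: int = 30,
                     timeout_minutes: int = 30) -> bool:
    """Poll until the CAPTCHA is solved or the timeout expires.

    page_is_blocked: zero-argument callable that inspects the open browser
    tab and returns True while the CAPTCHA page is still showing.
    Returns True if solved (scraping resumes), False on timeout (abort).
    """
    deadline = time.monotonic() + timeout_minutes * 60
    while time.monotonic() < deadline:
        if not page_is_blocked():
            return True
        time.sleep(check_interval)
    return False
```

Because the loop only reads the page, you are free to solve the CAPTCHA in the open Chrome window at any point; the next poll notices and scraping continues.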
# Check Python version (must be 3.9+)
python --version
# Reinstall dependencies
pip install -r requirements.txt
# Check for errors
tail -f real_estate_monitor.log

# Option 1: Change port in .env
DASHBOARD_PORT=8001
# Option 2: Kill process using port 8000
lsof -ti:8000 | xargs kill -9 # macOS/Linux
netstat -ano | findstr :8000 # Windows

# Check logs for errors
tail -f real_estate_monitor.log
# Verify configuration
cat .env | grep CITIES
# Wait for first scrape (within 15 minutes)
# Or add test data immediately:
python add_test_listings.py

# Verify configuration
cat .env | grep TELEGRAM
# Test bot token is valid
curl https://api.telegram.org/bot<YOUR_BOT_TOKEN>/getMe
# Send test notification
python -c "import asyncio; from app.core.database import init_db; from app.services.telegram_notifier import send_test_notification; from app.core.config import settings; asyncio.run(send_test_notification(init_db(settings.database_url)[1]()))"

# Check logs for specific errors
grep ERROR real_estate_monitor.log
# Update Playwright
pip install --upgrade playwright
playwright install chromium
# Test website accessibility
curl -I https://www.yad2.co.il

# This should be fixed in the latest version
# If still happening, force quit:
# Press Ctrl+C twice
# Or: pkill -f "python main.py"

Keep the application running continuously:
# Using nohup (Linux/macOS)
nohup python main.py > output.log 2>&1 &
# Check if running
ps aux | grep "python main.py"
# Stop it
pkill -f "python main.py"

# Using screen (Linux/macOS)
screen -S real-estate
python main.py
# Press Ctrl+A then D to detach
# Reattach with: screen -r real-estate

# Using systemd (Linux) - Create service file
sudo nano /etc/systemd/system/real-estate-monitor.service
# Add:
[Unit]
Description=Real Estate Monitor
After=network.target
[Service]
Type=simple
User=yourusername
WorkingDirectory=/path/to/Real-Estate-Monitor
ExecStart=/path/to/Real-Estate-Monitor/venv/bin/python main.py
Restart=always
[Install]
WantedBy=multi-user.target
# Enable and start
sudo systemctl enable real-estate-monitor
sudo systemctl start real-estate-monitor
sudo systemctl status real-estate-monitor

# Backup database
cp real_estate.db real_estate_backup_$(date +%Y%m%d).db
# Reset database (WARNING: Deletes all data)
rm real_estate.db
python main.py # Will recreate
# View database with SQLite
sqlite3 real_estate.db
sqlite> SELECT COUNT(*) FROM listings;
sqlite> SELECT city, COUNT(*) FROM listings GROUP BY city;
sqlite> .quit

Edit .env to customize scraping frequency:
SCRAPING_INTERVAL_MINUTES=15 # Global default
YAD2_INTERVAL_MINUTES=15 # Yad2 specific
MADLAN_INTERVAL_MINUTES=15 # Madlan specific
FACEBOOK_INTERVAL_MINUTES=20 # Facebook (slower to avoid rate limits)

Facebook requires authentication via cookies:
# 1. Install "Cookie Editor" browser extension (Chrome/Firefox)
# 2. Login to Facebook in your browser
# 3. Click the Cookie Editor extension icon
# 4. Click "Export" and choose "JSON"
# 5. Save the file as facebook_cookies.json in project root
# 6. Restart the application

- Start Broad: Begin with wider filters, then narrow down based on results
- Monitor Scores: Adjust preferences in .env to match your priorities
- Check Daily: Even with automation running, check the dashboard for new matches
- WhatsApp Ready: Have template messages ready for quick responses
- Be Quick: Good deals go fast - act on high scores (80+) immediately
- Use Status Tracking: Like/Hide listings to keep dashboard organized
- Track Price History: Check price trends before contacting sellers
- Set Priority Neighborhoods: Get notified even for lower scores in favorite areas
- Adjust Thresholds: Lower MIN_DEAL_SCORE_NOTIFY if you want more notifications
- Review Regularly: Check hidden listings occasionally - preferences change!
This system is intended for personal use only.

Allowed:
- Running locally for apartment hunting
- Notifying yourself about listings
- Storing data locally on your computer

Not allowed:
- Sharing scraped data publicly
- Selling or commercializing data
- Overloading websites with excessive requests
- Violating the terms of service of source websites
Rate Limiting: Built-in delays and limits to be respectful to source sites.
- Python: 3.9 or higher
- CPU: Low (< 5% when idle)
- RAM: ~200-300 MB
- Disk: ~50 MB (database + logs)
- Network: Minimal (scraping only)
- Personal use on single machine
- 3-5 cities simultaneous monitoring
- 100-500 listings in database
- 15-minute scraping intervals
- 24/7 continuous operation
Built with:
- Python 3.9+ - Programming language
- FastAPI - Web framework
- Playwright - Browser automation
- SQLAlchemy - Database ORM
- Bootstrap 5 - UI framework
- python-telegram-bot - Notifications
- APScheduler - Job scheduling
- fuzzywuzzy - Fuzzy string matching
This is personal software for individual use. Not licensed for commercial distribution.
For issues and questions:
- Check logs: tail -f real_estate_monitor.log
- Review configuration: cat .env
- Run tests: python test_setup.py
- Search issues: GitHub Issues
- Create a new issue: Include logs and configuration (remove sensitive data)
Happy House Hunting! 🏡
Made with ❤️ for apartment hunters in Israel