Automated MLS + off-market data pipeline and analytics dashboard for Palm Beach-area market tracking.
- Ingests MLS CSV exports into a normalized SQLite dataset.
- Applies timeline and status logic (including zombie listing detection).
- Standardizes subdivision naming with PCN lookup-based mapping and fallback grouping.
- Calculates market stats across monthly/quarterly/annual periods.
- Serves an interactive Streamlit dashboard and PDF report workflow.
- Supports off-market county sales scraping/import with duplicate and transfer filtering.
- Supports cabana/auxiliary-unit merge logic to avoid double-counting sales.
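As a rough illustration of the monthly/quarterly/annual grouping mentioned above, the sketch below buckets sold dates into period labels. The function names `period_key` and `count_sales_by_period` are hypothetical and are not taken from `data_analysis.py`; the real metric engine may group differently.

```python
from datetime import date


def period_key(d: date, frequency: str) -> str:
    """Return a sortable label for the period that contains d."""
    if frequency == "monthly":
        return f"{d.year}-{d.month:02d}"
    if frequency == "quarterly":
        quarter = (d.month - 1) // 3 + 1  # months 1-3 -> Q1, 4-6 -> Q2, ...
        return f"{d.year}-Q{quarter}"
    if frequency == "annual":
        return str(d.year)
    raise ValueError(f"unsupported frequency: {frequency!r}")


def count_sales_by_period(sold_dates, frequency):
    """Count sales per period label (hypothetical helper, for illustration)."""
    counts = {}
    for d in sold_dates:
        key = period_key(d, frequency)
        counts[key] = counts.get(key, 0) + 1
    return counts
```

The same labels can key any per-period metric, not only counts.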
- Python 3
- Pandas / NumPy
- SQLite (mls.db, WAL mode)
- Streamlit + Plotly
- Selenium + requests (off-market scraping/enrichment)
- FPDF-based report generation in report_generator.py
A starter FastAPI service is now available at:
api/main.py
Run locally:
uvicorn api.main:app --reload --port 8000
Starter endpoints:
- GET /api/health
- GET /api/summary/kpis
- GET /api/market/trends
- GET /api/inventory/by-status
- GET /api/filters/options
- GET /api/listings/recent
- GET /api/market/report-summary
- GET /api/market/subdivision-rankings
- GET /api/market/period-series
- GET /api/ops/status
- POST /api/cma/run
Example:
curl "http://127.0.0.1:8000/api/summary/kpis?sold_since=2026-02-05"
Filterable query params (where supported):
- city
- geo_zone
- final_subdivision
- property_type
- property_group (ALL, SINGLE_FAMILY, TOWNHOME_CONDO)
- sold_since (/api/summary/kpis)
- months (/api/market/trends)
- frequency (monthly|quarterly|annual for /api/market/trends)
- periods (/api/market/period-series, e.g. 12 for trailing 12 months)
- report_mode (rolling|monthly|quarterly|annual|custom) for report endpoints
- ref_year, ref_month, ref_quarter for monthly/quarterly/annual report modes
- start_date, end_date (YYYY-MM-DD) for custom date-range mode
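For scripted access, the query parameters above can be assembled with the standard library. This is a minimal sketch, assuming the API runs at the local address used in the curl example; `build_api_url` and `fetch_json` are hypothetical helper names, not part of the repo.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE_URL = "http://127.0.0.1:8000"  # same local address as the curl example


def build_api_url(path, base_url=API_BASE_URL, **params):
    """Join base URL, path, and any non-None query params into one URL."""
    query = {k: v for k, v in params.items() if v is not None}
    url = base_url.rstrip("/") + path
    return f"{url}?{urlencode(query)}" if query else url


def fetch_json(url, timeout=10):
    """GET a URL and decode the JSON body (requires the API to be running)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Example (needs a running server):
# kpis = fetch_json(build_api_url("/api/summary/kpis", sold_since="2026-02-05"))
```

Parameters left as `None` are dropped, so one call site can pass every optional filter without building the query string by hand.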
POST /api/cma/run payload:
- parcel (required)
- as_of_date (YYYY-MM-DD, optional; defaults to today)
- top_n (optional, default 10)
Example:
curl -X POST "http://127.0.0.1:8000/api/cma/run" \
-H "Content-Type: application/json" \
-d '{"parcel":"00424634100000390","as_of_date":"2026-03-02","top_n":10}'

Note: GET /api/filters/options now accepts context filters (city, geo_zone, property_type, property_group) so subdivision dropdown options can be narrowed to the selected area/type.
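The CMA request body can also be built and checked in Python before it is sent. The sketch below only assembles the payload described above; `build_cma_payload` is a hypothetical helper, and it omits `as_of_date` when none is given so that the documented server-side default (today) applies.

```python
import json
from datetime import date


def build_cma_payload(parcel, as_of_date=None, top_n=10):
    """Return a dict matching the POST /api/cma/run payload fields."""
    if not parcel:
        raise ValueError("parcel is required")
    payload = {"parcel": str(parcel), "top_n": int(top_n)}
    if as_of_date is not None:
        # Validate the YYYY-MM-DD format; raises ValueError on bad input.
        payload["as_of_date"] = date.fromisoformat(str(as_of_date)).isoformat()
    return payload


body = json.dumps(build_cma_payload("00424634100000390", "2026-03-02"))
```

The resulting `body` string can be passed to any HTTP client with a `Content-Type: application/json` header.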
Starter app location:
frontend/
Install and run:
cd frontend
npm install
npm run dev
The frontend reads the API base URL from:
VITE_API_BASE_URL (default: http://127.0.0.1:8000)
Example:
VITE_API_BASE_URL=http://127.0.0.1:8000 npm run dev
The current React app includes two views:
- Market Dashboard (period/report analytics + print workflows)
- CMA (parcel run, subject details, comp map, comp adjustment breakdown)

- app.py: Streamlit app (filters, KPI display, charting, export entry points).
- main.py: Console menu for common operations.
- generate_db.py: MLS batch entrypoint (input_csvs/*.csv -> cleaning -> DB upsert).
- data_cleaning.py: MLS normalization, derived logic, lookup application, dedupe.
- data_loader.py: DB schema + insert/upsert implementation.
- data_analysis.py + data_analysis_functions.py: metric calculation engine.
- report_generator.py: PDF market report generation utilities.
- PalmBeachProrpertyScraper.py: off-market sales scraper + enricher.
- pbc_importer.py: importer for enriched off-market CSV into listing_details.
- merge_cabanas.py: DB-level condo/cabana merge workflow.
- rescrape_missing_details.py: fills missing beds/baths/sqft for imported PBC records.
- Drop raw MLS CSV files into input_csvs/ (or input_csvs/raw/ if you are organizing by stage).
- Run generate_db.py (or menu option in main.py).
- data_cleaning.py:
  - Renames raw columns to normalized schema names.
  - Casts types (dates, numerics, booleans).
  - Computes effective_active_end_date, calculated_status, is_zombie.
  - Applies subdivision normalization via lookup sheets and PCN grouping.
  - Applies geo-zone tagging.
- data_loader.py upserts into listing_details in mls.db.
- Streamlit dashboard (app.py) reads DB and runs analytics.
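The upsert step in this flow can be pictured with plain `sqlite3`. This is a sketch under stated assumptions, not the code in `data_loader.py`: the three-column table is a stand-in for the real `listing_details` schema, and keying the upsert on `listing_number` is assumed.

```python
import sqlite3


def open_db(path="mls.db"):
    """Open the SQLite file and request WAL journaling."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    return conn


def upsert_listings(conn, rows):
    """Insert rows, or update them when listing_number already exists."""
    # Simplified stand-in schema; the real table has many more columns.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS listing_details (
               listing_number TEXT PRIMARY KEY,
               status TEXT,
               sale_price REAL)"""
    )
    conn.executemany(
        """INSERT INTO listing_details (listing_number, status, sale_price)
           VALUES (:listing_number, :status, :sale_price)
           ON CONFLICT(listing_number) DO UPDATE SET
               status = excluded.status,
               sale_price = excluded.sale_price""",
        rows,
    )
    conn.commit()
```

Re-running the same export is then idempotent: existing listings are refreshed in place rather than duplicated.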
- Run PalmBeachProrpertyScraper.py to pull county sales search results and enrich each row.
- Scraper performs:
  - Municipality search mode selection.
  - Date filter support (start, optional end).
  - Duplicate avoidance using PCN + fuzzy sold-date match against existing DB sales.
  - Geocoding enrichment and property-detail scrape.
  - CSV-level cabana combine pass for same owner + same day pairs.
- Run pbc_importer.py on ENHANCED_*.csv:
  - Maps fields into listing_details schema.
  - Creates listing_number as PBC-<parcel_number>.
  - Sets closed-sale status fields (status='C', calculated_status='C').
  - Filters likely transfer deeds (sale_price < $10,000).
  - Reconciles final_subdivision via matching pcn_10_digit where possible.
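The importer rules listed above (the `PBC-` key, closed status, and the $10,000 transfer-deed filter) can be sketched as a single mapping function. The input keys `parcel_number` and `sale_price` and the name `to_listing_row` are assumptions for illustration; the actual CSV column names in `pbc_importer.py` may differ.

```python
MIN_SALE_PRICE = 10_000  # sales below this are treated as likely transfer deeds


def to_listing_row(record):
    """Map one enriched off-market record to a listing_details-style dict.

    Returns None when the record looks like a transfer deed.
    """
    sale_price = float(record["sale_price"])
    if sale_price < MIN_SALE_PRICE:
        return None
    return {
        "listing_number": f"PBC-{record['parcel_number']}",
        "status": "C",
        "calculated_status": "C",
        "sale_price": sale_price,
    }


def map_records(records):
    """Keep only the records that survive the transfer-deed filter."""
    rows = (to_listing_row(r) for r in records)
    return [row for row in rows if row is not None]
```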
Two layers exist:
- CSV-level (PalmBeachProrpertyScraper.py): combines cabana rows with main property rows using the same Sale Date + Owner Name.
- DB-level (merge_cabanas.py): merges likely condo+cabana pairs in listing_details when:
  - same building,
  - within 7 days,
  - and pricing pattern suggests shared transaction.
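The first two DB-level criteria (same building, within 7 days) can be illustrated as a candidate-pair search. This sketch deliberately leaves out the pricing-pattern check, since its rule is not described here; the record keys `building` and `sold_date` are hypothetical and not the actual column names used by `merge_cabanas.py`.

```python
from datetime import date
from itertools import combinations

MAX_DAYS_APART = 7


def candidate_pairs(sales):
    """Return listing_number pairs sold in the same building within 7 days."""
    pairs = []
    for a, b in combinations(sales, 2):
        same_building = a["building"] == b["building"]
        days_apart = abs((a["sold_date"] - b["sold_date"]).days)
        if same_building and days_apart <= MAX_DAYS_APART:
            pairs.append((a["listing_number"], b["listing_number"]))
    return pairs
```

A pair returned here is only a candidate; a real merge would still need the pricing check before combining rows.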
Current menu options:
- Run MLS data processing (generate_db.py)
- Launch Streamlit dashboard (streamlit run app.py)
- Update subdivisions from lookup sheets
- Reset database and restore archived input files
- Off-market pull (automation): Palm Beach + start from last imported PBC date
- Off-market pull (custom): custom city + optional date range
- Exit
- Install dependencies:
  pip install -r requirements.txt
  - Ensure Streamlit/Selenium/browser driver are available in your environment.
- Prepare MLS inputs:
  - Add MLS CSV exports to input_csvs/.
- Build/update database:
  python3 generate_db.py
- Launch dashboard:
  streamlit run app.py
  - or python3 main.py and use menu options.
- Interactive scraper:
  python3 PalmBeachProrpertyScraper.py
- Automation mode:
  python3 PalmBeachProrpertyScraper.py --city "Palm Beach" --from-last-imported
- Custom date range:
  python3 PalmBeachProrpertyScraper.py --city "Palm Beach" --start-date 01/01/2024 --end-date 12/31/2024
- Import scraped CSV:
  python3 pbc_importer.py /path/to/ENHANCED_file.csv --dry-run
  python3 pbc_importer.py /path/to/ENHANCED_file.csv
- Install deps:
  pip install -r requirements.txt
- Run securely with env vars (recommended):
  export MLS_EMAIL='your_email_here'
  export MLS_PASSWORD='your_password_here'
  python3 mls_auto_login.py
- Optional flags:
  python3 mls_auto_login.py --headless
  python3 mls_auto_login.py --timeout 45 --stay-open-seconds 20
- On login failure, screenshot is saved to:
  tmp/mls_login_error.png
- For Codex automations / unattended runs:
  - place credentials in a repo-local .env file because the automation runner may not inherit your shell exports
  - supported keys: MLS_EMAIL=... and MLS_PASSWORD=...
  - .env is already ignored by git
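One way to support both shell exports and a repo-local `.env` file is to check the environment first and fall back to the file. The sketch below uses only the standard library and is not the loader in `mls_auto_login.py`; the helper names and the precedence order (environment wins over file) are assumptions.

```python
import os


def parse_env_lines(lines):
    """Parse KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def read_env_file(path=".env"):
    """Return values from a .env file, or {} when the file is missing."""
    try:
        with open(path, encoding="utf-8") as fh:
            return parse_env_lines(fh)
    except FileNotFoundError:
        return {}


def get_mls_credentials(env_path=".env", environ=None):
    """Resolve MLS_EMAIL and MLS_PASSWORD from the environment, then .env."""
    environ = os.environ if environ is None else environ
    file_values = read_env_file(env_path)
    creds = {}
    for key in ("MLS_EMAIL", "MLS_PASSWORD"):
        value = environ.get(key) or file_values.get(key)
        if not value:
            raise RuntimeError(f"{key} is not set in the environment or {env_path}")
        creds[key] = value
    return creds
```

Failing fast with a named key makes unattended runs easier to debug than a login timeout later in the Selenium flow.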
- Script:
  python3 mls_export_saved_search.py --search-name "PalmBeach_Wellington_NewData"
- Login-only debug:
  python3 mls_export_saved_search.py --login-only --debug-dir "tmp/mls_debug"
- Update status date filters before export:
  python3 mls_export_saved_search.py --search-name "PalmBeach_Wellington_NewData" --from-date "02/01/2026"
  - Applies value to: from_4_2, from_4_4, from_4_5, from_4_6, from_4_7, from_4_8
- End-to-end: export -> ingest -> DB merge:
  python3 mls_export_saved_search.py --search-name "PalmBeach_Wellington_NewData" --from-date "02/05/2026" --run-generate-db
  - This copies the downloaded CSV into input_csvs/ and runs generate_db.py
  - Prints run summary: start_date, downloaded file, DB row count before/after, and delta
- Default download output:
  output/mls_exports/
- Non-interactive runner:
  python3 scripts/ops/run_mls_auto_update.py --headless
- Wrapper script used by macOS LaunchAgent:
  scripts/ops/run_mls_auto_update.sh
- LaunchAgent template:
  scripts/ops/com.brookesnader.restats.mls-auto-update.plist
- Cloud sync runner:
  python3 scripts/ops/sync_sqlite_to_supabase.py
- Important macOS note:
  - the current SSD repo path is /Volumes/ExternalSSD/projects/restats-analytics
  - the checked-in LaunchAgent template now runs the SSD wrapper script directly
  - the checked-in LaunchAgent template is scheduled for 9:15 PM local time and does not force an immediate run when installed
- The auto-update runner now follows a two-step flow by default:
  - refresh local SQLite from MLS
  - sync local listing_details into Supabase so the live API stays aligned
- The cloud sync now prunes cloud-only rows by default so valid local/cloud row counts stay matched.
- The quick-search importer now uses:
  python3 generate_db.py --db-name mls.db --skip-archive <csv paths...>
  - This avoids relying on an in-process data_cleaning import during the Selenium/export job.
- Full example:
  export MLS_EMAIL='your_email_here'
  export MLS_PASSWORD='your_password_here'
  python3 mls_export_saved_search.py --search-name "PalmBeach_Wellington_NewData" --download-dir "output/mls_exports"
- Verbose debug run (action log + step captures):
  python3 mls_export_saved_search.py --search-name "PalmBeach_Wellington_NewData" --debug-dir "tmp/mls_debug"
  - Run log defaults to logs/mls_export_<timestamp>.log (override with --log-file)
  - Browser console/network logs are also saved to:
    logs/mls_export_<timestamp>_browser.jsonl
    logs/mls_export_<timestamp>_performance.jsonl
- On export failure, screenshot is saved to:
  tmp/mls_export_error.png
- Lookup CSV files under lookups/ are part of normalization quality.
- mls.db and raw CSV files are excluded by .gitignore.
- requirements.txt includes API-related packages, but this repo currently operates primarily through scripts + Streamlit UI.
- Operational scripts are organized under scripts/ (audits/, maintenance/, ops/).
- Root script names are backward-compatible wrappers.
- CMA engine files live under cma_module/ and are consumed by POST /api/cma/run.
- Duplicate audit:
  python3 scripts/audits/audit_duplicates.py --window-days 7 --sample-size 20
  - Optional CSV report:
    python3 scripts/audits/audit_duplicates.py --export-report
  - Save latest JSON summary (used by React Run Status panel):
    python3 scripts/audits/audit_duplicates.py --window-days 7 --sample-size 20 --json-path output/audits/latest_audit_summary.json
- Board-overlap cleanup (keep RX-, remove non-RX duplicates for same sale):
  - Dry run:
    python3 scripts/maintenance/clean_rx_board_duplicates.py --window-days 7
  - Apply:
    python3 scripts/maintenance/clean_rx_board_duplicates.py --window-days 7 --apply
- Legacy PBC key migration (one-time):
  - Dry run:
    python3 scripts/maintenance/migrate_pbc_listing_numbers.py --dry-run
  - Apply:
    python3 scripts/maintenance/migrate_pbc_listing_numbers.py --apply
- Organize generated artifacts (root CSVs, old logs, old tmp debug files):
  - Dry run:
    python3 scripts/maintenance/cleanup_project_artifacts.py --dry-run
  - Apply:
    python3 scripts/maintenance/cleanup_project_artifacts.py --days-old 3
- Canonical DB values are now:
  - Single Family Home
  - Condo/TH/Other
- One-time backfill (safe to re-run):
  python3 scripts/maintenance/normalize_property_types.py
- Audit/fix imported PBC Palm Beach geo zones (South End consistency):
  - Dry run:
    python3 scripts/audits/audit_fix_pbc_geo_zones.py
  - Apply fixes:
    python3 scripts/audits/audit_fix_pbc_geo_zones.py --apply
- Latest report can be written with:
  python3 scripts/audits/audit_fix_pbc_geo_zones.py --apply --report-path output/audits/pbc_geo_zone_audit_latest.csv
Compare legacy analytics metrics with new API summary for period/filter parity:
- Quarterly:
  python3 scripts/ops/parity_check_dashboard.py --mode quarterly --year 2025 --quarter 4 --property-group ALL
- Monthly:
  python3 scripts/ops/parity_check_dashboard.py --mode monthly --year 2026 --month 1 --property-group SINGLE_FAMILY
- Annual:
  python3 scripts/ops/parity_check_dashboard.py --mode annual --year 2025 --property-group TOWNHOME_CONDO
API parity endpoint:
GET /api/ops/parity?mode=monthly&year=2026&month=1&city=Palm%20Beach
- Supports mode=monthly|quarterly|annual and the same filter params as report-summary.
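Conceptually, a parity check compares two metric dictionaries for the same period and filters and reports any differences. The sketch below shows one way to do that; `compare_metrics` and the metric names in the example are hypothetical, and `parity_check_dashboard.py` may use different metrics and tolerances.

```python
import math


def compare_metrics(legacy, api, rel_tol=1e-6):
    """Return {metric: (legacy_value, api_value)} for every mismatch."""
    mismatches = {}
    for key in sorted(set(legacy) | set(api)):
        a, b = legacy.get(key), api.get(key)
        if a is None or b is None:
            # Metric present on only one side counts as a mismatch.
            mismatches[key] = (a, b)
        elif not math.isclose(a, b, rel_tol=rel_tol):
            mismatches[key] = (a, b)
    return mismatches


# Hypothetical metric names, for illustration only.
legacy = {"sales_count": 42, "median_price": 1_850_000.0}
api = {"sales_count": 42, "median_price": 1_850_000.0}
print(compare_metrics(legacy, api) or "parity OK")
```

A relative tolerance avoids false alarms from floating-point rounding while still catching real differences in counts or prices.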
Use one entrypoint for day-to-day ops:
- Ingest pipeline:
  python3 -m restats ingest
- Duplicate audit:
  python3 -m restats audit -- --window-days 7 --sample-size 20
- Parity report:
  python3 -m restats report -- --mode monthly --year 2026 --month 1
- Guardrails:
  python3 -m restats guardrails