Pin-To-Place: Project Plan

Location & Positional Accuracy for Overture Maps

Team: Aaron J. Yam & Shivani Belambe | Project Terraforma @ UCSC

Current Findings

Pin-To-Place has evolved from a single-coordinate positional accuracy study into a task-aware place pin evaluation framework.

After generating ground truth for all 3,425 Overture place records, the baseline median offset was already 0.0m, because 79.6% of current pins need no move at all. With most pins already exact, the median cannot improve, which makes median offset reduction a poor primary success metric. The project now evaluates whether a pin supports the real-world arrival task, using p90/p95 error, regression risk, arrival cost, ambiguity, and manual review outcomes.

Baseline Findings

Total records: 3,425
Baseline median offset: 0.0m
Mean offset: 6.4m
p90 offset: 37.55m
p95 offset: 40.27m
Exact no-move rate: 79.6%
Places marked as likely movable: 675
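The baseline figures above are order statistics over per-place offsets. Below is a minimal sketch of how they could be computed. It assumes a plain list of offsets in meters and linear-interpolation percentiles (the same convention as NumPy's default). `summarize_offsets` and its `eps_m` tolerance are hypothetical names, not part of the repository.

```python
import math
import statistics


def percentile(values, p):
    """Linear-interpolation percentile (p in 0..100), like NumPy's default."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of empty data")
    pos = (len(ordered) - 1) * p / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def summarize_offsets(offsets_m, eps_m=0.0):
    """Summarize offsets (m) between current pins and ground truth.

    eps_m is a hypothetical tolerance: offsets <= eps_m count as "no move".
    """
    n = len(offsets_m)
    return {
        "records": n,
        "median_m": statistics.median(offsets_m),
        "mean_m": statistics.fmean(offsets_m),
        "p90_m": percentile(offsets_m, 90),
        "p95_m": percentile(offsets_m, 95),
        "no_move_rate": sum(1 for d in offsets_m if d <= eps_m) / n,
    }


if __name__ == "__main__":
    # Toy data: most pins already sit on the target, a few are far off.
    print(summarize_offsets([0.0] * 8 + [30.0, 50.0]))
```

In the toy data the median is 0.0 while p95 is about 41m. This is the same shape as the real baseline: the median hides the tail, which is why p90/p95 are tracked.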

Task-Aware Findings

Task-aware evaluation adds place_complexity, pin_ambiguity, should_move, and arrival_cost_m.

Overall arrival-cost p95: 42.61m
Open-space median arrival cost: 25.0m
Open-space p95 arrival cost: 65.44m
Complex-place median arrival cost: 19.62m
Complex-place p95 arrival cost: 63.98m
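Per-group figures like these can be produced by grouping rows on `place_complexity` and taking the median and p95 of `arrival_cost_m`. This is a minimal sketch that assumes each record is a dict carrying those two fields. The group labels in the example are hypothetical, and the sketch does not show how `arrival_cost_m` itself is computed.

```python
import statistics
from collections import defaultdict


def arrival_cost_by_group(records, key="place_complexity"):
    """Return {group: (median, p95)} of arrival_cost_m for each group."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec["arrival_cost_m"])
    summary = {}
    for group, costs in groups.items():
        if len(costs) < 2:
            # quantiles() needs at least two points; report the lone value.
            summary[group] = (costs[0], costs[0])
            continue
        # method="inclusive" interpolates linearly between order statistics;
        # index 94 of the 99 cut points is the 95th percentile.
        p95 = statistics.quantiles(costs, n=100, method="inclusive")[94]
        summary[group] = (statistics.median(costs), p95)
    return summary


if __name__ == "__main__":
    rows = [{"place_complexity": "open_space", "arrival_cost_m": c} for c in (10, 20, 30, 40, 50)]
    print(arrival_cost_by_group(rows))  # {'open_space': (30, 48.0)}
```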

Open spaces and complex places are where single-coordinate pinning struggles most: their p95 arrival costs (65.44m and 63.98m) are well above the overall p95 of 42.61m.

Manual Review Pilot

A 118-row manual review pilot was built from high-offset, low-confidence, multi-tenant, and zero-offset control examples.

privacy_sensitive: 38
accepted: 31
wrong_target: 28
ambiguous: 21
should_move = true: 41
manual_needs_multi_pin = true: 21

Key pilot rates:

34.7% should move (41 of 118)
17.8% need multiple pins (21 of 118)
32.2% are privacy-sensitive (38 of 118)
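The pilot rates are simple proportions over the 118 reviewed rows. Here is a sketch of the same calculation from row-level review labels. It assumes each row is a dict of boolean flags. The pilot reports `privacy_sensitive` as a count, so treating it as a per-row boolean is an assumption of this sketch.

```python
def pilot_rates(rows, flags=("should_move", "manual_needs_multi_pin", "privacy_sensitive")):
    """Percent of reviewed rows (rounded to 0.1) where each flag is truthy."""
    total = len(rows)
    if total == 0:
        raise ValueError("no reviewed rows")
    return {
        flag: round(100.0 * sum(bool(row.get(flag)) for row in rows) / total, 1)
        for flag in flags
    }


if __name__ == "__main__":
    # Synthetic rows reproducing the pilot's counts: 41, 21 and 38 of 118.
    rows = [
        {"should_move": i < 41, "manual_needs_multi_pin": i < 21, "privacy_sensitive": i < 38}
        for i in range(118)
    ]
    print(pilot_rates(rows))  # {'should_move': 34.7, 'manual_needs_multi_pin': 17.8, 'privacy_sensitive': 32.2}
```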

Main Insight

A single place pin is not always sufficient. Some places should move to a more useful entrance or access point, some should not move, some should be treated conservatively for privacy, and some need multiple task-specific pins.

Multi-Pin Proxy Review Findings

A 21-row multi-pin pilot was created from places marked manual_needs_multi_pin = true.

The automated proxy review found:

13 rows with pedestrian-entry proxy labels
14 rows with vehicle-entry proxy labels
6 rows with both pedestrian and vehicle proxy labels
16 rows accepted by proxy review
5 rows still requiring human visual review

For rows with both pedestrian and vehicle proxy labels, the distance between arrival targets was substantial:

mean pedestrian/vehicle separation: 44.3m
median separation: 39.6m
max separation: 88.6m

This supports the task-aware pinning thesis: for some places, pedestrian and vehicle arrival targets are meaningfully different, so one coordinate may not serve all navigation tasks.
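The separation statistics compare each row's pedestrian and vehicle arrival targets using a great-circle distance. This is a minimal sketch that assumes each row carries optional `ped` and `veh` (lat, lon) tuples; these field names are hypothetical. Rows missing either label are skipped, mirroring how only the 6 dual-labeled rows contribute to the mean, median, and max.

```python
import math
import statistics

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def separation_summary(rows):
    """Summarize pedestrian/vehicle target separation over dual-labeled rows."""
    seps = [
        haversine_m(*row["ped"], *row["veh"])
        for row in rows
        if row.get("ped") is not None and row.get("veh") is not None
    ]
    if not seps:
        return None
    return {
        "n_both": len(seps),
        "mean_m": statistics.fmean(seps),
        "median_m": statistics.median(seps),
        "max_m": max(seps),
    }
```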

These results are based on proxy labels derived from existing LLM ground truth and current pin positions. They should be treated as workflow validation, not final visual ground truth. The next validation step is human review of the five high-priority rows.

1. Problem Statement

Overture Maps releases a new global map every month, but place pins may not consistently represent the most useful real-world location of a place. A pin could be offset from the building centroid, storefront entrance, rooftop center, or other operationally useful point. Because the "correct" target itself is ambiguous, the project must first define the standard before it can measure error and improve placement quality.

2. Objectives

Establish a practical and defensible definition of a correct pin location
Build a ground-truth dataset of 500-1,000 labeled places from the provided ~3,425-place sample
Measure current spatial offset in Overture place data
Prototype modeling approaches to reposition place pins more accurately
Produce a recommendation for future production work

3. Dataset Overview

Property	Value
Records	3,425 Overture places
Geography	US-only, all 50 states (top: CA, TX, FL, NY, NC)
Columns	id, geometry (WKB point), bbox, type, version, sources, names, categories, confidence, websites, socials, emails, phones, brand, addresses
Categories	100+ types — top: hotel (285), professional_services (181), accommodation (86), campground (74), lawyer (65), fast_food_restaurant (59)
Confidence	Mean 0.86, range 0.20-1.00
Primary source	Meta/Facebook

4. Approach

Phase 1: Data Ingestion, Cleaning & EDA

Load parquet data, decode WKB point geometries to lat/lon
Flatten nested fields (names.primary, categories.primary, addresses)
Identify and flag near-duplicates (same name within 50m, same address with different coordinates)
Exploratory analysis: geographic distribution, category breakdown, confidence histogram, source analysis, null patterns
Classify places by density tier (urban / suburban / rural)
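Two of the Phase 1 steps can be sketched with the standard library alone: decoding a WKB point into lat/lon, and flagging same-name places within 50m. The sketch below assumes plain 2D WKB points (type code 1) and flattened place dicts with hypothetical `id`, `name`, `lat`, and `lon` keys. It covers only the same-name rule, not the same-address check. In practice `shapely.wkb.loads` or geopandas would handle the decoding and the Z/EWKB variants.

```python
import math
import re
import struct
from collections import defaultdict
from itertools import combinations

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius (spherical approximation)


def wkb_point_to_latlon(wkb: bytes):
    """Decode a 2D WKB Point into (lat, lon); WKB stores x = lon, y = lat."""
    byte_order = wkb[0]
    if byte_order not in (0, 1):
        raise ValueError("invalid WKB byte-order flag")
    endian = "<" if byte_order == 1 else ">"
    (geom_type,) = struct.unpack_from(endian + "I", wkb, 1)
    if geom_type != 1:
        # Z/M/EWKB variants use other type codes and are not handled here.
        raise ValueError(f"expected 2D Point (type 1), got {geom_type}")
    lon, lat = struct.unpack_from(endian + "dd", wkb, 5)
    return lat, lon


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def normalize_name(name: str) -> str:
    """Lowercase and collapse non-alphanumerics: 'blue-bottle coffee' -> 'blue bottle coffee'."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def near_duplicate_pairs(places, radius_m=50.0):
    """Return (id, id) pairs that share a normalized name and lie within radius_m.

    Grouping by name first avoids an all-pairs scan over every record.
    """
    by_name = defaultdict(list)
    for place in places:
        by_name[normalize_name(place["name"])].append(place)
    pairs = []
    for group in by_name.values():
        for a, b in combinations(group, 2):
            if haversine_m(a["lat"], a["lon"], b["lat"], b["lon"]) <= radius_m:
                pairs.append((a["id"], b["id"]))
    return pairs
```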

Output: notebooks/01_eda.ipynb, src/data_loader.py

Phase 2: Pin Location Definition & Taxonomy

Evaluate candidate pin definitions informed by EDA findings:

Definition	Description	Pros	Cons
Building Centroid	Geometric center of building footprint	Consistent, automatable, widely available	May land in courtyard/interior; poor fit for strip malls, where one footprint holds many tenants
Rooftop Centroid	Center of rooftop footprint	Satellite-visible	Same issues as building centroid
Main Entrance / Storefront	Street-facing entrance	Most useful for navigation	Hard to label at scale; ambiguous for multi-entrance
Nearest Road-Facing Point	Closest point on footprint to nearest road	Good entrance proxy, automatable	May pick back alley, not main entrance
Parcel Centroid	Center of land parcel	Consistent	Often far from building; data inconsistent
Address Geocode	Lat/lon from geocoding street address	Universally available	Varies by geocoder quality; often road centerline
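The "Nearest Road-Facing Point" definition reduces to finding the point on a footprint boundary that is closest to a road polyline. This is a planar sketch with several assumptions. Footprint and road are already projected into local meters (for example, a UTM zone), and the road does not cross the footprint. The sketch also relies on a basic fact about non-intersecting segments: the minimum distance between them is attained at an endpoint of one of them. In production, shapely's `nearest_points` does the same job on OSM geometries.

```python
import math


def _closest_on_segment(p, a, b):
    """Closest point to p on segment a-b, in planar coordinates."""
    ax, ay = a
    dx, dy = b[0] - ax, b[1] - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return a  # degenerate segment, e.g. a road given as a single point
    t = ((p[0] - ax) * dx + (p[1] - ay) * dy) / seg_len2
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return (ax + t * dx, ay + t * dy)


def _segments(coords, closed):
    """Consecutive vertex pairs; closes the ring for polygons."""
    pts = list(coords)
    if closed and pts[0] != pts[-1]:
        pts.append(pts[0])
    segs = list(zip(pts[:-1], pts[1:]))
    return segs or [(pts[0], pts[0])]


def road_facing_point(footprint, road):
    """Return (point on footprint boundary closest to the road, distance in m).

    footprint: polygon ring [(x, y), ...]; road: polyline [(x, y), ...].
    """
    best, best_d = None, math.inf
    for fa, fb in _segments(footprint, closed=True):
        for ra, rb in _segments(road, closed=False):
            # Case 1: a footprint vertex is the closest footprint point.
            for v in (fa, fb):
                d = math.dist(v, _closest_on_segment(v, ra, rb))
                if d < best_d:
                    best, best_d = v, d
            # Case 2: a road vertex projects onto a footprint edge.
            for r in (ra, rb):
                q = _closest_on_segment(r, fa, fb)
                d = math.dist(r, q)
                if d < best_d:
                    best, best_d = q, d
    return best, best_d


if __name__ == "__main__":
    square = [(0, 0), (10, 0), (10, 10), (0, 10)]
    print(road_facing_point(square, [(5, -8), (5, -3)]))  # ((5.0, 0.0), 3.0)
```

The "May pick back alley" caveat in the table still applies: this returns the geometrically nearest point to whichever road it is given, so choosing the right road is a separate problem.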

Recommended approach (category-aware hierarchical definition):

Standard commercial places -> building centroid with road-facing refinement
Multi-tenant (strip malls, office suites) -> storefront entrance (LLM-assisted)
Open spaces (parks, campgrounds) -> parcel/area centroid
Fallback -> best available geocode
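The hierarchy above amounts to a per-category preference order with a geocode fallback. Below is a minimal sketch of that dispatch. The category sets are hypothetical stand-ins for the mapping in docs/pin_definition_taxonomy.md. Reading "road-facing refinement" as "prefer the road-facing point, else the building centroid" is also an interpretation, not the project's confirmed rule.

```python
# Hypothetical category groupings for illustration only.
OPEN_SPACE = {"park", "campground"}
MULTI_TENANT = {"shopping_center", "professional_services"}

# Preference order per place class; "geocode" is always the final fallback.
ORDER = {
    "open_space": ["area_centroid"],
    "multi_tenant": ["storefront_entrance"],
    # Assumption: refinement means the road-facing point wins when available.
    "standard": ["road_facing_point", "building_centroid"],
}


def place_class(category):
    if category in OPEN_SPACE:
        return "open_space"
    if category in MULTI_TENANT:
        return "multi_tenant"
    return "standard"


def choose_pin_target(category, candidates):
    """Return (kind, (lat, lon)) for the first available preferred target.

    candidates maps target kind -> (lat, lon) or None.
    """
    for kind in ORDER[place_class(category)] + ["geocode"]:
        point = candidates.get(kind)
        if point is not None:
            return kind, point
    return None, None
```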

Output: docs/pin_definition_taxonomy.md

Phase 3: Ground Truth Construction (500-1,000 places)

Strategy: Use vision-capable LLMs (GPT-4o / Claude) to examine satellite imagery and identify correct pin locations at scale.

Pipeline:

Stratified sampling — select 750-1,000 places stratified by region, category, and density tier
Fetch satellite imagery tiles — via Google Maps Static API or Mapbox (~250m x 250m tiles at high zoom)
LLM vision annotation — prompt a vision LLM with:
- Satellite image tile with current pin marked
- Place name, address, category
- Instructions to identify building entrance, storefront, or most appropriate location
- Return coordinates as pixel offset from center
Convert pixel offsets to lat/lon using tile geographic bounds
Confidence scoring — LLM provides confidence; low-confidence places flagged for human review
Cross-validation — for ~100 places, compare LLM annotations against multi-geocoder consensus (Nominatim + Google + Mapbox)
Inter-annotator agreement — run 50 places through LLM twice (or two different LLMs) to measure consistency
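The conversion from pixel offsets to lat/lon can be done exactly in Web Mercator, which is the projection used by both Google Static Maps and Mapbox tiles. This sketch assumes 256-px base tiles (Google's `scale=1`) and an image centered on the current pin. It also assumes a hypothetical JSON reply of the form `{"dx_px", "dy_px", "confidence"}` with x pointing right and y pointing down. The 0.7 review threshold is a placeholder, not a calibrated value. Linear interpolation between tile bounds is a close approximation at high zoom, but the projection round trip avoids that approximation entirely.

```python
import json
import math

TILE_SIZE = 256  # Web Mercator base tile size in pixels (scale=1)


def latlon_to_world_px(lat, lon, zoom):
    """Project lat/lon to global Web Mercator pixel coordinates at a zoom level."""
    scale = TILE_SIZE * 2 ** zoom
    x = (lon + 180.0) / 360.0 * scale
    siny = math.sin(math.radians(lat))
    y = (0.5 - math.log((1 + siny) / (1 - siny)) / (4 * math.pi)) * scale
    return x, y


def world_px_to_latlon(x, y, zoom):
    """Inverse of latlon_to_world_px."""
    scale = TILE_SIZE * 2 ** zoom
    lon = x / scale * 360.0 - 180.0
    lat = math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * y / scale))))
    return lat, lon


def pixel_offset_to_latlon(center_lat, center_lon, zoom, dx_px, dy_px):
    """Apply an image-pixel offset (x right, y down) to the tile center."""
    cx, cy = latlon_to_world_px(center_lat, center_lon, zoom)
    return world_px_to_latlon(cx + dx_px, cy + dy_px, zoom)


def parse_annotation(reply_text, center_lat, center_lon, zoom, min_conf=0.7):
    """Parse a hypothetical LLM JSON reply and flag low-confidence results."""
    reply = json.loads(reply_text)
    lat, lon = pixel_offset_to_latlon(
        center_lat, center_lon, zoom, float(reply["dx_px"]), float(reply["dy_px"])
    )
    conf = float(reply["confidence"])
    return {"lat": lat, "lon": lon, "confidence": conf, "needs_review": conf < min_conf}
```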

Output: data/processed/ground_truth.parquet, notebooks/02_ground_truth.ipynb

Phase 4: Baseline Offset Measurement

Compute the distance between current Overture pins and ground-truth locations:

Distance metrics: Haversine distance (meters) — mean, median, p90, p95
Threshold accuracy: % within 10m, 25m, 50m, 100m, 250m
Regression rate: % of places where repositioning leaves the pin farther from ground truth than the original pin
Segmentation: by category, region, urban/suburban/rural, confidence score bucket
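The distance, threshold, and regression metrics above can be sketched in a few functions, roughly what src/metrics.py might contain. The function names and the optional `tol_m` jitter tolerance are assumptions for illustration. The sketch treats thresholds as inclusive and uses a spherical Earth for the haversine distance.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def offsets_m(pins, truths):
    """Per-place distance (m) between pin and ground-truth (lat, lon) pairs."""
    if len(pins) != len(truths):
        raise ValueError("pins and truths must align one-to-one")
    return [haversine_m(*p, *t) for p, t in zip(pins, truths)]


def threshold_accuracy(offsets, thresholds=(10, 25, 50, 100, 250)):
    """Percent of places whose offset is within each threshold (inclusive)."""
    n = len(offsets)
    return {t: 100.0 * sum(d <= t for d in offsets) / n for t in thresholds}


def regression_rate(before, after, tol_m=0.0):
    """Percent of places whose repositioned pin is farther from ground truth.

    before/after: offsets (m) of the original and repositioned pins.
    tol_m: hypothetical tolerance so sub-tolerance jitter is not a regression.
    """
    if len(before) != len(after):
        raise ValueError("before and after must align one-to-one")
    worse = sum(a > b + tol_m for b, a in zip(before, after))
    return 100.0 * worse / len(before)
```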

Output: notebooks/03_baseline_offset.ipynb, src/metrics.py

Phase 5: Prototype Repositioning Methods

Method 1: Multi-Geocoder Ensemble

Query 3+ geocoding services (Nominatim, Google Maps Geocoding API, Mapbox) for each place's address
Compute consensus position: weighted average or median
Accept the consensus only when services agree within a configurable radius (e.g., 25m)
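The acceptance rule for Method 1 can be sketched as a component-wise median followed by an agreement check. The sketch picks the median variant of "weighted average or median" and returns `None` when any service lands outside the radius. `geocoder_consensus` is a hypothetical name. Component-wise medians of longitude misbehave across the antimeridian, which is acceptable for this US-only sample.

```python
import math
import statistics

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def geocoder_consensus(points, radius_m=25.0):
    """Median of geocoder results, accepted only if every service agrees.

    points: (lat, lon) from different geocoders (3+ recommended).
    Returns the consensus (lat, lon), or None when services disagree.
    """
    if len(points) < 2:
        return None
    lat = statistics.median([p[0] for p in points])
    lon = statistics.median([p[1] for p in points])
    if all(haversine_m(lat, lon, p[0], p[1]) <= radius_m for p in points):
        return lat, lon
    return None
```

A strict "all must agree" rule trades coverage for precision. A looser variant could drop a single outlier before checking agreement, at the cost of more regression risk.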

Method 2: ML Candidate Ranking (XGBoost / Random Forest)

Generate candidate positions: geocoded positions from multiple services, OSM building centroids, nearest-road-facing point, current pin
Extract features per candidate:
- Distance from current pin / geocoded consensus
- Building area, perimeter, count in vicinity
- Category encoding (one-hot)
- Overture confidence score, source count
- Road proximity, whether candidate is road-facing point
Train a ranking model on the ground-truth dataset (80/20 stratified split)
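Training a candidate ranker starts by turning each place into one feature row per candidate, with a label marking the candidate nearest to ground truth. The sketch below assumes hypothetical dict keys (`current`, `consensus`, `confidence`, `source_count`, `kind`, `point`, `is_road_facing`). It omits the building-geometry and one-hot category features. The resulting rows, grouped per place, could feed xgboost's `XGBRanker` or a per-candidate Random Forest classifier.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def build_training_rows(place, candidates, truth):
    """One feature row per candidate; label = 1 for the candidate nearest truth."""
    dists = [haversine_m(*c["point"], *truth) for c in candidates]
    best = min(range(len(candidates)), key=dists.__getitem__)
    consensus = place.get("consensus")
    rows = []
    for i, cand in enumerate(candidates):
        rows.append({
            "kind": cand["kind"],
            "dist_to_current_m": haversine_m(*cand["point"], *place["current"]),
            # NaN when no geocoder consensus exists for this place.
            "dist_to_consensus_m": (
                haversine_m(*cand["point"], *consensus) if consensus is not None else math.nan
            ),
            "is_road_facing": int(cand.get("is_road_facing", False)),
            "confidence": place["confidence"],
            "source_count": place["source_count"],
            "label": int(i == best),
        })
    return rows
```

Whatever model is used, the 80/20 split should be drawn over places rather than candidate rows, so that candidates of the same place never land on both sides of the split.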

Method 3: LLM-Augmented Contextual Reasoning

Prompt LLM with place metadata + satellite imagery + building footprints + road network
LLM reasons about building layout, entrance locations, and place type
Returns a repositioned lat/lon with an explanation
Most valuable for hard cases: multi-tenant buildings, strip malls, complexes

Output: notebooks/04_repositioning.ipynb, src/geocoder_ensemble.py, src/candidate_ranker.py, src/llm_repositioner.py

Phase 6: Evaluation & Recommendation

Comparison across all methods:

Same metrics as Phase 4 (mean/median/p90/p95, threshold accuracy)
Regression rate per method
Per-category and per-region performance
Improvement over baseline (absolute and relative)
Failure case analysis with examples
Cost analysis (API calls, compute time)

Success criteria:

At least one method reduces median offset by >= 30% vs. current baseline
Regression rate below 10%
Works across >= 80% of category types
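The three success criteria can be checked mechanically per method. This sketch assumes aligned per-place offset lists (baseline and method) plus a category label per place. It also assumes a hypothetical definition of "works" for a category: that category's regression rate is below 10%. The median criterion returns `None` when the baseline median is zero, because a relative reduction is undefined there.

```python
import statistics
from collections import defaultdict


def relative_median_reduction(baseline_m, method_m):
    """(baseline median - method median) / baseline median, or None if undefined."""
    base = statistics.median(baseline_m)
    if base == 0:
        return None  # relative reduction of a zero median is undefined
    return (base - statistics.median(method_m)) / base


def regression_share(baseline_m, method_m):
    """Fraction of places whose offset got worse under the method."""
    worse = sum(1 for b, m in zip(baseline_m, method_m) if m > b)
    return worse / len(baseline_m)


def check_success(baseline_m, method_m, categories):
    """Evaluate the median-gain, regression and category-coverage criteria."""
    gain = relative_median_reduction(baseline_m, method_m)
    reg = regression_share(baseline_m, method_m)
    by_cat = defaultdict(lambda: ([], []))
    for b, m, cat in zip(baseline_m, method_m, categories):
        by_cat[cat][0].append(b)
        by_cat[cat][1].append(m)
    # Hypothetical rule: a category "works" if its regression share is < 10%.
    works = [c for c, (b, m) in by_cat.items() if regression_share(b, m) < 0.10]
    share = len(works) / len(by_cat)
    return {
        "median_gain": gain,
        "median_ok": gain is not None and gain >= 0.30,
        "regression_rate": reg,
        "regression_ok": reg < 0.10,
        "category_share": share,
        "category_ok": share >= 0.80,
    }
```

With the observed baseline median of 0.0m, the first criterion can never pass. This is the same limitation that led Current Findings to shift the emphasis to p90/p95, regression risk, and arrival cost.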

Output: notebooks/05_evaluation.ipynb, docs/recommendation.md

5. Project Structure

Pin-To-Place/
├── data/
│   ├── raw/                        # Original parquet (3,425 places)
│   └── processed/                  # Ground truth, features, results
├── src/
│   ├── data_loader.py              # Data loading & WKB parsing
│   ├── geocoder.py                 # Multi-source geocoding
│   ├── geocoder_ensemble.py        # Method 1: geocoder ensemble
│   ├── satellite_fetcher.py        # Fetch satellite imagery tiles
│   ├── llm_annotator.py            # LLM vision ground truth annotation
│   ├── ground_truth.py             # Ground truth orchestration
│   ├── metrics.py                  # Offset metrics & evaluation
│   ├── features.py                 # Feature extraction for ML
│   ├── candidate_ranker.py         # Method 2: ML candidate ranking
│   └── llm_repositioner.py         # Method 3: LLM reasoning
├── notebooks/
│   ├── 01_eda.ipynb                # Exploratory data analysis
│   ├── 02_ground_truth.ipynb       # Ground truth construction
│   ├── 03_baseline_offset.ipynb    # Baseline offset measurement
│   ├── 04_repositioning.ipynb      # Repositioning prototypes
│   └── 05_evaluation.ipynb         # Final evaluation & comparison
├── docs/
│   ├── project_plan.md             # This file
│   ├── pin_definition_taxonomy.md  # Pin location definitions
│   └── recommendation.md           # Final recommendation memo
├── requirements.txt
└── README.md

6. Key Dependencies & APIs

Dependency	Purpose	Cost
Google Maps Static API	Satellite imagery tiles	~$2 / 1,000 requests
Google Maps Geocoding API	High-quality geocoding	~$5 / 1,000 requests
Mapbox Geocoding	Second geocoder source	Free tier: 100k/month
Nominatim (OSM)	Free geocoder	Free, 1 req/sec limit
OpenAI API (GPT-4o) / Anthropic API (Claude)	LLM vision for ground truth + repositioning	Variable
OSM / osmnx	Building footprints, road network	Free

Python packages: pandas, pyarrow, geopandas, shapely, geopy, scikit-learn, xgboost, matplotlib, folium, osmnx, openai, anthropic

7. Risks & Mitigations

Risk	Mitigation
No universal definition of correct pin	Category-specific definitions with hierarchy of preferred targets
LLM vision annotations may be inconsistent	Inter-annotator agreement checks; multi-geocoder cross-validation; confidence scoring
Feature coverage varies by region	Design minimal-geometry baseline; layer richer features where available
Average error improves but outliers remain	Track p90/p95 and regression rate, not just mean
API costs exceed budget	Start with 50-place subset; estimate full-scale costs before scaling
OSM building footprint gaps in rural areas	Report coverage stats per region; fall back to geocode-based methods

8. Verification Plan

Data loading — verify all 3,425 records load with valid US coordinates
Deduplication — check for and report near-duplicate counts
Satellite tiles — visually verify 10 fetched tiles show correct area
LLM ground truth — compare LLM annotations vs. multi-geocoder consensus on 100 places
Inter-annotator — run 50 places through LLM twice, report median disagreement
Metrics — unit test haversine against known distances
Repositioning — verify each method reduces median offset vs. baseline
End-to-end — run full pipeline on 50-place subset first, then scale
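The metrics and inter-annotator checks can be written as plain pytest-style tests. This sketch assumes annotation runs are dicts mapping place id to (lat, lon), and that the Earth is a sphere with mean radius 6,371,008.8 m. Under that assumption, one degree of longitude on the equator is exactly R·π/180 (about 111,195 m) and pole to pole is π·R, which gives reference distances that follow directly from the formula.

```python
import math
import statistics

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def median_disagreement_m(run_a, run_b):
    """Median distance (m) between two annotation runs over their shared place ids."""
    shared = run_a.keys() & run_b.keys()
    if not shared:
        raise ValueError("runs share no place ids")
    return statistics.median(haversine_m(*run_a[k], *run_b[k]) for k in shared)


def test_haversine_known_distances():
    # One degree of longitude on the equator is R * pi / 180.
    assert abs(haversine_m(0, 0, 0, 1) - EARTH_RADIUS_M * math.pi / 180) < 1e-6
    # Identical points are zero meters apart.
    assert haversine_m(36.99, -122.06, 36.99, -122.06) == 0.0
    # Pole to pole along a meridian is half the circumference.
    assert abs(haversine_m(90, 0, -90, 0) - math.pi * EARTH_RADIUS_M) < 1e-6
```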
