All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog.
- `/` root endpoint returns service metadata and pointers to `/openapi.json`, `/docs`, `/redoc`, `/health`, and example `/lookup` and `/pattern` URLs. Replaces the previous `{"detail":"Not Found"}` response on the bare hostname. Marked `include_in_schema=False` so it doesn't clutter the OpenAPI document.
- Persistent-volume support via a new `docker-entrypoint.sh`: the container starts as root, runs `chown appuser:appuser /app/data` (idempotent; a no-op on warm starts), then `exec gosu appuser "$@"` to drop privileges before uvicorn starts. The `Dockerfile` installs `gosu` and replaces `USER appuser` with `ENTRYPOINT`. This lets a freshly provisioned platform persistent volume (initially root-owned) be mounted at `/app/data` without breaking the SQLite cache build. The cold-start cache survives pod recreates and redeploys; subsequent restarts skip the GISCO TERCET re-download until the configured TTL expires.
- Provider-agnostic deployment: a new `compose.yaml` at the repo root demonstrates the canonical multi-worker production pattern (API + Redis sidecar + persistent volume + multi-worker env vars). It runs unmodified anywhere Docker Compose is supported and translates 1:1 to Kubernetes pods, ECS task definitions, or any orchestrator with multi-container semantics. New `compose-up` / `compose-down` / `compose-logs` Makefile targets. The README "Docker deployment" section is rewritten to point at it and to call out the swap-out points for switching providers.
- Periodic refresh of `tercet_missing_codes.csv` (#44): when `PC2NUTS_ESTIMATES_REFRESH_URL` is set, a per-worker asyncio task fetches the URL every `PC2NUTS_ESTIMATES_REFRESH_INTERVAL_SECONDS` (default 24 h) and parses the body. It then fully replaces the in-memory estimates table, provided the content has changed and passes a 50%-of-current sanity guard. Workers also do a synchronous bootstrap fetch before reporting ready, so a fresh pod reflects upstream immediately rather than waiting up to one interval. A new `POST /admin/refresh-estimates` endpoint (trusted-token auth) lets operators force a refresh without waiting. New `/health` field: `estimates_refresh_stale: bool | None`. With the URL unset (the default), the bundled `tercet_missing_codes.csv` remains the single source of truth, as before.
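The refresh entry above combines a change check with a size guard. The following is a minimal sketch of that decision, not the service's code. It assumes a hypothetical CSV schema (`country`, `postal_code`, `nuts3`) and a SHA-256 digest for "content has changed". It reads the "50%-of-current sanity guard" as "the new table must have at least half as many rows as the current one".

```python
import csv
import hashlib
import io


def parse_estimates(body: str) -> dict:
    """Parse CSV text into {(country, postal_code): nuts3}.

    Hypothetical schema: the real tercet_missing_codes.csv columns may differ.
    """
    reader = csv.DictReader(io.StringIO(body))
    return {(row["country"], row["postal_code"]): row["nuts3"] for row in reader}


def decide_refresh(body: str, current: dict, current_digest: str):
    """Return (new_table, new_digest) if the table should be swapped, else None."""
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    if digest == current_digest:
        return None  # content unchanged: keep the existing table
    candidate = parse_estimates(body)
    # Assumed guard semantics: reject a payload with fewer than half the current
    # rows, e.g. a truncated download.
    if len(candidate) < 0.5 * len(current):
        return None
    return candidate, digest
```

Returning a complete new dict matches the "fully replaces" wording. A caller would then swap a single reference, so readers never observe a half-updated table.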
- uvicorn now runs with `--proxy-headers --forwarded-allow-ips '*'` in the Dockerfile `CMD`, so `X-Forwarded-Proto`, `X-Forwarded-For`, and `X-Forwarded-Host` are honoured for any TLS-terminating proxy in front of the service (CDN, K8s ingress, nginx, Cloudflare). Concretely, the new `/` route's link URLs now use `https://` when behind a TLS proxy, and per-IP rate-limit keying identifies the real client IP rather than the proxy's.
- `docker-entrypoint.sh` is now safe to launch as a non-root user. When started with `--user appuser` (or any non-root UID), the entrypoint skips the `chown` branch and simply `exec`s the `CMD` as the current user. Operators who pre-prepared `/app/data` ownership therefore get the same behaviour as a fresh root start.
- Performance re-baseline under multi-worker (#68): `docs/performance.md` is updated with the post-#68 numbers and a new rate-limit shared-storage verification subsection.
  - The realistic-corpus knee is at 35-40 RPS (vs ~30 single-worker), and the hot-key plateau is at ~50 RPS.
  - p99 at the old knee dropped from 4.5 s to 150 ms.
  - The recommended operating point is unchanged at 27 RPS; the win is headroom, not the operating point itself.
  - The Redis-sidecar shared-storage path is verified end to end. 130 anonymous requests against the published `120/minute` cap produced exactly 120 × `200` + 10 × `429`, ruling out per-worker counter divergence.
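The 120 + 10 split in the verification above is what a single shared counter produces. A toy model shows why per-worker counters would have produced something else. The fixed-window counter, round-robin dispatch, and worker count here are illustrative assumptions, not how slowapi or the Redis backend actually store state.

```python
from collections import Counter

LIMIT = 120  # published per-IP cap for one minute window


def simulate(n_requests: int, n_workers: int, shared: bool) -> Counter:
    """Send n_requests from one IP, round-robin across workers, within one window."""
    shared_count = 0
    per_worker = [0] * n_workers
    statuses = Counter()
    for i in range(n_requests):
        worker = i % n_workers
        if shared:
            shared_count += 1  # e.g. one Redis key seen by every worker
            count = shared_count
        else:
            per_worker[worker] += 1  # each process keeps its own in-memory counter
            count = per_worker[worker]
        statuses[200 if count <= LIMIT else 429] += 1
    return statuses
```

With 4 workers, `simulate(130, 4, shared=True)` yields 120 × 200 and 10 × 429. `simulate(130, 4, shared=False)` returns 200 for all 130 requests, because no single worker ever sees more than 33.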
- Concurrency: refreshes are now serialised (#44 follow-up). Added a module-level `asyncio.Lock` around `refresh_estimates_once`. Without it, two overlapping calls (the periodic task and the admin endpoint) could resolve their fetches out of order and overwrite newer state with older content. Codex flagged the race on the original PR (#72); the fix is internal, with no API change.
- `scripts/perf_test.sh` `run_warm`: indexing the vegeta target file by raw line number landed on a blank line half the time, crashing the script under `set -e`. It now extracts only the `GET` URLs into an array first.
- `__version__` was stale at `0.14.0` since the v0.14 release, so `openapi.json` and FastAPI's `version` field have reported the wrong number for every release since then. Bumped to `0.18.0`. Until version derivation is automated, future releases need to update `app/__init__.py` alongside the CHANGELOG.
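The race described in the concurrency entry can be reproduced in miniature. In this sketch a slow "periodic" refresh starts first, upstream then publishes newer data, and a fast "admin" refresh starts second. Without a lock the slow fetch lands last and the table ends up stale. The latencies and dict-based state are illustrative assumptions. The lock is created inside the scenario so that each `asyncio.run` gets a fresh one.

```python
import asyncio


async def fetch_upstream(upstream: dict, latency: float) -> str:
    snapshot = upstream["content"]  # server answers with what it holds at request time
    await asyncio.sleep(latency)  # simulated network latency before the body arrives
    return snapshot


async def refresh_once(upstream: dict, table: dict, latency: float, lock=None) -> None:
    if lock is None:
        table["content"] = await fetch_upstream(upstream, latency)
    else:
        async with lock:  # at most one fetch-and-swap in flight
            table["content"] = await fetch_upstream(upstream, latency)


async def scenario(use_lock: bool) -> str:
    lock = asyncio.Lock() if use_lock else None
    upstream = {"content": "v1"}
    table = {"content": "v0"}
    # Slow periodic refresh starts first and sees v1.
    periodic = asyncio.create_task(refresh_once(upstream, table, 0.1, lock))
    await asyncio.sleep(0.01)
    upstream["content"] = "v2"  # upstream publishes newer data
    # Fast admin refresh starts second.
    admin = asyncio.create_task(refresh_once(upstream, table, 0.0, lock))
    await asyncio.gather(periodic, admin)
    return table["content"]
```

`asyncio.run(scenario(False))` ends on the stale `"v1"`. With the lock, the admin refresh waits for the periodic one and then fetches `"v2"`, so the newest content wins.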
- Multi-worker deployment (#68): set `PC2NUTS_WORKERS` to launch N uvicorn worker processes. Multi-worker mode requires `PC2NUTS_RATE_LIMIT_STORAGE_URI` (e.g. a Redis URL) so that the published per-IP rate limit stays accurate across workers; the service refuses to start otherwise. Transient backend unavailability is tolerated via slowapi's `in_memory_fallback_enabled`. The limiter falls back to per-process in-memory rate limiting and re-probes with exponential backoff, logging one WARNING per outage and one INFO on recovery.
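The "refuses to start otherwise" rule above amounts to a startup check along these lines. This is a hedged sketch: the `ConfigError` class, the function name, and the default of one worker are assumptions. The validated URI would then be handed to the slowapi limiter together with `in_memory_fallback_enabled`, which is not shown here.

```python
from collections.abc import Mapping


class ConfigError(RuntimeError):
    """Raised at startup when the worker configuration is unsafe."""


def validate_worker_config(env: Mapping) -> int:
    """Return the worker count, refusing multi-worker mode without shared storage."""
    workers = int(env.get("PC2NUTS_WORKERS", "1"))  # assumed default of 1
    if workers < 1:
        raise ConfigError("PC2NUTS_WORKERS must be >= 1")
    if workers > 1 and not env.get("PC2NUTS_RATE_LIMIT_STORAGE_URI"):
        # Per-process counters would let each worker grant the full quota.
        raise ConfigError(
            "PC2NUTS_WORKERS > 1 requires PC2NUTS_RATE_LIMIT_STORAGE_URI (e.g. redis://...)"
        )
    return workers
```

Calling it with `os.environ` before uvicorn spawns workers makes a misconfigured multi-worker deploy fail fast instead of silently multiplying the rate limit.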
- TokenDB wire protocol (#61): the v0.17.0 client assumed a generic `POST /query` body shape, but the actual deployment target speaks libsql/Hrana v2. In that protocol, clients call `POST /v2/pipeline` with statements wrapped as `{requests: [{type: "execute", stmt: {sql, args}}]}`, and rows come back as arrays of typed value objects. `TokenDB.execute` now speaks Hrana correctly and automatically rewrites `libsql://` URLs to `https://`. It also accepts a Bearer auth token via the new `PC2NUTS_TOKEN_DB_AUTH_TOKEN` env var (and matching `--auth-token` CLI flag). Verified end to end against a real database instance.
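The Hrana entry above can be sketched as a request/response round trip. The body shape follows the entry. The typed-value encoding (`null`, `integer` carried as a string, `float`, `text`) and the `results[i].response.result.rows` response path reflect my reading of the Hrana-over-HTTP spec and should be checked against it. The table and column names in the usage are hypothetical, and the HTTP call itself is left out.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_db_url(url: str) -> str:
    """Rewrite libsql:// to https:// so a plain HTTP client can reach the server."""
    parts = urlsplit(url)
    if parts.scheme == "libsql":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


def encode_value(value) -> dict:
    """Wrap a Python value as a Hrana typed value object."""
    if value is None:
        return {"type": "null"}
    if isinstance(value, (bool, int)):
        return {"type": "integer", "value": str(int(value))}  # integers travel as strings
    if isinstance(value, float):
        return {"type": "float", "value": value}
    return {"type": "text", "value": str(value)}


def decode_value(obj: dict):
    """Inverse of encode_value for one cell of a result row."""
    kind = obj["type"]
    if kind == "null":
        return None
    if kind == "integer":
        return int(obj["value"])
    if kind == "float":
        return float(obj["value"])
    return obj["value"]


def pipeline_body(sql: str, args=()) -> dict:
    """JSON body for POST {base_url}/v2/pipeline with a single execute request."""
    stmt = {"sql": sql, "args": [encode_value(a) for a in args]}
    return {"requests": [{"type": "execute", "stmt": stmt}]}


def decode_rows(response: dict) -> list:
    """Turn the first execute result into a list of plain tuples."""
    result = response["results"][0]
    if result["type"] != "ok":
        raise RuntimeError(result.get("error", {}).get("message", "Hrana request failed"))
    rows = result["response"]["result"]["rows"]
    return [tuple(decode_value(cell) for cell in row) for row in rows]


def auth_headers(token: str) -> dict:
    return {"Authorization": f"Bearer {token}"}
```

A client would POST `pipeline_body(...)` to `normalize_db_url(url) + "/v2/pipeline"` with `auth_headers(token)`, then pass the parsed JSON to `decode_rows`.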
- DB-backed trusted tokens (#61): trusted-token storage moved from the `PC2NUTS_TRUSTED_TOKENS` env var to a managed SQLite-compatible HTTP database. New env vars: `PC2NUTS_TOKEN_DB_URL` (connection string) and `PC2NUTS_TOKEN_REFRESH_SECONDS` (default `60`). Tokens are issued via `python -m scripts.tokens add --label "..."` and take effect within ~60 s, with no container restart required. The env var continues to work as a union with the DB and serves as a disaster-recovery fallback when the DB is unreachable. New `/health` field `token_db_stale` flags refresh failures.
- `scripts/tokens.py` operator CLI with subcommands `init`, `add`, `list`, `revoke`. `add --value <existing-token>` lets operators migrate v1 env-var tokens while preserving their audit `token_id`.
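The union-plus-fallback behaviour in the DB-backed tokens entry might look roughly like this. The class and function names are hypothetical, and raw token strings are compared for simplicity (a real store might hold hashes instead). Two further points are assumptions: that a failed refresh keeps the last known DB set, and that `stale` is what `/health` reports as `token_db_stale`.

```python
from typing import Callable, Iterable, Optional


def parse_env_tokens(value: str) -> frozenset:
    """Split the comma-separated PC2NUTS_TRUSTED_TOKENS value, ignoring blanks."""
    return frozenset(t.strip() for t in value.split(",") if t.strip())


class TrustedTokens:
    def __init__(self, env_value: str, fetch_db_tokens: Callable[[], Iterable[str]]):
        self.env_tokens = parse_env_tokens(env_value)
        self.fetch_db_tokens = fetch_db_tokens  # e.g. a SELECT against the token DB
        self.db_tokens: frozenset = frozenset()
        self.stale: Optional[bool] = None  # None until the first refresh attempt

    def refresh(self) -> None:
        """Meant to run every PC2NUTS_TOKEN_REFRESH_SECONDS (scheduling not shown)."""
        try:
            self.db_tokens = frozenset(self.fetch_db_tokens())
            self.stale = False
        except ConnectionError:
            # Assumption: keep the last known DB set; env tokens keep working regardless.
            self.stale = True

    def is_trusted(self, token: str) -> bool:
        return token in self.env_tokens or token in self.db_tokens
```

Because `is_trusted` checks both sets, a token issued through `scripts.tokens add` is accepted after the next refresh, while env-var tokens keep working even if every refresh fails.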
- Auth-token bypass (#60): trusted callers can bypass the per-IP rate limit by presenting `Authorization: Bearer <token>`. Tokens are managed via the new `PC2NUTS_TRUSTED_TOKENS` comma-separated env var. Invalid tokens return `401`; malformed `Authorization` headers return `400`. Audit lines log only a non-reversible 8-character SHA-256 prefix; token values never appear in logs. See README "Authentication & rate-limit bypass" for the operator runbook.
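The header rules in the auth-token bypass entry (no header, malformed, invalid, trusted) map onto a small classifier. In this sketch, several choices are assumptions rather than documented behaviour: the function names, the case-insensitive scheme match, and the use of `hmac.compare_digest` for constant-time comparison. The 8-character fingerprint is a SHA-256 hex prefix as described.

```python
import hashlib
import hmac
from typing import Optional


def token_fingerprint(token: str) -> str:
    """8-hex-char SHA-256 prefix: enough to correlate audit lines, not to recover the token."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()[:8]


def classify_auth(header: Optional[str], trusted: frozenset) -> tuple:
    """Return (outcome, http_status); a status of None means keep handling the request."""
    if header is None:
        return "anonymous", None  # normal per-IP rate limit applies
    scheme, _, value = header.partition(" ")
    value = value.strip()
    if scheme.lower() != "bearer" or not value or " " in value:
        return "malformed", 400
    # Constant-time comparison against each trusted token (a choice made in this sketch).
    if any(hmac.compare_digest(value.encode(), t.encode()) for t in trusted):
        return "trusted", None  # caller skips the limiter and logs token_fingerprint(value)
    return "invalid", 401
```

For example, `token_fingerprint("abc")` is `"ba7816bf"`, the first eight hex digits of the well-known SHA-256 test vector for `abc`.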
- Montenegro (ME) support (#53): postal-code lookups for Montenegro return `ME000` / `ME00` / `ME0` via the existing single-NUTS3 fallback (Tier 5). Eurostat treats Montenegro as a single nationwide unit at every NUTS level, and GISCO publishes no TERCET file for it. ME is therefore served entirely from the new `single_nuts3_fallback` map in `app/settings.json` (no external data download). Pattern: 5 digits starting with `8`; an optional `ME-` / `ME` prefix is accepted.
- `single_nuts3_fallback` settings field: a data-driven seed for the Tier 5 single-NUTS3 set. It allows countries with no GISCO TERCET coverage but a single nationwide NUTS3 unit to be added via configuration alone. Auto-detected single-NUTS3 entries derived from real data take precedence on conflict.
- `patterns_version` bumped to 1.1 (additive change: new ME entry, no existing pattern altered).
- `get_loaded_countries()` now includes countries served only via the single-NUTS3 fallback, so `/lookup` accepts them without a `400`.
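Taken together, the Montenegro entries describe four pieces: a pattern, a seed/auto-detected merge, a Tier 5 result, and the loaded-country set. The sketch below shows each. The regex mirrors the stated rule but is not the project's actual pattern. The `{"ME": "ME000"}` map shape and the `ZZ` country code are hypothetical. NUTS1 and NUTS2 are derived by prefix, since each NUTS level appends one character to its parent code.

```python
import re
from typing import Optional

# Hypothetical pattern for the stated rule: 5 digits starting with 8, optional "ME-"/"ME".
ME_PATTERN = re.compile(r"^(?:ME-?)?(8\d{4})$", re.IGNORECASE)


def extract_me(raw: str) -> Optional[str]:
    match = ME_PATTERN.match(raw.strip())
    return match.group(1) if match else None


def merge_single_nuts3(seed: dict, detected: dict) -> dict:
    """settings.json seed first, so auto-detected entries win on conflict."""
    return {**seed, **detected}


def tier5_lookup(country: str, single_nuts3: dict) -> Optional[dict]:
    nuts3 = single_nuts3.get(country)
    if nuts3 is None:
        return None
    # NUTS codes are hierarchical: each level appends one character to its parent.
    return {"nuts1": nuts3[:3], "nuts2": nuts3[:4], "nuts3": nuts3}


def loaded_countries(tercet_countries: set, single_nuts3: dict) -> set:
    """Countries accepted by /lookup: TERCET-backed ones plus fallback-only ones."""
    return set(tercet_countries) | set(single_nuts3)
```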
- Automated test suite (#25): 69 pytest tests covering `postal_patterns.py` (preprocessing, `tercet_map`, extraction), `data_loader.py` (normalize functions, all 5 lookup tiers), and FastAPI endpoints (`/lookup`, `/pattern`, `/health`). CI now runs tests before publish.
- Makefile (#24): standard targets for `lint`, `format`, `test`, `run`, `docker-build`, `docker-run`.
- Pre-commit hooks (#24): ruff lint + format via `.pre-commit-config.yaml`.
- `requirements-dev.txt` (#22): dev/test dependencies (ruff, bandit, pip-audit, pytest).
- `ruff format` CI check (#24): enforces consistent code formatting in CI.
- Centralized duplicated logic (#22): `normalize_country()` replaces duplicate GR→EL blocks, a `_db_connection()` context manager replaces 6 manual SQLite connect/close patterns, and a `_build_result()` helper replaces repetitive result-dict construction across all lookup tiers.
- Narrowed exception handling (#23): 9 broad `except Exception` blocks in `data_loader.py` replaced with specific types (`sqlite3.Error`, `httpx.RequestError`, `OSError`, etc.). A previously silent catch in `import_estimates.py` now logs a message.
- Return type hints added to `dispatch()` and `_rate_limit_handler()` in `main.py`.
- MT regex (#14): the separator between the alpha prefix and the digits is now optional (`MST1000` accepted alongside `MST 1000` and `MST-1000`). Previously, codes without a separator failed regex extraction and fell through to approximate matching with lower confidence.
- Country-level majority-vote fallback: new Tier 4 in the lookup chain for countries where all postal codes map to the same NUTS1/NUTS2 and one NUTS3 region is the dominant (but not unanimous) winner. Returns `match_type: "approximate"` with NUTS1/NUTS2 confidence 1.0 and a NUTS3 confidence based on the agreement ratio (capped at 0.80). This naturally captures MT (`MT0` / `MT00` / `MT001` at ~77%). Digit-only MT codes like `1043` that previously returned `404` now get a valid approximate result.
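The Tier 4 rule in the majority-vote entry can be expressed over the list of NUTS3 codes seen for one country (one entry per postal code). This is a sketch under stated assumptions: the `MIN_WINNER_SHARE` threshold for "dominant" is hypothetical, and the ratio is computed with every postal code weighted equally. The 0.80 cap is the documented value.

```python
from collections import Counter
from typing import Optional

NUTS3_CONFIDENCE_CAP = 0.80
MIN_WINNER_SHARE = 0.5  # hypothetical threshold for a "dominant" NUTS3 winner


def majority_fallback(nuts3_codes: list) -> Optional[dict]:
    """Country-level Tier 4 sketch over every NUTS3 code seen for one country."""
    if not nuts3_codes:
        return None
    counts = Counter(nuts3_codes)
    # Require a single NUTS1 and a single NUTS2 across the whole country.
    if len({c[:3] for c in counts}) != 1 or len({c[:4] for c in counts}) != 1:
        return None
    winner, hits = counts.most_common(1)[0]
    ratio = hits / len(nuts3_codes)
    if ratio < MIN_WINNER_SHARE:
        return None
    return {
        "match_type": "approximate",
        "nuts1": winner[:3], "nuts1_confidence": 1.0,
        "nuts2": winner[:4], "nuts2_confidence": 1.0,
        "nuts3": winner, "nuts3_confidence": min(ratio, NUTS3_CONFIDENCE_CAP),
    }
```

A 77/23 split between `MT001` and `MT002` yields `MT001` with NUTS3 confidence 0.77. A 90/10 split is capped at 0.80.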
- FR CEDEX estimates (#8): ~511 French CEDEX postal codes (enterprise/university mail routing) added to `tercet_missing_codes.csv` with high-confidence département→NUTS3 mappings.
- FR DOM-TOM estimates (#9): 15 French overseas territory postal codes (Guadeloupe, Martinique, Guyane, La Réunion, Mayotte) added with high-confidence mappings. French Polynesia (987xx) and New Caledonia (988xx) are excluded because they are OCTs with no valid NUTS mapping.
- NL missing code estimates (#13): 8 Dutch postal codes for major cities (Amsterdam, The Hague, Utrecht, Maastricht, Arnhem, Apeldoorn, Zwolle) added with high-confidence mappings. Willemstad (3059) is excluded because it belongs to Curaçao, not the Netherlands.
- Preprocessing order: dot thousand-separator removal now runs before `.0` stripping, so locale-formatted codes like `13.000` correctly become `13000` instead of `13`.
- IE regex (#10): the space between the Eircode routing key and the identifier is now optional (`D02X285` accepted alongside `D02 X285`).
- PT regex (#12): a space is now accepted as a separator between digit groups (`1000 001` alongside `1000-001` and `1000001`).
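The MT, IE, and PT regex entries all come down to making the separator between two groups optional. The patterns below are simplified, hypothetical stand-ins, not the contents of `postal_patterns.json`. The canonical output formats are also assumptions. Input is uppercased before matching, as the #11 entry notes.

```python
import re
from typing import Optional

# Hypothetical, simplified patterns: the optional separator is the point.
PATTERNS = {
    "MT": (re.compile(r"^([A-Z]{3})[ -]?(\d{4})$"), "{} {}"),
    "IE": (re.compile(r"^([A-Z]\d{2}|D6W) ?([A-Z0-9]{4})$"), "{} {}"),
    "PT": (re.compile(r"^(\d{4})[ -]?(\d{3})$"), "{}-{}"),
}


def extract(country: str, raw: str) -> Optional[str]:
    """Match an uppercased code and re-emit it in one canonical form."""
    pattern, template = PATTERNS[country]
    match = pattern.match(raw.strip().upper())
    return template.format(*match.groups()) if match else None
```

All three MT spellings normalise to `MST 1000`, both Eircode spellings to `D02 X285`, and all three PT spellings to `1000-001`.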
- #11 (NO lowercase prefix): already handled, since all regexes are compiled with `re.IGNORECASE` and input is uppercased before matching. Closed as resolved.
- Input preprocessing for postal codes mangled by Excel, CSV exports, or database dumps. Three country-agnostic steps are applied before regex matching:
  - Strip trailing `.0` (Excel float coercion): `28040.0` → `28040`
  - Remove dot thousand-separators: `13.600` → `13600`
  - Restore leading zeros using per-country `expected_digits` metadata: `8461` → `08461` for ES
- `expected_digits` field in `postal_patterns.json` for 30 countries with fixed-length all-numeric postal codes. Countries with non-numeric formats (IE, MT, NL) are excluded.
- Backward compatible: preprocessing is transparent, so correctly formatted postal codes pass through unchanged. No regex patterns were modified.
- Closes #16 (generic preprocessing for Excel artifacts and postal code mangling). Also subsumes #15 (ES-specific fixes).
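The three preprocessing steps above can be sketched as one function. This version uses the order from the later preprocessing-order fix (thousand-separator removal before `.0` stripping). It makes two assumptions. First, dots count as thousand-separators only when the whole string looks like a grouped number, so `28040.0` is left for the `.0` step. Second, the `EXPECTED_DIGITS` dict is a hypothetical excerpt of `expected_digits`.

```python
import re

EXPECTED_DIGITS = {"ES": 5, "DE": 5, "AT": 4}  # hypothetical excerpt of expected_digits

_THOUSANDS = re.compile(r"\d{1,3}(?:\.\d{3})+")  # e.g. 13.600, 1.234.567
_TRAILING_ZERO = re.compile(r"\.0+$")  # e.g. 28040.0 from Excel float coercion


def preprocess(raw: str, country: str) -> str:
    code = raw.strip()
    if _THOUSANDS.fullmatch(code):
        code = code.replace(".", "")  # 13.000 -> 13000 (must precede the next step)
    code = _TRAILING_ZERO.sub("", code)  # 28040.0 -> 28040
    digits = EXPECTED_DIGITS.get(country)
    if digits and code.isdigit() and len(code) < digits:
        code = code.zfill(digits)  # 8461 -> 08461 for ES
    return code
```

If the two regex steps were swapped, `_TRAILING_ZERO` would strip `.000` from `13.000` and leave `13`, which is the bug the preprocessing-order entry describes.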
- NUTS region names in `/lookup` responses: `nuts1_name`, `nuts2_name`, `nuts3_name` fields provide human-readable region names (Latin script) alongside NUTS codes. Names are sourced from the GISCO NUTS CSV distribution.
- `total_nuts_names` field in the `/health` endpoint showing how many region names are loaded.
- NUTS names are cached in the SQLite DB (`nuts_names` table) for fast restarts.
- Backward compatible: name fields default to `null` when names are unavailable. Existing clients that ignore unknown fields are unaffected.
- Graceful degradation: if the NUTS names CSV cannot be downloaded, all name fields are `null`, but lookups continue to work normally. Pre-0.9.0 SQLite caches (without the `nuts_names` table) remain fully valid.
Prior changes were not tracked in this changelog.