A throwaway Metabase (Enterprise, `workspace-dev-instance-poc` branch)
instance with a deliberately messy sample database, for poking at the
`mb-cli` by hand.

Two Metabase instances (prod + dev) on the same from-source image and the same
warehouse data, each with its own application database (see
`docker-compose.yml`):

| Service | What it is | Host port |
|---|---|---|
| `metabase` | prod instance: EE build of the branch | 3000 |
| `metabase-dev` | dev instance: same image + same data | 3001 |
| `warehouse-db` | Postgres: the dirty sample data (shared) | 5433 |
| `app-db` | Postgres: prod app state | (internal) |
| `app-db-dev` | Postgres: dev app state | (internal) |

Build the image first (from source, since there's no prebuilt image for a feature branch; ~20–40 min the first time, cached after):

    cd metabase-demo
    ./build-branch.sh                # clones the branch source + docker build (EE)

Prod (port 3000):

    cp .env.example .env             # optional: only to change ports / add an EE token
    docker compose up -d             # boots prod on the built image
    ./metabase/bootstrap.sh          # creates the admin user + connects the warehouse

Dev (port 3001), one command:

    ./dev-up.sh                      # spins up + initialises the dev instance

Both open with the same login, `admin@example.com` / `metabase123`:
http://localhost:3000 (prod) · http://localhost:3001 (dev).

Point the CLI at either:

    mb auth login --url http://localhost:3000   # prod
    mb auth login --url http://localhost:3001   # dev

Tear down (`--profile dev` so the dev services are included):

    docker compose --profile dev down      # stop, keep data
    docker compose --profile dev down -v   # stop and wipe everything (re-seeds on next up)

Both instances run the identical from-source image and connect to the same
warehouse-db, so they see the same dirty data and the same admin@example.com
login — but they keep separate application databases (app-db / app-db-dev),
because two Metabase instances cannot share one app-db. Questions, models, and
transforms you create in one are invisible to the other; the underlying warehouse
tables are shared. The dev instance is gated behind the compose dev profile, so
a plain docker compose up -d only touches prod; ./dev-up.sh (or
docker compose --profile dev up -d) brings up dev.
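
Because the two instances boot independently, it can help to wait for each one before pointing the CLI at it. The helper below is a minimal sketch, not part of this repo: it polls Metabase's `/api/health` endpoint with `curl`. The function name `wait_for_metabase` and the `PROBE` override (handy for dry runs) are hypothetical.

```shell
# Poll a Metabase instance until its health endpoint answers successfully.
# Arguments: base URL, max attempts (default 60), delay in seconds (default 5).
# PROBE is a hypothetical override for the command used to hit the endpoint.
wait_for_metabase() {
  local base_url="$1"
  local max_tries="${2:-60}"
  local delay="${3:-5}"
  local probe="${PROBE:-curl -fsS -o /dev/null}"
  local attempt=1
  while [ "$attempt" -le "$max_tries" ]; do
    # curl -f turns HTTP errors into a non-zero exit, so only a 2xx counts as ready.
    if $probe "${base_url}/api/health" 2>/dev/null; then
      echo "ready: ${base_url}"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  echo "timed out: ${base_url}" >&2
  return 1
}
```

For example, `wait_for_metabase http://localhost:3000 && wait_for_metabase http://localhost:3001` could run before `mb auth login`.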

Remote Sync is the EE feature that backs a workspace with a git
repository. A workspace's content (collections, transforms, and data apps)
is serialized into the repo: data apps live under `data_apps/<slug>/`, the rest
as synced collection representations. The workspace can then be versioned, shared,
and restored from git rather than living only inside one instance's app-db.

How it relates to the two instances here: each Metabase instance keeps its
own app-db (see Two instances), so content does not flow
between prod and dev directly. Git is the bridge. One instance points at the repo
in read-write mode and pushes its workspace there; the other points at the
same repo in read-only mode and imports from it (optionally on a timer).
That's how you move a data app from the instance you built it on to a clean one.
The data app under `data_apps/` is built from the repo and served by Metabase, so
whatever is committed is what gets rendered.
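
Since the committed state is what gets rendered, it can be useful to look at it from the host. The sketch below assumes you have already cloned the sync repo into a working directory and that data apps sit under `data_apps/<slug>/` as described; the function name `list_data_apps` and the checkout path are hypothetical.

```shell
# List data app slugs found in a working checkout of the sync repo.
# Assumes the checkout was made beforehand, for example:
#   git clone /path/to/sync.git /tmp/sync-checkout
list_data_apps() {
  local checkout="$1"
  local dir
  for dir in "$checkout"/data_apps/*/; do
    # An unmatched glob stays literal, so skip anything that is not a directory.
    [ -d "$dir" ] || continue
    basename "$dir"
  done
}
```

Running `list_data_apps /tmp/sync-checkout` prints one slug per line, or nothing if no data app has been committed yet.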

In this sandbox only the prod `metabase` service is wired to a repo. The
compose file bind-mounts a bare git repo into the container at a path
identical to its host path:

    # docker-compose.yml → metabase.volumes
    - /Users/.../.prode/prode-sync.git:/Users/.../.prode/prode-sync.git

The host==container path is deliberate: Remote Sync addresses a local repo with a
`file://` URL, and mounting it at the same path means that single URL resolves
identically inside the container and on your machine, so you can clone it, commit,
and inspect it from the host without translating paths. Adjust this path to
wherever your bare repo lives; it is currently hardcoded to one machine. Create
one with `git init --bare /path/to/sync.git`.
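
To make the "same path on both sides" idea concrete, here is a small sketch that derives the `file://` URL and the matching compose volume entry from one absolute path. Both helper names are hypothetical, and the sketch assumes a path without spaces or characters that would need URL-encoding.

```shell
# Print the file:// URL for a bare repo at an absolute path.
sync_repo_url() {
  case "$1" in
    /*) printf 'file://%s\n' "$1" ;;
    *)  echo "need an absolute path: $1" >&2; return 1 ;;
  esac
}

# Print the compose volume entry: the same path on both sides of the colon,
# so a single URL resolves on the host and inside the container.
sync_repo_volume() {
  printf '%s\n' "- $1:$1"
}
```

After `git init --bare /path/to/sync.git`, `sync_repo_url /path/to/sync.git` prints `file:///path/to/sync.git`, and `sync_repo_volume /path/to/sync.git` prints the line to place under `metabase.volumes`.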

Remote Sync URLs must be `file://`, `http://`, or `https://`. Configure it either
in Admin → Settings → Remote Sync after boot, or up front via env vars in
`.env` (all are env-readable; standard `MB_` + UPPER_SNAKE naming):

| Env var | Setting | Notes |
|---|---|---|
| `MB_REMOTE_SYNC_URL` | `remote-sync-url` | repo location; `file:///Users/.../prode-sync.git` here |
| `MB_REMOTE_SYNC_BRANCH` | `remote-sync-branch` | branch to sync, e.g. `main` |
| `MB_REMOTE_SYNC_TYPE` | `remote-sync-type` | `read-write` (push) or `read-only` (import); default `read-only` |
| `MB_REMOTE_SYNC_AUTO_IMPORT` | `remote-sync-auto-import` | `read-only` only; pull new commits automatically |
| `MB_REMOTE_SYNC_TOKEN` | `remote-sync-token` | bearer token for an `https://` repo; omit for `file://` |

Remote Sync is premium-token-gated like other EE features: set
`MB_PREMIUM_EMBEDDING_TOKEN` (see Enterprise features) or the Remote Sync UI
stays locked. A typical flow: point prod at the repo `read-write`, build and
commit the data app, then point a fresh instance at it `read-only` to import.
To use dev as that fresh instance, add the same volume mount and vars to
`metabase-dev` in `docker-compose.yml`.
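
As a rough illustration of that flow, the pushing side's `.env` might look like the fragment below. The repo path is a placeholder, `main` is only an example branch, and the sketch assumes `docker-compose.yml` passes these variables through to the service.

```shell
# .env for the instance that pushes (prod in this sandbox)
MB_REMOTE_SYNC_URL=file:///path/to/sync.git
MB_REMOTE_SYNC_BRANCH=main
MB_REMOTE_SYNC_TYPE=read-write

# On the importing instance, the same URL and branch would be paired with:
#   MB_REMOTE_SYNC_TYPE=read-only
#   MB_REMOTE_SYNC_AUTO_IMPORT=true   # value format is an assumption
```

A `file://` repo needs no `MB_REMOTE_SYNC_TOKEN`, so it is left out here.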

One warehouse database with two Postgres schemas. They represent two business domains that describe the same people in totally different shapes: exactly the kind of raw connector output the CLI's data-cleaning skills are built to flatten and reconcile.

- `evt_events`: events; coded `category_code`, a JSON `venue` blob, mixed date formats, mixed boolean spellings, a soft-deleted (`_deleted`) tombstone row.
- `evt_categories`: decode table for `category_code`.
- `evt_people`: attendees; one messy `full_name` column, junk null placeholders (`N/A`, `NULL`, `-`, empty string) in phone/email.
- `evt_registrations`: person ↔ event, with dirty `status` variants (`going`/`GOING`/`Going`/`cancelled`) and a mixed `checked_in` flag.
- `evt_custom_fields` / `evt_custom_responses`: coded custom fields (`cf_1021` → "Dietary restrictions") with single/multi answers as JSON text.
- `crm_contacts`: the same humans as `evt_people`, but split `first`/`last`, numeric ids, a coded `lifecycle_code`, and `email_address` as the only (dirty, case-mismatched) bridge back to events. Includes a duplicate row and soft-deleted (`is_deleted = 1`) tombstones.
- `crm_lifecycle_stages`: decode table for `lifecycle_code`.
- `crm_companies`: the companies contacts belong to.

Coded columns needing decode tables · JSON stuffed into text columns · mixed
date formats · mixed boolean spellings (Y/N/true/1/yes) · junk null
placeholders · inconsistent casing & whitespace · soft-delete tombstones · a
duplicate contact under one email · no declared foreign keys (relationships
must be inferred) · the same people modelled two different ways across domains.
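
To give a feel for what "cleaning" means for a few of these problems, here is a small shell sketch. It is not how the CLI does it (real cleanup would more likely happen in SQL or transforms); the function names are hypothetical, and the false-side spellings (`n`, `no`, `false`, `0`) are assumed counterparts of the ones listed above.

```shell
# Strip leading and trailing whitespace.
trim() {
  printf '%s\n' "$1" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
}

# Map mixed boolean spellings to true/false; unknown values become empty.
normalize_bool() {
  case "$(trim "$1" | tr '[:upper:]' '[:lower:]')" in
    y|yes|true|1)  echo true ;;
    n|no|false|0)  echo false ;;
    *)             echo "" ;;
  esac
}

# Turn junk null placeholders into an empty value, pass everything else through.
normalize_null() {
  local value
  value="$(trim "$1")"
  case "$value" in
    N/A|NULL|-|"") echo "" ;;
    *)             printf '%s\n' "$value" ;;
  esac
}

# Lowercase and trim an email so it can serve as a join key across domains.
normalize_email() {
  trim "$1" | tr '[:upper:]' '[:lower:]'
}
```

With these, `normalize_email '  Ada@Example.COM '` yields `ada@example.com`, the kind of key needed to match `crm_contacts.email_address` back to `evt_people`.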

Edit `warehouse/init/*.sql`, then run `docker compose down -v && docker compose up -d`
to reseed.

The sandbox runs a from-source EE build of the `workspace-dev-instance-poc` branch.
There's no prebuilt image for a feature branch, so `build-branch.sh` clones the
branch into a self-contained source tree under `../../master/`. The tree keeps
its `.git` directory because the frontend build runs `git rev-parse`, so a
worktree pointer or a `git archive` export won't do. The clone reuses the local
clone's objects (fast, offline) and leaves the master checkout untouched. The
script then runs `docker build` on the tree via the repo's Dockerfile with
`MB_EDITION=ee`. The image is tagged
`metabase-mb-cli:workspace-dev-instance-poc`; both instances share it.

Knobs (override in `.env`):

- `METABASE_IMAGE`: use a registry image instead of building (e.g. `metabase/metabase-enterprise-head:latest` for master, or a pinned release).
- `METABASE_SRC`: path to the cloned branch source (defaults under `master/`).
- `MB_BUILD_VERSION`: the version string baked into the build.
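
As an illustration, an `.env` that skips the local build in favour of a registry image could look like this. The commented values are placeholders, not defaults taken from this repo.

```shell
# .env: use a registry image instead of the local from-source build
METABASE_IMAGE=metabase/metabase-enterprise-head:latest

# Or keep the local build and adjust its inputs (placeholder values):
# METABASE_SRC=/path/to/branch-source
# MB_BUILD_VERSION=v0.0.0-poc
```

Note that a registry image of master will not include whatever is specific to the feature branch.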

Rebuild after new commits land on the branch (re-clones + rebuilds):

    ./build-branch.sh
    docker compose --profile dev up -d --force-recreate

On its first boot against a fresh app-db, the EE build claims the connected
Postgres for its attached-data-warehouse feature and may relabel it "Data
Warehouse" in the UI, which is purely cosmetic. `bootstrap.sh` matches the
connection by host + database name (not display name), so re-runs never create a
duplicate.

The image is the enterprise build, so the EE UI is present, but premium,
token-gated features (e.g. transforms) stay locked until you supply a token
via `MB_PREMIUM_EMBEDDING_TOKEN` in `.env`. Without one it behaves like OSS.