Practical how-to guides for developers building reporting on this platform. These guides assume the platform is running and the Getting Started steps are complete.
For architecture principles and design rationale, see architecture.md. For platform debugging and verification, see development.md.
This walks through adding a new PostgreSQL table to the full pipeline: CDC ingestion → ClickHouse raw landing → dbt staging → dbt mart → Superset dashboard.
Example: adding the referencedata.orderable_display_categories table.
On the source database, add the table to the CDC publication:
```sql
ALTER PUBLICATION dbz_publication ADD TABLE referencedata.orderable_display_categories;
```

See source-db-setup.md for initial publication setup.
In .env, append the table to SOURCE_PG_TABLE_ALLOWLIST:
```sh
SOURCE_PG_TABLE_ALLOWLIST=referencedata.facilities,...,referencedata.orderable_display_categories
```

Then refresh the connector:

```sh
make connector-refresh
```

Important: a simple `make register-connector` only updates the connector config — it does not load existing data from newly added tables. This is because Debezium's stored offset tells it the initial snapshot already completed. `make connector-refresh` resets the offset and triggers a fresh snapshot of all tables.
This command: stops the connector → resets stored offsets → deletes and re-registers → re-initializes ClickHouse raw landing → waits for the snapshot → verifies ingestion.
Existing data in ClickHouse raw tables is not lost (append-only). The snapshot produces duplicate rows for previously captured tables, but dbt staging views deduplicate via row_number().
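The deduplication the staging views perform can be sketched in plain Python (the event dicts and field names here are illustrative stand-ins for the CDC rows, not the actual dbt macro):

```python
# Sketch of the row_number()-style dedup the staging views perform:
# keep the latest event per primary key, and drop keys whose latest
# event is a delete ('d').
def latest_state(events):
    """events: dicts with 'id', 'op', 'ts_ms', '_ingested_at', 'after'."""
    best = {}
    for ev in events:
        cur = best.get(ev["id"])
        # Later (ts_ms, _ingested_at) wins, mirroring the ORDER BY clause.
        if cur is None or (ev["ts_ms"], ev["_ingested_at"]) > (cur["ts_ms"], cur["_ingested_at"]):
            best[ev["id"]] = ev
    # Rows whose latest event is a delete disappear from the view.
    return {k: v["after"] for k, v in best.items() if v["op"] != "d"}

events = [
    {"id": "a", "op": "c", "ts_ms": 1, "_ingested_at": 1, "after": {"name": "old"}},
    {"id": "a", "op": "u", "ts_ms": 2, "_ingested_at": 2, "after": {"name": "new"}},
    {"id": "b", "op": "c", "ts_ms": 1, "_ingested_at": 1, "after": {"name": "gone"}},
    {"id": "b", "op": "d", "ts_ms": 3, "_ingested_at": 3, "after": None},
]
print(latest_state(events))  # {'a': {'name': 'new'}}
```

This is also why a duplicate snapshot is harmless: replayed events for the same key and timestamp simply lose (or tie with) the event already kept.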
Check that the new table has data:
```sh
make verify-ingestion
```

Or query ClickHouse directly:
```sh
source .env && curl -s "http://localhost:${CLICKHOUSE_PORT}" \
  --user "${CLICKHOUSE_USER}:${CLICKHOUSE_PASSWORD}" \
  --data-binary "SELECT count() FROM raw.events_openlmis_referencedata_orderable_display_categories"
```

Create a new file in your analytics package's dbt/models/staging/ directory. See Add a dbt model below for the pattern.
Add the model to dbt/models/staging/schema.yml with at minimum not_null and unique on the primary key. See Required tests below.
If the new table feeds an existing mart, update it to join the new staging model. If it's a new reporting domain, create a new mart in dbt/models/marts/. See Mart models below.
```sh
make dbt-build
make verify-dbt
```

Create a dataset, chart, and/or dashboard for the new data. See Add a Superset chart/dashboard below.
```sh
make superset-import
make verify-superset
```

All dbt models live in the analytics package (e.g., examples/olmis-analytics-core/dbt/). The platform provides the runner project and generic macros; packages provide the domain-specific models.
Staging models reconstruct current state from the append-only CDC event stream. Each staging model reads from one raw landing table.
The pattern uses a ranked CTE to select the latest event per primary key, excluding deletes:
```sql
{{
  config(
    materialized='view'
  )
}}

with ranked as (
    select
        *,
        row_number() over (
            partition by JSONExtractString(after, 'id')
            order by ts_ms desc, _ingested_at desc
        ) as _rn
    from raw.events_openlmis_referencedata_facilities
    where op != 'd'
      and JSONExtractString(after, 'id') != ''
)

select
    toUUID(JSONExtractString(after, 'id')) as id,
    JSONExtractString(after, 'code') as code,
    JSONExtractString(after, 'name') as name,
    JSONExtractBool(after, 'active') as active
from ranked
where _rn = 1
```

Key points:
- `partition by` on the primary key column (extracted from the `after` JSON payload)
- `order by ts_ms desc, _ingested_at desc` ensures the latest event wins
- `where op != 'd'` excludes deletes (deleted rows disappear from the view)
- `JSONExtractString(after, 'id') != ''` filters out events with empty payloads (e.g., tombstones)
- JSON extraction functions: `JSONExtractString`, `JSONExtractBool`, `JSONExtractInt`, `JSONExtractFloat` for typed access to the CDC payload
- Type casting: `toUUID()` / `toUUIDOrNull()` for UUID columns, standard ClickHouse cast functions for others
- Materialized as `view` — staging models are lightweight and always read current data
CDC data type gotchas:
| PostgreSQL type | CDC representation | ClickHouse extraction |
|---|---|---|
| `uuid`, `varchar`, `text` | JSON string | `JSONExtractString(after, 'col')` |
| `integer`, `bigint` | JSON number | `JSONExtractInt(after, 'col')` |
| `numeric`, `double` | JSON number | `JSONExtractFloat(after, 'col')` |
| `boolean` | JSON boolean | `JSONExtractBool(after, 'col')` |
| `date` | Integer (days since epoch) | `toDate(toDate('1970-01-01') + JSONExtractInt(after, 'col'))` |
| `timestamp` | Integer (milliseconds since epoch) | `fromUnixTimestamp64Milli(JSONExtractInt(after, 'col'))` |
| `timestamptz` | String (ISO 8601) | `parseDateTimeBestEffortOrNull(JSONExtractString(after, 'col'))` |
The date and timestamp representations are a common gotcha — Debezium converts them to epoch integers, not strings. Using parseDateTimeBestEffortOrNull on an epoch integer returns NULL.
See examples/olmis-analytics-core/dbt/models/staging/stg_facilities.sql for a complete working example, and stg_processing_periods.sql for date column handling.
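The two epoch encodings are easy to sanity-check outside ClickHouse. A minimal Python sketch of the conversions (function names are illustrative, not part of the platform):

```python
from datetime import date, datetime, timedelta, timezone

# Debezium encodes DATE columns as days since 1970-01-01 ...
def cdc_date(days_since_epoch: int) -> date:
    return date(1970, 1, 1) + timedelta(days=days_since_epoch)

# ... and TIMESTAMP columns as milliseconds since the Unix epoch.
def cdc_timestamp(ms_since_epoch: int) -> datetime:
    return datetime.fromtimestamp(ms_since_epoch / 1000, tz=timezone.utc)

print(cdc_date(0))                 # 1970-01-01
print(cdc_timestamp(86_400_000))   # 1970-01-02 00:00:00+00:00
```

If a date column in your staging model comes out around the year 1970, you have almost certainly applied a string parser to one of these epoch integers.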
Mart models join staging views into analytics-ready tables. They are the stable BI contract — dashboards query only these, never raw tables.
```sql
{{
  config(
    materialized='table',
    engine='MergeTree()',
    order_by='(program_name, facility_name, status)'
  )
}}

select
    r.id as requisition_id,
    r.status as status,
    f.name as facility_name,
    p.name as program_name
from {{ ref('stg_requisitions') }} r
left join {{ ref('stg_facilities') }} f on r.facility_id = f.id
left join {{ ref('stg_programs') }} p on r.program_id = p.id
```

Key points:
- Materialized as `table` with `MergeTree()` engine for query performance
- `order_by` should match common query patterns (filter/group columns first)
- Use `{{ ref('stg_...') }}` to reference staging models (dbt manages dependencies)
- Column names in marts are the public contract — renaming is a breaking change
- Nullable columns in `order_by`: LEFT JOINs produce nullable columns. MergeTree rejects these in sorting keys by default. Either use non-nullable columns in `order_by`, or add `settings={'allow_nullable_key': 1}` to the config
See examples/olmis-analytics-core/dbt/models/marts/mart_requisition_summary.sql for a simple example and mart_stock_status.sql for a more complex mart with multiple joins and computed columns.
Every model must have tests in the corresponding schema.yml. Minimum requirements per architecture principles:
| Test type | Purpose | Example |
|---|---|---|
| `not_null` | Integrity — required columns have values | Primary keys, foreign keys, status fields |
| `unique` | Integrity — no duplicate rows per key | Primary key column |
| `relationships` | Referential — foreign keys point to valid records | `facility_id` references `stg_facilities.id` |
| `accepted_values` | Enumerations — status fields have known values | Requisition status in [INITIATED, SUBMITTED, ...] |
Example schema.yml entry:
```yaml
models:
  - name: stg_requisitions
    columns:
      - name: id
        tests:
          - not_null
          - unique
      - name: status
        tests:
          - not_null
          - accepted_values:
              arguments:
                values: [INITIATED, SUBMITTED, AUTHORIZED, APPROVED, RELEASED]
      - name: facility_id
        tests:
          - not_null
          - relationships:
              arguments:
                to: ref('stg_facilities')
                field: id
```

See examples/olmis-analytics-core/dbt/models/staging/schema.yml for a full example.
```
your-analytics-package/
  dbt/
    dbt_project.yml
    models/
      staging/
        stg_your_table.sql      # one per source table
        schema.yml              # tests for all staging models
      marts/
        mart_your_report.sql    # joins staging models
        schema.yml              # tests for all mart models
```
```sh
make dbt-build     # run dbt deps + build (models + tests)
make dbt-test      # run tests only (faster, no model rebuild)
make verify-dbt    # build + verify curated marts have data
```

Superset assets are managed as code: YAML files in Git are the source of truth, imported into Superset at deploy time.
- Author in the Superset UI — create or edit charts and dashboards interactively at http://localhost:8088
- Export — download as ZIP from the Superset UI (Dashboard → `...` menu → Export) or via the API
- Unzip and commit — extract the YAML files into your analytics package's superset/assets/ directory
- Import — run `make superset-import` on target environments
```
your-analytics-package/
  superset/
    assets/
      metadata.yaml                        # bundle metadata (required)
      databases/
        reporting_clickhouse.yaml          # ClickHouse connection
      datasets/
        reporting_clickhouse/
          mart_requisition_summary.yaml    # dataset on a curated mart
      charts/
        requisitions_by_status.yaml        # chart definition
      dashboards/
        olmis_requisition_overview.yaml    # dashboard layout + chart refs
```
Every asset bundle requires a metadata.yaml at its root:
```yaml
version: "1.0.0"
type: Dashboard
timestamp: "2026-01-01T00:00:00+00:00"
```

The type must be Dashboard (this is what Superset's import-dashboards CLI expects). The version must be "1.0.0".
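The two constraints above are easy to check before committing a bundle. A hypothetical pre-commit helper, not part of the platform, could look like this:

```python
# Hypothetical check (not a platform script) that a bundle's metadata.yaml,
# already parsed into a dict, satisfies the import requirements stated above.
def check_bundle_metadata(meta: dict) -> list:
    errors = []
    if meta.get("type") != "Dashboard":
        errors.append("type must be 'Dashboard'")
    if meta.get("version") != "1.0.0":
        errors.append("version must be '1.0.0'")
    if "timestamp" not in meta:
        errors.append("timestamp is required")
    return errors

meta = {"version": "1.0.0", "type": "Dashboard", "timestamp": "2026-01-01T00:00:00+00:00"}
print(check_bundle_metadata(meta))  # []
```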
Every Superset asset (database, dataset, chart, dashboard) has a uuid field. These are stable identifiers that enable idempotent re-imports — importing the same UUID updates the existing asset rather than creating a duplicate.
When you export from the Superset UI, UUIDs are already assigned. If creating YAML files manually, generate UUIDs with:
```sh
python3 -c "import uuid; print(uuid.uuid4())"
```

Database credentials must never be stored in Git. The database YAML contains the connection URI without a password:
```yaml
sqlalchemy_uri: "clickhousedb+connect://default@clickhouse:8123/curated"
```

The import script (scripts/superset/import-assets.sh) patches the password from environment variables (CLICKHOUSE_USER, CLICKHOUSE_PASSWORD, CLICKHOUSE_HOST) after import.
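The patching idea can be sketched in Python — a simplified illustration of injecting credentials into a passwordless SQLAlchemy URI, not the actual logic of import-assets.sh:

```python
from urllib.parse import urlsplit, urlunsplit

# Illustrative sketch: rebuild the netloc of a passwordless SQLAlchemy URI
# with credentials taken from the environment at deploy time.
def with_password(uri: str, user: str, password: str) -> str:
    parts = urlsplit(uri)
    port = f":{parts.port}" if parts.port else ""
    netloc = f"{user}:{password}@{parts.hostname}{port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))

print(with_password("clickhousedb+connect://default@clickhouse:8123/curated",
                    user="default", password="secret"))
# clickhousedb+connect://default:secret@clickhouse:8123/curated
```

Keeping the password out of the YAML means exported bundles can be committed and shared without scrubbing secrets.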
Datasets must reference tables in the curated ClickHouse database (dbt marts). Never create datasets on raw tables — they contain append-only CDC events, not current-state data.
```sh
make superset-import   # imports platform → core → extension assets
make verify-superset   # checks health + dashboard exists
```

See examples/olmis-analytics-core/superset/assets/ for a complete working asset bundle.
Superset dashboards can be embedded in an adopter's web application (e.g., OpenLMIS UI) using the Superset Embedded SDK. Embedded dashboards use guest tokens for authentication — end users never need Superset credentials.
- The adopter UI loads the Superset Embedded SDK from `SUPERSET_URL/static/superset-embedded-sdk.js`
- The adopter backend requests a guest token from Superset's `/api/v1/security/guest_token/` endpoint, authenticating with the admin service account
- The SDK renders the dashboard in an iframe, authenticated by the guest token
- Tokens expire after 5 minutes and are automatically refreshed by the SDK
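The shape of the guest-token request body the adopter backend sends can be sketched as follows. The field layout reflects Superset's security API, but treat it as an assumption and check the Superset API docs for the authoritative schema; the UUID below is a placeholder:

```python
# Sketch of the body POSTed to /api/v1/security/guest_token/ by the adopter
# backend (after logging in with the admin service account). Values are
# illustrative; verify the schema against your Superset version's API docs.
def guest_token_payload(embedded_uuid: str, username: str = "guest") -> dict:
    return {
        "user": {"username": username},
        "resources": [{"type": "dashboard", "id": embedded_uuid}],
        "rls": [],  # optional row-level-security clauses
    }

payload = guest_token_payload("00000000-0000-0000-0000-000000000000")
print(payload["resources"][0]["type"])  # dashboard
```

The `id` here is the embedded UUID copied from the dashboard's Embed dialog, not the dashboard's asset UUID from the exported YAML.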
Set the following in .env:
```sh
# Origin(s) of the application embedding Superset (e.g., the OLMIS UI URL).
# Controls CORS, CSP frame-ancestors, and embedded dashboard allowed_domains.
# Leave empty to disable embedding entirely.
SUPERSET_EMBEDDING_ORIGINS=http://your-olmis-host

# JWT secret for guest tokens (defaults to SUPERSET_SECRET_KEY if unset).
# Use a dedicated secret in production.
SUPERSET_GUEST_TOKEN_SECRET=replace-with-a-random-string
```

SUPERSET_EMBEDDING_ORIGINS drives three security layers:
| Layer | What it does |
|---|---|
| CORS | Allows the embedding origin to call Superset APIs (`/api/*`, `/static/*`) |
| CSP `frame-ancestors` | Allows the embedding origin to load Superset in an iframe |
| `allowed_domains` | Per-dashboard allowlist in Superset DB, synced by `make superset-import` |
After changing SUPERSET_EMBEDDING_ORIGINS, restart Superset and re-run the import:
```sh
make superset-import   # patches allowed_domains in DB from SUPERSET_EMBEDDING_ORIGINS
docker compose --env-file .env -f compose/docker-compose.yml restart superset
```

Each dashboard must be individually configured for embedding in the Superset UI:
- Open the dashboard in Superset → `...` menu → Embed dashboard
- Add the embedding origin to the allowed domains list
- Copy the embedded UUID — the adopter backend needs this to request guest tokens
The make superset-import step automatically syncs the allowed domains from SUPERSET_EMBEDDING_ORIGINS, so you only need to set them manually during initial setup in the UI.
The superset/init-guest-permissions.py script runs on every Superset startup and grants the Public role the permissions needed for guest token access (dashboard read, chart read, dataset read, etc.). No manual permission setup is required.
| Symptom | Cause | Fix |
|---|---|---|
| 403 on `/embedded/{uuid}` | `allowed_domains` doesn't include the embedding origin | Re-run `make superset-import` or update in Superset UI |
| CORS error in browser console | `SUPERSET_EMBEDDING_ORIGINS` not set or wrong | Update .env, restart Superset |
| "Failed to load Superset SDK" | Embedding origin can't reach Superset | Check network connectivity and `SUPERSET_URL` |
| Guest token request fails (401) | Wrong admin credentials | Check `SUPERSET_ADMIN_USER` / `SUPERSET_ADMIN_PASSWORD` |
An analytics package provides domain-specific reporting logic for a particular adopter system. The platform loads packages at runtime via environment variables.
| Type | Purpose | Includes |
|---|---|---|
| Core (required) | Baseline ingestion config + models + dashboards | connect/ + dbt/ + superset/ |
| Extension (optional) | Additional models and dashboards | dbt/ + superset/ only |
Extensions follow the extend-only rule: they may add new models and dashboards but must not modify core assets or change ingestion configuration. Extensions must not include a connect/ directory.
```
your-analytics-core/
  manifest.yaml            # package metadata (required)
  connect/
    your-connector.json    # Debezium connector config (core only)
  dbt/
    dbt_project.yml        # dbt package config
    models/
      staging/             # current-state views from raw CDC events
      marts/               # analytics-ready tables (BI contract)
  superset/
    assets/                # Superset YAML bundle (see above)
  README.md
```
See examples/olmis-analytics-core/ for a complete reference implementation.
Every package must include a manifest.yaml:
```yaml
name: your-analytics-core
type: core                      # core or extension
platform_version: ">=1.0.0"     # platform compatibility constraint
description: "Your package description"
includes:                       # which components the package provides
  - connect                     # core only
  - dbt
  - superset
```

The connector JSON template in connect/ uses envsubst for variable substitution at registration time. Environment variables like ${SOURCE_PG_HOST}, ${SOURCE_PG_PASSWORD}, ${DEBEZIUM_TOPIC_PREFIX} are replaced with values from .env.
See examples/olmis-analytics-core/connect/openlmis-postgres-cdc.json for the full template.
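The `${VAR}` expansion envsubst performs can be mimicked in Python with `string.Template`, which uses the same placeholder syntax — a sketch of the mechanism, not the platform's actual registration script, with an illustrative two-key template:

```python
import json
from string import Template

# Minimal stand-in for envsubst: expand ${VAR} placeholders in a connector
# template from an environment-style dict (keys and values are illustrative).
template = '{"database.hostname": "${SOURCE_PG_HOST}", "topic.prefix": "${DEBEZIUM_TOPIC_PREFIX}"}'
env = {"SOURCE_PG_HOST": "source-db", "DEBEZIUM_TOPIC_PREFIX": "openlmis"}

rendered = Template(template).substitute(env)   # raises KeyError if a var is missing
config = json.loads(rendered)                   # the result must still be valid JSON
print(config["database.hostname"])  # source-db
```

Note that `substitute()` fails loudly on a missing variable, which is usually what you want at registration time: a half-expanded connector config is worse than an error.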
The dbt package needs a dbt_project.yml:
```yaml
name: your_analytics_core
version: "1.0.0"
config-version: 2
model-paths: ["models"]
test-paths: ["tests"]
seed-paths: ["seeds"]
```

The platform runner (dbt/) loads this as a local package in development or fetches it from Git in production — the model paths are resolved automatically.
Set filesystem paths in .env:
```sh
ANALYTICS_CORE_PATH=path/to/your-analytics-core
# Extensions (optional, comma-separated)
ANALYTICS_EXTENSIONS_PATHS=path/to/extension-a,path/to/extension-b
```

Set Git URLs in .env:
```sh
ANALYTICS_CORE_GIT_URL=https://github.com/org/your-analytics-core.git
ANALYTICS_CORE_GIT_REF=v1.0.0
# Extensions (optional, comma-separated)
ANALYTICS_EXTENSION_GIT_URLS=https://github.com/org/extension-a.git
ANALYTICS_EXTENSION_GIT_REFS=v1.0.0
# GIT_TOKEN=ghp_xxxx             # for private repos
```

Then fetch non-dbt components (connector config, Superset assets):

```sh
make package-fetch    # clones to .packages/, sets paths for downstream scripts
```

dbt fetches its own packages directly from Git during make dbt-build — no manual fetch needed for dbt models.
Run validation before deploying to catch extend-only rule violations:
```sh
make package-validate
```

This checks that extensions don't include connect/, don't collide on dbt model names, and don't reuse core Superset UUIDs.
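The checks can be illustrated with a simplified Python version. This is not the platform's validation script, and it glosses over one real-world nuance (the shared ClickHouse database YAML intentionally reuses the core's UUID); the package dicts are hypothetical:

```python
# Simplified illustration of the extend-only checks. In the real platform the
# shared databases/ YAML legitimately reuses the core UUID; this sketch
# treats every repeated UUID as a collision for brevity.
def validate_extensions(core: dict, extensions: list) -> list:
    errors = []
    seen_models = set(core["dbt_models"])
    seen_uuids = set(core["superset_uuids"])
    for ext in extensions:
        if ext.get("has_connect"):
            errors.append(f"{ext['name']}: extensions must not include connect/")
        for model in ext["dbt_models"]:
            if model in seen_models:
                errors.append(f"{ext['name']}: dbt model name collision: {model}")
            seen_models.add(model)
        for u in ext["superset_uuids"]:
            if u in seen_uuids:
                errors.append(f"{ext['name']}: Superset UUID collision: {u}")
            seen_uuids.add(u)
    return errors

core = {"name": "core", "dbt_models": ["mart_requisition_summary"], "superset_uuids": ["u1"]}
ext = {"name": "mw", "has_connect": False, "dbt_models": ["mart_mw_stock"], "superset_uuids": ["u2"]}
print(validate_extensions(core, [ext]))  # []
```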
```sh
make package-validate     # validate extension rules (if extensions configured)
make register-connector   # register CDC connector from core package
make clickhouse-init      # create raw landing tables
make dbt-build            # build dbt models
make superset-import      # import Superset assets
make verify-dbt           # verify curated marts have data
make verify-superset      # verify dashboards imported
```

Or run the full pipeline in one command:

```sh
make verify-packages    # validate + build + import + check dashboards
```

Extension packages add country-specific or domain-specific reports on top of a core package. See examples/olmis-analytics-malawi/ for a complete working example.
Key rules:
- No `connect/` directory — ingestion is owned by the core package
- Include the same `databases/` YAML as core — Superset's import CLI requires the database definition in each bundle to resolve dataset references. Copy the core's databases/reporting_clickhouse.yaml into your extension. Using the same UUID ensures it updates the existing connection rather than creating a duplicate.
- No model name collisions — your dbt model names must be unique (prefix with your country/domain)
- No UUID collisions — Superset asset UUIDs must be unique across core and all extensions
A typical extension contains:
- A dbt mart that reads from core marts (via `{{ ref('mart_...') }}`) and adds an aggregation or filter
- Superset assets with a dataset on the new mart, a chart, and a dashboard
- Tests in schema.yml — same requirements as core (not_null, accepted_values, etc.)
To test your extension locally:
```sh
ANALYTICS_CORE_PATH=examples/olmis-analytics-core
ANALYTICS_EXTENSIONS_PATHS=path/to/your-extension
```

Then make verify-packages runs validation, dbt build, Superset import, and checks that your dashboard appears.