Merged
4 changes: 4 additions & 0 deletions .env.example
@@ -10,6 +10,10 @@ POSTGRES_PASSWORD=""
POSTGRES_DB=""
POSTGRES_PORT=""
POSTGRES_HOST="localhost"

# Postgres bind on host (compose dev db). Default 127.0.0.1 = loopback-only. Use 0.0.0.0 for LAN/Tailscale DBeaver (trusted networks).
# POSTGRES_BIND_ADDR=127.0.0.1

DATABASE_URL="postgresql://<POSTGRES_USER>:<POSTGRES_PASSWORD>@<POSTGRES_HOST>:<POSTGRES_PORT>/<POSTGRES_DB>"

# Dagster (local: absolute path to .../dagster_home; Docker: /app/dagster_home)
69 changes: 69 additions & 0 deletions .github/workflows/quality-checks.yml
@@ -21,6 +21,7 @@ jobs:
      prisma_schema: ${{ steps.filter.outputs.prisma_schema }}
      ost_docs_paths: ${{ steps.filter.outputs.ost_docs_paths }}
      dagster_cfg: ${{ steps.filter.outputs.dagster_cfg }}
      postgres_suite: ${{ steps.filter.outputs.postgres_suite }}
    steps:
      - name: Checkout
        uses: actions/checkout@v4
@@ -60,6 +61,11 @@ jobs:
              - 'scripts/docker-entrypoint.sh'
            prisma_schema:
              - 'prisma/**'
            postgres_suite:
              - 'prisma/**'
              - 'tests/api_db/**'
              - 'tests/conftest.py'
              - '.github/workflows/quality-checks.yml'
            ost_docs_paths:
              - 'ost-docs/**'
              - '.gitmodules'
@@ -114,6 +120,69 @@ jobs:
          mkdir -p "$DAGSTER_STORAGE_DIR" "$DAGSTER_LOGS_DIR"
          uv run pytest -m integration -k test_dagster_startup --no-cov

  postgres-db:
    needs: changes
    if: >-
      github.event_name == 'push'
      || needs.changes.outputs.workflows == 'true'
      || needs.changes.outputs.postgres_suite == 'true'
      || needs.changes.outputs.python == 'true'
      || needs.changes.outputs.prisma_schema == 'true'
    runs-on: ubuntu-latest
    services:
      postgres:
        image: ankane/pgvector:v0.4.1
        env:
          POSTGRES_USER: linker_ci
          POSTGRES_PASSWORD: linker_ci
          POSTGRES_DB: linker_ci
        ports:
          - 5432:5432
        options: >-
          --health-cmd "pg_isready -U linker_ci -d linker_ci"
          --health-interval 5s
          --health-timeout 5s
          --health-retries 20
    env:
      DATABASE_URL: postgresql://linker_ci:linker_ci@localhost:5432/linker_ci
      LINKER_SKIP_SEMANTIC_INIT: "true"
      OST_LINKER_REQUIRE_SERVICE_TOKEN: "false"
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Install uv
        uses: astral-sh/setup-uv@v5
        with:
          enable-cache: true

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install Python deps
        run: uv sync --frozen

      - name: Set up Node
        uses: actions/setup-node@v4
        with:
          node-version: "20"
          cache: npm
          cache-dependency-path: package-lock.json

      - name: Install Node deps for Prisma CLI + seed
        run: npm ci

      - name: Deploy migrations
        run: npx prisma migrate deploy

      - name: Seed taxonomy rows
        run: ./node_modules/.bin/ts-node --compiler-options '{"module":"CommonJS"}' prisma/seed/seed.ts

      - name: Database-tier pytest
        run: uv run pytest tests/api_db --no-cov -v --tb=short

  dbt-check:
    needs: changes
    if: >-
1 change: 1 addition & 0 deletions .gitignore
@@ -84,6 +84,7 @@ tmp_dagster/

# Maintainer-only / local audit notes (do not commit)
docs/READINESS-AUDIT.md
docs/audit

# Local
.actrc
44 changes: 41 additions & 3 deletions AGENTS.md
@@ -28,7 +28,7 @@ If `npx ts-node` fails on your Node version, use the `ts-node` line above from t
### Python / Dagster
```bash
uv sync # Install Python dependencies
dagster dev -h 0.0.0.0 -p 3000 # Run Dagster locally (outside Docker)
make dev # Dagster UI on :3000 (uses workspace.host.yaml)
```

### REST API (FastAPI)
@@ -38,6 +38,32 @@ pytest -m api # Run API tests
```
The API is a lightweight, read-only service consumed by the [ost-mcp](https://github.com/opensource-together/ost-mcp) MCP server. It exposes project search, similarity, trending recommendations, and reference data.

### FastAPI service token (`OST_LINKER_*`)

Exact behavior (see `src/services/api/auth.py` and `lifespan` in `src/services/api/main.py`; covered by `pytest -m api` in `tests/api/test_service_token.py`):

| `OST_LINKER_REQUIRE_SERVICE_TOKEN` | `OST_LINKER_SERVICE_TOKEN` | Protected routes (`/projects`, `/references`, `/recommendations`, …) | `/health` |
| ---------------------------------- | -------------------------- | -------------------------------------------------------------------- | --------- |
| `false` or unset | unset or empty | Open (no `X-Service-Token` required) | Open |
| `false` or unset | set | **401** unless `X-Service-Token` matches | Open |
| `true` | unset or empty | **Startup fails** (`RuntimeError` in lifespan) | n/a |
| `true` | set | **401** unless header matches | Open |

**MCP-facing production:** set strict mode and a strong shared token; keep transport on a private network or TLS-terminated path so the header is not leaked.
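
The table above can be read as two small decisions: one at startup and one per request. The following is a minimal sketch under stated assumptions, not the code in `src/services/api/auth.py`: the helper names `load_token_config` and `status_for` are hypothetical, and the set of spellings accepted as "true" is an assumption.

```python
from __future__ import annotations

import hmac
from typing import Mapping, Optional

# Assumption: which spellings count as "true"; the real parser may differ.
TRUTHY = {"1", "true", "yes", "on"}


def load_token_config(env: Mapping[str, str]) -> tuple[bool, str]:
    """Return (require_token, token); raise when strict mode has no token."""
    require = env.get("OST_LINKER_REQUIRE_SERVICE_TOKEN", "").strip().lower() in TRUTHY
    token = env.get("OST_LINKER_SERVICE_TOKEN", "").strip()
    if require and not token:
        # Third table row: refuse to start instead of serving open routes.
        raise RuntimeError("OST_LINKER_SERVICE_TOKEN must be set in strict mode")
    return require, token


def status_for(path: str, header: Optional[str], token: str) -> int:
    """HTTP status a request would receive under the table above."""
    if path == "/health" or not token:
        # /health is always open; other routes are open only when no token is set.
        return 200
    if header is not None and hmac.compare_digest(header.encode(), token.encode()):
        return 200
    return 401
```

Note that a configured token is enforced even when strict mode is off (second table row); strict mode only adds the startup check. The constant-time comparison via `hmac.compare_digest` is a design choice of this sketch, not something the table confirms.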

### Postgres host bind (dev override)

Compose maps the dev database as `${POSTGRES_BIND_ADDR:-127.0.0.1}:${POSTGRES_PORT:-5433}:5432` (loopback-only by default, on host port **5433** unless you override either variable). Use `POSTGRES_BIND_ADDR=0.0.0.0` **only on trusted LANs** (e.g. DBeaver from another machine on Tailscale) and rely on `POSTGRES_PASSWORD` strength — see `.env.example`.
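
To see what the mapping resolves to for a given environment, the `${VAR:-default}` substitution can be mimicked in a few lines. This is an illustrative sketch only: `postgres_port_mapping` and `is_loopback_only` are hypothetical helpers, not part of the repo, and Compose itself performs the real interpolation.

```python
import ipaddress
from typing import Mapping


def postgres_port_mapping(env: Mapping[str, str]) -> str:
    """Mimic Compose's ${VAR:-default}: unset or empty values fall back to the default."""
    bind_addr = env.get("POSTGRES_BIND_ADDR") or "127.0.0.1"
    host_port = env.get("POSTGRES_PORT") or "5433"
    return f"{bind_addr}:{host_port}:5432"


def is_loopback_only(env: Mapping[str, str]) -> bool:
    """True when the resolved bind address is reachable from this host only."""
    bind_addr = env.get("POSTGRES_BIND_ADDR") or "127.0.0.1"
    return ipaddress.ip_address(bind_addr).is_loopback


print(postgres_port_mapping({}))                                # 127.0.0.1:5433:5432
print(postgres_port_mapping({"POSTGRES_BIND_ADDR": "0.0.0.0"}))  # 0.0.0.0:5433:5432
```

An empty `POSTGRES_PORT=""`, as shipped in `.env.example`, resolves to **5433** because the `:-` form treats empty values like unset ones.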

### Dagster: Docker vs host

- **Containers** use `-w /app/workspace.yaml` with `working_directory: /app` (bind-mounted tree).
- **Host** `make dev` uses `workspace.host.yaml` with `working_directory: .` so `src.linker.definitions` loads from your checkout. Keep both YAML files aligned if you rename modules.

### Ingestion / Dagster regression coverage

Not every ingestion asset ships full deterministic unit tests against Go binaries. After changing subprocess wiring (`raw_github__extract_projects`, trending, etc.), run a Dagster materialization smoke test in dev, or document a manual rehearsal on the PR.

### dbt
Target `local` in `dbt/profiles.yml` uses `POSTGRES_HOST`, `POSTGRES_USER`, `POSTGRES_PASSWORD`, `POSTGRES_PORT`, `POSTGRES_DB` (defaults **ci_user** / **ci_pass** / **5433** if unset — wrong for your Docker DB). **Load the repo `.env` before running dbt:**

@@ -63,10 +89,22 @@ mypy src/ # Type check (strict mode)
```bash
pytest # Run all tests (coverage included via --cov=src)
pytest tests/test_foo.py -k test_bar # Run a single test
pytest -m unit # Run by marker (unit/integration/performance/api)
pytest -m unit # Run by marker (unit/integration/performance/api/database)
pytest -m integration # Dagster startup smoke test
pytest -m database # Only `tests/api_db/` (requires DATABASE_URL; skipped if unset)
```
`make ci-check` runs ruff (check + format), mypy, unit tests, API tests, and the Dagster smoke — aligned with `.github/workflows/quality-checks.yml`.
`make ci-check` runs ruff (check + format), mypy, unit tests, API tests, and the Dagster smoke — aligned with `.github/workflows/quality-checks.yml`. It does **not** run the Postgres tier; use **`make test-database`** when **`DATABASE_URL`** points at a migrated, seeded DB.
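
The "skipped if unset" behavior of the `database` marker could be wired up roughly as follows. This is a hedged sketch of one common pytest pattern; how the repo's `tests/conftest.py` actually implements it is not shown in this change, and `database_skip_reason` is a hypothetical helper name.

```python
import os
from typing import Mapping, Optional


def database_skip_reason(env: Mapping[str, str]) -> Optional[str]:
    """Return a skip reason when no database is configured, else None."""
    if env.get("DATABASE_URL", "").strip():
        return None
    return "DATABASE_URL is not set; skipping database-tier tests"


def pytest_collection_modifyitems(config, items):
    """pytest hook: skip every test carrying the `database` marker when no DB is configured."""
    reason = database_skip_reason(os.environ)
    if reason is None:
        return
    import pytest  # imported lazily so the helper above stays importable without pytest

    skip = pytest.mark.skip(reason=reason)
    for item in items:
        if "database" in item.keywords:
            item.add_marker(skip)
```

With a hook like this, `pytest -m database` on a machine without `DATABASE_URL` reports the tests as skipped instead of failing on connection errors.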

#### Verification tiers (CI vs local Postgres)

| Tier | Command | Needs |
| ---- | ------- | ----- |
| **Unit** | `pytest -m unit --cov-fail-under=50` | Python only |
| **API mocks** | `pytest -m api` | Python only (mocked DB + semantic) |
| **Integration (Dagster)** | `pytest -m integration -k test_dagster_startup --no-cov` | Dagster env dirs (see workflow) |
| **Database (`api_db`)** | `DATABASE_URL=... LINKER_SKIP_SEMANTIC_INIT=true make test-database` | Compose **db** (`ankane/pgvector` in docker-compose override), `npx prisma migrate deploy`, Prisma seed |

**`LINKER_SKIP_SEMANTIC_INIT`** — When set to **`true`**, FastAPI skips loading **`sentence-transformers`** (used in **GitHub Actions `postgres-db`** and **`tests/api_db`**). Routes that call **`get_semantic()`** (e.g. **`/projects`** embedding search) stay untested in that mode.
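
As an illustration of the flag's effect, a startup routine might guard the expensive model load like this. It is a sketch under assumptions: `skip_semantic_init` and `init_semantic` are hypothetical names, the `loader` callable stands in for whatever loads `sentence-transformers`, and treating only a case-insensitive `true` as enabled is an assumption about parsing.

```python
from typing import Callable, Mapping, Optional


def skip_semantic_init(env: Mapping[str, str]) -> bool:
    """Assumption: only the value `true` (case-insensitive) enables the skip."""
    return env.get("LINKER_SKIP_SEMANTIC_INIT", "").strip().lower() == "true"


def init_semantic(env: Mapping[str, str], loader: Callable[[], object]) -> Optional[object]:
    """Call the model loader unless the skip flag is set; return None when skipped."""
    if skip_semantic_init(env):
        return None
    return loader()
```

Because the model is never loaded in this mode, any route that needs it has to be exercised by a different tier.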

Test config is in `pyproject.toml` under `[tool.pytest.ini_options]`. Tests use class-based style (`class TestXxx`).

10 changes: 7 additions & 3 deletions Makefile
@@ -5,9 +5,9 @@ setup:
	uv sync
	$(MAKE) build-go

## Dev — run Dagster dev server locally
## Dev — run Dagster dev server locally (host paths; see workspace.host.yaml)
dev:
	uv run dagster dev -h 0.0.0.0 -p 3000
	uv run dagster dev -h 0.0.0.0 -p 3000 -w workspace.host.yaml

## Test — run pytest with coverage
test:
@@ -62,6 +62,10 @@ ci-check: lint
	uv run pytest -m api --no-cov
	uv run pytest -m integration -k test_dagster_startup --no-cov

## Test-database — Postgres-backed FastAPI tier (DATABASE_URL required)
test-database:
	uv run pytest tests/api_db --no-cov -v

## Clean — remove Dagster storage and Python caches
clean:
	bash scripts/clean_dagster.sh
@@ -74,4 +78,4 @@ help:
	@echo ""
	@grep -E '^## ' $(MAKEFILE_LIST) | sed 's/## / /'

.PHONY: setup dev test lint format typecheck build-go docker-up docker-down db-init dbt-build clean help doctor ci-check
.PHONY: setup dev test lint format typecheck build-go docker-up docker-down db-init dbt-build clean help doctor ci-check test-database
2 changes: 2 additions & 0 deletions README.md
@@ -28,6 +28,8 @@ make db-init # Prisma schema + seed
make ci-check # Python parity with CI quality job (before a PR); full CI is broader — see AGENTS.md
```

See [AGENTS.md](AGENTS.md) for **API service-token behavior**, **Postgres host bind**, and **Dagster host vs Docker workspaces** (`workspace.host.yaml`, `Makefile` **`make dev`**).

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md) (branch flow, conventions, **`make ci-check`**). For command cheat-sheets (**dbt**, API, Docker overrides), see [AGENTS.md](AGENTS.md).
4 changes: 2 additions & 2 deletions docker-compose.override.yml
@@ -51,9 +51,9 @@ services:
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      POSTGRES_DB: ${POSTGRES_DB}
    # Bind on all host interfaces so peers can reach Postgres via Tailscale or LAN (e.g. DBeaver from another Mac). Use a strong POSTGRES_PASSWORD.
    # Default: loopback only (POSTGRES_BIND_ADDR=127.0.0.1). Set POSTGRES_BIND_ADDR=0.0.0.0 for LAN/Tailscale (trusted networks only). Use a strong POSTGRES_PASSWORD.
    ports:
      - "${POSTGRES_PORT:-5433}:5432"
      - "${POSTGRES_BIND_ADDR:-127.0.0.1}:${POSTGRES_PORT:-5433}:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck: