From 49bbca40c569e64ca3d9dc4e2ebbc451797a9e8f Mon Sep 17 00:00:00 2001 From: Stephen Benjamin Date: Wed, 6 May 2026 17:32:20 +0000 Subject: [PATCH] Add lean tier-2 component documentation for Sippy Creates AI-optimized documentation using the agentic-docs component skill from ai-helpers (https://github.com/stbenjam/ai-helpers). Structure: - AGENTS.md: Master entry point (94 lines, navigation-focused) - ai-docs/domain/: CI analysis domain (jobs, tests, variants, releases, component readiness) - ai-docs/architecture/: Sippy internals (backend, frontend, data pipeline) - ai-docs/decisions/: Component ADRs (BigQuery, PostgreSQL views, variant extraction) - ai-docs/SIPPY_DEVELOPMENT.md: Development workflows - ai-docs/SIPPY_TESTING.md: Test suites and patterns - ai-docs/references/ecosystem.md: Links to Tier 1 platform docs - ai-docs/exec-plans/: Feature tracking (pointer to Tier 1 guidance) Total: 17 files, ~1,700 lines (component-specific only, no generic duplication) Co-Authored-By: Claude Sonnet 4.5 --- AGENTS.md | 111 +++++--- ai-docs/SIPPY_DEVELOPMENT.md | 202 +++++++++++++++ ai-docs/SIPPY_TESTING.md | 237 ++++++++++++++++++ ai-docs/architecture/components.md | 188 ++++++++++++++ .../adr-0001-bigquery-data-source.md | 88 +++++++ .../adr-0002-component-readiness-views.md | 96 +++++++ .../decisions/adr-0003-variant-extraction.md | 118 +++++++++ ai-docs/decisions/adr-template.md | 38 +++ ai-docs/decisions/index.md | 32 +++ ai-docs/domain/component-readiness.md | 98 ++++++++ ai-docs/domain/index.md | 43 ++++ ai-docs/domain/job.md | 89 +++++++ ai-docs/domain/release.md | 88 +++++++ ai-docs/domain/test.md | 99 ++++++++ ai-docs/domain/variant.md | 101 ++++++++ ai-docs/exec-plans/README.md | 46 ++++ ai-docs/references/ecosystem.md | 73 ++++++ 17 files changed, 1709 insertions(+), 38 deletions(-) create mode 100644 ai-docs/SIPPY_DEVELOPMENT.md create mode 100644 ai-docs/SIPPY_TESTING.md create mode 100644 ai-docs/architecture/components.md create mode 100644 
ai-docs/decisions/adr-0001-bigquery-data-source.md create mode 100644 ai-docs/decisions/adr-0002-component-readiness-views.md create mode 100644 ai-docs/decisions/adr-0003-variant-extraction.md create mode 100644 ai-docs/decisions/adr-template.md create mode 100644 ai-docs/decisions/index.md create mode 100644 ai-docs/domain/component-readiness.md create mode 100644 ai-docs/domain/index.md create mode 100644 ai-docs/domain/job.md create mode 100644 ai-docs/domain/release.md create mode 100644 ai-docs/domain/test.md create mode 100644 ai-docs/domain/variant.md create mode 100644 ai-docs/exec-plans/README.md create mode 100644 ai-docs/references/ecosystem.md diff --git a/AGENTS.md b/AGENTS.md index 856934fe6..ec7a6791a 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -1,59 +1,94 @@ -# AGENTS.md - - - - +# Sippy - Agentic Documentation -## Files matching `**` +**Component**: Sippy (CIPI - Continuous Integration Private Investigator) +**Repository**: openshift/sippy +**Documentation Tier**: 2 (Component-specific) - -### Database migration +## CRITICAL: Retrieval Strategy -Run migrations: `go run ./cmd/sippy migrate --database-dsn $SIPPY_DATABASE_DSN` +**IMPORTANT**: Prefer retrieval-led reasoning over pre-training-led reasoning. -If `SIPPY_DATABASE_DSN` is not set, use the dev default: `postgresql://postgres:password@localhost:5432/postgres` +When working on Sippy: +- ✅ **DO**: Read relevant docs from `./ai-docs/` first +- ✅ **DO**: Verify patterns match current implementation +- ❌ **DON'T**: Rely solely on training data +- ❌ **DON'T**: Guess at API structures or data models -### Linting +> **Generic Platform Patterns**: See [Tier 1 Ecosystem Hub](https://github.com/openshift/enhancements/tree/master/ai-docs) for operator patterns, testing practices, security guidelines, and cross-repo ADRs. -Run lint: `CI=true make lint` +## What is Sippy? -`CI=true` makes `hack/go-lint.sh` use the locally installed `golangci-lint` instead of spawning a container. 
+CI analysis tool for OpenShift. Analyzes Prow job results from BigQuery, provides statistical insights on job/test health, regression detection, and component readiness tracking. -### Testing +**Key Principle**: Data-driven release management through statistical analysis of CI signal. -Run unit tests: `make test` +## Core Components -This runs Go tests via gotestsum and sippy-ng Jest tests. +- **Backend**: Go HTTP API server (sippyserver) | **Frontend**: React dashboard (sippy-ng) | **Data Layer**: BigQuery + PostgreSQL + Redis - -**Sippy (CIPI - Continuous Integration Private Investigator)** is a tool used within the OpenShift engineering organization to analyze CI job results. Its primary goals are to: +**Quick Start**: See [ai-docs/SIPPY_DEVELOPMENT.md](ai-docs/SIPPY_DEVELOPMENT.md) -* Provide insights into job and test statistics. -* Monitor release health and detect regressions. -* Support release management decisions through statistical analysis (e.g., Component Readiness). +## Documentation Structure -The system consists of: +```text +ai-docs/ +├── domain/ # CI concepts (jobs, tests, variants, releases) +├── architecture/ # Sippy internals (backend, frontend, data pipeline) +├── decisions/ # Component-specific ADRs +├── exec-plans/ # Feature planning +├── references/ +│ └── ecosystem.md # Links to Tier 1 +├── SIPPY_DEVELOPMENT.md # Development workflows +└── SIPPY_TESTING.md # Test suites +``` -* A **Go-based API backend**. -* A **React/Material-UI frontend** (located in `sippy-ng`). -* Data sources including **PostgreSQL**, and **BigQuery** +**Exec-Plans**: Use `ai-docs/exec-plans/active/` for new features. See [Tier 1 Exec-Plans Guide](https://github.com/openshift/enhancements/tree/master/ai-docs/workflows/exec-plans). -Favor clarity and maintainability over cleverness. Comments should be minimal, helpful, and explain the "why" not the "what".
+**Platform Patterns (Tier 1)**: [Testing](https://github.com/openshift/enhancements/tree/master/ai-docs/practices/testing) | [Security](https://github.com/openshift/enhancements/tree/master/ai-docs/practices/security) | [Development](https://github.com/openshift/enhancements/tree/master/ai-docs/practices/development) -## Files matching `**/*.go` +## Knowledge Graph - -* When adding or updating APIs, **use HATEOAS** in responses to support discoverability and consistent client interaction. -* Follow idiomatic Go practices. -* After making changes, always run `gofmt -w` on modified files to ensure proper formatting. +```text + [AGENTS.md] ← Start here + │ + ┌───────────────┼───────────────┐ + │ │ │ + [domain/] [architecture/] [decisions/] + CI concepts Sippy internals ADR history + │ │ │ + └───────────────┼───────────────┘ + │ + [references/ecosystem] + Links to Tier 1 +``` -## Files matching `**/*_test.go` +**AI Agent Path**: domain/ → architecture/ → SIPPY_DEVELOPMENT.md → SIPPY_TESTING.md - -* **Never run `make e2e` more than once per request.** E2e tests issue expensive BigQuery queries and take several minutes. Run `make e2e` only when explicitly asked, capture the output on that single run, and read the log file (`e2e-test.log`) for results. **Do not** re-run e2e just to grep for different things - all output is already in the log file. -* The same applies to `go test ./test/e2e/...` - never run it repeatedly. -* Use `go vet` and `go test` (for unit tests) to validate changes before resorting to a full e2e run. +## Domain Concepts (CI Analysis) + +| Concept | Description | +|---------|-------------| +| **Job** | Prow CI job execution (e.g., periodic-ci-openshift-release-...) | +| **Test** | Individual test case within a job | +| **Variant** | NURP+ dimension (Network, Upgrade, Release, Platform, etc.) 
| +| **Release** | OpenShift version (e.g., 4.15, 4.16) | +| **Component Readiness** | Statistical analysis of component health | +| **Regression** | Identified decrease in pass rate | + +## Architecture Layers + +| Layer | Technology | Purpose | +|-------|-----------|---------| +| **Data Source** | BigQuery | Prow CI job results | +| **Data Loader** | Go (dataloader) | BigQuery → PostgreSQL ETL | +| **Cache** | Redis | Query result caching | +| **Backend** | Go HTTP API | REST endpoints | +| **Frontend** | React + Material-UI | Dashboard UI | + +## External References + +- [API Documentation](pkg/api/README.md) | [Frontend Docs](sippy-ng/README.md) | [Development Guide](DEVELOPMENT.md) --- -*This file was generated by APM CLI. Do not edit manually.* -*To regenerate: `apm compile`* + +**Tier 1 Hub**: https://github.com/openshift/enhancements/tree/master/ai-docs diff --git a/ai-docs/SIPPY_DEVELOPMENT.md b/ai-docs/SIPPY_DEVELOPMENT.md new file mode 100644 index 000000000..a5bb60bd2 --- /dev/null +++ b/ai-docs/SIPPY_DEVELOPMENT.md @@ -0,0 +1,202 @@ +# Sippy - Development Guide + +> **Generic Development Practices**: See [Tier 1 Development Practices](https://github.com/openshift/enhancements/tree/master/ai-docs/practices/development) for Go standards, API design, and CI/CD workflows. + +> **Detailed Setup**: See [../DEVELOPMENT.md](../DEVELOPMENT.md) for comprehensive instructions. + +This guide covers **Sippy-specific** development workflows for AI agents. + +## Quick Start + +### Prerequisites + +```text +Tool | Version | Purpose +----------------|---------|------------------------------------------ +Go | 1.23+ | Backend development +Node.js | 20+ | Frontend development (sippy-ng) +PostgreSQL | 16+ | Data storage +Redis | 7+ | Query caching +GCP credentials | - | BigQuery access (optional for dev) +``` + +### Devcontainer (Recommended) + +Use `.devcontainer/` for pre-configured environment with all tools.
See [../.devcontainer/README.md](../.devcontainer/README.md). + +**MCP Tools**: Use `/sippy-dev-*` skills for common tasks (see [../mcp/README.md](../mcp/README.md)). + +## Repository Structure + +```text +sippy/ +├── cmd/sippy/ # Main CLI (server + data loader + migrations) +├── pkg/ +│ ├── api/ # HTTP API handlers (REST endpoints) +│ ├── apis/ # Data structures (API types) +│ ├── sippyserver/ # HTTP server setup +│ ├── dataloader/ # BigQuery → PostgreSQL ETL +│ ├── db/ # Database models, queries, migrations +│ ├── variantregistry/ # Variant extraction (NURP+ model) +│ ├── componentreadiness/ # Component health analysis +│ ├── cache/ # Redis caching layer +│ └── ... +├── sippy-ng/ # React frontend (Material-UI) +│ └── src/ +│ ├── component_readiness/ +│ ├── releases/ +│ ├── jobs/ +│ └── ... +├── test/e2e/ # API end-to-end tests +└── scripts/ # Deployment and utility scripts +``` + +## Build Commands + +| Command | Output | Purpose | +|---------|--------|---------| +| `make` | `./sippy` | All-in-one binary (backend + embedded frontend) | +| `make sippy` | `./sippy` | Backend only | +| `make frontend` | `sippy-ng/build/` | Frontend build (production) | +| `make test` | - | Unit tests (Go + Jest) | +| `make lint` | - | Linters (golangci-lint + eslint) | +| `make e2e` | - | E2E API tests (⚠️ expensive BigQuery queries) | + +**Important**: Never run `make e2e` more than once per request. See [SIPPY_TESTING.md](SIPPY_TESTING.md#e2e-tests). 
+ +## Development Workflows + +### Backend Development + +**Run server**: +```bash +/sippy-dev-serve # MCP skill (recommended) +# or +./sippy serve --database-dsn=$SIPPY_DATABASE_DSN +``` + +**Database migrations**: +```bash +/sippy-dev-migrate # MCP skill (recommended) +# or +./sippy migrate --database-dsn=$SIPPY_DATABASE_DSN +``` + +**Load data from BigQuery**: +```bash +/sippy-dev-regression-cache # MCP skill (recommended) +# or +./sippy load --release=4.16 --config=./config/openshift.yaml +``` + +### Frontend Development + +**Run dev server**: +```bash +/sippy-dev-frontend # MCP skill (recommended) +# or +cd sippy-ng && npm start +``` + +**Access**: `http://localhost:3000` (proxies API to backend) + +### Full Stack Development + +**Run both backend + frontend**: +```bash +/sippy-dev-app # MCP skill (recommended) +``` + +Backend: `http://localhost:8080` +Frontend: `http://localhost:3000` + +## Database + +**Schema**: Managed via migrations in `pkg/db/migrations/` + +**Key tables**: +- `prow_job_run_tests` - Test results per job run +- `prow_job_run_test_outputs` - Test output (e.g., failure messages) +- Component readiness views (e.g., `component_readiness_4_16`) + +**Migrations**: See [../CLAUDE.md](../CLAUDE.md#database-migration) + +**Default DSN**: `postgresql://postgres:password@localhost:5432/postgres` + +## Common Tasks + +### Add New API Endpoint + +1. Define handler in `pkg/api/[feature].go` +2. Register route in `pkg/sippyserver/server.go` +3. Add types to `pkg/apis/api/types.go` +4. Add E2E test in `test/e2e/` +5. Update API docs in `pkg/api/README.md` + +### Add New Variant + +1. Define pattern in `pkg/variantregistry/registry.go` +2. Update variant snapshot: `make update-variants` +3. Add frontend filter in `sippy-ng/src/components/VariantSelector.js` +4.
Test variant extraction with existing jobs + +### Generate Component Readiness Views + +**For new release**: +```bash +/sippy-generate-release-views # Generates views for new release +``` + +**When release goes GA**: +```bash +/sippy-update-ga-release-views # Updates GA status +``` + +See [domain/component-readiness.md](domain/component-readiness.md) for details. + +### Update Job Variant + +**Interactive update**: +```bash +/sippy-update-job-variant # MCP skill +``` + +## Testing + +See [SIPPY_TESTING.md](SIPPY_TESTING.md) for comprehensive testing guide. + +**Quick commands**: +```bash +make test # Unit tests (Go + Jest) +make lint # Linters +make e2e # E2E tests (⚠️ run once only) +/sippy-dev-tests # MCP skill (runs lint + unit + e2e) +``` + +## Debugging + +**Backend logs**: Server outputs to stdout + +**Frontend logs**: Browser console (React DevTools) + +**Database queries**: Set `SIPPY_LOG_LEVEL=debug` + +**Redis cache**: Use `redis-cli` to inspect cached keys + +## Component-Specific Notes + +**BigQuery credentials**: Optional for local dev. Use prod backup instead (see [../DEVELOPMENT.md](../DEVELOPMENT.md#from-a-prod-sippy-backup)). + +**Variant snapshot**: Auto-generated `pkg/variantregistry/snapshot.yaml` must be kept in sync. Run `make update-variants` after variant logic changes. + +**Component readiness views**: Generated per-release. Must create views before querying (use `/sippy-generate-release-views`). + +**E2E tests**: Query live BigQuery. Expensive. Never run more than once. See [SIPPY_TESTING.md](SIPPY_TESTING.md#e2e-tests). 
+ +## See Also + +- [SIPPY_TESTING.md](SIPPY_TESTING.md) - Test suites and patterns +- [architecture/components.md](architecture/components.md) - Sippy internals +- [../DEVELOPMENT.md](../DEVELOPMENT.md) - Detailed setup guide +- [../pkg/api/README.md](../pkg/api/README.md) - API documentation +- [Tier 1 Development Practices](https://github.com/openshift/enhancements/tree/master/ai-docs/practices/development) diff --git a/ai-docs/SIPPY_TESTING.md b/ai-docs/SIPPY_TESTING.md new file mode 100644 index 000000000..7c23ad3b7 --- /dev/null +++ b/ai-docs/SIPPY_TESTING.md @@ -0,0 +1,237 @@ +# Sippy - Testing Guide + +> **Generic Testing Practices**: See [Tier 1 Testing Practices](https://github.com/openshift/enhancements/tree/master/ai-docs/practices/testing) for test pyramid philosophy (60/30/10), E2E framework patterns, and mock strategies. + +This guide covers **Sippy-specific** test organization and execution. + +## Test Organization + +```text +Test Type | Location | Count | Run Time | Purpose +-------------|-------------------------|-------|----------|---------------------------------- +Unit (Go) | pkg/**/*_test.go | ~150 | ~30s | Package-level logic +Unit (Jest) | sippy-ng/src/**/*.test* | ~50 | ~10s | React component tests +E2E | test/e2e/ | ~20 | ~5min | API endpoint validation (BigQuery) +``` + +**Pyramid ratio**: By test count, roughly 90% unit (Go + Jest) and 10% E2E. There is no dedicated integration suite, so Sippy departs from the Tier 1 60/30/10 split. + +## Unit Tests (Go) + +### Location + +Package tests: `pkg/[package]/*_test.go` + +### Running + +```bash +make test-unit # All Go tests +go test ./pkg/... # All packages +go test ./pkg/api/... # Specific package +go test -v -run TestName ./pkg/api/ # Specific test +``` + +### Coverage + +```bash +go test -coverprofile=coverage.out ./pkg/...
+go tool cover -html=coverage.out +``` + +**Current coverage**: ~65% (goal: 70%) + +### Sippy-Specific Patterns + +**Variant extraction tests**: `pkg/variantregistry/registry_test.go` +- Test regex patterns against real job names +- Validate variant values + +**Database tests**: `pkg/db/*_test.go` +- Use in-memory SQLite for speed +- Avoid real PostgreSQL for unit tests + +**API handler tests**: `pkg/api/*_test.go` +- Mock database layer +- Test HTTP responses (status codes, JSON) + +## Unit Tests (Jest) + +### Location + +React component tests: `sippy-ng/src/**/*.test.js` + +### Running + +```bash +cd sippy-ng +npm test # All Jest tests +npm test -- --coverage # With coverage +``` + +### Sippy-Specific Patterns + +**Component rendering**: Snapshot tests for UI components + +**Data fetching**: Mock API responses using MSW (Mock Service Worker) + +**Navigation**: Test React Router navigation logic + +## E2E Tests + +### ⚠️ CRITICAL WARNING + +**NEVER run `make e2e` or `go test ./test/e2e/...` more than ONCE per request.** + +**Why**: E2E tests query live BigQuery. Expensive. Slow (5+ minutes). Multiple runs waste quota. + +**Process**: +1. Run **once**: `make e2e 2>&1 | tee e2e-test.log` +2. Read results: `less e2e-test.log` +3. Grep logs: `grep "FAIL" e2e-test.log` + +**DO NOT re-run to grep for different strings. All output is in the log file.** + +### Location + +E2E tests: `test/e2e/` + +### What E2E Tests Do + +**Purpose**: Validate API endpoints against real BigQuery data + +**Scope**: +- API response structure (JSON schema) +- Data integrity (pass rates, counts) +- Cross-endpoint consistency (jobs → tests → component readiness) + +**NOT tested**: +- BigQuery query performance (assume BigQuery works) +- Data loader logic (unit tests cover this) + +### Running E2E Tests + +```bash +make e2e # Run all E2E tests (⚠️ once only) +go test -v ./test/e2e/... 
# Same as make e2e +/sippy-dev-tests # MCP skill (lint + unit + e2e) +``` + +**Output**: `e2e-test.log` (persisted) + +**Coverage**: `e2e-coverage.out` → `e2e-coverage.html` + +### Sippy-Specific E2E Scenarios + +| Test | Endpoint | Validates | +|------|----------|-----------| +| **Jobs API** | `/api/jobs?release=4.16` | Job list, filtering, variants | +| **Tests API** | `/api/tests?release=4.16` | Test results, pass rates | +| **Component Readiness** | `/api/componentreadiness?release=4.16` | Aggregations, regressions | +| **Variants** | `/api/variants` | Variant extraction accuracy | + +## Test Data + +### Local Development + +**Recommended**: Use prod backup (see [../DEVELOPMENT.md](../DEVELOPMENT.md#from-a-prod-sippy-backup)) + +**Advantages**: +- Real data (no mocking) +- Fast setup (no BigQuery credentials) +- Offline development + +### BigQuery (Production) + +**When needed**: +- E2E tests +- Data loader development +- Variant snapshot updates + +**Credentials**: `GOOGLE_APPLICATION_CREDENTIALS` environment variable + +## Debugging Test Failures + +### Unit Test Failures + +**Go tests**: +```bash +go test -v -run TestName ./pkg/api/ # Verbose output +go test -race ./pkg/... 
# Race detector +``` + +**Jest tests**: +```bash +cd sippy-ng +npm test -- --verbose # Verbose output +npm test -- --no-cache # Clear cache +``` + +### E2E Test Failures + +**Read log file**: +```bash +less e2e-test.log +grep "FAIL" e2e-test.log +grep "panic" e2e-test.log +``` + +**Common issues**: +- BigQuery credentials missing: Check `GOOGLE_APPLICATION_CREDENTIALS` +- Network timeout: BigQuery query too slow (check quota) +- Data mismatch: Release data changed (expected in live data) + +## Coverage Targets + +| Test Type | Current | Goal | +|-----------|---------|------| +| Go unit | ~65% | 70% | +| Jest | ~50% | 60% | +| E2E | N/A | API endpoints covered | + +**Coverage commands**: +```bash +# Go coverage +make test-coverage +go tool cover -html=coverage.out + +# Jest coverage +cd sippy-ng && npm test -- --coverage + +# E2E coverage +make e2e # Generates e2e-coverage.html +``` + +## Test Maintenance + +### When to Update Tests + +**Variant changes**: Update `pkg/variantregistry/registry_test.go` + +**API changes**: Update E2E tests in `test/e2e/` + +**Database schema changes**: Update DB tests in `pkg/db/` + +**Frontend changes**: Update Jest tests in `sippy-ng/src/` + +### Known Flaky Tests + +**None currently documented.** + +If you encounter flaky tests, document them here with reproduction steps. + +## Component-Specific Notes + +**E2E tests are expensive**: Never run more than once. See [warning above](#-critical-warning). + +**Variant snapshot tests**: `TestVariantsSnapshot` fails if `pkg/variantregistry/snapshot.yaml` is out of date. Run `make update-variants` to fix. + +**Database tests use SQLite**: Unit tests don't require PostgreSQL running. + +**Jest tests use MSW**: API mocking via Mock Service Worker (see `sippy-ng/src/setupTests.js`). 
+ +## See Also + +- [SIPPY_DEVELOPMENT.md](SIPPY_DEVELOPMENT.md) - Development workflows +- [architecture/components.md](architecture/components.md) - Sippy internals +- [../CLAUDE.md](../CLAUDE.md#testing) - Claude-specific testing rules +- [Tier 1 Testing Practices](https://github.com/openshift/enhancements/tree/master/ai-docs/practices/testing) diff --git a/ai-docs/architecture/components.md b/ai-docs/architecture/components.md new file mode 100644 index 000000000..b00f2ba93 --- /dev/null +++ b/ai-docs/architecture/components.md @@ -0,0 +1,188 @@ +# Sippy Architecture + +## System Overview + +Sippy is a three-tier application: + +```text +┌─────────────────┐ +│ Frontend │ React + Material-UI (sippy-ng) +│ (sippy-ng) │ +└────────┬────────┘ + │ HTTP/REST +┌────────▼────────┐ +│ Backend │ Go HTTP API (sippyserver) +│ (sippyserver) │ +└────────┬────────┘ + │ + ┌────┴────┬────────────┐ + │ │ │ +┌───▼──┐ ┌───▼──┐ ┌───────▼─────┐ +│ BQ │ │ PG │ │ Redis │ +│ │ │ │ │ (cache) │ +└──────┘ └──────┘ └─────────────┘ +``` + +## Repository Structure + +```text +sippy/ +├── cmd/ +│ └── sippy/ # Main entry point (HTTP server + CLI) +├── pkg/ +│ ├── api/ # HTTP API handlers +│ ├── apis/ # Data structures (types) +│ ├── sippyserver/ # HTTP server setup +│ ├── dataloader/ # BigQuery → PostgreSQL ETL +│ ├── db/ # Database models and queries +│ ├── variantregistry/ # Variant extraction logic +│ ├── componentreadiness/ # Component analysis +│ ├── cache/ # Redis caching +│ └── ... +├── sippy-ng/ # React frontend +│ └── src/ +│ ├── component_readiness/ +│ ├── releases/ +│ ├── jobs/ +│ └── ... 
+├── test/ +│ └── e2e/ # End-to-end API tests +└── scripts/ # Deployment and utility scripts +``` + +## Backend Components + +### HTTP API Server (pkg/sippyserver) + +**Responsibility**: Serve REST API for frontend + +**Key files**: +- `pkg/sippyserver/server.go` - HTTP server setup +- `pkg/api/` - API endpoint handlers + +**Endpoints**: See `pkg/api/README.md` + +### Data Loader (pkg/dataloader) + +**Responsibility**: Import BigQuery data into PostgreSQL + +**Process**: +1. Query BigQuery for recent job results +2. Parse and normalize test names +3. Extract variants from job names +4. Insert into PostgreSQL tables + +**Key files**: +- `pkg/dataloader/loader.go` - Main loader logic +- `pkg/dataloader/bigquery.go` - BigQuery client + +**Execution**: Run via `/sippy-dev-regression-cache` skill or `sippy load` + +### Variant Registry (pkg/variantregistry) + +**Responsibility**: Define and extract variants from job names + +**Pattern matching**: +```go +type VariantDefinition struct { + Name string + Pattern *regexp.Regexp + Values []string +} +``` + +**Key files**: `pkg/variantregistry/registry.go` + +### Component Readiness (pkg/componentreadiness) + +**Responsibility**: Calculate component health statistics + +**Key operations**: +- Aggregate test results by component +- Calculate pass rates +- Detect regressions +- Generate component readiness views + +**Key files**: +- `pkg/componentreadiness/calculator.go` +- `pkg/componentreadiness/component_mapping.go` + +### Database Layer (pkg/db) + +**Responsibility**: PostgreSQL schema and queries + +**Tables**: +- `prow_job_run_tests` - Test results per job run +- `prow_job_run_test_outputs` - Test output (e.g., failure messages) +- Component readiness views (e.g., `component_readiness_4_16`) + +**Migrations**: See [SIPPY_DEVELOPMENT.md](../SIPPY_DEVELOPMENT.md#database) + +### Cache Layer (pkg/cache) + +**Responsibility**: Redis caching for expensive queries + +**Cached data**: +- Component readiness results +- Job statistics +- Test pass
rates + +**TTL**: Configurable per query type + +## Frontend Components + +### React Application (sippy-ng) + +**Technology**: React 18 + Material-UI + React Router + +**Key modules**: +- `component_readiness/` - Component health dashboard +- `releases/` - Release overview pages +- `jobs/` - Job result browser +- `datagrid/` - Reusable table components + +**Build**: `make frontend` (see [SIPPY_DEVELOPMENT.md](../SIPPY_DEVELOPMENT.md)) + +## Data Flow + +### Read Path (API Query) + +```text +1. User requests /api/componentreadiness?release=4.16 +2. API handler checks Redis cache +3. Cache miss → Query PostgreSQL view +4. Return JSON response +5. Cache result in Redis (TTL: 5 min) +``` + +### Write Path (Data Loader) + +```text +1. Cron job triggers data loader +2. Query BigQuery for new job results +3. Extract variants from job names +4. Normalize test names +5. Insert into PostgreSQL +6. Invalidate relevant Redis cache entries +7. Update component readiness views (if needed) +``` + +## Deployment + +**Environments**: +- Production: OpenShift cluster +- Development: Local (see [SIPPY_DEVELOPMENT.md](../SIPPY_DEVELOPMENT.md)) + +**Skills**: +- `/sippy-dev-app` - Start backend + frontend dev servers +- `/sippy-dev-serve` - Backend only +- `/sippy-dev-frontend` - Frontend only +- `/sippy-dev-migrate` - Run database migrations +- `/sippy-dev-regression-cache` - Load BigQuery data + +## Related Documentation + +- [Domain concepts](../domain/) - CI analysis domain model +- [SIPPY_DEVELOPMENT.md](../SIPPY_DEVELOPMENT.md) - Development workflows +- [SIPPY_TESTING.md](../SIPPY_TESTING.md) - Test suites +- [API documentation](../../pkg/api/README.md) - API endpoints diff --git a/ai-docs/decisions/adr-0001-bigquery-data-source.md b/ai-docs/decisions/adr-0001-bigquery-data-source.md new file mode 100644 index 000000000..a647aec74 --- /dev/null +++ b/ai-docs/decisions/adr-0001-bigquery-data-source.md @@ -0,0 +1,88 @@ +# ADR-0001: BigQuery as Primary Data Source + +**Status**: 
Accepted +**Date**: 2021-05-15 +**Component**: Sippy + +## Context + +Sippy needs to analyze OpenShift CI job results from Prow. Job results are stored in multiple formats and locations (TestGrid, GCS, BigQuery). + +**Requirements**: +- Query millions of test results across thousands of jobs +- Historical analysis (weeks to months of data) +- Complex filtering (by job, test, variant, time range) +- Performance for dashboard queries + +**Scope**: This ADR is component-specific. For general data architecture patterns, see [Tier 1 ADRs](https://github.com/openshift/enhancements/tree/master/ai-docs/decisions). + +## Decision + +Use Google BigQuery as the primary data source for Prow CI results, with PostgreSQL as secondary storage for aggregated data and caching. + +**Architecture**: +1. Prow uploads results to BigQuery (prow.jobs table) +2. Sippy dataloader queries BigQuery periodically +3. Results stored in PostgreSQL for fast dashboard queries +4. Redis cache for expensive aggregations + +## Rationale + +**BigQuery advantages**: +- Already populated by Prow (no new data pipeline) +- Excellent performance for large dataset queries +- SQL interface (familiar to developers) +- Handles schema evolution (new columns added over time) + +**PostgreSQL rationale**: +- Faster for dashboard queries (indexed, materialized views) +- Component readiness views (pre-aggregated statistics) +- Offline operation (not dependent on BigQuery availability) + +## Consequences + +### Positive +- Leverage existing Prow data infrastructure +- Fast queries for large historical datasets +- No need to build custom data ingestion pipeline +- SQL-based analysis (accessible to non-developers) + +### Negative +- BigQuery costs (queries are not free) +- Data freshness lag (dataloader runs periodically, not real-time) +- Dependency on Google Cloud (vendor lock-in) +- Dual storage complexity (BigQuery + PostgreSQL + Redis) + +### Neutral +- Need dataloader process to sync BigQuery → PostgreSQL +- Need 
cache invalidation strategy
+
+## Alternatives Considered
+
+### Alternative 1: TestGrid as Data Source
+**Description**: Parse TestGrid HTML pages
+**Rejected because**:
+- No structured API (HTML scraping fragile)
+- Limited historical data access
+- Poor performance for complex queries
+
+### Alternative 2: Direct GCS Access
+**Description**: Parse junit XML files directly from GCS
+**Rejected because**:
+- Massive number of files (millions)
+- No indexing (slow queries)
+- Need custom parser for junit XML
+- Already done by Prow → BigQuery pipeline
+
+### Alternative 3: PostgreSQL Only (no BigQuery)
+**Description**: Build custom ingestion from Prow
+**Rejected because**:
+- Duplicate effort (Prow already sends to BigQuery)
+- Need to handle schema evolution ourselves
+- More infrastructure to maintain
+
+## References
+
+- BigQuery schema: `prow.jobs` table documentation
+- Data loader implementation: `pkg/dataloader/`
+- Prow BigQuery documentation: https://github.com/kubernetes/test-infra/tree/master/prow/bigquery
diff --git a/ai-docs/decisions/adr-0002-component-readiness-views.md b/ai-docs/decisions/adr-0002-component-readiness-views.md
new file mode 100644
index 000000000..57f1f79d4
--- /dev/null
+++ b/ai-docs/decisions/adr-0002-component-readiness-views.md
@@ -0,0 +1,96 @@
+# ADR-0002: PostgreSQL Views for Component Readiness
+
+**Status**: Accepted
+**Date**: 2022-03-10
+**Component**: Sippy
+
+## Context
+
+Component Readiness queries are expensive: they aggregate millions of test results, group by component, calculate pass rates, and detect regressions. Running these calculations on demand is too slow for a responsive dashboard.
+
+**Requirements**:
+- Component readiness queries must be < 500ms (dashboard UX)
+- Support filtering by release, variant, time range
+- Update daily (not real-time)
+- Support historical comparison (current vs baseline)
+
+**Scope**: This ADR is component-specific. For general database patterns, see [Tier 1 ADRs](https://github.com/openshift/enhancements/tree/master/ai-docs/decisions).
+
+## Decision
+
+Use PostgreSQL **materialized views** for component readiness, refreshed daily by the dataloader.
+
+**Implementation**:
+- One view per release: `component_readiness_4_16`, `component_readiness_4_17`
+- Views pre-aggregate: component, pass_rate, test_count, regression_status
+- Indexed by component, variant for fast filtering
+- Refreshed after dataloader runs (REFRESH MATERIALIZED VIEW)
+
+## Rationale
+
+**Materialized views advantages**:
+- Pre-computed aggregations (fast queries)
+- Standard PostgreSQL feature (no custom caching logic)
+- Easy to add/remove releases (create/drop view)
+- SQL-based (can inspect with standard tools)
+
+**Per-release views rationale**:
+- Releases have different component mappings
+- Easier to manage lifecycle (drop old releases)
+- Simpler queries (no release filtering in WHERE clause)
+
+## Consequences
+
+### Positive
+- Dashboard queries < 100ms (vs 10+ seconds raw)
+- Reliable performance (pre-computed, indexed)
+- Simple query logic in API handlers
+- Easy to debug (standard SQL views)
+
+### Negative
+- Data freshness lag (views refreshed daily, not real-time)
+- Disk usage (materialized views consume storage)
+- View management overhead (create views for new releases)
+- Schema changes require view recreation
+
+### Neutral
+- Need view generation skill (`/sippy-generate-release-views`)
+- Need view refresh after dataloader runs
+- Need view cleanup for old releases
+
+## Alternatives Considered
+
+### Alternative 1: On-Demand Calculation
+**Description**: Calculate component readiness on every API call
+**Rejected because**:
+- Too slow (10+ second queries unacceptable for dashboard)
+- High database load (expensive aggregations on every request)
+- Caching results to hide the latency would add cache invalidation complexity
+
+### Alternative 2: Redis Cache Only
+**Description**: Cache aggregated results in Redis
+**Rejected because**:
+- Cache warm-up complexity
+- Cache invalidation logic needed
+- Lost on Redis restart (need persistent storage anyway)
+- Harder to debug than SQL views
+
+### Alternative 3: Separate Component Readiness Table
+**Description**: Dedicated table with daily updates
+**Rejected because**:
+- More complex data loading logic
+- Need custom aggregation code (vs SQL views)
+- Harder to keep in sync with test results
+
+### Alternative 4: Regular Views (not materialized)
+**Description**: Use standard PostgreSQL views
+**Rejected because**:
+- Still slow (re-compute on every query)
+- No benefit over raw queries
+
+## References
+
+- View generation skill: `/sippy-generate-release-views`
+- View update skill: `/sippy-update-ga-release-views`
+- View implementation: `pkg/db/componentreadiness/views.go`
+- API usage: `pkg/api/componentreadiness.go`
diff --git a/ai-docs/decisions/adr-0003-variant-extraction.md b/ai-docs/decisions/adr-0003-variant-extraction.md
new file mode 100644
index 000000000..b27aabd4d
--- /dev/null
+++ b/ai-docs/decisions/adr-0003-variant-extraction.md
@@ -0,0 +1,118 @@
+# ADR-0003: Variant Extraction from Job Names
+
+**Status**: Accepted
+**Date**: 2021-08-20
+**Component**: Sippy
+
+## Context
+
+Sippy needs to slice CI results by variants (platform, network, upgrade type, etc.) to identify variant-specific issues. Prow job names encode variant information, but there is no standardized structure.
+
+**Example job name**: `periodic-ci-openshift-release-master-nightly-4.16-e2e-aws-ovn-upgrade`
+
+**Requirements**:
+- Extract variants: platform (aws), network (ovn), upgrade (upgrade), release (4.16)
+- Support new variants without schema changes
+- Handle variant combinations (e.g., aws + ovn + upgrade)
+- Fast variant filtering in queries
+
+**Scope**: This ADR is component-specific. For general data modeling patterns, see [Tier 1 ADRs](https://github.com/openshift/enhancements/tree/master/ai-docs/decisions).
+
+## Decision
+
+Extract variants from job names using **regex patterns** defined in a variant registry, and store them as JSONB in PostgreSQL.
+
+**Implementation**:
+```go
+// pkg/variantregistry/registry.go
+type VariantDefinition struct {
+	Name    string         // "Platform"
+	Pattern *regexp.Regexp // e.g., `(aws|gcp|azure|metal)`
+	Values  []string       // Valid values
+}
+```
+
+**Storage**:
+```sql
+CREATE TABLE prow_job_run_tests (
+    ...
+    variants JSONB,  -- {"network": "ovn", "platform": "aws"}
+    ...
+);
+CREATE INDEX idx_variants ON prow_job_run_tests USING gin(variants);
+```
+
+## Rationale
+
+**Regex extraction advantages**:
+- Flexible (adapt to job naming changes)
+- Centralized (variant definitions in one place)
+- Easy to add new variants (just add pattern)
+
+**JSONB storage advantages**:
+- Schemaless (don't need columns for each variant)
+- GIN index for fast filtering
+- Handles variant combinations naturally
+- Easy to query: `WHERE variants @> '{"platform": "aws"}'`
+
+**Variant registry rationale**:
+- Single source of truth for variant definitions
+- Code-based (version controlled, testable)
+- Type-safe (Go structs)
+
+## Consequences
+
+### Positive
+- Easy to add new variants (update registry, no schema migration)
+- Fast filtering with GIN index
+- Natural handling of variant combinations
+- Future-proof (schemaless storage)
+
+### Negative
+- Regex fragility (job name changes can break extraction)
+- No enforcement of job naming conventions
+- JSONB queries more complex than column queries
+- Need to validate regex patterns carefully
+
+### Neutral
+- Variant definitions live in code (not configuration)
+- Need variant validation tests
+- GIN index maintenance overhead
+
+## Alternatives Considered
+
+### Alternative 1: Separate Columns per Variant
+**Description**: `platform TEXT, network TEXT, upgrade TEXT, ...`
+**Rejected because**:
+- Schema change for every new variant
+- Sparse columns (many NULLs)
+- Harder to handle variant combinations
+- Limited to pre-defined variants
+
+### Alternative 2: Normalized Variant Tables
+**Description**: `job_variants(job_id, variant_type, variant_value)`
+**Rejected because**:
+- More complex queries (joins for filtering)
+- Slower performance (join overhead)
+- More storage overhead (multiple rows per job)
+
+### Alternative 3: Parse from Prow Metadata
+**Description**: Extract variants from Prow job config
+**Rejected because**:
+- Not all variants in config (some in job name only)
+- Need access to Prow config repo
+- Config changes over time (historical data issues)
+
+### Alternative 4: Manual Variant Tagging
+**Description**: Manually tag jobs with variants
+**Rejected because**:
+- Not scalable (thousands of jobs)
+- Prone to errors
+- Lag in tagging new jobs
+
+## References
+
+- Variant registry implementation: `pkg/variantregistry/`
+- API filtering: `pkg/api/filter.go`
+- Frontend variant selector: `sippy-ng/src/components/VariantSelector.js`
+- Job naming conventions: https://github.com/openshift/release (unofficial)
diff --git a/ai-docs/decisions/adr-template.md b/ai-docs/decisions/adr-template.md
new file mode 100644
index 000000000..1c53360c3
--- /dev/null
+++ b/ai-docs/decisions/adr-template.md
@@ -0,0 +1,38 @@
+# ADR-NNNN: Title
+
+**Status**: Proposed | Accepted | Deprecated | Superseded
+**Date**: YYYY-MM-DD
+**Deciders**: [List of people involved]
+
+## Context
+
+What is the issue we're facing? What constraints exist?
+
+## Decision
+
+What did we decide to do?
+
+## Consequences
+
+**Positive**:
+- Benefit 1
+- Benefit 2
+
+**Negative**:
+- Drawback 1
+- Drawback 2
+
+**Neutral**:
+- Impact 1
+
+## Alternatives Considered
+
+**Alternative 1**: Description
+- Pros: ...
+- Cons: ...
+- Reason for rejection: ...
+
+## References
+
+- Related ADRs: ADR-XXXX
+- External docs: [Link](url)
diff --git a/ai-docs/decisions/index.md b/ai-docs/decisions/index.md
new file mode 100644
index 000000000..b0087b4f4
--- /dev/null
+++ b/ai-docs/decisions/index.md
@@ -0,0 +1,32 @@
+# Architectural Decision Records (ADRs)
+
+Component-specific decisions for Sippy. For cross-repo decisions, see [Tier 1 ADRs](https://github.com/openshift/enhancements/tree/master/ai-docs/decisions).
+
+## Active ADRs
+
+- [adr-0001-bigquery-data-source.md](adr-0001-bigquery-data-source.md) - Use BigQuery as primary data source
+- [adr-0002-component-readiness-views.md](adr-0002-component-readiness-views.md) - PostgreSQL materialized views for component readiness
+- [adr-0003-variant-extraction.md](adr-0003-variant-extraction.md) - Extract variants from job names using regex
+
+## ADR Template
+
+Use [adr-template.md](adr-template.md) when creating new ADRs.
+
+## When to Create an ADR
+
+**Create ADR for**:
+- Sippy-specific architectural decisions
+- Data modeling choices
+- Technology selections for Sippy
+
+**Do NOT create ADR for**:
+- Cross-repo decisions (use Tier 1 instead)
+- Trivial implementation details
+- Temporary workarounds
+
+## ADR Lifecycle
+
+1. **Proposed**: Draft ADR, under discussion
+2. **Accepted**: Decision made, implemented
+3. **Deprecated**: Still in use, but discouraged
+4. **Superseded**: Replaced by newer ADR
diff --git a/ai-docs/domain/component-readiness.md b/ai-docs/domain/component-readiness.md
new file mode 100644
index 000000000..1876a5cdf
--- /dev/null
+++ b/ai-docs/domain/component-readiness.md
@@ -0,0 +1,98 @@
+# Domain Concept: Component Readiness
+
+**Type**: Statistical Analysis
+**Data Source**: Aggregated test results
+**Primary API**: `/api/componentreadiness`
+
+## Purpose
+
+Component Readiness provides a statistical assessment of OpenShift component health based on CI test results. Release managers use it to make go/no-go decisions.
+
+## Key Metrics
+
+| Metric | Description | Formula |
+|--------|-------------|---------|
+| **Pass Rate** | Fraction of test runs that passed | `passed / (passed + failed)` |
+| **Regression** | Significant drop in pass rate | `current - baseline < -threshold` |
+| **Risk Level** | Overall component health | `low`, `medium`, `high`, `extreme` |
+| **Test Coverage** | Number of tests for component | Count of tests |
+
+## Component Classification
+
+Components are identified by:
+1. **Test name patterns** (e.g., `[sig-network]` → Networking component)
+2. **Jira components** (mapped from bug reports)
+3. **Manual configuration** (component mappings)
+
+**Mapping**: `pkg/componentreadiness/component_mapping.go`
+
+## Regression Detection
+
+**Baseline**: Historical pass rate (e.g., last 7 days)
+**Current**: Recent pass rate (e.g., last 24 hours)
+**Threshold**: Configurable per release (default: 5 percentage points)
+
+```go
+if currentPassRate < (baselinePassRate - threshold) {
+	flagAsRegression()
+}
+```
+
+## Risk Assessment
+
+| Risk Level | Pass Rate | Action |
+|------------|-----------|--------|
+| **Low** | > 95% | Ship |
+| **Medium** | 90-95% | Monitor |
+| **High** | 80-90% | Investigate |
+| **Extreme** | < 80% | Block release |
+
+## Database Views
+
+Component readiness uses PostgreSQL materialized views (see ADR-0002) for efficient querying:
+
+**Example view**: `component_readiness_4_16`
+
+```sql
+CREATE MATERIALIZED VIEW component_readiness_4_16 AS
+SELECT
+    component,
+    COUNT(*) FILTER (WHERE status='Passed') AS passed,
+    COUNT(*) FILTER (WHERE status='Failed') AS failed,
+    COUNT(*) FILTER (WHERE status='Passed')::float / NULLIF(COUNT(*) FILTER (WHERE status IN ('Passed', 'Failed')), 0) AS pass_rate
+FROM tests
+WHERE release='4.16'
+GROUP BY component;
+```
+
+**View generation**: See `/sippy-generate-release-views` skill
+
+## API Usage
+
+**Get component readiness**: `/api/componentreadiness?release=4.16`
+
+**Filter by variant**: `/api/componentreadiness?release=4.16&variant=network:ovn`
+
+**Component details**: `/api/componentreadiness/Networking?release=4.16`
+
+## Frontend Display
+
+**Location**: `sippy-ng/src/component_readiness/`
+
+**Visualization**:
+- Table view: Components with pass rates
+- Trend charts: Historical pass rate
+- Regression alerts: Highlighted in red
+
+## Related Concepts
+
+- [Test](test.md) - Individual test results aggregated into component readiness
+- [Release](release.md) - Component readiness tracked per release
+- [Variant](variant.md) - Component readiness sliced by variant
+
+## References
+
+- API implementation: `pkg/componentreadiness/`
+- Database views: `pkg/db/componentreadiness/views.go`
+- Frontend: `sippy-ng/src/component_readiness/`
+- View generation skill: `/sippy-generate-release-views`
diff --git a/ai-docs/domain/index.md b/ai-docs/domain/index.md
new file mode 100644
index 000000000..c1ce36a9c
--- /dev/null
+++ b/ai-docs/domain/index.md
@@ -0,0 +1,43 @@
+# Domain Concepts
+
+Sippy's domain model centers on CI analysis concepts, not Kubernetes CRDs.
+
+## Core Concepts
+
+- [job.md](job.md) - Prow CI job execution (top-level unit of analysis)
+- [test.md](test.md) - Individual test case within a job (atomic unit of CI signal)
+- [variant.md](variant.md) - NURP+ dimensions for slicing data (Network, Upgrade, Release, Platform, etc.)
+- [release.md](release.md) - OpenShift version being tracked
+- [component-readiness.md](component-readiness.md) - Statistical assessment of component health
+
+## Concept Relationships
+
+```text
+Release (4.16)
+│
+├── Job (periodic-ci-...-4.16-e2e-aws)
+│   ├── Variants (platform:aws, release:4.16)
+│   └── Tests
+│       ├── Test 1 (passed)
+│       ├── Test 2 (failed)
+│       └── Test 3 (passed)
+│
+└── Component Readiness
+    ├── Networking (95% pass rate)
+    ├── Storage (92% pass rate)
+    └── API (98% pass rate)
+```
+
+## Data Flow
+
+1. **Prow** executes jobs → Results to BigQuery
+2. **Sippy dataloader** imports BigQuery → PostgreSQL
+3. **Variant extraction** from job names → JSONB `variants` column
+4. **Test aggregation** → Component readiness views
+5. **Regression detection** → Alerts
+
+## Navigation
+
+- For architecture details: See [../architecture/](../architecture/)
+- For development workflows: See [../SIPPY_DEVELOPMENT.md](../SIPPY_DEVELOPMENT.md)
+- For API details: See [../../pkg/api/README.md](../../pkg/api/README.md)
diff --git a/ai-docs/domain/job.md b/ai-docs/domain/job.md
new file mode 100644
index 000000000..193b8adbe
--- /dev/null
+++ b/ai-docs/domain/job.md
@@ -0,0 +1,89 @@
+# Domain Concept: Job
+
+**Type**: CI Job Execution
+**Data Source**: BigQuery (`prow.jobs` table)
+**Primary API**: `/api/jobs`
+
+## Purpose
+
+Represents a single execution of a Prow CI job. Jobs are the top-level unit of CI analysis in Sippy.
+
+## Key Properties
+
+| Property | Type | Description |
+|----------|------|-------------|
+| **Name** | string | Prow job name (e.g., `periodic-ci-openshift-release-master-nightly-4.16-e2e-aws`) |
+| **Status** | enum | Success, Failure, Pending, Aborted, Error |
+| **Duration** | duration | Job execution time |
+| **Timestamp** | timestamp | Job start/end time |
+| **Tests** | []Test | Individual test results within the job |
+| **Variants** | map[string]string | Extracted NURP+ variant dimensions |
+
+## Naming Convention
+
+Prow job names follow the pattern: `-ci-----`
+
+**Example**: `periodic-ci-openshift-release-master-nightly-4.16-e2e-aws`
+- Frequency: `periodic`
+- Type: `nightly`
+- Release: `4.16`
+- Platform: `aws`
+- Test suite: `e2e`
+
+## Job Lifecycle
+
+1. **Triggered**: Prow scheduler creates job (periodic/presubmit/postsubmit)
+2. **Running**: Job executes tests in cluster
+3. **Completed**: Results uploaded to BigQuery
+4. **Loaded**: Sippy dataloader imports to PostgreSQL
+5. **Analyzed**: Statistics calculated, regressions detected
+
+## Variant Extraction
+
+Sippy parses job names to extract variant dimensions:
+
+```go
+// pkg/variantregistry
+type Variant struct {
+	Network  string // e.g., "sdn", "ovn"
+	Upgrade  string // e.g., "upgrade", "micro"
+	Platform string // e.g., "aws", "gcp", "azure"
+	...
+}
+```
+
+See [variant.md](variant.md) for details.
+
+## Database Schema
+
+**Table**: `prow_job_run_tests`
+
+```sql
+CREATE TABLE prow_job_run_tests (
+    id SERIAL PRIMARY KEY,
+    job_name TEXT,
+    test_name TEXT,
+    status TEXT,
+    duration INTERVAL,
+    timestamp TIMESTAMPTZ,
+    ...
+);
+```
+
+## Common Queries
+
+**Job pass rate**: `SELECT COUNT(*) FILTER (WHERE status='Success')::float / NULLIF(COUNT(*), 0) FROM jobs WHERE name=?`
+
+**Recent failures**: `SELECT * FROM jobs WHERE status='Failure' ORDER BY timestamp DESC LIMIT 10`
+
+## Related Concepts
+
+- [Test](test.md) - Individual test results within a job
+- [Variant](variant.md) - Extracted dimensions from job name
+- [Release](release.md) - OpenShift version being tested
+
+## References
+
+- API implementation: `pkg/api/jobs.go`
+- Data loader: `pkg/dataloader/jobs.go`
+- Database models: `pkg/db/models/job.go`
diff --git a/ai-docs/domain/release.md b/ai-docs/domain/release.md
new file mode 100644
index 000000000..397010c60
--- /dev/null
+++ b/ai-docs/domain/release.md
@@ -0,0 +1,88 @@
+# Domain Concept: Release
+
+**Type**: OpenShift Version
+**Data Source**: Configuration + BigQuery
+**Primary API**: `/api/releases`
+
+## Purpose
+
+Represents an OpenShift version being tracked by Sippy. Releases are the primary organizational unit for CI analysis.
+
+## Key Properties
+
+| Property | Type | Description |
+|----------|------|-------------|
+| **Name** | string | Version (e.g., `4.16`, `4.17`) |
+| **Status** | enum | `Active`, `GA`, `Prerelease`, `Archived` |
+| **GA Date** | timestamp | General availability date |
+| **Stream** | string | `nightly`, `ci`, `stable` |
+
+## Release Lifecycle
+
+1. **Prerelease**: Development phase (e.g., `4.17.0-0.nightly`)
+2. **Feature Freeze**: No new features, stabilization
+3. **Code Freeze**: Critical fixes only
+4. **GA**: General availability (e.g., `4.17.0`)
+5. **Stable**: Maintenance (z-stream releases like `4.17.1`)
+6. **Archived**: No longer tracked
+
+## Release Configuration
+
+**File**: `config/releases.yaml` (illustrative; the actual config may differ)
+
+```yaml
+releases:
+  - name: "4.16"
+    ga_date: "2024-06-01"
+    status: "GA"
+    streams: ["nightly", "ci"]
+  - name: "4.17"
+    ga_date: "2025-01-15"
+    status: "Prerelease"
+    streams: ["nightly"]
+```
+
+## Component Readiness Views
+
+Sippy maintains component readiness views for each tracked release:
+
+**Process**:
+1. Run `/sippy-generate-release-views` to create views for a new release
+2. Views track component health for the release
+3. Run `/sippy-update-ga-release-views` once the release goes GA to update its status
+
+See [Component Readiness](component-readiness.md) for details.
+
+## Database Schema
+
+**Table**: `releases`
+
+```sql
+CREATE TABLE releases (
+    id SERIAL PRIMARY KEY,
+    name TEXT UNIQUE,
+    ga_date TIMESTAMPTZ,
+    status TEXT,
+    ...
+); +``` + +## Common Queries + +**Active releases**: `SELECT * FROM releases WHERE status IN ('Active', 'GA')` + +**Jobs for release**: `SELECT * FROM jobs WHERE release='4.16'` + +**Component readiness for release**: `SELECT * FROM component_readiness WHERE release='4.16'` + +## Related Concepts + +- [Variant](variant.md) - Release is a key variant dimension +- [Component Readiness](component-readiness.md) - Per-release component health tracking +- [Job](job.md) - Jobs are associated with releases + +## References + +- Release configuration: `config/` +- API implementation: `pkg/api/releases.go` +- Database models: `pkg/db/models/release.go` diff --git a/ai-docs/domain/test.md b/ai-docs/domain/test.md new file mode 100644 index 000000000..b3f9aca68 --- /dev/null +++ b/ai-docs/domain/test.md @@ -0,0 +1,99 @@ +# Domain Concept: Test + +**Type**: Test Case Execution +**Data Source**: BigQuery (extracted from junit XMLs) +**Primary API**: `/api/tests` + +## Purpose + +Represents a single test case execution within a job. Tests are the atomic unit of CI signal analysis. + +## Key Properties + +| Property | Type | Description | +|----------|------|-------------| +| **Name** | string | Test identifier (e.g., `[sig-network] should allow traffic`) | +| **Status** | enum | Passed, Failed, Skipped, Flake | +| **Duration** | duration | Test execution time | +| **Job** | Job | Parent job containing this test | +| **FailureMessage** | string | Error message if failed | + +## Test Identification + +Tests are identified by normalized name across jobs: + +```go +// pkg/testidentification +type TestIdentifier struct { + Name string // Normalized name + Suite string // e.g., "openshift-tests", "kubernetes" + Component string // e.g., "Networking", "Storage" +} +``` + +**Normalization**: Removes timestamps, UUIDs, cluster-specific details to enable cross-job aggregation. + +## Test Lifecycle + +1. **Executed**: Test runs in CI job +2. **Reported**: Result written to junit XML +3. 
**Uploaded**: junit XML uploaded to GCS +4. **Parsed**: BigQuery parses junit XML +5. **Loaded**: Sippy imports to PostgreSQL +6. **Aggregated**: Statistics calculated across jobs/variants + +## Pass Rate Calculation + +**Formula**: `pass_rate = passed_count / (passed_count + failed_count)` + +**Skipped tests**: Excluded from pass rate calculation + +**Flakes**: Tests that sometimes pass, sometimes fail (tracked separately) + +## Regression Detection + +Sippy compares current pass rate vs historical baseline: + +```go +if currentPassRate < (historicalPassRate - threshold) { + // Flag as regression +} +``` + +**Threshold**: Configurable per release (default: 5% drop) + +## Database Schema + +**Table**: `prow_job_run_test_outputs` + +```sql +CREATE TABLE prow_job_run_test_outputs ( + id SERIAL PRIMARY KEY, + prow_job_run_test_id INTEGER REFERENCES prow_job_run_tests(id), + test_name TEXT, + status TEXT, + duration INTERVAL, + failure_message TEXT, + ... +); +``` + +## Common Queries + +**Test pass rate**: `SELECT COUNT(*) FILTER (WHERE status='Passed') / COUNT(*) FROM tests WHERE name=?` + +**Flaky tests**: `SELECT name, COUNT(DISTINCT status) FROM tests GROUP BY name HAVING COUNT(DISTINCT status) > 1` + +**Top failures**: `SELECT name, COUNT(*) FROM tests WHERE status='Failed' GROUP BY name ORDER BY COUNT DESC LIMIT 10` + +## Related Concepts + +- [Job](job.md) - Parent job execution +- [Variant](variant.md) - Test results sliced by variant +- [Component Readiness](component-readiness.md) - Aggregated test statistics + +## References + +- API implementation: `pkg/api/tests.go` +- Test identification: `pkg/testidentification/` +- Database models: `pkg/db/models/test.go` diff --git a/ai-docs/domain/variant.md b/ai-docs/domain/variant.md new file mode 100644 index 000000000..8c7befd9a --- /dev/null +++ b/ai-docs/domain/variant.md @@ -0,0 +1,101 @@ +# Domain Concept: Variant + +**Type**: CI Job Dimension +**Data Source**: Extracted from job names +**Primary API**: 
`/api/variants` + +## Purpose + +Variants are dimensions that characterize how a job is configured. Sippy uses the NURP+ model to slice test results by variants. + +## NURP+ Model + +**NURP** = **N**etwork, **U**pgrade, **R**elease, **P**latform +**Plus (+)**: Architecture, Installer, Topology, FeatureSet, etc. + +| Variant | Examples | Description | +|---------|----------|-------------| +| **Network** | `sdn`, `ovn` | Network plugin | +| **Upgrade** | `upgrade`, `micro` | Upgrade type | +| **Release** | `4.15`, `4.16` | OpenShift version | +| **Platform** | `aws`, `gcp`, `azure`, `metal` | Cloud provider | +| **Architecture** | `amd64`, `arm64`, `s390x` | CPU architecture | +| **Installer** | `ipi`, `upi` | Installation method | +| **Topology** | `ha`, `single-node` | Cluster topology | +| **FeatureSet** | `techpreview`, `default` | Feature gates | + +## Variant Extraction + +Sippy parses job names using regex patterns: + +```go +// pkg/variantregistry/registry.go +type VariantRegistry struct { + Variants []VariantDefinition +} + +type VariantDefinition struct { + Name string // "Platform" + Pattern *regexp.Regexp // Regex to extract from job name + Values []string // Valid values +} +``` + +**Example job name**: `periodic-ci-openshift-release-master-nightly-4.16-e2e-aws-ovn-upgrade` + +**Extracted variants**: +- Release: `4.16` +- Platform: `aws` +- Network: `ovn` +- Upgrade: `upgrade` + +## Variant Registry + +**Location**: `pkg/variantregistry/` + +**Configuration**: Variant definitions live in code (not configuration files) + +**Adding new variant**: +1. Define pattern in `pkg/variantregistry/` +2. Update API to expose variant +3. 
Update frontend to filter by variant + +## API Usage + +**Get jobs by variant**: `/api/jobs?release=4.16&platform=aws&network=ovn` + +**Component readiness by variant**: `/api/componentreadiness?release=4.16&variant=network:ovn` + +## Database Schema + +Variants are stored as JSONB in PostgreSQL: + +```sql +CREATE TABLE prow_job_run_tests ( + ... + variants JSONB, -- {"network": "ovn", "platform": "aws"} + ... +); +``` + +**Index**: `CREATE INDEX idx_variants ON prow_job_run_tests USING gin(variants);` + +## Common Variant Combinations + +| Combination | Purpose | +|-------------|---------| +| `4.16 + aws + ovn` | Standard AWS OVN testing | +| `4.16 + metal + sdn + upgrade` | Bare metal upgrade testing | +| `4.16 + gcp + ovn + single-node` | Single-node GCP testing | + +## Related Concepts + +- [Job](job.md) - Jobs are characterized by variants +- [Release](release.md) - Release is a primary variant +- [Component Readiness](component-readiness.md) - Statistics sliced by variant + +## References + +- Variant registry: `pkg/variantregistry/` +- API filtering: `pkg/api/filter.go` +- Frontend variant selector: `sippy-ng/src/components/VariantSelector.js` diff --git a/ai-docs/exec-plans/README.md b/ai-docs/exec-plans/README.md new file mode 100644 index 000000000..b3d17d3d4 --- /dev/null +++ b/ai-docs/exec-plans/README.md @@ -0,0 +1,46 @@ +# Exec-Plans + +Active feature implementation tracking for Sippy. + +## What are Exec-Plans? + +Exec-plans bridge the gap between enhancements (design) and PRs (implementation). Use them to track multi-week features that span multiple PRs. + +**See [Tier 1 Exec-Plans Guide](https://github.com/openshift/enhancements/tree/master/ai-docs/workflows/exec-plans/) for**: +- Templates +- When to use exec-plans +- How to structure exec-plans +- Completion workflow + +## Structure + +```text +exec-plans/ +└── active/ # Active features being implemented +``` + +**Note**: No `completed/` directory. 
When a feature is done, extract knowledge to ADRs/architecture docs, then delete the exec-plan. + +## When to Use Exec-Plans + +**Use exec-plans for**: +- Multi-week features (> 5 PRs) +- Cross-module changes (backend + frontend + dataloader) +- Features requiring coordination (database migration + API + UI) + +**Don't use exec-plans for**: +- Single PR fixes +- Small features (< 3 PRs) +- Maintenance work + +## Template + +See [Tier 1 Template](https://github.com/openshift/enhancements/tree/master/ai-docs/workflows/exec-plans/template.md) + +## Completion Workflow + +1. **Extract knowledge**: Add learnings to ADRs or architecture docs +2. **Update docs**: Reflect new architecture in `architecture/components.md` +3. **Delete exec-plan**: Remove file from `active/` + +See [Tier 1 Guide](https://github.com/openshift/enhancements/tree/master/ai-docs/workflows/exec-plans/README.md#completion-workflow) for details. diff --git a/ai-docs/references/ecosystem.md b/ai-docs/references/ecosystem.md new file mode 100644 index 000000000..ae3a05022 --- /dev/null +++ b/ai-docs/references/ecosystem.md @@ -0,0 +1,73 @@ +# Ecosystem References + +Links to Tier 1 platform documentation. Sippy is a component tool, not a platform component, but these patterns are still valuable for development. 
+
+**Tier 1 Hub**: https://github.com/openshift/enhancements/tree/master/ai-docs
+
+## Testing Practices (Tier 1)
+
+Sippy-specific testing: See [../SIPPY_TESTING.md](../SIPPY_TESTING.md)
+
+**Generic patterns** (Tier 1):
+- [Testing Pyramid](https://github.com/openshift/enhancements/tree/master/ai-docs/practices/testing/pyramid.md) - 60/30/10 ratio guidance
+- [E2E Framework](https://github.com/openshift/enhancements/tree/master/ai-docs/practices/testing/e2e-framework.md) - E2E test structure
+- [Test Organization](https://github.com/openshift/enhancements/tree/master/ai-docs/practices/testing/) - Where to put tests
+
+## Security Practices (Tier 1)
+
+**Generic patterns** (Tier 1):
+- [STRIDE Threat Modeling](https://github.com/openshift/enhancements/tree/master/ai-docs/practices/security/stride.md)
+- [Secret Handling](https://github.com/openshift/enhancements/tree/master/ai-docs/practices/security/secrets.md)
+- [Input Validation](https://github.com/openshift/enhancements/tree/master/ai-docs/practices/security/) - API input sanitization
+
+## Development Practices (Tier 1)
+
+Sippy-specific development: See [../SIPPY_DEVELOPMENT.md](../SIPPY_DEVELOPMENT.md)
+
+**Generic patterns** (Tier 1):
+- [API Evolution](https://github.com/openshift/enhancements/tree/master/ai-docs/practices/development/api-evolution.md) - Versioning, compatibility
+- [Code Organization](https://github.com/openshift/enhancements/tree/master/ai-docs/practices/development/) - Repository structure
+- [Documentation](https://github.com/openshift/enhancements/tree/master/ai-docs/practices/development/) - What to document
+
+## Reliability Practices (Tier 1)
+
+**Generic patterns** (Tier 1):
+- [SLI/SLO/SLA](https://github.com/openshift/enhancements/tree/master/ai-docs/practices/reliability/slo.md) - Service level objectives
+- [Observability](https://github.com/openshift/enhancements/tree/master/ai-docs/practices/reliability/observability.md) - Metrics, logging
+- [Degraded States](https://github.com/openshift/enhancements/tree/master/ai-docs/practices/reliability/) - Graceful degradation
+
+## Go Best Practices
+
+**Sippy-specific**: Go idiomatic patterns in `pkg/`
+
+**Generic patterns** (Tier 1):
+- [Go Standards](https://github.com/openshift/enhancements/tree/master/ai-docs/practices/development/go.md) - Idiomatic Go
+- [Error Handling](https://github.com/openshift/enhancements/tree/master/ai-docs/practices/development/) - Error patterns
+
+## Database Patterns
+
+**Sippy-specific**: See [ADR-0002](../decisions/adr-0002-component-readiness-views.md) for component readiness views
+
+**Generic patterns** (search OpenShift dev guides):
+- PostgreSQL best practices
+- Migration strategies
+- Query optimization
+
+## Frontend Patterns
+
+**Sippy-specific**: React components in `sippy-ng/src/`
+
+**Generic patterns** (not in Tier 1, external):
+- React best practices
+- Material-UI patterns
+- State management
+
+## Cross-Repo ADRs (Tier 1)
+
+**Note**: Sippy is not a platform component, but these provide context on OpenShift architecture:
+
+- [Why etcd](https://github.com/openshift/enhancements/tree/master/ai-docs/decisions/adr-0001-etcd.md) - Platform state storage
+- [Why CVO Orchestration](https://github.com/openshift/enhancements/tree/master/ai-docs/decisions/) - Operator lifecycle management
+- [Why Immutable Nodes](https://github.com/openshift/enhancements/tree/master/ai-docs/decisions/) - Node update strategy
+
+**Sippy-specific ADRs**: See [../decisions/](../decisions/)