Skip to content

Latest commit

 

History

History
691 lines (480 loc) · 17.4 KB

File metadata and controls

691 lines (480 loc) · 17.4 KB

Contributing to Transcript Create

Thank you for considering contributing to Transcript Create! 🎉

We're excited to have you here. Whether you're fixing a bug, adding a feature, improving documentation, or helping others, every contribution makes a difference.

This document provides guidelines and information about our development process to help you contribute effectively.

Code of Conduct

This project and everyone participating in it is governed by our Code of Conduct. By participating, you are expected to uphold this code. Please report unacceptable behavior to security@subculture.community.

Table of Contents

Getting Started

  1. Fork the repository on GitHub
  2. Clone your fork locally
  3. Create a new branch for your feature or bug fix
  4. Make your changes following our code quality guidelines
  5. Run tests and linting locally
  6. Push to your fork and submit a pull request

New to open source? Check out our First-Time Contributors Guide for a detailed walkthrough.

Development Setup

Backend Setup

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Install development tools
pip install ruff black isort mypy pytest pre-commit

# Set up pre-commit hooks
pre-commit install

Running Tests

Backend Tests

# Install test dependencies
pip install -r requirements-dev.txt

# Set up test database (PostgreSQL required)
export DATABASE_URL="postgresql+psycopg://postgres:postgres@localhost:5432/postgres"

# Run migrations to set up schema
python scripts/run_migrations.py upgrade

# Run all tests
pytest tests/

# Run tests with coverage
pytest tests/ --cov=app --cov-report=html --cov-report=term

# Run specific test file
pytest tests/test_routes_jobs.py -v

# View HTML coverage report
open htmlcov/index.html  # macOS
xdg-open htmlcov/index.html  # Linux

Test Structure:

  • tests/conftest.py - Shared fixtures (database, test client)
  • tests/test_crud.py - CRUD operation tests
  • tests/test_routes_*.py - API endpoint tests
  • tests/test_schemas.py - Pydantic model validation

See tests/README.md for detailed testing documentation.

Frontend Setup

cd frontend
npm install

Running Locally

See the main README.md for detailed instructions on running the full stack with Docker Compose or individual services.

Database Migrations

We use Alembic to manage database schema changes. Migrations provide version control for the database schema and enable safe, reproducible schema evolution across environments.

Understanding Migrations

  • Migrations are stored in alembic/versions/
  • Each migration has an upgrade() function to apply changes and a downgrade() function to revert them
  • Migrations are applied sequentially in the order they were created
  • The alembic_version table tracks which migrations have been applied

Running Migrations

Using the Helper Script

# Apply all pending migrations
python scripts/run_migrations.py upgrade

# Check current migration version
python scripts/run_migrations.py current

# View migration history
python scripts/run_migrations.py history

# Downgrade one migration (careful in production!)
python scripts/run_migrations.py downgrade

# Stamp database at a specific revision (for existing databases)
python scripts/run_migrations.py stamp head

Using Alembic Directly

# Apply all pending migrations
alembic upgrade head

# Upgrade to a specific revision
alembic upgrade abc123

# Downgrade to a specific revision
alembic downgrade def456

# Downgrade one revision
alembic downgrade -1

# Show current revision
alembic current

# Show migration history
alembic history --verbose

Creating New Migrations

When making schema changes, you must create a migration:

# Create a new migration file
alembic revision -m "descriptive_name"

# This creates a file like: alembic/versions/20251024_1234_abc123_descriptive_name.py

The generated file contains empty upgrade() and downgrade() functions that you must implement:

def upgrade() -> None:
    """Apply schema changes."""
    # Add a new column
    op.add_column('videos', sa.Column('thumbnail_url', sa.String(), nullable=True))

    # Create an index
    op.create_index('idx_videos_thumbnail', 'videos', ['thumbnail_url'])

def downgrade() -> None:
    """Revert schema changes."""
    # Drop the index
    op.drop_index('idx_videos_thumbnail', 'videos')

    # Drop the column
    op.drop_column('videos', 'thumbnail_url')

Migration Guidelines

  1. Always test both upgrade and downgrade

    # Test upgrade
    python scripts/run_migrations.py upgrade
    
    # Test downgrade
    python scripts/run_migrations.py downgrade
    
    # Re-apply
    python scripts/run_migrations.py upgrade
  2. Write idempotent migrations when possible

    • Use IF NOT EXISTS / IF EXISTS clauses
    • Check for existence before creating/dropping objects
    • Handle cases where migration is partially applied
  3. Keep migrations focused and atomic

    • One logical change per migration
    • Don't mix DDL and data migrations
    • Easier to review, test, and potentially revert
  4. Document complex migrations

    • Add comments explaining the purpose
    • Document any manual steps required
    • Note any data transformations
  5. Test with production-like data

    • Test on a copy of production data when possible
    • Consider performance impact of migrations
    • Plan for zero-downtime deployment if needed
  6. Never edit existing migrations

    • Once a migration is committed and deployed, never modify it
    • Create a new migration to fix issues
    • Exception: migrations not yet in main branch

For Existing Databases

If you have an existing database created from sql/schema.sql, you need to "stamp" it to indicate it's at the baseline:

# Stamp the database as being at the initial migration
export DATABASE_URL="postgresql+psycopg://postgres:postgres@localhost:5432/transcripts"
python scripts/run_migrations.py stamp head

This tells Alembic that your database already has the baseline schema, so it won't try to re-apply it.

Docker Compose Integration

When running with Docker Compose, migrations are automatically applied on startup via the migrations service:

services:
  migrations:
    image: transcript-create:latest
    command: ["python3", "scripts/run_migrations.py", "upgrade"]
    depends_on:
      db:
        condition: service_healthy

The API and worker services wait for migrations to complete before starting.

CI/CD Validation

All migrations are automatically validated in CI:

  • Fresh Database Test: Applies migrations to an empty database
  • Existing Schema Test: Stamps an existing schema and verifies no conflicts
  • Up/Down Test: Tests upgrade and downgrade functionality

See .github/workflows/migrations-ci.yml for details.

Migration Templates and Examples

For detailed migration examples and templates, see:

Quick Examples

Adding a column:

def upgrade() -> None:
    op.execute("ALTER TABLE videos ADD COLUMN IF NOT EXISTS thumbnail_url TEXT")

def downgrade() -> None:
    op.execute("ALTER TABLE videos DROP COLUMN IF EXISTS thumbnail_url")

Creating an index:

def upgrade() -> None:
    op.execute("CREATE INDEX IF NOT EXISTS idx_videos_youtube_id ON videos(youtube_id)")

def downgrade() -> None:
    op.execute("DROP INDEX IF EXISTS idx_videos_youtube_id")

For more examples including:

  • Adding tables
  • Data migrations
  • Enum modifications
  • Triggers and functions
  • Concurrent indexes
  • Constraint additions

See alembic/MIGRATION_TEMPLATE.md.

Code Quality

We maintain high code quality standards through automated linting, formatting, and type checking.

Python (Backend & Worker)

Linting:

# Check code quality
ruff check app/ worker/ scripts/

# Auto-fix issues
ruff check --fix app/ worker/ scripts/

Formatting:

# Check formatting
black --check app/ worker/ scripts/

# Auto-format
black app/ worker/ scripts/

Import Sorting:

# Check imports
isort --check-only app/ worker/

# Auto-sort imports
isort app/ worker/

Type Checking:

# Run type checks
mypy app/ worker/

Configuration:

  • Line length: 120 characters
  • Target Python version: 3.11+
  • Configuration in pyproject.toml

TypeScript (Frontend)

Linting:

cd frontend
npm run lint

Formatting:

cd frontend
# Check formatting
npm run format:check

# Auto-format
npm run format

Type Checking:

cd frontend
npx tsc --noEmit

Building:

cd frontend
npm run build

Pre-commit Hooks

We use pre-commit hooks to catch issues before they're committed:

# Install hooks (one-time setup)
pre-commit install

# Run manually on all files
pre-commit run --all-files

The hooks automatically run:

  • ruff: Fast Python linter
  • black: Code formatter
  • isort: Import sorting
  • mypy: Type checking
  • gitleaks: Secret detection
  • commitlint: Commit message validation
  • Additional checks for trailing whitespace, YAML/JSON/TOML syntax

Commit Message Guidelines

We follow Conventional Commits for automated changelog generation and semantic versioning.

Format:

<type>: <subject>

[optional body]

[optional footer(s)]

Types:

  • feat: New feature (triggers MINOR version bump)
  • fix: Bug fix (triggers PATCH version bump)
  • docs: Documentation changes
  • style: Code style changes (formatting, semicolons, etc.)
  • refactor: Code refactoring without changing functionality
  • perf: Performance improvements
  • test: Adding or updating tests
  • build: Build system or dependency changes
  • ci: CI/CD configuration changes
  • chore: Other changes that don't modify src or test files
  • revert: Reverting a previous commit

Breaking Changes:

Add BREAKING CHANGE: in the footer or append ! after the type (triggers MAJOR version bump):

feat!: remove support for Python 3.10

BREAKING CHANGE: Minimum Python version is now 3.11

Examples:

feat: add speaker diarization support
fix: resolve memory leak in audio processing
docs: update API authentication guide
perf: optimize database query for search
test: add integration tests for job creation
ci: add release workflow for automated versioning

Using Commitizen (Interactive):

If you prefer an interactive prompt:

# Install commitizen (one-time)
npm install

# Use interactive commit
npm run commit

The pre-commit hook will validate your commit message format automatically.

CI/CD Pipeline

All pull requests and pushes to main automatically trigger our CI/CD pipeline.

Backend CI (backend-ci.yml)

Runs on changes to:

  • app/**
  • worker/**
  • *.py files
  • requirements.txt
  • pyproject.toml

Jobs:

  1. Lint & Format Check (Python 3.11, 3.12)

    • ruff check
    • black check
    • isort check
    • mypy type check (informational)
  2. Security Scan

    • pip-audit (dependency vulnerabilities)
    • bandit (code security issues)
  3. Test with PostgreSQL

    • Apply database schema
    • Run pytest suite with coverage
    • Generate coverage reports (XML, HTML, terminal)
    • Check 70%+ coverage threshold
    • Upload coverage artifacts
    • Add GitHub Actions summary with coverage stats
  4. Docker Build

    • Build Docker image (CPU-compatible check)
    • Verify image builds successfully

Frontend CI (frontend-ci.yml)

Runs on changes to:

  • frontend/**

Jobs:

  1. Lint & Type Check (Node 20, 22)

    • ESLint (informational - some errors pre-existing)
    • Prettier formatting (enforced)
    • TypeScript type check
  2. Build Verification

    • Vite build
    • Bundle size check (warns if > 500KB)
    • Upload build artifacts

Docker Build & Publish (docker-build.yml)

Runs on:

  • Push to main
  • Tags matching v*
  • Manual workflow dispatch

Features:

  • Builds Docker image with ROCm support
  • Publishes to GitHub Container Registry (ghcr.io)
  • Multiple tagging strategies (latest, semver, sha)
  • Layer caching for fast rebuilds
  • SBOM and provenance attestations
  • Build time verification (target < 5 min with cache)

Security Audit (security-audit.yml)

Runs on:

  • Push/PR to main/develop (when dependency files change)
  • Weekly schedule (Mondays at 9 AM UTC)
  • Manual workflow dispatch

Checks:

  • Dependency vulnerabilities (pip-audit, safety)
  • Secret scanning (gitleaks)

Pull Request Process

  1. Create a branch from main with a descriptive name

    • Feature: feature/description
    • Bug fix: fix/description
    • Enhancement: enhance/description
  2. Make your changes

    • Write clear, concise commit messages
    • Follow code quality guidelines
    • Add tests if applicable
  3. Test locally

    # Backend
    pytest tests/
    ruff check app/ worker/
    black --check app/ worker/
    
    # Frontend
    cd frontend
    npm run lint
    npm run format:check
    npm run build
  4. Run pre-commit hooks

    pre-commit run --all-files
  5. Push and create PR

    • All CI checks must pass (see status badges on PR)
    • Provide clear description of changes
    • Link related issues
  6. Code Review

    • Address reviewer feedback
    • Ensure all CI checks remain green
  7. Merge

    • Once approved and all checks pass, maintainers will merge

Branch Protection Rules

The main branch is protected with the following requirements:

Required Status Checks

Before merging to main, the following checks must pass:

Backend CI:

  • ✅ Lint & Format Check (Python 3.11)
  • ✅ Lint & Format Check (Python 3.12)
  • ✅ Security Scan
  • ✅ Test with PostgreSQL
  • ✅ Docker Build

Frontend CI:

  • ✅ Lint & Type Check (Node 20)
  • ✅ Lint & Type Check (Node 22)
  • ✅ Build Verification

Note: Some checks use continue-on-error: true for informational warnings (mypy, ESLint some rules, security scans) that don't block merges but should be addressed when possible.

Additional Requirements

  • Require branches to be up to date: PRs must be rebased on latest main
  • Require pull request reviews: At least one approving review from maintainers
  • Dismiss stale reviews: New commits dismiss previous approvals
  • No force pushes: Protect commit history
  • Linear history: Prefer squash or rebase merges

Workflow Timing

Our CI/CD is designed for fast feedback:

  • Target: Most checks complete in < 5 minutes
  • Docker builds: < 5 min with layer caching, < 15 min cold
  • Full test suite: < 3 minutes

If checks take significantly longer, please report as an issue.

First-Time Contributors

👋 New to the project? Welcome! We're here to help.

Start here:

  1. Read our First-Time Contributors Guide for a step-by-step walkthrough
  2. Look for issues labeled good first issue - these are beginner-friendly
  3. Check out our Development Setup guide
  4. Don't hesitate to ask questions!

Tips for success:

  • Start small - documentation fixes and small bug fixes are great first contributions
  • Ask questions early and often - we're happy to help
  • Read existing code and pull requests to understand our style
  • Join discussions in issues to learn more about the project

Getting Help

If you have questions or need help:

  1. Documentation: Check the README.md and docs/ folder
  2. Existing Issues: Search existing issues for similar questions
  3. Ask a Question: Open a new issue with the question template
  4. Development Questions: Check docs/development/ for architecture and code guidelines

We strive to respond to all questions within 48 hours. Don't be shy - there are no stupid questions!

Security

Please review docs/security.md for:

  • Reporting security vulnerabilities
  • Secrets management guidelines
  • Production security checklist
  • Dependency update procedures

Recognition

All contributors are recognized in our docs/contributors.md file. Your contributions, big or small, are valuable to us!

License

By contributing to Transcript Create, you agree that your contributions will be licensed under the Apache License 2.0. See LICENSE for details.


Thank you for contributing! 🚀 Your help makes Transcript Create better for everyone.