Closed
338 commits
8ddcf0e
fix: lineage dependancies
spideystreet Dec 8, 2025
a9a78e1
docs: add dbt models documentation
spideystreet Dec 8, 2025
f7c1fc6
feat(dbt): add staging and intermediate models for scraper ELT
spideystreet Dec 8, 2025
de7a18f
feat(dbt): update pivot and prod models for ELT
spideystreet Dec 8, 2025
54bab84
feat(scraper): update assets to write to raw tables and link to dbt
spideystreet Dec 8, 2025
52274c2
feat(embedding): update context preparation to use flat dbt columns
spideystreet Dec 8, 2025
56437b5
refactor(pipeline): remove legacy python enrichment assets
spideystreet Dec 8, 2025
6b2d285
refactor(elt): migrate schema, implement upsert, and streamline dbt m…
spideystreet Dec 8, 2025
9e86468
docs: up env example
spideystreet Dec 8, 2025
ee57018
refactor(elt): rename prod model and update env example
spideystreet Dec 8, 2025
ff49a09
refactor: no map config needed anymore
spideystreet Dec 14, 2025
7a73b16
feat(pipeline): implement tech stack sync and fix classification assets
spideystreet Dec 17, 2025
1322b96
fix(ingestion): update readme asset schema, group and persist logic
spideystreet Dec 17, 2025
7a365c8
fix(ingestion): update languages asset schema, group and persist logic
spideystreet Dec 17, 2025
f506294
fix(ingestion): update topics asset schema, group and persist logic
spideystreet Dec 17, 2025
b68c599
fix(ingestion): update extract asset group and cleanup logic
spideystreet Dec 17, 2025
9ad0d2d
fix(ingestion): update load asset group name
spideystreet Dec 17, 2025
8365121
chore(jobs): remove legacy embedding_jobs.py and cleanup
spideystreet Dec 17, 2025
aef2713
style(resources): translate comments to english
spideystreet Dec 17, 2025
4a98387
chore(config): update dagster definitions and sensor
spideystreet Dec 17, 2025
60807f6
build(deps): add transformers and accelerate
spideystreet Dec 17, 2025
c368046
chore(db): update prisma schema with new models and trending field
spideystreet Dec 17, 2025
1473775
fix: readme link
spideystreet Dec 17, 2025
72ac35c
refactor(dbt): reorganize models by domain (users/projects) and clean…
spideystreet Dec 17, 2025
f396eda
chore(db): remove dbt-managed IntGithubProject from prisma schema
spideystreet Dec 17, 2025
835fab9
chore(dbt): update project configuration for new model structure
spideystreet Dec 17, 2025
984db56
feat(dbt): add context generation and utility macros
spideystreet Dec 17, 2025
f5e8f07
chore(scripts): update language fixtures generator to use correct schema
spideystreet Dec 17, 2025
3443503
Merge pull request #17 from opensource-together/ost-410-feat-projects…
spideystreet Dec 17, 2025
96c9883
Merge branch 'ost-408-feat-embeddings-for-cosine-similarities' of htt…
spideystreet Dec 17, 2025
dd321b9
fix(pipeline): remove shadowing sensors.py to allow package import
spideystreet Dec 17, 2025
288aa71
docs: simplify README description to be product-focused
spideystreet Dec 20, 2025
e582636
docs: up README
spideystreet Dec 20, 2025
899df7b
docs: update quick start guide with poetry and docker commands
spideystreet Dec 20, 2025
cea0449
style(resources): translate comments to english in LLM classifier
spideystreet Dec 20, 2025
cf15a04
perf(llm): optimize prompt to reduce tokens and strict json format
spideystreet Dec 20, 2025
fbc41c2
feat: improve context with cat & domain only
spideystreet Dec 20, 2025
e2e84ad
test(dbt): add unique, not_null and relationship tests to staging/int…
spideystreet Dec 21, 2025
0951e05
test(dbt): ensure projects have a url
spideystreet Dec 21, 2025
283112b
feat(dbt): implement ml context pipeline (stg_public_project, raw_pro…
spideystreet Dec 21, 2025
0ae9821
feat(ml): add embedding pipeline (resource, asset, job)
spideystreet Dec 21, 2025
5469248
fix(pipeline): explicit public/project dependency via asset key
spideystreet Dec 21, 2025
aabc1c7
docs(dbt): explain raw_github_readme dependency in stg_public_project
spideystreet Dec 21, 2025
b348db1
fix(dbt): restore missing CTE definition in stg_public_project
spideystreet Dec 21, 2025
8c3bb46
refactor(dbt): centralize ml config in dbt_project.yml
spideystreet Dec 21, 2025
529015d
refactor(dbt): split schema.yml into per-model yamls
spideystreet Dec 21, 2025
55ff320
chore: cleanup unused dbt models, legacy assets, and refactor pipelin…
spideystreet Dec 21, 2025
c4fb157
refactor(pipeline): switch to int->raw->stg flow and cleanup schema
spideystreet Dec 21, 2025
21a08ef
fix(pipeline): refactor IO Manager, fix scraper timeout, and serializ…
spideystreet Dec 21, 2025
8dc71ec
refactor: config on dagster
spideystreet Dec 21, 2025
13ba29d
refactor(config): consolidate config into single cfg_resource.py
spideystreet Dec 21, 2025
2b4ed38
refactor(dbt): optimize clean_llm_context macro for LLM understanding
spideystreet Dec 21, 2025
7361e53
refactor(dbt): enhance generate_project_context with skip_empty logic
spideystreet Dec 21, 2025
13c9eb0
refactor(dbt): add normalization to json_array_to_string macro
spideystreet Dec 21, 2025
6fda9ce
refactor(dbt): rename json_array_to_string to jsonb_to_list
spideystreet Dec 21, 2025
92525ff
refactor(dbt): rename macros for clarity
spideystreet Dec 21, 2025
c23df98
docs(dbt): update model contracts with concise descriptions
spideystreet Dec 21, 2025
f8b71fd
refactor(dbt): rename ML models and organize into subdirectories
spideystreet Dec 21, 2025
c4914b3
fix(pipeline): update embed asset to source from pvt_public_project
spideystreet Dec 21, 2025
d380484
refactor(pipeline): rename job and reorganize asset groups
spideystreet Dec 21, 2025
78b56df
refactor(dbt): assign ml_preparation group to ml models
spideystreet Dec 21, 2025
4e50653
fix: io manager key usage instead of pandas one, return correct dicti…
spideystreet Jan 19, 2026
5101eee
chore: debug log for upserting
spideystreet Jan 19, 2026
92ab3e2
fix: added explicit string casting for uuids
spideystreet Jan 19, 2026
c60763b
fix: cast main pid
spideystreet Jan 19, 2026
5544812
fix: asset name for lineafe
spideystreet Jan 19, 2026
35ed09d
feat: add users embedding
spideystreet Jan 19, 2026
85f3856
feat: embedding user asset
spideystreet Jan 19, 2026
2fa835f
feat(dbt): add user models to prepare computing
spideystreet Jan 19, 2026
f586301
fix: column name (context)
spideystreet Jan 19, 2026
d2c023b
fix: last query parameters string
spideystreet Jan 19, 2026
468be47
feat: add matching model projects<->users
spideystreet Jan 20, 2026
954d58e
feat: add ml prep models related to users
spideystreet Jan 20, 2026
12b9e20
feat: add complete flow on dbt project
spideystreet Jan 20, 2026
b5c9a1c
feat: embedding assets projects/users
spideystreet Jan 20, 2026
db79a5f
feat: sync asset to up projects
spideystreet Jan 20, 2026
70a8ede
fix: github default queryarguments limit
spideystreet Jan 20, 2026
92103ab
fix: match view to table
spideystreet Jan 20, 2026
f09a7e4
feat: order by star to limit quality projects
spideystreet Jan 20, 2026
c89c10b
refactor(dbt): assign ml_preparation group to ml/int models
spideystreet Jan 21, 2026
ea7ae31
fix(pipeline): update job selections to match new groups
spideystreet Jan 21, 2026
ba86741
refactor: build user context alligned with projects one
spideystreet Jan 21, 2026
1569bc9
docs(dbt): enhance match recommendation contracts
spideystreet Jan 21, 2026
30a9cf8
feat: add matching models for recommendations
spideystreet Jan 21, 2026
fe60740
feat: add context prep model for machine learning
spideystreet Jan 21, 2026
28f3ea1
docs(dbt): enhance project model contracts
spideystreet Jan 21, 2026
8e29b1b
docs(dbt): update sources.yml contract
spideystreet Jan 22, 2026
057690c
docs(dbt): reco precision
spideystreet Jan 22, 2026
e956d44
fix(pipeline): wire embedding asset to int_project_embedding_candidate
spideystreet Jan 22, 2026
8655385
docs: improve dbt model and dagster asset descriptions
spideystreet Jan 22, 2026
d2f0d9b
chore(dbt): remove stale config for non-existent model int_github_emb…
spideystreet Jan 22, 2026
170211c
config: update excluded terms list for scraper
spideystreet Jan 22, 2026
a0d5fae
chore(infra): dockerize application
spideystreet Jan 22, 2026
4b85c5e
config: 10 ops max for github query
spideystreet Jan 22, 2026
58b8ee1
chore: add logs for classified projects evolution
spideystreet Jan 22, 2026
5b25f33
config: up to date config with needed vars & parameters
spideystreet Jan 26, 2026
dd78aa0
config: up lineage with llm classifier as resource + good parameters …
spideystreet Jan 26, 2026
db6102a
feat: optimised query parameters to find acurate projects
spideystreet Jan 26, 2026
e1df582
config: group name ml
spideystreet Jan 26, 2026
3b69861
build: up dockerignore
spideystreet Jan 26, 2026
7de373c
fix: seed import syntax
spideystreet Jan 26, 2026
0ab4690
docs: up env example
spideystreet Jan 26, 2026
3ed6b16
docs: add embedding & raw tables not managed by dbt, used by linker t…
spideystreet Jan 26, 2026
b37c14e
fix: correct lineage of groups, to ensure they launch together
spideystreet Jan 26, 2026
b279790
build: correct env var usage
spideystreet Jan 26, 2026
5249257
docs: up README to date
spideystreet Jan 26, 2026
210e0ec
feat(prisma): allign with backend & add extensions for linker
spideystreet Jan 28, 2026
bf71e03
build: entrypoint script to dbt build & deps
spideystreet Jan 28, 2026
7bfdf01
chore: up gitignore
spideystreet Jan 28, 2026
e08e0d2
chore(docker): configure entrypoint script and dependencies
spideystreet Jan 28, 2026
a005caf
fix: pg client no need
spideystreet Jan 28, 2026
332d490
chore: entrypoint pg is ready step outdated
spideystreet Jan 28, 2026
381b5f6
feat(schedule): add run_all_schedule 5x daily (Europe/Paris)
spideystreet Jan 28, 2026
25f5f34
feat: migrate LLM classifier to OpenRouter and tune dbt matching logic
spideystreet Jan 30, 2026
4794754
refactor(linker): rename src/pipeline to src/linker
spideystreet Mar 2, 2026
32a98be
docs(claude): split CLAUDE.md into .claude/rules/
spideystreet Mar 2, 2026
fcc9d5b
fix(config): remove hardcoded secret defaults
spideystreet Mar 2, 2026
f194839
fix(go): harden scraper and fetcher with retry, rate-limit, and upsert
spideystreet Mar 2, 2026
b6b2562
refactor(dbt): restructure models from domain-based to layer-based la…
spideystreet Mar 2, 2026
05cd813
refactor(linker): update asset keys to match renamed dbt models
spideystreet Mar 2, 2026
a3938e3
feat(go): add open_issues_count field to scraper struct
spideystreet Mar 2, 2026
80a8d78
fix(dagster): align DAGSTER_HOME path, gitignore, and Dockerfile config
spideystreet Mar 2, 2026
9dc326d
ci(github-actions): add sqlfluff + quality gates to CI workflows
spideystreet Mar 2, 2026
db49305
chore(gitignore): ignore dagster/ runtime directory
spideystreet Mar 2, 2026
cdb55a7
chore(deps): migrate from Poetry to uv
spideystreet Mar 2, 2026
729d7d7
fix(linker): make GitHub query date dynamic instead of stale at import
spideystreet Mar 2, 2026
cde225b
refactor(linker): migrate PipelineConfig from legacy @resource to Con…
spideystreet Mar 2, 2026
f5c6ec6
fix(linker): remove dead site_url/site_name fields from LLM classifier
spideystreet Mar 2, 2026
6a5e4ee
refactor(linker): remove dead scraper utils, unused schedule, and emp…
spideystreet Mar 2, 2026
71583ca
fix(linker): clean up definitions.py dead code and duplicate comments
spideystreet Mar 2, 2026
f8b39c6
refactor(linker): fix embed_projects config access and add encode_batch
spideystreet Mar 2, 2026
b33901b
fix(linker): use encode_batch in embed_projects for batch encoding
spideystreet Mar 2, 2026
13e5439
refactor(resources): migrate PipelineConfig fields to EnvVar
spideystreet Mar 2, 2026
a3fd1da
refactor(resources): migrate IO manager to ConfigurableIOManager with…
spideystreet Mar 2, 2026
73b0b9d
refactor(resources): migrate FastText and LLM resources to EnvVar
spideystreet Mar 2, 2026
c74b9d2
refactor(assets): use build_fetcher_env in fetcher and scraper assets
spideystreet Mar 2, 2026
0a31f04
test(resources): add unit tests for config resource helpers
spideystreet Mar 2, 2026
cc165b1
chore(lint): fix import sorting and unused imports
spideystreet Mar 2, 2026
156d27b
docs: update .env.example, add CONTRIBUTING.md, sync docs submodule
spideystreet Mar 2, 2026
31803b2
feat(resources): add STAR_RANGES and multi-query support to build_scr…
spideystreet Mar 2, 2026
95c77bc
feat(scraper): rewrite Go scraper for parallel multi-query execution
spideystreet Mar 2, 2026
748598b
feat(assets): update raw_github__extract_projects to handle multi-que…
spideystreet Mar 2, 2026
1c0490c
fix(scraper): use token auth header for GitHub PAT
spideystreet Mar 2, 2026
7b27c31
fix(resources): trim EXCLUDED_TERMS to 4 to stay within GitHub NOT limit
spideystreet Mar 2, 2026
ae5d681
fix(assets): access sentence_transformer via context.resources
spideystreet Mar 2, 2026
368e19e
fix(dagster): use cautious indirect selection in dbt build
spideystreet Mar 2, 2026
8616f49
fix(dbt): add asset_key meta to source tables for Dagster key resolution
spideystreet Mar 2, 2026
0061bb8
docs: document GITHUB_API_URL and GITHUB_SCRAPING_QUERIES in .env.exa…
spideystreet Mar 2, 2026
382fffe
chore: add .mypy_cache to .gitignore
spideystreet Mar 2, 2026
9d4646f
docs(contributing): remove Discord link
spideystreet Mar 2, 2026
a96c16c
refactor(dbt): replace binary pre-filter with continuous preference s…
spideystreet Mar 3, 2026
636c4af
fix(dbt): remove FK relationship tests on staging enrichment models
spideystreet Mar 3, 2026
5b34d8c
feat(fetcher): skip already-fetched projects via incremental lookup
spideystreet Mar 3, 2026
ba5d680
refactor(classifier): add hard timeout and httpx timeouts to LLM calls
spideystreet Mar 3, 2026
6ff02f3
feat(seed): add test users with preferences for recommendation testing
spideystreet Mar 3, 2026
c1f5222
chore: minor .env.example formatting
spideystreet Mar 3, 2026
2f71d58
chore: add GitHub issue and PR templates
spideystreet Mar 3, 2026
e85fd26
chore: add Makefile for common dev commands
spideystreet Mar 3, 2026
d6adf2f
chore: add project metadata to pyproject.toml
spideystreet Mar 3, 2026
db498bc
docs: add contributing and license sections to README
spideystreet Mar 3, 2026
605ef53
refactor: DRY Makefile setup target via build-go delegation
spideystreet Mar 3, 2026
960af2d
fix: move dependencies to correct TOML section and resolve all ruff e…
spideystreet Mar 4, 2026
26d854f
fix: add type annotations and resolve all mypy errors
spideystreet Mar 4, 2026
30bf27e
style(dbt): fix all sqlfluff lint errors across models and tests
spideystreet Mar 4, 2026
5029747
fix(dbt): add default values to profiles.yml for CI compatibility
spideystreet Mar 4, 2026
2b25e66
ci: add format check and switch dbt-check job to uv
spideystreet Mar 4, 2026
264f700
style: fix ruff UP038 isinstance union syntax
spideystreet Mar 4, 2026
4e07a05
refactor(ci): extract quality and dbt-check into reusable workflow
spideystreet Mar 4, 2026
1cf2fc2
fix(dbt): use neutral default password in profiles.yml
spideystreet Mar 4, 2026
786bc7a
docs: sync docs submodule with latest AI pages
spideystreet Mar 4, 2026
75a4544
chore(docker): clean up .dockerignore and reduce build context
spideystreet Mar 4, 2026
7f148c1
fix(docker): harden Dockerfile with non-root user, stripped binaries,…
spideystreet Mar 4, 2026
04a6d19
fix(docker): add missing env vars, DB healthcheck, and localhost bind…
spideystreet Mar 4, 2026
a8eeaf0
fix(docker): make init.sh resilient and remove hardcoded defaults
spideystreet Mar 4, 2026
4cb9fb9
chore(dagster): reduce max concurrent runs and document SQLite limita…
spideystreet Mar 4, 2026
c83dc8f
fix: fix .env.example typo and document missing Dagster vars
spideystreet Mar 4, 2026
7e0afe1
feat(dagster): add workspace.yaml and prod config for production depl…
spideystreet Mar 4, 2026
395fee2
fix(docker): split Dagster into webserver and daemon services
spideystreet Mar 4, 2026
16a15d1
fix(docker): add g++ for fasttext and strip editable install from req…
spideystreet Mar 4, 2026
51601a6
refactor(docker): move dev DB to docker-compose.override.yml
spideystreet Mar 4, 2026
4b55f3b
ci(docs): add submodule SHA check and remove obsolete deploy-docs wor…
spideystreet Mar 5, 2026
4675aa9
ci(docs): add workflow to sync submodule changes to ost-docs
spideystreet Mar 5, 2026
bd657ae
chore(docs): update submodule pointer to latest ost-docs
spideystreet Mar 5, 2026
f57ba0c
docs: make README more concise with tech stack table and Makefile qui…
spideystreet Mar 5, 2026
8d43e39
chore: clean up .gitignore and untrack FastText model binary
spideystreet Mar 5, 2026
db648a8
chore: track utility scripts previously hidden by global *.sh ignore
spideystreet Mar 5, 2026
329a516
ci: add Go, Docker, Prisma, security, and coverage checks
spideystreet Mar 5, 2026
14e5e24
chore(deps): add pip-audit to dev dependencies
spideystreet Mar 5, 2026
5ed2431
refactor(docker): install torch CPU-only to reduce image size by ~2GB
spideystreet Mar 5, 2026
ab15212
fix(deps): upgrade dbt-common 1.37.2 → 1.37.3 (GHSA-w75w-9qv4-j5xj)
spideystreet Mar 5, 2026
01b042f
fix(lint): stabilize import sorting between local and CI environments
spideystreet Mar 5, 2026
f6d37d0
fix(ci): fix Prisma, SQLFluff, gitleaks, and docs-sync CI failures
spideystreet Mar 5, 2026
e3135a4
fix(ci): replace paid gitleaks action with free CLI
spideystreet Mar 5, 2026
aacb3c0
ci: enable uv cache for Python CI jobs
spideystreet Mar 5, 2026
821d49e
ci: add gitleaks allowlist for README false positives
spideystreet Mar 5, 2026
a3ae0c0
docs: update submodule pointer after MDX rewrite
spideystreet Mar 5, 2026
eb875a4
feat(dagster): add user_recommendation_job and rebalance schedules
spideystreet Mar 5, 2026
7135725
fix(prisma): fix verification mapping, drop dead ProjectEmbedding, ad…
spideystreet Mar 5, 2026
4e0b1af
refactor(prisma): convert prisma/ to shared submodule
spideystreet Mar 5, 2026
e07e7dc
ci: add prisma submodule checks and sync workflow
spideystreet Mar 5, 2026
eae5718
revert(prisma): convert back from submodule to regular directory
spideystreet Mar 5, 2026
9fd58ac
ci: replace prisma submodule sync with backend file sync
spideystreet Mar 5, 2026
ce9c4a5
ci: add Claude GitHub Actions workflows
spideystreet Mar 6, 2026
93920e9
feat(agents): add 4 custom Claude subagents for project-specific work…
spideystreet Mar 6, 2026
2ec230f
docs(claude): add test-first bug fixing rule to CLAUDE.md
spideystreet Mar 6, 2026
facf2bf
Merge pull request #19 from opensource-together/refactor/project-stru…
spideystreet Mar 6, 2026
1a5ff8a
ci(review): set Claude Sonnet as model for PR review workflow
spideystreet Mar 6, 2026
17dbefa
docs: add CODE_OF_CONDUCT, SECURITY policy, and update CLAUDE.md
spideystreet Mar 6, 2026
82821c5
fix(ci): set write permissions for Claude GitHub Action
spideystreet Mar 6, 2026
7e385d9
fix(ci): skip quality checks and sync workflows on PRs to develop
spideystreet Mar 6, 2026
b0f67db
revert(ci): remove redundant base_ref guards from workflows
spideystreet Mar 6, 2026
32f976b
Merge pull request #20 from opensource-together/fix/post-review-fixes
spideystreet Mar 6, 2026
3f4a56f
test(ci): verify @claude responds on PR comments
spideystreet Mar 6, 2026
a998511
feat(agents): rename agents with JJK theme, add infra agent and CI rules
spideystreet Mar 6, 2026
e646e98
Merge pull request #24 from opensource-together/feat/agents-infra-readme
spideystreet Mar 6, 2026
0bb9872
fix(dbt): remove hardcoded credentials and fix O(n³) join + score clamp
spideystreet Mar 6, 2026
57a1b62
fix: resolve critical and high-severity audit findings across all layers
spideystreet Mar 6, 2026
92858aa
docs(agents): mark fixed vulnerabilities in agent known issues lists
spideystreet Mar 6, 2026
7ccaa6e
Merge pull request #25 from opensource-together/fix/audit-findings
spideystreet Mar 6, 2026
868892a
fix(dagster): resolve job orchestration issues and concurrency conflicts
spideystreet Mar 6, 2026
e6b5b62
refactor(dagster): split ml_preparation into user/project groups and …
spideystreet Mar 6, 2026
806283c
refactor(dagster): merge classification and embedding into project_en…
spideystreet Mar 6, 2026
fce4540
refactor(dagster): restructure groups into project_ml and user_ml flows
spideystreet Mar 6, 2026
ab9648a
chore(dagster): rename files to match exports and remove dead sensor
spideystreet Mar 6, 2026
392d6c6
feat(dbt): add data contracts, tests, and utility macros on mart models
spideystreet Mar 6, 2026
8435dd5
refactor(dbt): integrate clamp/safe_divide macros and enrich intermed…
spideystreet Mar 6, 2026
d1e177a
docs(dbt): add yml contracts for all 8 macros
spideystreet Mar 6, 2026
6066067
docs(dbt): split macro contracts into one yml per macro
spideystreet Mar 6, 2026
7df50ee
docs(dbt): add yml contracts for singular data tests
spideystreet Mar 6, 2026
69a3d26
docs(agents): update dbt-six-eyes with file convention, group mapping…
spideystreet Mar 6, 2026
5b69021
docs: update docs submodule with new orchestration documentation
spideystreet Mar 6, 2026
d08e1f4
docs: update submodule ref with review fixes
spideystreet Mar 6, 2026
6873e0e
fix: resolve findings from final agent review
spideystreet Mar 6, 2026
edc5b3b
refactor(dagster): merge scraper into project_enrichment_job
spideystreet Mar 6, 2026
3125b6d
perf(classification): skip already-classified projects
spideystreet Mar 6, 2026
cdc1328
fix(dbt): cast freshness_score to double precision for contract compl…
spideystreet Mar 6, 2026
06a3a0c
refactor: extract shared utils, harden resources, and fix scraper log…
spideystreet Mar 6, 2026
8ef1f28
test: add comprehensive test suite for Python and Go services
spideystreet Mar 6, 2026
c6cbad7
docs: update project rules, CLAUDE.md, and agent memory
spideystreet Mar 6, 2026
1d5158c
Merge pull request #26 from opensource-together/feat/test-strategy
spideystreet Mar 7, 2026
627d5e2
fix(ci): add git author config in sync workflows (#27)
spideystreet Mar 7, 2026
15a8e64
fix(ci): resolve dbt-check, quality, and docs-submodule CI failures (…
spideystreet Mar 7, 2026
9b562e4
chore(ci): unify sync tokens and add security contact email (#30)
spideystreet Mar 7, 2026
00c445b
fix(ci): rename token to OST_LINKER_SYNC_TOKEN and lower coverage to …
spideystreet Mar 7, 2026
ca07d90
fix(ci): make dagster startup smoke test non-blocking in CI
spideystreet Mar 7, 2026
556a21a
feat(api): add FastAPI REST API for MCP server consumption (#33)
spideystreet Mar 10, 2026
e5f3985
chore(ci): remove claude-code-review and claude mention workflows (#35)
spideystreet Apr 21, 2026
f0e8e36
feat(api): optional X-Service-Token auth for backend-only access (#34)
spideystreet Apr 21, 2026
349baf9
refactor(dagster): clarify bootstrap vs runtime config (#37)
spideystreet Apr 22, 2026
1f2955d
feat(api): Wire API_RATE_LIMIT, optional OpenAPI off, ops/docs (#38)
spideystreet Apr 29, 2026
74 changes: 68 additions & 6 deletions .dockerignore
@@ -1,8 +1,70 @@
-.git/
-.github/
-docs/

-.env*
+# ==============================================================================
+# STRICT WHITELIST STRATEGY
+# 1. Ignore EVERYTHING by default
+# ==============================================================================
+*

-github-scraper
-gitlab-scraper
+# ==============================================================================
+# 2. Allow specific source directories (and their contents)
+# ==============================================================================
+!src/
+!dbt/
+!prisma/
+!scripts/
+!models/
+
+# ==============================================================================
+# 3. Allow specific configuration files
+# ==============================================================================
+!pyproject.toml
+!uv.lock
+!Dockerfile
+!docker-compose.yml
+!.env.example
+!dagster.yaml
+!dagster.prod.yaml
+!workspace.yaml
+!README.md
+!LICENSE
+
+# ==============================================================================
+# 4. Filter out junk from the allowed directories
+# (Rules here override the !includes above because they come later)
+# ==============================================================================
+# Python/Bytecode
+**/__pycache__
+**/*.pyc
+**/*.pyo
+**/*.pyd
+
+# Mac
+**/.DS_Store
+
+# Logs/Tmp
+**/logs
+**/tmp
+**/.cache
+**/.pytest_cache
+**/.mypy_cache
+**/.ruff_cache
+
+# Node
+**/node_modules
+
+# DBT
+**/dbt_packages
+**/target
+
+# Compiled Go binaries (built inside Docker)
+src/services/go/**/github-scraper
+src/services/go/**/ost-fetcher
+src/services/go/**/ost-scraper
+
+# dbt user config
+dbt/.user.yml
+
+# Git (redundant with * but safe)
+.git
+.gitignore
+.github
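The whitelist strategy in this `.dockerignore` relies on rule order: the last pattern that matches a path decides whether it is excluded. The sketch below illustrates that idea only. It is a simplified approximation built on Python's `fnmatch`, not Docker's actual matcher, and the names `is_ignored` and `_matches` are hypothetical helpers introduced for illustration.

```python
from fnmatch import fnmatchcase


def _matches(candidate: str, pattern: str) -> bool:
    """Match one path prefix against one pattern (simplified, assumed semantics)."""
    if pattern.startswith("**/"):
        # Treat "**/" as "zero or more leading directories".
        tail = pattern[3:]
        segments = candidate.split("/")
        return any(
            fnmatchcase("/".join(segments[i:]), tail) for i in range(len(segments))
        )
    return fnmatchcase(candidate, pattern)


def is_ignored(path: str, patterns: list[str]) -> bool:
    """Return True if `path` would be excluded under last-match-wins ordering."""
    parts = path.split("/")
    # A rule on a parent directory also applies to everything below it.
    prefixes = ["/".join(parts[:i]) for i in range(1, len(parts) + 1)]
    ignored = False
    for raw in patterns:
        negate = raw.startswith("!")
        pattern = (raw[1:] if negate else raw).rstrip("/")
        if any(_matches(prefix, pattern) for prefix in prefixes):
            # Later rules override earlier ones.
            ignored = not negate
    return ignored


# A reduced rule set in the same order as the file above.
RULES = ["*", "!src/", "!pyproject.toml", "**/__pycache__", "**/*.pyc", ".git"]
```

With these rules, `src/linker/definitions.py` is kept, while `docs/index.md` and anything under a `__pycache__` directory is dropped, because the junk filters come after the `!` re-includes.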
76 changes: 41 additions & 35 deletions .env.example
@@ -1,35 +1,41 @@
-# ================================================
-# OST AI ENGINE - Environment (example)
-# Copy and adapt for your environment. Keep secrets out of VCS.
-# ================================================
-# ───────────────────────────────────────────────────────── #
-
-DAGSTER_HOME="/app/.dagster_home"
-DAGSTER_STORAGE_DIR="/app/.dagster_home/history"
-DAGSTER_LOGS_DIR="/app/.dagster_home/logs"
-
-# ───────────────────────────────────────────────────────── #
-# Paths inside the container
-HOME="/app"
-XDG_CACHE_HOME="/app/.cache"
-PRISMA_BINARY_CACHE_DIR="/app/.cache/prisma"
-
-# ───────────────────────────────────────────────────────── #
-# Project layout
-PROJECT_ROOT="/app"
-CFG_PATH="/app/config/cfg.py"
-OST_CONFIG_PATH="/app/config/cfg.yaml"
-
-# ───────────────────────────────────────────────────────── #
-# API tokens (replace with real tokens in your local .docker.env)
-# Keep these secret and never commit to git
-GITHUB_ACCESS_TOKEN="<your_github_token_here>"
-GITLAB_ACCESS_TOKEN="<your_gitlab_token_here>"
-
-# ───────────────────────────────────────────────────────── #
-# Database used by docker-compose (service name 'postgres' / container 'ost-db')
-# When connecting from host use localhost:7777 (mapped port)
-DATABASE_URL="postgresql://postgres:postgres@ost-db:5432/ost_dev?schema=public"
-POSTGRES_PASSWORD="postgres"
-POSTGRES_USER="postgres"
-POSTGRES_DB="ost_dev"
+# Copy to .env and fill in. See AGENTS.md for full variable reference.
+
+# Optional host ports for compose (default Dagster 3000, API 8000 if unset).
+# DAGSTER_HOST_PORT=3030
+# LINKER_API_HOST_PORT=8010
+
+# Database (match compose / Prisma / dbt local profile)
+POSTGRES_USER=""
+POSTGRES_PASSWORD=""
+POSTGRES_DB=""
+POSTGRES_PORT=""
+POSTGRES_HOST="localhost"
+DATABASE_URL="postgresql://<POSTGRES_USER>:<POSTGRES_PASSWORD>@<POSTGRES_HOST>:<POSTGRES_PORT>/<POSTGRES_DB>"
+
+# Dagster (local: absolute path to .../dagster_home; Docker: /app/dagster_home)
+DAGSTER_HOME="/app/dagster_home"
+# DAGSTER_PG_URL= # defaults to DATABASE_URL in compose
+# DAGSTER_STORAGE_DIR= # DAGSTER_LOGS_DIR=
+
+GITHUB_ACCESS_TOKEN="<your_fine_grained_token>"
+# GITHUB_API_URL= # GITHUB_SCRAPING_QUERY= # GITHUB_SCRAPING_QUERIES=
+
+GO_SCRAPER_PATH="/path/to/ost-linker/src/services/go/scraper/github-scraper"
+GO_FETCHER_PATH="/path/to/ost-linker/src/services/go/fetcher/ost-fetcher"
+GO_TRENDING_PATH="/path/to/ost-linker/src/services/go/trending/ost-trending"
+
+FASTTEXT_MODEL_PATH="models/lid.176.ftz"
+MISTRAL_API_KEY="<your_mistral_api_key>"
+
+API_HOST=0.0.0.0
+API_PORT=8000
+API_RATE_LIMIT=60
+# Optional: true = require X-Service-Token on routes (still open until OST_LINKER_SERVICE_TOKEN is set; use with strict startup).
+OST_LINKER_SERVICE_TOKEN=
+# production: true — process exits at startup unless token above is non-empty (see src/services/api/main.py lifespan).
+OST_LINKER_REQUIRE_SERVICE_TOKEN=false
+# production: false — hides Swagger, ReDoc, and OpenAPI schema JSON.
+API_ENABLE_OPENAPI=true
+
+# DBT_TARGET=local # or docker in container
+# DBT_PROJECT_DIR=/app/dbt
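Two of the variables above encode behaviour rather than plain values: `DATABASE_URL` is assembled from the `POSTGRES_*` parts, and `OST_LINKER_REQUIRE_SERVICE_TOKEN=true` is documented as making the process exit at startup when `OST_LINKER_SERVICE_TOKEN` is empty. The following is a minimal sketch of both, under stated assumptions. The real check lives in `src/services/api/main.py`, which is not shown in this diff, so `build_database_url` and `check_service_token` are hypothetical names and the parsing of `true`/`false` is assumed.

```python
from collections.abc import Mapping
from urllib.parse import quote


def build_database_url(env: Mapping[str, str]) -> str:
    """Assemble DATABASE_URL from the POSTGRES_* parts (hypothetical helper)."""
    # Percent-encode credentials so characters like "@" or ":" stay unambiguous.
    user = quote(env["POSTGRES_USER"], safe="")
    password = quote(env["POSTGRES_PASSWORD"], safe="")
    host = env.get("POSTGRES_HOST", "localhost")
    port = env["POSTGRES_PORT"]
    database = env["POSTGRES_DB"]
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"


def check_service_token(env: Mapping[str, str]) -> None:
    """Fail fast when strict mode is on but no token is set (assumed semantics)."""
    flag = env.get("OST_LINKER_REQUIRE_SERVICE_TOKEN", "false")
    require = flag.strip().lower() == "true"
    token = env.get("OST_LINKER_SERVICE_TOKEN", "").strip()
    if require and not token:
        raise SystemExit(
            "OST_LINKER_REQUIRE_SERVICE_TOKEN=true but OST_LINKER_SERVICE_TOKEN is empty"
        )
```

Passing `os.environ` to either function works, since it is a `Mapping[str, str]`. Taking the mapping as a parameter keeps the helpers easy to test without touching the real environment.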
1 change: 1 addition & 0 deletions .github/CODEOWNERS
@@ -0,0 +1 @@
* @spideystreet
66 changes: 66 additions & 0 deletions .github/ISSUE_TEMPLATE/bug_report.yml
@@ -0,0 +1,66 @@
name: Bug Report
description: Report a bug or unexpected behavior
labels: ["bug"]
body:
  - type: markdown
    attributes:
      value: |
        Thanks for taking the time to report a bug! Please fill out the sections below.

  - type: textarea
    id: description
    attributes:
      label: Description
      description: A clear and concise description of the bug.
    validations:
      required: true

  - type: textarea
    id: steps
    attributes:
      label: Steps to Reproduce
      description: How can we reproduce this issue?
      placeholder: |
        1. Run `make dev`
        2. Navigate to ...
        3. See error
    validations:
      required: true

  - type: textarea
    id: expected
    attributes:
      label: Expected Behavior
      description: What did you expect to happen?
    validations:
      required: true

  - type: textarea
    id: actual
    attributes:
      label: Actual Behavior
      description: What actually happened?
    validations:
      required: true

  - type: textarea
    id: environment
    attributes:
      label: Environment
      description: Any relevant environment details.
      placeholder: |
        - OS: Ubuntu 22.04 / macOS 14 / Windows 11 (WSL2)
        - Python: 3.11
        - Docker: 24.x
        - Go: 1.24
    validations:
      required: false

  - type: textarea
    id: logs
    attributes:
      label: Logs / Error Output
      description: Paste any relevant log output or stack traces.
      render: shell
    validations:
      required: false
5 changes: 5 additions & 0 deletions .github/ISSUE_TEMPLATE/config.yml
@@ -0,0 +1,5 @@
blank_issues_enabled: false
contact_links:
  - name: Contributing Guide
    url: https://github.com/opensource-together/ost-linker/blob/staging/CONTRIBUTING.md
    about: Read our contributing guide before opening an issue
40 changes: 40 additions & 0 deletions .github/ISSUE_TEMPLATE/feature_request.yml
@@ -0,0 +1,40 @@
name: Feature Request
description: Suggest a new feature or improvement
labels: ["enhancement"]
body:
  - type: markdown
    attributes:
      value: |
        Thanks for suggesting a feature! Please fill out the sections below.

  - type: textarea
    id: summary
    attributes:
      label: Summary
      description: A brief description of the feature.
    validations:
      required: true

  - type: textarea
    id: motivation
    attributes:
      label: Motivation / Use Case
      description: Why is this feature needed? What problem does it solve?
    validations:
      required: true

  - type: textarea
    id: solution
    attributes:
      label: Proposed Solution
      description: How would you like this to work?
    validations:
      required: false

  - type: textarea
    id: alternatives
    attributes:
      label: Alternatives Considered
      description: Any alternative solutions or features you've considered.
    validations:
      required: false
16 changes: 16 additions & 0 deletions .github/pull_request_template.md
@@ -0,0 +1,16 @@
## Summary

<!-- What does this PR do? Link related issues with "Closes #123". -->

## Changes

<!-- List the key changes made in this PR. -->

-

## Checklist

- [ ] **`make ci-check`** passes (matches GitHub Actions Python quality job: ruff, format, mypy, unit + API + Dagster smoke — see [CONTRIBUTING.md](../CONTRIBUTING.md))
- [ ] If you changed `dbt/`: `cd dbt && uv run dbt parse` (and `dbt test` when you have a matching DB)
- [ ] Commits are atomic and follow [Conventional Commits](https://www.conventionalcommits.org/)
- [ ] PR targets **`staging`** (not `main`)
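
The checklist asks for commits that follow Conventional Commits. As a rough illustration of what that means for a commit header, here is a minimal sketch of a header check. The allowed type list is an assumption (the commonly used Angular-style set, not something this repository defines), and `parse_header` is a hypothetical helper, not part of `make ci-check`.

```python
import re

# Assumed type list; the Conventional Commits spec itself only mandates feat and fix.
ALLOWED_TYPES = (
    "build", "chore", "ci", "docs", "feat", "fix",
    "perf", "refactor", "revert", "style", "test",
)

# type, optional (scope), optional "!" for breaking changes, then ": subject"
HEADER_RE = re.compile(
    r"^(?P<type>[a-z]+)"
    r"(?:\((?P<scope>[^()\s]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<subject>\S.*)$"
)


def parse_header(header: str) -> "dict[str, str | None] | None":
    """Return the parsed parts of a commit header, or None if it does not conform."""
    match = HEADER_RE.match(header)
    if match is None or match.group("type") not in ALLOWED_TYPES:
        return None
    return match.groupdict()
```

Applied to headers from this PR's commit list, `fix(dbt): restore missing CTE definition in stg_public_project` parses with scope `dbt`, while merge commits and headers using a type outside the assumed list (such as `config:`) are rejected.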
58 changes: 0 additions & 58 deletions .github/workflows/deploy-docs.yml

This file was deleted.

14 changes: 13 additions & 1 deletion .github/workflows/publish-develop.yml
@@ -8,8 +8,15 @@ on:
       - staging

 jobs:
-  publish:
+  checks:
+    uses: ./.github/workflows/quality-checks.yml
+    secrets:
+      OST_LINKER_SYNC_TOKEN: ${{ secrets.OST_LINKER_SYNC_TOKEN }}
+
+  build:
     runs-on: ubuntu-latest
+    needs: [checks]
+    if: github.event_name == 'push'
     permissions:
       contents: read
       packages: write
@@ -34,6 +41,9 @@ jobs:
             type=sha
             type=raw,value=develop

+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v3
+
       - name: Upload server artifact
         uses: docker/build-push-action@v6
         with:
@@ -42,3 +52,5 @@ jobs:
           push: true
           tags: ${{ steps.meta.outputs.tags }}
           labels: ${{ steps.meta.outputs.labels }}
+          cache-from: type=gha
+          cache-to: type=gha,mode=max