Doc/update-2 #56
Merged
12 commits
cdd5d9e Enhance Documentation on Data Flows and Indexing Schedules (stewartshea)
d466e80 Refactor Docker Compose and Update Documentation for Vector Search In… (stewartshea)
1870eba Refactor Vector Services and Enhance Embedding Handling (stewartshea)
aa1921d Refactor Vector Search Endpoints to Synchronous Functions (stewartshea)
001bb80 Add Documentation Sources and Enhance Vector Indexing (stewartshea)
1e3a5c9 Refactor Configuration and Vector Models for Consistency (stewartshea)
2495999 Enhance Codebundle Indexing with Orphan Handling and Duplicate Detection (stewartshea)
308377c Enhance Vector Services with Thread Safety and Validation (stewartshea)
345bae5 Update Documentation Sources and Configuration Paths (stewartshea)
e21ab92 Enhance Search Input Styling Across Pages (stewartshea)
0297d11 Sort Embedding Responses and Enhance Vector Service Validation (stewartshea)
b56074f Enhance Error Handling in Web Crawler (stewartshea)
@@ -0,0 +1,61 @@
"""
SQLAlchemy models for pgvector tables.

Maps to the tables created by database/migrations/006_add_pgvector.sql.
"""
from sqlalchemy import Column, String, Text, DateTime, func, text
from sqlalchemy.dialects.postgresql import JSONB
from pgvector.sqlalchemy import Vector

from app.core.database import Base

# Must match the migration (006_add_pgvector.sql) and the Azure OpenAI
# text-embedding-3-small model output. Do NOT change without also
# altering the migration and rebuilding all vector tables.
EMBEDDING_DIMENSIONS = 1536

_JSONB_EMPTY = text("'{}'::jsonb")


class VectorCodebundle(Base):
    __tablename__ = "vector_codebundles"

    id = Column(String, primary_key=True)
    embedding = Column(Vector(EMBEDDING_DIMENSIONS))
    document = Column(Text)
    metadata_ = Column("metadata", JSONB, nullable=False, server_default=_JSONB_EMPTY)
    created_at = Column(DateTime(timezone=True), server_default=func.now())
    updated_at = Column(DateTime(timezone=True), server_default=func.now(), onupdate=func.now())


class VectorCodecollection(Base):
    __tablename__ = "vector_codecollections"

    id = Column(String, primary_key=True)
    embedding = Column(Vector(EMBEDDING_DIMENSIONS))
    document = Column(Text)
    metadata_ = Column("metadata", JSONB, nullable=False, server_default=_JSONB_EMPTY)
    created_at = Column(DateTime(timezone=True), server_default=func.now())
    updated_at = Column(DateTime(timezone=True), server_default=func.now(), onupdate=func.now())


class VectorLibrary(Base):
    __tablename__ = "vector_libraries"

    id = Column(String, primary_key=True)
    embedding = Column(Vector(EMBEDDING_DIMENSIONS))
    document = Column(Text)
    metadata_ = Column("metadata", JSONB, nullable=False, server_default=_JSONB_EMPTY)
    created_at = Column(DateTime(timezone=True), server_default=func.now())
    updated_at = Column(DateTime(timezone=True), server_default=func.now(), onupdate=func.now())


class VectorDocumentation(Base):
    __tablename__ = "vector_documentation"

    id = Column(String, primary_key=True)
    embedding = Column(Vector(EMBEDDING_DIMENSIONS))
    document = Column(Text)
    metadata_ = Column("metadata", JSONB, nullable=False, server_default=_JSONB_EMPTY)
    created_at = Column(DateTime(timezone=True), server_default=func.now())
    updated_at = Column(DateTime(timezone=True), server_default=func.now(), onupdate=func.now())
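
The comment on `EMBEDDING_DIMENSIONS` ties the column width to both the migration and the embedding model, so a vector of any other length should never reach these tables. A minimal sketch of a guard that could run before a row is written is shown here. `validate_embedding` is a hypothetical helper introduced for illustration only; it is not part of this PR, and the constant is redeclared so the snippet runs on its own.

```python
from typing import List, Sequence

# Mirrors the constant in the models file; redeclared so this sketch is self-contained.
EMBEDDING_DIMENSIONS = 1536


def validate_embedding(
    vec: Sequence[float], expected: int = EMBEDDING_DIMENSIONS
) -> List[float]:
    """Return the vector as a list of floats, or raise if its length is wrong.

    Hypothetical helper: the PR's own validation logic is not shown in this diff.
    """
    if len(vec) != expected:
        raise ValueError(
            f"Embedding has {len(vec)} dimensions, expected {expected}"
        )
    return [float(x) for x in vec]
```

Failing early with a clear message is arguably easier to debug than letting the database reject the insert, since the error then names the mismatch directly.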
@@ -0,0 +1,211 @@
"""
Vector search API endpoints.

Exposes semantic (embedding-based) search over codebundles, codecollections,
libraries, and documentation. Used by the MCP server and the frontend chat.
"""
import logging
from typing import Any, Dict, List, Optional

from fastapi import APIRouter, Depends, HTTPException, Query
from sqlalchemy.orm import Session

from app.core.database import get_db
from app.services.embedding_service import get_embedding_service
from app.services.vector_service import VectorSearchResult, get_vector_service

logger = logging.getLogger(__name__)

router = APIRouter(prefix="/api/v1/vector", tags=["vector-search"])


def _result_to_dict(r: VectorSearchResult) -> Dict[str, Any]:
    return {
        "id": r.id,
        "document": r.document[:500],
        "metadata": r.metadata,
        "score": round(r.score, 4),
        "distance": round(r.distance, 4),
    }


# --------------------------------------------------------------------------
# Unified semantic search
# --------------------------------------------------------------------------

@router.get("/search")
def semantic_search(
    query: str,
    tables: Optional[str] = Query(
        None,
        description="Comma-separated table keys to search (codebundles,codecollections,libraries,documentation). Default: all.",
    ),
    max_results: int = Query(10, ge=1, le=50),
    platform: Optional[str] = None,
    category: Optional[str] = None,
    db: Session = Depends(get_db),
):
    """Run a semantic similarity search across one or more vector tables."""
    embed_svc = get_embedding_service()
    vec_svc = get_vector_service()

    if not embed_svc.available:
        raise HTTPException(
            status_code=503,
            detail="Embedding service is not configured. Set AZURE_OPENAI_EMBEDDING_* environment variables.",
        )

    query_embedding = embed_svc.embed_text(query)
    if not query_embedding:
        raise HTTPException(status_code=500, detail="Failed to generate query embedding")

    table_keys = [t.strip() for t in tables.split(",")] if tables else None

    if table_keys:
        valid_keys = {"codebundles", "codecollections", "libraries", "documentation"}
        invalid = set(table_keys) - valid_keys
        if invalid:
            raise HTTPException(status_code=400, detail=f"Invalid table keys: {invalid}")

    filters: Optional[Dict[str, str]] = {}
    if platform:
        filters["platform"] = platform
    if category:
        filters["category"] = category

    results_map = vec_svc.search_all(
        query_embedding, n_results=max_results, table_keys=table_keys,
        metadata_filters=filters or None, db=db,
    )

    output: Dict[str, Any] = {}
    for key, results in results_map.items():
        output[key] = [_result_to_dict(r) for r in results]

    return output


# --------------------------------------------------------------------------
# Per-table endpoints
# --------------------------------------------------------------------------

@router.get("/search/codebundles")
def search_codebundles(
    query: str,
    max_results: int = Query(10, ge=1, le=50),
    platform: Optional[str] = None,
    collection_slug: Optional[str] = None,
    db: Session = Depends(get_db),
):
    """Semantic search over codebundles."""
    embed_svc = get_embedding_service()
    vec_svc = get_vector_service()

    if not embed_svc.available:
        raise HTTPException(status_code=503, detail="Embedding service not configured")

    query_embedding = embed_svc.embed_text(query)
    if not query_embedding:
        raise HTTPException(status_code=500, detail="Embedding generation failed")

    filters: Optional[Dict[str, str]] = {}
    if platform:
        filters["platform"] = platform
    if collection_slug:
        filters["collection_slug"] = collection_slug

    results = vec_svc.search(
        "codebundles", query_embedding, n_results=max_results,
        metadata_filters=filters or None, db=db,
    )
    return {"results": [_result_to_dict(r) for r in results], "query": query}


@router.get("/search/documentation")
def search_documentation(
    query: str,
    max_results: int = Query(10, ge=1, le=50),
    category: Optional[str] = None,
    db: Session = Depends(get_db),
):
    """Semantic search over documentation."""
    embed_svc = get_embedding_service()
    vec_svc = get_vector_service()

    if not embed_svc.available:
        raise HTTPException(status_code=503, detail="Embedding service not configured")

    query_embedding = embed_svc.embed_text(query)
    if not query_embedding:
        raise HTTPException(status_code=500, detail="Embedding generation failed")

    filters = {"category": category} if category else None
    results = vec_svc.search(
        "documentation", query_embedding, n_results=max_results,
        metadata_filters=filters, db=db,
    )
    return {"results": [_result_to_dict(r) for r in results], "query": query}


@router.get("/search/libraries")
def search_libraries(
    query: str,
    max_results: int = Query(10, ge=1, le=50),
    category: Optional[str] = None,
    db: Session = Depends(get_db),
):
    """Semantic search over libraries."""
    embed_svc = get_embedding_service()
    vec_svc = get_vector_service()

    if not embed_svc.available:
        raise HTTPException(status_code=503, detail="Embedding service not configured")

    query_embedding = embed_svc.embed_text(query)
    if not query_embedding:
        raise HTTPException(status_code=500, detail="Embedding generation failed")

    filters = {"category": category} if category else None
    results = vec_svc.search(
        "libraries", query_embedding, n_results=max_results,
        metadata_filters=filters, db=db,
    )
    return {"results": [_result_to_dict(r) for r in results], "query": query}


# --------------------------------------------------------------------------
# Stats / health
# --------------------------------------------------------------------------

@router.get("/stats")
def vector_stats(db: Session = Depends(get_db)):
    """Return row counts for each vector table."""
    vec_svc = get_vector_service()
    return vec_svc.get_stats(db=db)


@router.post("/reindex")
async def trigger_reindex():
    """Trigger a full reindex (async Celery task)."""
    from app.tasks.indexing_tasks import reindex_all_task

    task = reindex_all_task.apply_async()
    return {"task_id": task.id, "status": "queued"}


@router.post("/reindex/codebundles")
async def trigger_reindex_codebundles():
    """Trigger codebundle reindexing."""
    from app.tasks.indexing_tasks import index_codebundles_task

    task = index_codebundles_task.apply_async()
    return {"task_id": task.id, "status": "queued"}


@router.post("/reindex/documentation")
async def trigger_reindex_documentation():
    """Trigger documentation reindexing."""
    from app.tasks.indexing_tasks import index_documentation_task

    task = index_documentation_task.apply_async()
    return {"task_id": task.id, "status": "queued"}
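
The search endpoints above each repeat two small steps: splitting and validating the comma-separated `tables` value, and collecting only the metadata filters that were actually supplied. The sketch below isolates those steps as pure functions so they can be read and tested without FastAPI or a database. `parse_table_keys`, `build_filters`, and `VALID_TABLE_KEYS` are hypothetical names for illustration and do not appear in this PR. Unlike the endpoint code, this sketch drops empty segments (for example from a trailing comma) and sorts invalid keys so the error message is deterministic; both are assumptions, not the PR's behavior.

```python
from typing import Dict, List, Optional

# The four table keys accepted by the unified /search endpoint.
VALID_TABLE_KEYS = frozenset(
    {"codebundles", "codecollections", "libraries", "documentation"}
)


def parse_table_keys(tables: Optional[str]) -> Optional[List[str]]:
    """Split a comma-separated string into validated table keys.

    Returns None when nothing usable was supplied, meaning "search all tables".
    Raises ValueError on unknown keys; an endpoint could map that to HTTP 400.
    """
    if not tables:
        return None
    # Assumption: empty segments are ignored rather than rejected.
    keys = [t.strip() for t in tables.split(",") if t.strip()]
    invalid = sorted(set(keys) - VALID_TABLE_KEYS)
    if invalid:
        raise ValueError(f"Invalid table keys: {invalid}")
    return keys or None


def build_filters(**candidates: Optional[str]) -> Optional[Dict[str, str]]:
    """Keep only the filters with a non-empty value; None if none remain."""
    filters = {name: value for name, value in candidates.items() if value}
    return filters or None
```

With helpers along these lines, each endpoint body would reduce to embedding the query, calling the vector service, and shaping the response, for example `build_filters(platform=platform, collection_slug=collection_slug)` in place of the inline `if` chains.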