[FEATURE] PgVectorProvider for PostgreSQL with pgvector Extension

**Is your feature request related to a problem? Please describe.**

Currently, `toolpack-knowledge` only supports two providers:
- `MemoryProvider` - In-memory, no persistence, limited to small datasets
- `PersistentKnowledgeProvider` - SQLite-backed, single-process, not suitable for distributed systems

For production deployments that require:
- Multi-instance horizontal scaling
- Shared knowledge base across multiple agents
- High-performance vector search at scale
- Integration with existing PostgreSQL infrastructure

There is no suitable provider. PostgreSQL with the pgvector extension is the most widely deployed production-ready vector database and should be supported.

**Describe the solution you'd like**

Implement `PgVectorProvider` that uses PostgreSQL with the pgvector extension for vector storage and retrieval.

## Implementation Details

### 1. Create PgVectorProvider Class

**File:** `packages/toolpack-knowledge/src/providers/pgvector.ts`

Implement a provider that:
- Uses `pg` (node-postgres) for database connectivity
- Supports both connection string and individual connection parameters
- Implements connection pooling for multi-instance deployments
- Configures pool size, idle timeout, and connection timeout
- Handles pool errors gracefully

**Configuration Options:**
- `connectionString` - PostgreSQL connection string
- `host`, `port`, `database`, `user`, `password` - Individual connection params
- `ssl` - SSL configuration (boolean or object)
- `table` - Table name (default: 'knowledge_chunks')
- `schema` - Schema name (default: 'public')
- `poolSize` - Connection pool size (default: 10)
- `idleTimeoutMs` - Idle connection timeout (default: 30000)
- `connectionTimeoutMs` - Connection timeout (default: 2000)

### 2. Database Initialization

**Automatic Setup:**
- Check for pgvector extension, throw helpful error if missing
- Create schema if it doesn't exist (for non-public schemas)
- Create table with columns: id (TEXT PRIMARY KEY), content (TEXT), metadata (JSONB), vector (vector type), created_at, updated_at
- Create GIN index on metadata for efficient filtering
- Create HNSW index on vector column for fast similarity search
- Create updated_at trigger for automatic timestamp updates

**Table Schema:**
- Use JSONB for flexible metadata storage
- Use pgvector's `vector` type for embeddings
- Add timestamps for auditing
- Support upsert with ON CONFLICT DO UPDATE

### 3. Core Methods Implementation

**validateDimensions():**
- Query existing vectors to check dimensions
- Compare with embedder dimensions
- Throw `DimensionMismatchError` if mismatch detected
- Create HNSW vector index after dimensions are known
- Use cosine distance operator for similarity

**add():**
- Use transactions for batch operations
- Support upsert (INSERT ... ON CONFLICT DO UPDATE)
- Convert vector arrays to pgvector format
- Serialize metadata as JSONB
- Handle errors and rollback on failure

**query():**
- Use cosine distance operator (`<=>`) for similarity search
- Calculate similarity score as `1 - distance`
- Support metadata filtering with WHERE clauses
- Build dynamic SQL for complex filters ($in, $gt, $lt operators)
- Order by distance and limit results
- Filter by threshold score
- Optionally include metadata and vectors

**keywordQuery():**
- Use PostgreSQL full-text search (tsvector/tsquery)
- Rank results using ts_rank
- Support same filtering as vector query
- Enable hybrid search capabilities

**delete():**
- Delete by array of IDs using ANY operator
- Handle empty arrays gracefully

**clear():**
- TRUNCATE table for fast deletion
- Reset stored dimensions

**getAllChunks():**
- Fetch all chunks for migration/export
- Parse pgvector format back to number arrays

**close():**
- End connection pool
- Clean up resources

### 4. Metadata Filtering

Implement `buildFilterConditions()` helper to convert `MetadataFilter` to SQL:
- Simple equality: `metadata->>'key' = value`
- $in operator: `metadata->>'key' = ANY($1)`
- $gt operator: `(metadata->>'key')::numeric > $1`
- $lt operator: `(metadata->>'key')::numeric < $1`
- Combine multiple conditions with AND

### 5. Export from Index

**File:** `packages/toolpack-knowledge/src/index.ts`

Add exports for `PgVectorProvider` and `PgVectorProviderOptions`.

### 6. Add Dependencies

**File:** `packages/toolpack-knowledge/package.json`

Add dependencies:
- `pg` - PostgreSQL client for Node.js
- `@types/pg` - TypeScript types

### 7. Testing

**File:** `packages/toolpack-knowledge/src/__tests__/pgvector-provider.test.ts`

Create comprehensive tests (requires running PostgreSQL instance):
- **Add and retrieve chunks** - Basic CRUD operations
- **Metadata filtering** - Test equality, $in, $gt, $lt operators
- **Keyword search** - Test full-text search functionality
- **Dimension validation** - Test mismatch detection
- **Delete operations** - Test chunk deletion
- **Upsert behavior** - Test updating existing chunks
- **Connection pooling** - Test concurrent operations
- **Error handling** - Test failure scenarios
- **Schema support** - Test non-public schemas

Use environment variable `DATABASE_URL` for test database connection.
Skip tests if database not available.

### 8. Documentation

**File:** `packages/toolpack-knowledge/README.md`

Add section on PostgreSQL setup:
- Installation of pgvector extension
- Connection configuration examples
- Production deployment recommendations
- Performance tuning tips
- Multi-tenancy patterns

### 9. Migration Guide

**File:** `packages/toolpack-knowledge/docs/pgvector-migration.md`

Provide migration paths:
- From MemoryProvider (simple config change)
- From PersistentKnowledgeProvider (export/import chunks)
- Schema migration scripts
- Performance optimization guide

## Acceptance Criteria

- [ ] `PgVectorProvider` implements all `KnowledgeProvider` methods
- [ ] Automatic table and index creation
- [ ] pgvector extension check on initialization
- [ ] HNSW index for efficient vector search
- [ ] GIN index for metadata filtering
- [ ] Connection pooling support
- [ ] SSL connection support
- [ ] Schema support for multi-tenancy
- [ ] Metadata filtering with `$in`, `$gt`, `$lt` operators
- [ ] Keyword search using PostgreSQL full-text search
- [ ] Dimension validation
- [ ] Upsert support (ON CONFLICT DO UPDATE)
- [ ] Transaction support for batch operations
- [ ] Comprehensive tests (requires test database)
- [ ] Documentation with setup instructions
- [ ] Migration guide from other providers

**Describe alternatives you've considered**

1. **Supabase client:** Use Supabase's client library - rejected because it's vendor-specific and adds unnecessary abstraction
2. **TypeORM/Prisma:** Use an ORM - rejected to avoid heavy dependencies and maintain direct SQL control
3. **Raw SQL only:** No abstraction layer - rejected because we need consistent interface with other providers

**Additional context**

- PostgreSQL with pgvector is the most widely deployed production vector database
- HNSW index provides excellent performance for large datasets (millions of vectors)
- Native PostgreSQL features (JSONB, full-text search) enable powerful filtering
- Connection pooling is critical for multi-instance deployments
- Schema support enables multi-tenancy in a single database

**Dependencies:**
- `pg` - PostgreSQL client for Node.js
- `@types/pg` - TypeScript types
- Requires PostgreSQL 12+ with pgvector extension installed

**Related Issues:**
- Complements existing `MemoryProvider` and `PersistentKnowledgeProvider`
- Enables production-scale deployments
- Required for multi-instance agent architectures


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FEATURE] PgVectorProvider for PostgreSQL with pgvector Extension #28

Implementation Details

1. Create PgVectorProvider Class

2. Database Initialization

3. Core Methods Implementation

4. Metadata Filtering

5. Export from Index

6. Add Dependencies

7. Testing

8. Documentation

9. Migration Guide

Acceptance Criteria

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

[FEATURE] PgVectorProvider for PostgreSQL with pgvector Extension #28

Description

Implementation Details

1. Create PgVectorProvider Class

2. Database Initialization

3. Core Methods Implementation

4. Metadata Filtering

5. Export from Index

6. Add Dependencies

7. Testing

8. Documentation

9. Migration Guide

Acceptance Criteria

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions