Is your feature request related to a problem? Please describe.
Currently, toolpack-knowledge only supports two providers:
MemoryProvider - In-memory, no persistence, limited to small datasets
PersistentKnowledgeProvider - SQLite-backed, single-process, not suitable for distributed systems
For production deployments that require:
- Multi-instance horizontal scaling
- Shared knowledge base across multiple agents
- High-performance vector search at scale
- Integration with existing PostgreSQL infrastructure
There is no suitable provider. PostgreSQL with the pgvector extension is the most widely deployed production-ready vector database and should be supported.
Describe the solution you'd like
Implement PgVectorProvider that uses PostgreSQL with the pgvector extension for vector storage and retrieval.
Implementation Details
1. Create PgVectorProvider Class
File: packages/toolpack-knowledge/src/providers/pgvector.ts
Implement a provider that:
- Uses
pg (node-postgres) for database connectivity
- Supports both connection string and individual connection parameters
- Implements connection pooling for multi-instance deployments
- Configures pool size, idle timeout, and connection timeout
- Handles pool errors gracefully
Configuration Options:
connectionString - PostgreSQL connection string
host, port, database, user, password - Individual connection params
ssl - SSL configuration (boolean or object)
table - Table name (default: 'knowledge_chunks')
schema - Schema name (default: 'public')
poolSize - Connection pool size (default: 10)
idleTimeoutMs - Idle connection timeout (default: 30000)
connectionTimeoutMs - Connection timeout (default: 2000)
2. Database Initialization
Automatic Setup:
- Check for pgvector extension, throw helpful error if missing
- Create schema if it doesn't exist (for non-public schemas)
- Create table with columns: id (TEXT PRIMARY KEY), content (TEXT), metadata (JSONB), vector (vector type), created_at, updated_at
- Create GIN index on metadata for efficient filtering
- Create HNSW index on vector column for fast similarity search
- Create updated_at trigger for automatic timestamp updates
Table Schema:
- Use JSONB for flexible metadata storage
- Use pgvector's
vector type for embeddings
- Add timestamps for auditing
- Support upsert with ON CONFLICT DO UPDATE
3. Core Methods Implementation
validateDimensions():
- Query existing vectors to check dimensions
- Compare with embedder dimensions
- Throw
DimensionMismatchError if mismatch detected
- Create HNSW vector index after dimensions are known
- Use cosine distance operator for similarity
add():
- Use transactions for batch operations
- Support upsert (INSERT ... ON CONFLICT DO UPDATE)
- Convert vector arrays to pgvector format
- Serialize metadata as JSONB
- Handle errors and rollback on failure
query():
- Use cosine distance operator (
<=>) for similarity search
- Calculate similarity score as
1 - distance
- Support metadata filtering with WHERE clauses
- Build dynamic SQL for complex filters ($in, $gt, $lt operators)
- Order by distance and limit results
- Filter by threshold score
- Optionally include metadata and vectors
keywordQuery():
- Use PostgreSQL full-text search (tsvector/tsquery)
- Rank results using ts_rank
- Support same filtering as vector query
- Enable hybrid search capabilities
delete():
- Delete by array of IDs using ANY operator
- Handle empty arrays gracefully
clear():
- TRUNCATE table for fast deletion
- Reset stored dimensions
getAllChunks():
- Fetch all chunks for migration/export
- Parse pgvector format back to number arrays
close():
- End connection pool
- Clean up resources
4. Metadata Filtering
Implement buildFilterConditions() helper to convert MetadataFilter to SQL:
- Simple equality:
metadata->>'key' = value
- $in operator:
metadata->>'key' = ANY($1)
- $gt operator:
(metadata->>'key')::numeric > $1
- $lt operator:
(metadata->>'key')::numeric < $1
- Combine multiple conditions with AND
5. Export from Index
File: packages/toolpack-knowledge/src/index.ts
Add exports for PgVectorProvider and PgVectorProviderOptions.
6. Add Dependencies
File: packages/toolpack-knowledge/package.json
Add dependencies:
pg - PostgreSQL client for Node.js
@types/pg - TypeScript types
7. Testing
File: packages/toolpack-knowledge/src/__tests__/pgvector-provider.test.ts
Create comprehensive tests (requires running PostgreSQL instance):
- Add and retrieve chunks - Basic CRUD operations
- Metadata filtering - Test equality, $in, $gt, $lt operators
- Keyword search - Test full-text search functionality
- Dimension validation - Test mismatch detection
- Delete operations - Test chunk deletion
- Upsert behavior - Test updating existing chunks
- Connection pooling - Test concurrent operations
- Error handling - Test failure scenarios
- Schema support - Test non-public schemas
Use environment variable DATABASE_URL for test database connection.
Skip tests if database not available.
8. Documentation
File: packages/toolpack-knowledge/README.md
Add section on PostgreSQL setup:
- Installation of pgvector extension
- Connection configuration examples
- Production deployment recommendations
- Performance tuning tips
- Multi-tenancy patterns
9. Migration Guide
File: packages/toolpack-knowledge/docs/pgvector-migration.md
Provide migration paths:
- From MemoryProvider (simple config change)
- From PersistentKnowledgeProvider (export/import chunks)
- Schema migration scripts
- Performance optimization guide
Acceptance Criteria
Describe alternatives you've considered
- Supabase client: Use Supabase's client library - rejected because it's vendor-specific and adds unnecessary abstraction
- TypeORM/Prisma: Use an ORM - rejected to avoid heavy dependencies and maintain direct SQL control
- Raw SQL only: No abstraction layer - rejected because we need consistent interface with other providers
Additional context
- PostgreSQL with pgvector is the most widely deployed production vector database
- HNSW index provides excellent performance for large datasets (millions of vectors)
- Native PostgreSQL features (JSONB, full-text search) enable powerful filtering
- Connection pooling is critical for multi-instance deployments
- Schema support enables multi-tenancy in a single database
Dependencies:
pg - PostgreSQL client for Node.js
@types/pg - TypeScript types
- Requires PostgreSQL 12+ with pgvector extension installed
Related Issues:
- Complements existing
MemoryProvider and PersistentKnowledgeProvider
- Enables production-scale deployments
- Required for multi-instance agent architectures
Is your feature request related to a problem? Please describe.
Currently,
toolpack-knowledgeonly supports two providers:MemoryProvider- In-memory, no persistence, limited to small datasetsPersistentKnowledgeProvider- SQLite-backed, single-process, not suitable for distributed systemsFor production deployments that require:
There is no suitable provider. PostgreSQL with the pgvector extension is the most widely deployed production-ready vector database and should be supported.
Describe the solution you'd like
Implement
PgVectorProviderthat uses PostgreSQL with the pgvector extension for vector storage and retrieval.Implementation Details
1. Create PgVectorProvider Class
File:
packages/toolpack-knowledge/src/providers/pgvector.tsImplement a provider that:
pg(node-postgres) for database connectivityConfiguration Options:
connectionString- PostgreSQL connection stringhost,port,database,user,password- Individual connection paramsssl- SSL configuration (boolean or object)table- Table name (default: 'knowledge_chunks')schema- Schema name (default: 'public')poolSize- Connection pool size (default: 10)idleTimeoutMs- Idle connection timeout (default: 30000)connectionTimeoutMs- Connection timeout (default: 2000)2. Database Initialization
Automatic Setup:
Table Schema:
vectortype for embeddings3. Core Methods Implementation
validateDimensions():
DimensionMismatchErrorif mismatch detectedadd():
query():
<=>) for similarity search1 - distancekeywordQuery():
delete():
clear():
getAllChunks():
close():
4. Metadata Filtering
Implement
buildFilterConditions()helper to convertMetadataFilterto SQL:metadata->>'key' = valuemetadata->>'key' = ANY($1)(metadata->>'key')::numeric > $1(metadata->>'key')::numeric < $15. Export from Index
File:
packages/toolpack-knowledge/src/index.tsAdd exports for
PgVectorProviderandPgVectorProviderOptions.6. Add Dependencies
File:
packages/toolpack-knowledge/package.jsonAdd dependencies:
pg- PostgreSQL client for Node.js@types/pg- TypeScript types7. Testing
File:
packages/toolpack-knowledge/src/__tests__/pgvector-provider.test.tsCreate comprehensive tests (requires running PostgreSQL instance):
Use environment variable
DATABASE_URLfor test database connection.Skip tests if database not available.
8. Documentation
File:
packages/toolpack-knowledge/README.mdAdd section on PostgreSQL setup:
9. Migration Guide
File:
packages/toolpack-knowledge/docs/pgvector-migration.mdProvide migration paths:
Acceptance Criteria
PgVectorProviderimplements allKnowledgeProvidermethods$in,$gt,$ltoperatorsDescribe alternatives you've considered
Additional context
Dependencies:
pg- PostgreSQL client for Node.js@types/pg- TypeScript typesRelated Issues:
MemoryProviderandPersistentKnowledgeProvider