Skip to content

Latest commit

 

History

History
263 lines (196 loc) · 7.44 KB

File metadata and controls

263 lines (196 loc) · 7.44 KB

InstantIndex Example Usage

Quick Start

# Install dependencies
npm install

# Build
npm run build

# Analyze your database
node dist/cli.js analyze postgresql://user:pass@localhost:5432/mydb

Prerequisites

Before running InstantIndex, your PostgreSQL database needs:

1. Enable pg_stat_statements

Add to postgresql.conf:

shared_preload_libraries = 'pg_stat_statements'

Restart PostgreSQL, then:

CREATE EXTENSION pg_stat_statements;

2. Install hypopg (required)

CREATE EXTENSION hypopg;

hypopg is required for InstantIndex to create hypothetical indexes without disk I/O.

Example Session

$ node dist/cli.js analyze postgresql://localhost/ecommerce

InstantIndex - Automatic Database Index Optimization

Connecting to database...
✓ Connected

Checking required extensions...
✓ hypopg available (fast mode)

Analyzing database schema...
✓ Found 12 tables, 87 columns, 15 existing indexes

Analyzing query workload...
✓ Found 247 unique queries (1,245,789 total executions)

Generating index candidates...
✓ Generated 342 candidate indexes

Finding optimal index configuration...

Baseline cost: 487,392 units

Testing 342 index candidates...

  Added: users(email,status) - improvement: 24.3%
  Added: orders(user_id,created_at DESC) - improvement: 18.7%
  Added: events(city,date) - improvement: 5.2%
  Pruned redundant: users(email)

Progress |████████████████████████████████| 100% || 342/342 candidates tested

================================================================================
OPTIMIZATION RESULTS
================================================================================

CURRENT STATE:
  Indexes: 15
  Total cost: 487,392 units/day

RECOMMENDED CONFIGURATION:
  Indexes: 17 (3 new)
  Total cost: 124,891 units/day
  Improvement: 74.4% faster
  Storage overhead: 167.45 MB
  Write overhead: +6.0%

NEW INDEXES TO CREATE:

1. CREATE INDEX idx_users_email_status ON users (email, status);
   Affects: 3 query patterns (55,234 queries/day)
   - SELECT * FROM users WHERE email = $1 (45000/day: 1200ms → 28ms, 42.9x faster)
   - SELECT * FROM users WHERE email = $1 AND status = $2 (8500/day: 890ms → 12ms, 74.2x faster)
   - SELECT * FROM users WHERE email = $1 ORDER BY status (1734/day: 2100ms → 45ms, 46.7x faster)

2. CREATE INDEX idx_orders_user_created ON orders (user_id, created_at DESC);
   Affects: 2 query patterns (89,234 queries/day)
   - SELECT * FROM orders WHERE user_id = $1 ORDER BY created_at DESC (85000/day: 450ms → 8ms, 56.3x faster)
   - SELECT * FROM orders WHERE user_id = $1 AND created_at > $2 (4234/day: 1800ms → 15ms, 120.0x faster)

3. CREATE INDEX idx_events_city_date ON events (city, date) WHERE date > NOW() - INTERVAL '30 days';
   Affects: 1 query pattern (12,456 queries/day)
   - Dashboard recent events query (12456/day: 2300ms → 18ms, 127.8x faster)

REDUNDANT INDEXES (can be removed):
  1. idx_users_email on users(email)
     Made redundant by new composite indexes

TOP IMPACTED QUERIES:

1. SELECT * FROM orders WHERE user_id = $1 ORDER BY created_at DESC LIMIT 10
   Frequency: 85,000/day
   Current: 450ms → Optimized: 8ms
   Speedup: 56.3x (saves 10.5h/day)

2. SELECT * FROM users WHERE email = $1
   Frequency: 45,000/day
   Current: 1200ms → Optimized: 28ms
   Speedup: 42.9x (saves 14.7h/day)

[... more queries ...]

================================================================================

Apply these changes? (y/n): y

Creating indexes...
  CREATE INDEX CONCURRENTLY idx_users_email_status ON users (email, status);
  ✓ Created idx_users_email_status
  CREATE INDEX CONCURRENTLY idx_orders_user_created ON orders (user_id, created_at DESC);
  ✓ Created idx_orders_user_created
  CREATE INDEX CONCURRENTLY idx_events_city_date ON events (city, date) WHERE date > NOW() - INTERVAL '30 days';
  ✓ Created idx_events_city_date

✓ Database optimized!

Advanced Usage

Limit storage overhead

node dist/cli.js analyze postgresql://localhost/mydb --max-storage 200MB

Limit write overhead

node dist/cli.js analyze postgresql://localhost/mydb --max-write-overhead 5

Auto-apply without prompting

node dist/cli.js analyze postgresql://localhost/mydb --apply

Combine constraints

node dist/cli.js analyze postgresql://localhost/mydb \
  --max-storage 500MB \
  --max-write-overhead 10 \
  --max-indexes 5

How It Works

  1. Reads schema: Introspects tables, columns, existing indexes
  2. Reads workload: Gets query statistics from pg_stat_statements
  3. Generates candidates: Analyzes WHERE/JOIN/ORDER BY clauses to suggest indexes
  4. Calculates costs: Uses PostgreSQL's EXPLAIN to get exact query costs
  5. Optimizes globally: Finds the set of indexes that minimizes total cost across ALL queries
  6. Shows impact: Displays which queries benefit from each index

The key insight: PostgreSQL's query planner has deterministic cost functions. We don't need to run queries or use ML - we can mathematically calculate the optimal index configuration.

Testing with Sample Data

To test InstantIndex, you need a database with:

  • Some tables with data
  • pg_stat_statements enabled
  • Some query history

Here's a quick test setup:

-- Enable extensions
CREATE EXTENSION pg_stat_statements;
CREATE EXTENSION hypopg;

-- Create sample tables
CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  email VARCHAR(255),
  status VARCHAR(50),
  created_at TIMESTAMP
);

CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  user_id INTEGER,
  total DECIMAL,
  created_at TIMESTAMP
);

-- Insert sample data
INSERT INTO users (email, status, created_at)
SELECT
  'user' || i || '@example.com',
  CASE WHEN random() < 0.3 THEN 'active' ELSE 'inactive' END,
  NOW() - (random() * INTERVAL '1 year')
FROM generate_series(1, 100000) i;

INSERT INTO orders (user_id, total, created_at)
SELECT
  (random() * 100000)::int + 1,
  (random() * 1000)::numeric(10,2),
  NOW() - (random() * INTERVAL '1 year')
FROM generate_series(1, 500000) i;

-- Run some queries to populate pg_stat_statements
SELECT * FROM users WHERE email = 'user12345@example.com';
SELECT * FROM users WHERE email LIKE '%@example.com' AND status = 'active';
SELECT * FROM orders WHERE user_id = 42 ORDER BY created_at DESC LIMIT 10;
SELECT u.*, COUNT(o.id)
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id
HAVING COUNT(o.id) > 5;

-- Now run InstantIndex
-- node dist/cli.js analyze postgresql://localhost/testdb

Next Steps

After running InstantIndex:

  1. Review the recommendations - Make sure they make sense for your workload
  2. Test in staging first - Don't apply directly to production
  3. Monitor performance - Verify the indexes help as predicted
  4. Re-run periodically - As your workload changes, optimal indexes may change

Troubleshooting

"pg_stat_statements extension not found"

  • Add to postgresql.conf: shared_preload_libraries = 'pg_stat_statements'
  • Restart PostgreSQL
  • Run CREATE EXTENSION pg_stat_statements;

"No query data found"

  • Make sure your application has been running and executing queries
  • Check SELECT * FROM pg_stat_statements; to verify data exists

"Permission denied"

  • Ensure your database user has permission to:
    • Create indexes (CREATE)
    • Query statistics tables
    • Use extensions

Index creation fails

  • Check disk space
  • Verify table exists and is accessible
  • Try without CONCURRENTLY if in development