Status: Complete - v0.2 Feature
Date: 2026-06-20
SQL cell execution provides native support for SQL queries directly in notebooks. Write SQL cells using standard markers and get optimized query suggestions.
Two ways to mark a cell as SQL:
Option 1: SQL Comment Marker
--sql
SELECT * FROM users WHERE created_at > '2024-01-01'Option 2: Jupyter Magic
%sql
SELECT COUNT(*) as user_count FROM usersPrismNote supports 5 OSS-compliant databases out-of-the-box:
- PostgreSQL - Full ACID compliance
- MySQL - Widely-deployed relational DB
- SQLite - Embedded, zero-config
- DuckDB - OLAP analytics engine
- MongoDB - Document database with SQL query support
Step 1: Create Connection
API Endpoint: POST /databases
Request:
{
"driver": "postgresql",
"host": "localhost",
"port": 5432,
"database": "mydb",
"username": "user",
"password": "pass"
}
Step 2: Test Connection
API Endpoint: POST /databases/:id/test
Step 3: Execute SQL Cell
%sql
SELECT * FROM products LIMIT 10Every SQL cell is analyzed for performance bottlenecks:
{
"optimizations": [
{
"issue": "SELECT * is used",
"severity": "medium",
"suggestion": "Specify only needed columns to reduce data transfer",
"estimated_impact": "5-20% faster"
},
{
"issue": "No WHERE clause detected",
"severity": "high",
"suggestion": "Add WHERE clause to filter results",
"estimated_impact": "50-90% faster"
}
]
}- SELECT * Usage - Specify needed columns
- Missing WHERE - Add filters to reduce scan
- Leading Wildcard LIKE - Use pattern starting with literal
- Subqueries - Consider JOIN instead for better optimization
- NOT IN with Subquery - Use NOT EXISTS instead
- Multiple OR Conditions - Try IN clause or UNION
- Functions on WHERE Columns - Store normalized data instead
Execute SQL Query
POST /sql/execute
Request:
{
"query": "SELECT * FROM users WHERE id = 1",
"connection_id": "conn-123"
}
Response:
{
"html": "<table>...</table>",
"optimizations": [...],
"row_count": 1,
"execution_time_ms": 150
}
Get Query Optimizations
POST /sql/optimize
Request:
{
"query": "SELECT * FROM users",
"connection_id": "conn-123"
}
Response:
{
"optimizations": [...],
"total_issues": 2,
"high_priority": 1
}
SQL results are rendered as:
- HTML Table - Default, rendered in notebook
- JSON - For complex queries
- CSV - For export
- Display Rows: 1000 (showing "... X more rows" for larger)
- Total Execution: Configurable timeout (default: 30s)
- Memory: Automatic truncation at 10MB output
<table border='1' cellpadding='5'>
<thead>
<tr>
<th>id</th>
<th>name</th>
<th>email</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>Alice</td>
<td>alice@example.com</td>
</tr>
<tr>
<td>2</td>
<td>Bob</td>
<td>bob@example.com</td>
</tr>
</tbody>
</table>
<p><i>Rows: 2, Execution time: 45ms</i></p>Reference Python variables in SQL:
user_id = 123
%sql
SELECT * FROM orders WHERE user_id = {user_id}Store results for reuse:
%sql
SELECT * FROM large_table WHERE year = 2024
result_df = _
# Use result_df in Python cells
print(result_df.shape)PostgreSQL & MySQL:
%sql BEGIN;
DELETE FROM logs WHERE created_at < DATE_SUB(NOW(), INTERVAL 30 DAY);
COMMIT;Analyze query execution:
%sql EXPLAIN ANALYZE
SELECT users.name, COUNT(orders.id) as order_count
FROM users
LEFT JOIN orders ON users.id = orders.user_id
GROUP BY users.id, users.name
HAVING COUNT(orders.id) > 5Output:
HashAggregate (cost=1000.00..2000.00 rows=100 width=36)
-> Hash Left Join (cost=500.00..1000.00 rows=500 width=36)
Hash Cond: (users.id = orders.user_id)
-> Seq Scan on users (cost=0.00..100.00 rows=1000 width=8)
-> Hash (cost=300.00..300.00 rows=5000 width=8)
-> Seq Scan on orders (cost=0.00..300.00 rows=5000 width=8)
- Always Filter - Add WHERE clauses to reduce data scanned
- Index Key Columns - Create indexes on JOIN and WHERE columns
- Select Only Needed - Specify columns instead of SELECT *
- Use Joins Over Subqueries - Better optimization potential
- Partition Large Tables - Improve query performance
- Use Materialized Views - Pre-compute expensive aggregations
- Batch Operations - Group multiple operations in transactions
Common SQL errors and fixes:
| Error | Cause | Fix |
|---|---|---|
| Connection refused | Database not running | Start database server |
| Column not found | Typo in column name | Check schema with DESC table_name |
| Table not found | Wrong database/schema | Check active database |
| Lock timeout | Table locked by other query | Check running queries, retry |
| Disk full | Out of storage | Free disk space or increase limit |
| Memory exceeded | Query too large | Add LIMIT or split into chunks |
SQL queries are analyzed alongside Python code for package suggestions:
%sql SELECT * FROM data
import pandas as pd
df = pd.DataFrame(...) # Suggested: pandas is used with SQLSQL queries are included in notebook versions:
Version 1: SELECT * FROM users
Version 2: SELECT id, name FROM users WHERE active = true # Optimized
Auto-profile results from SQL queries:
{
"columns": [
{
"name": "id",
"type": "integer",
"null_count": 0,
"unique_count": 1000,
"stats": {...}
}
]
}Key Structs:
SQLQuery- Query, connection ID, timeoutQueryResult- Columns, rows, execution time, memoryQueryOptimization- Issue, severity, suggestion, impact
Key Methods:
parse_sql_cell()- Detects SQL markerexecute_query()- Runs query on connectionanalyze_query()- Finds optimization opportunitiesformat_result_as_html()- Renders table
Example:
let query = "--sql SELECT * FROM users";
if SQLExecutor::is_sql_cell(query) {
let parsed = SQLExecutor::parse_sql_cell(query);
let result = SQLExecutor::execute_query(&parsed, "conn-id").await?;
let optimizations = SQLExecutor::analyze_query(&parsed);
}v0.2 (Current):
- SQL cell detection (--sql, %sql markers)
- Query optimization suggestions
- Result formatting as HTML tables
- 5 database driver support
v0.3 (Planned):
- Actual database connection execution
- Explain plans and query analysis
- Query result caching and materialization
- SQL syntax highlighting in editor
- Auto-completion for table/column names
- Query execution history and bookmarking
v0.4+ (Future):
- Distributed SQL via Spark SQL
- Federated queries across databases
- Real-time streaming SQL
- SQL query optimization recommendations
- Interactive SQL notebook cells
Cell 1:
%sql
SELECT product_name, SUM(quantity) as total_sold
FROM sales
GROUP BY product_name
ORDER BY total_sold DESC
LIMIT 10Output:
product_name total_sold
Apple iPhone 15 5,234,234
Samsung Galaxy S24 4,123,432
Google Pixel 8 3,124,234
...
Cell:
%sql
SELECT * FROM users WHERE status = 'active'Optimizations Shown:
Medium priority: SELECT * is used
→ Suggestion: Specify only needed columns
→ Impact: 5-20% faster
High priority: No WHERE clause on joined table
→ Suggestion: Add WHERE to filter earlier
→ Impact: 50-90% faster
Cell 1:
# Load from PostgreSQL
import sqlite3
pg_conn = psycopg2.connect("...")
pg_data = pd.read_sql("SELECT * FROM users", pg_conn)
# Save to SQLite
sqlite_conn = sqlite3.connect("local.db")
pg_data.to_sql("users", sqlite_conn)Cell 2:
%sql
SELECT user_id, COUNT(*) as transactions
FROM users
WHERE created_at > '2024-01-01'
GROUP BY user_idQ: SQL cell not executing
A: Ensure you start with --sql or %sql marker. Check connection is active with POST /databases/:id/test.
Q: Optimization suggestions are too generic
A: As of v0.2, this is pattern-based. v0.3 will include actual explain plans from database.
Q: Performance is slow
A: Check suggested optimizations first. Add indexes on join keys. Consider partitioning large tables.
Q: Cannot connect to database
A: Verify credentials, host, port. Check firewall. Test with CLI: psql -h host -U user -d dbname
Query Complexity vs Execution Time:
| Query Type | Complexity | Typical Time | With Index |
|---|---|---|---|
| Single table, 1M rows | Low | 100-200ms | 10-50ms |
| Join, 2 tables | Medium | 200-500ms | 50-150ms |
| 3+ table join | High | 500-2000ms | 100-500ms |
| Aggregation, 10M rows | High | 1000-5000ms | 200-1000ms |
Optimization Impact:
SELECT *→ Specific columns: 5-20% faster- No WHERE → With WHERE: 50-90% faster
- Index on join key: 10-100x faster
- LIKE with wildcard: 10-100x slower
SQL execution feature for v0.2 complete
Actual database drivers coming in v0.3