Update README and CLAUDE.md for polars→arrow migration
Update all code examples, API docs, and architecture description to
reflect that the data bridge now uses pyarrow instead of polars.
register() accepts any type that pyarrow.table() can convert (pyarrow,
polars, pandas, etc.).
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## What This Is

Python bindings for the [ggsql](https://github.com/posit-dev/ggsql) Rust crate, a SQL extension for declarative data visualization. The Rust crate handles parsing, validation, and Vega-Lite generation; this repo wraps it via PyO3/maturin and adds a Python-native API layer (`render_altair()`, `VegaLiteWriter.render_chart()`).

## Build & Development

Requires a Rust toolchain and Python 3.10+. Uses `uv` for Python dependency management.

```bash
# Install dev dependencies
uv sync

# Build the Rust extension in-place (required after any Rust changes)
uv run maturin develop

# Run all tests
uv run pytest tests/ -v

# Run a single test
uv run pytest tests/test_ggsql.py::TestValidate::test_valid_query_with_visualise -v

# Rust checks (CI runs these)
cargo fmt -- --check
cargo clippy -- -D warnings
```

To pick up a new version of the upstream `ggsql` Rust crate, bump its version in `Cargo.toml` and re-run `maturin develop`.

## Architecture

**Rust layer** (`src/lib.rs`): Single-file PyO3 module exposing `DuckDBReader`, `VegaLiteWriter`, `Validated`, `Spec`, `validate()`, and `execute()` to Python. Data crosses the Rust↔Python boundary via Arrow IPC serialization (`arrow::ipc::StreamWriter`/`StreamReader` on the Rust side, `pyarrow.ipc` on the Python side). The `py_to_df` helper accepts any object that `pyarrow.table()` can convert (pyarrow Tables, polars DataFrames, pandas DataFrames, etc.). Custom Python readers are bridged to the Rust `Reader` trait via `PyReaderBridge`; native readers (currently just `DuckDBReader`) use a fast path that skips the bridge (see `try_native_readers!` macro).

**Python layer** (`python/ggsql/__init__.py`): Re-exports Rust bindings and adds `render_altair()` (convenience function that registers a DataFrame, executes, and returns an Altair chart) and a Python `VegaLiteWriter` wrapper that adds `render_chart()`. The `_json_to_altair_chart()` helper dispatches to the correct Altair chart class based on the Vega-Lite spec structure (layer, facet, concat, etc.).
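
As a rough illustration of the dispatch described above, the sketch below picks an Altair class name from the top-level keys of a Vega-Lite spec. It is not the code in `python/ggsql/__init__.py`: the function name `pick_chart_class` and the key-to-class table are assumptions made for illustration, and it returns class names as strings so that it runs without Altair installed.

```python
import json

# Hypothetical mapping from a top-level Vega-Lite composition key to the
# name of the Altair class that models it. The first key found wins.
_COMPOSITION_KEYS = (
    ("layer", "LayerChart"),
    ("facet", "FacetChart"),
    ("concat", "ConcatChart"),
    ("hconcat", "HConcatChart"),
    ("vconcat", "VConcatChart"),
    ("repeat", "RepeatChart"),
)


def pick_chart_class(vegalite_json: str) -> str:
    """Return the Altair class name suited to a Vega-Lite JSON spec (sketch)."""
    spec = json.loads(vegalite_json)
    for key, class_name in _COMPOSITION_KEYS:
        if key in spec:
            return class_name
    # A spec with no composition key is a plain single-view chart.
    return "Chart"


if __name__ == "__main__":
    layered = json.dumps({"layer": [{"mark": "point"}, {"mark": "line"}]})
    single = json.dumps({"mark": "point", "encoding": {}})
    print(pick_chart_class(layered))
    print(pick_chart_class(single))
```

A real helper would presumably go on to build the chart object from the parsed spec; this sketch stops at choosing the class.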

**Key design pattern**: a two-stage API, `reader.execute(query) -> Spec`, then `writer.render(spec) -> str` or `writer.render_chart(spec) -> AltairChart`. The `render_altair()` shortcut collapses both stages.
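
The two-stage shape can be mimicked with plain stand-in classes. Nothing below is the ggsql API: `StubSpec`, `StubReader`, `StubWriter`, and `render_shortcut` are hypothetical names that only mirror the execute-then-render flow and the way a shortcut can collapse it.

```python
import json
from dataclasses import dataclass


@dataclass
class StubSpec:
    """Hypothetical stand-in for the object returned by stage one."""
    query: str


class StubReader:
    """Hypothetical reader: stage one turns a query into a spec object."""

    def execute(self, query: str) -> StubSpec:
        return StubSpec(query=query)


class StubWriter:
    """Hypothetical writer: stage two turns a spec into a JSON string."""

    def render(self, spec: StubSpec) -> str:
        return json.dumps({"description": spec.query})


def render_shortcut(query: str) -> str:
    """Collapse both stages into one call, as a convenience wrapper would."""
    return StubWriter().render(StubReader().execute(query))


if __name__ == "__main__":
    spec = StubReader().execute("VISUALISE x, y DRAW point")  # stage one
    print(StubWriter().render(spec))                          # stage two
    print(render_shortcut("VISUALISE x, y DRAW point"))       # both at once
```

Keeping the stages separate means one spec object can be handed to more than one render method; the shortcut gives that up for brevity.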

## `.cargo/config.toml`

Sets `GGSQL_SKIP_GENERATE=1` so tree-sitter uses its pre-generated parser rather than regenerating from `grammar.js`. Don't remove this.

-ggsql.render_altair(df, "VISUALISE x, y, category AS color DRAW point")
+ggsql.render_altair(table, "VISUALISE x, y, category AS color DRAW point")
```

### Custom Readers

-You can use any Python object with an `execute_sql(sql: str) -> polars.DataFrame` method as a reader. This enables integration with any data source.
+You can use any Python object with an `execute_sql(sql: str)` method as a reader. The method should return a `pyarrow.Table` (or any type that `pyarrow.table()` can convert, such as a `polars.DataFrame`).

```python
import ggsql
-import polars as pl
+import pyarrow as pa
+import pyarrow.csv

class CSVReader:
    """Custom reader that loads data from CSV files."""

    def __init__(self, data_dir: str):
        self.data_dir = data_dir

-    def execute_sql(self, sql: str) -> pl.DataFrame:
+    def execute_sql(self, sql: str) -> pa.Table:
        # Simple implementation: ignore SQL and return fixed data
        # A real implementation would parse SQL to determine which file to load