Commit 5c0f8ed

Update README and CLAUDE.md for polars→arrow migration
Update all code examples, API docs, and architecture description to reflect that the data bridge now uses pyarrow instead of polars. register() accepts any type that pyarrow.table() can convert (pyarrow, polars, pandas, etc.).
1 parent 412a460 commit 5c0f8ed

2 files changed

Lines changed: 85 additions & 39 deletions

CLAUDE.md

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## What This Is
+
+Python bindings for the [ggsql](https://github.com/posit-dev/ggsql) Rust crate, a SQL extension for declarative data visualization. The Rust crate handles parsing, validation, and Vega-Lite generation; this repo wraps it via PyO3/maturin and adds a Python-native API layer (`render_altair()`, `VegaLiteWriter.render_chart()`).
+
+## Build & Development
+
+Requires Rust toolchain and Python 3.10+. Uses `uv` for Python dependency management.
+
+```bash
+# Install dev dependencies
+uv sync
+
+# Build the Rust extension in-place (required after any Rust changes)
+uv run maturin develop
+
+# Run all tests
+uv run pytest tests/ -v
+
+# Run a single test
+uv run pytest tests/test_ggsql.py::TestValidate::test_valid_query_with_visualise -v
+
+# Rust checks (CI runs these)
+cargo fmt -- --check
+cargo clippy -- -D warnings
+```
+
+To pick up a new version of the upstream `ggsql` Rust crate, bump its version in `Cargo.toml` and re-run `maturin develop`.
+
+## Architecture
+
+**Rust layer** (`src/lib.rs`): Single-file PyO3 module exposing `DuckDBReader`, `VegaLiteWriter`, `Validated`, `Spec`, `validate()`, and `execute()` to Python. Data crosses the Rust↔Python boundary via Arrow IPC serialization (`arrow::ipc::StreamWriter`/`StreamReader` on the Rust side, `pyarrow.ipc` on the Python side). The `py_to_df` helper accepts any object that `pyarrow.table()` can convert (pyarrow Tables, polars DataFrames, pandas DataFrames, etc.). Custom Python readers are bridged to the Rust `Reader` trait via `PyReaderBridge`; native readers (currently just `DuckDBReader`) use a fast path that skips the bridge (see `try_native_readers!` macro).
+
+**Python layer** (`python/ggsql/__init__.py`): Re-exports Rust bindings and adds `render_altair()` (convenience function that registers a DataFrame, executes, and returns an Altair chart) and a Python `VegaLiteWriter` wrapper that adds `render_chart()`. The `_json_to_altair_chart()` helper dispatches to the correct Altair chart class based on the Vega-Lite spec structure (layer, facet, concat, etc.).
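
Dispatching on spec structure, as the Python layer paragraph describes, could look roughly like the sketch below. This is not the repository's `_json_to_altair_chart()` implementation: the function name `altair_class_for_spec` and the key order are assumptions for illustration, and the function returns class names as strings so that it runs without Altair installed.

```python
def altair_class_for_spec(spec: dict) -> str:
    """Return the name of the Altair chart class that matches a Vega-Lite spec.

    Hypothetical helper: composite Vega-Lite specs are recognized by a
    characteristic top-level key; anything else is treated as a unit chart.
    """
    dispatch = [
        ("layer", "LayerChart"),
        ("facet", "FacetChart"),
        ("hconcat", "HConcatChart"),
        ("vconcat", "VConcatChart"),
        ("concat", "ConcatChart"),
        ("repeat", "RepeatChart"),
    ]
    for key, class_name in dispatch:
        if key in spec:
            return class_name
    return "Chart"


unit = altair_class_for_spec({"mark": "point", "encoding": {}})
layered = altair_class_for_spec({"layer": [{"mark": "line"}, {"mark": "point"}]})
faceted = altair_class_for_spec({"facet": {"field": "region"}, "spec": {"mark": "bar"}})
```

With Altair available, the returned name could be resolved with `getattr(altair, name)` and the chart built through that class's `from_dict()` method.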
+
+**Key design pattern**: Two-stage API: `reader.execute(query) -> Spec`, then `writer.render(spec) -> str` or `writer.render_chart(spec) -> AltairChart`. The `render_altair()` shortcut collapses both stages.
+
+## `.cargo/config.toml`
+
+Sets `GGSQL_SKIP_GENERATE=1` so tree-sitter uses its pre-generated parser rather than regenerating from `grammar.js`. Don't remove this.

README.md

Lines changed: 42 additions & 39 deletions
@@ -44,21 +44,21 @@ pip install target/wheels/ggsql-*.whl

 ### Simple Usage with `render_altair`

-For quick visualizations, use the `render_altair` convenience function:
+For quick visualizations, use the `render_altair` convenience function. It accepts any narwhals-compatible DataFrame (polars, pandas, pyarrow, etc.):

 ```python
 import ggsql
-import polars as pl
+import pyarrow as pa

-# Create a DataFrame
-df = pl.DataFrame({
+# Create a table
+table = pa.table({
     "x": [1, 2, 3, 4, 5],
     "y": [10, 20, 15, 30, 25],
     "category": ["A", "B", "A", "B", "A"]
 })

 # Render to Altair chart
-chart = ggsql.render_altair(df, "VISUALISE x, y DRAW point")
+chart = ggsql.render_altair(table, "VISUALISE x, y DRAW point")

 # Display or save
 chart.display() # In Jupyter
@@ -71,18 +71,18 @@ For more control, use the two-stage API with explicit reader and writer:

 ```python
 import ggsql
-import polars as pl
+import pyarrow as pa

 # 1. Create a DuckDB reader
 reader = ggsql.DuckDBReader("duckdb://memory")

-# 2. Register your DataFrame as a table
-df = pl.DataFrame({
+# 2. Register your data as a table (accepts pyarrow, polars, pandas, etc.)
+table = pa.table({
     "date": ["2024-01-01", "2024-01-02", "2024-01-03"],
     "revenue": [100, 150, 120],
     "region": ["North", "South", "North"]
 })
-reader.register("sales", df)
+reader.register("sales", table)

 # 3. Execute the ggsql query
 spec = reader.execute(
@@ -102,7 +102,7 @@ print(f"Layers: {spec.layer_count()}")
 # 5. Inspect SQL/VISUALISE portions and data
 print(f"SQL: {spec.sql()}")
 print(f"Visual: {spec.visual()}")
-print(spec.layer_data(0)) # Returns polars DataFrame
+print(spec.layer_data(0)) # Returns pyarrow.Table

 # 6. Render to Vega-Lite JSON
 writer = ggsql.VegaLiteWriter()
@@ -125,9 +125,9 @@ reader = ggsql.DuckDBReader("duckdb:///path/to/file.db") # File database

 **Methods:**

-- `register(name: str, df: polars.DataFrame, replace: bool = False)` - Register a DataFrame as a queryable table
+- `register(name: str, table, replace: bool = False)` - Register data as a queryable table (accepts `pyarrow.Table`, `polars.DataFrame`, `pandas.DataFrame`, etc.)
 - `unregister(name: str)` - Unregister a previously registered table
-- `execute_sql(sql: str) -> polars.DataFrame` - Execute SQL and return results
+- `execute_sql(sql: str) -> pyarrow.Table` - Execute SQL and return results

 #### `VegaLiteWriter()`

@@ -161,9 +161,9 @@ Result of `reader.execute()`, containing resolved visualization ready for render
 - `sql() -> str` - The executed SQL query
 - `visual() -> str` - The VISUALISE clause
 - `layer_count() -> int` - Number of DRAW layers
-- `data() -> polars.DataFrame | None` - Main query result DataFrame
-- `layer_data(index: int) -> polars.DataFrame | None` - Layer-specific data (if filtered)
-- `stat_data(index: int) -> polars.DataFrame | None` - Statistical transform data
+- `data() -> pyarrow.Table | None` - Main query result data
+- `layer_data(index: int) -> pyarrow.Table | None` - Layer-specific data (if filtered)
+- `stat_data(index: int) -> pyarrow.Table | None` - Statistical transform data
 - `layer_sql(index: int) -> str | None` - Layer filter SQL
 - `stat_sql(index: int) -> str | None` - Stat transform SQL
 - `warnings() -> list[dict]` - Validation warnings from execution
@@ -205,51 +205,54 @@ Convenience function to render a DataFrame with a VISUALISE spec to an Altair ch
 **Returns:** An Altair chart object (Chart, LayerChart, FacetChart, etc.)

 ```python
-import polars as pl
+import pyarrow as pa
 import ggsql

-df = pl.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})
-chart = ggsql.render_altair(df, "VISUALISE x, y DRAW point")
+table = pa.table({"x": [1, 2, 3], "y": [10, 20, 30]})
+chart = ggsql.render_altair(table, "VISUALISE x, y DRAW point")
 ```

 ## Examples

 ### Mapping Styles

 ```python
-df = pl.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30], "category": ["A", "B", "A"]})
+import pyarrow as pa
+
+table = pa.table({"x": [1, 2, 3], "y": [10, 20, 30], "category": ["A", "B", "A"]})

 # Explicit mapping
-ggsql.render_altair(df, "VISUALISE x AS x, y AS y DRAW point")
+ggsql.render_altair(table, "VISUALISE x AS x, y AS y DRAW point")

 # Implicit mapping (column name = aesthetic name)
-ggsql.render_altair(df, "VISUALISE x, y DRAW point")
+ggsql.render_altair(table, "VISUALISE x, y DRAW point")

 # Wildcard mapping (map all matching columns)
-ggsql.render_altair(df, "VISUALISE * DRAW point")
+ggsql.render_altair(table, "VISUALISE * DRAW point")

 # With color encoding
-ggsql.render_altair(df, "VISUALISE x, y, category AS color DRAW point")
+ggsql.render_altair(table, "VISUALISE x, y, category AS color DRAW point")
 ```

 ### Custom Readers

-You can use any Python object with an `execute_sql(sql: str) -> polars.DataFrame` method as a reader. This enables integration with any data source.
+You can use any Python object with an `execute_sql(sql: str)` method as a reader. The method should return a `pyarrow.Table` (or any type that `pyarrow.table()` can convert, such as a `polars.DataFrame`).

 ```python
 import ggsql
-import polars as pl
+import pyarrow as pa
+import pyarrow.csv

 class CSVReader:
     """Custom reader that loads data from CSV files."""

     def __init__(self, data_dir: str):
         self.data_dir = data_dir

-    def execute_sql(self, sql: str) -> pl.DataFrame:
+    def execute_sql(self, sql: str) -> pa.Table:
         # Simple implementation: ignore SQL and return fixed data
         # A real implementation would parse SQL to determine which file to load
-        return pl.read_csv(f"{self.data_dir}/data.csv")
+        return pyarrow.csv.read_csv(f"{self.data_dir}/data.csv")

 # Use custom reader with ggsql.execute()
 reader = CSVReader("/path/to/data")
@@ -263,7 +266,7 @@ json_output = writer.render(spec)

 **Additional methods** for custom readers:

-- `register(name: str, df: polars.DataFrame, replace: bool = False) -> None` - Register a DataFrame as a queryable table (required)
+- `register(name: str, table, replace: bool = False) -> None` - Register data as a queryable table (required). Receives a `pyarrow.Table`.
 - `unregister(name: str) -> None` - Unregister a previously registered table (optional)
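
The reader protocol listed above (`execute_sql`, `register`, `unregister`) can be sketched using only the standard library. The class below is a hypothetical illustration, not part of ggsql: it stores registered data in an in-memory SQLite database and returns query results as a dict of column lists, which `pyarrow.table()` can convert. Whether ggsql accepts such a dict directly from a custom reader is an assumption here; wrap the result in `pyarrow.table()` if in doubt.

```python
import sqlite3


class SQLiteReader:
    """Hypothetical custom reader backed by an in-memory SQLite database."""

    def __init__(self):
        self.con = sqlite3.connect(":memory:")

    def execute_sql(self, sql: str) -> dict:
        # Run the query and pivot row tuples into a column-oriented dict.
        cur = self.con.execute(sql)
        names = [d[0] for d in cur.description]
        rows = cur.fetchall()
        return {name: [row[i] for row in rows] for i, name in enumerate(names)}

    def register(self, name: str, table, replace: bool = False) -> None:
        # Accept a pyarrow.Table (via to_pydict()) or a plain dict of lists.
        columns = table.to_pydict() if hasattr(table, "to_pydict") else dict(table)
        if replace:
            self.con.execute(f'DROP TABLE IF EXISTS "{name}"')
        col_names = list(columns)
        col_defs = ", ".join(f'"{c}"' for c in col_names)
        self.con.execute(f'CREATE TABLE "{name}" ({col_defs})')
        placeholders = ", ".join("?" for _ in col_names)
        rows = zip(*(columns[c] for c in col_names))
        self.con.executemany(f'INSERT INTO "{name}" VALUES ({placeholders})', rows)

    def unregister(self, name: str) -> None:
        self.con.execute(f'DROP TABLE "{name}"')


reader = SQLiteReader()
reader.register("sales", {"region": ["North", "South"], "revenue": [100, 150]})
result = reader.execute_sql("SELECT region, revenue FROM sales ORDER BY revenue DESC")
```

Table and column names are interpolated into SQL here for brevity, so this sketch is only suitable for trusted input.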

 ```python
@@ -273,12 +276,12 @@ class AdvancedReader:
     def __init__(self):
         self.tables = {}

-    def execute_sql(self, sql: str) -> pl.DataFrame:
+    def execute_sql(self, sql: str) -> pa.Table:
         # Your SQL execution logic here
         ...

-    def register(self, name: str, df: pl.DataFrame, replace: bool = False) -> None:
-        self.tables[name] = df
+    def register(self, name: str, table: pa.Table, replace: bool = False) -> None:
+        self.tables[name] = table

     def unregister(self, name: str) -> None:
         del self.tables[name]
@@ -292,7 +295,7 @@ Native readers like `DuckDBReader` use an optimized fast path, while custom Pyth

 ```python
 import ggsql
-import polars as pl
+import pyarrow as pa
 import ibis

 class IbisReader:
@@ -305,22 +308,22 @@ class IbisReader:
             self.con = ibis.sqlite.connect()
         # Add other backends as needed

-    def execute_sql(self, sql: str) -> pl.DataFrame:
-        return self.con.con.execute(sql).pl()
+    def execute_sql(self, sql: str) -> pa.Table:
+        return self.con.con.execute(sql).arrow()

-    def register(self, name: str, df: pl.DataFrame, replace: bool = False) -> None:
-        self.con.create_table(name, df.to_arrow(), overwrite=replace)
+    def register(self, name: str, table: pa.Table, replace: bool = False) -> None:
+        self.con.create_table(name, table, overwrite=replace)

     def unregister(self, name: str) -> None:
         self.con.drop_table(name)

 # Usage
 reader = IbisReader()
-df = pl.DataFrame({
+table = pa.table({
     "date": ["2024-01-01", "2024-01-02", "2024-01-03"],
     "revenue": [100, 150, 120],
 })
-reader.register("sales", df)
+reader.register("sales", table)

 spec = ggsql.execute(
     "SELECT * FROM sales VISUALISE date AS x, revenue AS y DRAW line",
@@ -356,7 +359,7 @@ pytest tests/ -v
 - Python >= 3.10
 - altair >= 5.0
 - narwhals >= 2.15
-- polars >= 1.0
+- pyarrow >= 14.0

 ## License
