From c3356e5fd81bf2ec730c110b790bb2bf284ba9b4 Mon Sep 17 00:00:00 2001 From: "Victor M. Varela" Date: Fri, 1 May 2026 19:10:20 +0200 Subject: [PATCH] docs: add real-world examples and JSON/NDJSON input documentation MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Update intro: 'reads CSV from stdin' → 'reads CSV, JSON, or NDJSON' - Add JSON/NDJSON input paragraph and example to Usage section - Add -I/--input-format and -O/--output-format to flags table - Fix broken chain example (missing -H on first sql-pipe call) - Add Real-world examples section with 6 verified live examples: La Liga wins/goals (CSV), OWID energy solar/wind (CSV+chain), restcountries density (NDJSON), Open-Meteo Madrid forecast (NDJSON) Closes #127 --- README.md | 151 ++++++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 146 insertions(+), 5 deletions(-) diff --git a/README.md b/README.md index 6c373b3..dba86b6 100644 --- a/README.md +++ b/README.md @@ -4,7 +4,7 @@ [![Release](https://img.shields.io/github/v/release/vmvarela/sql-pipe)](https://github.com/vmvarela/sql-pipe/releases/latest) [![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE) -`sql-pipe` reads CSV from stdin, loads it into an in-memory SQLite database, runs a SQL query, and prints the results as CSV. No server, no schema files, no setup. +`sql-pipe` reads CSV, JSON, or NDJSON from stdin, loads it into an in-memory SQLite database, runs a SQL query, and prints the results. No server, no schema files, no setup. It exists because `awk` is cryptic, spinning up a Python interpreter for a one-liner feels wrong, and `sqlite3 :memory:` takes four commands before you can query anything. If you know SQL and work with CSV in the terminal, this is the tool you've been reaching for. @@ -143,7 +143,7 @@ Binary lands at `./zig-out/bin/sql-pipe`. SQLite is compiled from the official a ## Usage -The CSV comes from stdin. 
The first row must be a header — those column names become the schema for a table called `t`. Results go to stdout as comma-separated values. +The input comes from stdin. For CSV and TSV, the first row must be a header — those column names become the schema for a table called `t`. Results go to stdout as comma-separated values by default. ```sh $ printf 'name,age\nAlice,30\nBob,25\nCarol,35' | sql-pipe 'SELECT * FROM t' @@ -152,6 +152,14 @@ Bob,25 Carol,35 ``` +For JSON and NDJSON input, pass `-I json` (reads an array of objects) or `-I ndjson` (one object per line). Column names are taken from the keys of the first object: + +```sh +$ printf '[{"name":"Alice","score":95},{"name":"Bob","score":72}]' \ + | sql-pipe -I json 'SELECT name, score FROM t WHERE score > 80' +Alice,95 +``` + Columns are auto-detected as `INTEGER`, `REAL`, or `TEXT` based on the first 100 rows. Use `--no-type-inference` to force all columns to `TEXT`: ```sh @@ -183,11 +191,11 @@ $ printf 'name,age\nAlice,30\nBob,25' | sql-pipe --json 'SELECT * FROM t' `--json` is mutually exclusive with `-H`/`--header`. It can be combined with `-d`/`--delimiter` and `--tsv` to read non-comma-separated input. -Chain queries by piping back in — useful for two-pass aggregations: +Chain queries by piping back in — useful for two-pass aggregations. 
Pass `-H` to the first call so the second one sees column names:

```sh
$ cat events.csv \
-  | sql-pipe 'SELECT user_id, COUNT(*) as n FROM t GROUP BY user_id' \
+  | sql-pipe -H 'SELECT user_id, COUNT(*) as n FROM t GROUP BY user_id' \
   | sql-pipe 'SELECT * FROM t WHERE n > 100'
```

### Flags

| Flag | Description |
|------|-------------|
| `-d`, `--delimiter <char>` | Input field delimiter (single character, default `,`) |
| `--tsv` | Alias for `--delimiter '\t'` |
+| `-I`, `--input-format <format>` | Input format: `csv` (default), `tsv`, `json`, `ndjson` |
+| `-O`, `--output-format <format>` | Output format: `csv` (default), `tsv`, `json`, `ndjson` |
| `--no-type-inference` | Treat all columns as TEXT (skip auto-detection) |
| `-H`, `--header` | Print column names as the first output row |
-| `--json` | Output results as a JSON array of objects (mutually exclusive with `-H`) |
+| `--json` | Alias for `--output-format json` (mutually exclusive with `-H`) |
| `--max-rows <n>` | Stop if more than `n` data rows are read (exit 1) |
| `--columns` | Read the CSV header row, print each column name on its own line, and exit 0. With `-v`/`--verbose`, also shows the inferred type per column (`name INTEGER`). Respects `--delimiter` and `--tsv`. Mutually exclusive with a query argument. |
| `--output <file>` | Write results to the given file instead of stdout. Creates or overwrites the file. Exits 1 if the file cannot be created. |
@@ -289,6 +299,137 @@ $ cat products.csv | sql-pipe 'SELECT name, price, ROUND(price * 0.9, 2) as disc
$ cat orders.csv | sql-pipe 'SELECT region, SUM(CASE WHEN status="paid" THEN amount ELSE 0 END) as paid, SUM(CASE WHEN status="refunded" THEN amount ELSE 0 END) as refunded FROM t GROUP BY region'
 
+## Real-world examples
+
+These run against live public URLs — no local files needed.
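+
+If you plan to run several queries against the same dataset, it can save bandwidth to fetch it once and pipe from the local copy (a sketch — `/tmp/spain.csv` is an arbitrary scratch path, the dataset is the one used below):
+
+```sh
+$ curl -s https://raw.githubusercontent.com/jalapic/engsoccerdata/master/data-raw/spain.csv -o /tmp/spain.csv
+$ sql-pipe 'SELECT COUNT(*) AS matches FROM t' < /tmp/spain.csv
+```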
+ +**La Liga: all-time home wins (1929–present)** + +The [engsoccerdata](https://github.com/jalapic/engsoccerdata) dataset covers +Spanish first-division football since the inaugural season: + +```sh +$ curl -s https://raw.githubusercontent.com/jalapic/engsoccerdata/master/data-raw/spain.csv \ + | sql-pipe 'SELECT home AS team, COUNT(*) AS wins + FROM t WHERE CAST(hgoal AS INTEGER) > CAST(vgoal AS INTEGER) AND tier=1 + GROUP BY home ORDER BY wins DESC LIMIT 8' +Real Madrid,1174 +FC Barcelona,1163 +Atletico Madrid,956 +Athletic Bilbao,942 +Valencia CF,917 +Sevilla FC,815 +Espanyol Barcelona,777 +Real Sociedad,721 +``` + +**La Liga: highest-scoring seasons as JSON** + +Same dataset, different angle — output as JSON for downstream tools: + +```sh +$ curl -s https://raw.githubusercontent.com/jalapic/engsoccerdata/master/data-raw/spain.csv \ + | sql-pipe --json \ + 'SELECT Season, COUNT(*) AS matches, + ROUND(CAST(SUM(CAST(hgoal AS INTEGER)+CAST(vgoal AS INTEGER)) AS REAL)/COUNT(*),2) AS avg_goals + FROM t WHERE tier=1 GROUP BY Season ORDER BY avg_goals DESC LIMIT 5' +[{"Season":1929,"matches":90,"avg_goals":4.67},{"Season":1932,"matches":90,"avg_goals":4.44},...] +``` + +**OWID: countries by solar electricity share (2023)** + +[Our World in Data](https://github.com/owid/energy-data) publishes annual +energy statistics for 200+ countries. Find who leads on solar: + +```sh +$ curl -s https://raw.githubusercontent.com/owid/energy-data/refs/heads/master/owid-energy-data.csv \ + | sql-pipe 'SELECT country, ROUND(solar_share_elec,1) AS solar_pct + FROM t WHERE year=2023 AND solar_share_elec IS NOT NULL + AND iso_code NOT LIKE "%OWID%" + ORDER BY solar_pct DESC LIMIT 8' +Cook Islands,50.0 +Palestine,40.0 +Namibia,27.0 +Kiribati,25.0 +Lebanon,22.3 +Luxembourg,20.6 +Chile,20.1 +El Salvador,20.1 +``` + +**OWID: wind + solar combined — two-pass query** + +Add wind and solar in a first pass, then filter above 30% in a second. +`-H` passes column names through to the next stage. 
Spain sits at 40%: + +```sh +$ ENERGY=https://raw.githubusercontent.com/owid/energy-data/refs/heads/master/owid-energy-data.csv +$ curl -s "$ENERGY" \ + | sql-pipe -H 'SELECT country, + ROUND(solar_share_elec,1) AS solar, + ROUND(wind_share_elec,1) AS wind, + ROUND(solar_share_elec+wind_share_elec,1) AS total + FROM t WHERE year=2023 AND iso_code NOT LIKE "%OWID%" + AND solar_share_elec IS NOT NULL AND wind_share_elec IS NOT NULL' \ + | sql-pipe 'SELECT country, solar, wind, total FROM t + WHERE CAST(total AS REAL) >= 30 ORDER BY total DESC LIMIT 10' +Denmark,10.8,57.2,68.0 +Lithuania,13.0,47.9,60.9 +Luxembourg,20.6,35.5,56.0 +Cook Islands,50.0,0.0,50.0 +Netherlands,16.3,24.6,41.0 +Uruguay,3.8,37.1,41.0 +Greece,18.2,22.5,40.7 +Spain,17.4,23.0,40.4 +Germany,12.6,27.7,40.3 +Palestine,40.0,0.0,40.0 +``` + +**REST API: European population density** + +[restcountries.com](https://restcountries.com) returns a JSON array. Reshape +with `jq` into NDJSON (one object per line) and query directly with `-I ndjson`: + +```sh +$ curl -s https://restcountries.com/v3.1/region/europe \ + | jq -c '.[] | {country: .name.common, pop: .population, area: .area}' \ + | sql-pipe -I ndjson \ + 'SELECT country, pop, area, ROUND(CAST(pop AS REAL)/area,1) AS density + FROM t WHERE area > 0 ORDER BY density DESC LIMIT 8' +Monaco,38423,2.02,19021.3 +Gibraltar,38000,6.0,6333.3 +Malta,574250,316.0,1817.2 +Vatican City,882,0.49,1800.0 +Jersey,103267,116.0,890.2 +Guernsey,64781,78.0,830.5 +San Marino,34132,61.0,559.5 +Netherlands,18100436,41865.0,432.4 +``` + +**Live weather: 7-day Madrid forecast** + +[Open-Meteo](https://open-meteo.com) serves free forecasts as JSON. 
The daily +arrays need transposing into objects — `jq` handles that, then `-I ndjson` loads +the result: + +```sh +$ curl -s "https://api.open-meteo.com/v1/forecast?latitude=40.4168&longitude=-3.7038\ +&daily=temperature_2m_max,temperature_2m_min,precipitation_sum\ +&timezone=Europe%2FMadrid&forecast_days=7" \ + | jq -c '.daily + | [.time, .temperature_2m_max, .temperature_2m_min, .precipitation_sum] + | transpose + | .[] | {day:.[0], max_c:.[1], min_c:.[2], rain_mm:.[3]}' \ + | sql-pipe -I ndjson 'SELECT day, max_c, min_c, rain_mm FROM t ORDER BY day' +2026-05-01,24.3,11.8,0.0 +2026-05-02,19.2,14.5,3.9 +2026-05-03,20.5,12.5,7.0 +2026-05-04,19.3,11.3,0.2 +2026-05-05,16.9,9.1,1.8 +2026-05-06,19.7,7.3,0.0 +2026-05-07,19.6,10.7,2.1 +``` + ## How it works Each run opens a fresh `:memory:` SQLite database. The header row drives a `CREATE TABLE t (...)` with all columns as `TEXT`. Rows are loaded in a single transaction via a prepared `INSERT` statement, then `sqlite3_exec` runs your query and prints rows one by one.
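The flow in the paragraph above — fresh in-memory database, header-driven `CREATE TABLE`, one transaction, then query — can be sketched with Python's built-in `sqlite3` module (illustrative only: the real implementation is Zig against the SQLite C API, and the sample data here is invented):

```python
# Sketch of sql-pipe's pipeline: parse CSV, load a :memory: SQLite database, query.
import csv
import io
import sqlite3

raw = "name,age\nAlice,30\nBob,25\nCarol,35\n"  # stands in for stdin
rows = list(csv.reader(io.StringIO(raw)))
header, data = rows[0], rows[1:]

db = sqlite3.connect(":memory:")                # fresh database per run
columns = ", ".join(f'"{name}" TEXT' for name in header)
db.execute(f"CREATE TABLE t ({columns})")       # header row drives the schema
marks = ", ".join("?" for _ in header)
with db:                                        # load all rows in one transaction
    db.executemany(f"INSERT INTO t VALUES ({marks})", data)

for row in db.execute("SELECT * FROM t WHERE CAST(age AS INTEGER) > 26"):
    print(",".join(row))                        # prints: Alice,30 then Carol,35
```

The real tool additionally infers `INTEGER`/`REAL` column types from the first 100 rows instead of forcing everything to `TEXT`.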