
Commit 73a4e77

authored
docs: simplified tutorial
2 parents 7e622ad + 97eaf70 commit 73a4e77

1 file changed

Lines changed: 6 additions & 7 deletions

docs/tutorial.md

@@ -153,14 +153,13 @@ for url, domain in rows:
 con.execute("CREATE TABLE surt_lookup (url VARCHAR, surt_host_name VARCHAR, url_surtkey VARCHAR)")
 con.executemany("INSERT INTO surt_lookup VALUES (?, ?, ?)", surt_data)
 
-result = con.sql("""
-SELECT s.surt_host_name, s.url_surtkey, e.category_path as category
-FROM sites e JOIN surt_lookup s ON e.url = s.url
-ORDER BY s.surt_host_name
+con.sql("""
+COPY (
+SELECT s.surt_host_name, s.url_surtkey, e.category_path as category
+FROM sites e JOIN surt_lookup s ON e.url = s.url
+ORDER BY s.surt_host_name
+) TO 'curlie.parquet' (FORMAT PARQUET)
 """)
-
-import pyarrow.parquet as pq
-pq.write_table(result.fetch_arrow_table(), 'curlie.parquet')
 ```
 
 The real script also extracts languages, validates domains, and handles edge cases — but this is the core idea: **read the source data → add SURT columns → write parquet**.
