Commit a62f7b4

fix(doc): remove repetitions and copy-paste leftover
1 parent 566831a commit a62f7b4

1 file changed (+3, -6 lines changed)

README.md

Lines changed: 3 additions & 6 deletions
@@ -790,8 +790,7 @@ The program then writes that one record into a local Parquet file, does a second
 
 ### Bonus: download a full crawl index and query with DuckDB
 
-If you want to run many of these queries, and you have a lot of disk space, you'll want to download the 300 gigabyte index and query it repeatedly. Run
-All of these scripts run the same SQL query and should return the same record (written as a parquet file).
+If you want to run many of these queries, and you have a lot of disk space, you'll want to download the 300 gigabyte index and query it repeatedly. Run:
 
 ```shell
 mkdir -p 'crawl=CC-MAIN-2024-22/subset=warc'
@@ -822,7 +821,7 @@ rm cc-index-table.paths
 cd -
 ```
 
-The structure should be something like this:
+In both ways, the file structure should be something like this:
 ```shell
 tree my_data
 my_data
@@ -835,10 +834,8 @@ my_data
 
 Then, you can run `make duck_local_files LOCAL_DIR=/path/to/the/downloaded/data` to run the same query as above, but this time using your local copy of the index files.
 
-> [!IMPORTANT]
-> If you happen to be using the Common Crawl Foundation development server, we've already downloaded these files, and you can run ```make duck_ccf_local_files```
+Both `make duck_ccf_local_files` and `make duck_local_files LOCAL_DIR=/path/to/the/downloaded/data` run the same SQL query and should return the same record (written as a parquet file).
 
-All of these scripts run the same SQL query and should return the same record (written as a parquet file).
 
 ## Bonus 2: combine some steps
 