DuckPond is a query-native filesystem for time-series data, built on Apache Arrow, DataFusion, and Delta Lake. Every filesystem object can be queried with SQL, and SQL queries create new filesystem objects that appear as native files and directories.
Built by the Caspar Water System.
# Build
make build
# Run unit tests
make test
# Initialize a pond and try it out
export POND=/tmp/mypond
pond init
pond mkdir /data
echo "hello" | pond copy - /data/greeting.txt
pond list '/**'
pond cat /data/greeting.txt
- Rust stable toolchain (see rust-toolchain.toml)
- Docker (for integration tests and site deployment)
- Node.js >= 22 (for browser tests and vendor download)
make build # Build pond binary (debug)
make test # Run all unit tests
make integration # Build Docker test image + run integration tests
make check # fmt + clippy + test (CI equivalent)
Run make with no arguments to see all available targets.
Download JavaScript vendor dependencies for offline site generation:
make vendor # Downloads DuckDB-WASM, Observable Plot, D3
This populates crates/sitegen/vendor/dist/ (gitignored, ~35MB).
After this, pond run sitegen build produces sites that work without
network access.
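A minimal sketch of a pre-build guard for the offline workflow above. It only checks that the vendor directory exists and is non-empty; the helper name `vendor_ready` is hypothetical and is not part of the repository or the Makefile.

```shell
#!/bin/sh
# Hypothetical helper (not part of DuckPond): succeeds only if the vendor
# directory exists and contains at least one entry.
vendor_ready() {
    dir="${1:-crates/sitegen/vendor/dist}"
    [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]
}

# Report whether an offline site build is likely to find its assets.
if vendor_ready; then
    echo "vendor assets present"
else
    echo "vendor assets missing; run 'make vendor' first" >&2
fi
```

The check says nothing about whether the downloaded files are complete or current; it only catches the case where `make vendor` was never run.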
crates/
  tinyfs/ Pure filesystem abstractions (FS, WD, Node, path resolution)
  tlogfs/ Delta Lake persistence (OpLog, transactions, DataFusion)
  steward/ Transaction orchestration, control table, factory execution
  provider/ URL-based data access, factory registry, table providers
  cmd/ CLI commands (pond init/list/cat/copy/run/...)
  sitegen/ Static site generator (factory)
  remote/ S3 backup & replication (factory)
  hydrovu/ HydroVu API collector (factory)
  utilities/ Shared helpers (glob, chunked files, perf tracing)
scripts/ Shared deployment scripts
testsuite/ Integration tests (Docker-based)
  tests/ Individual test scripts (NNN-description.sh)
  browser/ Puppeteer browser validation tests
docs/ Architecture and design documentation
water/ Water monitoring demo site
septic/ Septic system demo site
noyo/ Noyo Harbor demo site
See docs/duckpond-overview.md for the full architecture description. Key layers (bottom to top):
| Layer | Crate | Role |
|---|---|---|
| Filesystem | tinyfs | Pure abstractions: FS, WD, Node, path resolution |
| Persistence | tlogfs | Delta Lake storage, OpLog, DataFusion integration |
| Orchestration | steward | Transactions, control table, factory lifecycle |
| Data Access | provider | URL schemes, factory registry, table providers |
| CLI | cmd | User-facing commands |
See docs/cli-reference.md for the complete command reference. Common commands:
pond init # Create a new pond
pond list '/**' # List all entries
pond cat /path/to/file # Read a file
pond cat --sql "SELECT * FROM source WHERE ..." /path # Query a table
pond copy host:///local/file /pond/path # Import a file
pond copy host+series:///data.parquet /pond/series # Import time-series
pond mkdir /dir # Create a directory
pond mknod <factory> /path --config-path config.yaml # Install a factory
pond run /path/to/factory <command> # Execute a factory
pond log # Transaction history
Tests live in testsuite/tests/ as numbered shell scripts. Each test
runs in a fresh Docker container with the pond binary:
make test-image # Build the test Docker image
make integration # Run all tests (skips browser tests)
make integration-all # Run all tests including browser
# Run a single test
cd testsuite && ./run-test.sh 201
# Run interactively (explore in container)
cd testsuite && ./run-test.sh --interactive
Each demo site (water/, septic/, noyo/) rsyncs data from its remote machine and runs everything locally:
# First time: configure your site
cp water/deploy.env.example water/deploy.env
# Edit deploy.env with your remote host and S3 credentials
# Site workflow (all run locally)
cd water
./setup-local.sh # rsync data + init pond + install factories
./run-local.sh # rsync new data + ingest
./generate-local.sh # build static site + preview
./update-local.sh # after editing YAML/templates
Credentials are kept in deploy.env (gitignored), never in the YAML
configs checked into the repository. Remote machines use container
images built by GitHub Actions.
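One way a local script could consume deploy.env is sketched below. It assumes the file uses plain `KEY=value` shell syntax, which the source does not confirm. The function name `load_deploy_env` and the variable name in the usage comment are hypothetical; the actual keys are whatever deploy.env.example defines.

```shell
#!/bin/sh
# Hypothetical helper (not part of DuckPond): source an env file, export its
# variables, and fail if any of the named variables is unset or empty.
load_deploy_env() {
    env_file="$1"
    shift
    if [ ! -f "$env_file" ]; then
        echo "missing $env_file (copy deploy.env.example and edit it)" >&2
        return 1
    fi
    # The "." builtin searches PATH for names without a slash, so force a path.
    case "$env_file" in
        */*) ;;
        *) env_file="./$env_file" ;;
    esac
    set -a          # export every variable assigned while sourcing
    . "$env_file"
    set +a
    for name in "$@"; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$env_file does not set $name" >&2
            return 1
        fi
    done
}

# Usage sketch (REMOTE_HOST is an assumed variable name):
#   load_deploy_env water/deploy.env REMOTE_HOST || exit 1
```

Failing early on a missing variable keeps a half-configured site from starting an rsync or S3 operation with empty credentials.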
| Document | Contents |
|---|---|
| CLI Reference | Complete command syntax and examples |
| Architecture Overview | System design and crate map |
| System Patterns | Transaction model, factories, providers |
| Sitegen Design | Static site generator architecture |
| Cross-Pond Import | Foreign pond import status |
| Large File Storage | Content-addressed storage for large files |
| Releasing | Release process and supply chain security |
Apache-2.0 — see LICENSES/ for details.
