Cross-pond linkage now tested #59

Merged
jmacd merged 30 commits into main from jmacd/33
Apr 3, 2026

jmacd commented Apr 2, 2026

Also adds a prototype log viewer for jsonlogs:// files, and attempts to
run DuckDB-WASM from a vendored copy so the viewer works offline.

jmacd and others added 30 commits March 25, 2026 21:01
Add effective_root: NodePath to WD, enabling a sub-directory to act as
the root for path resolution. This is the foundation for cross-pond
sitegen where foreign ponds define site.yaml with absolute paths that
must resolve within their own root, not the importing pond's root.

Key changes:
- WD struct gains effective_root field (always present, not Option)
- FS::root() sets effective_root to actual root; FS::wd() takes parameter
- resolve(): Component::RootDir resets stack to effective root
- resolve(): Component::ParentDir clamps at effective root boundary
- Symlink targets are contained within effective root scope
- Glob/visitor patterns honor the effective root when stripping a leading /
- All child WD creation propagates effective root via child_wd()
- New API: as_root(), effective_root(), is_at_root_boundary()

9 new unit tests cover absolute paths, .. clamping, symlink containment,
glob patterns, child WD inheritance, and backward compatibility.
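
The two resolve() rules above (a leading / resets to the effective root, and .. clamps at it) can be sketched as follows. This is a minimal illustration, not the tinyfs implementation: resolve_scoped, the string stack, and the root_depth parameter are hypothetical stand-ins for WD and NodePath, and the caller is assumed to pass a cwd at least root_depth entries long.

```rust
use std::path::{Component, Path};

/// Resolve `path` starting from `cwd`, where the first `root_depth` entries
/// of `cwd` form the effective root that resolution may not escape.
/// Hypothetical helper for illustration; not the tinyfs API.
pub fn resolve_scoped(cwd: &[&str], root_depth: usize, path: &str) -> Vec<String> {
    let mut stack: Vec<String> = cwd.iter().map(|s| s.to_string()).collect();
    for comp in Path::new(path).components() {
        match comp {
            // A leading "/" resets the stack to the effective root,
            // not to the real filesystem root.
            Component::RootDir => stack.truncate(root_depth),
            // ".." pops one level but is clamped at the effective root.
            Component::ParentDir => {
                if stack.len() > root_depth {
                    stack.pop();
                }
            }
            Component::Normal(name) => stack.push(name.to_string_lossy().into_owned()),
            Component::CurDir | Component::Prefix(_) => {}
        }
    }
    stack
}
```

With cwd = ["imports", "producer", "data"] and root_depth = 2, the absolute path /sensors/a lands at imports/producer/sensors/a rather than at the importing pond's /sensors/a.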

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove stdin_open/tty from docker-compose.test.yaml — they caused
docker compose run to hang in non-interactive shells. Interactive
debugging still works via run-test.sh --interactive (uses docker run
-it directly, not compose).

Add timeout --foreground to run-test.sh so timeout can detect child
process exit when invoked from non-interactive shells (e.g., run-all.sh).
Without --foreground, timeout creates a new process group and cannot
detect when docker compose run exits.

S3/MinIO tests now complete in 7-25s instead of hitting the 300s timeout.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Allow source_path "/" (and "/**") in cross-pond import config.
When importing the foreign root, the mount point is a fresh local
directory (not the foreign root UUID, which would collide with the
local root). The foreign root's children are linked with their
original foreign IDs via recursive import.

New testsuite test 532-cross-pond-path-boundaries.sh (12 checks):
- Imports producer pond by root into consumer at /imports/producer
- Positive: imported files accessible, data matches byte-for-byte
- Negative: consumer-only paths not resolvable in import context,
  same-name files in consumer vs import have distinct content
- Provenance: pond_ids are distinct

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…oot()

Add effective_root field to FactoryContext with root() method that
returns a chrooted WD when set, or the global root otherwise.
Factories should use context.root() instead of
context.context.filesystem().root() for automatic cross-pond
path scoping.
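
A rough sketch of the root() fallback described here, assuming a simplified Wd type and a two-argument constructor. Both are illustrative assumptions; the real FactoryContext::new() signature and WD type are not shown in this PR text.

```rust
/// Simplified stand-in for a working-directory handle (assumption).
#[derive(Clone, Debug, PartialEq)]
pub struct Wd {
    pub path: String,
}

/// Sketch of a factory context that may carry a chrooted root.
pub struct FactoryContext {
    global_root: Wd,
    effective_root: Option<Wd>,
}

impl FactoryContext {
    /// Hypothetical constructor; the real argument list is likely different.
    pub fn new(global_root: Wd, effective_root: Option<Wd>) -> Self {
        FactoryContext { global_root, effective_root }
    }

    /// Returns the chrooted root when one is set, otherwise the global root.
    pub fn root(&self) -> Wd {
        self.effective_root
            .clone()
            .unwrap_or_else(|| self.global_root.clone())
    }
}
```

A factory that always calls root() gets cross-pond scoping without needing to know whether it is running inside an import.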

Sitegen updated to use context.root() at all 3 call sites.

Struct literal constructions simplified to use FactoryContext::new()
instead of spelling out all fields (removes field-listing duplication
in 6 test helpers + 2 internal sites).

Next step: add pond_id to Node so tinyfs resolve() can auto-detect
import boundaries and set effective_root during traversal.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
FileID now carries pond_id: Uuid as part of identity (participates in
Eq/Hash). This distinguishes nodes from different ponds, enabling
cross-pond imports where the same root UUID exists in multiple ponds.

Key changes:
- FileID struct gains pond_id field, included in Eq/Hash/Serialize
- FileID::root_for(pond_id) creates pond-scoped root identity
- FileID::root() uses local_pond_uuid() default for backward compat
- All FileID constructors gain pond_id parameter
- new_child_id() and child_id() inherit pond_id from parent
- local_pond_uuid() constant for memory/hostmount/test contexts
- ~18 files updated across tinyfs, tlogfs, provider, remote, steward, cmd

All 181 tinyfs + 62 tlogfs + 184 provider tests pass.
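
To illustrate why pond_id has to participate in Eq/Hash, here is a minimal sketch. It uses u128 in place of Uuid to stay dependency-free, and the field names, ROOT_NODE constant, and distinct_roots helper are assumptions rather than the actual FileID layout.

```rust
use std::collections::HashSet;

/// Stand-in for a UUID so the sketch needs no external crate (assumption).
pub type PondId = u128;

/// In this sketch every pond shares the same well-known root node id,
/// which is what makes pond_id necessary for telling roots apart.
pub const ROOT_NODE: u128 = 0;

/// Identity includes the pond, so derived Eq/Hash distinguish ponds.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct FileId {
    pub pond_id: PondId,
    pub node_id: u128,
}

impl FileId {
    /// Pond-scoped root identity.
    pub fn root_for(pond_id: PondId) -> Self {
        FileId { pond_id, node_id: ROOT_NODE }
    }

    /// Children inherit the parent's pond.
    pub fn child_id(&self, node_id: u128) -> Self {
        FileId { pond_id: self.pond_id, node_id }
    }
}

/// Counts how many distinct root identities a set of ponds produces.
pub fn distinct_roots(ponds: &[PondId]) -> usize {
    ponds
        .iter()
        .map(|p| FileId::root_for(*p))
        .collect::<HashSet<_>>()
        .len()
}
```

Without pond_id in the hash, the roots of two ponds would collapse into one map key, which is the collision the import path needs to avoid.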

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- journal-ingest: track per-file timestamp bounds, store temporal
  metadata and extended_attributes on FilePhysicalSeries writes
- tlogfs: persist temporal metadata for FilePhysicalSeries in both
  small and large file paths; relax new_large_file_series assertion
  to accept FilePhysicalSeries entry type
- export: refactor into export_series_to_parquet (wrapper) and
  export_table_provider_to_parquet (core) with configurable
  timestamp column name
- sitegen: detect URL-scheme patterns in export stages, route
  through UrlPatternMatcher and format provider registry; add
  timestamp_column config field to ExportStage
- sitegen: add log_viewer shortcode, logs layout, log-viewer.js
  client-side viewer with DuckDB-WASM, unit filter pills,
  pagination, and priority coloring
- linux: add site.yaml, site templates, and sitegen setup to
  setup.sh for local log viewing

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When resolve() encounters a child directory with a different pond_id
than its parent, it detects a cross-pond boundary. The parent directory
(mount point) is automatically set as the effective root on the
returned WD. This means absolute paths from that point resolve within
the imported subtree, not the global root.

Detection is transparent — any code that resolves paths through an
imported partition gets automatic chroot scoping. No explicit
as_root() call needed for cross-pond imports.

Two new unit tests:
- test_auto_detect_pond_boundary: verifies effective_root is set
  to the mount point when crossing a pond boundary
- test_auto_detect_absolute_path_scoped_to_mount: verifies that
  after auto-detection, absolute paths resolve within the mount

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- chart.js/overlay.js: use import.meta.url for absolute vendor URLs
  (fixes blob Worker importScripts with relative paths, fixes Vite
  rewriting ./vendor/ to /vendor without trailing slash)
- noyo: remove hardcoded ROOT path, use cargo run --release, source
  deploy.env for S3 credentials, envsubst for backup.yaml
- noyo/export.sh: preview command uses correct BASE_PATH for subdir site
- septic/backup.yaml: fix envsubst syntax (remove shell default)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Major changes to cross-pond import infrastructure:

Persistence layer - pond_id query scoping:
- Add pond_uuid() method to PersistenceLayer trait with default fallback
- Implement pond_uuid() on State using real pond UUID from OpLogPersistence
- Forward pond_uuid() through CachingPersistence wrapper
- Filter committed records by pond_id in query_latest_record,
  query_latest_directory_record, and query_records
- FS::root() now uses FileID::root_for(persistence.pond_uuid())
- initialize_root_directory takes explicit pond_id parameter

Import architecture simplification:
- Remove create_child_dirs_recursive - one directory entry per mount point
  is sufficient; foreign partition data contains the full directory tree
- Extract foreign pond UUID from backup OpLog for correct FileID construction
- Include foreign root partition in import set for root imports
- Add collect_partitions_recursive for partition ID discovery (metadata only)
- Fix deep recursion bug in execute_import partition discovery

Bug fixes:
- Fix Ship::create_pond calling PondMetadata::default() twice, which created
  two different UUIDs for the control table and data persistence
- Thread pond_id through tinyfs_object_store path parsing
- Fix S3 endpoint hostname in deploy.env.example files

New cross-pond example (cross/):
- setup.sh discovers pond IDs from noyo, water, septic
- import.sh pulls data from all three source ponds
- generate.sh builds combined site via sitegen
- Combined site.yaml with exports from all three sources

WIP: Foreign root partition directory listing not yet resolving correctly
through pond_id scoping - the cache lookup needs further debugging.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
DirectoryEntry did not store the child's pond_id. When
OpLogDirectory::insert() added a foreign node to a local directory,
the foreign pond_id was discarded. OpLogDirectory::get() then used
the parent's pond_id to reconstruct the child FileID, causing
queries to filter by the wrong pond_id and return the local root's
directory record instead of the foreign one.

Added pond_id: Option<String> to DirectoryEntry:
- None = same pond as parent (default, backward compatible)
- Some(uuid) = child from a foreign pond (cross-pond import)

insert() now compares child vs parent pond_id and stores it when
they differ. get() and remove() use the stored pond_id for child
FileID construction. flush_directory_operations() preserves pond_id
when recreating entries at flush time.

Backward compatible: existing Arrow IPC data without the pond_id
column deserializes as None via #[serde(default)].
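
A minimal sketch of the insert()/get() behavior described above, assuming string ids and an in-memory map. Directory, FileId, and the field layout are simplified stand-ins for OpLogDirectory and the real FileID, and the serde handling is omitted.

```rust
use std::collections::HashMap;

/// Simplified identity: pond plus node (assumption, string ids for brevity).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct FileId {
    pub pond_id: String,
    pub node_id: String,
}

#[derive(Clone, Debug)]
pub struct DirectoryEntry {
    pub node_id: String,
    /// None = same pond as the parent; Some = child from a foreign pond.
    pub pond_id: Option<String>,
}

pub struct Directory {
    pub id: FileId,
    entries: HashMap<String, DirectoryEntry>,
}

impl Directory {
    pub fn new(id: FileId) -> Self {
        Directory { id, entries: HashMap::new() }
    }

    /// Stores the child's pond_id only when it differs from the parent's.
    pub fn insert(&mut self, name: &str, child: &FileId) {
        let pond_id = if child.pond_id != self.id.pond_id {
            Some(child.pond_id.clone())
        } else {
            None
        };
        self.entries.insert(
            name.to_string(),
            DirectoryEntry { node_id: child.node_id.clone(), pond_id },
        );
    }

    /// Rebuilds the child identity, falling back to the parent's pond.
    pub fn get(&self, name: &str) -> Option<FileId> {
        self.entries.get(name).map(|e| FileId {
            pond_id: e.pond_id.clone().unwrap_or_else(|| self.id.pond_id.clone()),
            node_id: e.node_id.clone(),
        })
    }
}
```

Storing None for same-pond children keeps old entries valid, since an entry written before the field existed reads back exactly like a local child.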

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
sql_derived, temporal_reduce, and timeseries_pivot resolved absolute
paths via fs.root() which always returns the global pond root. When
these factories run inside a cross-pond import mount, absolute paths
like /sensors/station_a need to resolve within the foreign tree, not
the consumer's tree.

Changed all read-path factories to use context.root() which respects
the effective_root set by cross-pond boundary detection. This is the
same API sitegen already uses.

Write-path factories (remote, hydrovu, logfile_ingest, journal_ingest)
are not changed — they legitimately need the global root to write to
the local pond.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When a dynamic node (dynamic-dir, timeseries-join, etc.) belongs to
a foreign pond (cross-pond import), its FactoryContext needs an
effective_root so that absolute paths in factory configs resolve
within the imported tree, not the consumer's global root.

Three changes:
- create_dynamic_node_from_oplog_entry (persistence.rs): detects
  foreign nodes by comparing node pond_id vs persistence pond_uuid,
  loads the foreign root, and sets effective_root on the context
- DynamicDirDirectory::create_child_context (dynamic_dir.rs):
  propagates effective_root to child factory contexts
- FactoryContext::effective_root() getter (context.rs): added for
  propagation

Also adds test 533-cross-pond-factory-resolution.sh which imports
a producer pond containing a dynamic-dir with timeseries-join and
verifies the factory output matches in the consumer.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Documents three fixes from this session:
- DirectoryEntry pond_id for cross-pond child identity
- Factory path resolution via context.root()
- effective_root threading into dynamic factory contexts

Updates architecture notes with factory path resolution convention
and effective_root derivation model.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add subsites: directive to site.yaml config. When building a combined
site, the top-level sitegen reads each imported pond's own sitegen
config and generates it into a subdirectory, using WD::as_root() to
scope path resolution to the import mount point.

Key changes:
- config.rs: SubsiteConfig struct (name, path, config, base_url)
- factory.rs: Extract build_site_from_root() from execute(); execute() now
  iterates subsites, reads each foreign config, and builds it within its
  scoped root
- layouts.rs: Add root_base_url to LayoutContext for shared asset
  references; use base_url for per-site theme.css link
- factory.rs: Split write_builtin_assets into write_shared_assets
  (once at top level) and write_theme_css (per site)
- Refactor run_export_stages and run_content_stages to take WD root
  directly instead of FactoryContext

cross/ example simplified from 140-line manual duplication to 50-line
config using subsites: directive. Per-system templates removed; each
sub-site uses its own templates from the imported pond.

Integration test: testsuite/tests/540-recursive-sitegen.sh

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… resolution

Three bugs prevented cross-pond import from working end-to-end:

1. list_transaction_files() constructed bundle_id with today's date via
   transaction_bundle_id(), but stored bundles contain the original push
   date. Changed to LIKE prefix+suffix matching so the date component
   is not required to match exactly.

2. extract_foreign_pond_id() used a non-deterministic SQL query on the
   oplog. Replaced with RemoteTable::extract_pond_id() which reads the
   pond_id directly from FILE-META partition keys in the Delta table.
   Also removed the config_url parameter from execute_import() which
   parsed the URL to guess the pond_id.

3. Provider path resolution ignored effective_root. All four
   self.fs.root() calls in provider_api.rs bypassed cross-pond root
   scoping, so absolute paths inside imported factories resolved from
   the consumer pond's root instead of the foreign pond's root. Added
   Provider::with_root() and wired it through temporal_reduce and
   sql_derived factories. Fixed WD::as_root() to reset display paths
   to "/" (chroot semantics) so collect_matches returns paths that
   round-trip correctly through resolve_path.
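
For the first fix, prefix+suffix matching behaves like SQL LIKE 'prefix%suffix'. The sketch below shows the equivalent check in Rust. The bundle id layout used in the usage note is invented for illustration, since the real transaction_bundle_id() format is not shown here.

```rust
/// Equivalent of SQL `value LIKE 'prefix%suffix'` for literal prefix/suffix:
/// the value must start with `prefix`, end with `suffix`, and the two must
/// not overlap. Whatever sits in between (e.g. a date) is ignored.
pub fn like_prefix_suffix(value: &str, prefix: &str, suffix: &str) -> bool {
    value.len() >= prefix.len() + suffix.len()
        && value.starts_with(prefix)
        && value.ends_with(suffix)
}
```

With a hypothetical id such as pond-abc/2026-03-25/txn-7, the prefix pond-abc/ and suffix /txn-7 match regardless of which date the bundle was pushed on, while pond-abc/2026-03-25/txn-17 is still rejected.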

Infrastructure:
- Makefile: added site-noyo, site-septic, site-water, sites, site-cross
- noyo: added rm -rf to setup.sh, allow_http to backup config
- water: added backup.yaml and backup factory to setup-local.sh
- septic/water: skip rsync when local data already exists

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Three issues prevented large files from being backed up and imported:

1. get_large_files() only matched 'sha256=' prefix but large files
   are now named with 'blake3=' prefix. Added blake3= to the match.

2. execute_push() returned early when all transactions were already
   backed up, skipping the large file backup section. Restructured
   to always run the large file check regardless of transaction state.

3. execute_import() never downloaded large files from the foreign
   backup. Added a post-partition step that scans for _large_files/
   entries in the remote table and downloads them to the local pond.
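
The first fix amounts to accepting more than one digest prefix when recognizing large-file names. A small sketch, assuming names of the form <algorithm>=<digest>; anything beyond the two prefixes named above (the exact file naming, the helper name) is an assumption.

```rust
/// Digest prefixes that mark a large-file entry (from the commit message).
const LARGE_FILE_PREFIXES: [&str; 2] = ["sha256=", "blake3="];

/// Returns (algorithm, digest) when `name` starts with a known prefix.
/// Hypothetical helper; the real get_large_files() may work differently.
pub fn large_file_digest(name: &str) -> Option<(&'static str, &str)> {
    LARGE_FILE_PREFIXES.iter().find_map(|&prefix| {
        name.strip_prefix(prefix)
            .map(|digest| (prefix.trim_end_matches('='), digest))
    })
}
```

Keeping the prefixes in one table means a future hash change touches a single line instead of every match site.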

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Two issues prevented the cross-pond site from functioning:

1. TinyFS ObjectStore path format lacked pond_id. The tinyfs:// URL
   format was part/{part_id}/node/{node_id}/version/ which meant
   the ObjectStore always constructed FileIDs with the local pond's
   UUID. For foreign mount files this caused list_file_versions to
   return empty results (wrong pond_id filter). Added pond/{pond_id}
   prefix to the URL format so the correct pond_id is preserved
   through the DataFusion -> ObjectStore -> persistence round-trip.
   Updated TinyFsPathBuilder, parse_tinyfs_path (with legacy
   fallback), memory file as_table_provider, and sql_derived URL
   construction.

2. Subsite sidebar links used the source site's original base_url
   (e.g., /noyo-harbor/) instead of the cross site's mount point
   (e.g., /noyo/). Added SiteConfig::rewrite_sidebar_urls() which
   replaces the old base_url prefix with the new one in all sidebar
   href values when a subsite is mounted at a different path.
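
The path-format change in item 1 can be sketched as a parser that accepts the new pond/{pond_id} prefix and falls back to the legacy layout. This is an illustration only: TinyFsPath and parse_path are hypothetical names, and returning None for the pond in the legacy case (leaving the caller to substitute the local pond) is an assumption about how the fallback works.

```rust
/// Parsed components of a tinyfs object-store path (hypothetical shape).
#[derive(Debug, PartialEq)]
pub struct TinyFsPath {
    /// None for legacy paths; the caller would fall back to the local pond.
    pub pond_id: Option<String>,
    pub part_id: String,
    pub node_id: String,
}

/// Accepts both `pond/{pond_id}/part/{part_id}/node/{node_id}/version/...`
/// and the legacy `part/{part_id}/node/{node_id}/version/...`.
pub fn parse_path(path: &str) -> Option<TinyFsPath> {
    let segs: Vec<&str> = path.trim_matches('/').split('/').collect();
    match segs.as_slice() {
        ["pond", pond, "part", part, "node", node, "version", ..] => Some(TinyFsPath {
            pond_id: Some(pond.to_string()),
            part_id: part.to_string(),
            node_id: node.to_string(),
        }),
        ["part", part, "node", node, "version", ..] => Some(TinyFsPath {
            pond_id: None,
            part_id: part.to_string(),
            node_id: node.to_string(),
        }),
        _ => None,
    }
}
```

Carrying the pond in the path is what lets the id survive the DataFusion -> ObjectStore -> persistence round-trip instead of being re-derived from the local pond.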

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The sidebar.md had {{ content_nav }} (missing the / before the
closing }}), which was treated as a literal template variable
instead of a shortcode invocation.
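
The distinction at play can be sketched as a tiny classifier: a tag that ends in /}} is a shortcode, anything else between {{ and }} is a variable. This is a simplified illustration and not the sitegen template parser; shortcode arguments and nested tags are ignored, and Tag and classify are hypothetical names.

```rust
/// How a `{{ ... }}` tag is interpreted in this sketch.
#[derive(Debug, PartialEq)]
pub enum Tag<'a> {
    /// `{{ name /}}`: a self-closing shortcode invocation.
    Shortcode(&'a str),
    /// `{{ name }}`: a plain template variable.
    Variable(&'a str),
}

/// Classifies a single tag; returns None if it is not wrapped in `{{ }}`.
pub fn classify(tag: &str) -> Option<Tag<'_>> {
    let inner = tag.trim().strip_prefix("{{")?.strip_suffix("}}")?.trim();
    match inner.strip_suffix('/') {
        Some(name) => Some(Tag::Shortcode(name.trim())),
        None => Some(Tag::Variable(inner)),
    }
}
```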

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Use correct temporal-reduce config fields (in_pattern, out_pattern,
  time_column, aggregations) instead of deprecated names
- Fix shortcode syntax: {{ chart /}} and {{ content_nav /}}
- Add missing mkdir -p /system/site for consumer portal templates

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Shared assets (style.css, chart.js, overlay.js, log-viewer.js) must
always load from the web server root ('/'), not prefixed with base_url.
The root_base_url mechanism incorrectly prefixed them for subdir builds.

Removed root_base_url from LayoutContext and all call sites. Shared
asset references are now hardcoded to '/' in layouts, matching the
original design where style.css and chart.js are served from the
top-level output directory.
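
A short sketch of the resulting link rules, assuming two hypothetical helpers: shared assets always resolve from the server root, while the per-site theme.css is prefixed with that site's base_url.

```rust
/// Shared assets (style.css, chart.js, ...) always load from the server root.
pub fn shared_asset_href(name: &str) -> String {
    format!("/{}", name)
}

/// theme.css is written per site, so its link is prefixed with base_url.
pub fn theme_href(base_url: &str) -> String {
    format!("{}/theme.css", base_url.trim_end_matches('/'))
}
```

For a subsite mounted at /noyo/, this yields /style.css for the shared stylesheet and /noyo/theme.css for the theme override.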

Updated test 209 to check theme.css instead of style.css for theme
overrides, since theme overrides are now written to a separate file.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Three issues in the browser test (202):

1. Vendor files (DuckDB-WASM, Observable Plot, D3) were missing from
   the Docker test image. Added vendor/ copy to build-image.sh and
   Dockerfile so sitegen can bundle them into generated sites.

2. Test 201's synthetic data used 2025 dates which fall outside
   chart.js's default 3-month window (now April 2026). Updated to
   span 2025-06 through 2026-06 so charts render data.

3. Vite's base path rewriting doubled the base_url prefix in
   pre-rendered HTML hrefs (e.g., /myapp/myapp/theme.css). Changed
   the subdir browser test to nest the output under myapp/ and
   serve from the parent with base '/' — matching how a real
   reverse proxy deployment works.

Also simplified DuckDB Worker creation (direct URL instead of
blob+importScripts) and added @vite-ignore hints on vendor imports.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The browser test needs DuckDB-WASM and Observable Plot vendor files
bundled into the Docker test image so sitegen can include them in
generated sites. Added 'make vendor' step before build-image.sh.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jmacd jmacd merged commit 9aaad0c into main Apr 3, 2026
7 checks passed