Merged
64 changes: 64 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,70 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [5.10.0] - 2026-05-09

Group-audit-aware Method-A edge-list export — the v5.9.0 single-entity
Method-A network is now extended to the consolidated-group level so
the `vynfi-group-audit-enterprise-2000` archive (and any other
group-audit run) emits per-entity *and* consolidated `je_network`
artefacts.

### Added — group-audit `je_network` export

- **Per-entity** `entities/{code}/graphs/je_network.{csv,parquet}` —
the same Method-A edges the single-entity output_writer would
produce, extended with `ic_pair_id` + `ic_partner_entity` columns
so consumers can join the IC postings back into pairs.
- **Consolidated** `consolidated/je_network.{csv,parquet}` — every
entity's edges concatenated, plus the elimination JEs (flagged
`is_eliminated=true`), with `entity_code` as a partition column.
- Schema additions on the consolidated file: `entity_code`,
`ic_pair_id`, `ic_partner_entity`, `is_eliminated`,
`eliminates_ic_pair_id`. Per-entity adds `ic_pair_id` +
`ic_partner_entity` only.
- The parquet outputs at both levels (per-entity and consolidated) are
Zstd-compressed (~5× smaller than CSV), matching the convention from
the single-entity dataset.
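
As a rough consumer-side illustration of the `ic_pair_id` join described above, the sketch below groups consolidated edges by pair id and skips elimination rows. `ConsolidatedEdge` is a hypothetical, trimmed-down row type and `IC-0001` is an invented id; only the column names come from this entry.

```rust
use std::collections::BTreeMap;

/// Hypothetical, trimmed-down view of one consolidated `je_network` row.
/// Only the columns named in this changelog entry are modelled.
#[derive(Debug, Clone, PartialEq)]
struct ConsolidatedEdge {
    entity_code: String,
    ic_pair_id: Option<String>,
    ic_partner_entity: Option<String>,
    is_eliminated: bool,
}

/// Group non-elimination IC edges by `ic_pair_id`, collecting
/// `(entity_code, ic_partner_entity)` for each side of the pair.
fn sides_by_ic_pair(edges: &[ConsolidatedEdge]) -> BTreeMap<String, Vec<(String, String)>> {
    let mut pairs: BTreeMap<String, Vec<(String, String)>> = BTreeMap::new();
    for edge in edges.iter().filter(|e| !e.is_eliminated) {
        // Edges without both IC columns are ordinary postings; skip them.
        if let (Some(id), Some(partner)) = (&edge.ic_pair_id, &edge.ic_partner_entity) {
            pairs
                .entry(id.clone())
                .or_default()
                .push((edge.entity_code.clone(), partner.clone()));
        }
    }
    pairs
}

fn edge(entity: &str, pair: Option<&str>, partner: Option<&str>, elim: bool) -> ConsolidatedEdge {
    ConsolidatedEdge {
        entity_code: entity.to_string(),
        ic_pair_id: pair.map(str::to_string),
        ic_partner_entity: partner.map(str::to_string),
        is_eliminated: elim,
    }
}

fn main() {
    // Invented sample rows, not taken from any real archive.
    let edges = vec![
        edge("NESTLE_SA", Some("IC-0001"), Some("NESTLE_USA"), false),
        edge("NESTLE_USA", Some("IC-0001"), Some("NESTLE_SA"), false),
        edge("NESTLE_DE", None, None, false),
        edge("NESTLE_SA", None, None, true),
    ];
    for (pair_id, sides) in sides_by_ic_pair(&edges) {
        println!("{pair_id}: {sides:?}");
    }
}
```

A `BTreeMap` is used here only so the printed order is stable; a real consumer would more likely do this join in a dataframe or SQL engine over the parquet file.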

### Refactored — shared Method-A helper

- New `datasynth_runtime::je_network::build_je_network_edges()` —
pure builder reused by `output_writer::write_je_network_csv` and
the new group emitter. Single-entity CSV output is byte-identical
to v5.9.0 (verified by the existing
`tests/je_network_export.rs` integration test).
- New struct `JeNetworkEdge` exposes the IC fields through the
existing edge model so single-entity runs that *do* have IC
postings (multi-company-in-one-shard configs) can still surface
them — though the single-entity CSV writer keeps the v5.8.0
13-column schema for backwards compatibility.
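
The split between a shared edge model and a backwards-compatible writer can be pictured with the sketch below. It is not the crate's code: `EdgeSketch`, its three leading fields, and both row functions are hypothetical stand-ins, and the real legacy schema has 13 columns rather than 3.

```rust
/// Hypothetical stand-in for `JeNetworkEdge`; the real struct has more fields.
#[derive(Debug, Clone)]
struct EdgeSketch {
    from_account: String,
    to_account: String,
    amount_cents: i64,
    ic_pair_id: Option<String>,
    ic_partner_entity: Option<String>,
}

/// Legacy-style row: the IC fields are deliberately not written, so the
/// column set stays fixed whatever the edge carries.
fn legacy_csv_row(e: &EdgeSketch) -> String {
    format!("{},{},{}", e.from_account, e.to_account, e.amount_cents)
}

/// Group-style row: the same leading columns with the IC columns appended.
/// Missing values become empty cells.
fn group_csv_row(e: &EdgeSketch) -> String {
    format!(
        "{},{},{}",
        legacy_csv_row(e),
        e.ic_pair_id.as_deref().unwrap_or(""),
        e.ic_partner_entity.as_deref().unwrap_or(""),
    )
}

fn main() {
    // Invented sample edge for illustration.
    let e = EdgeSketch {
        from_account: "1100".to_string(),
        to_account: "4000".to_string(),
        amount_cents: 12_500,
        ic_pair_id: Some("IC-0007".to_string()),
        ic_partner_entity: Some("ENT_B".to_string()),
    };
    println!("{}", legacy_csv_row(&e));
    println!("{}", group_csv_row(&e));
}
```

Because both writers consume the same edge value, the legacy output cannot drift from the group output in its shared columns, which is the property the byte-identical check above relies on.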

### Validated against `mini_nestle.yaml`

An end-to-end smoke test via `datasynth-data group generate` against
`configs/examples/group/mini_nestle.yaml` produced:
- 4 per-entity `je_network.{csv,parquet}` files
(NESTLE_SA / NESTLE_USA / NESTLE_DE / NESTLE_BR)
- 368 elimination edges
- 69,287 consolidated edges
- 376 IC-pair edges (matched seller + buyer sides) with
`ic_pair_id` populated
- Coverage 0.9583 (matched / planned IC pairs)
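
The coverage figure is a plain ratio of matched to planned IC pairs. A minimal sketch of that calculation follows; `ic_coverage` is a hypothetical helper, and the counts in `main` are invented for illustration because this entry does not state the underlying matched and planned counts.

```rust
/// Ratio of matched to planned IC pairs; `None` when nothing was planned,
/// so callers never divide by zero.
fn ic_coverage(matched: u32, planned: u32) -> Option<f64> {
    if planned == 0 {
        None
    } else {
        Some(f64::from(matched) / f64::from(planned))
    }
}

fn main() {
    // Illustrative counts only; not the figures from the mini_nestle run.
    let (matched, planned) = (23, 24);
    match ic_coverage(matched, planned) {
        Some(c) => println!("coverage = {c:.4}"),
        None => println!("coverage undefined (no planned pairs)"),
    }
}
```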

`vynfi-group-audit-enterprise-2000` regeneration is the next
deliverable; this release ships the engine changes that make it
possible.

### Implementation notes

- Wired into `aggregate/driver.rs` immediately after
`eliminations_to_journal_entries` runs (step 7b) so elimination
edges can be flagged `is_eliminated=true` while the contributing
entity-tagged JEs are still in their pre-TB-rewrite form.
- `eliminates_ic_pair_id` is left empty on v5.10 elimination edges
— the synthetic JEs produced by the elimination factory don't
carry the source `IcPairId` on their headers. Plumbing it
through the elimination → JE conversion can land in v5.10.x.
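
Until that plumbing lands, consumers probably should not assume `eliminates_ic_pair_id` is populated. A minimal sketch of defensive handling on the reading side, with `optional_cell` and `describe_elimination` as hypothetical helper names:

```rust
/// Treat an empty or whitespace-only CSV cell as "not populated".
fn optional_cell(cell: &str) -> Option<&str> {
    let trimmed = cell.trim();
    if trimmed.is_empty() {
        None
    } else {
        Some(trimmed)
    }
}

/// Describe an elimination edge without assuming the back-reference exists.
fn describe_elimination(eliminates_ic_pair_id_cell: &str) -> String {
    match optional_cell(eliminates_ic_pair_id_cell) {
        Some(id) => format!("eliminates IC pair {id}"),
        None => "elimination edge (source IC pair not recorded)".to_string(),
    }
}

fn main() {
    // Empty cell: the case this release produces.
    println!("{}", describe_elimination(""));
    // Populated cell: invented id, shown for the case a later release may produce.
    println!("{}", describe_elimination("IC-0001"));
}
```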

## [5.9.0] - 2026-05-08

Customer-feedback follow-up release on top of v5.8.0. Bundles a
41 changes: 23 additions & 18 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

36 changes: 18 additions & 18 deletions Cargo.toml
@@ -24,7 +24,7 @@ exclude = ["fuzz", "attic/datasynth-graph-export"]
# Root package for workspace-level benchmarks
[package]
name = "datasynth-workspace"
version = "5.9.0"
version = "5.10.0"
edition = "2021"
publish = false

@@ -44,7 +44,7 @@ tempfile = { workspace = true }
serde_json = { workspace = true }

[workspace.package]
version = "5.9.0"
version = "5.10.0"
edition = "2021"
license = "Apache-2.0"
rust-version = "1.88"
@@ -60,22 +60,22 @@ categories = ["simulation", "command-line-utilities"]
# Internal crates - version required for crates.io publishing
# Version must match workspace.package.version to prevent cargo from resolving
# old incompatible versions during publish verification
datasynth-core = { version = "5.9.0", path = "crates/datasynth-core" }
datasynth-config = { version = "5.9.0", path = "crates/datasynth-config" }
datasynth-generators = { version = "5.9.0", path = "crates/datasynth-generators" }
datasynth-output = { version = "5.9.0", path = "crates/datasynth-output" }
datasynth-runtime = { version = "5.9.0", path = "crates/datasynth-runtime" }
datasynth-graph = { version = "5.9.0", path = "crates/datasynth-graph" }
datasynth-server = { version = "5.9.0", path = "crates/datasynth-server" }
datasynth-test-utils = { version = "5.9.0", path = "crates/datasynth-test-utils" }
datasynth-eval = { version = "5.9.0", path = "crates/datasynth-eval" }
datasynth-ocpm = { version = "5.9.0", path = "crates/datasynth-ocpm" }
datasynth-banking = { version = "5.9.0", path = "crates/datasynth-banking" }
datasynth-fingerprint = { version = "5.9.0", path = "crates/datasynth-fingerprint" }
datasynth-standards = { version = "5.9.0", path = "crates/datasynth-standards" }
datasynth-audit-fsm = { version = "5.9.0", path = "crates/datasynth-audit-fsm" }
datasynth-audit-optimizer = { version = "5.9.0", path = "crates/datasynth-audit-optimizer" }
datasynth-group = { version = "5.9.0", path = "crates/datasynth-group" }
datasynth-core = { version = "5.10.0", path = "crates/datasynth-core" }
datasynth-config = { version = "5.10.0", path = "crates/datasynth-config" }
datasynth-generators = { version = "5.10.0", path = "crates/datasynth-generators" }
datasynth-output = { version = "5.10.0", path = "crates/datasynth-output" }
datasynth-runtime = { version = "5.10.0", path = "crates/datasynth-runtime" }
datasynth-graph = { version = "5.10.0", path = "crates/datasynth-graph" }
datasynth-server = { version = "5.10.0", path = "crates/datasynth-server" }
datasynth-test-utils = { version = "5.10.0", path = "crates/datasynth-test-utils" }
datasynth-eval = { version = "5.10.0", path = "crates/datasynth-eval" }
datasynth-ocpm = { version = "5.10.0", path = "crates/datasynth-ocpm" }
datasynth-banking = { version = "5.10.0", path = "crates/datasynth-banking" }
datasynth-fingerprint = { version = "5.10.0", path = "crates/datasynth-fingerprint" }
datasynth-standards = { version = "5.10.0", path = "crates/datasynth-standards" }
datasynth-audit-fsm = { version = "5.10.0", path = "crates/datasynth-audit-fsm" }
datasynth-audit-optimizer = { version = "5.10.0", path = "crates/datasynth-audit-optimizer" }
datasynth-group = { version = "5.10.0", path = "crates/datasynth-group" }

# Serialization
serde = { version = "1.0", features = ["derive"] }
4 changes: 4 additions & 0 deletions crates/datasynth-group/Cargo.toml
@@ -33,12 +33,16 @@ blake3 = { workspace = true }
hex = { workspace = true }
strsim = "0.11"
rayon = { workspace = true }
arrow = { workspace = true }
parquet = { workspace = true }

[dev-dependencies]
datasynth-test-utils = { workspace = true }
rust_decimal_macros = { workspace = true }
pretty_assertions = "1"
tempfile = { workspace = true }
smallvec = { workspace = true }
uuid = { workspace = true }
# Subprocess-based determinism harness in `tests/determinism_in_process.rs`
# (Task 11.4) drives the `datasynth-data` CLI binary as a subprocess.
# `assert_cmd` is used for ergonomics; the binary itself is discovered
18 changes: 18 additions & 0 deletions crates/datasynth-group/src/aggregate/driver.rs
@@ -294,6 +294,24 @@ pub fn run_aggregate(
    // ── 7. Convert to elimination JEs (Task 5.5) ────────────────────────
    let elim_jes = eliminations_to_journal_entries(&elim_result);

    // ── 7b. v5.10 — emit per-entity + consolidated je_network artefacts.
    // Hooks in here (after eliminations land but before TB consolidation
    // rewrites contributing_jes) so we can mark elimination edges with
    // is_eliminated=true while the original IC pair JEs are still in
    // their entity-tagged form.
    let je_network_summary = crate::aggregate::je_network::write_je_network_artefacts(
        &contributing_jes,
        &elim_jes,
        out_dir,
    )?;
    tracing::info!(
        "v5.10 je_network: {} per-entity files, {} elim edges, {} consolidated edges -> {:?}",
        je_network_summary.per_entity_edge_count.len(),
        je_network_summary.elim_edge_count,
        je_network_summary.consolidated_edge_count,
        je_network_summary.consolidated_csv_path,
    );

    // ── 8. Apply eliminations to pre-elim TB (Task 5.6) ─────────────────
    let post_elim = apply_eliminations_to_tb(&pre_elim, &elim_jes)?;
