From 99a3db6ac7df890214c0a1dd2595af8752b33571 Mon Sep 17 00:00:00 2001
From: tianhao <vvv214wth@gmail.com>
Date: Wed, 17 Jun 2026 23:54:09 +0800
Subject: [PATCH] fix docs paths, API names, and DP/seed claims

---
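Reviewer note on the record-independence callout added to
docs/data_and_terminology.md: the new text asks callers to bound per-unit
contributions before running DPSynth. The sketch below shows one simple way to
do that with pandas by keeping a single randomly chosen row per privacy unit.
The `user_id` column, the sample data, and the helper name
`keep_one_record_per_unit` are assumptions made for illustration; none of them
are part of DPSynth.

```python
import pandas as pd


def keep_one_record_per_unit(
    df: pd.DataFrame, unit_column: str, seed: int = 0
) -> pd.DataFrame:
    """Keep one randomly chosen row per privacy unit (hypothetical helper)."""
    # Shuffle first so the surviving row does not depend on input order.
    shuffled = df.sample(frac=1.0, random_state=seed)
    # After shuffling, "first" is a uniformly random row within each unit.
    deduplicated = shuffled.drop_duplicates(subset=unit_column, keep="first")
    # Restore the original relative ordering of the surviving rows.
    return deduplicated.sort_index()


# Illustrative data: user "a" has two rows and user "c" has three.
transactions = pd.DataFrame(
    {
        "user_id": ["a", "a", "b", "c", "c", "c"],
        "amount": [10, 20, 30, 40, 50, 60],
    }
)

bounded = keep_one_record_per_unit(transactions, unit_column="user_id")
```

After this step every remaining row belongs to a distinct `user_id`, which
matches the one-record-per-privacy-unit assumption stated in the callout.
Whether the identifier column should be dropped before synthesis is left to the
caller.
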
 README.md                    | 12 ++++++------
 docs/data_and_terminology.md |  7 ++++++-
 docs/in_memory_api.md        | 13 ++++++-------
 3 files changed, 18 insertions(+), 14 deletions(-)
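
Reviewer note on the revised `--seed` description in docs/in_memory_api.md: the
NumPy behavior that the new wording relies on can be checked in isolation. The
sketch below uses only NumPy and does not call DPSynth; the helper names are
hypothetical. It shows that `np.random.seed` makes the legacy global state
repeatable but has no effect on a generator created by
`np.random.default_rng()` with no arguments.

```python
import numpy as np


def draw_from_legacy_state(seed: int) -> np.ndarray:
    """Seed the legacy global state, then draw from it."""
    np.random.seed(seed)
    return np.random.random(3)


def draw_from_fresh_generator(seed: int) -> np.ndarray:
    """Seed the legacy global state, then draw from a new Generator."""
    np.random.seed(seed)  # Does not influence the Generator created below.
    rng = np.random.default_rng()  # No seed given: uses fresh OS entropy.
    return rng.random(3)


def draw_from_seeded_generator(seed: int) -> np.ndarray:
    """Draw from a Generator that is seeded explicitly."""
    return np.random.default_rng(seed).random(3)


# Repeatable: both calls reseed the same global state.
legacy_repeatable = np.array_equal(
    draw_from_legacy_state(42), draw_from_legacy_state(42)
)

# Almost surely not repeatable: each call builds an unseeded Generator.
fresh_repeatable = np.array_equal(
    draw_from_fresh_generator(42), draw_from_fresh_generator(42)
)

# Repeatable: the seed is passed to the Generator itself.
seeded_repeatable = np.array_equal(
    draw_from_seeded_generator(42), draw_from_seeded_generator(42)
)
```

Passing the seed through to `np.random.default_rng(seed)` is the usual way to
make Generator-based code repeatable; whether DPSynth should do so is outside
the scope of this docs-only patch.
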

diff --git a/README.md b/README.md
index a43a8eb..1dc28ba 100644
--- a/README.md
+++ b/README.md
@@ -194,18 +194,18 @@ Both code paths support the following DP mechanisms:
 
 ## Further Documentation
 
-Detailed guides are available in the [`documentation/`](dpsynth/documentation/)
+Detailed guides are available in the [`docs/`](docs/)
 directory:
 
-*   **In-Memory DataFrame API Guide** (`documentation/in_memory_api.md`):
+*   **In-Memory DataFrame API Guide** (`docs/in_memory_api.md`):
     Detailed guide to using the Pandas-based API and local CLI.
-*   **Scalable Pipeline API Guide** (`documentation/scalable_beam_api.md`):
+*   **Scalable Pipeline API Guide** (`docs/scalable_beam_api.md`):
     Guide for distributed data generation.
-*   **Data Model & Terminology** (`documentation/data_and_terminology.md`):
+*   **Data Model & Terminology** (`docs/data_and_terminology.md`):
     Attributes, schema specifications, and `domain.yaml` format.
-*   **Processing Lifecycle** (`documentation/processing_lifecycle.md`):
+*   **Processing Lifecycle** (`docs/processing_lifecycle.md`):
     The 5-stage mathematical lifecycle shared by both code paths.
-*   **Contributor Guide** (`documentation/contributors_guide.md`):
+*   **Contributor Guide** (`docs/contributors_guide.md`):
     Architecture, PipelineBackend programming rules, and evaluation framework.
 
 *This is not an officially supported Google product. This project is
diff --git a/docs/data_and_terminology.md b/docs/data_and_terminology.md
index d4928ac..752b544 100644
--- a/docs/data_and_terminology.md
+++ b/docs/data_and_terminology.md
@@ -52,7 +52,12 @@ string categories. * **Boolean (`BOOL`)**: True/False binary flags. * **Enum
 
 ### 3. Record Independence (Differential Privacy Assumption)
 
-It is assumed that each **record** comes from different **privacy unit**.
+> [!IMPORTANT]
+> DPSynth provides record-level differential privacy: each **record** is assumed
+> to come from a different **privacy unit**. If one person or entity can
+> contribute multiple rows, callers must enforce the appropriate user-level
+> contribution bounds before running DPSynth; otherwise the guarantee is not
+> user-level DP.
 
 ## Supported Attribute Classifications
 
diff --git a/docs/in_memory_api.md b/docs/in_memory_api.md
index a9868a2..4204e78 100644
--- a/docs/in_memory_api.md
+++ b/docs/in_memory_api.md
@@ -31,7 +31,7 @@ synthetic_df = dpsynth.generate(
     epsilon: float,
     delta: float,
     *,
-    discrete_config: discrete_mechanisms.DiscreteMechanismConfig = discrete_mechanisms.MSTConfig(),
+    discrete_config: discrete_mechanisms.DiscreteMechanism = discrete_mechanisms.MSTMechanism(),
     numerical_bins: int = 32,
     one_way_marginal_budget_fraction: float = 0.1,
     cross_attribute_constraints: list = (),
@@ -63,8 +63,7 @@ synthetic_df = dpsynth.generate(
 ## End-to-End Python Example
 
 Here is a complete Python script demonstrating how to load data, parse a domain
-YAML file, configure the AIM mechanism with a fixed random seed, and generate
-synthetic records.
+YAML file, configure the AIM mechanism, and generate synthetic records.
 
 ```python
 import dpsynth
@@ -80,8 +79,7 @@ attribute_domains = domain.from_yaml_file("transaction_domain.yaml")
 
 # 3. Configure the synthesis mechanism (AIM)
 aim_config = discrete_mechanisms.AIMConfig(
-    seed=42,
-    rounds=50,
+    max_rounds=50,
     pgm_iters=1000,
 )
 
@@ -130,8 +128,9 @@ python3 bin/main.py \
 *   `--epsilon`, `--delta`: Total DP privacy budget.
 *   `--mechanism`: Supported options are `mst`, `aim`, `independent`, and
     `aim_gdp`.
-*   `--seed`: Integer seed for reproducible randomness across DP sampling and
-    PGM inference.
+*   `--seed`: Seeds NumPy's legacy global random state only. The in-memory
+    generator also creates an unseeded `np.random.default_rng()` internally, so
+    identical CLI invocations are not guaranteed to be bit-for-bit reproducible.
 *   `--output_path`: Destination filepath where the synthetic CSV will be
     written.