SynthGL generates synthetic accounting data for fintech QA teams and AI model training. Integration teams at companies like Plaid, Ramp, and Mercury use it to test accounting connectors against deterministic general ledger (GL) data - journal entries, trial balances, financial statements - without touching production systems.
SynthGL exports GL journal entries to Excel workbooks. Each row is a 14-column journal line: transaction IDs, dates, account codes, debit/credit amounts, memos, and policy metadata. A typical export spans 5,000 to 50,000 rows.
These exports run continuously in CI pipelines, so Excel I/O speed directly shapes developer feedback loops.
Data profile: 14 columns per row (mixed types: strings, floats, dates, IDs). Real accounting data with realistic value distributions.
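For concreteness, a single journal line with that shape might look like the following. This is an illustrative sketch only: the column layout and field names here are hypothetical, not SynthGL's actual schema.

```python
from datetime import date

# Hypothetical 14-column journal line, illustrating the mixed types
# described above (string IDs, dates, account codes, float amounts,
# memos, and policy metadata). Not the real SynthGL schema.
journal_line = [
    "GL-000001",          # GLID (string ID)
    "JE-2024-0001",       # EntryID
    date(2024, 3, 31),    # TxnDate
    "4000",               # AccountCode
    0.0,                  # Debit
    1250.00,              # Credit
    "USD",                # Currency
    "Monthly subscription revenue",  # Memo
    "recurring",          # Transaction type
    "billing",            # Source system
    "rev-rec",            # Policy metadata
    "dept-01",            # Department
    "cc-100",             # Cost center
    "batch-7",            # Export batch
]

assert len(journal_line) == 14
```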
SynthGL relied on openpyxl for all Excel I/O. Below 1,000 rows, performance was fine. Larger test datasets exposed the bottleneck:
- A 50,000-row GL export took 2.9s to write and 4.3s to read back for validation
- The test suite generates dozens of bundles per run, making Excel I/O a significant share of total CI time
- Reading a workbook to verify exported data took longer than generating the data itself
SynthGL replaced openpyxl with WolfXL in one import change:

```diff
- from openpyxl import load_workbook, Workbook
+ from wolfxl import load_workbook, Workbook
```

For write-heavy paths, they adopted WolfXL's `append()` API, which buffers rows in Python and flushes them to Rust in a single call at save time:
```python
from wolfxl import Workbook

wb = Workbook()
ws = wb.active
ws.title = "GL Journal"
ws.append(["GLID", "EntryID", "TxnDate", "AccountCode", "Debit", "Credit", ...])
for line in journal_lines:
    ws.append([line.line_id, line.entry_id, line.date, line.account_code, ...])
wb.save("gl_export.xlsx")
```

For reads, `iter_rows(values_only=True)` pulls all values in a single Rust call:
```python
wb = load_workbook("gl_export.xlsx")
ws = wb[wb.sheetnames[0]]
for row in ws.iter_rows(values_only=True):
    validate(row)
wb.close()
```

The rest of the codebase worked unchanged. Cell access patterns like `ws.cell()` and `ws["A1"].value` kept working as before.
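The `validate` helper isn't shown in the excerpt above. A minimal sketch of what such a per-row check might do, in pure Python, assuming a hypothetical 14-column layout with debit at index 4 and credit at index 5:

```python
def validate(row):
    """Sanity-check one journal line tuple.

    Assumes a hypothetical 14-column layout: debit at index 4,
    credit at index 5. Not SynthGL's actual validation logic.
    """
    assert len(row) == 14, f"expected 14 columns, got {len(row)}"
    debit, credit = row[4] or 0.0, row[5] or 0.0
    # A journal line carries an amount on exactly one side.
    assert (debit > 0) != (credit > 0), f"line must be debit xor credit: {row}"

# Example: a credit-side line with the assumed layout passes.
ok_line = ("GL-1", "JE-1", "2024-03-31", "4000", 0.0, 1250.0,
           "USD", "memo", "recurring", "billing", "rev-rec",
           "dept-01", "cc-100", "batch-7")
validate(ok_line)
```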
Benchmarked on Apple M-series, Python 3.12, mean of 5 runs per scale. Data generated by SynthGL's `BundleGenerator` using the `saas` company profile.

Write (save to .xlsx):
| Scale | Cells | openpyxl | WolfXL | Speedup |
|---|---|---|---|---|
| 1,000 rows | 14,000 | 0.070s | 0.015s | 4.7x |
| 5,000 rows | 70,000 | 0.278s | 0.076s | 3.7x |
| 10,000 rows | 140,000 | 0.614s | 0.151s | 4.1x |
| 20,000 rows | 280,000 | 1.219s | 0.299s | 4.1x |
| 50,000 rows | 700,000 | 2.967s | 0.761s | 3.9x |
Read (load + iterate):

| Scale | Cells | openpyxl | WolfXL | Speedup |
|---|---|---|---|---|
| 1,000 rows | 14,000 | 0.071s | 0.010s | 7.2x |
| 5,000 rows | 70,000 | 0.394s | 0.049s | 8.1x |
| 10,000 rows | 140,000 | 0.838s | 0.102s | 8.2x |
| 20,000 rows | 280,000 | 1.749s | 0.205s | 8.5x |
| 50,000 rows | 700,000 | 4.616s | 0.515s | 9.0x |
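The mean-of-5-runs methodology behind these tables can be sketched in a few lines of pure Python. The `bench` helper below is illustrative, not the actual benchmark script:

```python
import statistics
import time

def bench(fn, rounds=5):
    """Time fn over several rounds and return the mean wall-clock
    seconds, mirroring the mean-of-5-runs methodology used above."""
    timings = []
    for _ in range(rounds):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)

# Example with a stand-in workload; the real script times the save
# and iter_rows paths at each row scale.
mean_s = bench(lambda: sum(range(100_000)))
print(f"{mean_s:.4f}s mean over 5 rounds")
```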
- Writes are 3.7-4.7x faster across all dataset sizes using `append()` or `write_rows()`.
- Reads get faster at scale: 7.2x at 1K rows, 9.0x at 50K. WolfXL's fixed Rust initialization cost amortizes across more data, so larger files benefit most.
- One import change, no rewrite: batch APIs (`append`, `write_rows`, `iter_rows(values_only=True)`) delivered additional gains with minimal refactoring.
- 50K-row exports dropped from 7.6s to 1.3s (write + read combined), cutting CI cycle time directly.
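The combined round-trip figure follows directly from the 50,000-row lines of the two tables:

```python
# 50K-row round-trip, summing the write and read rows from the
# tables above (write + read, in seconds).
openpyxl_total = 2.967 + 4.616
wolfxl_total = 0.761 + 0.515

assert round(openpyxl_total, 1) == 7.6
assert round(wolfxl_total, 1) == 1.3
print(f"round-trip speedup: {openpyxl_total / wolfxl_total:.1f}x")
```

Note the combined speedup (roughly 5.9x) sits between the per-operation write (3.9x) and read (9.0x) figures, weighted toward the slower write path.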
The benchmark script is open source:

```shell
pip install wolfxl openpyxl

# Clone SynthGL and run the benchmark
git clone https://github.com/SynthGL/synthgl.git
cd synthgl
uv sync --extra dev
uv run python3 scripts/benchmark_wolfxl.py \
    --scales 1000,5000,10000,20000,50000 \
    --rounds 5 \
    --output results/wolfxl_benchmark.json
```

Or benchmark WolfXL against any Excel file you already have:
```shell
pip install wolfxl
wolfxl bench your_file.xlsx
```