Feat/rust bindings #6

Merged
Verdenroz merged 7 commits into master from feat/rust-bindings
Dec 25, 2025

@Verdenroz (Owner)

Rust + PyO3 Bindings for High-Performance AGON Encoding

Summary

This PR introduces a complete Rust core implementation for AGON with PyO3 bindings, delivering significant performance improvements while maintaining 100% API compatibility with the Python-only implementation.

Key Achievements

  • 10-50x faster encoding for large datasets (see the Rust+PyO3 performance section in docs/benchmarks.md)
  • Parallel format evaluation using rayon for auto mode
  • Zero-copy PyO3 bindings between Python and Rust
  • 100% API compatibility: no breaking changes for users
  • Reduced memory allocations and optimized string operations
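The auto-mode idea (encode the payload in every candidate format, then keep the cheapest result) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the PR's implementation: `encode_auto`, `estimate_tokens`, and the two JSON-based candidates are hypothetical stand-ins for the real AGON formats, and the 4-bytes-per-token ratio is an assumed heuristic. `ThreadPoolExecutor` only shows the fan-out/fan-in shape; because the GIL serializes pure-Python encoders, real parallel speedup needs native code, which is what the Rust core's rayon-based evaluation is for.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable

Encoder = Callable[[Any], str]

def estimate_tokens(text: str) -> int:
    # Assumed heuristic: about 4 UTF-8 bytes per token, rounded up.
    return -(-len(text.encode("utf-8")) // 4)

def encode_auto(data: Any, candidates: dict[str, Encoder]) -> tuple[str, str]:
    """Encode with every candidate, then keep the output with the fewest estimated tokens."""
    # Fan out one task per candidate, then collect the results (fan in).
    # With pure-Python encoders the GIL serializes this work; the threads only show the shape.
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(enc, data) for name, enc in candidates.items()}
        outputs = {name: fut.result() for name, fut in futures.items()}
    # Ties go to the first candidate in insertion order.
    best = min(outputs, key=lambda name: estimate_tokens(outputs[name]))
    return best, outputs[best]

# Hypothetical stand-ins for the real formats: compact vs. indented JSON.
CANDIDATES: dict[str, Encoder] = {
    "compact": lambda d: json.dumps(d, separators=(",", ":")),
    "pretty": lambda d: json.dumps(d, indent=2),
}
```

In Rust, the same shape would presumably map onto rayon's `par_iter()` followed by `min_by_key`, though the PR's actual code may be structured differently.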

Motivation

The original Python-only implementation worked well for small datasets but faced performance bottlenecks on larger payloads:

  • Pure Python string operations and list comprehensions were slow
  • Sequential format evaluation in auto mode (no parallelism)
  • High memory allocations during encoding/decoding
  • Token counting overhead when using tiktoken

Solution: Rewrite the core encoding/decoding engine in Rust, expose it to Python via PyO3, and leverage Rust's performance advantages (compiled code, zero-cost abstractions, parallel processing with rayon).


Changes

New Rust Core (crates/agon-core/)

  • src/formats/rows.rs: AGONRows format implementation (1,339 lines)
  • src/formats/columns.rs: AGONColumns format implementation (1,309 lines)
  • src/formats/struct_fmt.rs: AGONStruct format implementation (1,501 lines)
  • src/lib.rs: PyO3 bindings and Python interface (513 lines)
  • src/error.rs: Custom error types with PyO3 integration (102 lines)
  • src/types.rs: Shared type definitions (130 lines)
  • src/utils.rs: Utility functions (74 lines)
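To make the rows/columns distinction concrete, here is a toy sketch of the two layouts for a list of uniform records. The `|` delimiter and the overall syntax are invented for illustration and are not AGON's actual wire format; `to_rows` and `to_columns` are hypothetical names. Both functions collect parts and join once, the usual Python way to avoid repeated string reallocation.

```python
from typing import Any

Record = dict[str, Any]

def to_rows(records: list[Record]) -> str:
    """Toy row-oriented layout: field names once, then one line per record."""
    fields = list(records[0])  # assumes non-empty, uniform records
    lines = ["|".join(fields)]
    lines.extend("|".join(str(rec[f]) for f in fields) for rec in records)
    return "\n".join(lines)  # single join instead of repeated concatenation

def to_columns(records: list[Record]) -> str:
    """Toy column-oriented layout: one line per field with every record's value."""
    fields = list(records[0])  # same uniformity assumption as to_rows
    return "\n".join(
        f"{f}: " + "|".join(str(rec[f]) for rec in records) for f in fields
    )
```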

Python Layer Refactoring

  • Moved Python code from src/ to python/ for clarity
  • Simplified python/agon/core.py to use Rust bindings (reduced from ~1,000 lines to 166 lines)
  • Removed pure Python format implementations:
    • src/agon/formats/text.py (837 lines) → ✅ crates/agon-core/src/formats/rows.rs
    • src/agon/formats/columns.py (895 lines) → ✅ crates/agon-core/src/formats/columns.rs
    • src/agon/formats/struct.py (1,070 lines) → ✅ crates/agon-core/src/formats/struct_fmt.rs

Documentation Updates

  • Renamed AGONText → AGONRows for clarity and alignment with TOON format
  • Updated all documentation to reflect Rust-powered architecture
  • Added comprehensive performance benchmarks with encode/decode times
  • Added Rust+PyO3 architecture section to benchmarks.md
  • Updated API docs to reflect that fast byte-length estimation is now the default for token counting (tiktoken is opt-in)
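A minimal sketch of the "fast estimate by default, tiktoken on request" behavior, assuming a 4-bytes-per-token ratio for the estimate (the real heuristic may differ). `count_tokens` is a hypothetical name, and here `encoding` is a tiktoken encoding name such as "o200k_base"; the `Encoding` type referenced in the API docs may be something else. `tiktoken.get_encoding(...).encode(...)` is tiktoken's real API, imported lazily so tiktoken stays an optional dependency.

```python
from typing import Optional

def count_tokens(text: str, encoding: Optional[str] = None) -> int:
    """Estimate tokens from UTF-8 byte length unless a tiktoken encoding is named."""
    if encoding is None:
        # Fast default: assumed ratio of ~4 bytes per token, rounded up; no tokenizer load.
        return -(-len(text.encode("utf-8")) // 4)
    # Opt-in exact count. tiktoken is imported lazily so it stays an optional dependency.
    import tiktoken

    return len(tiktoken.get_encoding(encoding).encode(text))
```

With this shape, `count_tokens(payload)` never touches tiktoken, while `count_tokens(payload, "o200k_base")` requires it to be installed.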

Build & Tooling

CI/CD Enhancements:

  • .github/workflows/ci.yml: Added Rust toolchain installation and Rust coverage reporting to the CI workflow, ensuring Rust code is built, tested, and covered alongside Python code [1] [2] [3]
  • .github/workflows/publish.yml: Refactored to build wheels for Linux (x86_64, aarch64), macOS (x86_64, aarch64), and Windows (x64) using maturin, with a separate job for each platform. Added a source distribution build and a step that aggregates all artifacts before uploading to PyPI
  • Cargo.toml: Introduced Rust workspace with common dependencies and release profile optimizations for performance

Development Tooling:

  • Makefile: Added Rust build, test, and benchmark targets and improved the help output to clarify the hybrid Python/Rust workflow. Python and Rust linting/formatting now run in parallel [1] [2]
  • .pre-commit-config.yaml: Added Rust formatting (cargo fmt) and linting (cargo clippy) hooks, upgraded Python tooling (Ruff, codespell), and improved Python type-checking coverage
  • pyproject.toml: Configured maturin for building PyO3 extension
  • codecov.yml: Added coverage configuration for both Python and Rust

For Contributors

  • Rust toolchain required for development (cargo, rustc via rustup)
  • Use make build (or run uv run maturin develop directly) to compile the Rust extension
  • Pre-commit hooks now include Rust linting (cargo clippy, cargo fmt)

Testing

All existing tests pass with Rust implementation:

  • tests/test_core.py: Core AGON.encode/decode functionality
  • tests/test_rows.py: AGONRows format (renamed from test_text.py)
  • tests/test_columns.py: AGONColumns format
  • tests/test_struct.py: AGONStruct format
  • tests/test_benchmarks.py: Performance benchmarks with encode/decode times
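The compatibility claim rests on these suites passing unchanged against the Rust backend, and a round-trip check is presumably one of the core properties they exercise. The sketch below shows that property with `json.dumps`/`json.loads` standing in for AGON's encode/decode, whose exact signatures are not shown in this PR; `assert_round_trip` and `SAMPLES` are hypothetical names.

```python
import json
from typing import Any, Callable

def assert_round_trip(
    encode: Callable[[Any], str],
    decode: Callable[[str], Any],
    samples: list[Any],
) -> None:
    """Fail on the first sample where decode(encode(sample)) != sample."""
    for sample in samples:
        decoded = decode(encode(sample))
        assert decoded == sample, f"round-trip mismatch: {sample!r} -> {decoded!r}"

# Hypothetical sample payloads covering nesting, empty containers, unicode, and null.
SAMPLES: list[Any] = [
    {"id": 1, "name": "Alice"},
    [{"id": 1, "tags": ["a", "b"]}, {"id": 2, "tags": []}],
    {"nested": {"empty": {}, "unicode": "héllo", "none": None}},
]

if __name__ == "__main__":
    assert_round_trip(json.dumps, json.loads, SAMPLES)
    print("all samples round-trip")
```

Running one shared sample set through every format would catch decoder regressions in a port early, since any lossy step surfaces as a mismatch.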

Running Tests

```bash
# Install with development dependencies
uv sync --dev

# Build Rust extension
make build  # or: uv run maturin develop

# Run tests
uv run pytest

# Run benchmarks
make benchmarks
```
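For a rough idea of what an encode/decode timing harness involves, here is a small standard-library sketch. It is not the project's `make benchmarks` target: `time_call`, `bench_codec`, and the synthetic `records` payload are hypothetical, and `json` stands in for the AGON encoder. Taking the median of several runs dampens noise from warm-up and scheduling.

```python
import json
import time
from statistics import median
from typing import Any, Callable

def time_call(fn: Callable[[], Any], repeats: int = 5) -> float:
    """Median wall-clock time of fn() in milliseconds over `repeats` runs."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return median(samples)

def bench_codec(
    encode: Callable[[Any], str],
    decode: Callable[[str], Any],
    data: Any,
    repeats: int = 5,
) -> dict[str, float]:
    """Time encoding `data` and decoding its encoded form separately."""
    encoded = encode(data)  # encode once up front so decode is timed on its own
    return {
        "encode_ms": time_call(lambda: encode(data), repeats),
        "decode_ms": time_call(lambda: decode(encoded), repeats),
    }

# Synthetic payload of uniform records (hypothetical benchmark data).
records = [{"id": i, "name": f"user{i}", "active": i % 2 == 0} for i in range(10_000)]

if __name__ == "__main__":
    for key, ms in bench_codec(json.dumps, json.loads, records).items():
        print(f"{key}: {ms:.2f} ms")
```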

Documentation Changes

  • Renamed AGONText → AGONRows throughout all documentation for clarity and alignment with TOON format
  • README.md: Updated to highlight Rust and PyO3 integration, clarify that the core is now Rust-powered, and consistently use "rows" instead of "text" for the main encoding format. Added details on parallel format selection, token savings thresholds, and clarified example outputs and explanations [1] [2] [3] [4] [5] [6] [7] [8] [9] [10]
  • docs/benchmarks.md: Added Rust+PyO3 performance section with comprehensive encode/decode times, split into separate tables for clarity
  • docs/api.md: Updated the API reference to document the encoding: Encoding | None = None parameter and moved the hint() method under the AGONEncoding class
  • docs/index.md: Added "Rust-Powered Performance" callout, reorganized Quick Start to show all format outputs in tabs
  • docs/formats/: Renamed text.md → rows.md, updated all format documentation with correct import statements
  • mkdocs.yml: Fixed mermaid diagram rendering configuration

Future Work

Potential optimizations and enhancements:

  • SIMD optimizations for token counting
  • Custom memory allocator for even faster string building
  • Cross-language implementations (Go, TypeScript ports)
  • Additional format implementations (e.g., AGONTable for markdown tables)

github-actions bot added the ci, documentation, enhancement, and dependencies labels on Dec 25, 2025
codecov bot commented on Dec 25, 2025

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

ℹ️ You can also turn on project coverage checks and project coverage reporting on Pull Request comment

Thanks for integrating Codecov - We've got you covered ☂️

Verdenroz merged commit 5339afc into master on Dec 25, 2025
12 of 13 checks passed
Verdenroz deleted the feat/rust-bindings branch on December 26, 2025 15:52