Implement std.codec.hexEncode and std.codec.utf8Valid direct emitters#258

Open
555aaditya wants to merge 2 commits into vercel-labs:main from 555aaditya:555aaditya/std-codec-helpers

Conversation

@555aaditya

Description

Why this change was made

This PR implements direct compilation support and native runtime implementation for two core standard library codec helpers: std.codec.hexEncode and std.codec.utf8Valid.

Executables built by the direct backend must have their standard library helpers compiled natively, because there is no generic interpreter fallback. Adding these emitters moves fully direct-compiled Zero applications closer to being able to validate and serialize standard data formats.


How it improves or helps

  1. Bypasses Calling Convention Complexity:
    Returning a 24-byte struct (like Maybe<String>) by value is awkward under both the System V AMD64 ABI and the AArch64 procedure call standard (AAPCS64): a struct larger than 16 bytes is returned in memory, so the caller must allocate the space and pass its address as a hidden pointer.

    Instead, we utilize a clean struct-return bypass:

    • zero_hex_encode returns a simple int64_t byte count (or -1 on failure).
    • The ELF (x64) and Mach-O (ARM64) emitters intercept the call, invoke the helper with its arguments in registers, test the value returned in rax / x0, and unpack the returned length in place into the stack slot of the Maybe<String> local.
  2. Robust Multi-Byte UTF-8 Validation:
    Implements fully compliant UTF-8 validation (zero_utf8_valid) in the C runtime for enregistered ZeroByteView structs.

  3. Verification and Metrics Pipelines:
    Integrates seamlessly with the compiler's MIR verifier, target facts metrics auditor, and strict size/complexity line-budget guardrails.
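
The struct-return bypass in item 1 can be pictured in plain C. This is only a sketch under stated assumptions: the names `SketchMaybeString`, `sketch_hex_encode`, and `sketch_hex_encode_to_local` are hypothetical, the field layout of Maybe<String> is a guess (the PR states only that it is 24 bytes), and the real zero_hex_encode signature may differ. It shows a helper that returns a scalar length (or -1) and a caller that fills the struct slot itself, which is roughly what the emitters are described as doing in machine code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout for illustration; the PR only says Maybe<String> is 24 bytes. */
typedef struct {
    int64_t has_value;   /* 1 when a string is present, 0 otherwise */
    const char *ptr;     /* start of the encoded text */
    int64_t len;         /* length of the encoded text in bytes */
} SketchMaybeString;

/* Scalar-returning helper in the spirit of zero_hex_encode: writes lowercase
 * hex digits into out and returns the number of bytes written, or -1 when
 * out is too small. The result fits in a single return register. */
static int64_t sketch_hex_encode(const uint8_t *src, size_t src_len,
                                 char *out, size_t out_cap)
{
    static const char digits[] = "0123456789abcdef";  /* 16-entry lookup table */

    if (src_len > out_cap / 2) {
        return -1;
    }
    for (size_t i = 0; i < src_len; i++) {
        out[2 * i]     = digits[src[i] >> 4];
        out[2 * i + 1] = digits[src[i] & 0x0F];
    }
    return (int64_t)(src_len * 2);
}

/* What the call site does conceptually: call the helper, test the scalar
 * result, and write the Maybe<String> fields directly into the caller's slot,
 * so no struct is ever returned by value. */
static void sketch_hex_encode_to_local(SketchMaybeString *slot,
                                       const uint8_t *src, size_t src_len,
                                       char *out, size_t out_cap)
{
    int64_t n = sketch_hex_encode(src, src_len, out, out_cap);

    if (n < 0) {
        slot->has_value = 0;
        slot->ptr = NULL;
        slot->len = 0;
    } else {
        slot->has_value = 1;
        slot->ptr = out;
        slot->len = n;
    }
}
```

Because the helper returns only an integer, the call site never needs the hidden-pointer convention. The cost is that every emitter has to repeat the unpacking logic that a compiler would otherwise generate for a struct return.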


Key Changes

  • C Runtime: Implemented lookup-table based zero_hex_encode and state-machine zero_utf8_valid in native/zero-c/runtime/zero_runtime.c.
  • Direct Emitters:
    • ELF X64: Wired elf_emit_utf8_valid_call and elf_emit_hex_encode_to_local with stack variable unpacking.
    • Mach-O ARM64: Wired macho_emit_utf8_valid_call_at and macho_emit_hex_encode_to_local with scratch-spill register safety and condition checks.
  • AST/IR Lowering: Wired dotted-path call resolution in native/zero-c/src/ir.c for std.codec.hexEncode and std.codec.utf8Valid.
  • Guardrail Budgets: Adjusted metrics limits in scripts/compiler-metrics.mts.
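
For readers unfamiliar with state-machine UTF-8 validation, the sketch below shows one common way to write it. It is not the code from zero_runtime.c: the name `sketch_utf8_valid` is hypothetical, and it takes a pointer and length rather than a ZeroByteView, whose layout is not shown in this PR. The state is the number of continuation bytes still expected plus the range allowed for the next one, which is enough to reject overlong encodings, surrogates, and code points above U+10FFFF.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true when the len bytes at p form well-formed UTF-8. */
static bool sketch_utf8_valid(const uint8_t *p, size_t len)
{
    size_t need = 0;               /* continuation bytes still expected */
    uint8_t lo = 0x80, hi = 0xBF;  /* allowed range for the next continuation byte */

    for (size_t i = 0; i < len; i++) {
        uint8_t b = p[i];

        if (need == 0) {
            /* Expecting a lead byte. */
            if (b <= 0x7F) {
                continue;                          /* ASCII */
            } else if (b >= 0xC2 && b <= 0xDF) {
                need = 1;                          /* 2-byte sequence; C0/C1 are overlong */
            } else if (b == 0xE0) {
                need = 2; lo = 0xA0;               /* reject overlong 3-byte forms */
            } else if (b == 0xED) {
                need = 2; hi = 0x9F;               /* reject surrogates U+D800..U+DFFF */
            } else if (b >= 0xE1 && b <= 0xEF) {
                need = 2;                          /* remaining 3-byte lead bytes */
            } else if (b == 0xF0) {
                need = 3; lo = 0x90;               /* reject overlong 4-byte forms */
            } else if (b == 0xF4) {
                need = 3; hi = 0x8F;               /* reject code points above U+10FFFF */
            } else if (b >= 0xF1 && b <= 0xF3) {
                need = 3;                          /* remaining 4-byte lead bytes */
            } else {
                return false;                      /* stray continuation or invalid lead */
            }
        } else {
            /* Expecting a continuation byte within the current range. */
            if (b < lo || b > hi) {
                return false;
            }
            lo = 0x80;                             /* only the first continuation is restricted */
            hi = 0xBF;
            need--;
        }
    }
    return need == 0;                              /* a truncated sequence is invalid */
}
```

A table-driven DFA is a common alternative with the same behavior. The explicit branches here trade some speed for readability.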

How the improvement can be verified

1. Native Build and Run

Recompile the native compiler from source and execute the standard data format tests:

make -C native/zero-c
bin/zero run examples/std-data-formats.0

2. Run the Full Test Suite Locally

Verify that all conformance, language server, metrics, and documentation tests pass cleanly:

pnpm run conformance:local
pnpm run test:zero
pnpm run docs:test
pnpm run zls -- --self-test

Closes #257

@vercel

vercel Bot commented May 24, 2026

@555aaditya is attempting to deploy a commit to the Vercel Labs Team on Vercel.

A member of the Team first needs to authorize it.

Comment thread native/zero-c/src/emit_elf64.c Outdated
- Implement zero_hex_encode and zero_utf8_valid runtime helpers in C
- Add IR lowering and verification for std.codec.hexEncode and std.codec.utf8Valid
- Wire direct AArch64 Mach-O and x64 ELF emitters to bypass System V struct-return ABI
- Adjust metrics budgets and update embedded runtime sources
@555aaditya 555aaditya force-pushed the 555aaditya/std-codec-helpers branch from dd7b9e7 to 3bbbea3 Compare May 24, 2026 15:54
@555aaditya
Author

Hi @ctate,

I have implemented the compiler direct emitters and C runtime helpers for std.codec.hexEncode and std.codec.utf8Valid, using a struct-return bypass to avoid the hidden-pointer return convention on AMD64 and AArch64.

All native conformance tests, JS CLI tests, ZLS self-tests, and compiler metrics budgets are passing cleanly. Could you please take a look, review, and authorize/merge this when you get a chance?

Thanks!


Development

Successfully merging this pull request may close these issues.

Feature: Implement std.codec.hexEncode and std.codec.utf8Valid in direct backend and runtime
