Skip to content

Improve ZaString safety, correctness, and performance#27

Draft
CorentinGS wants to merge 15 commits intomainfrom
feature/zastring-final
Draft

Improve ZaString safety, correctness, and performance#27
CorentinGS wants to merge 15 commits intomainfrom
feature/zastring-final

Conversation

@CorentinGS
Copy link
Copy Markdown
Owner

Summary

  • harden the pooled builder and UTF-8 APIs with disposal guards, overflow/runtime validation, safer formatting retries, and improved edge-case handling
  • expand the fluent API with alignment support, form URL encoding, query param helpers, span-based formatting overloads, and missing pooled-builder mutation helpers
  • improve performance and quality with faster escaping/UTF-8 paths, updated build configuration, and broader test coverage for disposal, UTF-8, escaping, and mutation scenarios

- Change IndexOutOfRangeException to ArgumentOutOfRangeException in ZaSpanString indexer
- Add [DoesNotReturn] attribute to ThrowOutOfRangeException
- Add alignment support overloads to ZaInterpolatedStringHandler
- Add AppendFormUrlEncoded (spaces as +) and AppendQueryParam with ref bool isFirst
- Add AppendJoin for IEnumerable<string?> and AppendFormat with ReadOnlySpan<char> format
- Add SetLength, RemoveLast, indexer, and TryAppend to ZaPooledStringBuilder
- Update tests for new exception types
- Document ReadOnlySpan<char> AppendFormat overloads as allocating
- Add TODO for zero-allocation format parser
- Add tests for interpolated string alignment
- Add tests for AppendFormUrlEncoded
- Add tests for AppendQueryParam with ref bool isFirst
- Add tests for AppendJoin with IEnumerable<string?>
- Add tests for ZaPooledStringBuilder SetLength, RemoveLast, indexer, TryAppend
- Add tests for ReadOnlySpan<char> AppendFormat
… quoting, TryAppendLine atomicity comment, ZaSpanString XML docs, ZaPooledStringBuilder validation
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR strengthens ZaString’s span/pooled builder APIs around disposal, bounds checking, UTF-8 handling, and escaping, while expanding the fluent surface area (query params, form URL encoding, alignment, span-based formatting) and adding regression tests/benchmarks.

Changes:

  • Harden pooled builder + UTF-8 APIs (dispose guards, negative-length handling, capacity growth tweaks, formatting retry limit).
  • Expand fluent APIs (alignment in interpolated handler, AppendFormUrlEncoded, AppendQueryParam w/ ref bool, AppendJoin(IEnumerable<string?>), span-based AppendFormat).
  • Update escaping/encoding fast paths and broaden test/benchmark coverage.

Reviewed changes

Copilot reviewed 18 out of 19 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
src/ZaString/Core/ZaSpanString.cs Tightens index/advance validation and adds negative-length guard for pointer UTF-8 method.
src/ZaString/Core/ZaPooledStringBuilder.cs Adds disposal guards, new mutation helpers, TryAppend APIs, UTF-8 validation, and capacity growth updates.
src/ZaString/Core/ZaUtf8Handle.cs Switches handle to ref struct for byref-like safety.
src/ZaString/Extensions/ZaInterpolatedStringHandler.cs Adds alignment support for interpolated formatting.
src/ZaString/Extensions/ZaSpanStringBuilderExtensions.cs Adds new join/query/form-url helpers and optimizes escaping/encoding paths.
src/ZaString/Extensions/ZaUtf8SpanWriterExtensions.cs Uses span-based UTF-8 encoding overloads and simplifies char append.
src/ZaString/ZaString.csproj Updates CI build property and relies on MinVer config.
tests/ZaString.Tests/ZaUtf8SpanWriterTests.cs Improves exception assertions for buffer-too-small paths.
tests/ZaString.Tests/ZaSpanStringBuilderUtf8Tests.cs Adds negative-length UTF-8 pointer tests.
tests/ZaString.Tests/ZaSpanStringBuilderUrlHelpersTests.cs Adds ref-bool query param tests and form-url encoding tests.
tests/ZaString.Tests/ZaSpanStringBuilderInterpolationTests.cs Adds alignment test coverage for interpolation.
tests/ZaString.Tests/ZaSpanStringBuilderFormatTests.cs Adds span-based AppendFormat test coverage.
tests/ZaString.Tests/ZaSpanStringBuilderEscapingTests.cs Adds “insufficient capacity doesn’t modify” tests for escaping Try* APIs.
tests/ZaString.Tests/ZaSpanStringBuilderBasicTests.cs Adds Advance guards tests and updates indexer exception expectations.
tests/ZaString.Tests/ZaSpanStringBuilderAppendHelpersTests.cs Adds AppendJoin(IEnumerable<string?>) tests.
tests/ZaString.Tests/ZaPooledStringBuilderUtf8Tests.cs Adds negative-length pointer tests and a ref-struct-related test.
tests/ZaString.Tests/ZaPooledStringBuilderTests.cs Adds extensive disposal/mutation/formatting/overflow/indexer tests.
tests/ZaString.Benchmarks/StringBuildingBenchmarks.cs Benchmarks now return built strings for more realistic measurement.
.gitignore Ignores additional local tooling/worktree folders.
Comments suppressed due to low confidence (2)

src/ZaString/Extensions/ZaSpanStringBuilderExtensions.cs:364

  • TryAppendLine(string?) computes required = valueLength + newlineLength without overflow protection. With very large strings this can overflow and make the capacity check pass incorrectly, causing CopyTo to throw even though this is a Try* API. Use an overflow-safe check (e.g., if (valueLength > builder.RemainingSpan.Length - newlineLength) return false; with a guard for newlineLength > RemainingSpan.Length).
        // Note: .NET strings are immutable, so the length check and copy are atomic.
        var valueLength = value?.Length ?? 0;
        var newlineLength = Environment.NewLine.Length;
        var required = valueLength + newlineLength;

        if (required > builder.RemainingSpan.Length)
        {
            return false;
        }

src/ZaString/Core/ZaPooledStringBuilder.cs:180

  • EnsureCapacity verifies required won’t exceed Array.MaxLength, but newCapacity = Math.Max(required, _buffer.Length + (_buffer.Length / 2)) can still exceed Array.MaxLength when _buffer is already large. That can cause ArrayPool.Rent(newCapacity) to throw even though required is valid. Clamp newCapacity to Array.MaxLength (and ensure it’s still >= required).
    private void EnsureCapacity(int additionalRequired)
    {
        ArgumentOutOfRangeException.ThrowIfNegative(additionalRequired);

        if (Length > Array.MaxLength - additionalRequired)
            throw new ArgumentOutOfRangeException(nameof(additionalRequired), "Required capacity exceeds maximum array length.");

        var required = Length + additionalRequired;
        if (required <= _buffer.Length) return;

        var newCapacity = Math.Max(required, _buffer.Length + (_buffer.Length / 2));
        if (newCapacity < 256) newCapacity = 256;

        var newBuffer = _pool.Rent(newCapacity);
        _buffer.AsSpan(0, Length).CopyTo(newBuffer);
        _pool.Return(_buffer);
        _buffer = newBuffer;
    }

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +662 to +679
// Fast path: check if any escaping is needed
var needsEscape = value.IndexOfAny("\"\\\b\f\n\r\t".AsSpan()) >= 0;
if (!needsEscape)
{
return false;
for (int i = 0; i < value.Length; i++)
{
if (value[i] < ' ')
{
needsEscape = true;
break;
}
}
}

if (required == value.Length)
if (!needsEscape)
{
value.CopyTo(builder.RemainingSpan);
builder.Advance(value.Length);
return true;
return builder.TryAppend(value);
}
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TryAppendJsonEscaped's fast-path escape detection doesn’t account for U+2028 and U+2029, but the write/length logic does escape them. As a result, inputs containing those characters will incorrectly take the fast path and be appended unescaped. Include \u2028/\u2029 in the fast-path detection (or explicitly check for them) so behavior matches the escaping switch.

Copilot uses AI. Check for mistakes.
Comment on lines +117 to +155
var remaining = _builder.RemainingSpan;
if (!value.TryFormat(remaining, out var charsWritten, format, _provider))
{
throw new ArgumentOutOfRangeException("value", "The destination buffer is too small.");
}

var padCount = alignment > 0 ? alignment - charsWritten : -alignment - charsWritten;

if (padCount <= 0)
{
_builder.Advance(charsWritten);
return;
}

if (alignment > 0)
{
if (remaining.Length < charsWritten + padCount)
{
throw new ArgumentOutOfRangeException("value", "The destination buffer is too small.");
}

for (var i = charsWritten - 1; i >= 0; i--)
{
remaining[i + padCount] = remaining[i];
}

for (var i = 0; i < padCount; i++)
{
remaining[i] = ' ';
}

_builder.Advance(charsWritten + padCount);
}
else
{
_builder.Advance(charsWritten);
_builder.AppendRepeat(' ', padCount);
}
}
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same partial-write issue as above for the (alignment, format) overload: on negative alignment, Advance(charsWritten) happens before padding is appended, so an out-of-range during padding can leave a partially appended value visible. Add a capacity pre-check for charsWritten + padCount (when padCount > 0) before advancing, and apply padding before updating the builder length.

Copilot uses AI. Check for mistakes.
Comment on lines +69 to +106
var remaining = _builder.RemainingSpan;
if (!value.TryFormat(remaining, out var charsWritten, default, _provider))
{
throw new ArgumentOutOfRangeException("value", "The destination buffer is too small.");
}

var padCount = alignment > 0 ? alignment - charsWritten : -alignment - charsWritten;

if (padCount <= 0)
{
_builder.Advance(charsWritten);
return;
}

if (alignment > 0)
{
if (remaining.Length < charsWritten + padCount)
{
throw new ArgumentOutOfRangeException("value", "The destination buffer is too small.");
}

for (var i = charsWritten - 1; i >= 0; i--)
{
remaining[i + padCount] = remaining[i];
}

for (var i = 0; i < padCount; i++)
{
remaining[i] = ' ';
}

_builder.Advance(charsWritten + padCount);
}
else
{
_builder.Advance(charsWritten);
_builder.AppendRepeat(' ', padCount);
}
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the negative-alignment path, the handler advances the builder and then calls AppendRepeat for padding. If there isn’t enough space for the padding, AppendRepeat will throw after Advance(charsWritten), leaving the builder partially modified (formatted value appended without required spaces). Pre-check capacity for charsWritten + padCount before advancing, and write padding into the current remaining span so the operation is all-or-nothing when throwing for insufficient capacity.

Copilot uses AI. Check for mistakes.
Comment on lines +118 to +161
/// <summary>
/// Attempts to append a read-only span without throwing.
/// </summary>
public bool TryAppend(ReadOnlySpan<char> value)
{
ThrowIfDisposed();
try
{
EnsureCapacity(value.Length);
value.CopyTo(_buffer.AsSpan(Length));
Length += value.Length;
return true;
}
catch
{
return false;
}
}

/// <summary>
/// Attempts to append a string without throwing.
/// </summary>
public bool TryAppend(string? value)
{
return value is null || TryAppend(value.AsSpan());
}

/// <summary>
/// Attempts to append a single character without throwing.
/// </summary>
public bool TryAppend(char value)
{
ThrowIfDisposed();
try
{
EnsureCapacity(1);
_buffer[Length++] = value;
return true;
}
catch
{
return false;
}
}
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TryAppend(ReadOnlySpan<char>) and TryAppend(char) catch all exceptions and return false. This will also swallow non-recoverable exceptions (e.g., OutOfMemoryException) and can mask serious failures. Prefer explicit non-throwing checks (e.g., capacity/overflow guards) and/or catch only the specific expected exceptions (typically ArgumentOutOfRangeException) rather than catch {}.

Copilot uses AI. Check for mistakes.
Comment thread src/ZaString/Core/ZaUtf8Handle.cs
Comment on lines +140 to +157
public void ZaUtf8Handle_IsRefStruct_PreventsHeapAllocation()
{
// This test verifies that ZaUtf8Handle is a ref struct
// by ensuring it can be used with pattern-based disposal
// but cannot be stored in fields (compile-time check)
using var builder = ZaPooledStringBuilder.Rent();
builder.Append("Test");

using (var handle = builder.ToUtf8NullTerminated())
{
var span = handle.Span;
Assert.Equal(5, span.Length);
Assert.Equal((byte)'T', span[0]);
}

// After disposal, the handle should be invalid
// Since it's a ref struct, it can't be copied, preventing use-after-free
}
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test doesn’t actually verify the claim that ZaUtf8Handle is a ref struct (the body would also pass for a regular struct with a Dispose method), and the comment “it can't be copied” is inaccurate (ref structs are still copyable as values). If you want a runtime assertion, you can check typeof(ZaUtf8Handle).IsByRefLike (and optionally add a compile-time-only snippet via conditional compilation) so the test meaning matches its name/comments.

Copilot uses AI. Check for mistakes.
Comment thread src/ZaString/Core/ZaSpanString.cs Outdated
CorentinGS and others added 6 commits April 25, 2026 08:36
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Fix RemoveLast extension method to use SetLength instead of Advance(-count)
which was throwing on negative values.

Make AppendQueryParam overloads atomic by precomputing required length,
checking space, writing, then advancing. The ref bool isFirst variant now
only mutates isFirst after successful write, preventing partial state on
buffer overflow.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 24 out of 25 changed files in this pull request and generated 7 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

}
else
{
length += 9;
Copy link

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetUrlEncodedLengthReplacingInvalid returns 9 chars for the general non-ASCII/non-surrogate-pair case. That treats valid BMP characters (e.g. 'é') as invalid and effectively encodes them as U+FFFD. This helper should only substitute for invalid surrogate code units; valid BMP scalars should compute 2- or 3-byte UTF-8 percent-encoded length.

Suggested change
length += 9;
length += char.IsSurrogate(c) ? 9 : (c <= 0x7FF ? 2 * 3 : 3 * 3);

Copilot uses AI. Check for mistakes.
Comment on lines +1346 to +1349
else
{
w += WriteReplacementChar(dest[w..]);
}
Copy link

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WriteUrlEncodedReplacingInvalid falls back to WriteReplacementChar for any non-ASCII char that isn’t part of a surrogate pair. That will corrupt valid Unicode BMP input in query params. This branch should only replace invalid surrogate code units; valid BMP scalars should be UTF-8 percent-encoded (and match the length calculation).

Suggested change
else
{
w += WriteReplacementChar(dest[w..]);
}
else if (char.IsSurrogate(c))
{
w += WriteReplacementChar(dest[w..]);
}
else
{
w += PercentEncodeUtf8FromCodePoint(c, dest[w..]);
}

Copilot uses AI. Check for mistakes.
Comment on lines +1105 to +1108
else
{
w += WriteReplacementChar(dest[w..]);
}
Copy link

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This else branch replaces all non-ASCII, non-surrogate-pair characters with U+FFFD. That breaks form URL encoding for valid Unicode input (e.g. "café"). It should percent-encode valid BMP scalars as UTF-8 and only use the replacement sequence for invalid surrogate code units.

Suggested change
else
{
w += WriteReplacementChar(dest[w..]);
}
else if (char.IsSurrogate(c))
{
w += WriteReplacementChar(dest[w..]);
}
else
{
w += PercentEncodeUtf8FromCodePoint(c, dest[w..]);
}

Copilot uses AI. Check for mistakes.
Comment on lines 1000 to 1002
var required = GetUrlEncodedLength(value);
if (required > builder.RemainingSpan.Length)
{
Copy link

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TryAppendUrlEncoded still uses GetUrlEncodedLength/the existing write loop, which percent-encodes lone surrogate code units as if they were valid Unicode scalar values. That emits invalid UTF-8 sequences. Consider switching URL encoding to the same "replace invalid surrogates with U+FFFD" behavior introduced for query/form helpers and add a regression test for lone-surrogate input.

Copilot uses AI. Check for mistakes.
Comment on lines +24 to 25
public ref struct ZaUtf8Handle
{
Copy link

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing ZaUtf8Handle to a ref struct is a breaking public API change (it can’t be stored in fields, used in async methods, passed through many generic APIs, etc.). If this is intentional, consider calling it out explicitly in release notes/versioning; otherwise consider an alternative that preserves the type shape while preventing double-dispose.

Copilot uses AI. Check for mistakes.
Comment on lines 61 to 63
if ((uint)index >= (uint)Length)
throw new IndexOutOfRangeException();
throw new ArgumentOutOfRangeException(nameof(index));
return ref _buffer[index];
Copy link

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The indexer now throws ArgumentOutOfRangeException instead of IndexOutOfRangeException. This is a breaking behavioral change for consumers who catch IndexOutOfRangeException (and also differs from typical indexer semantics in .NET). If the new exception type is intentional, consider documenting it prominently as part of the public API contract/versioning.

Copilot uses AI. Check for mistakes.
Comment on lines +117 to +121
[Fact]
public void AppendFormUrlEncoded_EncodesSpecialChars()
{
Span<char> buffer = stackalloc char[64];
var builder = ZaSpanStringBuilder.Create(buffer);
Copy link

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There’s no regression test covering valid non-ASCII characters for the new query/form encoding paths (e.g. key/value containing "café" or "€"). Adding one would help ensure only lone surrogates are replaced and valid Unicode scalars are UTF-8 percent-encoded.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants