Improve ZaString safety, correctness, and performance#27
Improve ZaString safety, correctness, and performance#27CorentinGS wants to merge 15 commits intomainfrom
Conversation
- Change IndexOutOfRangeException to ArgumentOutOfRangeException in ZaSpanString indexer - Add [DoesNotReturn] attribute to ThrowOutOfRangeException - Add alignment support overloads to ZaInterpolatedStringHandler - Add AppendFormUrlEncoded (spaces as +) and AppendQueryParam with ref bool isFirst - Add AppendJoin for IEnumerable<string?> and AppendFormat with ReadOnlySpan<char> format - Add SetLength, RemoveLast, indexer, and TryAppend to ZaPooledStringBuilder - Update tests for new exception types
- Document ReadOnlySpan<char> AppendFormat overloads as allocating - Add TODO for zero-allocation format parser - Add tests for interpolated string alignment - Add tests for AppendFormUrlEncoded - Add tests for AppendQueryParam with ref bool isFirst - Add tests for AppendJoin with IEnumerable<string?> - Add tests for ZaPooledStringBuilder SetLength, RemoveLast, indexer, TryAppend - Add tests for ReadOnlySpan<char> AppendFormat
… quoting, TryAppendLine atomicity comment, ZaSpanString XML docs, ZaPooledStringBuilder validation
…sertion modernization
…t netstandard2.1 exclusion
There was a problem hiding this comment.
Pull request overview
This PR strengthens ZaString’s span/pooled builder APIs around disposal, bounds checking, UTF-8 handling, and escaping, while expanding the fluent surface area (query params, form URL encoding, alignment, span-based formatting) and adding regression tests/benchmarks.
Changes:
- Harden pooled builder + UTF-8 APIs (dispose guards, negative-length handling, capacity growth tweaks, formatting retry limit).
- Expand fluent APIs (alignment in interpolated handler,
AppendFormUrlEncoded,AppendQueryParamw/ref bool,AppendJoin(IEnumerable<string?>), span-basedAppendFormat). - Update escaping/encoding fast paths and broaden test/benchmark coverage.
Reviewed changes
Copilot reviewed 18 out of 19 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| src/ZaString/Core/ZaSpanString.cs | Tightens index/advance validation and adds negative-length guard for pointer UTF-8 method. |
| src/ZaString/Core/ZaPooledStringBuilder.cs | Adds disposal guards, new mutation helpers, TryAppend APIs, UTF-8 validation, and capacity growth updates. |
| src/ZaString/Core/ZaUtf8Handle.cs | Switches handle to ref struct for byref-like safety. |
| src/ZaString/Extensions/ZaInterpolatedStringHandler.cs | Adds alignment support for interpolated formatting. |
| src/ZaString/Extensions/ZaSpanStringBuilderExtensions.cs | Adds new join/query/form-url helpers and optimizes escaping/encoding paths. |
| src/ZaString/Extensions/ZaUtf8SpanWriterExtensions.cs | Uses span-based UTF-8 encoding overloads and simplifies char append. |
| src/ZaString/ZaString.csproj | Updates CI build property and relies on MinVer config. |
| tests/ZaString.Tests/ZaUtf8SpanWriterTests.cs | Improves exception assertions for buffer-too-small paths. |
| tests/ZaString.Tests/ZaSpanStringBuilderUtf8Tests.cs | Adds negative-length UTF-8 pointer tests. |
| tests/ZaString.Tests/ZaSpanStringBuilderUrlHelpersTests.cs | Adds ref-bool query param tests and form-url encoding tests. |
| tests/ZaString.Tests/ZaSpanStringBuilderInterpolationTests.cs | Adds alignment test coverage for interpolation. |
| tests/ZaString.Tests/ZaSpanStringBuilderFormatTests.cs | Adds span-based AppendFormat test coverage. |
| tests/ZaString.Tests/ZaSpanStringBuilderEscapingTests.cs | Adds “insufficient capacity doesn’t modify” tests for escaping Try* APIs. |
| tests/ZaString.Tests/ZaSpanStringBuilderBasicTests.cs | Adds Advance guards tests and updates indexer exception expectations. |
| tests/ZaString.Tests/ZaSpanStringBuilderAppendHelpersTests.cs | Adds AppendJoin(IEnumerable<string?>) tests. |
| tests/ZaString.Tests/ZaPooledStringBuilderUtf8Tests.cs | Adds negative-length pointer tests and a ref-struct-related test. |
| tests/ZaString.Tests/ZaPooledStringBuilderTests.cs | Adds extensive disposal/mutation/formatting/overflow/indexer tests. |
| tests/ZaString.Benchmarks/StringBuildingBenchmarks.cs | Benchmarks now return built strings for more realistic measurement. |
| .gitignore | Ignores additional local tooling/worktree folders. |
Comments suppressed due to low confidence (2)
src/ZaString/Extensions/ZaSpanStringBuilderExtensions.cs:364
TryAppendLine(string?)computesrequired = valueLength + newlineLengthwithout overflow protection. With very large strings this can overflow and make the capacity check pass incorrectly, causingCopyToto throw even though this is aTry*API. Use an overflow-safe check (e.g.,if (valueLength > builder.RemainingSpan.Length - newlineLength) return false;with a guard fornewlineLength > RemainingSpan.Length).
// Note: .NET strings are immutable, so the length check and copy are atomic.
var valueLength = value?.Length ?? 0;
var newlineLength = Environment.NewLine.Length;
var required = valueLength + newlineLength;
if (required > builder.RemainingSpan.Length)
{
return false;
}
src/ZaString/Core/ZaPooledStringBuilder.cs:180
EnsureCapacityverifiesrequiredwon’t exceedArray.MaxLength, butnewCapacity = Math.Max(required, _buffer.Length + (_buffer.Length / 2))can still exceedArray.MaxLengthwhen_bufferis already large. That can causeArrayPool.Rent(newCapacity)to throw even thoughrequiredis valid. ClampnewCapacitytoArray.MaxLength(and ensure it’s still >=required).
private void EnsureCapacity(int additionalRequired)
{
ArgumentOutOfRangeException.ThrowIfNegative(additionalRequired);
if (Length > Array.MaxLength - additionalRequired)
throw new ArgumentOutOfRangeException(nameof(additionalRequired), "Required capacity exceeds maximum array length.");
var required = Length + additionalRequired;
if (required <= _buffer.Length) return;
var newCapacity = Math.Max(required, _buffer.Length + (_buffer.Length / 2));
if (newCapacity < 256) newCapacity = 256;
var newBuffer = _pool.Rent(newCapacity);
_buffer.AsSpan(0, Length).CopyTo(newBuffer);
_pool.Return(_buffer);
_buffer = newBuffer;
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Fast path: check if any escaping is needed | ||
| var needsEscape = value.IndexOfAny("\"\\\b\f\n\r\t".AsSpan()) >= 0; | ||
| if (!needsEscape) | ||
| { | ||
| return false; | ||
| for (int i = 0; i < value.Length; i++) | ||
| { | ||
| if (value[i] < ' ') | ||
| { | ||
| needsEscape = true; | ||
| break; | ||
| } | ||
| } | ||
| } | ||
|
|
||
| if (required == value.Length) | ||
| if (!needsEscape) | ||
| { | ||
| value.CopyTo(builder.RemainingSpan); | ||
| builder.Advance(value.Length); | ||
| return true; | ||
| return builder.TryAppend(value); | ||
| } |
There was a problem hiding this comment.
TryAppendJsonEscaped's fast-path escape detection doesn’t account for U+2028 and U+2029, but the write/length logic does escape them. As a result, inputs containing those characters will incorrectly take the fast path and be appended unescaped. Include \u2028/\u2029 in the fast-path detection (or explicitly check for them) so behavior matches the escaping switch.
| var remaining = _builder.RemainingSpan; | ||
| if (!value.TryFormat(remaining, out var charsWritten, format, _provider)) | ||
| { | ||
| throw new ArgumentOutOfRangeException("value", "The destination buffer is too small."); | ||
| } | ||
|
|
||
| var padCount = alignment > 0 ? alignment - charsWritten : -alignment - charsWritten; | ||
|
|
||
| if (padCount <= 0) | ||
| { | ||
| _builder.Advance(charsWritten); | ||
| return; | ||
| } | ||
|
|
||
| if (alignment > 0) | ||
| { | ||
| if (remaining.Length < charsWritten + padCount) | ||
| { | ||
| throw new ArgumentOutOfRangeException("value", "The destination buffer is too small."); | ||
| } | ||
|
|
||
| for (var i = charsWritten - 1; i >= 0; i--) | ||
| { | ||
| remaining[i + padCount] = remaining[i]; | ||
| } | ||
|
|
||
| for (var i = 0; i < padCount; i++) | ||
| { | ||
| remaining[i] = ' '; | ||
| } | ||
|
|
||
| _builder.Advance(charsWritten + padCount); | ||
| } | ||
| else | ||
| { | ||
| _builder.Advance(charsWritten); | ||
| _builder.AppendRepeat(' ', padCount); | ||
| } | ||
| } |
There was a problem hiding this comment.
Same partial-write issue as above for the (alignment, format) overload: on negative alignment, Advance(charsWritten) happens before padding is appended, so an out-of-range during padding can leave a partially appended value visible. Add a capacity pre-check for charsWritten + padCount (when padCount > 0) before advancing, and apply padding before updating the builder length.
| var remaining = _builder.RemainingSpan; | ||
| if (!value.TryFormat(remaining, out var charsWritten, default, _provider)) | ||
| { | ||
| throw new ArgumentOutOfRangeException("value", "The destination buffer is too small."); | ||
| } | ||
|
|
||
| var padCount = alignment > 0 ? alignment - charsWritten : -alignment - charsWritten; | ||
|
|
||
| if (padCount <= 0) | ||
| { | ||
| _builder.Advance(charsWritten); | ||
| return; | ||
| } | ||
|
|
||
| if (alignment > 0) | ||
| { | ||
| if (remaining.Length < charsWritten + padCount) | ||
| { | ||
| throw new ArgumentOutOfRangeException("value", "The destination buffer is too small."); | ||
| } | ||
|
|
||
| for (var i = charsWritten - 1; i >= 0; i--) | ||
| { | ||
| remaining[i + padCount] = remaining[i]; | ||
| } | ||
|
|
||
| for (var i = 0; i < padCount; i++) | ||
| { | ||
| remaining[i] = ' '; | ||
| } | ||
|
|
||
| _builder.Advance(charsWritten + padCount); | ||
| } | ||
| else | ||
| { | ||
| _builder.Advance(charsWritten); | ||
| _builder.AppendRepeat(' ', padCount); | ||
| } |
There was a problem hiding this comment.
In the negative-alignment path, the handler advances the builder and then calls AppendRepeat for padding. If there isn’t enough space for the padding, AppendRepeat will throw after Advance(charsWritten), leaving the builder partially modified (formatted value appended without required spaces). Pre-check capacity for charsWritten + padCount before advancing, and write padding into the current remaining span so the operation is all-or-nothing when throwing for insufficient capacity.
| /// <summary> | ||
| /// Attempts to append a read-only span without throwing. | ||
| /// </summary> | ||
| public bool TryAppend(ReadOnlySpan<char> value) | ||
| { | ||
| ThrowIfDisposed(); | ||
| try | ||
| { | ||
| EnsureCapacity(value.Length); | ||
| value.CopyTo(_buffer.AsSpan(Length)); | ||
| Length += value.Length; | ||
| return true; | ||
| } | ||
| catch | ||
| { | ||
| return false; | ||
| } | ||
| } | ||
|
|
||
| /// <summary> | ||
| /// Attempts to append a string without throwing. | ||
| /// </summary> | ||
| public bool TryAppend(string? value) | ||
| { | ||
| return value is null || TryAppend(value.AsSpan()); | ||
| } | ||
|
|
||
| /// <summary> | ||
| /// Attempts to append a single character without throwing. | ||
| /// </summary> | ||
| public bool TryAppend(char value) | ||
| { | ||
| ThrowIfDisposed(); | ||
| try | ||
| { | ||
| EnsureCapacity(1); | ||
| _buffer[Length++] = value; | ||
| return true; | ||
| } | ||
| catch | ||
| { | ||
| return false; | ||
| } | ||
| } |
There was a problem hiding this comment.
TryAppend(ReadOnlySpan<char>) and TryAppend(char) catch all exceptions and return false. This will also swallow non-recoverable exceptions (e.g., OutOfMemoryException) and can mask serious failures. Prefer explicit non-throwing checks (e.g., capacity/overflow guards) and/or catch only the specific expected exceptions (typically ArgumentOutOfRangeException) rather than catch {}.
| public void ZaUtf8Handle_IsRefStruct_PreventsHeapAllocation() | ||
| { | ||
| // This test verifies that ZaUtf8Handle is a ref struct | ||
| // by ensuring it can be used with pattern-based disposal | ||
| // but cannot be stored in fields (compile-time check) | ||
| using var builder = ZaPooledStringBuilder.Rent(); | ||
| builder.Append("Test"); | ||
|
|
||
| using (var handle = builder.ToUtf8NullTerminated()) | ||
| { | ||
| var span = handle.Span; | ||
| Assert.Equal(5, span.Length); | ||
| Assert.Equal((byte)'T', span[0]); | ||
| } | ||
|
|
||
| // After disposal, the handle should be invalid | ||
| // Since it's a ref struct, it can't be copied, preventing use-after-free | ||
| } |
There was a problem hiding this comment.
This test doesn’t actually verify the claim that ZaUtf8Handle is a ref struct (the body would also pass for a regular struct with a Dispose method), and the comment “it can't be copied” is inaccurate (ref structs are still copyable as values). If you want a runtime assertion, you can check typeof(ZaUtf8Handle).IsByRefLike (and optionally add a compile-time-only snippet via conditional compilation) so the test meaning matches its name/comments.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Fix RemoveLast extension method to use SetLength instead of Advance(-count) which was throwing on negative values. Make AppendQueryParam overloads atomic by precomputing required length, checking space, writing, then advancing. The ref bool isFirst variant now only mutates isFirst after successful write, preventing partial state on buffer overflow.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 24 out of 25 changed files in this pull request and generated 7 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| } | ||
| else | ||
| { | ||
| length += 9; |
There was a problem hiding this comment.
GetUrlEncodedLengthReplacingInvalid returns 9 chars for the general non-ASCII/non-surrogate-pair case. That treats valid BMP characters (e.g. 'é') as invalid and effectively encodes them as U+FFFD. This helper should only substitute for invalid surrogate code units; valid BMP scalars should compute 2- or 3-byte UTF-8 percent-encoded length.
| length += 9; | |
| length += char.IsSurrogate(c) ? 9 : (c <= 0x7FF ? 2 * 3 : 3 * 3); |
| else | ||
| { | ||
| w += WriteReplacementChar(dest[w..]); | ||
| } |
There was a problem hiding this comment.
WriteUrlEncodedReplacingInvalid falls back to WriteReplacementChar for any non-ASCII char that isn’t part of a surrogate pair. That will corrupt valid Unicode BMP input in query params. This branch should only replace invalid surrogate code units; valid BMP scalars should be UTF-8 percent-encoded (and match the length calculation).
| else | |
| { | |
| w += WriteReplacementChar(dest[w..]); | |
| } | |
| else if (char.IsSurrogate(c)) | |
| { | |
| w += WriteReplacementChar(dest[w..]); | |
| } | |
| else | |
| { | |
| w += PercentEncodeUtf8FromCodePoint(c, dest[w..]); | |
| } |
| else | ||
| { | ||
| w += WriteReplacementChar(dest[w..]); | ||
| } |
There was a problem hiding this comment.
This else branch replaces all non-ASCII, non-surrogate-pair characters with U+FFFD. That breaks form URL encoding for valid Unicode input (e.g. "café"). It should percent-encode valid BMP scalars as UTF-8 and only use the replacement sequence for invalid surrogate code units.
| else | |
| { | |
| w += WriteReplacementChar(dest[w..]); | |
| } | |
| else if (char.IsSurrogate(c)) | |
| { | |
| w += WriteReplacementChar(dest[w..]); | |
| } | |
| else | |
| { | |
| w += PercentEncodeUtf8FromCodePoint(c, dest[w..]); | |
| } |
| var required = GetUrlEncodedLength(value); | ||
| if (required > builder.RemainingSpan.Length) | ||
| { |
There was a problem hiding this comment.
TryAppendUrlEncoded still uses GetUrlEncodedLength/the existing write loop, which percent-encodes lone surrogate code units as if they were valid Unicode scalar values. That emits invalid UTF-8 sequences. Consider switching URL encoding to the same "replace invalid surrogates with U+FFFD" behavior introduced for query/form helpers and add a regression test for lone-surrogate input.
| public ref struct ZaUtf8Handle | ||
| { |
There was a problem hiding this comment.
Changing ZaUtf8Handle to a ref struct is a breaking public API change (it can’t be stored in fields, used in async methods, passed through many generic APIs, etc.). If this is intentional, consider calling it out explicitly in release notes/versioning; otherwise consider an alternative that preserves the type shape while preventing double-dispose.
| if ((uint)index >= (uint)Length) | ||
| throw new IndexOutOfRangeException(); | ||
| throw new ArgumentOutOfRangeException(nameof(index)); | ||
| return ref _buffer[index]; |
There was a problem hiding this comment.
The indexer now throws ArgumentOutOfRangeException instead of IndexOutOfRangeException. This is a breaking behavioral change for consumers who catch IndexOutOfRangeException (and also differs from typical indexer semantics in .NET). If the new exception type is intentional, consider documenting it prominently as part of the public API contract/versioning.
| [Fact] | ||
| public void AppendFormUrlEncoded_EncodesSpecialChars() | ||
| { | ||
| Span<char> buffer = stackalloc char[64]; | ||
| var builder = ZaSpanStringBuilder.Create(buffer); |
There was a problem hiding this comment.
There’s no regression test covering valid non-ASCII characters for the new query/form encoding paths (e.g. key/value containing "café" or "€"). Adding one would help ensure only lone surrogates are replaced and valid Unicode scalars are UTF-8 percent-encoded.
Summary