Respect .editorconfig end_of_line and charset in generated code#2114
Respect .editorconfig end_of_line and charset in generated code#2114YoelDruxman wants to merge 19 commits intoriok:mainfrom
end_of_line and charset in generated code#2114Conversation
4199944 to
3194a3e
Compare
|
Thank you for this contribution. Before I start the review, I’d like to clarify a few open questions:
|
|
Thank you @latonz! I am still working on this. Let me consider your feedback and update the PR or at least report back. |
end_of_line and charset in generated code
|
@latonz I think this is ready for your review. Your thoughtful questions led to changing the implementation for the better:
Here is a report I created with Claude that delves a little deeper. Please let me know what you think. Any feedback is welcome! Thank you for this wonderful library! |
Generated mapper files (.g.cs) now honor .editorconfig settings: - end_of_line: supports lf, crlf, cr (defaults to crlf) - charset: supports utf-8, utf-8-bom, utf-16be, utf-16le (defaults to utf-8 without BOM) This allows generated code to match the project's line ending and encoding preferences, avoiding git diffs and post-processing requirements. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Use the mapper source file path (.cs) instead of trying to create a synthetic tree with the generated file path (.g.cs). This provides reliable editorconfig support with a documented limitation. Note: [*.g.cs] patterns are not supported due to a Roslyn limitation. Users should configure settings in [*.cs] or [*] sections. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Instead of GetText().ToString().Replace() which creates 2 string allocations for non-CRLF line endings, use SyntaxNode.WriteTo() with a custom LineEndingTextWriter that replaces CRLF on-the-fly. This reduces allocations from 2 to 1 for LF/CR configurations while maintaining the fast path (no replacement) for CRLF (default). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Clarifies that settings are read from the source file containing the mapper declaration, not from generated files. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
72bed81 to
76d8d14
Compare
src/Riok.Mapperly/Helpers/IncrementalValuesProviderExtensions.cs
Outdated
Show resolved
Hide resolved
src/Riok.Mapperly/Helpers/IncrementalValuesProviderExtensions.cs
Outdated
Show resolved
Hide resolved
src/Riok.Mapperly/Helpers/IncrementalValuesProviderExtensions.cs
Outdated
Show resolved
Hide resolved
src/Riok.Mapperly/Helpers/IncrementalValuesProviderExtensions.cs
Outdated
Show resolved
Hide resolved
src/Riok.Mapperly/Helpers/IncrementalValuesProviderExtensions.cs
Outdated
Show resolved
Hide resolved
| var text = GetSourceText(mapper.Body, mapper.EndOfLine); | ||
|
|
||
| // Always use SourceText.From to ensure BOM is included when specified | ||
| spc.AddSource(mapper.FileName, SourceText.From(text, encoding)); |
There was a problem hiding this comment.
This introduces a performance penalty even when the feature is not used. Could you measure the performance impact compared to main using Riok.Mapperly.Benchmarks, both with and without custom EOL or encoding settings?
There was a problem hiding this comment.
I updated the code to have the fast-track as requested, when measuring performance, there did not seem to be any impact in any cases (although I am not 100% confident).
There was a problem hiding this comment.
Could you post the benchmark dotnet results?
There was a problem hiding this comment.
Here are the benchmark results:
BenchmarkDotNet v0.15.8, Windows 11
AMD Ryzen 9 7900X 4.70GHz, 1 CPU, 24 logical and 12 physical cores
.NET SDK 10.0.102
[Host] : .NET 10.0.2, X64 RyuJIT x86-64-v4
| Method | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
|------------- |----------:|----------:|----------:|----------:|---------:|------------:|
| Compile | 1.340 ms | 0.0264 ms | 0.0475 ms | 46.8750 | 15.6250 | 818.41 KB |
| LargeCompile | 38.092 ms | 0.7134 ms | 0.6673 ms | 1384.6154 | 692.3077 | 23124.07 KB |
Address PR review comment riok#2: Remove redundant comments that don't add value to the self-explanatory switch expression. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Address PR review comment riok#3: Remove redundant null/empty check since the switch expression's default arm handles these cases. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Address PR review comment riok#6: Add using directive for Microsoft.CodeAnalysis.CSharp.Syntax and use the simple type name. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…ions Address PR review comment riok#9: Extract EndOfLine and Charset properties from MapperNode into a separate SourceTextConfig record for better organization of source text formatting configuration. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Address PR review comment riok#10: Move editorconfig testing infrastructure from EndOfLineTest into shared TestHelper utilities. - Add EditorConfigEndOfLine and EditorConfigCharset to TestHelperOptions - Create TestAnalyzerConfigOptionsProvider for shared editorconfig testing - Add TestHelper.GenerateSourceText() for encoding/line ending tests - Simplify EndOfLineTest to use shared infrastructure Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Address PR review comment riok#8: Add comprehensive unit tests for the LineEndingTextWriter class to verify CRLF to LF/CR conversion behavior. Made LineEndingTextWriter public to allow direct testing, following the pattern of other tested helper classes in the project. Tests cover: - CRLF to LF conversion - CRLF to CR conversion - Text with no line endings - Standalone CR and LF preservation - Mixed line endings - Empty and null strings - Consecutive CRLF sequences - Trailing CR at end of text - Char array writes - Large text handling - CRLF split across writes Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Address PR review comments riok#4 and riok#5: Add RMG095 and RMG096 diagnostics to warn users when unknown charset or end_of_line values are specified in .editorconfig. - RMG095: Unknown charset value (valid: utf-8, utf-8-bom, utf-16be, utf-16le) - RMG096: Unknown end_of_line value (valid: lf, crlf, cr) Validation is performed during mapper generation and diagnostics are reported at the mapper declaration location. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Address PR review comment riok#1: Add a reference link in the EditorConfig support section pointing to the analyzer diagnostics page for configuring diagnostic severities via .editorconfig. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
latonz
left a comment
There was a problem hiding this comment.
Thank you for the updates. I am deferring my full review until the performance benchmarks are done, as those metrics may necessitate adjustments to the current design (by using the correct newline when building the syntaxes).
src/Riok.Mapperly/Helpers/IncrementalValuesProviderExtensions.cs
Outdated
Show resolved
Hide resolved
src/Riok.Mapperly/Helpers/IncrementalValuesProviderExtensions.cs
Outdated
Show resolved
Hide resolved
Move new editorconfig diagnostics from Unshipped.md to Shipped.md as this project tracks added/removed diagnostics via git history.
Follow codebase convention of keeping fields at the top of the class.
Follow codebase convention of keeping fields at the top of the class.
…lues SourceTextConfig now holds actual values (Encoding, line ending strings) instead of raw editorconfig strings. Parsing and validation moved to MapperGenerator.BuildSourceTextConfig. Fast path uses GetText directly when no line ending conversion is needed. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
@latonz Thank you for the review! I think I addressed all your comments. Please let me know if there is more to do. As I replied above, I was not able to detect any meaningful performance difference. |
ccbc82b to
11a2002
Compare
| if (value == null) | ||
| return; | ||
|
|
||
| foreach (var c in value) |
There was a problem hiding this comment.
Using foreach on a string allocates an enumerator. Consider using a for loop with indexing to avoid the allocation, especially since this method may be called frequently during code generation.
| /// A test implementation of <see cref="AnalyzerConfigOptions"/> that provides | ||
| /// specific editorconfig values for testing. | ||
| /// </summary> | ||
| internal class TestAnalyzerConfigOptions : AnalyzerConfigOptions |
There was a problem hiding this comment.
This codebase follows a file per class approach, please move it into its own file.
|
|
||
| public TestAnalyzerConfigOptions(string? endOfLine, string? charset) | ||
| { | ||
| _options = new Dictionary<string, string?> { ["end_of_line"] = endOfLine, ["charset"] = charset } |
There was a problem hiding this comment.
The dictionary initialization could be improved by using an immutable dictionary builder and avoiding the intermediate nullable dictionary.
| var sb = new StringBuilder(); | ||
| using (var writer = new LineEndingTextWriter(sb, config.EndOfLine)) | ||
| { | ||
| body.WriteTo(writer); | ||
| } | ||
|
|
||
| return SourceText.From(sb.ToString(), config.Encoding); |
There was a problem hiding this comment.
To improve performance, the initial capacity of the StringBuilder should be set.
Performance could probably be further improved by using a pipelines implementation using the SourceText.From overload that accepts a TextReader, which allocates less memory for large texts (I think over 40 KB) or a custom SourceText implementation. But I think these approaches are not worth it for now.
|
|
||
| var endOfLine = endOfLineValue?.ToLowerInvariant() switch | ||
| { | ||
| null or "" => null, |
There was a problem hiding this comment.
The handling on the reading side could be simplified by eliminating the null case by using \r\n here (combining it with the crlf case). ReportUnknownEndOfLine could also just return \r\n
| "utf-8-bom" => _utf8WithBom, | ||
| "utf-16be" => Encoding.BigEndianUnicode, | ||
| "utf-16le" => Encoding.Unicode, | ||
| _ => ReportUnknownCharset(charsetValue!, diagnostics), |
There was a problem hiding this comment.
The handling on the reading side could be simplified by eliminating the null case by returning _utf8NoBom from ReportUnknownCharset.
| generatedSource.ShouldContain(expectedLineEnding); | ||
|
|
||
| // After removing expected line endings, no line ending chars should remain | ||
| var normalized = generatedSource.Replace(expectedLineEnding, ""); |
There was a problem hiding this comment.
Nit: This codebase uses string.Empty for empty string literals.
| public static SourceText GenerateSourceText(string source, TestHelperOptions? options = null) | ||
| { | ||
| var driver = Generate(source, options); | ||
| return driver.GetRunResult().GeneratedTrees.First().GetText(); |
There was a problem hiding this comment.
Nit: wouldn't it be saver to use Single instead of First?
| // Create options provider for editorconfig settings if specified | ||
| var optionsProvider = | ||
| options.EditorConfigEndOfLine != null || options.EditorConfigCharset != null | ||
| ? new TestAnalyzerConfigOptionsProvider(options.EditorConfigEndOfLine, options.EditorConfigCharset) | ||
| : null; |
There was a problem hiding this comment.
Are the null checks really required? Wouldn't always using new TestAnalyzerConfigOptionsProvider(options.EditorConfigEndOfLine, options.EditorConfigCharset) lead to the same (TryGetValue returning false)?
Respect .editorconfig
end_of_lineandcharsetin generated codeDescription
Generated mapper files (
.g.cs) now respect.editorconfigsettings for line endings and character encoding.Problem
Generated code always used CRLF (
\r\n) line endings and UTF-8 without BOM, regardless of the project's.editorconfigconfiguration. This caused issues for projects configured with different line endings.Solution
Read
end_of_lineandcharsetsettings from the source file's.editorconfigviaAnalyzerConfigOptionsProviderand apply them to the generated output.Supported settings:
end_of_line:lf,crlf,cr(defaults tocrlf)charset:utf-8,utf-8-bom,utf-16be,utf-16le(defaults toutf-8without BOM)Known Limitation
Due to a Roslyn limitation, editorconfig settings are read from the source file containing the mapper declaration, not from generated
.g.csfiles. This means[*.g.cs]patterns in.editorconfigwill not be applied. Users should configure settings in[*.cs]or[*]sections.Files changed:
LineEndingTextWriter.cs- Streaming line ending replacementIncrementalValuesProviderExtensions.cs- Read editorconfig settings and apply encodingMapperNode.cs- StoreEndOfLineandCharsetpropertiesMapperGenerator.cs- Pass settings from source file to outputAddresses this discussion
Checklist