Skip to content

Respect .editorconfig end_of_line and charset in generated code#2114

Open
YoelDruxman wants to merge 19 commits intoriok:mainfrom
YoelDruxman:feat/editorconfig-settings
Open

Respect .editorconfig end_of_line and charset in generated code#2114
YoelDruxman wants to merge 19 commits intoriok:mainfrom
YoelDruxman:feat/editorconfig-settings

Conversation

@YoelDruxman
Copy link

@YoelDruxman YoelDruxman commented Jan 14, 2026

Respect .editorconfig end_of_line and charset in generated code

Description

Generated mapper files (.g.cs) now respect .editorconfig settings for line endings and character encoding.

Problem

Generated code always used CRLF (\r\n) line endings and UTF-8 without BOM, regardless of the project's .editorconfig configuration. This caused issues for projects configured with different line endings.

Solution

Read end_of_line and charset settings from the source file's .editorconfig via AnalyzerConfigOptionsProvider and apply them to the generated output.

Supported settings:

  • end_of_line: lf, crlf, cr (defaults to crlf)
  • charset: utf-8, utf-8-bom, utf-16be, utf-16le (defaults to utf-8 without BOM)

Known Limitation

Due to a Roslyn limitation, editorconfig settings are read from the source file containing the mapper declaration, not from generated .g.cs files. This means [*.g.cs] patterns in .editorconfig will not be applied. Users should configure settings in [*.cs] or [*] sections.

Files changed:

  • LineEndingTextWriter.cs - Streaming line ending replacement
  • IncrementalValuesProviderExtensions.cs - Read editorconfig settings and apply encoding
  • MapperNode.cs - Store EndOfLine and Charset properties
  • MapperGenerator.cs - Pass settings from source file to output

Addresses this discussion

Checklist

  • The existing code style is followed
  • The commit message follows our guidelines
  • Performed a self-review of my code
  • Hard-to-understand areas of my code are commented
  • The documentation is updated (as applicable)
  • Unit tests are added/updated
  • Integration tests are added/updated (as applicable)

@YoelDruxman YoelDruxman force-pushed the feat/editorconfig-settings branch 6 times, most recently from 4199944 to 3194a3e Compare January 14, 2026 15:30
@latonz
Copy link
Contributor

latonz commented Jan 14, 2026

Thank you for this contribution. Before I start the review, I’d like to clarify a few open questions:

  • How do other source generators handle this? It seems like a challenge all source generators face.
  • What is the performance impact?
  • Would it make more sense to emit the correct line feeds directly in the syntax emitters instead of replacing them afterward? This would probably reduce the performance impact.

@YoelDruxman
Copy link
Author

YoelDruxman commented Jan 14, 2026

Thank you @latonz! I am still working on this. Let me consider your feedback and update the PR or at least report back.

@YoelDruxman YoelDruxman changed the title Respect .editorconfig end_of_line and charset in generated code Respect .editorconfig end_of_line and charset in generated code Jan 15, 2026
@YoelDruxman
Copy link
Author

@latonz I think this is ready for your review. Your thoughtful questions led to changing the implementation for the better:

  1. It seems that this is an unsolved problem.
  2. The performance has been improved to what I believe is almost transparent.
  3. This seems like more effort and difficulty than it is worth.

Here is a report I created with Claude that delves a little deeper.

Please let me know what you think. Any feedback is welcome! Thank you for this wonderful library!

YoelDruxman and others added 6 commits January 19, 2026 19:34
Generated mapper files (.g.cs) now honor .editorconfig settings:
- end_of_line: supports lf, crlf, cr (defaults to crlf)
- charset: supports utf-8, utf-8-bom, utf-16be, utf-16le (defaults to utf-8 without BOM)

This allows generated code to match the project's line ending and encoding
preferences, avoiding git diffs and post-processing requirements.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Use the mapper source file path (.cs) instead of trying to create a
synthetic tree with the generated file path (.g.cs). This provides
reliable editorconfig support with a documented limitation.

Note: [*.g.cs] patterns are not supported due to a Roslyn limitation.
Users should configure settings in [*.cs] or [*] sections.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Instead of GetText().ToString().Replace() which creates 2 string
allocations for non-CRLF line endings, use SyntaxNode.WriteTo()
with a custom LineEndingTextWriter that replaces CRLF on-the-fly.

This reduces allocations from 2 to 1 for LF/CR configurations while
maintaining the fast path (no replacement) for CRLF (default).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Clarifies that settings are read from the source file containing
the mapper declaration, not from generated files.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@YoelDruxman YoelDruxman force-pushed the feat/editorconfig-settings branch from 72bed81 to 76d8d14 Compare January 20, 2026 00:34
@YoelDruxman YoelDruxman marked this pull request as ready for review January 20, 2026 00:34
Comment on lines 103 to 106
var text = GetSourceText(mapper.Body, mapper.EndOfLine);

// Always use SourceText.From to ensure BOM is included when specified
spc.AddSource(mapper.FileName, SourceText.From(text, encoding));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This introduces a performance penalty even when the feature is not used. Could you measure the performance impact compared to main using Riok.Mapperly.Benchmarks, both with and without custom EOL or encoding settings?

Copy link
Author

@YoelDruxman YoelDruxman Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the code to have the fast-track as requested, when measuring performance, there did not seem to be any impact in any cases (although I am not 100% confident).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you post the benchmark dotnet results?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are the benchmark results:

BenchmarkDotNet v0.15.8, Windows 11
AMD Ryzen 9 7900X 4.70GHz, 1 CPU, 24 logical and 12 physical cores
.NET SDK 10.0.102
  [Host] : .NET 10.0.2, X64 RyuJIT x86-64-v4

| Method       | Mean      | Error     | StdDev    | Gen0      | Gen1     | Allocated   |
|------------- |----------:|----------:|----------:|----------:|---------:|------------:|
| Compile      |  1.340 ms | 0.0264 ms | 0.0475 ms |   46.8750 |  15.6250 |   818.41 KB |
| LargeCompile | 38.092 ms | 0.7134 ms | 0.6673 ms | 1384.6154 | 692.3077 | 23124.07 KB |

@latonz latonz added the enhancement New feature or request label Jan 20, 2026
YoelDruxman and others added 8 commits January 20, 2026 14:25
Address PR review comment riok#2: Remove redundant comments that don't
add value to the self-explanatory switch expression.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Address PR review comment riok#3: Remove redundant null/empty check
since the switch expression's default arm handles these cases.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Address PR review comment riok#6: Add using directive for
Microsoft.CodeAnalysis.CSharp.Syntax and use the simple type name.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…ions

Address PR review comment riok#9: Extract EndOfLine and Charset properties
from MapperNode into a separate SourceTextConfig record for better
organization of source text formatting configuration.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Address PR review comment riok#10: Move editorconfig testing infrastructure
from EndOfLineTest into shared TestHelper utilities.

- Add EditorConfigEndOfLine and EditorConfigCharset to TestHelperOptions
- Create TestAnalyzerConfigOptionsProvider for shared editorconfig testing
- Add TestHelper.GenerateSourceText() for encoding/line ending tests
- Simplify EndOfLineTest to use shared infrastructure

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Address PR review comment riok#8: Add comprehensive unit tests for the
LineEndingTextWriter class to verify CRLF to LF/CR conversion behavior.

Made LineEndingTextWriter public to allow direct testing, following the
pattern of other tested helper classes in the project.

Tests cover:
- CRLF to LF conversion
- CRLF to CR conversion
- Text with no line endings
- Standalone CR and LF preservation
- Mixed line endings
- Empty and null strings
- Consecutive CRLF sequences
- Trailing CR at end of text
- Char array writes
- Large text handling
- CRLF split across writes

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Address PR review comments riok#4 and riok#5: Add RMG095 and RMG096 diagnostics
to warn users when unknown charset or end_of_line values are specified
in .editorconfig.

- RMG095: Unknown charset value (valid: utf-8, utf-8-bom, utf-16be, utf-16le)
- RMG096: Unknown end_of_line value (valid: lf, crlf, cr)

Validation is performed during mapper generation and diagnostics are
reported at the mapper declaration location.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Address PR review comment riok#1: Add a reference link in the EditorConfig
support section pointing to the analyzer diagnostics page for
configuring diagnostic severities via .editorconfig.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Copy link
Contributor

@latonz latonz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the updates. I am deferring my full review until the performance benchmarks are done, as those metrics may necessitate adjustments to the current design (by using the correct newline when building the syntaxes).

Move new editorconfig diagnostics from Unshipped.md to Shipped.md
as this project tracks added/removed diagnostics via git history.
YoelDruxman and others added 3 commits January 21, 2026 09:08
Follow codebase convention of keeping fields at the top of the class.
Follow codebase convention of keeping fields at the top of the class.
…lues

SourceTextConfig now holds actual values (Encoding, line ending strings)
instead of raw editorconfig strings. Parsing and validation moved to
MapperGenerator.BuildSourceTextConfig. Fast path uses GetText directly
when no line ending conversion is needed.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@YoelDruxman
Copy link
Author

YoelDruxman commented Jan 22, 2026

@latonz Thank you for the review! I think I addressed all your comments. Please let me know if there is more to do. As I replied above, I was not able to detect any meaningful performance difference.

@YoelDruxman YoelDruxman force-pushed the feat/editorconfig-settings branch from ccbc82b to 11a2002 Compare January 22, 2026 21:42
if (value == null)
return;

foreach (var c in value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using foreach on a string allocates an enumerator. Consider using a for loop with indexing to avoid the allocation, especially since this method may be called frequently during code generation.

/// A test implementation of <see cref="AnalyzerConfigOptions"/> that provides
/// specific editorconfig values for testing.
/// </summary>
internal class TestAnalyzerConfigOptions : AnalyzerConfigOptions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This codebase follows a file per class approach, please move it into its own file.


public TestAnalyzerConfigOptions(string? endOfLine, string? charset)
{
_options = new Dictionary<string, string?> { ["end_of_line"] = endOfLine, ["charset"] = charset }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dictionary initialization could be improved by using an immutable dictionary builder and avoiding the intermediate nullable dictionary.

Comment on lines +94 to +100
var sb = new StringBuilder();
using (var writer = new LineEndingTextWriter(sb, config.EndOfLine))
{
body.WriteTo(writer);
}

return SourceText.From(sb.ToString(), config.Encoding);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To improve performance, the initial capacity of the StringBuilder should be set.

Performance could probably be further improved by using a pipelines implementation using the SourceText.From overload that accepts a TextReader, which allocates less memory for large texts (I think over 40 KB) or a custom SourceText implementation. But I think these approaches are not worth it for now.


var endOfLine = endOfLineValue?.ToLowerInvariant() switch
{
null or "" => null,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The handling on the reading side could be simplified by eliminating the null case by using \r\n here (combining it with the crlf case). ReportUnknownEndOfLine could also just return \r\n

"utf-8-bom" => _utf8WithBom,
"utf-16be" => Encoding.BigEndianUnicode,
"utf-16le" => Encoding.Unicode,
_ => ReportUnknownCharset(charsetValue!, diagnostics),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The handling on the reading side could be simplified by eliminating the null case by returning _utf8NoBom from ReportUnknownCharset.

generatedSource.ShouldContain(expectedLineEnding);

// After removing expected line endings, no line ending chars should remain
var normalized = generatedSource.Replace(expectedLineEnding, "");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This codebase uses string.Empty for empty string literals.

public static SourceText GenerateSourceText(string source, TestHelperOptions? options = null)
{
var driver = Generate(source, options);
return driver.GetRunResult().GeneratedTrees.First().GetText();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: wouldn't it be saver to use Single instead of First?

Comment on lines +115 to +119
// Create options provider for editorconfig settings if specified
var optionsProvider =
options.EditorConfigEndOfLine != null || options.EditorConfigCharset != null
? new TestAnalyzerConfigOptionsProvider(options.EditorConfigEndOfLine, options.EditorConfigCharset)
: null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the null checks really required? Wouldn't always using new TestAnalyzerConfigOptionsProvider(options.EditorConfigEndOfLine, options.EditorConfigCharset) lead to the same (TryGetValue returning false)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants