Skip to content

TF-4524 Fix blank plain-text email caused by backslash-hex false-positive in HTML sanitizer #4527

Merged
hoangdat merged 8 commits into
masterfrom
bugfix/tf-4524-email-without-content
May 18, 2026
Merged

TF-4524 Fix blank plain-text email caused by backslash-hex false-positive in HTML sanitizer #4527
hoangdat merged 8 commits into
masterfrom
bugfix/tf-4524-email-without-content

Conversation

@dab246
Copy link
Copy Markdown
Member

@dab246 dab246 commented May 15, 2026

Issue

#4524

Problem

Plain-text emails containing \XX patterns — such as PHP/Go namespace paths like \Sabre\DAV\Exception\Forbidden — were rendered as a completely blank white screen.

Root cause was a four-level chain:

  1. StandardizeHtmlSanitizingTransformers (an HTML sanitizer) was incorrectly placed first in the textPlain pipeline — plain text is not HTML.
  2. NodeSanitizer._isEncodedJs() in dart-neats uses RegExp(r'\\[0-9a-f]{2}') to detect unicode-escape obfuscation. This matches any single \XX pair, so \DA in \Sabre\DAV triggers it.
  3. _shouldStripText() calls node.remove() → entire text node deleted.
  4. body.innerHtml becomes empty → blank screen.

Solution

Rebuilt the textPlain transformer pipeline with three dedicated layers:

Layer Transformer Role
1 SanitizeAutolinkHtmlTransformers HTML-escape all plain text before any parsing; autolink URLs
2 SanitizePlainTextHtmlOutputTransformer (new) Allow only <a> tags with safe href (validated via Uri.tryParse) and attribute allowlist; unwrap everything else
3 PersistPreformattedTextTransformer Preserve line breaks and spacing

The HTML sanitizer (StandardizeHtmlSanitizingTransformers) is removed from the plain-text path entirely.

It remains in the HTML path where it belongs.

Related

Demo

Screenshot 2026-05-15 at 13 35 08

Summary by CodeRabbit

  • Bug Fixes

    • Fixed plain-text emails becoming blank when containing certain backslash-hex escape patterns.
  • New Features

    • Enhanced HTML sanitization for plain-text emails with improved anchor tag validation, allowing only safe URL schemes (http, https, mailto) while removing dangerous attributes.
  • Tests

    • Added comprehensive test coverage for HTML transformation and email content sanitization across various scenarios and edge cases.

Review Change Stack

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 15, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

This pull request implements a defense-in-depth fix for a plain-text email rendering bug where certain backslash-hex escape patterns caused message bodies to become blank. The solution introduces a new SanitizePlainTextHtmlOutputTransformer that sanitizes HTML output by recursively processing the DOM: removing comments, unwrapping non-anchor elements, and sanitizing anchors through attribute allowlisting and href scheme validation. The transformer replaces a previous broad-regex-based sanitizer in the plain-text pipeline. Supporting changes include comprehensive test coverage across multiple integration layers, reusable test fixtures for email corpus data, a dependency update to fix the underlying unicode-escape regex, and minor integration test timing adjustments.

Suggested reviewers

  • hoangdat
  • codescene-delta-analysis
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly identifies the specific bug being fixed (blank plain-text email caused by backslash-hex false-positive in HTML sanitizer) and matches the changeset, which implements a comprehensive solution to address this issue.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch bugfix/tf-4524-email-without-content

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

codescene-delta-analysis[bot]

This comment was marked as outdated.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@core/test/utils/sanitize_plain_text_html_output_transformer_test.dart`:
- Line 12: The shared test helper currently trims every transformer output via
the process helper: change String process(String input) =>
transformer.process(input, htmlEscape).trim(); to return the raw transformer
output without trimming so whitespace-preservation regressions are detectable;
locate and edit the process helper function and update any tests that depended
on trimmed output to explicitly call .trim() where appropriate (references:
process helper, transformer.process, htmlEscape).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7748a983-ddd9-4f5f-9941-a96e0a2239cb

📥 Commits

Reviewing files that changed from the base of the PR and between 2a19ffe and a780e99.

📒 Files selected for processing (4)
  • core/lib/presentation/utils/html_transformer/text/sanitize_plain_text_html_output_transformer.dart
  • core/test/utils/sanitize_plain_text_html_output_transformer_test.dart
  • docs/adr/0089-fix-plain-text-email-blank-on-backslash-hex-patterns.md
  • lib/features/email/data/local/html_analyzer.dart

Comment thread core/test/utils/sanitize_plain_text_html_output_transformer_test.dart Outdated
…lse-positive

Signed-off-by: dab246 <tdvu@linagora.com>
codescene-delta-analysis[bot]

This comment was marked as outdated.

@github-actions
Copy link
Copy Markdown

This PR has been deployed to https://linagora.github.io/tmail-flutter/4527.

…izer false-positive

Signed-off-by: dab246 <tdvu@linagora.com>
codescene-delta-analysis[bot]

This comment was marked as outdated.

codescene-delta-analysis[bot]

This comment was marked as outdated.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@core/test/utils/html_transform_text_plain_test.dart`:
- Around line 103-126: The tests use brittle substring checks like
contains('<script>'), contains('<img '), contains('<div ') which can miss
equivalent malicious variants; update the XSS assertions in the group to use
boundary-aware tag matchers (e.g., regex or token-boundary checks) against the
transform output so they reliably detect any opening tag forms (reference the
transform(...) calls in the three tests) — replace the plain contains assertions
with matchers that assert absence of an opening tag pattern like `<\s*script\b`,
`<\s*img\b`, and `<\s*div\b` (or equivalent boundary-based matchers) to ensure
robust detection.

In `@core/test/utils/plain_text_transformer_pipeline_test.dart`:
- Around line 113-117: The test "SHOULD escape and block <script> tags injected
in plain text" currently checks only for the exact substring "<script>", which
misses variants like "<script type='...'>"; update the assertion to use a regex
boundary check (e.g., look for "<script\b" case-insensitive) against the
pipeline output produced by runPipeline so the expect in that test asserts that
out does not match /<script\b/i, ensuring any <script ...> forms are caught;
keep the test name and use the existing runPipeline output variable.

In `@test/features/email/data/local/html_analyzer_text_plain_test.dart`:
- Around line 96-105: The tests for
transformPlain(TextPlainEmailCorpus.scriptTag) and
transformPlain(TextPlainEmailCorpus.imgTracker) only check for exact substrings
like "<script>" and "<img " which miss variants with attributes or different
spacing; change the assertions to assert the transformed content does not match
a tag-opening pattern instead (e.g. use a RegExp for "<\\s*script\\b" and
"<\\s*img\\b" or the language's equivalent negative match) so any "<script...>"
or "<img...>" or self-closing variants are caught; update the two tests to use
these regex-based non-matching checks against the returned .content from
transformPlain.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d5f2217f-522b-4c35-bbdf-19724b17e6fa

📥 Commits

Reviewing files that changed from the base of the PR and between 122c840 and 5e1af6a.

📒 Files selected for processing (10)
  • core/test/fixtures/email_content_corpus.dart
  • core/test/fixtures/html_email_corpus.dart
  • core/test/utils/html_transform_text_html_test.dart
  • core/test/utils/html_transform_text_plain_test.dart
  • core/test/utils/plain_text_transformer_pipeline_test.dart
  • test/features/email/data/local/html_analyzer_text_html_test.dart
  • test/features/email/data/local/html_analyzer_text_plain_test.dart
  • test/fixtures/email_html_corpus.dart
  • test/fixtures/html_email_corpus.dart
  • test/fixtures/text_plain_email_corpus.dart
✅ Files skipped from review due to trivial changes (2)
  • test/fixtures/text_plain_email_corpus.dart
  • test/fixtures/email_html_corpus.dart

Comment thread core/test/utils/html_transform_text_plain_test.dart
Comment thread core/test/utils/plain_text_transformer_pipeline_test.dart Outdated
Comment thread test/features/email/data/local/html_analyzer_text_plain_test.dart
…L sanitizer false-positive

Signed-off-by: dab246 <tdvu@linagora.com>
codescene-delta-analysis[bot]

This comment was marked as outdated.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
test/fixtures/text_plain_email_corpus.dart (1)

22-33: 💤 Low value

Consider using raw multiline strings for improved readability.

The current concatenation of raw strings (for backslash preservation) with regular strings (for newline interpretation) is correct but less readable than alternatives.

♻️ Refactor suggestion: use raw multiline strings
-  static const String phpStackTrace =
-      r'Fatal error: Uncaught \App\Http\Exception\NotFound' '\n'
-      r'Stack trace: `#0` \App\Router\Dispatcher::dispatch()' '\n'
-      'In file: /var/www/html/public/index.php on line 42';
+  static const String phpStackTrace = r'''Fatal error: Uncaught \App\Http\Exception\NotFound
+Stack trace: `#0` \App\Router\Dispatcher::dispatch()
+In file: /var/www/html/public/index.php on line 42''';

Apply the same pattern to phpNamespaceWithUrl:

-  static const String phpNamespaceWithUrl =
-      'On Monday, Alice wrote:\n'
-      r"throw new \App\DB\Exception\AuthFailed('access denied');" '\n'
-      'See https://jira.example.com/issues/1234 for context.';
+  static const String phpNamespaceWithUrl = r'''On Monday, Alice wrote:
+throw new \App\DB\Exception\AuthFailed('access denied');
+See https://jira.example.com/issues/1234 for context.''';
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/fixtures/text_plain_email_corpus.dart` around lines 22 - 33, The
phpNamespaceWithUrl constant (and similar string constants like
phpStackTrace/windowsFilePath/goPackagePath) uses concatenated raw and regular
strings which is correct but hard to read; replace those concatenations with a
single raw multiline Dart string (e.g., r"""...""" or r'''...''') so backslashes
are preserved and newlines are explicit in the literal—update the declaration of
phpNamespaceWithUrl to use one raw multiline string containing the whole value
(including the \App\DB\Exception text, the newline, and the URL) and apply the
same pattern to phpStackTrace, windowsFilePath, and goPackagePath where
appropriate.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@test/fixtures/text_plain_email_corpus.dart`:
- Around line 22-33: The phpNamespaceWithUrl constant (and similar string
constants like phpStackTrace/windowsFilePath/goPackagePath) uses concatenated
raw and regular strings which is correct but hard to read; replace those
concatenations with a single raw multiline Dart string (e.g., r"""...""" or
r'''...''') so backslashes are preserved and newlines are explicit in the
literal—update the declaration of phpNamespaceWithUrl to use one raw multiline
string containing the whole value (including the \App\DB\Exception text, the
newline, and the URL) and apply the same pattern to phpStackTrace,
windowsFilePath, and goPackagePath where appropriate.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 60eea130-1482-43d0-947b-8800c002689f

📥 Commits

Reviewing files that changed from the base of the PR and between 5e1af6a and 89eec7a.

📒 Files selected for processing (2)
  • core/test/utils/standardize_html_sanitizing_transformers_test.dart
  • test/fixtures/text_plain_email_corpus.dart

Comment thread contact/pubspec.lock Outdated
Comment thread scribe/pubspec.lock Outdated
… by HTML sanitizer false-positive

Signed-off-by: dab246 <tdvu@linagora.com>
codescene-delta-analysis[bot]

This comment was marked as outdated.

@dab246 dab246 requested a review from chibenwa May 18, 2026 05:03
hoangdat
hoangdat previously approved these changes May 18, 2026
… caused by HTML sanitizer false-positive

Signed-off-by: dab246 <tdvu@linagora.com>
codescene-delta-analysis[bot]

This comment was marked as outdated.

…t email caused by HTML sanitizer false-positive

Signed-off-by: dab246 <tdvu@linagora.com>
Copy link
Copy Markdown

@codescene-delta-analysis codescene-delta-analysis Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gates Passed
3 Quality Gates Passed

See analysis details in CodeScene

Quality Gate Profile: The Bare Minimum
Install CodeScene MCP: safeguard and uplift AI-generated code. Catch issues early with our IDE extension and CLI tool.

@hoangdat hoangdat merged commit 6cfdef0 into master May 18, 2026
27 checks passed
@hoangdat hoangdat deleted the bugfix/tf-4524-email-without-content branch May 28, 2026 02:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants