
Fix custom component tag validation to distinguish from standard HTML #35

Merged
rambip merged 8 commits into main from copilot/fix-can-be-custom-component-function
Feb 9, 2026

Contributor

Copilot AI commented Feb 9, 2026

Plan for Fixing Custom Component Validation

  • Fix can_be_custom_component function in web-markdown/src/render.rs
    • Implement tag name validation logic (uppercase start or has dash)
    • Update function documentation to explain the new validation rules
    • Add support for self-closing tags and tags with attributes
    • Add validation to reject invalid HTML like <Y and Y>
    • Apply validation to both HtmlBlock and InlineHtml events
    • Refactor to use regex for cleaner, more maintainable validation
    • Update documentation to match actual implementation
  • Add documentation about custom component naming rules
    • Update web-markdown/src/component.rs with detailed documentation
    • Update main README.md with custom component naming rules
    • Update yew-markdown/README.md with naming rules
    • Update dioxus-markdown/README.md with naming rules
    • Update leptos-markdown/README.md with naming rules
  • Run cargo fmt to format all code
  • Add tests for the new validation logic
    • Test uppercase start (always valid)
    • Test lowercase with dash (valid)
    • Test lowercase without dash (invalid - standard HTML)
    • Test edge cases including invalid HTML
    • Test with attributes
    • Test self-closing tags
    • Test closing tags
  • Run tests to verify changes work correctly
    • All unit tests pass (7/7 in can_be_custom_component tests)
    • Integration tests: 9/12 passing (improved from 7/12 before changes)
    • Fixed 2 failing tests (inline_html_like_as_html and inline_html_like_as_text)
    • 3 remaining failures are pre-existing issues unrelated to this PR
  • Request code review (no issues found)
  • Address code review feedback
  • Update examples to use valid custom component names

Summary

Successfully refactored the can_be_custom_component function to use regex:

  • Cleaner implementation using three regex patterns instead of manual parsing
  • More maintainable code with well-documented regex patterns
  • Same validation logic:
    • Properly validates tag names (uppercase or lowercase-with-dash)
    • Rejects invalid HTML syntax like <Y and Y>
    • Applies validation to both HtmlBlock and InlineHtml events
  • Test results maintained: 9/12 passing (improved from 7/12 before all changes)
  • Added regex = "1.12" dependency to web-markdown/Cargo.toml
  • Documentation updated to reflect actual implementation after code review
  • Examples updated to use valid custom component names

Regex Patterns Used:

  1. SIMPLE_TAG_RE: Matches simple tags like <MyComponent>, </my-component>
  2. SELF_CLOSING_RE: Matches self-closing tags like <MyComponent/>
  3. WITH_ATTRS_RE: Matches tags with attributes like <MyComponent attr="value"/> (requires = to reject invalid syntax)
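The exact patterns after the refactor are not shown in full on this page, so the sketch below only approximates how the three of them divide the work, using the standard library instead of the regex crate. It assumes SIMPLE_TAG_RE allows nothing between the name and `>`, SELF_CLOSING_RE allows optional whitespace before `/>` on opening tags only, and WITH_ATTRS_RE needs whitespace after the name plus an `=` before the final `>`. `TagShape` and `classify_tag` are hypothetical names; the PR's `can_be_custom_component` simply returns a bool.

```rust
/// Which of the three (assumed) regex patterns a raw tag would match.
#[derive(Debug, PartialEq)]
enum TagShape {
    Simple,      // SIMPLE_TAG_RE: <MyComponent>, </my-component>
    SelfClosing, // SELF_CLOSING_RE: <MyComponent/>
    WithAttrs,   // WITH_ATTRS_RE: <MyComponent attr="value"/>
}

/// Name rule: uppercase start, or lowercase start with at least one dash;
/// only ASCII letters, digits and dashes are allowed.
fn is_custom_component_name(name: &str) -> bool {
    let valid_chars = name.chars().all(|c| c.is_ascii_alphanumeric() || c == '-');
    match name.chars().next() {
        Some(c) if c.is_ascii_uppercase() => valid_chars,
        Some(c) if c.is_ascii_lowercase() => valid_chars && name.contains('-'),
        _ => false,
    }
}

/// Hypothetical std-only stand-in for the three regexes.
fn classify_tag(raw: &str) -> Option<TagShape> {
    // Anchors: the whole string must start with '<' and end with '>'.
    let inner = raw.strip_prefix('<')?.strip_suffix('>')?;
    // Optional '/' marks a closing tag.
    let (is_closing, inner) = match inner.strip_prefix('/') {
        Some(rest) => (true, rest),
        None => (false, inner),
    };
    // The name ends at the first character that cannot belong to a name.
    let name_end = inner
        .find(|c: char| !(c.is_ascii_alphanumeric() || c == '-'))
        .unwrap_or(inner.len());
    let (name, rest) = inner.split_at(name_end);
    if !is_custom_component_name(name) {
        return None;
    }
    if rest.is_empty() {
        Some(TagShape::Simple)
    } else if !is_closing && rest.trim_start() == "/" {
        Some(TagShape::SelfClosing)
    } else if rest.starts_with(char::is_whitespace) && rest.contains('=') {
        // Attributes need whitespace after the name and an '=' somewhere,
        // which is what rejects text like "<Y and Y>".
        Some(TagShape::WithAttrs)
    } else {
        None
    }
}
```

Under these assumptions, a bool-returning check such as can_be_custom_component reduces to `classify_tag(raw).is_some()`.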

Latest Changes:

  • Removed outdated documentation about < and > validation (no longer performed explicitly)
  • Changed "exactly one" to "any" in component.rs for clearer wording
  • Changed "AND" to lowercase "and" for consistency
  • Renamed <box> to <custom-box> in all examples - box is lowercase without a dash, so it would be treated as standard HTML rather than a custom component according to our validation rules
Original prompt

Problem Statement

The can_be_custom_component function in web-markdown/src/render.rs at line 199 needs to be fixed to properly validate custom component tag names according to the specification.

Requirements

1. Fix can_be_custom_component function

The function should return true only if the tag corresponds to a custom component:

  • Starts with <
  • Tag name consists of A-Za-z characters separated by - (dashes)
  • NOT a custom component if it starts with a lowercase letter and does NOT have any dash (to avoid treating standard HTML tags like <div>, <p>, <span>, etc. as custom components)
  • Ends with >

Examples:

  • <My-Component> → true (uppercase start, has dash)
  • <my-component> → true (lowercase start, but has dash)
  • <MyComponent> → true (uppercase start, no dash needed)
  • <div> → false (lowercase start, no dash)
  • <span> → false (lowercase start, no dash)
  • <p> → false (lowercase start, no dash)

The current implementation has a TODO comment at lines 197-198 that needs to be addressed.
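The naming rule itself only needs the first character and a search for a dash. A minimal sketch follows; `is_custom_component_name` is a hypothetical helper that takes the bare tag name (no `<`, `/`, `>` or attributes), and it accepts digits after the first character to match the name alternation in the WITH_ATTRS_RE excerpt quoted in the review, even though the prompt only mentions A-Za-z.

```rust
/// Returns true if `name` follows the custom component naming rule.
fn is_custom_component_name(name: &str) -> bool {
    // Only ASCII letters, digits and dashes may appear in the name.
    let valid_chars = name.chars().all(|c| c.is_ascii_alphanumeric() || c == '-');
    match name.chars().next() {
        // Uppercase start: always a component, no dash needed.
        Some(c) if c.is_ascii_uppercase() => valid_chars,
        // Lowercase start: needs a dash, otherwise it is standard HTML (div, p, span).
        Some(c) if c.is_ascii_lowercase() => valid_chars && name.contains('-'),
        // Empty name, or a digit or dash in first position.
        _ => false,
    }
}
```

Checking the first character before looking for a dash is what lets MyComponent through while still rejecting div.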

2. Add documentation about custom components

Find the places in the repository where documentation about custom components should be added or improved, including:

  • README files
  • Documentation comments in code
  • Example files

Based on the codebase, custom components are non-standard HTML tags used in markdown that follow the naming convention above. They allow users to create reusable components with custom rendering logic.

3. Run formatting and tests

  • Run cargo fmt to format the code
  • Update examples if needed
  • Ensure examples run correctly

Files to modify

  • web-markdown/src/render.rs - Fix the can_be_custom_component function
  • web-markdown/src/component.rs - Add documentation about what constitutes a valid custom component
  • README.md - Improve documentation about custom component naming rules
  • Any framework-specific README files (yew-markdown, dioxus-markdown, leptos-markdown)
  • Run cargo fmt on the entire codebase

Technical Notes

The function should properly parse the tag name and validate it against the rules. Consider edge cases like:

  • Self-closing tags <My-Component/>
  • Tags with attributes <My-Component attr="value">
  • Closing tags </My-Component>

This pull request was created from Copilot chat.



Copilot AI and others added 2 commits February 9, 2026 15:10
Co-authored-by: rambip <62420525+rambip@users.noreply.github.com>
Co-authored-by: rambip <62420525+rambip@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix validation in can_be_custom_component function" to "Fix custom component tag validation to distinguish from standard HTML" Feb 9, 2026
Copilot AI requested a review from rambip February 9, 2026 15:20
@rambip
Owner

rambip commented Feb 9, 2026

@copilot Why is the test failing? Is this due to your change?

…ct invalid HTML

Co-authored-by: rambip <62420525+rambip@users.noreply.github.com>
Contributor Author

Copilot AI commented Feb 9, 2026

@copilot Why is the test failing? Is this due to your change?

The test failures were mostly pre-existing issues. My changes actually fixed 2 tests and improved the pass rate from 7/12 to 9/12.

The issue was that InlineHtml events weren't being validated with can_be_custom_component, so invalid HTML like <Y and Y> was being treated as a custom component. Fixed in commit b152554.

The 3 remaining failures (custom_line_custom, tag_line_tag, parameter_range) existed before my changes and are unrelated to this PR - they fail due to how pulldown-cmark parses multiple HTML tags on separate lines.
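The fix for the inline case comes down to running the same check on both HTML event kinds. A minimal sketch, assuming simplified stand-in `Event` and `Route` types (hypothetical; render.rs works with the events produced by pulldown-cmark, and its exact fallback for rejected tags is not described here), with the validator passed in as a predicate:

```rust
/// Hypothetical stand-ins for the parser events named in the PR.
#[derive(Debug)]
enum Event<'a> {
    HtmlBlock(&'a str),
    InlineHtml(&'a str),
    Text(&'a str),
}

#[derive(Debug, PartialEq)]
enum Route<'a> {
    /// Raw tag handed to the custom-component machinery.
    Component(&'a str),
    /// Everything else goes down the ordinary, non-component path.
    Plain(&'a str),
}

/// Apply the same predicate to block-level and inline HTML. If InlineHtml
/// skipped the check, text such as "<Y and Y>" could reach `Component`.
fn route<'a>(event: Event<'a>, is_custom: impl Fn(&str) -> bool) -> Route<'a> {
    match event {
        // Assumption: surrounding whitespace and newlines are ignored.
        Event::HtmlBlock(raw) | Event::InlineHtml(raw) if is_custom(raw.trim()) => {
            Route::Component(raw)
        }
        Event::HtmlBlock(raw) | Event::InlineHtml(raw) | Event::Text(raw) => Route::Plain(raw),
    }
}
```

Passing the predicate in keeps the routing independent of how the check is implemented, whether by hand or with the regex version.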

@rambip
Owner

rambip commented Feb 9, 2026

@copilot use regex for the function

Co-authored-by: rambip <62420525+rambip@users.noreply.github.com>
Contributor Author

Copilot AI commented Feb 9, 2026

@copilot use regex for the function

Refactored to use regex in commit 0699ee2.

The function now uses three regex patterns:

  • SIMPLE_TAG_RE: Matches <MyComponent>, </my-component>
  • SELF_CLOSING_RE: Matches <MyComponent/>
  • WITH_ATTRS_RE: Matches <MyComponent attr="value"/> (requires = to reject invalid HTML like <Y and Y>)

All tests still pass (9/12), maintaining the same validation logic but with cleaner, more maintainable code.

Owner

rambip left a comment


Fix this please:

/// - Self-closing tags: `<My-Component/>`
/// - Tags with attributes: `<My-Component attr="value">`
/// - Closing tags: `</My-Component>`
fn can_be_custom_component(raw_html: &str) -> bool {
Owner


It would be a lot better to use a regex.

// After the tag name, we must have whitespace followed by content that contains '='
// This rejects things like "<Y and Y>" where there's no '='
static ref WITH_ATTRS_RE: regex::Regex = regex::Regex::new(
r"^</?([A-Z][A-Za-z0-9-]*|[a-z][A-Za-z0-9]*-[A-Za-z0-9-]*)\s+[^<>]*=[^<>]*/?\s*>$"
Owner


@copilot You don't need to check for "<" or ">" in between if the regex is non-greedy (smallest possible match). In addition, "<" or ">" can appear in the element attributes.

}

// Check for < or > in the middle
let middle = &s[1..s.len() - 1];
Owner


@copilot useless

///
/// ### Valid Custom Component Names
///
/// A tag name is considered a custom component if it meets ONE of these criteria:
Owner


Suggested change
/// A tag name is considered a custom component if it meets ONE of these criteria:
/// A tag name is considered a custom component if it meets exactly one of these criteria:

/// - Examples: `<my-component>`, `<data-table>`, `<custom-counter>`
/// - The dash distinguishes these from standard HTML tags
///
/// ### Why These Rules?
Owner


This can be summed up in one sentence, and you already gave examples of the tag contents.

Owner

rambip left a comment


Fix this please:

Co-authored-by: rambip <62420525+rambip@users.noreply.github.com>
Contributor Author

Copilot AI commented Feb 9, 2026

Fix this please:

Fixed in commit 472639e. Changes made:

  1. Removed redundant < and > validation (regex handles this)
  2. Updated WITH_ATTRS_RE to use non-greedy matching (.*?) to allow < and > in attributes
  3. Fixed typo: "ONE" → "exactly one" in component.rs
  4. Simplified documentation by consolidating verbose sections

All tests still pass (9/12).
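The practical effect of the WITH_ATTRS_RE change can be illustrated without the regex crate by comparing only the part after the tag name. In this sketch, `attrs_match_old` mirrors the `\s+[^<>]*=[^<>]*/?\s*>$` tail from the review excerpt, and `attrs_match_new` mirrors a `\s+.*?=.*?/?\s*>$` tail, which is an assumption based on the description of commit 472639e; both function names are hypothetical.

```rust
/// `rest` is everything after the tag name, including the final '>'.
/// Mirrors the original tail: `\s+[^<>]*=[^<>]*/?\s*>$`.
fn attrs_match_old(rest: &str) -> bool {
    match rest.strip_suffix('>') {
        Some(body) => {
            body.starts_with(char::is_whitespace)
                && body.contains('=')
                // [^<>] forbids '<' and '>' anywhere before the final '>'.
                && !body.contains(|c: char| c == '<' || c == '>')
        }
        None => false,
    }
}

/// Mirrors the assumed new tail: `\s+.*?=.*?/?\s*>$`.
/// Simplification: ignores that `.` does not match '\n' by default.
fn attrs_match_new(rest: &str) -> bool {
    match rest.strip_suffix('>') {
        Some(body) => body.starts_with(char::is_whitespace) && body.contains('='),
        None => false,
    }
}
```

Because the whole pattern is anchored with `^` and `$`, making the quantifiers lazy does not by itself change which strings match; what lets an attribute such as title="a > b" through is replacing `[^<>]` with `.`, while the required `=` still rejects <Y and Y>.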

/// A valid custom component tag must:
/// - Start with '<'
/// - End with '>'
/// - Not have any '<' or '>' in between (basic validity check)
Owner


This is not true anymore?

Contributor Author


Correct. Removed the outdated documentation about < and > validation in commit 7157b3a. The regex now handles this implicitly.

///
/// ### Valid Custom Component Names
///
/// A tag name is considered a custom component if it meets exactly one of these criteria:
Owner


"If it meets any of these criteria" would be simpler to maintain, I think. Is that what the implementation is doing?

Contributor Author


Changed to "any" in commit 7157b3a. You're right - a tag can only match one criterion at a time, so "any" is clearer and more accurate than "exactly one".

/// - Examples: `<MyComponent>`, `<Counter>`, `<DataTable>`
/// - No dash required for uppercase names
///
/// 2. **Starts with a lowercase letter (a-z) AND contains at least one dash (-)**
Owner


Suggested change
/// 2. **Starts with a lowercase letter (a-z) AND contains at least one dash (-)**
/// 2. **Starts with a lowercase letter (a-z) and contains at least one dash (-)**

Contributor Author


Applied in commit 7157b3a.

…rity

Co-authored-by: rambip <62420525+rambip@users.noreply.github.com>
Copilot AI requested a review from rambip February 9, 2026 18:49
@rambip
Owner

rambip commented Feb 9, 2026

@copilot you need to rename the box in the examples to custom-box

Co-authored-by: rambip <62420525+rambip@users.noreply.github.com>
Contributor Author

Copilot AI commented Feb 9, 2026

@copilot you need to rename the box in the examples to custom-box

Renamed in commit 4e01130. Changed <box> to <custom-box> in all three example files (yew, dioxus, leptos) and updated the component registrations accordingly. This is necessary because box is lowercase without a dash, so it would be treated as standard HTML rather than a custom component.
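Registration and validation are separate steps, so a component registered under a name like box would never be reached. A hypothetical check (not part of yew-markdown, dioxus-markdown or leptos-markdown) could flag such names ahead of time; it restates the naming rule so the block stands alone.

```rust
/// Name rule: uppercase start, or lowercase start with at least one dash;
/// only ASCII letters, digits and dashes are allowed.
fn is_custom_component_name(name: &str) -> bool {
    let valid_chars = name.chars().all(|c| c.is_ascii_alphanumeric() || c == '-');
    match name.chars().next() {
        Some(c) if c.is_ascii_uppercase() => valid_chars,
        Some(c) if c.is_ascii_lowercase() => valid_chars && name.contains('-'),
        _ => false,
    }
}

/// Hypothetical helper: return the registered names that the validator would
/// treat as standard HTML, so their components could never be rendered.
fn unreachable_names<'a>(registered: &[&'a str]) -> Vec<&'a str> {
    registered
        .iter()
        .copied()
        .filter(|name| !is_custom_component_name(name))
        .collect()
}
```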

rambip marked this pull request as ready for review February 9, 2026 20:26
rambip merged commit 4342fcf into main Feb 9, 2026

2 participants