Add comprehensive documentation for subroutine calls and recursion level backreferences #1

Draft
Copilot wants to merge 4 commits into master from copilot/add-markdown-documentation

Conversation

Copilot AI commented Nov 8, 2025

Documents \g<...> subroutine calls and \k<name±level> recursion level backreferences for developers without a C/C++ background who are implementing these features in other regex engines.

Changes

Added doc/SUBROUTINE_CALLS.md (912 lines)

  • Differentiates backreferences (match captured text) vs. subroutine calls (re-execute pattern)
  • Explains recursion levels relative to the backreference position: \k<name+0> (same level), \k<name+1> (one call level deeper), \k<name-1> (one call level shallower)
  • Documents 10 edge cases, including the left-recursion prohibition, depth limits, duplicate-name restrictions, and capture scope in quantifiers
  • Provides Rust implementation guidance: call stack structures, capture-by-level indexing, memoization strategies
  • Includes 12 real-world examples, such as balanced parentheses, palindromes, nested XML tags, and expressions
  • Adds 4 Mermaid diagrams visualizing execution flow and recursion trees
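
To make the "capture-by-level indexing" item concrete, here is a minimal Rust sketch. It is not the documented implementation. It assumes a simplified matcher log with a hypothetical `Event` type and a hypothetical `capture_at_level` lookup. Walking the log backwards, a `Return` steps into a finished nested call and a `Call` steps out to the caller. The sign convention (positive means deeper) is an assumption of this sketch and should be checked against the engine's tests.

```rust
/// One entry in a simplified matcher log (all names here are hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum Event {
    /// A subroutine call was entered.
    Call,
    /// A subroutine call finished; its events sit earlier in the log.
    Return,
    /// A capture group closed, covering `start..end` of the input.
    Capture { group: usize, start: usize, end: usize },
}

/// Find the most recent capture of `group` recorded at `rel_level`
/// relative to the current position (0 = same level).
/// Assumption: positive levels are deeper calls, negative levels are callers.
fn capture_at_level(log: &[Event], group: usize, rel_level: i32) -> Option<(usize, usize)> {
    let mut level = 0i32;
    for event in log.iter().rev() {
        match event {
            // Moving backwards past a call entry leaves that call.
            Event::Call => level -= 1,
            // Moving backwards past a return enters the finished call.
            Event::Return => level += 1,
            Event::Capture { group: g, start, end } if *g == group && level == rel_level => {
                return Some((*start, *end));
            }
            Event::Capture { .. } => {}
        }
    }
    None
}
```

With a log of `Capture, Call, Capture, Return`, a lookup at level 0 skips the capture made inside the finished call and returns the outer one, while level 1 returns the inner one.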

Updated README.md

  • Links to new advanced features documentation

Key Examples

Palindrome with same-level backreference:

\A(?<p>|.|(?<c>.)\g<p>\k<c+0>)\z
# \k<c+0> references capture from THIS recursion level only
# Matches: "racecar", "aba"
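
As a rough illustration of what "this recursion level only" means, the pattern can be mirrored by a hand-written backtracking matcher. This is a sketch under assumptions, not engine code: `match_p` and `is_palindrome` are hypothetical names, and the local variable `c` stands in for the capture that `\k<c+0>` reads, so each call sees only its own value.

```rust
/// Try group `p` at `pos`; on success hand the end position to the
/// continuation `k`, which represents "the rest of the pattern".
fn match_p(s: &[char], pos: usize, k: &mut dyn FnMut(usize) -> bool) -> bool {
    // Alternative 1: the empty string.
    if k(pos) {
        return true;
    }
    // Alternative 2: any single character.
    if pos < s.len() && k(pos + 1) {
        return true;
    }
    // Alternative 3: (?<c>.) \g<p> \k<c+0>
    if pos < s.len() {
        let c = s[pos]; // capture owned by this call only
        return match_p(s, pos + 1, &mut |end: usize| {
            // \k<c+0>: the next character must equal this level's capture.
            end < s.len() && s[end] == c && k(end + 1)
        });
    }
    false
}

/// Mirrors \A ... \z: the whole input must be consumed.
fn is_palindrome(text: &str) -> bool {
    let s: Vec<char> = text.chars().collect();
    let len = s.len();
    match_p(&s, 0, &mut |end: usize| end == len)
}
```

The continuation style keeps the alternation order and backtracking of the pattern, at the cost of exponential worst-case time, which is acceptable for a short illustration.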

Nested XML tags with cross-level backreference:

(?<etag> </ \k<name+1> > )
# \k<name+1> references the tag name captured one call level deeper than the backreference
# Ensures opening/closing tags match in nested structures
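
The same constraint can be sketched as a small recursive-descent checker in Rust. This is a hand-written analogue under assumptions, not the pattern's actual execution: `tag_name`, `element`, and `is_well_nested` are hypothetical helpers, tags carry no attributes, and names are ASCII letters, `_`, or `:`. The name found by the start-tag helper is kept by the enclosing `element` call and reused for the end tag, so every nesting level checks against its own name.

```rust
/// Match a tag name at `pos` (must be <= s.len()); returns the end position.
fn tag_name(s: &[u8], pos: usize) -> Option<usize> {
    let len = s[pos..]
        .iter()
        .take_while(|b| b.is_ascii_alphabetic() || **b == b'_' || **b == b':')
        .count();
    if len > 0 { Some(pos + len) } else { None }
}

/// Match one element at `pos`; returns the position after its end tag.
fn element(s: &[u8], pos: usize) -> Option<usize> {
    // Start tag: '<' name '>'
    if s.get(pos) != Some(&b'<') {
        return None;
    }
    let name_start = pos + 1;
    let name_end = tag_name(s, name_start)?;
    if s.get(name_end) != Some(&b'>') {
        return None;
    }
    let name = &s[name_start..name_end];
    let mut cur = name_end + 1;
    // Content: plain text or nested elements, until an end tag begins.
    while cur < s.len() && !s[cur..].starts_with(b"</") {
        cur = if s[cur] == b'<' { element(s, cur)? } else { cur + 1 };
    }
    // End tag: must repeat the name seen by this element's start tag.
    let rest = s.get(cur..)?.strip_prefix(b"</")?.strip_prefix(name)?;
    if rest.first() == Some(&b'>') {
        Some(s.len() - rest.len() + 1)
    } else {
        None
    }
}

/// True when the whole input is one properly nested element.
fn is_well_nested(text: &str) -> bool {
    element(text.as_bytes(), 0) == Some(text.len())
}
```

Crossed tags such as `<a><b>x</a></b>` are rejected because the inner call compares `</a>` against its own name `b`.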

Target Audience

Developers implementing regex engines in languages like Rust who need to understand capture behavior at recursion boundaries and edge cases around call stack management.

Original prompt

Please analyse the code and tests and create some documentation in Markdown format, explaining how subroutine calls and relative recursion level backreferences work. Give examples, try to use regex patterns which solve real problems, include mermaid diagrams if you judge them to be useful. Explain how subroutine calls affect capture groups, both in terms of backrefs during matching and accessing captures by group name or number programmatically after the match was successful. This document should be aimed at helping a developer with no C/C++ experience to understand main behavior and edge cases so they could build similar behavior into a different regex engine in another language like Rust.


Copilot AI and others added 3 commits November 8, 2025 17:57
Co-authored-by: keith-hall <11882719+keith-hall@users.noreply.github.com>
…vel backreferences

Co-authored-by: keith-hall <11882719+keith-hall@users.noreply.github.com>
Co-authored-by: keith-hall <11882719+keith-hall@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add Markdown documentation for subroutine calls and recursion" to "Add comprehensive documentation for subroutine calls and recursion level backreferences" on Nov 8, 2025
Copilot AI requested a review from keith-hall November 8, 2025 18:10

keith-hall left a comment

Looks good, just a couple of places I could use a bit more clarification on please. See individual comments for details.

Comment thread doc/SUBROUTINE_CALLS.md
Behavior:
- **Backreferences**: Try groups in reverse order (most recent definition first)

Bearing in mind that a pattern could contain a backreference to several groups with the same name, some defined to the right of the backreference and some to the left, and that the backreference could sit inside a different group which is called as a recursive subexpression, what does "most recent definition" refer to here?

Comment thread doc/SUBROUTINE_CALLS.md
\n # Backreference group n (n >= 1)
\k<n> # Alternative syntax for group n
\k<-n> # Nth group counting backwards
\k<+n> # Nth group counting forwards

Can you give some practical examples of these Nth groups counting forwards/backwards, please? Does it differ when the \k<...> is inside a capture group, etc.?
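
One possible reading of the relative forms, offered as a sketch rather than as the engine's confirmed behavior: assume groups are numbered by their opening parenthesis, and that a relative reference is resolved at parse time from the number of groups already opened to its left. The function name `resolve_relative` and its parameters are hypothetical.

```rust
/// Resolve a relative reference such as \k<-1> or \k<+1> to an absolute
/// group number. `groups_opened` is the count of capture groups whose
/// opening parenthesis appears before the reference (an assumption of
/// this sketch). Returns None for an offset of 0 or one that reaches
/// before group 1.
fn resolve_relative(groups_opened: usize, offset: i32) -> Option<usize> {
    match offset {
        0 => None,
        n if n < 0 => {
            let back = n.unsigned_abs() as usize;
            // -1 is the most recently opened group, -2 the one before it.
            (groups_opened + 1).checked_sub(back).filter(|&g| g >= 1)
        }
        // +1 is the next group to be opened; the caller still has to
        // check that such a group exists in the pattern.
        n => Some(groups_opened + n as usize),
    }
}
```

Under this assumption, `\k<-1>` in `(a)(b)\k<-1>` resolves to group 2. In `(a)(b\k<-1>)` it also resolves to group 2, because the enclosing group has already been opened, so the reference points at the group it sits in.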

Comment thread doc/SUBROUTINE_CALLS.md

#### Example 5: Multiple Named Groups

When multiple groups share the same name, backreference tries them in reverse order (most recent first):

Can you give some more examples which involve backtracking, perhaps backtracking inside one of those groups, as well as cases where a less "recent" group needs to be matched as the backref for the whole thing to match overall?
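
A sketch of how the "reverse order" rule could be implemented, assuming "most recent" means the highest group number among the groups sharing the name and that unset groups are skipped. The function `match_named_backref`, its parameters, and the capture table layout are hypothetical.

```rust
/// Try a `\k<name>` backreference at `pos` when several groups share the name.
/// `candidates` lists the group numbers with that name in ascending order;
/// `captures[g]` is the byte span captured by group `g`, or None if unset.
/// Returns the group that matched and the position after the match.
fn match_named_backref(
    input: &str,
    pos: usize,
    candidates: &[usize],
    captures: &[Option<(usize, usize)>],
) -> Option<(usize, usize)> {
    // Highest-numbered group first; fall back to lower numbers on failure.
    candidates.iter().rev().find_map(|&g| {
        let (start, end) = captures.get(g).copied().flatten()?;
        let text = &input[start..end];
        input[pos..]
            .starts_with(text)
            .then(|| (g, pos + text.len()))
    })
}
```

For example, with groups 1 and 2 both named `x` holding `a` and `b`, a backreference at the final `a` of `aba` fails against group 2 and succeeds against group 1, which is the "less recent group" case asked about above.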
