cpan-authors · toddr-bot · Apr 12, 2026
diff --git a/lib/Regexp/Parser.pm b/lib/Regexp/Parser.pm
@@ -474,6 +474,17 @@ value:
 
   my $capture_2 = $parser->captures(2);
 
+=head2 Getting Named Captures
+
+You can access the named capture groups with the named_captures() method:
+
+  my $all_named = $parser->named_captures();
+
+This returns a hash reference mapping capture names to their node objects.
+To look up a specific named capture:
+
+  my $node = $parser->named_captures('year');
+
 =head2 Walking the Tree
 
 To walk over the created tree, create an iterator with walker()Z<>:
@@ -708,251 +719,115 @@ Invalid [] range "%s-%s"
 
 =back
 
-=head1 EXTENSIONS
-
-Here are some ideas for extensions (sub-classes) for this module.  Some
-of them may be absorbed into the core functionality of F<Regexp::Parser>
-in the future.  Module names are merely the author's suggestions.
-
-=over 4
-
-=item Regexp::WordBounds
-
-Adds handlers for C<< < >> and C<< > >> anchors, which match at the
-beginning and end of a "word", respectively.  C<< /</ >> is equivalent to
-C</(?!\w)(?=\w)/>, and C<< />/ >> is equivalent to C</(?<=\w)(?!\w)/>. (So
-that's the object's qr() method for you right there!)
-
-=item Regexp::MinLength
-
-Implements a min_length() method for all objects that determines the
-minimum length of a string that would be matched by the regex; provides
-a front-end method for the parser.
-
-=item Regexp::QuantAttr
-
-Removes quantifiers as objects, and makes 'min' and 'max' attributes of
-other objects themselves.
-
-=item Regexp::Explain (pending, Jeff Pinyan)
-
-Produces a human-readable explanation of the execution of a regex.  Will
-be able to produce HTML output that color-codes the elements of the regex
-according to a style-sheet (syntax highlighting).
-
-=item Regexp::Reverse (difficulty rating: ****)
-
-Reverses a regex so it matches backwards.  Ex.: C</\s+$/> becomes
-C</^\n?\s+/>, which perhaps gets optimized to C</^\s+/>.  The difficulty
-rating is so high because of cases like C</(\d+)(\w+)/> which, when
-reversed, I<can> match differently.
-
-  "100years" =~ /(\d+)(\w+)/;  # $1 = 100, $2 = years
-  "sraey001" =~ /(\w+)(\d+)/;  # $1 = sraey00, $2 = 1
-
-This means character classes should store a hash of what characters
-they represent, as well as the macros C<\w>, C<\d>, etc.  Then this
-example would be reversed into something like C</(\w+(?<!\d))(\d+)/>.
-The other difficulty is complex regexes with if-then assertions.  I
-don't want to think about that.  This module is more of a theoretical
-exercise, a jump-start to built-in reversing capability in Perl.
-
-=item Regexp::CharClassOps
-
-Implements character class operations like union, intersection, and
-subtraction.
-
-=item Regexp::Optimize
-
-Eliminates redundancy from a regex.  It should have various options,
-such as whether to do optimize...
-
-  # strings
-  /foo|father|fort/  => /f(?:o(?:o|rt)|ather)/
-
-  # char classes
-  /[\w\d][a-zaeiou]/ => /[\w][a-z]/
-
-  # redundancy
-  /^\n?\s+/          => /^\s+/
-  /[\w]/             => /\w/
-
-There are other possibilities as well.
-
-=back
-
-=head1 HISTORY
-
-=head2 0.022b -- July 6, 2004
-
-=over 4
-
-=item Hierarchy Changes
-
-There are now abstract classes I<anchor> and I<assertion>. You can't call
-their new() method directly, you can only call it through an object that
-inherits from that class.
-
-There are no longer I<star>, I<plus>, and I<curly> classes; they have been
-combined into one class, I<quantifier>.  You pass it the min and max,
-and the object's C<type> is determined dynamically.
-
-=item Character Class Hashes
-
-Character classes (I<anyof> objects) now have another attribute, C<charmap>,
-which is a hash reference holding character values (eg. 65 for 'A') and
-the number of times that character appeared in the character class.  The
-character class C<[A-CB-E]> would have a character map of C<< { 65 => 1, 66
-=> 2, 67 => 2, 68 => 1, 69 => 1} >>.  This will reflect ranges and embedded
-classes (such as C<[:cntrl:]> or C<\p{Print}>.
-
-=item Character Class Rendering
-
-The visual() method of I<anyof> objects will quell the repetition of any
-character in the class I<outside> of embedded classes, so the class
-C<[\w\d:4-65:]> will render as C<[\w\d:4-6]>.  If you want to prevent
-characters and ranges from being display if they are included in an embedded
-class, set the I<anyof> object's C<strict> attribute to 1; the character
-class would render as C<[\w\d:]>.  If you want to go even further and remove
-any embedded class that is I<entirely> redundant (that is, I<every>
-character in that embedded class is already found in the class), set the
-C<strict> attribute to 2; the class above would render as C<[\w:]>.
-
-=back
+=head1 SUPPORTED CONSTRUCTS
 
-=head2 0.021 -- July 3, 2004
+This module supports parsing the following Perl regex constructs:
 
 =over 4
 
-=item I<anyof_class> Changed
-
-If an I<anyof_class> element is a Unicode property or a Perl class (like
-C<\w> or C<\S>), the object's C<data> field points to the underlying
-object type (I<prop>, I<alnum>, etc.).  If the element is a POSIX class,
-the C<data> field is the string "POSIX".  POSIX classes don't exist in a
-regex outside of a character class, so I'm a little wary of making them
-objects in their own right, even if it would create a better sense of
-uniformity.
-
-=item Documentation
-
-Fixed some poor wording, and documented the problem with using F<SUPER::>
-inside F<MyClass::__object__>.
-
-=item Bug Fixes
-
-Character classes weren't closing properly in the tree.  Fixed.
-
-Standard escapes (C<\a>, C<\e>, etc.) were being returned as I<exact>
-nodes instead of I<anyof_char> nodes when inside character classes.  Fixed.
-(Mike Lambert)
+=item Grouping
 
-Non-grouping parentheses weren't being parsed properly.  Fixed.  (Mike
-Lambert)
+C<(...)>, C<(?:...)>, C<< (?<name>...) >>,
+C<(?|...)> (branch reset), C<< (?>...) >> (atomic).
+Also supports the Python/PCRE-style C<(?P=name)> (backreference) and C<< (?P>name) >> (recursion) forms.
 
-Flags weren't being turned off.  Fixed.
+=item Quantifiers
 
-=back
+C<*>, C<+>, C<?>, C<{n}>, C<{n,}>, C<{n,m}> -- with greedy (default),
+lazy (C<?> suffix), and possessive (C<+> suffix) variants.
 
-=head2 0.02 -- July 1, 2004
+=item Assertions
 
-=over 4
+C<^>, C<$>, C<\b>, C<\B>, C<\A>, C<\Z>, C<\z>, C<\G>,
+C<\K> (keep), C<\b{type}> (extended boundaries)
 
-=item Better Abstracting
+=item Lookaround
 
-The object() method calls force_object().  force_object() creates an
-object no matter what pass the parser is making; object() will return
-immediately if it's just the first pass.  This means that force_object()
-should be used to create stand-alone objects.
+C<(?=...)>, C<(?!...)>, C<(?<=...)>, C<(?<!...)>,
+and alphabetic forms C<(*pla:...)>, C<(*nla:...)>, C<(*plb:...)>,
+C<(*nlb:...)>
 
-Each object now has an insert() method that defines how it gets placed
-into the regex tree.  Most objects inherit theirs from the base object
-class.
+=item Character classes
 
-The walker() method is also now abstracted -- each node it comes across
-will have its walk() method called.  And the ending node for stack-type
-nodes has been abstracted to the ender() method of the node.
+C<[...]>, C<[^...]>, POSIX classes C<[[:alpha:]]>,
+C<\d>, C<\D>, C<\w>, C<\W>, C<\s>, C<\S>, C<\h>, C<\H>, C<\v>, C<\V>,
+C<\R> (linebreak), C<\N> (non-newline), C<\X> (extended grapheme cluster),
+C<.> (any)
 
-The init() method has been moved to another file to help keep I<this>
-file as abstract as possible.  F<Regexp::Parser> installs its handlers
-in F<Regexp/Parser/Handlers.pm>.  That file might end up being where
-documentation on writing handlers goes.
+=item Unicode properties
 
-The documentation on sub-classing includes an ordered list of what
-packages a method is looked up in for a given object of type 'OBJ':
-F<YourMod::OBJ>, F<YourMod::__object__>, F<Regexp::Parser::OBJ>,
-F<Regexp::Parser::__object__>.
+C<\p{Name}>, C<\P{Name}>, C<\p{Script=Latin}>, etc.
 
-=item Cleaner Grammar Flow
+=item Escape sequences
 
-Now the only places 'atom' gets pushed to the queue are after an opening
-parenthesis or after 'atom' matches.  This makes things flow more
-cleanly.
+C<\a>, C<\e>, C<\f>, C<\n>, C<\r>, C<\t>,
+C<\xHH>, C<\x{HHHH}>, C<\NNN> (octal), C<\o{NNN}>,
+C<\cX> (control), C<\N{NAME}>, C<\N{U+HHHH}>
 
-=item Flag Handlers
+=item Backreferences
 
-Flag handlers now receive an additional argument that says whether
-they're being turned on or off.  Also, if the flag handler returns 0,
-that flag is removed from the resulting object's visual flag set.  That
-means C<(?gi-o)> becomes C<(?i)>.
+C<\1>..C<\9>, C<\g{N}>, C<\g{-N}>, C<\g{+N}>,
+C<< \k<name> >>, C<\k'name'>, C<\k{name}>,
+C<(?P=name)>
 
-=item Diagnostics and Bug Fixes
+=item Flags
 
-More tests added (specifically, making sure C<(?(N)T|F)> works right).
-In doing so, found that the "too many branches" error wasn't being raised
-until the second pass.  Figured out how to improve the grammar to get
-it to work properly.  Also added tests for the new captures() method.
+C<(?imsx...)>, C<(?-imsx...)>, C<(?^...)> (caret reset),
+C<(?a)>, C<(?aa)>, C<(?d)>, C<(?l)>, C<(?u)>, C<(?n)>, C</xx>
 
-I changed the field 'class' to 'family' in objects.  I was getting
-confused by it, so I figured it was a sign that I'd chosen an awful name
-for the field.  There will still be a class() method in F<__object__>,
-but it will throw a "use of class() is deprecated" warning.
+=item Conditionals
 
-Quantifiers of the form C<{n}> were being misrepresented as C<{n,}>.
-It's been corrected.  (Mike Lambert)
+C<(?(N)...|...)>, C<(?(DEFINE)...)>,
+C<< (?(<name>)...|...) >>, C<(?('name')...|...)>
 
-C<\b> was being turned into "b" inside a character class, instead of
-a backspace.  (Mike Lambert)
+=item Backtracking control
 
-Fixed errant "Quantifier unexpected" warning raised by a zero-width
-assertion followed by C<?>, which doesn't warrant the warning.
+C<(*ACCEPT)>, C<(*FAIL)>, C<(*F)>, C<(*MARK:name)>,
+C<(*PRUNE)>, C<(*SKIP)>, C<(*THEN)>, C<(*COMMIT)>
 
-Added "Unrecognized escape" warnings to I<all> escape sequence handlers.
+=item Recursive patterns
 
-The 'g', 'c', and 'o' flags now evoke "Useless ..." warnings when used
-in flag and non-capturing group constructs.
+C<(?R)>, C<(?N)>, C<(?&name)>, C<< (?P>name) >>
 
-=back
+=item Script runs
 
-=head2 0.01 -- June 29, 2004
+C<(*script_run:...)>, C<(*sr:...)>,
+C<(*atomic_script_run:...)>, C<(*asr:...)>
 
-=over 4
+=item Special
 
-=item First Release
-
-Documentation not complete, etc.
+C<(?{code})>, C<(??{code})> (opaque -- code is stored as a string),
+C<(?[...])> (extended character class, opaque),
+C<(?#comment)>,
+C<\Q...\E> (quotemeta)
 
 =back
 
 =head1 CAVEATS
 
 =over 4
 
-=item * Bugs...?
+=item Two-pass parsing
 
-I'd like to say this module doesn't have bugs.  I don't know of any in
-this current version, because I've tried to fix those I've already
-found. Those who find bugs should email me.  Messages should include the
-code you ran that contains the bug, and your opinion on what's wrong
-with it.
+The parser uses a two-pass model: the first pass (via C<regex()>) checks
+structural validity; the second pass (triggered by C<root()>, C<visual()>,
+or C<parse()>) builds the object tree and checks semantics. Some errors
+(such as invalid backreferences) are only detected on the second pass.
 
-=item * Variable interpolation
+=item Variable interpolation
 
 This module parses I<regexes>, not Perl.  If you send a single-quoted
-string as a regex with a variable in it, that '$' will be interpreted as
-an anchor. If you want to include variables, use C<qr//>, or mix single-
+string as a regex with a variable in it, that C<$> will be interpreted as
+an anchor.  If you want to include variables, use C<qr//>, or mix single-
 and double-quoted strings in building your regex.
 
+=item Opaque constructs
+
+Code blocks C<(?{...})> and C<(??{...})> store their content as opaque
+strings.  Extended character classes C<(?[...])> are also stored as
+opaque strings -- their internal set operations are not decomposed into
+structured nodes.
+
 =back
 
 =head1 AUTHOR