reference_parenthesis_operators.md

Regular Expression Reference

PCRE compared to other flavors

Parenthesis operators

(?#...) - comment

echo "abc" | perl -pe 's/a(?#this is just a comment)/A/;'  # output: Abc

(?:...) - non-capturing group (turn off backreferences)
groups without creating a backreference for this parenthesis (slightly faster)
helpful when you need grouping, quantifying or alternating but do not need the captured text

echo "abc" | perl -pe 's/(?:ab|c)/CASA/g;'	 # output: CASACASA

(?modifier-modifier) - turn on/off modifier
i, x, s, m are allowed

echo 'abc' | perl -ne 'if(/(?i)A(?-i)bc/){print $_;}'    # output: abc

in the previous example the "ignore case" modifier is turned on and then off again inside the regex


(?modifier-modifier:....) - modifier span
works like the on/off modifier explained above, but its influence is limited to the non-capturing parenthesis

echo "the Dog" | perl -ne 'print if/(?s-i:the).*dog/i'   # output: the Dog

in the example above the s modifier is temporarily turned on inside the parenthesis and the i modifier is turned off there, so "the" must be lowercase while "dog" is still matched case-insensitively


(?=...) - positive lookahead
match but do not consume

echo "My Name" | perl -pe 's/My (?=Name)/Your/g;'          # output: YourName
echo "Frau Guenda" | perl -pe 's/(?=Frau Gue)Frau/Herr/g;' # output: Herr Guenda
echo "the name" | perl -pe 's/(?=name)/best /g;'         # output: the best name

Notice how in the last example nothing was removed. The lookahead operator is useful there to find the right position for the substitution.


(?!...) - negative lookahead
negative match but do not consume

echo "Max Mad" | perl -pe 's/Max(?! Good)/Marc/g;'     # output: Marc Mad

(?<=...) - positive lookbehind
match but do not consume
only fixed-length patterns are supported (Perl 5.30 and later experimentally accept variable-length lookbehind of bounded length)

echo "ab cd ab ef" | perl -pe 's/(?<=cd )ab/X/g;'           # output: ab cd X ef
echo "ab cd ab ef" | perl -pe 's/ab(?<=cd ab)/X/g;'         # output: ab cd X ef
echo "Jeffs" | perl -pe 's/(?<=\bJeff)(?=s\b)/`/;'          # output: Jeff`s
echo "Jeffs" | perl -pe 's/(?=s\b)(?<=\bJeff)/`/;'          # output: Jeff`s
echo "12345678"|perl -pe 's/(?<=\d)(?=(?:\d{3})+$)/./g;'    # output: 12.345.678
echo "ab cd ab ef" | perl -pe 's/ab(?<=cd )/X/;'            # output: ab cd ab ef
echo "ab cd ab ef" | perl -pe 's/(?<=cd +)ab/X/g;'          # error: Perl does not support this variable-length lookbehind
# Variable length lookbehind not implemented in regex m/(?<=cd +)ab/ at -e line 1.

(?<!...) - negative lookbehind
negative match but do not consume
only fixed-length patterns are supported

echo "abac" | perl -pe 's/(?<!b)a/X/g;'    # output: Xbac
echo "abac" | perl -pe 's/(?<!^)a/X/g;'    # output: abXc
echo "abac" | perl -pe 's/(?<!c)a/X/g;'    # output: XbXc

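(?<name>...) - named capture
giving names to capture groups makes a regex easier to read
(?<name>...) is the .NET syntax and (?P<name>...) the Python syntax; Perl 5.10 and later accepts both, but exposes the captured text as $+{name}, not as a variable called $name

echo 'abc' | perl -ne 'if(/a(?<uno>.)c/){print $+{uno};}'  # .NET syntax, output: b
echo 'abc' | perl -ne 'if(/a(?P<uno>.)c/){print $+{uno};}' # Python syntax, output: b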

(?{...}) - insert perl code
only supported by the perl engine
interesting for debugging and learning purposes
avoid using it in CGI scripts for security reasons
WARNING: This extended regular expression feature is considered highly experimental, and may be changed or deleted without notice.

echo "mamma" | perl -pe 's/.(?{print "debug:-$&-\n";})/-$&/g;'
# output: debug:-m-
# output: debug:-a-
# output: debug:-m-
# output: debug:-m-
# output: debug:-a-
# output: -m-a-m-m-a

echo "mamma" | perl -ple 's/.(?{print "debug:-$`-$&-" . pos($_) . "-";})/X$&/g;'
# output: debug:--m-1-
# output: debug:-m-a-2-
# output: debug:-ma-m-3-
# output: debug:-mam-m-4-
# output: debug:-mamm-a-5-
# output: XmXaXmXmXa

(??{...}) - dynamic regex
the perl code is run during matching and its return value is used as a regex at that position

(?if then|else) - conditional
not always supported

```perl
echo "abc" | perl -pe 's/(?(?<=a)b|c)/X/;'    # output: aXc
echo "abc" | perl -pe 's/(?(?<=a).c|b)/X/;'   # output: aX
```

(?>...) - atomic grouping
does not allow backtracking into the group once it has matched (failing matches can fail faster)
supported by: perl, java, .net, ruby
the group does not capture (no backreference is created)

perl -le 'print $& if "abcd" =~ /(?>\w+)/'    # output: abcd
perl -le 'print $& if "abcd" =~ /(?>\w+)d/'   # output: 

The example above uses atomic grouping.
The \w+ matches every letter in abcd up to the end of the string.
Since backtracking into the group is not allowed, nothing is left for the d and the regex fails.
Atomic grouping does not make a successful match faster, but a match that has to fail fails faster.
By disallowing backtracking you can also make a regex fail that would normally match (as in the last example).