reference_chars_soft.md
PCRE compared to other flavors
. - single dot
any single char - it does not match a newline unless the s modifier is set - like ? in bash globbing
echo "a" | perl -ne 'print if/a/;' # output: a
echo "b" | perl -ne 'print if/a/;' # output:
\ - escape chars
turns the special meaning of the next char on or off - inside a class you mostly don't need it (except for ^, -, ] and \ itself)
\b acts as a backspace inside a class; anywhere else \b is a word boundary
echo "C1" | perl -ne 'print if(/C1\n/);' # output: C1
echo "the (cat)" | perl -ne 'print if(/the \(cat\)/);' # output: the (cat)
echo "(" | perl -pe 's/[(]/MAMMA/;' # output: MAMMA
echo "the (cat)" | perl -ne 'print if(/the (cat)/);' # output:
echo "(" | perl -pe 's/(/MAMMA/;' # error: Unmatched ( in regex; marked by <-- HERE in m/( <-- HERE
\a alert, \b backspace, \e escape, \f form feed, \r carriage return, \t horizontal tab, \v vertical tab, \cchar control char (\cg), \number octal escape (\077), \xnumber hex escape (\xFF)
\w - one alphanumeric char
like [a-zA-Z0-9_], possibly plus other unicode letters (depending on your locale)
echo "C1" | perl -ne 'print if(/\w/);' # output: C1
echo '$&' | perl -ne 'print if(/\w/);' # output:
\d - one digit
like [0-9]
echo "123" | perl -ne 'print if(/\d/);' # output: 123
echo "abc" | perl -ne 'print if(/\d/);' # output:
\s - spaces, tabs and newlines
like [ \t\n\r\f] - before perl 5.18 it doesn't match a \v (vertical tab)
echo "1 2" | perl -ne 'print if(/\s/);' # output: 1 2
\b \B - word boundary (or not)
since perl 5.8: full unicode support - matches just a position, not a char - \b means backspace if used in a class
egrep (usually): \< (begin) \> (end)
echo "il cane" | perl -ne 'print if(/il\b/);' # output: il cane
echo "il cane" | perl -ne 'print if(/il\bcane/);' # output:
\C - forced match of a single byte (even inside a multibyte char)
risky with utf-8 because it can split a multibyte char - not supported in lookbehind - perl itself dropped \C in 5.24
^ \A - begin of a line/string
\A ignores the m modifier and always means the start of the string - ^ has a different meaning inside a class.
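The difference between ^ and \A under the m modifier can be sketched as follows. This assumes perl's -0777 slurp switch so that both lines form one string; the variable names caret and start are illustrative.

```shell
# -0777 slurps both lines into a single string: "one\ntwo\n"
caret=$(printf 'one\ntwo\n' | perl -0777 -ne 'print "match" if /^two/m;')
start=$(printf 'one\ntwo\n' | perl -0777 -ne 'print "match" if /\Atwo/m;')
echo "caret with m: [$caret]" # output: caret with m: [match]
echo "A with m: [$start]" # output: A with m: []
```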
$ \Z \z - end of a line/string
\Z ignores the m modifier and matches at the end of the string or just before a final \n - \z matches only at the very end, after the last \n and not before
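With a two-line string the three anchors behave differently. This is a small sketch, again assuming perl's -0777 slurp switch; the variable names are illustrative.

```shell
# -0777 slurps both lines into a single string: "one\ntwo\n"
dollar=$(printf 'one\ntwo\n' | perl -0777 -ne 'print "match" if /one$/m;')
big_z=$(printf 'one\ntwo\n' | perl -0777 -ne 'print "match" if /one\Z/m;')
last_z=$(printf 'one\ntwo\n' | perl -0777 -ne 'print "match" if /two\Z/;')
small_z=$(printf 'one\ntwo\n' | perl -0777 -ne 'print "match" if /two\z/;')
echo "[$dollar] [$big_z] [$last_z] [$small_z]" # output: [match] [] [match] []
```

Only $ follows the m modifier to the end of the first line; \Z still accepts the position before the final newline, while \z does not.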
echo "duc" | perl -ne 'print if(/c$/);' # output: duc
echo "duc" | perl -ne 'print if(/c\Z/);' # output: duc
echo "duc" | perl -ne 'print if(/c\n\Z/);' # output: duc
echo "duc" | perl -ne 'print if(/c\n\z/);' # output: duc
echo "duc" | perl -ne 'print if(/c\z/);' # output:
| - alternation
echo "a" | perl -ne 'print if(/a|b/);' # output: a
echo "d" | perl -ne 'print if(/a|b|c/);' # output:
[ ] - create a class
inside a class only ^, -, ] and \ are metachars
echo "a" | perl -ne 'print if(/[ab]/);' # output: a
echo "A" | perl -ne 'print if(/[ab]/);' # output:
echo "ca" | perl -ne 'print if(/[[a-z]&&[^aeiou]]/);' # class intersection only works in the java regex engine
# to do the same in perl, use a lookahead:
echo "ca" | perl -ne 'print if(/(?![aeiou])[a-z]/);' # output: ca
^ - something not in the class
Meaning of [^x] -> match one char that is not an x (a char must still be there)
^ has a different meaning outside a class (as explained above)
echo "Ce" | perl -ne 'print if(/C[^abc]/);' # output: Ce
echo -n "C" | perl -ne 'print if(/C[^ABC]/);' # output:
\W - something that is not an alphanumeric char
same as [^a-zA-Z0-9_], possibly also excluding unicode letters depending on your locale
\D - something that is not a digit
like [^0-9]
\S - something that is not a \s
like [^\s]
- - defines a range
- has this meaning only inside a class
echo "5" | perl -ne 'print if(/[4-6]/);' # output: 5
echo "5" | perl -ne 'print if(/[6-8]/);' # output:
echo "5" | perl -ne 'print if(/[3-40]/);' # output: (nothing: [3-40] is the range 3-4 plus a literal 0, not 3 to 40)
\l \u - fold next character's case
lowercase / uppercase the next char
echo "AbC" | perl -pe 's/A\lBC/X/;' # output: X
\Q....\E - Literal text span
\Q turns off every metachar until \E - supported by perl, PCRE and java - .NET (VB) uses the Regex.Escape method instead
echo "[" | perl -pe 's/\Q[/X/;' # output: X
echo "[g" | perl -pe 's/\Q[\E[a-z]*/X/;' # output: X
\U \L ... \E - Case folding span
useful to turn upper/lower case on/off on the fly - makes sense when used with variables
echo "abcde" | perl -ne '$a="CD";print if/ab\L$a\Ee/;' # output: abcde