Regular Expression Reference

PCRE compared to other flavors

META CHARS SOFT

. - single dot
any one char - it does not match a newline unless the s modifier is set - like ? in bash globbing

echo "a" | perl -ne 'print if/./;' # output: a
echo "" | perl -ne 'print if/./;' # output: (nothing: the line holds only a newline)

\ - escape chars
turns the special meaning of the next char on or off - you mostly don't need it in a class (except for ^, -, ] and \ itself)
\b acts as a backspace if used in a class, otherwise \b is a word boundary

echo "C1" | perl -ne 'print if(/C1\n/);' # output: C1
echo "the (cat)" | perl -ne 'print if(/the \(cat\)/);' # output: the (cat)
echo "(" | perl -pe 's/[(]/MAMMA/;' # output: MAMMA
echo "the (cat)" | perl -ne 'print if(/the (cat)/);' # output:
echo "(" | perl -pe 's/(/MAMMA/;' # error: Unmatched ( in regex; marked by <-- HERE in m/( <-- HERE

\a alert, \b backspace (only in a class), \e escape, \f form feed, \r carriage return, \t horizontal tab, \v vertical tab (but since perl 5.10 \v matches any vertical whitespace), \cchar control char (\cG), \number octal escape (\077), \xnumber hex escape (\xFF)


\w - one alphanumeric char
like [a-zA-Z0-9_] and possibly other unicode letters (depending on your locale)

echo "C1" | perl -ne 'print if(/\w/);' # output: C1
echo '$&' | perl -ne 'print if(/\w/);' # output:

\d - one digit
like [0-9]

echo "123" | perl -ne 'print if(/\d/);' # output: 123
echo "abc" | perl -ne 'print if(/\d/);' # output:

\s - spaces, tabs and newlines
like [ \t\n\r\f] - a vertical tab (\x0B) is matched only since perl 5.18

echo "1 2" | perl -ne 'print if(/\s/);' # output: 1 2

\b \B - word boundary (or not)
since perl 5.8: full unicode support - matches just a position, not a char! - \b means backspace if used in a class
egrep (usually): \< (begin) \> (end)

echo "il cane" | perl -ne 'print if(/il\b/);' # output: il cane
echo "il cane" | perl -ne 'print if(/il\bcane/);' # output:

\C - forced match of a single byte
risky with utf-8, because it can split a multibyte char - not supported in lookbehind - removed from perl in 5.24


^ \A - begin of a line/string
\A ignores the m modifier: it matches only at the very start of the string - ^ has a different meaning inside a class.


$ \Z \z - end of a line/string
\Z ignores the m modifier: it matches at the end of the string or just before a final \n - \z matches only at the very end of the string, after a final \n and never before it

echo "duc" | perl -ne 'print if(/c$/);' # output: duc
echo "duc" | perl -ne 'print if(/c\Z/);' # output: duc
echo "duc" | perl -ne 'print if(/c\n\Z/);' # output: duc
echo "duc" | perl -ne 'print if(/c\n\z/);' # output: duc
echo "duc" | perl -ne 'print if(/c\z/);' # output:

| - alternation

echo "a" | perl -ne 'print if(/a|b/);' # output: a
echo "d" | perl -ne 'print if(/a|b|c/);' # output:

[ ] - create a class
inside a class only ^, -, ] and \ are metachars

echo "a" | perl -ne 'print if(/[ab]/);' # output: a
echo "A" | perl -ne 'print if(/[ab]/);' # output:
echo "ca" | perl -ne 'print if(/[[a-z]&&[^aeiou]]/);' # class intersection: this only works in the java regex engine

# to do the same in perl, use a lookaround:
echo "ca" | perl -ne 'print if(/(?![aeiou])[a-z]/);' # output: ca

^ - something not in the class
[^x] matches one char that is not an x: a char still has to be there - ^ has a different meaning outside a class (as explained above)

echo "Ce" | perl -ne 'print if(/C[^abc]/);' # output: Ce
echo -n "C" | perl -ne 'print if(/C[^ABC]/);' # output:

\W - something that is not an alphanumeric char
same as [^a-zA-Z0-9_], and possibly excludes other unicode letters too (depending on your locale)


\D - something that is not a digit
like [^0-9]


\S - something that is not a \s
like [^\s]


- - defines a range
has this meaning only inside a class

echo "5" | perl -ne 'print if(/[4-6]/);' # output: 5
echo "5" | perl -ne 'print if(/[6-8]/);' # output:
echo "5" | perl -ne 'print if(/[3-40]/);' # output: (nothing: the class holds 3, 4 and 0, not 3 to 40)

\l \u - fold next character's case
lowercase / uppercase next char

echo "AbC" | perl -pe 's/A\lBC/X/;' # output: X

\Q....\E - Literal text span
\Q turns off every metachar until \E - supported by perl, PCRE and java - .NET (VB) uses the Regex.Escape method instead

echo "[" | perl -pe 's/\Q[/X/;' # output: X
echo "[g" | perl -pe 's/\Q[\E[a-z]*/X/;' # output: X

\U \L ... \E - case folding span
turns upper/lower case on/off on the fly - makes sense when used with variables

echo "abcde" | perl -ne '$a="CD";print if/ab\L$a\Ee/;' # output: abcde