RegExReference/reference_chars_soft.md at master · galunni/RegExReference

Regular Expression Reference

PCRE compared to other flavors

META CHARS SOFT

. - single dot
one char - it may not match a newline (depending on the s modifier) - like ? in bash globbing

echo "a" | perl -ne 'print if/./;' # output: a

echo "" | perl -ne 'print if/./;' # output:

\ - escape chars
turns a metachar into a literal, or a literal into a special sequence - inside a class you mostly don't need it (except for ^, -, ] and \ itself)
\b acts as a backspace if used in a class, otherwise \b is a word boundary

echo "C1" | perl -ne 'print if(/C1\n/);' # output: C1
echo "the (cat)" | perl -ne 'print if(/the \(cat\)/);' # output: the (cat)
echo "(" | perl -pe 's/[(]/MAMMA/;' # output: MAMMA

echo "the (cat)" | perl -ne 'print if(/the (cat)/);' # output:
echo "(" | perl -pe 's/(/MAMMA/;' # error: Unmatched ( in regex; marked by <-- HERE in m/( <-- HERE

\a alert, \b backspace (only in a class), \e escape, \f form feed, \n newline, \r carriage return, \t horizontal tab, \v vertical tab (in perl 5.10 and later \v matches any vertical whitespace instead), \cchar control char (\cg), \number octal escape (\077), \xnumber hex escape (\xFF)

\w - one word char (alphanumeric or underscore)
like [a-zA-Z0-9_], possibly plus other unicode letters (depending on your locale and unicode settings)

echo "C1" | perl -ne 'print if(/\w/);' # output: C1

echo '$&' | perl -ne 'print if(/\w/);' # output:

\d - one digit
like [0-9] (on unicode strings it can also match non-ASCII digits, unless the a modifier is used)

echo "123" | perl -ne 'print if(/\d/);' # output: 123

echo "abc" | perl -ne 'print if(/\d/);' # output:

\s - spaces, tabs and newlines
like [ \t\n\r\f] - before perl 5.18 it doesn't match a \cK (vertical tab), from 5.18 on it does

echo "1 2" | perl -ne 'print if(/\s/);' # output: 1 2

\b \B - word boundary (or not)
since perl 5.8: full unicode support - matches just a position, not a char! - \b means backspace if used in a class
egrep (usually): \< (begin) \> (end)

echo "il cane" | perl -ne 'print if(/il\b/);' # output: il cane

echo "il cane" | perl -ne 'print if(/il\bcane/);' # output:

\C - forced match of a single byte
risky with utf-8, since it can split a multibyte char - not allowed in a lookbehind - deprecated in perl 5.20 and removed in 5.24

^ \A - beginning of a line/string
\A ignores the m mode and always matches at the start of the string - ^ has a different meaning inside a class.

$ \Z \z - end of a line/string
\Z ignores the m mode: it matches at the end of the string or just before a final \n - \z matches only at the very end, after the last \n and not before it

echo "duc" | perl -ne 'print if(/c$/);' # output: duc
echo "duc" | perl -ne 'print if(/c\Z/);' # output: duc
echo "duc" | perl -ne 'print if(/c\n\Z/);' # output: duc
echo "duc" | perl -ne 'print if(/c\n\z/);' # output: duc

echo "duc" | perl -ne 'print if(/c\z/);' # output:

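| - alternation
matches either the expression on its left or the one on its right - it has the lowest precedence of all the operators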

echo "a" | perl -ne 'print if(/a|b/);' # output: a

echo "d" | perl -ne 'print if(/a|b|c/);' # output:

[ ] - create a class
inside a class only ^ (at the start), - (between two chars), ] and \ are metachars

echo "a" | perl -ne 'print if(/[ab]/);' # output: a

echo "A" | perl -ne 'print if(/[ab]/);' # output:

echo "ca" | perl -ne 'print if(/[[a-z]&&[^aeiou]]/);' # class intersection with && only works in the java regex engine

# to do the same in perl, use a lookahead:
echo "ca" | perl -ne 'print if(/(?![aeiou])[a-z]/);' # output: ca

^ - something not in the class
[^x] matches one char that is not an x (it still has to match a char) - ^ has a different meaning outside a class (as explained above)

echo "Ce" | perl -ne 'print if(/C[^abc]/);' # output: Ce

echo -n "C" | perl -ne 'print if(/C[^ABC]/);' # output:

\W - something that is not an alphanumeric char
same as [^a-zA-Z0-9_], possibly also excluding unicode letters (depending on your locale and unicode settings)

\D - something that is not a digit
like [^0-9]

\S - something that is not a \s
like [^\s]

- - defines a range
only has this meaning inside a class, and only between two chars

echo "5" | perl -ne 'print if(/[4-6]/);' # output: 5

echo "5" | perl -ne 'print if(/[6-8]/);' # output:
echo "5" | perl -ne 'print if(/[3-40]/);' # output:   ([3-40] means 3, 4 or 0, not 3 to 40)

\l \u - fold next character's case
lowercase / uppercase next char

echo "AbC" | perl -pe 's/A\lBC/X/;' # output: X

\Q....\E - Literal text span
\Q turns off every metachar until \E - supported by perl, PCRE and java - .NET (VB) uses the Regex.Escape method instead

echo "[" | perl -pe 's/\Q[/X/;' # output: X
echo "[g" | perl -pe 's/\Q[\E[a-z]*/X/;' # output: X

\U \L ... \E - Case folding span
useful to turn upper/lower case on/off on the fly - makes most sense when used with variables

echo "abcde" | perl -ne '$a="CD";print if/ab\L$a\Ee/;' # output: abcde
