* Samples
Swap two items
s/(\S+)\s+(\S+)/$2 $1/
Search C identifier
m/[_A-Za-z][_A-Za-z0-9]*/
m/[_[:alpha:]][_[:alnum:]]*/
Empty Line
/^$/
Word
\b\w+\b
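The samples above can be tried out with a short Perl sketch. The sub names (swap_first_two, first_c_identifier, is_empty_line, words) are made up for illustration and are not part of any library.

```perl
use strict;
use warnings;

# Swap the first two whitespace-separated items of a string.
sub swap_first_two {
    my ($text) = @_;
    $text =~ s/(\S+)\s+(\S+)/$2 $1/;
    return $text;
}

# Return the first C identifier found in a string, or undef if there is none.
sub first_c_identifier {
    my ($text) = @_;
    return $text =~ m/([_A-Za-z][_A-Za-z0-9]*)/ ? $1 : undef;
}

# True (1) when the line is empty, optionally followed by one newline.
sub is_empty_line {
    my ($line) = @_;
    return $line =~ /^$/ ? 1 : 0;
}

# Return every word (\b\w+\b) of a string; call this in list context.
sub words {
    my ($text) = @_;
    return $text =~ m/\b\w+\b/g;
}

print swap_first_two("hello world"), "\n";          # world hello
print first_c_identifier("3 + foo_1"), "\n";        # foo_1
print is_empty_line(""), "\n";                      # 1
print join(",", words("one, two; three")), "\n";    # one,two,three
```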
* Questions
* Reference
perlre (bytes and utf8)
regex.h (regcomp regexec regfree regerror) (single byte only)
java (unicode only)
python (bytes and unicode)
* Basic Structure
* Syntax
m/regex/ismx
s/regex/replacement/ismxg
* Flags
i case-insensitive
s single-line or dot-match-all (only affects .)
m multi-line (only affects ^ and $)
x allows whitespace and comments in the pattern (Perl extension; not in POSIX regex.h)
g global substitution (replace every match, not only the first)
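A minimal Perl sketch of what each flag changes, using one sample string. The variable names are only illustrative.

```perl
use strict;
use warnings;

my $text = "Foo\nbar\nfoo";

# i: case-insensitive; g: find every match (list context returns them all).
my @hits = $text =~ m/foo/ig;            # ("Foo", "foo")

# s: without it "." stops at a newline; with it "." matches "\n" as well.
my ($plain)  = $text =~ m/(Foo.*)/;      # "Foo"
my ($dotall) = $text =~ m/(Foo.*)/s;     # "Foo\nbar\nfoo"

# m: "^" and "$" also match at the start and end of each line.
my @lines = $text =~ m/^(\w+)$/mg;       # ("Foo", "bar", "foo")

# x: whitespace and comments inside the pattern are ignored.
my ($word) = $text =~ m/
    ^ (\w+)     # first word of the string
/x;                                      # "Foo"

# g on s///: replace every occurrence, not just the first.
(my $all = $text) =~ s/o/0/g;            # "F00\nbar\nf00"
```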
* Alternations
m/ABC|XYZ/
* Sequence
m/ABC/
* Repetition
(greedy)
a? 0 or 1
a* 0 or more
a+ 1 or more
a{m} exactly m
a{m,} m or more
a{m,n} m to n (inclusive)
(lazy)
a??
a*?
a+?
a{m}?
a{m,}?
a{m,n}?
Example: matching against the string "aa"
(a?)(a*) $1 => "a", $2 => "a"
(a??)(a*) $1 => "", $2 => "aa"
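The greedy and lazy behaviour can be checked directly in Perl. This is a small sketch; the quoted-string case is an added illustration of where the difference usually matters.

```perl
use strict;
use warnings;

# Greedy: (a?) takes one "a" first, leaving the rest for (a*).
my ($g1, $g2) = "aa" =~ m/(a?)(a*)/;     # $g1 = "a", $g2 = "a"

# Lazy: (a??) prefers to match nothing, so (a*) takes both characters.
my ($l1, $l2) = "aa" =~ m/(a??)(a*)/;    # $l1 = "",  $l2 = "aa"

# Typical use: stop at the first closing quote instead of the last one.
my $input    = 'say "hi" and "bye"';
my ($greedy) = $input =~ m/"(.*)"/;      # hi" and "bye
my ($lazy)   = $input =~ m/"(.*?)"/;     # hi
```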
* Atoms
Character = a b c
Character Class
Escape = \ + non-alphanumeric, such as \\, \+, \( (a \ followed by a digit is a back-reference instead)
Meta Escape = \ + letter [a-zA-Z]
Groups = (...)
* Character Class
[abc] [a-b] [^abc] [^abc0-9]
A - or ] placed right after the opening [ (or [^) is taken literally
[-a] = - or a
[^\-]
[[]
[]]
[ ]
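A short Perl sketch that counts how many characters of a sample string each of the classes above matches. count_matches is a hypothetical helper written for this example.

```perl
use strict;
use warnings;

# Count how many characters of $text are matched by the class in $class_re.
sub count_matches {
    my ($text, $class_re) = @_;
    my $count = () = $text =~ /$class_re/g;
    return $count;
}

my $text = "a-b]c[ d";    # 8 characters

print count_matches($text, qr/[-a]/),  "\n";   # 2: "a" and "-"
print count_matches($text, qr/[^\-]/), "\n";   # 7: everything except "-"
print count_matches($text, qr/[[]/),   "\n";   # 1: "["
print count_matches($text, qr/[]]/),   "\n";   # 1: "]"
print count_matches($text, qr/[ ]/),   "\n";   # 1: the space
```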
* Posix Character Class
[[.a.]] collating element (recognized but not supported by Perl)
[[=a=]] equivalence class (recognized but not supported by Perl)
[[:alpha:]]
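Named POSIX classes such as [:alpha:] are only valid inside a bracketed class. A minimal Perl sketch, with made-up sample data:

```perl
use strict;
use warnings;

# Keep only the strings that are, in full, a C-style identifier.
my @candidates = qw(foo _bar 9lives x1 a-b);
my @valid = grep { /^[_[:alpha:]][_[:alnum:]]*$/ } @candidates;   # foo _bar x1

# A POSIX class can be combined with ^ negation: drop every non-digit.
(my $digits = "tel: 555-0100") =~ s/[^[:digit:]]//g;              # "5550100"
```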
* Meta
. any character except newline (default mode)
. any character, including newline (s mode: single-line, dot-all)
^ start of string, or start of each line (m mode)
$ end of string, or just before a trailing newline at the end of the string; end of each line (m mode)
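The anchors and the dot are easy to misread, so here is a small Perl sketch of their behaviour with and without the s and m flags. The sample strings are arbitrary.

```perl
use strict;
use warnings;

# "$" matches at the very end, and also just before a trailing newline.
my $end_plain   = "abc"   =~ /abc$/ ? 1 : 0;    # 1
my $end_newline = "abc\n" =~ /abc$/ ? 1 : 0;    # 1

# "." skips newlines unless the s flag is given.
my $dot_default = "a\nb" =~ /a.b/  ? 1 : 0;     # 0
my $dot_s       = "a\nb" =~ /a.b/s ? 1 : 0;     # 1

# "^" matches only at the start of the string unless the m flag is given.
my @first_chars   = "x1\ny2" =~ /^(\w)/g;       # ("x")
my @first_chars_m = "x1\ny2" =~ /^(\w)/mg;      # ("x", "y")
```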
* Meta Escape
\t \n \r \f \a \e
\0nn \xnn
\cA (using algorithm ch ^ 0x40)
\cM
\N{name}
\l lowercase next char
\u uppercase next char
\L...\E lowercase until \E
\U...\E uppercase until \E
\Q...\E quote until \E
\w \W word char / non-word char
\s \S whitespace / non-whitespace
\d \D digit / non-digit
\b \B word boundary / non-boundary
\p{property}
\P{property}
\X combining character sequence
\C single byte (perl)
\< start of word (emacs)
\> end of word (emacs)
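A few of the escapes above in a minimal Perl sketch. Note that \u, \L, \U, \Q and \E are handled by Perl's string interpolation, so they also work in ordinary double-quoted strings. The sample values are arbitrary.

```perl
use strict;
use warnings;

my $name = "perl";

# \u uppercases the next character; \U...\E uppercases a whole span.
my $title = "\u$name";                    # "Perl"
my $shout = "\U$name\E rocks";            # "PERL rocks"

# \Q...\E quotes metacharacters, so "." and "+" are matched literally.
my $literal = "a.b+c";
my $quoted  = "xa.b+cx" =~ /\Q$literal\E/ ? 1 : 0;   # 1
my $unsafe  = "xa.b+cx" =~ /$literal/     ? 1 : 0;   # 0: "b+" means one or more "b"

# \d and \b: pull the stand-alone numbers out of a string.
my @numbers = "id 42, x7, 1000" =~ /\b(\d+)\b/g;     # (42, 1000)

# \cA is Ctrl-A, i.e. ord("A") ^ 0x40 == 1; \x41 is "A" by hex code.
my $ctrl = ord("\cA");                    # 1
my $hex  = "\x41";                        # "A"
```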
* Groups
(abc) for capture group
* Special group
(?#comment)
(?imsx-imsx) embedded flags
(?:pattern) non-capturing group
(?imsx-imsx:pattern) non-capturing subpattern with embedded flags
(?=pattern) positive look ahead
(?!pattern) negative look ahead
(?<=pattern) positive look behind
(?<!pattern) negative look behind
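The special groups are easier to follow with concrete input. A minimal Perl sketch, with sample strings chosen only for illustration:

```perl
use strict;
use warnings;

# (?:...) groups without capturing, so $1 is the number, not the unit.
my ($amount) = "12kg" =~ /(\d+)(?:kg|lb)/;                   # "12"

# (?i) turns on case-insensitive matching from that point on.
my $ci = "HELLO" =~ /(?i)hello/ ? 1 : 0;                     # 1

# (?i:...) limits the flag to the subpattern.
my $scoped_ok  = "HELLO world" =~ /(?i:hello) world/ ? 1 : 0;   # 1
my $scoped_bad = "HELLO WORLD" =~ /(?i:hello) world/ ? 1 : 0;   # 0

# (?=...) positive look ahead: digits followed by "px", "px" not consumed.
my ($px) = "10em 20px" =~ /(\d+)(?=px)/;                     # "20"

# (?!...) negative look ahead: "foo" not followed by "bar".
my @foo = "foobar foobaz" =~ /(foo(?!bar)\w*)/g;             # ("foobaz")

# (?<=...) positive look behind: digits preceded by a dollar sign.
my ($price) = 'cost: $15' =~ /(?<=\$)(\d+)/;                 # "15"

# (?#...) is an inline comment and does not affect the match.
my $commented = "ab" =~ /a(?#then b)b/ ? 1 : 0;              # 1
```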
* Reference for capture
m/(x)\1/
s/(x)/$1$1/
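The two forms differ in where they are used: \1 inside the pattern, $1 in the replacement (or after the match). A short Perl sketch with made-up input:

```perl
use strict;
use warnings;

# \1 inside the pattern refers to what group 1 matched: find a doubled word.
my ($dup) = "this is is a test" =~ /\b(\w+)\s+\1\b/;    # "is"

# $1 in the replacement inserts the captured text: double the first "x".
(my $doubled = "axbx") =~ s/(x)/$1$1/;                  # "axxbx"
```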
* Traditional vs Extended
\{m,n\} vs {m,n}
\(xxx\) vs (xxx)
Emacs still uses traditional (basic) regular expressions
* Special extension
\< start of word (emacs)
\> end of word (emacs)
* New Lines
\n \v \r \r\n \f \x85 \x{2028} \x{2029} \x1A
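One way to use this list is to split text on any of these terminators. The Perl sketch below assumes that \v in the list means the vertical tab (written as \x0B here, because \v in a Perl pattern is a class for all vertical whitespace). split_lines is a hypothetical helper name.

```perl
use strict;
use warnings;

# Split on any line terminator from the list. "\r\n" comes first so that
# a CRLF pair counts as one break rather than two.
# Assumption: "\v" in the list is the vertical tab, written here as \x0B.
sub split_lines {
    my ($text) = @_;
    return split /\r\n|[\n\x0B\r\f\x85\x{2028}\x{2029}\x1A]/, $text;
}

my @lines = split_lines("a\r\nb\nc\rd\x{2028}e");   # ("a", "b", "c", "d", "e")
```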
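Copyright notice: This article may be reproduced, provided that the reproduction links to the original source of "Really advanced perl RegEx reference" and includes the author information and this copyright notice.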
http://www.chedong.com/blog/archives/000631.html