Really advanced perl RegEx reference (车东[Blog^2])

* Samples
Swap two items
s/(\S+)\s+(\S+)/$2 $1/

Search for a C identifier
m/[_A-Za-z][_A-Za-z0-9]*/
m/[_[:alpha:]][_[:alnum:]]*/

Empty Line
/^$/

Word
\b\w+\b
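
The samples above can be tried out with a minimal sketch in Python's re module (an assumption made for illustration; the notes themselves use Perl syntax). Python writes replacement back-references as \2 \1 rather than $2 $1, and the input strings are made up. The POSIX-class variant is left out because Python's re does not support [[:alpha:]].

```python
import re

# Swap two whitespace-separated items (Perl: s/(\S+)\s+(\S+)/$2 $1/).
# count=1 mimics a Perl substitution without the g flag.
swapped = re.sub(r"(\S+)\s+(\S+)", r"\2 \1", "hello world", count=1)

# Find C identifiers (Perl: m/[_A-Za-z][_A-Za-z0-9]*/).
identifiers = re.findall(r"[_A-Za-z][_A-Za-z0-9]*", "int x1 = _y + 2;")

# Empty line: with the m flag, ^ and $ anchor at every line.
empty_lines = re.findall(r"^$", "a\n\nb", flags=re.M)

# Words: \b\w+\b
words = re.findall(r"\b\w+\b", "foo, bar-baz")

print(swapped)           # world hello
print(identifiers)       # ['int', 'x1', '_y']
print(len(empty_lines))  # 1
print(words)             # ['foo', 'bar', 'baz']
```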

* Questions

* Reference
perlre (bytes and utf8)
regex.h (regcomp regexec regfree regerror) (single byte only)
java (unicode only)
python (bytes and unicode)

* Basic Structure

* Syntax
m/regex/ismx
s/regex/replacement/ismxg

* Flags
i case-insensitive
s single-line or dot-match-all (only affects .)
m multi-line (only affects ^ and $)
x allows whitespace and comments in the pattern (originated in Perl)
g global substitution (replace every match, not just the first)
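
A hedged sketch of the i, x and g flags, using Python's re module as a stand-in (re.I and re.X are the Python spellings; the sample strings are invented). Python has no g flag: re.sub replaces every match by default, and count=1 gives the Perl behaviour without g.

```python
import re

# i: ignore case.
ci = re.search(r"perl", "I like PERL", flags=re.I)

# x: whitespace and # comments inside the pattern are ignored.
date = re.compile(r"""
    (\d{4})   # year
    -         # literal separator
    (\d{2})   # month
""", flags=re.X)

# g: Perl substitutes once unless g is given. Python's re.sub works the
# other way round: it replaces every match unless count limits it.
once = re.sub(r"a", "A", "banana", count=1)
every = re.sub(r"a", "A", "banana")

print(ci.group())                      # PERL
print(date.match("2004-09").groups())  # ('2004', '09')
print(once, every)                     # bAnana bAnAnA
```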

* Alternations
m/ABC|XYZ/

* Sequence
m/ABC/

* Repetition
(greedy)
a? 0 or 1
a* 0 or more
a+ 1 or more
a{m} exactly m
a{m,} m or more
a{m,n} m to n (inclusive)

(lazy)
a??
a*?
a+?
a{m}?
a{m,}?
a{m,n}?

Example, matching against "aa":
(a?)(a*) $1 = "a", $2 = "a"
(a??)(a*) $1 = "", $2 = "aa"
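
The greedy and lazy results above can be checked with a short sketch in Python's re module (used here as an assumption; the quantifier syntax is the same as Perl's). The tag example is an added illustration, not from the original notes.

```python
import re

# Greedy a? takes one "a" first, leaving one for a*.
greedy = re.match(r"(a?)(a*)", "aa").groups()

# Lazy a?? prefers to take nothing, so a* gets both.
lazy = re.match(r"(a??)(a*)", "aa").groups()

# The same difference on a more familiar case.
greedy_tag = re.search(r"<.+>", "<b>bold</b>").group()
lazy_tag = re.search(r"<.+?>", "<b>bold</b>").group()

print(greedy)      # ('a', 'a')
print(lazy)        # ('', 'aa')
print(greedy_tag)  # <b>bold</b>
print(lazy_tag)    # <b>
```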

* Atoms
Character = a b c
Character Class
Escape = \ + non-alphanumeric character, such as \\, \+, \( (a \ + digit is a back-reference instead)
Meta Escape = \ + alpha [a-zA-Z]
Groups = (...)

* Character Class
[abc] [a-b] [^abc] [^abc0-9]
A - or a ] placed first in the class is taken literally
[-a] = - or a
[^\-] = anything except -

[[] = a literal [
[]] = a literal ]
[ ] = a space
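
A minimal sketch of the literal - and ] rules, again assuming Python's re module. The [[] form is skipped here because Python emits a FutureWarning for [[ inside a class, which Perl does not.

```python
import re

# A leading - is a literal hyphen, not a range.
dash_or_a = re.findall(r"[-a]", "a-b")

# An escaped - inside a negated class.
not_dash = re.findall(r"[^\-]", "a-b")

# A ] placed first in the class is a literal ].
close_bracket = re.findall(r"[]]", "[x]")

# A class holding just a space.
spaces = re.findall(r"[ ]", "a b c")

print(dash_or_a)      # ['a', '-']
print(not_dash)       # ['a', 'b']
print(close_bracket)  # [']']
print(len(spaces))    # 2
```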

* Posix Character Class
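[[.a.]] collation (syntax recognized by Perl but not supported)
[[=a=]] equivalence (syntax recognized by Perl but not supported)
[[:alpha:]] named class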

* Meta
. anything except newlines (normal mode)
. anything (s mode, singleline, dotall)
^ start of string, or start of line (m mode)
$ end of string (or just before a final newline), or end of line (m mode)
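
How . ^ and $ shift with the s and m modes can be seen in a small sketch using Python's re module (an assumption; re.S and re.M correspond to the s and m flags). The test strings are invented.

```python
import re

text = "one\ntwo\n"

# . skips newlines unless s mode is on.
dot_default = re.findall(r".", "a\nb")
dot_s = re.findall(r".", "a\nb", flags=re.S)

# ^ anchors only at the start of the string unless m mode is on.
caret_default = re.findall(r"^\w+", text)
caret_m = re.findall(r"^\w+", text, flags=re.M)

# $ matches at the end, or just before a final newline; every line end in m mode.
dollar_default = re.findall(r"\w+$", text)
dollar_m = re.findall(r"\w+$", text, flags=re.M)

print(dot_default, dot_s)        # ['a', 'b'] ['a', '\n', 'b']
print(caret_default, caret_m)    # ['one'] ['one', 'two']
print(dollar_default, dollar_m)  # ['two'] ['one', 'two']
```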

* Meta Escape
\t \n \r \f \a \e
\0nn \xnn
\cA control character (code = ch XOR 0x40, so \cA = \x01)
\cM = \x0D (carriage return)
\N{name}
\l lowercase next char
\u uppercase next char
\L...\E lowercase until \E
\U...\E uppercase until \E
\Q...\E quote until \E
\w \W word char / non-word char
\s \S whitespace / non-whitespace
\d \D digit / non-digit
\b \B word boundary / non-boundary
\p{property}
\P{property}
\X combining character sequence
\C single byte (perl)
\< start of word (emacs)
\> end of word (emacs)
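
Two of the escapes above lend themselves to a quick check. The sketch below is in Python, which has neither \c nor \Q...\E, so both are approximated: ctrl is a hypothetical helper that applies the XOR 0x40 rule directly, and re.escape serves as a rough stand-in for \Q...\E.

```python
import re

def ctrl(ch):
    # Control character for \c<ch>: the character code XOR 0x40.
    return chr(ord(ch.upper()) ^ 0x40)

# Rough stand-in for \Q...\E: re.escape backslash-escapes metacharacters,
# so the text is matched literally.
literal = "a.b*c"
quoted = re.escape(literal)

print(repr(ctrl("A")))                      # '\x01'
print(repr(ctrl("M")))                      # '\r'
print(bool(re.fullmatch(quoted, "a.b*c")))  # True
print(bool(re.fullmatch(quoted, "aXbbc")))  # False
```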

* Groups
(abc) for capture group

* Special group
(?#comment)
(?imsx-imsx) embedded flags
(?:pattern) for non-capture
(?imsx-imsx:pattern) subpattern
(?=pattern) positive look ahead
(?!pattern) negative look ahead
(?<=pattern) positive look behind
(?<!pattern) negative look behind
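
A hedged sketch of the special groups, including negative look behind (?<!pattern), using Python's re module, which shares this syntax with Perl. The sample strings are invented, and the scoped flag form (?i:...) assumes Python 3.6 or later.

```python
import re

# Positive look ahead: digits followed by "px"; the unit is not consumed.
px_values = re.findall(r"\d+(?=px)", "10px 20em 30px")

# Negative look ahead: "foo" not followed by "bar".
foo_pos = re.search(r"foo(?!bar)", "foobar foobaz").start()

# Positive look behind: digits preceded by a dollar sign.
prices = re.findall(r"(?<=\$)\d+", "cost $42, 17 items")

# Negative look behind: whole numbers not preceded by a dollar sign.
counts = re.findall(r"(?<!\$)\b\d+", "cost $42, 17 items")

# Non-capturing group: only (c) is numbered.
groups = re.match(r"(?:ab)+(c)", "ababc").groups()

# Flags scoped to a subpattern.
scoped = re.search(r"(?i:hello) World", "HELLO World")

print(px_values)       # ['10', '30']
print(foo_pos)         # 7
print(prices, counts)  # ['42'] ['17']
print(groups)          # ('c',)
print(scoped.group())  # HELLO World
```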

* Reference for capture
m/(x)\1/
s/(x)/$1$1/
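
Both forms of capture reference can be sketched in Python's re module (an assumption; Python uses \1 in the replacement where Perl uses $1). The repeated-word pattern is an added illustration.

```python
import re

# Back-reference inside the pattern: the same character twice in a row.
doubled = re.search(r"(\w)\1", "hello").group()

# Reference in the replacement (Perl: s/(x)/$1$1/).
duplicated = re.sub(r"(x)", r"\1\1", "axb", count=1)

# A common use: find a repeated word.
repeated = re.search(r"\b(\w+)\s+\1\b", "the the cat").group()

print(doubled)     # ll
print(duplicated)  # axxb
print(repeated)    # the the
```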

* Traditional vs Extended
\{m,n\} vs {m,n}
\(xxx\) vs (xxx)
Emacs still uses traditional regular expressions

* Special extension
\< start of word (emacs)
\> end of word (emacs)

* New Lines

\n \v \r \r\n \f \x85 \x{2028} \x{2029} \x1A
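
One way to treat all of the sequences above as line breaks is a single alternation, sketched here in Python's re module (an assumption; Python spells the Unicode separators \u2028 and \u2029). LINE_BREAK is a name made up for this example, and \r\n is listed first so a CRLF pair counts as one break.

```python
import re

# \r\n must come before the single characters, or CRLF would split twice.
LINE_BREAK = re.compile(r"\r\n|[\n\v\r\f\x85\u2028\u2029\x1a]")

parts = LINE_BREAK.split("a\r\nb\nc\u2028d")

print(parts)  # ['a', 'b', 'c', 'd']
```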

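Author: 车东. Posted: 2004-09-20 22:09. Last updated: 2007-04-15 19:04
Copyright notice: This article may be reposted. When reposting, please cite the original source of "Really advanced perl RegEx reference", the author information, and this copyright notice in the form of a hyperlink.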
http://www.chedong.com/blog/archives/000631.html
