Python Regex Quick Reference

csdn_moming

Categories: Python, Programming. Tags: python, regex

Article link: https://blog.csdn.net/csdn_moming/article/details/52458696

Special cases

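'\' : escapes special characters or introduces a special sequence
'\\' : matches a literal backslash

r'...' : Python's raw string notation, recommended for regular expression patterns

'\number' : in a normal (non-raw) string literal Python converts this to a character first (e.g. '\0' becomes '\x00'), so write backreferences as r'\number'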
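
The backslash rules above are easiest to see side by side. The following is a minimal sketch; the sample strings and variable names are made up for illustration.

```python
import re

# The text C:\temp, written as a normal string literal.
path = "C:\\temp"

# To match one literal backslash the regex needs two backslashes.
# In a normal string literal each of those must be doubled again.
normal = re.search("\\\\", path)
# A raw string passes the two backslashes to the regex engine unchanged.
raw = re.search(r"\\", path)

# In a normal string literal '\1' is the character chr(1), not a backreference,
# so this pattern is '(a)' followed by chr(1) and does not match 'aa'.
no_raw = re.search("(a)\1", "aa")
# With a raw string the regex engine sees the backreference \1.
with_raw = re.search(r"(a)\1", "aa")

print(normal.group() == raw.group())  # True
print(no_raw)                         # None
print(with_raw.group())               # aa
```

Using raw strings for every pattern avoids having to think about which layer consumes each backslash.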

Special characters:

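• '.' : any character except a newline (with DOTALL, also matches a newline)
• '^' : the start of the string (with MULTILINE, also immediately after each newline)
• '$' : the end of the string or just before the newline at the end of the string (with MULTILINE, also before each newline)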
• '*' : 0 or more repetitions of the preceding RE
• '+' : 1 or more repetitions of the preceding RE
• '?' : 0 or 1 repetitions of the preceding RE
• '*?', '+?', '??' : non-greedy versions of '*', '+', '?' (match as few characters as possible)
• {m} : specifies that exactly m copies of the previous RE should be matched
• {m,n} : match from m to n repetitions of the preceding RE, attempting to match as many repetitions as possible. Either bound may be omitted: {m,} means at least m, {,n} means at most n
• {m,n}? : match from m to n repetitions, attempting to match as few repetitions as possible
• '\' : either escapes special characters, or signals a special sequence
• [] : used to indicate a set of characters
    ○ [amk] matches 'a', 'm', or 'k'
    ○ Ranges: [a-z], [0-9], [0-9A-Fa-f]
    ○ Special characters lose their special meaning inside sets. [(*+)] matches any of the literal characters '(', ')', '+', '*'
    ○ Character classes like \w or \S are also accepted inside a set
    ○ [^5] matches any character except '5'
    ○ To match a literal ']' inside a set, escape it or place it first: [()[\]{}] and []()[{}] both match any bracket, brace, or parenthesis
• '|' : A|B match either A or B (tried from left to right)
• (…) : matches whatever regular expression is inside the parentheses (can be retrieved after a match)
• (?...) : extension notation
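    ○ (?iLmsux) : one or more letters from the set 'i', 'L', 'm', 's', 'u', 'x'; sets the corresponding flags (re.I, re.L, re.M, re.S, re.U, re.X) for the whole expression, and the group itself matches the empty string
    ○ (?:…) : a non-capturing version of regular parentheses (the substring matched by the group cannot be retrieved after performing a match)
    ○ (?P<name>…) : like regular parentheses, but the matched substring is also accessible by the group name; each group name must be defined only once within a regular expression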
    ○ (?P=name) : a backreference to a named group
    ○ (?#...) : a comment
    ○ (?=…) : matches if … matches next
    ○ (?!...) : matches if … doesn't match next
    ○ (?<=…) : a positive lookbehind assertion
    ○ (?<!...) : negative lookbehind assertion
    ○ (?(id/name)yes-pattern|no-pattern) : try to match with yes-pattern if the group with given id or name exists, and with no-pattern if it doesn't
• \number : matches the contents of the group of the same number (groups are numbered starting from 1)
• \A : matches only at the start of the string
• \b : matches the empty string, but only at the beginning or end of a word
• \B : matches the empty string, but only when it is not at the beginning or end of a word
• \d : decimal digit
• \D : non-digit character
• \s : any whitespace character
• \S : any non-whitespace character
• \w : any alphanumeric character or the underscore
• \W : any character that is not alphanumeric or the underscore
• \Z : matches only at the end of the string
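
A few of the entries above are easier to remember with concrete input. The snippet below is an illustrative sketch only; all sample strings and variable names are invented for the example.

```python
import re

html = "<b>one</b><b>two</b>"
# Greedy '*' runs to the last closing tag; non-greedy '*?' stops at the first.
greedy = re.findall(r"<b>.*</b>", html)   # ['<b>one</b><b>two</b>']
lazy = re.findall(r"<b>.*?</b>", html)    # ['<b>one</b>', '<b>two</b>']

# Named group plus a backreference to it: finds a doubled word.
doubled = re.search(r"(?P<word>\w+) (?P=word)", "it is is fine")
word = doubled.group("word")              # 'is'

# Lookahead: digits only when followed by 'px' ('px' itself is not consumed).
pixels = re.findall(r"\d+(?=px)", "10px 20em 30px")      # ['10', '30']
# Negative lookbehind: digits not preceded by a dollar sign.
plain = re.findall(r"(?<!\$)\b\d+", "$5 and 7")          # ['7']

# MULTILINE lets '^' match after each newline; DOTALL lets '.' match '\n'.
line_starts = re.findall(r"^\w+", "a1\nb2", re.M)        # ['a1', 'b2']
dot_default = re.search(r"a.b", "a\nb")                  # None
dot_all = re.search(r"a.b", "a\nb", re.S)                # matches 'a\nb'

# An inline flag at the start of the pattern has the same effect as re.I.
inline = re.search(r"(?i)python", "PYTHON")
```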

Module contents

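• re.compile(pattern, flags=0): compile a regular expression pattern into a pattern object, which can then be used for matching through its search(), match(), and other methods. Flags:
    ○ re.I : ignore case
    ○ re.L : make \w, \W, \b, \B, \s, \S locale dependent
    ○ re.M : multi-line ('^' and '$' also match at line boundaries)
    ○ re.S : dot matches all characters, including a newline
    ○ re.U : make \w, \W, \b, \B, \d, \D, \s, \S follow Unicode rules (the default for str patterns in Python 3)
    ○ re.X : verbose (whitespace and comments in the pattern are ignored)
    ○ re.DEBUG : display debug information about the compiled expression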
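• re.search(): scan the string and return a match object for the first location where the pattern matches, or None
• re.match(): match only at the beginning of the string
• re.split(): split the string by the occurrences of the pattern
• re.findall(): return all non-overlapping matches of the pattern in the string
• re.finditer(): return an iterator yielding match objects over all non-overlapping matches
• re.sub(pattern, repl, string): replace the leftmost non-overlapping occurrences of pattern in string with the replacement repl
• re.subn(): same as re.sub(), but return a tuple (new_string, number_of_subs_made)
• re.escape(): return the string with regex special characters backslashed (older Python versions backslash all non-alphanumerics)
• re.purge(): clear the regular expression cache
• re.error: exception raised when a pattern is not a valid regular expression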
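
The functions listed above can be tried on one small input. This is a minimal sketch; the pattern, the sample text, and the variable names are chosen purely for illustration.

```python
import re

pattern = re.compile(r"\d+")
text = "a1b22c333"

first = pattern.search(text).group()                # '1'
at_start = pattern.match(text)                      # None: text starts with 'a'
all_nums = pattern.findall(text)                    # ['1', '22', '333']
spans = [m.span() for m in pattern.finditer(text)]  # [(1, 2), (3, 5), (6, 9)]
pieces = pattern.split(text)                        # ['a', 'b', 'c', '']
replaced = pattern.sub("#", text)                   # 'a#b#c#'
replaced_n = pattern.subn("#", text)                # ('a#b#c#', 3)

# re.escape makes user-supplied text safe to embed in a pattern.
escaped = re.escape("1+1")
literal = re.search(escaped, "2*(1+1)").group()     # '1+1'

# An invalid pattern raises re.error at compile time.
try:
    re.compile("(")
    compiled_ok = True
except re.error:
    compiled_ok = False
```

Compiling once with re.compile() and reusing the pattern object is mostly a readability choice here; the module-level functions such as re.findall(pattern, string) accept the same pattern string directly.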