Python Regex Matching: Python Regular Expressions Explained in Detail

Published: 2025/3/20 python 20 豆豆


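Regular expressions are a powerful tool for processing strings; almost any string operation can be done with them. For anyone who writes web crawlers and deals with strings every day, they are an indispensable skill. The way regular expressions are used differs somewhat between languages, but once you have learned them in one language, most of the differences elsewhere come down to function names; the underlying ideas are the same. Below I describe how regular expressions are used in Python.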

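First, regular expressions in Python can be roughly divided into the following parts:

Metacharacters

Flags (modes)

Functions

Usage of the re built-in objects

Grouping

Lookaround

All regular expression operations use the re module from the Python standard library.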

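1. Metacharacters (see the documentation of the Python re module)

. matches any character except a newline

^ matches the start of the string; in multiline mode it also matches the start of each line

$ matches the end of the string; in multiline mode it also matches the end of each line

* matches the preceding element zero or more times

+ matches the preceding element one or more times

? matches the preceding element zero or one time

{m,n} matches the preceding element m to n times

\ escape character; the character that follows loses its meaning as a metacharacter, e.g. \. matches only a literal . and no longer any character

[] character set; matches any one of the characters it contains

| alternation (logical or), e.g. a|b matches a or b

(...) group, capturing by default, so the content matched by the group can be retrieved separately. Each group has an index starting from 1, assigned in the order in which the opening "(" appears

(?iLmsux) sets flags inside the pattern; each letter of iLmsux stands for one flag, see flag I in section 2 for usage

(?:...) non-capturing group; it is skipped when group indexes are assigned

(?P<name>...) named group; its content can be retrieved by index or by name

(?P=name) backreference to a named group; matches the text captured earlier in the same regular expression by the group called name

(?#...) comment; does not affect the rest of the regular expression, see flag I in section 2 for usage

(?=...) positive lookahead: the text to the right of the current position must match the pattern in the parentheses

(?!...) negative lookahead: the text to the right of the current position must not match the pattern in the parentheses

(?<=...) positive lookbehind: the text to the left of the current position must match the pattern in the parentheses

(?<!...) negative lookbehind: the text to the left of the current position must not match the pattern in the parentheses

(?(id/name)yes|no) if the group with the given id or name matched, the yes pattern is used, otherwise the no pattern

\number matches the same text that was captured by the group with index number

\A matches the start of the string, regardless of multiline mode

\Z matches the end of the string, regardless of multiline mode

\b matches the empty string at the beginning or end of a word

\B matches the empty string that is not at the beginning or end of a word

\d matches a digit, equivalent to [0-9]

\D matches a non-digit, equivalent to [^0-9]

\s matches any whitespace character, equivalent to [ \t\n\r\f\v]

\S matches any non-whitespace character, equivalent to [^ \t\n\r\f\v]

\w matches any one letter, digit or underscore, equivalent to [a-zA-Z0-9_]

\W matches any character that is not a letter, digit or underscore, equivalent to [^a-zA-Z0-9_]

The bracketed equivalents for \d, \s and \w describe ASCII matching. With str patterns in Python 3 these classes also match Unicode digits, whitespace and word characters unless the re.ASCII flag is set.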
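A few of the metacharacters above in action. This is only a minimal sketch for Python 3; the sample strings and the names quoted and bracketed are made up for illustration and do not come from the original article.

```python
import re

# Quantifiers and a non-capturing group: digits followed by an optional unit.
sizes = re.findall(r"\d+(?:px|em)?", "10px 2em 300")
print(sizes)  # ['10px', '2em', '300']

# Backreference \1: the closing quote must be the same character as the opening one.
quoted = re.compile(r"(['\"])(.*?)\1")
print(quoted.search("say 'hi' now").group(2))  # hi

# Conditional group: a closing '>' is required only if group 1 captured '<'.
bracketed = re.compile(r"^(<)?\w+(?(1)>)$")
for candidate in ("<tag>", "tag", "<tag"):
    print(candidate, bool(bracketed.match(candidate)))
```

Only the last candidate fails: it opens with "<", so the conditional group demands a closing ">" that is not there.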

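2. Flags

I IGNORECASE: case-insensitive matching, for example

import re

s = 'Hello World!'
regex = re.compile("hello world!", re.I)
print(regex.match(s).group())
# output> Hello World!

# Set the flag and add a comment inside the regular expression itself.
# The inline flag is written first, because recent Python versions
# require global inline flags at the start of the pattern.
regex = re.compile("(?i)(?#comment)hello world!")
print(regex.match(s).group())
# output> Hello World!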

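L LOCALE: makes character classes depend on the current locale. This flag exists to support character sets in multilingual environments. For example, in an English environment the escape \w stands for [a-zA-Z0-9_], that is, all English letters, digits and the underscore. In a French environment it cannot match "é" or "ç" by default; with the L flag it can. For a Chinese environment it does not seem to be of much use, though: it still cannot match Chinese characters. (In Python 3, re.L may only be used with bytes patterns; str patterns follow Unicode rules by default.)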

M MULTILINE: multiline mode, changes the behavior of ^ and $

s = '''first line
second line
third line'''

# ^ without and with re.M
regex_start = re.compile(r"^\w+")
print(regex_start.findall(s))
# output> ['first']

regex_start_m = re.compile(r"^\w+", re.M)
print(regex_start_m.findall(s))
# output> ['first', 'second', 'third']

# $ without and with re.M
regex_end = re.compile(r"\w+$")
print(regex_end.findall(s))
# output> ['line']

regex_end_m = re.compile(r"\w+$", re.M)
print(regex_end_m.findall(s))
# output> ['line', 'line', 'line']

S DOTALL: in this mode '.' is unrestricted and matches any character, including a newline

s = '''first line
second line
third line'''

# without re.S the dot stops at each newline
regex = re.compile(".+")
print(regex.findall(s))
# output> ['first line', 'second line', 'third line']

# with re.S the dot also consumes the newlines
regex_dotall = re.compile(".+", re.S)
print(regex_dotall.findall(s))
# output> ['first line\nsecond line\nthird line']

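X VERBOSE: verbose mode. Whitespace in the regular expression (outside character sets) and comments introduced by # are ignored, so a pattern can be laid out over several lines and annotated. For example, a regular expression that matches an email address can be written compactly or in verbose form

email_regex = re.compile(r"[\w+\.]+@[a-zA-Z\d]+\.(com|cn)")

email_regex = re.compile(r"""[\w+\.]+  # the part before the @ sign
                             @            # the @ sign
                             [a-zA-Z\d]+  # the mail provider
                             \.(com|cn)   # the address suffix
                          """, re.X)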
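To check that the compact and the verbose spelling describe the same pattern, both can be run against the same inputs. A small sketch for Python 3; the addresses are invented test data, the names compact, verbose and samples are illustrative, and fullmatch is used here (rather than match) so that trailing text is rejected.

```python
import re

compact = re.compile(r"[\w+\.]+@[a-zA-Z\d]+\.(com|cn)")
verbose = re.compile(r"""
    [\w+\.]+       # the part before the @ sign
    @              # the @ sign
    [a-zA-Z\d]+    # the mail provider
    \.(com|cn)     # the address suffix
""", re.X)

samples = ["gumby@example.com", "mr.gumby+spam@mail.cn", "not-an-email"]
compact_results = [bool(compact.fullmatch(text)) for text in samples]
verbose_results = [bool(verbose.fullmatch(text)) for text in samples]

print(compact_results)  # [True, True, False]
print(compact_results == verbose_results)  # True
```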

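U UNICODE: the metacharacters \w, \W, \b and \B follow the character properties defined by Unicode. (For str patterns in Python 3 this is already the default.)

Several flags can be used at the same time; in Python they are combined with the bitwise OR operator |

for example re.compile('', re.I|re.M|re.S)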
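As a quick illustration of combining flags, the sketch below (Python 3, with an invented sample text) matches whole lines case-insensitively. The names text, both, only_multiline and inline are illustrative.

```python
import re

text = "First line\nsecond LINE\nThird line"

# re.I ignores case, re.M lets ^ and $ match at every line.
both = re.findall(r"^\w+ line$", text, re.I | re.M)
print(both)  # ['First line', 'second LINE', 'Third line']

# With re.M alone, the upper-case "LINE" no longer matches.
only_multiline = re.findall(r"^\w+ line$", text, re.M)
print(only_multiline)  # ['First line', 'Third line']

# The same two flags written inline at the start of the pattern.
inline = re.findall(r"(?im)^\w+ line$", text)
print(inline == both)  # True

# Combined flags are just the bitwise OR of the individual values.
print(int(re.I | re.M))  # 10
```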

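Each flag in the re module is in fact just a different integer. In Python 3 the flags are enum members, so int() is used below to show the underlying value

print(int(re.I))
# output> 2
print(int(re.L))
# output> 4
print(int(re.M))
# output> 8
print(int(re.S))
# output> 16
print(int(re.X))
# output> 64
print(int(re.U))
# output> 32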

3. Functions (see the documentation of the Python re module)

Python's re module provides many convenient functions for working on strings with regular expressions. Each function has its own characteristics and use cases, and knowing them well will help a lot in your work.

compile(pattern, flags=0)

Compiles the regular expression pattern with the given flags (the default 0 means no flags) and returns a pattern object (SRE_Pattern in Python 2, re.Pattern in Python 3; see section 4, usage of the re built-in objects)

import re

regex = re.compile(".+")
print(regex)
# output> re.compile('.+')

The pattern object provides methods that perform the actual matching. It is generally recommended to precompile a regular expression with compile before using it, which makes it easy to reuse later in the code. Most functions can also be used directly without compile; see the findall function for an example

s = '''first line
second line
third line'''

regex = re.compile(".+")

# call findall
print(regex.findall(s))
# output> ['first line', 'second line', 'third line']

# call search
print(regex.search(s).group())
# output> first line

escape(pattern)

Escaping. If the text you work on contains regular expression metacharacters, you have to put a backslash \ in front of each of them so that it matches itself. When there are many such characters, the regular expression looks messy and is tedious to write. In that case you can use this function, as follows

s = r".+\d123"
regex_str = re.escape(s)

# show the escaped string
print(regex_str)
# output> \.\+\\d123

# show what the escaped pattern matches
for g in re.findall(regex_str, s):
    print(g)
# output> .+\d123

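findall(pattern, string, flags=0)

pattern is the regular expression, string is the string to work on and flags are the flags to use. The function looks for all substrings of string that match the regular expression and returns them as a list; if nothing matches, it returns an empty list. If the pattern contains capturing groups, the list holds the content of the groups instead of the whole matches.

s = '''first line
second line
third line'''

# precompile with compile, then call findall
regex = re.compile(r"\w+")
print(regex.findall(s))
# output> ['first', 'line', 'second', 'line', 'third', 'line']

# call findall directly, without compile
print(re.findall(r"\w+", s))
# output> ['first', 'line', 'second', 'line', 'third', 'line']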

finditer(pattern, string, flags=0)

The parameters and purpose are the same as for findall. The difference is that findall returns a list, while finditer returns an iterator (see http://www.cnblogs.com/huxi/archive/2011/07/01/2095931.html ), and each value the iterator yields is not a string but a match object (SRE_Match in Python 2, re.Match in Python 3; see section 4, usage of the re built-in objects). How to use this object is shown under the match function.

s = '''first line
second line
third line'''

regex = re.compile(r"\w+")
print(regex.finditer(s))
# output> <callable_iterator object at 0x...>  (the address varies)

for i in regex.finditer(s):
    print(i)
# output> <re.Match object; span=(0, 5), match='first'>
#         <re.Match object; span=(6, 10), match='line'>
#         <re.Match object; span=(11, 17), match='second'>
#         <re.Match object; span=(18, 22), match='line'>
#         <re.Match object; span=(23, 28), match='third'>
#         <re.Match object; span=(29, 33), match='line'>
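Because finditer yields match objects rather than strings, each hit carries its position as well as its text. A minimal sketch for Python 3 that collects both; the name hits is illustrative.

```python
import re

s = "first line\nsecond line\nthird line"

# Each match object knows the matched text and where it was found.
hits = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+", s)]

for word, start, end in hits:
    # Slicing the original string with the reported span gives the word back.
    assert s[start:end] == word

print(hits[0])   # ('first', 0, 5)
print(hits[-1])  # ('line', 29, 33)
```

findall would only return the six words; the positions are what finditer adds.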

match(pattern, string, flags=0)

Tries to match the regular expression against the string and returns the first match without searching any further. Note that match only looks at the beginning of the string: if the pattern does not match there, it does not continue searching. The return value is a match object (SRE_Match in Python 2, re.Match in Python 3; see section 4, usage of the re built-in objects), or None if nothing matches.

s = '''first line
second line
third line'''

# with compile
regex = re.compile(r"\w+")
m = regex.match(s)
print(m)
# output> <re.Match object; span=(0, 5), match='first'>
print(m.group())
# output> first

# s starts with "f", but the pattern requires the match to start with "i", so nothing is found
regex = re.compile(r"^i\w+")
print(regex.match(s))
# output> None

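purge()

Whenever you use the re module, whether you call compile first or use a function such as findall directly, the module first compiles the regular expression and stores the compiled result in a cache. The next time the same regular expression is used it does not have to be compiled again; compiling is comparatively expensive, so this improves efficiency. By default the cache holds 100 regular expressions (the limit in Python 2; Python 3 uses a larger one). When you use a small number of regular expressions frequently, the cache helps; when you use a very large number of different ones, its advantage becomes much less noticeable (see "The effect of re.compile on performance in Python", http://blog.trytofix.com/article/detail/13/). purge() clears the cached regular expressions, which may be useful when you need to reduce memory usage.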
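The caching behavior can be observed with a rough timing comparison. The sketch below is an assumption-laden illustration for Python 3, not a benchmark: the pattern, the input and the repetition count are arbitrary, and the measured times depend on the machine, so no expected timings are given.

```python
import re
import timeit

pattern = r"\w+\.\w+"      # illustrative pattern
text = "Hello, Mr.Gumby"   # illustrative input
precompiled = re.compile(pattern)

# Module-level call: the compiled pattern is looked up in the cache on every call.
t_module = timeit.timeit(lambda: re.search(pattern, text), number=20000)

# Precompiled pattern object: no cache lookup is needed.
t_compiled = timeit.timeit(lambda: precompiled.search(text), number=20000)

print(f"re.search: {t_module:.4f}s, precompiled.search: {t_compiled:.4f}s")

# Empty the cache. The next module-level call simply compiles the pattern again.
re.purge()
found = re.search(pattern, text).group()
print(found)  # Mr.Gumby
```

Calling purge() never changes results, only when compilation work happens.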

search(pattern, string, flags=0)

Similar to match, except that the match does not have to start at the beginning of the string

s = '''first line
second line
third line'''

# match has to start at the beginning of the string, so nothing is found
print(re.match(r'i\w+', s))
# output> None

# search does not restrict where the match starts
print(re.search(r'i\w+', s))
# output> <re.Match object; span=(1, 5), match='irst'>

print(re.search(r'i\w+', s).group())
# output> irst

split(pattern, string, maxsplit=0, flags=0)

Uses the regular expression to find the positions at which to split the string and returns a list of the resulting substrings. maxsplit limits the number of splits. If the pattern does not match anywhere, the result is a list containing only the original string

s = '''first 111 line
second 222 line
third 333 line'''

# split on runs of digits
print(re.split(r'\d+', s))
# output> ['first ', ' line\nsecond ', ' line\nthird ', ' line']

# \.+ does not match anywhere, so the list contains the string itself
print(re.split(r'\.+', s, maxsplit=1))
# output> ['first 111 line\nsecond 222 line\nthird 333 line']

# the maxsplit parameter
print(re.split(r'\d+', s, maxsplit=1))
# output> ['first ', ' line\nsecond 222 line\nthird 333 line']
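One detail of split that the examples above do not show: if the pattern contains a capturing group, the text matched by the group is kept in the result list. A short sketch for Python 3 with a shortened sample string; the names with_delimiters and trimmed are illustrative.

```python
import re

s = "first 111 line\nsecond 222 line"

# A capturing group around the separator keeps the separators in the result.
with_delimiters = re.split(r"(\d+)", s)
print(with_delimiters)
# ['first ', '111', ' line\nsecond ', '222', ' line']

# Swallow the surrounding whitespace as part of the separator; split only once.
trimmed = re.split(r"\s*\d+\s*", s, maxsplit=1)
print(trimmed)
# ['first', 'line\nsecond 222 line']
```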

sub(pattern, repl, string, count=0, flags=0)

Substitution: replaces the parts of string matched by pattern with repl. count sets the maximum number of replacements

s = "the sum of 7 and 9 is [7+9]."

# basic usage: replace the target with a fixed string
print(re.sub(r'\[7\+9\]', '16', s))
# output> the sum of 7 and 9 is 16.

# advanced usage 1: refer to captured content; \1 stands for the first captured group, \2 for the second
print(re.sub(r'\[(7)\+(9)\]', r'\2\1', s))
# output> the sum of 7 and 9 is 97.

# advanced usage 2: pass a function as repl; it receives the match object and returns the replacement
def replacement(m):
    p_str = m.group()
    if p_str == '7':
        return '77'
    if p_str == '9':
        return '99'
    return ''

print(re.sub(r'\d', replacement, s))
# output> the sum of 77 and 99 is [77+99].

# advanced usage 3: a function as repl that evaluates the matched text in a shared namespace
# note: eval and exec run arbitrary code, so this is only suitable for trusted input
scope = {}
example_string_1 = "the sum of 7 and 9 is [7+9]."
example_string_2 = "[name = 'Mr.Gumby']Hello,[name]"

def replacement(m):
    code = m.group(1)
    st = ''
    try:
        st = str(eval(code, scope))
    except SyntaxError:
        # not an expression (for example an assignment): run it as a statement
        exec(code, scope)
    return st

# walk-through: code = '7+9'
#               str(eval(code, scope)) = '16'
print(re.sub(r'\[(.+?)\]', replacement, example_string_1))
# output> the sum of 7 and 9 is 16.

# two substitutions
# step 1: code = "name = 'Mr.Gumby'"
#         eval(code, scope) raises SyntaxError
#         exec(code, scope) binds "Mr.Gumby" to the variable name in the namespace scope
# step 2: code = "name"
#         eval(code, scope) returns the value of the variable name, Mr.Gumby
print(re.sub(r'\[(.+?)\]', replacement, example_string_2))
# output> Hello,Mr.Gumby
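When the placeholders only need to be filled with known values, the same function-as-repl technique works without eval or exec. The sketch below is a swapped-in alternative, not the article's method: the helper name render, the dictionary of values and the sample template are all hypothetical.

```python
import re

def render(template, values):
    """Replace each [key] with values[key]; unknown keys are left as they are."""
    def replacement(m):
        key = m.group(1)
        # m.group(0) is the whole placeholder, brackets included.
        return str(values.get(key, m.group(0)))
    return re.sub(r"\[(\w+)\]", replacement, template)

rendered = render("Hello, [name]! Today is [day].", {"name": "Mr.Gumby"})
print(rendered)  # Hello, Mr.Gumby! Today is [day].
```

A plain dictionary lookup cannot execute anything, which makes this form safer for text that comes from outside the program.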

subn(pattern, repl, string, count=0, flags=0)

Works like sub. The only difference is the return value, which is a tuple: the first element is the string after substitution, the second is the number of substitutions made
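A short sketch of subn for Python 3, reusing the sample string from the sub examples; the names result, count and limited are illustrative.

```python
import re

s = "the sum of 7 and 9 is [7+9]."

# Replace every digit and report how many replacements were made.
result, count = re.subn(r"\d", "#", s)
print(result)  # the sum of # and # is [#+#].
print(count)   # 4

# With count=2 only the first two digits are replaced.
limited = re.subn(r"\d", "#", s, count=2)
print(limited)  # ('the sum of # and # is [7+9].', 2)
```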

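template(pattern, flags=0)

At first glance this looks much like compile, but it does not support metacharacters such as +, ?, * and {}; any metacharacter that expresses repetition is unsupported. I looked around, and nobody seems to know what this function is actually for... (It was undocumented; re.template was deprecated in Python 3.11 and removed in Python 3.13.)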

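4. Usage of the re built-in objects

SRE_Pattern (re.Pattern in Python 3): this object is a compiled regular expression. Compiling not only allows reuse and improves efficiency, it also gives access to some further information about the regular expression

Attributes:

flags the flags given at compile time (for str patterns Python 3 adds re.UNICODE)

groupindex a dictionary whose keys are the names of the named groups and whose values are the corresponding group indexes; unnamed groups are not included

groups the number of groups in the regular expression

pattern the regular expression string used for compilation

import re

s = 'Hello, Mr.Gumby : 2016/10/26'
p = re.compile(r'''(?: # non-capturing group so that | can be used
(?P<name>\w+\.\w+) # matches Mr.Gumby
| # or
(?P<no>\s+\.\w+) # a named group that cannot match here
)
.*? # matches " : "
(\d+) # matches 2016
''', re.X)

print(p.flags)
# output> 96  (re.X is 64; Python 3 adds re.U, 32, for str patterns)
print(dict(p.groupindex))
# output> {'name': 1, 'no': 2}
print(p.groups)
# output> 3
print(p.pattern)
# output> (?: # non-capturing group so that | can be used
#         (?P<name>\w+\.\w+) # matches Mr.Gumby
#         | # or
#         (?P<no>\s+\.\w+) # a named group that cannot match here
#         )
#         .*? # matches " : "
#         (\d+) # matches 2016

Methods: findall, finditer, match, search, split, sub, subn and so on can be called on the object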

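SRE_Match (re.Match in Python 3): this object stores the result of one match and holds a lot of information about the matching process and its result

Attributes:

endpos the index at which this search ended

lastgroup the name of the last group matched in this search

lastindex the index of the last group matched in this search

pos the index at which this search started

re the pattern object used for this search

regs a tuple of tuples with the start and end positions of all groups matched in this search

string the string this search operated on

s = 'Hello, Mr.Gumby : 2016/10/26'
m = re.search(r', (?P<name>\w+\.\w+).*?(\d+)', s)

# index at which this search ended
print(m.endpos)
# output> 28

# name of the last matched group; the last group here has no name
print(m.lastgroup)
# output> None

# index of the last matched group
print(m.lastindex)
# output> 2

# index at which this search started
print(m.pos)
# output> 0

# the pattern object used for this search
print(m.re)
# output> re.compile(', (?P<name>\\w+\\.\\w+).*?(\\d+)')

# start and end positions of all matched groups; the first tuple is the span of the whole match,
# which starts at the comma (index 5)
print(m.regs)
# output> ((5, 22), (7, 15), (18, 22))

# the string this search operated on
print(m.string)
# output> Hello, Mr.Gumby : 2016/10/26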

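Methods:

end([group=0]) returns the end position of the given group; by default the end of the whole match, that is, the index just past the last matched character

expand(template) returns a string built from the template, similar to repl in the sub function; groups can be selected with \1 or \g<name>

group([group1, ...]) returns the content of the groups with the given indexes or names; by default the string between start() and end(). With several arguments it returns a tuple

groupdict([default=None]) returns a dictionary of all named groups; unnamed groups are not included. The key is the group name, the value is the matched content; default supplies the value for named groups that did not take part in the match

groups([default=None]) returns a tuple with the string matched by every group, including groups that did not take part in the match, whose value is default

span([group]) returns a tuple of the start and end positions of the given group; by default the tuple of start() and end()

start([group]) returns the start position of the given group; by default the index of the first character of the whole match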

s = 'Hello, Mr.Gumby : 2016/10/26'
m = re.search(r'''(?: # non-capturing group so that | can be used
(?P<name>\w+\.\w+) # matches Mr.Gumby
| # or
(?P<no>\s+\.\w+) # a named group that cannot match here
)
.*? # matches " : "
(\d+) # matches 2016
''', s, re.X)

# end position of the whole match (the index just past its last character)
print(m.end())
# output> 22

# build a string from a template, as with repl in sub; \1 or \g<name> selects a group
print(m.expand(r"my name is \1"))
# output> my name is Mr.Gumby

# content of the given groups; without arguments, the string between start() and end()
print(m.group())
# output> Mr.Gumby : 2016
print(m.group(1, 2))
# output> ('Mr.Gumby', None)

# dictionary of all named groups; default fills in groups that did not take part in the match
print(m.groupdict('default_string'))
# output> {'name': 'Mr.Gumby', 'no': 'default_string'}

# tuple of all groups, including those that did not take part in the match
print(m.groups('default_string'))
# output> ('Mr.Gumby', 'default_string', '2016')

# start and end position of the given group
print(m.span(3))
# output> (18, 22)

# start position of the given group
print(m.start(3))
# output> 18
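The \g<name> form mentioned for expand is easiest to see with several named groups. A minimal sketch for Python 3 that extends the sample pattern with the hypothetical group names year, month and day.

```python
import re

s = "Hello, Mr.Gumby : 2016/10/26"
m = re.search(
    r"(?P<name>\w+\.\w+).*?(?P<year>\d+)/(?P<month>\d+)/(?P<day>\d+)", s
)

# \g<name> selects a group by name inside the template.
greeting = m.expand(r"\g<name> was greeted on \g<day>.\g<month>.\g<year>")
print(greeting)  # Mr.Gumby was greeted on 26.10.2016

# The same groups as a dictionary, and the span of one of them.
print(m.groupdict())
# {'name': 'Mr.Gumby', 'year': '2016', 'month': '10', 'day': '26'}
print(m.span("year"))  # (18, 22)
```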

5. Grouping

In Python regular expressions, parentheses denote a group. Group indexes are assigned in the order in which the opening "(" of each group appears, starting from 1. A group can be accessed by its index or by its name

import re

s = 'Hello, Mr.Gumby : 2016/10/26'
p = re.compile(r"(?P<name>\w+\.\w+).*?(\d+)(?#comment)")
m = p.search(s)

# access by name
print(m.group('name'))
# output> Mr.Gumby

# access by index
print(m.group(2))
# output> 2016

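Sometimes you only want to group part of a regular expression without capturing its content. In that case you can use a non-capturing group

s = 'Hello, Mr.Gumby : 2016/10/26'
p = re.compile(r"""
(?: # non-capturing group, only there so that | can be used
(?P<name>\w+\.\w+)
|
(\d+/)
)
""", re.X)
m = p.search(s)

# the non-capturing group is not counted in the group total of the pattern object
print(p.groups)
# output> 2

# and it does not appear among the groups of the match object
print(m.groups())
# output> ('Mr.Gumby', None)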

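If the same text has to appear more than once in what you match, you can use a backreference to a group. Note that a backreference does not repeat the regular expression of the earlier group; it refers to the content that group captured. The backreference itself is not counted in the total number of groups.

s = 'Hello, Mr.Gumby : 2016/2016/26'
p = re.compile(r"""
(?: # non-capturing group, only there so that | can be used
(?P<name>\w+\.\w+)
|
(\d+/)
)
.*?(?P<number>\d+)/(?P=number)/
""", re.X)
m = p.search(s)

# the backreference (?P=number) is not counted in the group total of the pattern object
print(p.groups)
# output> 3

# and it does not appear among the groups of the match object
print(m.groups())
# output> ('Mr.Gumby', None, '2016')

# the matched string
print(m.group())
# output> Mr.Gumby : 2016/2016/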
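That a backreference repeats the captured text, not the group's pattern, can be checked directly. A small sketch for Python 3; the sample inputs and the names same_twice and doubled are made up for illustration.

```python
import re

# \1 must repeat exactly what group 1 captured, not merely "some digits".
same_twice = re.compile(r"(\d+)/\1")
print(bool(same_twice.fullmatch("2016/2016")))  # True
print(bool(same_twice.fullmatch("2016/2017")))  # False

# A common use: find words that were typed twice in a row.
doubled = re.compile(r"\b(?P<word>\w+)\s+(?P=word)\b", re.I)
text = "This is is a test of the the parser."
repeats = [m.group("word") for m in doubled.finditer(text)]
print(repeats)  # ['is', 'the']

# Keep one copy of each doubled word.
cleaned = doubled.sub(r"\g<word>", text)
print(cleaned)  # This is a test of the parser.
```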

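6. Lookaround

Lookaround goes by several other names, such as boundary conditions, assertions or pre-search; the terminology varies.

Lookaround is a special kind of regular expression syntax. It does not match text but a position: the regular expression states what must, or must not, be found to the left or right of a position, and the engine then looks for such a position.

There are four lookaround constructs, listed among the metacharacters in section 1. Their basic usage is as follows.

import re

s = 'Hello, Mr.Gumby : 2016/10/26 Hello,r.Gumby : 2016/10/26'

# no lookaround
print(re.compile(r"(?P<name>\w+\.\w+)").findall(s))
# output> ['Mr.Gumby', 'r.Gumby']

# the text to the left of the lookaround position must be "Hello, "
print(re.compile(r"(?<=Hello, )(?P<name>\w+\.\w+)").findall(s))
# output> ['Mr.Gumby']

# the text to the left of the lookaround position must not be ","
print(re.compile(r"(?<!,)(?P<name>\w+\.\w+)").findall(s))
# output> ['Mr.Gumby']

# the text to the right of the lookaround position must be "M"
print(re.compile(r"(?=M)(?P<name>\w+\.\w+)").findall(s))
# output> ['Mr.Gumby']

# the text to the right of the lookaround position must not be "r"
print(re.compile(r"(?!r)(?P<name>\w+\.\w+)").findall(s))
# output> ['Mr.Gumby']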
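Because lookaround matches positions rather than text, it can be used to insert something without consuming any characters. The sketch below (Python 3) is one illustration of that idea; the helper name add_commas and the sample inputs are hypothetical.

```python
import re

def add_commas(number_text):
    """Insert thousands separators into a string of digits."""
    # Match the empty position that has a digit on its left and,
    # on its right, a multiple of three digits up to the end of the string.
    return re.sub(r"(?<=\d)(?=(?:\d{3})+$)", ",", number_text)

print(add_commas("1234567"))  # 1,234,567
print(add_commas("123"))      # 123

# Lookbehind as a filter: only numbers directly preceded by a dollar sign.
prices = re.findall(r"(?<=\$)\d+(?:\.\d+)?", "apple $3, pear $4.50, 12 plums")
print(prices)  # ['3', '4.50']
```

In both patterns the lookaround parts are not part of the result: the commas are inserted between digits, and the dollar signs are left out of the extracted prices.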
