
Regular Expressions (js regex "or")

Published: 2025/3/19


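A regular expression uses a single string to describe and match a set of strings that follow a given syntactic rule. In many text editors, regular expressions are used to search for and replace text that matches a pattern. For example, a web-scraping engineer can use them to match text data on web pages, and an NLP engineer can use them to pick out sentences containing sensitive words. As meteorological engineers, we can use them to process the log files on our servers, or to match model output file names that follow a particular pattern.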

Suppose we have the following log file:

    with open('xxx.log.txt') as f:
        log = f.read()
    print(repr(log))

Output:

    '2018-01-16 09：14：35 reading EC DATA\n2018-01-16 10：17：37 reprocess the EC DATA\n2018-01-17 18：18：38 put into WRF,\n2018-01-22 16：17：37 extract the grid data to nearest station, merge with actual data, save to Mysql database \n2018-01-24 17：14：39 extract the data from Mysql and put the station data to CNN-LSTM model\n2018-01-24 22：12：39 training the CNN-LSTM model\n2018-01-25 17：09：22 tuning\n2018-01-25 17：09：22 save the best model\n2018-01-26 19：23：55 predict the wind speed\n2018-01-27 06：09：45 save and evaluate\n\n\n'

The dates in the log use the format yyyy-mm-dd. What should we do if the client suddenly asks us to change every date to mm/dd/yyyy?

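This is where regular expressions come in handy. First we match the year, month and day and print the result to check that the matched dates are correct.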

    import re

    pattern = r'\d{4}-\d{2}-\d{2}'
    print(re.findall(pattern, log))

Output:

    ['2018-01-16', '2018-01-16', '2018-01-17', '2018-01-22', '2018-01-24', '2018-01-24', '2018-01-25', '2018-01-25', '2018-01-26', '2018-01-27']

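Next we wrap the year, month and day of the expression above in parentheses to form groups, then reorder them: the default order 1, 2, 3 becomes 2, 3, 1, and re.sub performs the substitution.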

    pattern_ed = r'(\d{4})-(\d{2})-(\d{2})'
    sub_order = r'\2/\3/\1'  # reorder the groups: month/day/year
    print(repr(re.sub(pattern_ed, sub_order, log)))

Output:

    '01/16/2018 09：14：35 reading EC DATA\n01/16/2018 10：17：37 reprocess the EC DATA\n01/17/2018 18：18：38 put into WRF,\n01/22/2018 16：17：37 extract the grid data to nearest station, merge with actual data, save to Mysql database \n01/24/2018 17：14：39 extract the data from Mysql and put the station data to CNN-LSTM model\n01/24/2018 22：12：39 training the CNN-LSTM model\n01/25/2018 17：09：22 tuning\n01/25/2018 17：09：22 save the best model\n01/26/2018 19：23：55 predict the wind speed\n01/27/2018 06：09：45 save and evaluate\n\n\n'

In fact, we can also give each group a name:

    # The group names year, month and day are illustrative.
    pattern_ed = r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'
    sub_order = r'\g<month>/\g<day>/\g<year>'
    print(re.sub(pattern_ed, sub_order, log))

The result is the same as above.
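Named groups can also be read back from a match object by name. The following is a minimal sketch, assuming the group names year, month and day and a made-up sample line (neither is part of the original log-processing code):

```python
import re

# Hypothetical sample line in the same style as the log above.
line = '2018-01-16 reading EC DATA'

# The group names (year, month, day) are assumptions chosen for illustration.
pattern = r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'

match = re.search(pattern, line)
parts = match.groupdict()  # maps each group name to the text it matched
us_date = '{month}/{day}/{year}'.format(**parts)

print(parts)
print(us_date)
```

Reading the groups by name keeps the code readable when the pattern later gains or loses a group, since numbered references would shift.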

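The example above is only a teaser; now let's learn the basics of regular expressions properly. This article uses Python's re module to explain how regular expressions are used.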

1. Basic matching

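A regular expression is really just a pattern used when performing a search, made up of letters and digits [1]. For example, the regular expression d03 stands for the rule: the letter d, followed by 0, followed by 3. The pattern is compared with the input string character by character. Regular expressions are case-sensitive, so D03 does not match d03.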

    import re

    text = 'WRF_d03_hunan_20190608_16:00:00'
    regex_1 = 'd03'
    regex_2 = 'D03'
    print('Match:', re.findall(regex_1, text))
    print('Match:', re.findall(regex_2, text))

Output:

    Match: ['d03']
    Match: []
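When case should not matter, Python's re module accepts the re.IGNORECASE flag. A short sketch that reuses the sample string from the example above:

```python
import re

text = 'WRF_d03_hunan_20190608_16:00:00'

# Default behaviour: case-sensitive, so 'D03' does not match 'd03'.
case_sensitive = re.findall('D03', text)

# With re.IGNORECASE the same pattern matches regardless of case.
case_insensitive = re.findall('D03', text, flags=re.IGNORECASE)

print(case_sensitive)
print(case_insensitive)
```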

2. Metacharacters

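Regular expressions rely mainly on metacharacters. Metacharacters do not stand for their literal selves; each has a special meaning. Some metacharacters take on a different meaning when written inside square brackets. The metacharacters are introduced below: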

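Metacharacter	Description
$	Matches at the end of the input.
^	Matches at the start of the input.
\	Escape character; used to match the reserved characters [ ] ( ) { } . * + ? ^ $ \ |
|	Or operator; matches the expression before or after the symbol.
(xyz)	Group; matches the characters xyz in exactly that order.
{n,m}	Matches num repetitions of the item before the braces (n <= num <= m).
?	Makes the item before ? optional.
+	Matches one or more repetitions of the item before +.
*	Matches zero or more repetitions of the item before *.
[^ ]	Negated character class; matches any character not listed inside the brackets.
[ ]	Character class; matches any character listed inside the brackets.
.	Matches any single character except a newline.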
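When a search string itself contains metacharacters, escaping each one by hand is error-prone. The re.escape function does it for the whole string. A small sketch with a made-up file name (the name and the surrounding text are assumptions for illustration only):

```python
import re

# Hypothetical file name containing the metacharacters ( ) . and +
filename = 'wrf_d03(v1.0+patch).nc'
text = 'old: wrf_d03(v1.0+patch).nc new: wrf_d03(v1x0+patch)xnc'

# re.escape() backslash-escapes the metacharacters, so every character
# of filename is matched literally.
literal_pattern = re.escape(filename)
matches = re.findall(literal_pattern, text)

print(literal_pattern)
print(matches)
```

Only the first name in the text is found, because the escaped dots no longer match the x characters in the second one.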

2.1 The dot operator .

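The . is the simplest metacharacter. It matches any single character except a newline. For example, in the expression wrf_d03_20180.\.nc the first . matches any one character that is preceded by wrf_d03_20180 and followed by .nc.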

    import re

    text = 'wrf_d03_201805.nc wrf_d03_201806.nc wrf_d03_201807.nc wrf_d03_201810.nc wrf_d03_201806 wrf_d03_201812'
    # The first dot is the dot operator. The backslash before the second dot
    # escapes it, so it matches a literal '.' rather than any character.
    regex = r'wrf_d03_20180.\.nc'
    print(re.findall(regex, text))

Output:

    ['wrf_d03_201805.nc', 'wrf_d03_201806.nc', 'wrf_d03_201807.nc']
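The newline exception can be checked directly. In Python's re module, the re.DOTALL flag lifts it. A minimal sketch on a made-up string:

```python
import re

text = 'a\nb a-b'

# By default '.' does not match the newline between 'a' and 'b'.
default_matches = re.findall(r'a.b', text)

# With re.DOTALL, '.' matches '\n' as well.
dotall_matches = re.findall(r'a.b', text, flags=re.DOTALL)

print(default_matches)
print(dotall_matches)
```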

2.2 Character sets

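A character set is also called a character class. Square brackets specify a character set, and a hyphen inside the brackets specifies a range. The order of the characters inside the brackets does not matter. For example, the expression [Ww]rf matches both Wrf and wrf.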

    import re

    text = 'Wrf666.nchjhjhjhjhffsfsgfergwrf777.ncfjkajawrf888888.nc'
    # {3,6}: the item before the braces repeats num times (3 <= num <= 6).
    # The dot is escaped so that it matches a literal '.'.
    regex = r'[Ww]rf[0-9]{3,6}\.nc'
    print(re.findall(regex, text))

Output:

    ['Wrf666.nc', 'wrf777.nc', 'wrf888888.nc']
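Ranges inside the brackets can be sketched as follows. The domain labels in the sample string are made up for illustration:

```python
import re

text = 'd01 d02 d03 d04 d0a d0A'

# A single range of digits: only 1, 2 or 3 may follow 'd0'.
digits_1_to_3 = re.findall(r'd0[1-3]', text)

# Two ranges in one set: any ASCII letter, lower or upper case.
letters_any_case = re.findall(r'd0[a-zA-Z]', text)

print(digits_1_to_3)
print(letters_any_case)
```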

2.2.1 Negated character sets

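In general ^ marks the start of a string, but when it is the first character inside square brackets it negates the character set. In the example below, [^We] matches any single character that is neither W nor e.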

    import re

    text_list = ['Wrfout_d02_2019080215.nc', 'wrfout_d02_2019080615.nc', 'wrfout_d03_2019080715.nc',
                 'WRFCHEM_d02_2019081213.nc', 'wrfout_d01_2019080215.nc', 'erfout_d01_2019080215.nc']
    # regex = '^wrfout_d0[0-3]_2019[0-9]{6}.nc$'
    # Note: the trailing * applies only to the 9 before it.
    regex = '[^We]rfout_d0[0-3]_2019*'
    for each_file in text_list:
        if re.search(regex, each_file) is not None:
            print(each_file)
            print('Match:', re.findall(regex, each_file))
        else:
            print('%s not match anything' % each_file)

Output:

    Wrfout_d02_2019080215.nc not match anything
    wrfout_d02_2019080615.nc
    Match: ['wrfout_d02_2019']
    wrfout_d03_2019080715.nc
    Match: ['wrfout_d03_2019']
    WRFCHEM_d02_2019081213.nc not match anything
    wrfout_d01_2019080215.nc
    Match: ['wrfout_d01_2019']
    erfout_d01_2019080215.nc not match anything
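A negated set is also a convenient way to cut a string into fields. The sketch below is one possible use, not something taken from the original workflow: it splits a file name on underscores and dots by matching runs of everything else.

```python
import re

filename = 'wrfout_d02_2019080215.nc'

# [^_.]+ matches a run of characters that are neither '_' nor '.'.
# Inside square brackets the dot is a literal character.
fields = re.findall(r'[^_.]+', filename)

print(fields)
```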

Another application:

    import re

    def extract_size(arow, item_col='item_name'):
        r"""Extract the product size (e.g. XL, XXL) from the product name.

        Examples:
            re.findall('XXL|xxl', '吉氏.輕柔學行褲嬰兒紙尿褲XXL碼')
            re.findall(r'\W+(L)|\W+(l)', '吉氏.輕柔學行褲嬰兒紙尿褲XXL碼')
            vipsale['SIZE'] = vipsale.apply(extract_size, axis=1)
        """
        item_value = arow[item_col]
        my_reg_baby = "nb|Nb|nB|NB|NEW BABY|new baby|嬰兒"
        my_reg_S = "S|s|小碼|小號|小"
        my_reg_M = "M|m|中碼|中號|中"
        # [^a-zA-Z]+ requires a non-letter before the size, so the L in XL is skipped
        my_reg_L = "[^a-zA-Z]+(L)|[^a-zA-Z]+(l)|大號|大"
        my_reg_XL = "[^a-zA-Z]+(XL)|[^a-zA-Z]+(xl)|[^a-zA-Z]+(xL)|[^a-zA-Z]+(Xl)|[^\u4e00-\u9fa5]+(加大碼)|[^\u4e00-\u9fa5]+(加大號)|加大"
        my_reg_XXL = "XXL|xxl|Xxl|XXl|xxL|xXL|xXl|加加大碼|加加大號|加加大"  # alternatively "(?i)xxl"
        if re.findall(my_reg_baby, item_value):
            size = 'NB'
        elif re.findall(my_reg_S, item_value):
            size = 'S'
        elif re.findall(my_reg_M, item_value):
            size = 'M'
        elif re.findall(my_reg_L, item_value):
            size = 'L'
        elif re.findall(my_reg_XL, item_value):
            size = 'XL'
        elif re.findall(my_reg_XXL, item_value):
            size = 'XXL'
        else:
            size = '未知'  # unknown
        return size

2.3 Repetition

The metacharacters +, * and ? follow a subpattern and specify how many times it may occur. Each of them has a different meaning.

2.3.1 The * quantifier

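The * quantifier matches zero or more occurrences of the preceding character. In the example below, the expression wrf*out means w, followed by r, followed by zero or more f, followed by o, u and t in turn.

'wrffffout_d02_2019080215.nc' matches with four f's. In 'wrout_d02_2019080615.nc' there are zero f's after the r, so it matches too.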

    import re

    text_list = ['wrffffout_d02_2019080215.nc', 'wrout_d02_2019080615.nc', 'wrfout_d03_2019080715.nc',
                 'WRFCHEM_d02_2019081213.nc']
    regex = 'wrf*out'
    for each_file in text_list:
        if re.search(regex, each_file) is not None:
            print(each_file)
            print('Match:', re.findall(regex, each_file))
        else:
            print('%s not match anything' % each_file)

Output:

    wrffffout_d02_2019080215.nc
    Match: ['wrffffout']
    wrout_d02_2019080615.nc
    Match: ['wrout']
    wrfout_d03_2019080715.nc
    Match: ['wrfout']
    WRFCHEM_d02_2019081213.nc not match anything

2.3.2 The + quantifier

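The + quantifier matches one or more occurrences of the preceding character. For example, the expression c.+t matches a string that starts with c, ends with t, and has at least one character in between. Note the difference from the * in 2.3.1: in 'wrout_d02_2019080615.nc' there is no f after wr, so it does not match.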

    import re

    text_list = ['wrffffout_d02_2019080215.nc', 'wrout_d02_2019080615.nc', 'wrfout_d03_2019080715.nc',
                 'WRFCHEM_d02_2019081213.nc']
    regex = 'wrf+out'
    for each_file in text_list:
        if re.search(regex, each_file) is not None:
            print(each_file)
            print('Match:', re.findall(regex, each_file))
        else:
            print('%s not match anything' % each_file)

Output:

    wrffffout_d02_2019080215.nc
    Match: ['wrffffout']
    wrout_d02_2019080615.nc not match anything
    wrfout_d03_2019080715.nc
    Match: ['wrfout']
    WRFCHEM_d02_2019081213.nc not match anything

2.3.3 The ? quantifier

The metacharacter ? marks the preceding character as optional, that is, it occurs 0 or 1 times. For example, the expression [w]?rf matches the strings rf and wrf.

    import re

    text = ['wrfout_d02_2019080215.nc', 'Wrfout_d02_2019080615.nc', 'wrfout_d03_2019080715.nc',
            'WRFCHEM_d02_2019081213.nc', 'rfout_d02_2019080215.nc']
    regex = '[wW]?rfout_d0[1-3].+nc'
    for each_file in text:
        if re.search(regex, each_file) is not None:
            print(each_file)
            print('Match:', re.findall(regex, each_file))
        else:
            print('%s not match anything' % each_file)

Output:

    wrfout_d02_2019080215.nc
    Match: ['wrfout_d02_2019080215.nc']
    Wrfout_d02_2019080615.nc
    Match: ['Wrfout_d02_2019080615.nc']
    wrfout_d03_2019080715.nc
    Match: ['wrfout_d03_2019080715.nc']
    WRFCHEM_d02_2019081213.nc not match anything
    rfout_d02_2019080215.nc
    Match: ['rfout_d02_2019080215.nc']
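The three quantifiers are easiest to tell apart side by side. The sketch below runs them over the same three candidates; the helper name matching is hypothetical and exists only for this illustration. It uses re.fullmatch so that the whole string has to fit the pattern.

```python
import re

candidates = ['wrout', 'wrfout', 'wrffout']

def matching(pattern):
    """Return the candidates that the pattern matches in full."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

star = matching(r'wrf*out')      # zero or more 'f'
plus = matching(r'wrf+out')      # one or more 'f'
optional = matching(r'wrf?out')  # zero or one 'f'

print(star)
print(plus)
print(optional)
```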

2.4 The {} quantifier

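{} is a quantifier that specifies how many times a character or a group of characters may repeat. For example, the expression [0-9]{4,10} matches a run of at least 4 and at most 10 digits from 0 to 9, while [0-9]{2} matches exactly two digits.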

    import re

    text = ['The number was 9.9997 but we rounded it off to 10.0.', 'Wrfout_d02_2019080615.nc',
            'wrfout_d03_2019080715.nc', 'WRFCHEM_d02_2019081213.nc', 'rfout_d02_2019080215.nc',
            'WRFCHEM_d02_2019.nc']
    regex = r'd[0-9]{2}_+[0-9]{4,10}\.nc'
    for each_file in text:
        if re.search(regex, each_file) is not None:
            print(each_file)
            print('Match:', re.findall(regex, each_file))
        else:
            print('%s not match anything' % each_file)

Output:

    The number was 9.9997 but we rounded it off to 10.0. not match anything
    Wrfout_d02_2019080615.nc
    Match: ['d02_2019080615.nc']
    wrfout_d03_2019080715.nc
    Match: ['d03_2019080715.nc']
    WRFCHEM_d02_2019081213.nc
    Match: ['d02_2019081213.nc']
    rfout_d02_2019080215.nc
    Match: ['d02_2019080215.nc']
    WRFCHEM_d02_2019.nc
    Match: ['d02_2019.nc']

Another application, extracting the piece count from a product name:

    import re

    def extract_pian(arow, item_col='item_name'):
        # Extract the piece count from the product name with a regular expression;
        # an unknown count is returned as -9999.
        # Usage: vip_sale_llk['片數'] = vip_sale_llk.apply(extract_pian, item_col='item_name', axis=1)
        try:
            ro = arow[item_col]
            # (?=片) is a lookahead: the 1 to 3 digits must be followed by 片 (pieces)
            pian = re.findall(r'\d{1,3}(?=片)', ro)
            pian = int(pian[0])
        except Exception:
            pian = -9999
        return pian
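The braces accept three forms: {n} for exactly n, {n,} for n or more, and {n,m} for a bounded range. A short sketch on made-up timestamp strings of 4, 8, 10 and 12 digits:

```python
import re

stamps = ['2019', '20190802', '2019080215', '201908021530']

# fullmatch requires the whole string to fit the pattern.
exactly_8 = [s for s in stamps if re.fullmatch(r'[0-9]{8}', s)]
at_least_8 = [s for s in stamps if re.fullmatch(r'[0-9]{8,}', s)]
from_8_to_10 = [s for s in stamps if re.fullmatch(r'[0-9]{8,10}', s)]

print(exactly_8)
print(at_least_8)
print(from_8_to_10)
```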

2.5 (...) Capturing groups

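A capturing group is a subpattern written inside (...). As noted above, {} sets the repeat count of the single character before it. If a group precedes the quantifier instead, the whole group is repeated. For example, the expression (ab)* matches zero or more consecutive occurrences of ab. We can also use the or operator | inside () to express alternatives.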

    import re

    text = ['erfout_d02_2019080615.nc', 'Wrfout_d02_2019080615.nc', 'wrfout_d03_2019080715.nc',
            'WRFCHEM_d02_2019081213.nc', 'rfout_d02_2019080215.nc']
    regex = '(W|w|e)rf'
    for each_file in text:
        if re.search(regex, each_file) is not None:
            print(each_file)
            # With a capturing group in the pattern, findall returns the group's text only.
            print('Match:', re.findall(regex, each_file))
        else:
            print('%s not match anything' % each_file)

Output:

    erfout_d02_2019080615.nc
    Match: ['e']
    Wrfout_d02_2019080615.nc
    Match: ['W']
    wrfout_d03_2019080715.nc
    Match: ['w']
    WRFCHEM_d02_2019081213.nc not match anything
    rfout_d02_2019080215.nc not match anything
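Because re.findall returns only the captured text when the pattern contains a group, the output above shows single letters rather than the whole match. Python's non-capturing form (?:...) groups without capturing. A minimal sketch of the difference:

```python
import re

name = 'Wrfout_d02_2019080615.nc'

# Capturing group: findall returns only what the group matched.
captured = re.findall(r'(W|w|e)rf', name)

# Non-capturing group: findall returns the whole match.
whole = re.findall(r'(?:W|w|e)rf', name)

# A quantifier after a group repeats the group as a unit.
repeated = re.findall(r'(?:ab)+', 'ab abab xab')

print(captured)
print(whole)
print(repeated)
```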

2.6 The | or operator

The or operator | expresses alternation: the match succeeds if either alternative matches.

    import re

    text = ['The car is parked in the garage.', 'Wrfout_d02_2019080615.nc', 'wrfout_d03_2019080715.nc',
            'WRFCHEM_d02_2019081213.nc', 'rfout_d02_2019080215.nc']
    regex = 'WRFCHEM|wrf'
    for each_file in text:
        if re.search(regex, each_file) is not None:
            print(each_file)
            print('Match:', re.findall(regex, each_file))
        else:
            print('%s not match anything' % each_file)

Output:

    The car is parked in the garage. not match anything
    Wrfout_d02_2019080615.nc not match anything
    wrfout_d03_2019080715.nc
    Match: ['wrf']
    WRFCHEM_d02_2019081213.nc
    Match: ['WRFCHEM']
    rfout_d02_2019080215.nc not match anything
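One detail worth checking is how far | reaches: without parentheses it splits the entire pattern. The sketch below contrasts an ungrouped and a grouped alternation on hypothetical file names:

```python
import re

# Hypothetical file names, used only for illustration.
names = ['wrfout_d01.nc', 'wrfout_d02.nc', 'wrfchem_d02.nc']

# Ungrouped: the alternatives are 'wrfout' and 'wrfchem_d02',
# so any name containing 'wrfout' matches, whatever the domain.
ungrouped = [n for n in names if re.search(r'wrfout|wrfchem_d02', n)]

# Grouped: '_d02' must follow either prefix.
grouped = [n for n in names if re.search(r'(?:wrfout|wrfchem)_d02', n)]

print(ungrouped)
print(grouped)
```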

2.7 Anchors

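To match strings with a specific beginning or end, use anchors: ^ marks the start and $ marks the end.

^ checks whether the match occurs at the start of the string being searched.

For example, applying the expression ^a to abc matches a, but ^b matches nothing, because the string abc does not start with b.

Likewise, $ checks whether the match occurs at the end of the string.

For example, (at\.)$ matches strings that end with at.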
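The anchors can be tried out as follows. By default ^ and $ refer to the whole string; Python's re.MULTILINE flag makes them apply to each line. The sample log text is a made-up, shortened variant of the opening example:

```python
import re

log = 'reading EC DATA\nsave the best model\nsave and evaluate'

# Without flags, ^ only matches at the very start of the string.
start_only = re.findall(r'^save', log)

# With re.MULTILINE, ^ matches at the start of every line.
per_line = re.findall(r'^save', log, flags=re.MULTILINE)

# $ anchors the match at the end of the string.
ends_with_nc = bool(re.search(r'\.nc$', 'wrfout_d01.nc'))
not_at_end = bool(re.search(r'\.nc$', 'wrfout_d01.nc.bak'))

print(start_only, per_line, ends_with_nc, not_at_end)
```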

2.8 Shorthand character sets

Regular expressions provide shorthand for some commonly used character sets, as follows:

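.	Any character except a newline
\w	Matches letters, digits and the underscore; for ASCII text, equivalent to [a-zA-Z0-9_]
\W	Matches everything that \w does not, i.e. symbols; equivalent to [^\w]
\d	Matches a digit: [0-9]
\D	Matches a non-digit: [^\d]
\s	Matches any whitespace character; in Python, for ASCII text, equivalent to [ \t\n\r\f\v]
\S	Matches any non-whitespace character: [^\s]
\f	Matches a form feed
\n	Matches a newline
\r	Matches a carriage return
\t	Matches a tab
\v	Matches a vertical tab
\p	Listed in [1] as matching CR/LF (equivalent to \r\n), for DOS line terminators; Python's re module does not support \p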
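A few of these shorthands in action, as a small sketch on a made-up log line:

```python
import re

line = '2018-01-16 09:14:35 reading EC_DATA'

digits = re.findall(r'\d+', line)  # runs of digits
words = re.findall(r'\w+', line)   # runs of letters, digits and '_'
fields = re.split(r'\s+', line)    # split on runs of whitespace

print(digits)
print(words)
print(fields)
```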

Looking back at the pattern '\d{4}-\d{2}-\d{2}' from the opening example, doesn't it look easy now? The \d there is simply shorthand for [0-9].

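With regular expressions we can also pick out the wrfout data for 15:00 each day (wrf.+15\.nc), or the data for August of a given year (the same approach applies, so we won't list every case).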

    import re

    text_list = ['wrffffout_d02_2019080215.nc', 'wrout_d02_2019080615.nc', 'wrfout_d03_2019080715.nc',
                 'WRFCHEM_d02_2019081213.nc']
    regex = r'wrf.+15\.nc'
    for each_file in text_list:
        if re.search(regex, each_file) is not None:
            print(each_file)
            print('Match:', re.findall(regex, each_file))
        else:
            print('%s not match anything' % each_file)

Output:

    wrffffout_d02_2019080215.nc
    Match: ['wrffffout_d02_2019080215.nc']
    wrout_d02_2019080615.nc not match anything
    wrfout_d03_2019080715.nc
    Match: ['wrfout_d03_2019080715.nc']
    WRFCHEM_d02_2019081213.nc not match anything
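The August case mentioned above could look like the sketch below. It assumes file names of the form wrfout_dNN_YYYYMMDDHH.nc, as in the examples; the file list itself is made up.

```python
import re

# Hypothetical file names following the wrfout_dNN_YYYYMMDDHH.nc convention.
files = ['wrfout_d02_2019080215.nc', 'wrfout_d02_2019090615.nc',
         'wrfout_d03_2018081200.nc', 'wrfout_d01_2019081503.nc']

# Year 2019, month 08, then two digits each for the day and the hour.
august_2019 = re.compile(r'^wrfout_d0[1-3]_201908\d{2}\d{2}\.nc$')
selected = [f for f in files if august_2019.search(f)]

print(selected)
```

Anchoring with ^ and $ keeps a name such as wrfout_d02_2019080215.nc.bak from slipping through.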

References

[1] https://github.com/ziishaned/learn-regex/blob/master/translations/README-cn.md
[2] https://github.com/deepwindlee/MySQL-with-Python-DATA-MINING/blob/master/0814%E5%AD%A6%E4%B9%A0%E6%AD%A3%E5%88%99%E8%A1%A8%E8%BE%BE%E5%BC%8F.ipynb
