1. Using the grep() and gsub() functions in R

Published: 2023/12/14 · Programming Q&A · 32 · 豆豆

This article, collected and compiled by 生活随笔, introduces the grep() and gsub() functions in R and is shared here for reference.

1. The grep() function

1) Syntax

grep(pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE, fixed = FALSE, useBytes = FALSE, invert = FALSE)
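The arguments are:
(1) pattern: a character string containing the regular expression to search for. When fixed = TRUE, it is a literal string to be matched as is.
(2) x: the character vector to be searched.
(3) ignore.case: whether to ignore case. If FALSE (the default), matching is case-sensitive; if TRUE, case is ignored.
(4) perl: whether to use Perl-compatible regular expressions.
(5) value: logical. If FALSE, grep returns the indices of the matching elements; if TRUE, it returns the matching elements themselves.
(6) fixed: logical. If TRUE, pattern is matched literally as a plain string, and any conflicting arguments are overridden.
(7) useBytes: logical. If TRUE, matching is done byte by byte rather than character by character.
(8) invert: logical. If TRUE, returns the indices or values of the elements that do NOT match, i.e. an inverted search.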
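To see how value, ignore.case, fixed and invert change what grep() returns, here is a minimal sketch on a small vector. The vector fruits is a hypothetical example, not taken from the original article.

```r
# Hypothetical example vector (not from the original article)
fruits <- c("Apple", "banana", "cherry", "a.b")

grep("an", fruits)                                       # index of the match: 2
grep("an", fruits, value = TRUE)                         # the matching element: "banana"
grep("apple", fruits, ignore.case = TRUE, value = TRUE)  # case-insensitive: "Apple"
grep(".", fruits, value = TRUE)                          # "." is a regex: matches all four elements
grep(".", fruits, fixed = TRUE, value = TRUE)            # literal ".": only "a.b"
grep("an", fruits, invert = TRUE, value = TRUE)          # non-matches: "Apple" "cherry" "a.b"
```

Note how the same pattern "." behaves completely differently depending on fixed.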

2) Case study

(1) From gene1 to gene40, extract the genes containing 3, the genes ending in 3, the genes not ending in 3, and the genes ending in 3 other than gene3.

geen <- paste0("gene", 1:40)   # or str_c("gene", 1:40), which requires library(stringr)

# 1. Genes containing 3
geen[grep("3", geen)]          # or grep("3", geen, value = TRUE)
# [1] "gene3"  "gene13" "gene23" "gene30" "gene31"
# [6] "gene32" "gene33" "gene34" "gene35" "gene36"
# [11] "gene37" "gene38" "gene39"

# 2. Genes ending in 3
geen[grep("3$", geen)]         # or grep("3$", geen, value = TRUE)
# [1] "gene3"  "gene13" "gene23" "gene33"

# 3. Genes not ending in 3
geen[-grep("3$", geen)]        # or grep("3$", geen, invert = TRUE, value = TRUE)
# [1] "gene1"  "gene2"  "gene4"  "gene5"  "gene6"
# [6] "gene7"  "gene8"  "gene9"  "gene10" "gene11"
# [11] "gene12" "gene14" "gene15" "gene16" "gene17"
# [16] "gene18" "gene19" "gene20" "gene21" "gene22"
# [21] "gene24" "gene25" "gene26" "gene27" "gene28"
# [26] "gene29" "gene30" "gene31" "gene32" "gene34"
# [31] "gene35" "gene36" "gene37" "gene38" "gene39"
# [36] "gene40"

# 4. Genes ending in 3, excluding gene3
grep("[0-9]3$", geen, value = TRUE)
# or: setdiff(grep("3$", geen, value = TRUE), "gene3")
# [1] "gene13" "gene23" "gene33"
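One caveat about the negative-index form geen[-grep(...)]: when nothing matches, grep() returns integer(0), and indexing with -integer(0) selects no elements at all rather than all of them. A minimal sketch, using the hypothetical pattern "x$" that no gene name matches:

```r
geen <- paste0("gene", 1:40)

idx <- grep("x$", geen)           # no name ends in "x", so idx is integer(0)
kept_neg <- geen[-idx]            # -integer(0) is still integer(0): selects nothing
kept_inv <- grep("x$", geen, invert = TRUE, value = TRUE)  # keeps all 40 names
kept_lgl <- geen[!grepl("x$", geen)]                       # also keeps all 40 names

length(kept_neg)   # 0
length(kept_inv)   # 40
length(kept_lgl)   # 40
```

The invert = TRUE and !grepl() forms give the intended result whether or not anything matches, so they are the safer choice when the pattern might match nothing.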

3) grep vs. grepl

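1. Syntax
grep(pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE, fixed = FALSE, useBytes = FALSE, invert = FALSE)
grepl(pattern, x, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE)

2. Return values
grep(): searches the vector x for elements that match pattern and returns their indices in x (or the elements themselves when value = TRUE).
grepl(): returns a logical vector (TRUE/FALSE) of the same length as x, indicating whether each element matches pattern. It has no value or invert argument.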
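A small sketch of the two return types side by side; the vector x is a hypothetical example. which(grepl(...)) recovers the same indices that grep() returns, and the logical form is convenient for combining conditions with & or |.

```r
x <- c("gene3", "gene10", "gene13")   # hypothetical example vector

grep("3$", x)                         # indices: 1 3
grepl("3$", x)                        # logical: TRUE FALSE TRUE
which(grepl("3$", x))                 # same indices as grep(): 1 3
x[grepl("3$", x)]                     # same as grep("3$", x, value = TRUE): "gene3" "gene13"
x[grepl("3$", x) & x != "gene3"]      # combine conditions: "gene13"
```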

2. The gsub() function

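gsub() can be used to delete, insert, replace and trim parts of strings. It works on a single string as well as on a character vector, in which case every element is processed.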
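gsub() can delete, insert, replace or trim parts of strings, and it applies to every element of a vector. A minimal sketch of each kind of edit; the vector ids is hypothetical, not from the original article.

```r
ids <- c("gene_01", "gene_02", "tx_03")   # hypothetical IDs

gsub("_", "", ids)                # delete:  "gene01" "gene02" "tx03"
gsub("([0-9]+)$", "id\\1", ids)   # insert via a backreference: "gene_id01" "gene_id02" "tx_id03"
gsub("_", "-", ids)               # replace: "gene-01" "gene-02" "tx-03"
gsub("_.*$", "", ids)             # trim everything from "_" onward: "gene" "gene" "tx"
```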

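1. Usage: gsub(pattern, replacement, x). The full signature is gsub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE).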

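text1 <- "ABcdEfgh . ljkl MNNM"
gsub("Efg", "RRR", text1)          # replace "Efg" with "RRR" (case-sensitive)
# [1] "ABcdRRRh . ljkl MNNM"
# Any character can be matched, including spaces, tabs and newlines
gsub(" l", "q", text1)             # the pattern may contain a space
# [1] "ABcdEfgh .qjkl MNNM"
# Every occurrence is replaced, not just the first
gsub("M", "O", text1)
# [1] "ABcdEfgh . ljkl ONNO"
# Regular expressions allow more elaborate rewrites
gsub("^.*l(j).*$", "\\1", text1)   # keep only the captured "j"
# [1] "j"
gsub("^.* ", "a", text1)           # replace everything from the start up to the last space with "a" (note the space before the closing quote)
# [1] "aMNNM"
gsub(" .*", "a", text1)            # replace everything from the first space to the end with "a"
# [1] "ABcdEfgha"
gsub("\\..*", "\\+", text1)        # "." is a regex metacharacter, so "\\." is needed to match a literal dot; in the replacement "+" is not special, so "\\+" and "+" give the same result
# [1] "ABcdEfgh +"
gsub("\\ ..*", "", text1)          # "\\ " is just a literal space: delete from the first space onward
# [1] "ABcdEfgh"
gsub("\\.", "\\+", text1)
# [1] "ABcdEfgh + ljkl MNNM"
gsub("\\s", "a", text1)            # \\s matches any whitespace character
# [1] "ABcdEfgha.aljklaMNNM"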
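Instead of escaping metacharacters, the fixed and ignore.case arguments described for grep() can also be passed to gsub(). A short sketch on the same text1:

```r
text1 <- "ABcdEfgh . ljkl MNNM"

gsub(".", "+", text1, fixed = TRUE)            # literal dot: "ABcdEfgh + ljkl MNNM"
gsub("\\.", "+", text1)                        # same result with an escaped regex
gsub(".", "+", text1)                          # unescaped regex ".": every character becomes "+"
gsub("efg", "RRR", text1, ignore.case = TRUE)  # case-insensitive: "ABcdRRRh . ljkl MNNM"
```

fixed = TRUE is usually the simpler option when the pattern is a plain string that happens to contain characters such as ".", "+" or "(".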

2. Special characters

Backslash escapes below are written as they appear inside an R string literal (e.g. "\\d"). The POSIX classes such as [:digit:] must themselves be placed inside a bracket expression, e.g. "[[:digit:]]".

Syntax        Description
\\d           Digit, 0, 1, 2 ... 9
\\D           Not a digit
\\s           Whitespace character (space, tab, newline, ...)
\\S           Not whitespace
\\w           Word character (letter, digit or underscore)
\\W           Not a word character
\\t           Tab
\\n           Newline
^             Beginning of the string
$             End of the string
\\            Escapes a special character, e.g. "\\+" matches "+" and "\\\\" matches a single "\"
|             Alternation, e.g. (e|d)n matches "en" and "dn"
.             Any single character (with perl = TRUE, a newline is not matched by default)
[ab]          a or b
[^ab]         Any character except a and b
[0-9]         Any digit
[A-Z]         Any uppercase letter A to Z
[a-z]         Any lowercase letter a to z
[A-Za-z]      Any uppercase or lowercase letter (avoid [A-z]: it also matches [ \ ] ^ _ `)
i+            i one or more times
i*            i zero or more times
i?            i zero or one time
i{n}          i exactly n times in sequence
i{n1,n2}      i n1 to n2 times in sequence
i{n1,n2}?     Non-greedy (minimal) version of i{n1,n2}: as few repetitions as possible
i{n,}         i n or more times
[:alnum:]     Alphanumeric characters: [:alpha:] and [:digit:]
[:alpha:]     Alphabetic characters: [:lower:] and [:upper:]
[:blank:]     Blank characters, e.g. space and tab
[:cntrl:]     Control characters
[:digit:]     Digits: 0 1 2 3 4 5 6 7 8 9
[:graph:]     Graphical characters: [:alnum:] and [:punct:]
[:lower:]     Lower-case letters in the current locale
[:print:]     Printable characters: [:alnum:], [:punct:] and space
[:punct:]     Punctuation characters: ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~
[:space:]     Space characters: tab, newline, vertical tab, form feed, carriage return, space
[:upper:]     Upper-case letters in the current locale
[:xdigit:]    Hexadecimal digits: 0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f
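A few of these constructs in action, as a sketch on a hypothetical vector x and string tag (neither is from the original article). The last two lines contrast greedy and non-greedy repetition.

```r
x <- c("gene13", "Gene 7", "tx_2a")   # hypothetical example vector

gsub("\\d", "#", x)             # "gene##" "Gene #" "tx_#a"
gsub("[[:digit:]]", "#", x)     # same result; the POSIX class sits inside [ ]
gsub("\\s", "", x)              # "gene13" "Gene7" "tx_2a"
grepl("^[A-Za-z]+[0-9]+$", x)   # TRUE FALSE FALSE
gsub("[0-9]{2}", "NN", x)       # "geneNN" "Gene 7" "tx_2a"

# Greedy vs. non-greedy (minimal) repetition
tag <- "<a>text<b>"
gsub("<.+>", "", tag)           # greedy: one match spans the whole string, result ""
gsub("<.+?>", "", tag)          # non-greedy: "<a>" and "<b>" removed separately, result "text"
```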

3. What is the difference between sub() and gsub()?

text <- c("we are the world", "we are the children")
sub("w", "W", text)    # the first sentence contains two "w"s, but sub() replaces only the first match in each element
# [1] "We are the world" "We are the children"
sub("W", "w", text)    # there is no "W", so nothing changes
# [1] "we are the world" "we are the children"
gsub("W", "w", text)   # gsub() replaces every match
# [1] "we are the world" "we are the children"
gsub("w", "W", text)
# [1] "We are the World" "We are the children"

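1. sub() replaces only the first match in each element, whereas gsub() replaces every match.
2. gsub() searches each element of the vector and, if an element contains several matches, replaces all of them. grep() also searches each element, but it only reports whether an element matches (returning its index in the vector); it does not tell you how many times the pattern occurs within that element.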
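When you do need the number of matches per element, base R's gregexpr() together with regmatches() can provide it. A minimal sketch on the same text vector:

```r
text <- c("we are the world", "we are the children")

grepl("w", text)               # TRUE TRUE: only whether each element matches
m <- gregexpr("w", text)       # positions of every match in each element
lengths(regmatches(text, m))   # number of matches per element: 2 1
```

The counts 2 and 1 correspond to the number of replacements gsub("w", "W", text) makes in each sentence.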
