1. Using the grep() and gsub() functions in R
1. The grep function
1) Syntax
grep(pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE, fixed = FALSE, useBytes = FALSE, invert = FALSE)
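The arguments have the following meanings:
(1) pattern: a character string containing the regular expression to search for. When fixed = TRUE, it is treated as a literal string instead.
(2) x: the character vector to be searched.
(3) ignore.case: whether to ignore case. FALSE means the match is case-sensitive; TRUE means case is ignored.
(4) perl: whether to use Perl-compatible regular expressions.
(5) value: logical. If FALSE, grep returns the positions (indices) of the matching elements; if TRUE, it returns the matching elements themselves.
(6) fixed: logical. If TRUE, pattern is matched exactly as written, and any conflicting argument settings are overridden.
(7) useBytes: logical. If TRUE, matching is done byte by byte rather than character by character.
(8) invert: logical. If TRUE, the indices or values of the elements that do not match are returned, i.e. an inverted search.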
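As a minimal sketch of how some of these arguments change the result, consider the calls below. The vector x is hypothetical sample data made up for illustration and does not come from the examples in this article.

```r
# Hypothetical sample data (an assumption for illustration only)
x <- c("Gene.1", "gene_2", "GENE3", "other")

# Default: case-sensitive, returns positions
pos_default <- grep("gene", x)

# ignore.case = TRUE: "Gene", "gene" and "GENE" all match
pos_nocase <- grep("gene", x, ignore.case = TRUE)

# value = TRUE: return the matching elements instead of their positions
val_nocase <- grep("gene", x, ignore.case = TRUE, value = TRUE)

# invert = TRUE: return the elements that do NOT match
val_invert <- grep("gene", x, ignore.case = TRUE, invert = TRUE, value = TRUE)

# fixed = TRUE: "." is a literal dot, not "any character"
val_fixed <- grep(".", x, fixed = TRUE, value = TRUE)

# Without fixed = TRUE, "." is a regex that matches every non-empty string
val_regex <- grep(".", x, value = TRUE)

print(pos_default)
print(val_fixed)
```

Comparing val_fixed with val_regex shows why fixed = TRUE is useful when the search string contains regular-expression metacharacters.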
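2) Worked example
(1) From gene1 to gene40, extract the genes whose name ends in 3, the genes whose name does not end in 3, and the genes whose name ends in 3 but is not gene3.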
genes <- paste0("gene", 1:40)   # or str_c("gene", 1:40), which requires library(stringr)

# 1. Genes whose name contains a 3
genes[grep("3", genes)]         # or grep("3", genes, value = TRUE)
# [1] "gene3"  "gene13" "gene23" "gene30" "gene31"
# [6] "gene32" "gene33" "gene34" "gene35" "gene36"
# [11] "gene37" "gene38" "gene39"

# 2. Genes whose name ends in 3
genes[grep("3$", genes)]        # or grep("3$", genes, value = TRUE)
# [1] "gene3"  "gene13" "gene23" "gene33"

# 3. Genes whose name does not end in 3
genes[-grep("3$", genes)]       # or grep("3$", genes, invert = TRUE, value = TRUE)
# [1] "gene1"  "gene2"  "gene4"  "gene5"  "gene6"
# [6] "gene7"  "gene8"  "gene9"  "gene10" "gene11"
# [11] "gene12" "gene14" "gene15" "gene16" "gene17"
# [16] "gene18" "gene19" "gene20" "gene21" "gene22"
# [21] "gene24" "gene25" "gene26" "gene27" "gene28"
# [26] "gene29" "gene30" "gene31" "gene32" "gene34"
# [31] "gene35" "gene36" "gene37" "gene38" "gene39"
# [36] "gene40"

# 4. Genes whose name ends in 3 but is not gene3
grep("[0-9]3$", genes, value = TRUE)
# or setdiff(grep("3$", genes, value = TRUE), "gene3")
# [1] "gene13" "gene23" "gene33"

3) Difference between grep and grepl
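A small sketch can make the contrast between the two functions concrete. It uses a short hypothetical vector (gene1 to gene5, chosen only for brevity) and also shows one reason to prefer grepl or invert = TRUE over negative indexing: when nothing matches, grep returns an empty index, and negating an empty index selects nothing at all.

```r
genes <- paste0("gene", 1:5)   # short hypothetical vector for illustration

idx  <- grep("3$", genes)      # integer positions of the matches
flag <- grepl("3$", genes)     # one TRUE/FALSE per element

# Both can be used to subset, and give the same elements
by_index <- genes[idx]
by_flag  <- genes[flag]

# Pitfall: no element ends in 9, so grep() returns integer(0) ...
none <- grep("9$", genes)
# ... and genes[-integer(0)] drops everything instead of keeping everything
dropped_all <- genes[-none]

# Safer ways to keep the non-matching elements
kept_grepl  <- genes[!grepl("9$", genes)]
kept_invert <- grep("9$", genes, invert = TRUE, value = TRUE)

print(idx)
print(flag)
print(dropped_all)
print(kept_grepl)
```

Because grepl always returns a vector of the same length as its input, it is also the more convenient choice inside ifelse() or when filtering data frame rows.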
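1. Syntax
grep(pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE, fixed = FALSE, useBytes = FALSE, invert = FALSE)
grepl(pattern, x, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE)
2. Return value
grep: searches the vector x for elements that match the pattern and returns their indices in x.
grepl: returns a logical vector (TRUE, FALSE) with one entry per element of x, indicating whether that element matches the pattern.
2. The gsub() function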
gsub() can be used to delete, add, replace, and cut parts of a text field. It can process a single string or a vector of strings.
1. Usage: gsub("pattern", "replacement", x)
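To make the four uses named above concrete (deleting, adding, replacing, and cutting), here is a minimal sketch. The string s and the vector v are hypothetical samples invented for illustration.

```r
s <- "sample_01_control"   # hypothetical sample string

# Delete: replace every underscore with the empty string
deleted <- gsub("_", "", s)

# Replace: swap every underscore for a hyphen
replaced <- gsub("_", "-", s)

# Add: capture the run of digits and re-insert it with a prefix
added <- gsub("(\\d+)", "no\\1", s)

# Cut: keep only the part before the first underscore ...
head_part <- gsub("_.*$", "", s)
# ... or only the part after the last underscore
tail_part <- gsub("^.*_", "", s)

# gsub() is vectorised: every element of a character vector is processed
v <- c("a_b", "c_d_e")
v_replaced <- gsub("_", "-", v)

print(c(deleted, replaced, added, head_part, tail_part))
print(v_replaced)
```

The "add" case relies on a backreference: \\1 in the replacement stands for whatever the first parenthesised group matched.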
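text1 <- "ABcdEfgh . ljkl MNNM"

gsub("Efg", "RRR", text1)          # change Efg to RRR; matching is case-sensitive
# [1] "ABcdRRRh . ljkl MNNM"

# Any character, including spaces, tabs and newlines, can be part of the pattern
gsub(" l", "q", text1)             # the leading space is matched too
# [1] "ABcdEfgh .qjkl MNNM"

# Every occurrence is replaced, so one call performs a bulk substitution
gsub("M", "O", text1)
# [1] "ABcdEfgh . ljkl ONNO"

# Beyond literal replacement, gsub supports pattern-based operations
gsub("^.*l(j).*$", "\\1", text1)   # keep only the captured j
# [1] "j"

gsub("^.* ", "a", text1)           # replace everything from the start through the last space with a (note the space before the closing quote of "^.* ")
# [1] "aMNNM"

gsub(" .*", "a", text1)            # replace everything from the first space to the end with a
# [1] "ABcdEfgha"

gsub("\\..*", "\\+", text1)        # the period is a metacharacter in the pattern and must be escaped with \\; the + in the replacement would also work unescaped
# [1] "ABcdEfgh +"

gsub("\\ ..*", "", text1)          # delete everything from the first space onward
# [1] "ABcdEfgh"

gsub("\\.", "\\+", text1)          # replace the literal period with +
# [1] "ABcdEfgh + ljkl MNNM"

gsub("\\s", "a", text1)            # replace every whitespace character with a
# [1] "ABcdEfgha.aljklaMNNM"

2. Special characters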
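A brief sketch of how a few of the metacharacters and character classes covered in this section behave inside gsub(). The sample strings are made up for illustration, and the calls use R's default regular-expression engine (no perl = TRUE).

```r
# Hypothetical sample strings (assumptions for illustration)
id_str   <- "id 42"
tab_str  <- "tab\there"                   # contains a real tab character
mixed    <- "A_b-9"
html_str <- "<b>bold</b> and <i>it</i>"

# \\d and the POSIX class [[:digit:]] both match a single digit
digits_a <- gsub("\\d", "#", id_str)
digits_b <- gsub("[[:digit:]]", "#", id_str)

# \\s matches whitespace, including a tab
no_tab <- gsub("\\s", "_", tab_str)

# A negated bracket expression: remove everything that is not a letter
letters_only <- gsub("[^A-Za-z]", "", mixed)

# Greedy: .* runs from the first "<" to the LAST ">"
greedy <- gsub("<.*>", "", html_str)

# Non-greedy: .*? stops at the NEXT ">", so each tag is removed separately
lazy <- gsub("<.*?>", "", html_str)

print(c(digits_a, no_tab, letters_only))
print(c(greedy, lazy))
```

Note that POSIX classes such as [:digit:] are only valid inside a bracket expression, which is why the pattern is written with double brackets as [[:digit:]].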
Syntax       Description
\\d          Digit: 0, 1, 2 ... 9
\\D          Not a digit
\\s          Whitespace
\\S          Not whitespace
\\w          Word character
\\W          Not a word character
\\t          Tab
\\n          Newline
^            Beginning of the string
$            End of the string
\            Escapes special characters, e.g. \\ is "\", \+ is "+"
|            Alternation, e.g. (e|d)n matches "en" and "dn"
.            Any single character (whether it matches a newline depends on the regex engine)
[ab]         a or b
[^ab]        Any character except a and b
[0-9]        Any digit
[A-Z]        Any uppercase letter from A to Z
[a-z]        Any lowercase letter from a to z
[A-Za-z]     Any uppercase or lowercase letter (in ASCII order, [A-z] also matches [ \ ] ^ _ `)
i+           i at least one time
i*           i zero or more times
i?           i zero or one time
i{n}         i occurs n times in sequence
i{n1,n2}     i occurs n1 to n2 times in sequence
i{n1,n2}?    Non-greedy match: as few repetitions as possible
i{n,}        i occurs >= n times

The POSIX character classes below are used inside a bracket expression, e.g. [[:digit:]].

[:alnum:]    Alphanumeric characters: [:alpha:] and [:digit:]
[:alpha:]    Alphabetic characters: [:lower:] and [:upper:]
[:blank:]    Blank characters: e.g. space, tab
[:cntrl:]    Control characters
[:digit:]    Digits: 0 1 2 3 4 5 6 7 8 9
[:graph:]    Graphical characters: [:alnum:] and [:punct:]
[:lower:]    Lower-case letters in the current locale
[:print:]    Printable characters: [:alnum:], [:punct:] and space
[:punct:]    Punctuation characters: ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~
[:space:]    Space characters: tab, newline, vertical tab, form feed, carriage return, space
[:upper:]    Upper-case letters in the current locale
[:xdigit:]   Hexadecimal digits: 0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f

3. What is the difference between sub() and gsub()?
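text <- c("we are the world", "we are the children")

sub("w", "W", text)     # the first sentence contains two w's, but sub() replaces only the first match
# [1] "We are the world"    "We are the children"

sub("W", "w", text)     # no uppercase W is present, so nothing changes
# [1] "we are the world"    "we are the children"

gsub("W", "w", text)    # same here: there is nothing to replace
# [1] "we are the world"    "we are the children"

gsub("w", "W", text)    # gsub() replaces every match
# [1] "We are the World"    "We are the children"

1. The difference between sub() and gsub() is that sub() replaces only the first match in each string, while gsub() replaces all matches.
2. gsub() searches every element of the vector and, if an element matches the pattern at several positions, replaces all of them. grep() also searches every element of the vector, but it only reports whether an element matches the pattern (returning that element's index in the vector); it cannot tell how many times the pattern matched within that element.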
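Since grep() cannot report how many times a pattern occurs within an element, one possible way to count occurrences is base R's gregexpr(), which returns the start position of every match. The sketch below is only one approach; the third element of text is an extra hypothetical string added to show the no-match case.

```r
# The two sentences from the example, plus a hypothetical third string with no "w"
text <- c("we are the world", "we are the children", "no match here")

# grep() only says which elements match at least once
matched_idx <- grep("w", text)

# gregexpr() returns, per element, the start positions of all matches
# (or -1 when there is no match)
m <- gregexpr("w", text)

# Count positions greater than 0, so the -1 "no match" marker counts as zero
counts <- sapply(m, function(pos) sum(pos > 0))

# Equivalent count via regmatches(), which extracts the matched substrings
counts_alt <- lengths(regmatches(text, m))

print(matched_idx)
print(counts)
```

The counts line up with the sub() and gsub() results: the first sentence has two matches, so sub() and gsub() give different results for it, while the second has one, so they agree.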