當前位置：首頁 > 编程资源 > 编程问答 >内容正文

编程问答

字符串匹配算法（BF RK）

發布時間：2024/7/5 编程问答 27 豆豆

生活随笔收集整理的這篇文章主要介紹了字符串匹配算法（BF RK）小編覺得挺不錯的,現在分享給大家,幫大家做個參考.

文章目錄

- 1. BF（Brute Force）暴力匹配
- - BF代碼
- 2. RK（Rabin-Karp）算法
- - RK代碼
- 3. 思考題：（二維匹配）

1. BF（Brute Force）暴力匹配

BF算法的思想，在主串中，檢查起始位置分別是0、1、2…n-m且長度為m的n-m+1個子串，看有沒有跟模式串匹配的。最壞情況下每次都要對比m個字符，對比次數n-m+1次，復雜度O(m*n)，適用小規模字符串匹配

BF代碼

/*** @description: BF暴力匹配* @author: michael ming* @date: 2019/6/17 20:11* @modified by: */ #include <string> #include <iostream> using namespace std; int str_BFM(string s, int pos, string t) {if(s.length()== 0 || t.length() == 0)return 0;int i = pos - 1, j = 0;while(i < s.length() && j < t.length()){if(s[i] == t[j]){i++;j++;}else//字符串匹配失敗，主串查找開始位置i+1，模式串從頭開始{i = i - j + 1;j = 0;}}if(j >= t.length())return i-j+1;elsereturn 0; } int main() {string a = "ababcabcacbab", b = "abcac";cout << a << "中第一次出現" << b << "的位置是：" << str_BFM(a,1,b) << endl;return 0; }

2. RK（Rabin-Karp）算法

上面BF算法，每次檢查主串與子串是否匹配，需要逐次對比每個字符
引入哈希，降低復雜度
RK算法思路：對n-m+1個子串分別求哈希值，然后與模式串的哈希值比較；如果某個子串的哈希值和模式串的哈希值匹配（需要考慮哈希沖突），比較數字是否相等是非常快的，所以效率比BF效率高
But, 計算子串的哈希值的時候，需要遍歷每個字符；雖然比較效率高了，但是整體效率沒有提高
哈希算法設計技巧：K進制數表示子串（無沖突）（K為字符集內字符種數）
K進制法，相鄰子串的哈希值計算公式有一定的關系：
26^(m-1)可以提前算好存放在數組中，指數就是數組的下標，計算26的x次方時，直接去數組下標x位置讀取
復雜度，計算子串哈希值需要掃描一遍主串O(n)；比較n-m+1個子串哈希值O(n)；所以整體復雜度O(n)（取決于哈希函數沖突概率）
問題：如果模式串很長，子串的哈希值很大，超過計算機可表示的范圍，怎么辦？
針對哈希值范圍溢出，改造哈希函數：
(1) 將a對應1，以此類推z對應26，將字符串每個字符對應數字相加作為哈希值，值的范圍小了 (但是沖突概率有點大)
(2) 將每個字符對應一個質數（沖突概率降低）
存在沖突的情況下，如果模式串和子串哈希值相等，再比較一下它兩真的相等否。
哈希算法沖突概率要比較低，否則RK算法復雜度退化，效率下降

RK代碼

/*** @description:RK匹配算法，計算子串哈希值，進行對比* @author: michael ming* @date: 2019/6/17 22:40* @modified by: */ #include <string> #include <iostream> using namespace std; bool same(char* a, char* b, int m) {for(int i = 0; i < m; ++i){if(a[i] != b[i])return false;}return true; } int str_RK(string s, string t)//s是主串，t是模式串 {int n = s.length(), m = t.length();int table[26] = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43,47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101};//質數表對應a-zint i, j, hash_val, value = 0;for(i = 0; i < m; ++i) //計算模式串的hash值value{value += table[t[i]-'a'];}for(i = 0; i < n-m+1; ++i)//最多n-m+1次比較{hash_val = 0;for(j = i; j < m+i; ++j)//計算第i個子串的哈希值{hash_val += table[s[j]-'a'];}if(hash_val == value && same(&s[i],&t[0],m)){//如果子串哈希值等于模式串的，且"真的"字符串匹配（避免沖突帶來的假匹配）return i+1;//返回匹配位置，第i位開始，i從1開始}}return 0; } int main() {string a = "ababcabcacbab", b = "abcac";cout << a << "中第一次出現" << b << "的位置是：" << str_RK(a,b) << endl;return 0; }

如果不檢查沖突，刪除以下條件，匹配會出錯

same(&s[i],&t[0],m)

3. 思考題：（二維匹配）

對RK算法進行改造得到答案
nr 主串行數
nc 主串列數
mr 模式串行數
mc 模式串列數
復雜度則為O(（nr-mr+1）*（nc-mc+1）)，簡寫為O(nr * nc)

/*** @description: 2維字符串匹配* @author: michael ming* @date: 2019/6/18 0:07* @modified by: */ #include <iostream> #define nr 5 //主串行數 #define nc 5 //主串列數 #define mr 2 //模式串行數 #define mc 2 //模式串列數 int cal_hash_t(int* table, int r, int c, char ch[][mc]) {int i, j, value = 0;for (i = 0; i < r; ++i) //計算2d模式串的hash值value{for(j = 0; j < c; ++j)value += table[ch[i][j]-'a'];}return value; } int cal_hash_s_child(int* table, int i0, int j0, int r, int c, char ch[][nc]) {int i, j, hash_value = 0;for (i = i0; i < r; ++i) //計算2d子串的hash值value{for(j = j0; j < c; ++j)hash_value += table[ch[i][j]-'a'];}return hash_value; } bool same(char s[][nc], char t[][mc], int i0, int j0) {int x = i0, y = j0, i, j;for(i = 0; i < mr; ++i,++x){for(j = 0, y = j0; j < mc; ++j,++y)//記得寫y=j0,換行后y復位{if(s[x][y] != t[i][j])return false;}}return true; } bool str_RK_2d(char s[][nc], char t[][mc])//s是主串，t是模式串 {int table[26] = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43,47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101};//質數表對應a-zint i, j, hash_val, value;value = cal_hash_t(table,mr,mc,t);//計算2d模式串哈希值for(i = 0; i < nr-mr+1; ++i)//行最多nr-mr+1次比較{for(j = 0; j < nc-mc+1; ++j)//列最多nc-mc+1次比較{hash_val = cal_hash_s_child(table,i,j,mr+i,mc+j,s);//計算2d子串哈希值if(hash_val == value && same(s,t,i,j)){//如果2d子串哈希值等于模式串的，且"真的"字符串匹配（避免沖突帶來的假匹配）std::cout << "找到模式矩陣，其左上角在 " << i+1 << " 行，" << j+1 << " 列." << std::endl;return true;}}}return false; }int main() {char s[ ][nc] = {{ 'a', 'b', 'a', 'b', 'a' },{ 'a', 'b', 'a', 'b', 'a' },{ 'a', 'b', 'b', 'a', 'a' },{ 'a', 'b', 'a', 'a', 'b' },{ 'b', 'b', 'a', 'b', 'a' }};char t[ ][mc] = {{ 'a', 'b' },{ 'b', 'a' }};str_RK_2d(s,t);char a[ ][nc] = {{ 'd', 'a', 'b', 'c' },{ 'e', 'f', 'a', 'd' },{ 'c', 'c', 'a', 'f' },{ 'd', 'e', 'f', 'c' },{ 'b', 'b', 'a', 'b' }};char b[ ][mc] = {{ 'c', 'a' },{ 'e', 'f' }};str_RK_2d(a,b);return 0; }

總結

以上是生活随笔為你收集整理的字符串匹配算法（BF RK）的全部內容，希望文章能夠幫你解決所遇到的問題。

如果覺得生活随笔網站內容還不錯，歡迎將生活随笔推薦給好友。

上一篇：在android添加数据采集,一种基于A
下一篇：算法--排序--寻找数组内第K大的元素