dfa算法c语言,DFA跟trie字典树实现敏感词过滤(python和c语言)
DFA和trie字典樹實現敏感詞過濾(python和c語言)
現在做的項目都是用python開發,需要用做關鍵詞檢查,過濾關鍵詞,之前用c語言做過這樣的事情,用字典樹,蠻高效的,內存小,檢查快。
到了python上,第一想法是在pip上找一個基于c語言的python字典樹模塊,可惜沒找到合適的,如果我會用c寫python模塊的話,我就自己寫一個了,可惜我還不具備這個能力,
只能用python寫了,性能差一點就差點吧,內存多一點也無所謂了。
用搜索引擎看CSDN上的網友的用python實現的DFA,再參照自己以前用c語言寫過的字典樹,有些不大對,就自己寫了一個。想象一下如果用C語言是會非常高效,而且空間也特別小。
某位網友的:DFA 算法實現敏感詞過濾(python 實現)
下面是python代碼:
class cNode(object):
def __init__(self):
self.children = None
# The encode of word is UTF-8
# The encode of message is UTF-8
class cDfa(object):
def __init__(self,lWords):
self.root=None
self.root=cNode()
for sWord in lWords:
self.addWord(sWord)
# The encode of word is UTF-8
def addWord(self,word):
node = self.root
iEnd=len(word)-1
for i in xrange(len(word)):
if node.children == None:
node.children = {}
if i!=iEnd:
node.children[word[i]]=(cNode(),False)
else:
node.children[word[i]]=(cNode(),True)
elif word[i] not in node.children:
if i!=iEnd:
node.children[word[i]]=(cNode(),False)
else:
node.children[word[i]]=(cNode(),True)
else: #word[i] in node.children:
if i==iEnd:
Next,bWord=node.children[word[i]]
node.children[word[i]]=(Next,True)
node=node.children[word[i]][0]
def isContain(self,sMsg):
root=self.root
iLen=len(sMsg)
for i in xrange(iLen):
p = root
j = i
while (j
(p,bWord) = p.children[sMsg[j]]
if bWord:
return True
j = j + 1
return False
def filter(self,sMsg):
lNew=[]
root=self.root
iLen=len(sMsg)
i=0
bContinue=False
while i
p=root
j=i
while (j
(p,bWord) = p.children[sMsg[j]]
if bWord:
#print sMsg[i:j+1]
lNew.append(u'*'*(j-i+1))#關鍵字替換
i=j+1
bContinue=True
break
j=j+1
if bContinue:
bContinue=False
continue
lNew.append(sMsg[i])
i=i+1
return ''.join(lNew)
下面是c語言代碼trie_tree.h:
#ifndef _TRIE_TREE_H_INCLUDED_
#define _TRIE_TREE_H_INCLUDED_
#define WORD_NUM 256
struct trie_node {
struct trie_node *node[WORD_NUM];
int value;
int exist;
};
struct trie_node *create_trie_node(int value);
void trie_tree_insert_word(struct trie_node *root, unsigned char *word);
/* return 1 表示存在, return 0表示不存在 */
int tire_word_is_exist(struct trie_node *root, unsigned char *word);
void destory_trie_tree(struct trie_node *root);
void update_trie_tree(struct trie_node **root, const char *filename);
#endif
trie_tree.c:
#include
#include
#include
#include
struct trie_node *create_trie_node(int value)
{
struct trie_node * node = calloc(1, sizeof(struct trie_node));
node->value = value;
return node;
}
int tire_word_is_exist(struct trie_node *root, unsigned char *word)
{
struct trie_node *n = NULL;
unsigned char *p = NULL;
if (root == NULL) {
return 0;
}
while (*word != 0) {
p = word++;
n = root;
while (*p != 0) {
n = n->node[*p];
if (n == NULL) {
break;
}
else if (n->exist == 1) {
return 1;
}
p++;
}
}
return 0;
}
void trie_tree_insert_word(struct trie_node *root, unsigned char *word)
{
struct trie_node *n;
while (*word != 0) {
n = root->node[*word];
if (n == NULL) {
n = create_trie_node(*word);
root->node[*word] = n;
}
root = n;
word++;
}
root->exist = 1;
}
void destroy_trie_tree(struct trie_node *root)
{
int i;
if (root == NULL) {
return;
}
for (i = 0; i < WORD_NUM; i++) {
destroy_trie_tree(root->node[i]);
}
free(root);
}
void update_trie_tree(struct trie_node **root, const char *filename)
{
char word[1024];
FILE *fp;
char *p;
if (*root != NULL) {
destroy_trie_tree(*root);
}
*root = calloc(sizeof(**root),1);
fp = fopen(filename, "r");
if (fp == NULL) {
printf("file can't open %s\n", filename);
return;
}
while (fgets(word, sizeof(word), fp)) {
p = word;
while (*p != 0) {
if (*p == '\r' || *p == '\n' || *p == ' ') {
*p = 0;
break;
}
p++;
}
trie_tree_insert_word(*root, (unsigned char *)word);
}
}
創作挑戰賽新人創作獎勵來咯,堅持創作打卡瓜分現金大獎總結
以上是生活随笔為你收集整理的dfa算法c语言,DFA跟trie字典树实现敏感词过滤(python和c语言)的全部內容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: SpringBoot快速入门——hell
- 下一篇: spring,Whitelabel Er