Word Cloud with jieba Segmentation
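生活随笔
This article, collected and compiled by 生活随笔, introduces word clouds built with jieba word segmentation. We found it useful and share it here for reference.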
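This post presents code for generating a word cloud from Chinese text. The text is segmented with jieba, stopwords are filtered out, synonyms are merged into one canonical word, and the result is rendered with the wordcloud library. The full code follows: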
```python
# -*- coding: utf-8 -*-
import re

import jieba
import matplotlib.pyplot as plt
from wordcloud import WordCloud

combine_dict = {}  # synonym -> canonical word, filled by synonymwordslist()
stopwords = set()  # replaced in __main__ by stopwordslist()


# Load stopwords, one per line (returned as a set for fast lookups)
def stopwordslist(stopWord):
    with open(stopWord, encoding='UTF-8') as f:
        return {line.strip() for line in f}


# Synonym dictionary, tab-separated: the first word on a line is the canonical
# form and every following word on that line is mapped to it
def synonymwordslist(synonymWord):
    with open(synonymWord, 'r', encoding='UTF-8') as f:
        for line in f:
            seperate_word = line.strip().split("\t")
            for alias in seperate_word[1:]:
                combine_dict[alias] = seperate_word[0]


# References:
# https://blog.csdn.net/jlulxg/article/details/84650683
# https://www.cnblogs.com/crawer-1/p/8341762.html
# http://lzw.me/pages/unicode/
def cleanChinese():
    s = r"\n\r\t@#$%^&*這樣一本書大賣,hello,,12。!《。有點意外,據說已經印了四五十萬,排行榜僅次于《希拉里自傳》。大概是大眾拋棄了一位表演過火的“文化大師”后,。\n\s\r\t"
    # Variant that also keeps common Chinese punctuation:
    # t = re.findall('[\u3002\uff1b\uff0c\uff1a\u201c\u201d\uff08\uff09\u3001\uff1f\u300a\u300b\u4e00-\u9fa5]', s)
    t = re.findall('[\u4e00-\u9fa5]', s)  # keep Chinese characters only
    print(''.join(t))


# Read the input text, segment it, filter stopwords, merge synonyms, draw the cloud
def wordClould(inputText, splitText, outPic):
    def replace_all_blank(value):
        """Strip ASCII letters, digits and punctuation, Chinese and full-width
        punctuation, whitespace and control characters from value."""
        return re.sub(
            r'[A-Za-z0-9!-/:-@\[-`{-~\s\x01-\x1a'
            r'\u3000-\u303f\uff01-\uff5e\u2018\u2019\u201c\u201d\u2026\u2605]+',
            '', value)

    def seg_depart(sentence):
        outstr = ''
        for word in jieba.cut(sentence):
            if word not in stopwords:
                word = combine_dict.get(word, word)  # synonym replacement
                outstr += replace_all_blank(word) + " "
        return outstr

    # Segment line by line and join everything into one space-separated text
    with open(inputText, 'r', encoding='UTF-8') as fRead:
        cut_text = ''.join(seg_depart(line) for line in fRead)
    # Keep a copy of the segmented text for inspection
    with open(splitText, 'w', encoding='UTF-8') as fWrite:
        fWrite.write(cut_text)

    wordcloud = WordCloud(
        # A font with Chinese glyphs is required, otherwise every character is
        # drawn as an empty box. This is a typical Windows font path; any
        # installed Chinese font will do.
        font_path="C:/Windows/Fonts/彩虹粗仿宋.TTF",
        background_color="white",
        width=2000,
        height=1760,
        max_words=2000,
    ).generate(cut_text)
    plt.imshow(wordcloud, interpolation="bilinear")
    plt.axis("off")
    # plt.show()
    wordcloud.to_file(outPic)


if __name__ == '__main__':
    # cleanChinese()
    jieba.load_userdict('../input/nlp/userDic.txt')  # custom jieba dictionary
    synonymwordslist('../input/nlp/synonymWord.txt')
    stopwords = stopwordslist('../input/nlp/stopWords.txt')
    wordClould(r'D:\bidingDemo.txt', r'D:\splitSingle.txt', r'D:\bidingDemo.png')
```

The required input files and a screenshot of the result are shown below.
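The synonym file read by `synonymwordslist` holds one group per line, separated by tabs, with the first entry as the canonical word; `seg_depart` then drops stopwords and rewrites synonyms token by token. The sketch below reproduces those two steps on an in-memory file and a hand-written token list that stands in for `jieba.cut` output, so it runs without jieba. The helper names `load_synonyms` and `filter_tokens`, the sample words, and the simplified "keep only Han characters" cleanup are assumptions for illustration, not the post's exact code.

```python
import io
import re
from typing import Dict, Iterable, Set

# Simplified stand-in for replace_all_blank: removes every non-Han character.
NON_HAN_RE = re.compile(r"[^\u4e00-\u9fa5]+")


def load_synonyms(lines: Iterable[str]) -> Dict[str, str]:
    """Map every alias on a tab-separated line to the first word of that line."""
    mapping: Dict[str, str] = {}
    for line in lines:
        words = line.strip().split("\t")
        for alias in words[1:]:
            mapping[alias] = words[0]
    return mapping


def filter_tokens(tokens: Iterable[str], stopwords: Set[str],
                  synonyms: Dict[str, str]) -> str:
    """Drop stopwords, replace synonyms, strip non-Han characters, join with spaces."""
    kept = []
    for token in tokens:
        if token in stopwords:  # stopword check happens before replacement
            continue
        token = NON_HAN_RE.sub("", synonyms.get(token, token))
        if token:  # unlike seg_depart, skip tokens that end up empty
            kept.append(token)
    return " ".join(kept)


# An in-memory synonym file and a token list standing in for jieba.cut output
synonym_file = io.StringIO("电脑\t计算机\t微机\n手机\t移动电话\n")
synonyms = load_synonyms(synonym_file)
stopwords = {"的", "了"}
tokens = ["我", "的", "计算机", "和", "移动电话", "坏", "了", "!"]
print(filter_tokens(tokens, stopwords, synonyms))  # prints: 我 电脑 和 手机 坏
```

Checking stopwords before the synonym lookup mirrors the order in `seg_depart`: an alias is only filtered if the alias itself is listed as a stopword.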
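The `cleanChinese` helper (commented out in `__main__`) keeps only characters from U+4E00 to U+9FA5 via `re.findall`. That range is the CJK Unified Ideographs block as first encoded; ideographs added later at U+9FA6 to U+9FFF, and the extension blocks, are not matched. A minimal standalone sketch of the same filter follows; the name `keep_han` and the sample string are illustrative assumptions.

```python
import re

# Matches one character in U+4E00..U+9FA5, the range used by cleanChinese.
HAN_RE = re.compile(r"[\u4e00-\u9fa5]")


def keep_han(text: str) -> str:
    """Return only the Han characters of text, in their original order."""
    return "".join(HAN_RE.findall(text))


sample = "@#$%^&*这样一本书大卖,hello,12。有点意外"
print(keep_han(sample))  # prints: 这样一本书大卖有点意外
```

Letters, digits, ASCII symbols and Chinese punctuation such as `。` all fall outside the range, so they disappear without having to be listed one by one.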
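`WordCloud.generate(cut_text)` re-tokenizes the space-joined text on its own and, with the default `collocations=True`, also scores two-word pairs. If you want the weights to be exactly the jieba token counts, one option is to count the tokens yourself and pass the counts to `WordCloud.generate_from_frequencies`. The counting step is sketched below with `collections.Counter`; `top_words` and the sample text are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def top_words(cut_text: str, n: int = 3) -> List[Tuple[str, int]]:
    """Count the space-separated tokens in cut_text and return the n most common."""
    # str.split() with no argument treats runs of spaces as one separator,
    # so the empty slots seg_depart leaves behind do not become tokens.
    return Counter(cut_text.split()).most_common(n)


cut_text = "电脑 手机  电脑 坏 电脑 手机 "
print(top_words(cut_text))  # prints: [('电脑', 3), ('手机', 2), ('坏', 1)]
```

Under this approach, `.generate(cut_text)` in `wordClould` would become `.generate_from_frequencies(Counter(cut_text.split()))`; the `font_path` requirement for Chinese glyphs stays the same.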