
Using the jieba library

Published: 2023/11/29


About the jieba library:

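jieba is an excellent third-party Python library for Chinese word segmentation. It supports three segmentation modes: precise mode, full mode, and search engine mode. Their characteristics are as follows.

Precise mode: tries to cut the sentence as accurately as possible, with no redundant tokens; well suited to text analysis.

Full mode: outputs every substring of the sentence that could be a word; very fast, but the output contains redundant, overlapping tokens.

Search engine mode: builds on precise mode and splits long words a second time.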

Basic usage of jieba

# -*- coding: utf-8 -*-
import jieba

seg_str = "好好學習，天天向上。"

print("/".join(jieba.lcut(seg_str)))  # precise mode (the default); lcut returns a list
print("/".join(jieba.lcut(seg_str, cut_all=True)))  # full mode, selected with cut_all=True
print("/".join(jieba.lcut_for_search(seg_str)))  # search engine mode

Word-frequency statistics for English text

# -*- coding: utf-8 -*-

def get_text():
    # Read the whole file and normalize it to lower case.
    with open("1.txt", "r", encoding="UTF-8") as f:
        txt = f.read().lower()
    for ch in '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~':
        txt = txt.replace(ch, " ")  # replace special characters with spaces
    return txt

file_txt = get_text()
words = file_txt.split()  # split on whitespace to get the list of words
counts = {}

for word in words:
    if len(word) == 1:
        continue  # skip single-character tokens
    counts[word] = counts.get(word, 0) + 1

items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)  # most frequent first

for word, count in items[:5]:  # top five (fewer if the text is short)
    print("{0:<5}->{1:>5}".format(word, count))
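The same counting can be written more compactly with the standard library. This is a minimal sketch rather than the author's method: the helper name top_words and the sample sentence are illustrative, and string.punctuation also covers the apostrophe, which the hand-written character list above does not.

```python
import string
from collections import Counter

def top_words(text, n=5):
    """Return the n most frequent words longer than one character."""
    # Map every ASCII punctuation character to a space.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    words = text.lower().translate(table).split()
    # Skip single-character tokens, as the loop-based version does.
    counts = Counter(w for w in words if len(w) > 1)
    return counts.most_common(n)

# Hypothetical sample text instead of a file on disk.
sample = "To be, or not to be: that is the question."
for word, count in top_words(sample):
    print("{0:<10}->{1:>5}".format(word, count))
```

Counter.most_common returns the entries sorted by count, so neither the explicit sort nor the index loop is needed, and a text with fewer than n distinct words simply gives a shorter list.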

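Creating a word cloud

First install jieba, wordcloud, and matplotlib.

(1) Open the taglue website, click "Import words", and paste in the output of the program above.
(2) Choose a shape; here an image downloaded from the web was imported.
(3) Choose a font.
(4) Click "Visualize" to generate the image.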

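A word cloud can also be generated directly in Python with wordcloud:

from wordcloud import WordCloud
import matplotlib.pyplot as plt
import jieba

def create_word_cloud(filename):
    # Read the source text; UTF-8 encoding is assumed here.
    with open(filename, encoding="utf-8") as f:
        text = f.read()
    wordlist = jieba.cut(text, cut_all=True)  # full-mode segmentation
    wl = " ".join(wordlist)  # WordCloud expects space-separated words
    wc = WordCloud(
        background_color="black",
        max_words=2000,
        font_path='simsun.ttf',  # a font with Chinese glyphs is needed for Chinese text
        height=1200,
        width=1600,
        max_font_size=100,
        random_state=100,
    )
    myword = wc.generate(wl)
    plt.imshow(myword)
    plt.axis("off")
    plt.show()
    wc.to_file('img_book.png')

if __name__ == '__main__':
    create_word_cloud("哈姆雷特.txt")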

Reposted from: https://www.cnblogs.com/zhoukun520/p/10649666.html
