生活随笔
This article, collected and compiled by 生活随笔, introduces:
TF-IDF implementation in Java
Code template:
JAR download: https://download.csdn.net/download/dreamzuora/10853842
package TFIDF;

import java.util.Arrays;
import java.util.List;

/**
 * @author weijie
 * Purpose: measures how important a term is to one document within a document set or corpus.
 * December 15, 2018
 */
public class TfidfUtils {

    // Term frequency (TF) = occurrences of the term in the document / total number of words in the document
    public double tf(List<String> doc, String term) {
        double termFrequency = 0;
        for (String str : doc) {
            if (str.equalsIgnoreCase(term)) {
                termFrequency++;
            }
        }
        return termFrequency / doc.size();
    }

    // Document frequency (DF): number of documents in the set that contain the term
    public int df(List<List<String>> docs, String term) {
        int n = 0;
        if (term != null && !term.isEmpty()) {
            for (List<String> doc : docs) {
                for (String word : doc) {
                    if (term.equalsIgnoreCase(word)) {
                        n++;
                        break; // count each document at most once
                    }
                }
            }
        } else {
            System.out.println("term must not be null or empty!");
        }
        return n;
    }

    // Inverse document frequency (IDF) = log(total number of documents / (number of documents containing the term + 1))
    //                                  = log(N / (df + 1))
    // The +1 keeps the denominator non-zero when no document contains the term.
    public double idf(List<List<String>> docs, String term) {
        return Math.log(docs.size() / (double) (df(docs, term) + 1));
    }

    // TF-IDF = term frequency (tf) * inverse document frequency (idf)
    public double tfIdf(List<String> doc, List<List<String>> docs, String term) {
        return tf(doc, term) * idf(docs, term);
    }

    public static void main(String[] args) {
        List<String> doc1 = Arrays.asList("人工", "智能", "成为", "互联网", "大会", "焦点");
        List<String> doc2 = Arrays.asList("谷歌", "推出", "开源", "人工", "智能", "系统", "工具");
        List<String> doc3 = Arrays.asList("互联网", "的", "未来", "在", "人工", "智能");
        List<String> doc4 = Arrays.asList("谷歌", "开源", "机器", "学习", "工具");
        List<List<String>> documents = Arrays.asList(doc1, doc2, doc3, doc4);

        TfidfUtils calculator = new TfidfUtils();
        System.out.println(calculator.tf(doc2, "谷歌"));
        System.out.println(calculator.df(documents, "谷歌"));
        double tfidf = calculator.tfIdf(doc2, documents, "谷歌");
        System.out.println("TF-IDF (谷歌) = " + tfidf);
    }
}
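Smoothed IDF is written in more than one way, and the variants give noticeably different numbers. The sketch below is a hypothetical class, IdfVariants (not part of the downloadable JAR), that compares log(N / (df + 1)) with log(N / df + 1) for the term "谷歌" on the four sample documents used in main(). Like TfidfUtils it uses Math.log, the natural logarithm. The tf line uses Collections.frequency, which matches exactly rather than with equalsIgnoreCase; for these Chinese tokens that makes no difference.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper comparing two ways of smoothing IDF:
//   (a) log(N / (df + 1))
//   (b) log(N / df + 1)
// Math.log is the natural logarithm.
final class IdfVariants {

    // Number of documents containing the term at least once (case-insensitive, like TfidfUtils.df).
    static int documentFrequency(List<List<String>> docs, String term) {
        int n = 0;
        for (List<String> doc : docs) {
            for (String word : doc) {
                if (word.equalsIgnoreCase(term)) {
                    n++;
                    break; // count each document once
                }
            }
        }
        return n;
    }

    // (a) +1 in the denominator: no division by zero, even when df == 0.
    static double idfSmoothed(int totalDocs, int df) {
        return Math.log(totalDocs / (double) (df + 1));
    }

    // (b) +1 after the division: df == 0 gives log(Infinity) = Infinity.
    static double idfShifted(int totalDocs, int df) {
        return Math.log(totalDocs / (double) df + 1);
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("人工", "智能", "成为", "互联网", "大会", "焦点"),
                Arrays.asList("谷歌", "推出", "开源", "人工", "智能", "系统", "工具"),
                Arrays.asList("互联网", "的", "未来", "在", "人工", "智能"),
                Arrays.asList("谷歌", "开源", "机器", "学习", "工具"));
        List<String> doc2 = docs.get(1);
        String term = "谷歌";

        // Exact-match count; equivalent to equalsIgnoreCase for these tokens.
        double tf = Collections.frequency(doc2, term) / (double) doc2.size();
        int df = documentFrequency(docs, term);
        double a = idfSmoothed(docs.size(), df);
        double b = idfShifted(docs.size(), df);

        System.out.println("tf = " + tf + ", df = " + df);
        System.out.println("(a) idf = " + a + ", tf-idf = " + tf * a);
        System.out.println("(b) idf = " + b + ", tf-idf = " + tf * b);
    }
}
```

With N = 4 and df = 2, variant (a) gives ln(4/3) ≈ 0.288 and variant (b) gives ln 3 ≈ 1.099. Because tf = 1/7, the TF-IDF of "谷歌" in doc2 is therefore roughly 0.041 or 0.157 depending on the choice. Variant (b) also divides by zero when df = 0 and returns Infinity. That is one reason to put the +1 in the denominator.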
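df() rescans every document on each call, and tfIdf() reaches it once per term. Scoring every term of a document therefore repeats the full corpus scan for each term. The following is a minimal sketch, assuming the goal is a ranked list of one document's terms. It counts document frequencies once into a HashMap and then scores each distinct term of the document. TfidfRanking, documentFrequencies and rank are hypothetical names. The sketch assumes IDF = log(N / (df + 1)) with the natural log, and it lower-cases tokens (Locale.ROOT) as a stand-in for equalsIgnoreCase.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: ranks one document's terms by TF-IDF, with IDF = log(N / (df + 1)).
final class TfidfRanking {

    // Normalizes a token; lower-casing approximates TfidfUtils' equalsIgnoreCase matching.
    static String normalize(String token) {
        return token.toLowerCase(Locale.ROOT);
    }

    // One pass over the corpus: term -> number of documents that contain it.
    static Map<String, Integer> documentFrequencies(List<List<String>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            Set<String> distinct = new HashSet<>();
            for (String word : doc) {
                distinct.add(normalize(word));
            }
            for (String term : distinct) {
                df.merge(term, 1, Integer::sum); // each document counted once per term
            }
        }
        return df;
    }

    // Scores each distinct term of doc and returns (term, tf-idf) pairs, highest first.
    static List<Map.Entry<String, Double>> rank(List<String> doc, List<List<String>> docs) {
        Map<String, Integer> df = documentFrequencies(docs);
        Map<String, Integer> counts = new LinkedHashMap<>(); // keeps first-occurrence order
        for (String word : doc) {
            counts.merge(normalize(word), 1, Integer::sum);
        }
        List<Map.Entry<String, Double>> scores = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            double tf = e.getValue() / (double) doc.size();
            double idf = Math.log(docs.size() / (double) (df.getOrDefault(e.getKey(), 0) + 1));
            scores.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), tf * idf));
        }
        // List.sort is stable, so equal scores keep their first-occurrence order.
        scores.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        return scores;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("人工", "智能", "成为", "互联网", "大会", "焦点"),
                Arrays.asList("谷歌", "推出", "开源", "人工", "智能", "系统", "工具"),
                Arrays.asList("互联网", "的", "未来", "在", "人工", "智能"),
                Arrays.asList("谷歌", "开源", "机器", "学习", "工具"));
        for (Map.Entry<String, Double> e : rank(docs.get(1), docs)) {
            System.out.printf("%s\t%.4f%n", e.getKey(), e.getValue());
        }
    }
}
```

On doc2 this ranks 推出 and 系统 first, because each appears in only one document. 谷歌, 开源 and 工具 come next. 人工 and 智能 score zero because they appear in three of the four documents, and log(4 / 4) = 0. Building the DF map costs one pass over the corpus. After that, each term lookup takes constant time on average.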
優(yōu)秀博客:https://www.cnblogs.com/ywl925/archive/2013/08/26/3275878.html