Multi-label text classification in Python with scikit-learn
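I am trying to learn multi-label text classification with scikit-learn by adapting one of the introductory example tutorials that ships with scikit-learn, which classifies languages using Wikipedia articles as training data. My attempt is below, but the code still returns a single label for each sentence; I would like the last prediction to return fr, en.
Can anyone suggest the correct way to enable multi-label classification?
import sys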
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.datasets import make_multilabel_classification
from sklearn.preprocessing import LabelBinarizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import Pipeline
from sklearn.datasets import load_files
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.multiclass import OneVsRestClassifier
# train_test_split lives in sklearn.model_selection; the old sklearn.cross_validation module was removed in scikit-learn 0.20
# The training data folder must be passed as first argument - This uses the example wiki language data files
languages_data_folder = sys.argv[1]
dataset = load_files(languages_data_folder)
# Split the dataset in training and test set:
docs_train, docs_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.5)
#pipeline
clf = Pipeline([
    ('vectorizer', CountVectorizer(ngram_range=(1, 2))),
    ('tfidf', TfidfTransformer()),
    ('clf', OneVsRestClassifier(LinearSVC())),
])
target_names = dataset.target_names
# TASK: Fit the pipeline on the training set
clf.fit(docs_train, y_train)
# TASK: Predict the outcome on the testing set in a variable named y_predicted
y_predicted = clf.predict(docs_test)
print(target_names)
# Predict the result on some short new sentences:
sentences = [
    u'This is a language detection test.',
    u'Ceci est un test de d\xe9tection de la langue.',
    u'Dies ist ein Test, um die Sprache zu erkennen.',
    u'Bonjour Mon ami. This is a language detection test.',
]
predicted = clf.predict(sentences)
for s, p in zip(sentences, predicted):
    print(u'The language of "%s" is "%s"' % (s, target_names[p]))
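This returns:
The language of "This is a language detection test." is "en"
The language of "Ceci est un test de détection de la langue." is "fr"
The language of "Dies ist ein Test, um die Sprache zu erkennen." is "de"
The language of "Bonjour Mon ami. This is a language detection test."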
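One likely reason for the single label per sentence is that `dataset.target` holds exactly one integer class per document, so `OneVsRestClassifier` treats the problem as ordinary multiclass. A minimal sketch of the multi-label setup follows, assuming each training document can instead be given a list of labels. The toy documents and the names `train_docs`, `train_labels`, and `mlb` are hypothetical and are not part of the original code; only the sentences to classify come from the question.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# Hypothetical toy training data: every document carries a *list* of labels.
train_docs = [
    u"This is a language detection test.",
    u"The weather is nice today.",
    u"Ceci est un test de d\xe9tection de la langue.",
    u"Il fait beau aujourd'hui.",
    u"Bonjour mon ami. The weather is nice today.",
    u"Hello my friend. Il fait beau aujourd'hui.",
]
train_labels = [["en"], ["en"], ["fr"], ["fr"], ["en", "fr"], ["en", "fr"]]

# Convert the label lists into a 2-D binary indicator matrix,
# one column per label (columns follow the sorted order in mlb.classes_).
mlb = MultiLabelBinarizer()
Y_train = mlb.fit_transform(train_labels)

# Same kind of pipeline as in the question; TfidfVectorizer combines
# CountVectorizer and TfidfTransformer in a single step.
clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("clf", OneVsRestClassifier(LinearSVC())),
])

# With a 2-D indicator target, OneVsRestClassifier fits one binary
# classifier per label and can predict zero, one, or several labels.
clf.fit(train_docs, Y_train)

sentences = [
    u"This is a language detection test.",
    u"Ceci est un test de d\xe9tection de la langue.",
    u"Bonjour Mon ami. This is a language detection test.",
]
Y_pred = clf.predict(sentences)

# Map each predicted indicator row back to a tuple of label names.
for sentence, labels in zip(sentences, mlb.inverse_transform(Y_pred)):
    print(u'The languages of "%s" are: %s' % (sentence, u", ".join(labels) or u"(none)"))
```

Whether the mixed sentence actually receives both `fr` and `en` depends on the training data, so no expected output is shown here. With the Wikipedia dataset each article has a single language, which means `dataset.target` would first have to be turned into per-document label lists (for example `[[target_names[t]] for t in y_train]`) before binarizing, and mixed-language training examples would probably help.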