Flink real-time machine learning - Alink online machine learning: experiment notes on naive Bayes modeling in remote mode
The Alink repository is at [1].
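At present, Alink does not support any mode of pyflink-shell.sh.
It can only be used from Jupyter Notebook, from the Python shell, or by submitting a jar.
The following is the reply from the official DingTalk group: [image]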
#################################################################################################
① Start the Hadoop cluster.
② Start the Flink cluster:
$FLINK_HOME/bin/start-cluster.sh
Note the host and port of the Flink cluster (the default is master:8081); they have to be written into the code that follows.
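Before putting the host and port into the code, it can help to confirm that the cluster actually answers on them. The following is a minimal sketch, assuming the Flink REST API is served on the same port as the web UI and exposes the `/overview` endpoint. The helper names `flink_overview_url` and `fetch_cluster_overview` are hypothetical and are not part of Alink or Flink.

```python
import json
import urllib.error
import urllib.request


def flink_overview_url(host, port):
    """Build the URL of the Flink REST `/overview` endpoint (assumed path)."""
    return "http://{}:{}/overview".format(host, port)


def fetch_cluster_overview(host, port, timeout=3.0):
    """Return the parsed overview JSON, or None if the cluster is unreachable."""
    url = flink_overview_url(host, port)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, DNS failure, timeout, or a non-JSON response.
        return None


# Example usage (replace "master" and 8081 with your own host and port):
# overview = fetch_cluster_overview("master", 8081)
# print("cluster reachable" if overview is not None else "cluster not reachable")
```

If the call returns None, the host or port is probably wrong, or the cluster is not running, and the same values are unlikely to work in useRemoteEnv.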
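Since pyflink-shell.sh is not supported, the experiment has to be run in Jupyter Notebook.
The complete test code is as follows (change "Desktop" and 8082 to your own host and port):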
from pyalink.alink import *

# Connect to the running Flink cluster: host, port, parallelism.
useRemoteEnv("Desktop", 8082, 2, flinkHome=None, localIp="localhost", config=None)

URL = "https://alink-release.oss-cn-beijing.aliyuncs.com/data-files/review_rating_train.csv"
SCHEMA_STR = "review_id bigint, rating5 bigint, rating3 bigint, review_context string"
LABEL_COL = "rating5"
TEXT_COL = "review_context"
VECTOR_COL = "vec"
PRED_COL = "pred"
PRED_DETAIL_COL = "predDetail"

source = (
    CsvSourceBatchOp()
    .setFilePath(URL)
    .setSchemaStr(SCHEMA_STR)
    .setFieldDelimiter("_alink_")
    .setQuoteChar(None)
)

# Split data for train and test: 90% for training, the rest from the side output.
trainData = SplitBatchOp().setFraction(0.9).linkFrom(source)
testData = trainData.getSideOutput(0)

# Text preprocessing: segment, remove stop words, hash word counts into a vector.
pipeline = (
    Pipeline()
    .add(Segment().setSelectedCol(TEXT_COL))
    .add(StopWordsRemover().setSelectedCol(TEXT_COL))
    .add(
        DocHashCountVectorizer()
        .setFeatureType("WORD_COUNT")
        .setSelectedCol(TEXT_COL)
        .setOutputCol(VECTOR_COL)
    )
)

naiveBayes = (
    NaiveBayesTextClassifier()
    .setVectorCol(VECTOR_COL)
    .setLabelCol(LABEL_COL)
    .setPredictionCol(PRED_COL)
    .setPredictionDetailCol(PRED_DETAIL_COL)
)

# Train on the training split, then predict on the held-out split.
model = pipeline.add(naiveBayes).fit(trainData)
predict = model.transform(testData)

metrics = (
    EvalMultiClassBatchOp()
    .setLabelCol(LABEL_COL)
    .setPredictionDetailCol(PRED_DETAIL_COL)
    .linkFrom(predict)
    .collectMetrics()
)

print("ConfusionMatrix:", metrics.getConfusionMatrix())
print("LabelArray:", metrics.getLabelArray())
print("LogLoss:", metrics.getLogLoss())
print("Accuracy:", metrics.getAccuracy())
print("Kappa:", metrics.getKappa())
print("MacroF1:", metrics.getMacroF1())
print("Label 1 Accuracy:", metrics.getAccuracy("1"))
print("Label 1 Kappa:", metrics.getKappa("1"))
print("Label 1 Precision:", metrics.getPrecision("1"))

The results are as follows:
ConfusionMatrix: [[4944, 374, 190, 181, 223], [29, 1207, 128, 137, 82], [1, 2, 317, 22, 10], [0, 0, 0, 62, 0], [1, 0, 1, 1, 187]]
LabelArray: ['5', '4', '3', '2', '1']
LogLoss: 1.3876163466154336
Accuracy: 0.8293616495863687
Kappa: 0.6641967288935378
MacroF1: 0.6239089988842421
Label 1 Accuracy: 0.960735893320163
Label 1 Kappa: 0.5242700620715995
Label 1 Precision: 0.9842105263157894

The following is from the web UI: [image]
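The overall accuracy and kappa in the results can be re-derived from the printed confusion matrix, which is a useful sanity check on the run. The sketch below uses plain Python and the standard definitions of accuracy and Cohen's kappa; it is not Alink's own implementation. The function names are hypothetical. The per-label precision assumes that rows of the matrix are predicted labels and columns are actual labels, which is consistent with the reported "Label 1 Precision" (187 / 190).

```python
# Confusion matrix and label order copied from the experiment output.
CONFUSION = [
    [4944, 374, 190, 181, 223],
    [29, 1207, 128, 137, 82],
    [1, 2, 317, 22, 10],
    [0, 0, 0, 62, 0],
    [1, 0, 1, 1, 187],
]
LABELS = ["5", "4", "3", "2", "1"]


def accuracy(cm):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def cohen_kappa(cm):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    row_sums = [sum(row) for row in cm]
    col_sums = [sum(cm[r][c] for r in range(n)) for c in range(n)]
    p_o = accuracy(cm)
    # Agreement expected by chance from the row and column marginals.
    p_e = sum(row_sums[i] * col_sums[i] for i in range(n)) / (total * total)
    return (p_o - p_e) / (1 - p_e)


def precision(cm, labels, label):
    """Precision for one label, assuming rows are predicted labels."""
    i = labels.index(label)
    return cm[i][i] / sum(cm[i])


print("Accuracy:", accuracy(CONFUSION))
print("Kappa:", cohen_kappa(CONFUSION))
print("Label 1 Precision:", precision(CONFUSION, LABELS, "1"))
```

The printed values should agree with the Accuracy, Kappa, and Label 1 Precision lines in the results up to floating-point rounding.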
References:
[1] https://github.com/alibaba/Alink
[2] https://github.com/alibaba/Alink/blob/master/pyalink/review_naive_bayes.ipynb