OnLineML, Part 1: An Introduction to Jubatus
1. Introduction. Original links: jubat.us/en/ and xuwenq.iteye.com/blog/1702746
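Jubatus (http://jubat.us/en/overview.html) is an open-source framework for distributed online machine learning on Big Data streams. It is somewhat similar to Storm, but judging from its description it offers more functionality.
Jubatus holds that future data analytics platforms should expand in three directions at once: handling bigger data, performing deeper analysis, and processing in real time. At present, however, there is no horizontally scalable distributed architecture that can handle continuously generated Big Data streams. Hadoop's MapReduce can process Big Data but is poorly suited to sophisticated machine learning algorithms; Apache Mahout is a Hadoop-based machine learning platform, but it is not suited to online processing of data streams.
Jubatus combines the strengths of online machine learning, distributed computing, and randomized algorithms, and supports basic tasks such as classification, regression, and recommendation. In line with its design goals, Jubatus has the following characteristics:
- Scalable: supports scalable machine learning processing, handling up to 100,000 records per second on a cluster of commodity hardware
- Real-time: analyzes data and updates models in real time
- Deep analysis: supports a range of analyses, including classification, regression, statistics, and recommendation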
2. Another link: blog.csdn.net/jixuan1989/article/details/7880978
Abstract: In the coming era of extremely large databases, computer science will face new challenges in real Big Data applications such as nation-wide M2M sensor network analysis, online advertising optimization for millions of consumers, and real-time security monitoring on the raw Internet traffic. In such applications, it is impractical or useless to apply ordinary approaches for data analysis on small datasets by storing all data into databases, analyzing the data on the databases as a batch-processing, and only visualizing the summarized output. In fact, the future of data analytics platform should expand to three directions at the same time, handling even bigger data, applying deep analytics, and processing in real-time. However, there has been no such analytics platform for massive data streams of continuously generated Big Data with a distributed scale-out architecture. For example, Hadoop is not equipped with sophisticated machine learning algorithms since most of the algorithms do not fit its MapReduce paradigm. Though Apache Mahout is also a Hadoop-based machine learning platform, online processing of data streams is still out of the scope.
Jubatus is the first open source platform for online distributed machine learning on the data streams of Big Data. We use a loose model sharing architecture for efficient training and sharing of machine learning models, by defining three fundamental operations; Update, Mix, and Analyze, in a similar way with the Map and Reduce operations in Hadoop. The point is how to reduce the size of model and the number of the Mix operations while keeping high accuracy, since Mix-ing large models for many times causes high networking cost and high latency in the distributed environment. Then our development team includes competent researchers who combine the latest advances in online machine learning, distributed computing, and randomized algorithms to provide efficient machine learning features for Jubatus. Currently, Jubatus supports basic tasks including classification, regression, and recommendation. A demo system for tweet categorization on fast Twitter data streams is available.
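The Update, Mix, and Analyze operations named in the abstract can be illustrated with a small sketch. This is not Jubatus's actual implementation: it assumes a perceptron as the local learner and plain weight averaging as the Mix step, and every name in it (`update`, `mix`, `analyze`, `run_demo`, the toy data) is hypothetical.

```python
from typing import Dict, List, Tuple

Features = Dict[str, float]
Model = Dict[str, float]  # sparse linear model: feature name -> weight


def score(model: Model, x: Features) -> float:
    """Dot product between the sparse model and a sparse example."""
    return sum(model.get(k, 0.0) * v for k, v in x.items())


def analyze(model: Model, x: Features) -> int:
    """Analyze: predict +1 or -1 using only the local copy of the model."""
    return 1 if score(model, x) >= 0 else -1


def update(model: Model, x: Features, y: int) -> None:
    """Update: perceptron step on the local model (assumed learner)."""
    if y * score(model, x) <= 0:  # wrong side of, or on, the decision boundary
        for k, v in x.items():
            model[k] = model.get(k, 0.0) + y * v


def mix(models: List[Model]) -> Model:
    """Mix: average all local models into one shared model (assumed strategy)."""
    keys = set().union(*models)
    return {k: sum(m.get(k, 0.0) for m in models) / len(models) for k in keys}


def run_demo() -> Tuple[Model, List[int]]:
    # Each worker sees only its own shard of the stream.
    shards = [
        [({"spam": 1.0, "offer": 1.0}, 1), ({"meeting": 1.0}, -1)],
        [({"offer": 1.0, "free": 1.0}, 1), ({"agenda": 1.0, "meeting": 1.0}, -1)],
    ]
    workers: List[Model] = [{} for _ in shards]
    for model, shard in zip(workers, shards):
        for x, y in shard:
            update(model, x, y)      # local training, no network traffic
    shared = mix(workers)            # occasional synchronization
    workers = [dict(shared) for _ in workers]
    # Worker 0 never saw "free" or "agenda", but can classify them after Mix.
    preds = [analyze(workers[0], {"free": 1.0}), analyze(workers[0], {"agenda": 1.0})]
    return shared, preds
```

In this sketch, training and prediction touch only local state, and the network is needed only when `mix` runs. That is why the abstract stresses keeping both the model size and the number of Mix operations small.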
3. Project homepage: jubat.us/en/
Jubatus is a distributed processing framework and streaming machine learning library. Jubatus includes these functionalities:
- Online Machine Learning Library: Classification, Regression, Recommendation (Nearest Neighbor Search), Graph Mining, Anomaly Detection, Clustering
- Feature Vector Converter (fv_converter): Data Preprocess and Feature Extraction
- Framework for Distributed Online Machine Learning with Fault Tolerance
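To make the feature-conversion step in the list above more concrete, here is a minimal sketch of turning a key-value record into a sparse, hashed feature vector. It is not the real fv_converter: the function name `convert`, the feature-naming scheme, the whitespace tokenizer, and the use of CRC32 for hashing are all assumptions chosen for illustration.

```python
import zlib
from typing import Dict, Union

Value = Union[str, float, int]


def convert(datum: Dict[str, Value], hash_size: int = 1 << 16) -> Dict[int, float]:
    """Turn a key-value record into a sparse feature vector with hashed indices."""
    features: Dict[str, float] = {}
    for key, value in datum.items():
        if isinstance(value, str):
            # String values: one feature per whitespace-separated token,
            # weighted by how often the token occurs.
            for token in value.lower().split():
                name = f"{key}${token}"
                features[name] = features.get(name, 0.0) + 1.0
        else:
            # Numeric values: use the number itself as the feature weight.
            features[f"{key}@num"] = float(value)

    # Hash feature names into a fixed index range so the model size stays bounded.
    hashed: Dict[int, float] = {}
    for name, weight in features.items():
        index = zlib.crc32(name.encode("utf-8")) % hash_size
        hashed[index] = hashed.get(index, 0.0) + weight  # collisions are summed
    return hashed
```

For example, `convert({"text": "free offer free", "length": 3})` yields at most three entries: one for the token "free" with weight 2.0, one for "offer" with weight 1.0, and one for the numeric field with weight 3.0. Hashing trades a small risk of collisions for a fixed upper bound on the number of model weights.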
Table of Contents
- Quick Start
- Install Jubatus
- Red Hat Enterprise Linux 6.2 or later (64-bit)
- Ubuntu Server 12.04 LTS (64-bit)
- Other Linux Distributions (including 32-bit)
- Mac OS X
- Install Jubatus Client Libraries
- C++
- Python
- Ruby
- Java
- Try Tutorial
- Write Your Application
- Install Jubatus
- Overview
- Scalable
- Real-Time
- Deep-Analysis
- Difference from Hadoop and Mahout
- Tutorial
- Scenario
- Run Tutorial
- Tutorial in Detail
- Dataset
- Server Configuration
- Use of Classifier API: Train & Classify
- Other Tutorials
- Classifier
- Regression
- Graph
- Stat
- Setup in Distributed Mode
- Distributed Mode
- Setup ZooKeeper
- Register configuration file to ZooKeeper
- Jubatus Proxy
- Join Jubatus Servers to Cluster
- Run Tutorial
- Cluster Management in Jubatus
- ZooKeepers & Jubatus Proxies
- Jubavisor: Process Management Agent
- Distributed Mode
- Documentation
- Architecture
- Data Conversion
- Datum
- Flow of Data Conversion
- Filter
- Feature Extraction from Strings
- Feature Extraction from Numbers
- Feature Extraction from Binary Data
- Hashing Key of Feature Vector
- Plugins
- Plugin Development
- Plugin for Data Conversion
- Cluster Administration Guide
- Recommended Process Configuration
- Managing Clusters
- Monitoring
- Logging
- Save and Load
- Building Jubatus from Source
- Requirements
- Using Framework
- Using Code Generators
- How to Get Clients
- RPC Error Handling
- Common Issues
- Recommendation for each client languages
- Backup and Recovery
- Save and Load
- Frequently Asked Questions (FAQs)
- Installation
- RPC Errors
- Distributed Environment
- Learning Model
- Anomaly detection
- Miscellaneous
- References
- Commands
- Jubatus Servers
- Distributed Environment
- Utilities
- Client API
- Common Data Structures and Methods
- Classifier
- Regression
- Recommender
- Nearest Neighbor
- Anomaly
- Clustering
- Stat
- Graph
- Commands
- How to Contribute
- We Welcome Your Contribution
- Join the Community
- Issue Opening Policy
- Pull-Request Policy
- Tips for Contributors
- Miscellaneous
- Publications
- 2013
- 2012
- 2011
- Contributions (Thanks a lot!)
- Publications
- About Us
- Jubatus Team Members