Common Machine Learning Tools
http://fuliang.iteye.com/blog/955023
Machine Learning
Support Vector Machine
- SVMlight
An implementation of Vapnik's Support Vector Machine
- LIBSVM
A Library for Support Vector Machines
Decision Tree
- C4.5
The "classic" decision-tree tool, developed by J. R. Quinlan (Tutorial)
Maximum Entropy
- YASMET
Yet Another Small MaxEnt Toolkit
Conditional Random Field
- CRF++
A simple, customizable, and open source implementation of Conditional Random Fields (CRFs) for segmenting/labeling sequential data
Natural Language Processing
General
- OpenNLP
An organizational center for open source projects related to natural language processing
- CMU Statistical Language Modeling Toolkit
A suite of UNIX software tools to facilitate the construction and testing of statistical language models
- The Dragon ToolKit
A Java-based development package for academic use in information retrieval (IR) and text mining. Includes many NLP tools.
- LingPipe
A suite of Java libraries for the linguistic analysis of human language, with tools to:
- track mentions of entities (e.g. people or proteins);
- link entity mentions to database entries;
- uncover relations between entities and actions;
- classify text passages by language, character encoding, genre, topic, or sentiment;
- correct spelling with respect to a text collection;
- cluster documents by implicit topic and discover significant trends over time; and
- provide part-of-speech tagging and phrase chunking.
- Natural Language Toolkit
Open source Python modules, linguistic data and documentation for research and development in natural language processing and text analytics, with distributions for Windows, Mac OSX and Linux.
- Antelope
Advanced Natural Language Object-oriented Processing Environment. Includes a set of tools (notably a C# version of the Stanford Parser).
Word Segmentation
- ICTCLAS
The Chinese word segmentation system from the Chinese Academy of Sciences
- Stanford Chinese Word Segmenter
A Java implementation of a CRF-based Chinese Word Segmenter
Part-of-Speech Tagging
- Brill tagger
An error-driven, transformation-based tagger implemented by Eric Brill
- Stanford POS Tagger
A Java implementation of the log-linear part-of-speech taggers described by Kristina Toutanova et al.
- MBT: Memory-based Tagger
- TreeTagger
A decision tree based tagger from the University of Stuttgart.
- SVMTool, a POS tagger based on SVMs
- QTAG Part of speech tagger
An HMM-based Java POS tagger from Birmingham U.
Named Entity Recognition
- Stanford Named Entity Recognizer
A Java implementation of a Conditional Random Field sequence model, together with well-engineered features for Named Entity Recognition
- LingPipe
Tools include statistical named-entity recognition, a heuristic sentence boundary detector, and a heuristic within-document coreference resolution engine. Java. GPL. By Bob Carpenter, Breck Baldwin and co.
- YamCha
SVM-based NP-chunker, also usable for POS tagging, NER, etc. C/C++ open source. Won CoNLL 2000 shared task. (Less automatic than a specialized POS tagger for an end user.)
Stemming
- Porter Stemming
A process for removing the commoner morphological and inflexional endings from words in English, by Martin Porter
- Snowball
A small string processing language designed for creating stemming algorithms for use in Information Retrieval.
Syntactic Parsing
- Stanford Parser
Java implementations of probabilistic natural language parsers, both highly optimized PCFG and dependency parsers, and a lexicalized PCFG parser.
- Berkeley Parser
Text Mining
Summarization
- Rouge (Configuring Rouge on Windows)
Other
Encryption
- OpenSSL
Includes many cryptographic algorithms, such as RSA, DES, MD5, and SHA (Win32 installer)
Compression
- zlib
A Massively Spiffy Yet Delicately Unobtrusive Compression Library
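As a rough illustration of what zlib does, here is a minimal sketch using the zlib module from the Python standard library, which wraps the same compression library. The sample payload and the helper name roundtrip are made up for this example.

```python
import zlib


def roundtrip(data: bytes, level: int = 6) -> tuple[bytes, bytes]:
    """Compress data with zlib, then decompress it again.

    Returns (compressed, restored). `roundtrip` is a hypothetical helper
    name used only for this illustration.
    """
    compressed = zlib.compress(data, level)  # level ranges from 0 (none) to 9 (best)
    restored = zlib.decompress(compressed)
    return compressed, restored


# Hypothetical sample payload: repetitive text compresses well.
sample = b"machine learning tools " * 100
compressed, restored = roundtrip(sample)

print(len(sample), "bytes before,", len(compressed), "bytes after")
```

The round trip is lossless, so restored is byte-for-byte equal to the input. How much the data shrinks depends on how repetitive it is.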
Logging
- Apache Logging Services
Creates and maintains open-source software related to the logging of application behavior and released at no charge to the public, including
- log4j for Java,
- log4cxx for C++, and
- log4net for MS .Net framework.
Note: the official release of log4cxx has a memory leak problem.
Unicode
- ICU
A mature, widely used set of C/C++ and Java libraries providing Unicode and Globalization support for software applications
XML
- Xerces
A validating XML parser, available in C++ and Java editions
Multi-Pattern String Matching
- AC in C#: Aho-Corasick string matching in C#
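For readers unfamiliar with Aho-Corasick, the idea can be sketched in a few lines of Python. This is not the C# library listed above, only a minimal illustration under simple assumptions (patterns are non-empty strings); the function names build_automaton and find_all are made up for this example.

```python
from collections import deque


def build_automaton(patterns):
    """Build the trie, failure links, and output lists for the patterns."""
    goto = [{}]   # goto[state][char] -> next state; state 0 is the root
    fail = [0]    # fail[state] -> state for the longest proper suffix in the trie
    out = [[]]    # out[state] -> patterns that end at this state

    # Phase 1: insert every pattern into the trie.
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)

    # Phase 2: breadth-first walk to compute failure links.
    # Depth-1 states keep fail = 0 (the root).
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the suffix state.
            out[nxt] += out[fail[nxt]]
            queue.append(nxt)
    return goto, fail, out


def find_all(text, patterns):
    """Return (start_index, pattern) for every occurrence in one pass over text."""
    goto, fail, out = build_automaton(patterns)
    state = 0
    matches = []
    for i, ch in enumerate(text):
        # Follow failure links until a transition on ch exists or we hit the root.
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            matches.append((i - len(pat) + 1, pat))
    return matches


print(find_all("ushers", ["he", "she", "his", "hers"]))
```

The text is scanned once regardless of how many patterns there are, which is why this family of algorithms suits dictionary-style matching against large keyword lists.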
HTML Parser
- Html Agility Pack, an agile HTML parser that builds a read/write DOM and supports plain XPATH or XSLT. It is a .NET code library that allows you to parse "out of the web" HTML files.
- Majestic-12, an open source high-performance .NET C# module that was created to parse HTML for links, indexing and other purposes. Fast, but does not build a DOM tree.
External Links
- An annotated list of resources by the Stanford NLP Group
- KDnuggets: lists software and other resources related to KDD
Reposted from: https://www.cnblogs.com/carl2380/archive/2012/08/24/2654681.html