Datasets (Part 2)
1. Climate monitoring dataset: http://cdiac.ornl.gov/ftp/ndp026b
2. Sites for downloading some useful test datasets
   Data for MATLAB hackers (Handwritten Digits, Faces, Text)
   http://www.cs.toronto.edu/~roweis/data.html
3. UCI KDD Archive (datasets of many kinds)
   http://kdd.ics.uci.edu/summary.task.type.html
   http://kdd.ics.uci.edu/summary.data.type.html
4. Machine learning datasets collected by UCI
   ftp://pami.sjtu.edu.cn/
   http://www.ics.uci.edu/~mlearn//MLRepository.htm
5. Sample databases
   http://kdd.ics.uci.edu/
   Manually classified WWW pages
   http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/
6. CMU World Wide Knowledge Base (Web->KB) project (classified web pages; relational data describing pages and hyperlinks)
   http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/
7. Artificial intelligence and machine learning
   http://duch-links.wikispaces.com/
8. Text classification (the datasets used with rainbow)
   http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html
9. StatLib: library of mathematical statistics software
   http://liama.ia.ac.cn/SCILAB/scilabindexgb.htm
   http://lib.stat.cmu.edu/
   http://lib.stat.cmu.edu/datasets/
   http://lib.stat.cmu.edu/modules.php?op=modload&name=Downloads&file=index&req=viewdownload&cid=2
10. Cancer gene data:
   http://www.broad.mit.edu/cgi-bin/cancer/datasets.cgi
11. Financial and medical data:
   http://lisp.vse.cz/pkdd99/Challenge/chall.htm
12. Time series data
   http://www.stat.wisc.edu/~reinsel/bjr-data/
13. KDnuggets: links to a wide range of datasets:
   http://www.kdnuggets.com/datasets/index.html
14. German Intelligent Analysis and Information Systems
   http://www.mlnet.org/cgi-bin/mlnetois.pl/?File=datasets.html
   http://dctc.sjtu.edu.cn/adaptive/datasets/
   http://fimi.cs.helsinki.fi/data/
15. IBM intelligent information
   http://www-958.ibm.com/software/data/cognos/manyeyes/datasets
   http://www.almaden.ibm.com/software/quest/Resources/index.shtml
16. Frequent Set Counting
   http://miles.cnuce.cnr.it/~palmeri/datam/DCI/datasets.php
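17. Rating datasets
   MovieLens: movie rating data
   Basic description: consists of the following three datasets:
   a. 100,000 ratings of 1,682 movies by 943 users
   b. 1 million ratings of 3,900 movies by 6,040 users
   c. 10 million ratings of 10,681 movies by 71,567 users
   http://www.grouplens.org/
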
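   Book-Crossing: book rating data
   Basic description: contains 1,149,780 ratings of 271,379 books by 278,858 users. Cai-Nicolas Ziegler collected the dataset with a web crawl of the Book-Crossing community over four weeks in August and September 2004.
   http://www.informatik.uni-freiburg.de/~cziegler/BX/
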
   Jester Joke Data Set: joke rating collection
   A recommender-system dataset released by Ken Goldberg of UC Berkeley. It contains 4.1 million continuous ratings of 100 jokes from 73,421 users.
   http://www.ieor.berkeley.edu/~goldberg/jester-data/

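   Netflix dataset
   Also a movie rating dataset: 480,189 users, 17,770 movies, and 100,480,507 rating records. By comparison, the MovieLens sets are one to three orders of magnitude smaller in rating count (two orders for the 1-million-rating set). I expect the Netflix data to gradually take MovieLens's place, which seems a natural consequence of progress.
   Note: all four of the above are user rating datasets.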
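The size comparison among these rating datasets can be checked with a little arithmetic. The following is a minimal sketch, not part of the original post: it takes the user, item, and rating counts quoted in the descriptions above (the MovieLens rating counts are the rounded figures given there) and computes the density of each user-by-item matrix and the base-10 gap to the Netflix rating count. The function names `density` and `orders_of_magnitude` are made up for illustration.

```python
import math

# (users, items, ratings) as quoted in the dataset descriptions above.
# The MovieLens rating counts are the rounded figures from the text.
DATASETS = {
    "MovieLens 100K": (943, 1682, 100_000),
    "MovieLens 1M": (6040, 3900, 1_000_000),
    "MovieLens 10M": (71567, 10681, 10_000_000),
    "Book-Crossing": (278_858, 271_379, 1_149_780),
    "Netflix": (480_189, 17_770, 100_480_507),
}


def density(users, items, ratings):
    """Fraction of the user x item matrix that actually holds a rating."""
    return ratings / (users * items)


def orders_of_magnitude(larger, smaller):
    """Base-10 logarithm of the ratio between two rating counts."""
    return math.log10(larger / smaller)


if __name__ == "__main__":
    netflix_ratings = DATASETS["Netflix"][2]
    for name, (users, items, ratings) in DATASETS.items():
        gap = orders_of_magnitude(netflix_ratings, ratings)
        print(f"{name:15s} density={density(users, items, ratings):.6f} "
              f"log10(Netflix/this)={gap:.2f}")
```

Density is worth looking at alongside raw size: a dataset with many ratings can still be extremely sparse when its user and item counts are large, as with Book-Crossing.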
18. GPS trajectory data
   GeoLife GPS Trajectories
   http://research.microsoft.com/en-us/downloads/b16d359d-d164-469e-9fd4-daa38f2b2e13/default.aspx

   GPS Trajectories with transportation mode labels
   http://research.microsoft.com/apps/pubs/?id=141896

   Movebank: animal trajectories
   http://www.movebank.org/
19. Mobile phone, Wi-Fi, and Bluetooth data
   A Community Resource for Archiving Wireless Data At Dartmouth
   http://crawdad.cs.dartmouth.edu/
   crowdflow: mobile phone and Wi-Fi traces
   http://crowdflow.net/
20. OpenStreetMap Data
   planet.openstreetmap.org or http://metro.teczno.com/
21. OpenPaths: data upload plus API
   https://openpaths.cc/
22. FOURSQUARE
23. GeoTime
   http://www.geotime.com/GeoTime(s)/January-2012/Cupid-Strikes-Again--Time-Series---GIS--Together-a.aspx
24. Datatang
   http://www.datatang.com/
25. http://www.kdnuggets.com/datasets/
26. http://appsrv.cse.cuhk.edu.hk/~kdd/data_collection.html
IBM Almaden Research Center Data Mining Projects
Data Sets:
· Synthetic Data Generation Code for Associations and Sequential Patterns
· Synthetic Data Generation Code for Classification
· "Dense" Data-Sets (apriori binary format, 3.2Mb)
· Enron Email Data Set
Demos:
· General Visualizations for Associations
· Visualization Demo: Market Basket Analysis

IBM Intelligent Miner:
· IBM Intelligent Miner for Data
· Video and image clips from IBM Data Mining T.V. Ad
IBM Data Mining Resources:
· Business Intelligence Solutions: our colleagues offering data mining consultancy and services.
· Data Abstraction Research Group: our colleagues at the IBM Thomas J. Watson Research Center; our colleagues in France.
· Data Mining: Extending the Information Warehouse Framework: IBM White Paper on Data Mining.
The Reuters dataset can be found at the following address
   http://www.research.att.com/~lewis/reuters21578.html
Sites on data mining for investment funds
   http://www.gotofund.com/index.asp
   http://lans.ece.utexas.edu/~strehl/
Reuters dataset
   http://www.research.att.com/~lewis/reuters21578.html
   http://www-2.cs.cmu.edu/webkb
   http://www.cs.auc.dk/research/DP/tdb/TimeCenter/TimeCenterPublications/TR-75.pdf
Related:
   http://flow.dl.sourceforge.net/sourceforge/weka/regression-datasets.jar
   http://www.phys.uni.torun.pl/~duch/software.html
WEKA:
   http://flow.dl.sourceforge.net/sourceforge/weka/regression-datasets.jar
1. A jarfile containing 37 classification problems, originally obtained from the UCI repository
   http://prdownloads.sourceforge.net/weka/datasets-UCI.jar
2. A jarfile containing 37 regression problems, obtained from various sources
   http://prdownloads.sourceforge.net/weka/datasets-numeric.jar
3. A jarfile containing 30 regression datasets collected by Luis Torgo
   http://prdownloads.sourceforge.net/weka/regression-datasets.jar
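A jar file is an ordinary zip archive, so the dataset bundles above can be inspected without WEKA installed. The sketch below is not from the original post and rests on an assumption: that the jars hold datasets as `.arff` files (WEKA's native format). The helper names `list_arff_members` and `read_relation_name` are hypothetical; only Python's standard `zipfile` module is used.

```python
import zipfile


def list_arff_members(jar_path):
    """Return the sorted names of .arff entries inside a jar (zip) archive."""
    with zipfile.ZipFile(jar_path) as jar:
        return sorted(n for n in jar.namelist() if n.lower().endswith(".arff"))


def read_relation_name(jar_path, member):
    """Return the name on the @relation line of one ARFF member, or None."""
    with zipfile.ZipFile(jar_path) as jar:
        with jar.open(member) as fh:
            for raw in fh:
                line = raw.decode("utf-8", errors="replace").strip()
                if line.lower().startswith("@relation"):
                    parts = line.split(None, 1)
                    # Assumption: the name may be quoted; strip the quotes.
                    return parts[1].strip("'\"") if len(parts) > 1 else None
    return None
```

After downloading one of the jars, `list_arff_members("datasets-UCI.jar")` would list whatever ARFF files it contains; if the assumption about the file format does not hold for a given jar, the list simply comes back empty.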
Data mining competitions and their datasets
- 2005 University of California data mining contest, predicting bad accounts and their churn date using real-world CRM data, deadline June 30, 2005.
- ILP 2005 Challenge, on the prediction of functional classes of genes.
- KDD Cup 2005, on classifying internet user search queries, deadline July 8.
- Data Mining Cup 2005 (Chemnitz, Germany), for students; topic: how data mining can ascertain the risk of loss of payments and reduce this risk.
- KDD Cup 2004, focuses on data mining for several performance criteria using datasets from bioinformatics and quantum physics.
- InfoVis 2004 Contest, The History of InfoVis.
- DATA MINING CUP 2004 (Chemnitz, Germany), for students.
- InfoVis 2003 Contest: Visualization and Pair Wise Comparison of Trees, results announced Sep 5, 2003.
- KDD Cup 2003 (http://www.cs.cornell.edu/projects/kddcup/index.html), focuses on problems motivated by network mining and the analysis of usage logs.
- DATA MINING CUP 2003 (Chemnitz, Germany). The task is to identify spam emails before they reach the user's mailbox.
- KDD Cup 2002, focuses on data mining in molecular biology.
- Student Data Mining Cup (2002), Chemnitz University and Prudential Systems.
Reposted from: https://www.cnblogs.com/codeOfLife/p/6773825.html