A Roundup of Data Mining Datasets for Download

Author: 数据小雄, Category: Other

1. Climate monitoring datasets

http://cdiac.ornl.gov/ftp/ndp026b


2. A few useful sites for downloading test datasets

Data for MATLAB hackers (Handwritten Digits, Faces, Text)

http://www.cs.toronto.edu/~roweis/data.html


3. UCI KDD Archive (datasets of various types)

http://kdd.ics.uci.edu/summary.task.type.html 


http://kdd.ics.uci.edu/summary.data.type.html 


4. Machine learning datasets collected by UCI

ftp://pami.sjtu.edu.cn/  

http://www.ics.uci.edu/~mlearn//MLRepository.htm  


5. Sample databases

http://kdd.ics.uci.edu/ 


Manually classified WWW pages

http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/  


6. CMU World Wide Knowledge Base (Web->KB) project (classified web pages, relational data describing pages and hyperlinks)

http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/  


7. Artificial intelligence and machine learning

http://duch-links.wikispaces.com/ 


8. Text classification: the datasets used with rainbow

http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html  


9. StatLib: a library of statistical software and data

http://liama.ia.ac.cn/SCILAB/scilabindexgb.htm


http://lib.stat.cmu.edu/ 


http://lib.stat.cmu.edu/datasets/


http://lib.stat.cmu.edu/modules.php?op=modload&name=Downloads&file=index&req=viewdownload&cid=2


10. Cancer gene data

http://www.broad.mit.edu/cgi-bin/cancer/datasets.cgi


11. Financial and medical data

http://lisp.vse.cz/pkdd99/Challenge/chall.htm


12. Time series data

http://www.stat.wisc.edu/~reinsel/bjr-data/  


13. KDnuggets: links to a wide range of datasets

http://www.kdnuggets.com/datasets/index.html 


14. German intelligent analysis and information systems

http://www.mlnet.org/cgi-bin/mlnetois.pl/?File=datasets.html  


http://dctc.sjtu.edu.cn/adaptive/datasets/   


http://fimi.cs.helsinki.fi/data/  


15. IBM intelligent information

http://www-958.ibm.com/software/data/cognos/manyeyes/datasets


http://www.almaden.ibm.com/software/quest/Resources/index.shtml 


16. Frequent Set Counting

http://miles.cnuce.cnr.it/~palmeri/datam/DCI/datasets.php


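17. Rating datasets

    MovieLens movie rating data

    Basic description: it comprises the following three datasets:

    a. 100,000 ratings of 1,682 movies by 943 users

    b. 1 million ratings of 3,900 movies by 6,040 users

    c. 10 million ratings of 10,681 movies by 71,567 users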

    http://www.grouplens.org/  


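    Book-Crossing book rating data

    Basic description: 1,149,780 ratings of 271,379 books by 278,858 users. Cai-Nicolas Ziegler collected the dataset with a web crawl of the Book-Crossing community over four weeks in August and September 2004.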

    http://www.informatik.uni-freiburg.de/~cziegler/BX/


    Jester Joke Data Set (joke ratings)

    A dataset for recommender systems released by Ken Goldberg of UC Berkeley. It contains 4.1 million continuous ratings of 100 jokes from 73,496 users.

    http://www.ieor.berkeley.edu/~goldberg/jester-data/


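    Netflix dataset

    Also a movie rating dataset: 480,189 users, 17,770 movies, and 100,480,507 ratings. By rating count, the MovieLens datasets are one to three orders of magnitude smaller (two for the 1-million-rating set). I expect the Netflix data will gradually take MovieLens's place, which is a natural result of progress.

    Note: all four of the above are user rating datasets.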
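To put the sizes quoted above side by side, here is a minimal Python sketch that computes how densely each user-by-item rating matrix is filled and how many orders of magnitude a dataset's rating count falls below Netflix's. The counts are copied from the descriptions in this list; the names `DATASETS`, `density`, and `orders_of_magnitude_smaller` are made up for illustration, and the Jester count uses the rounded "4.1 million" figure, so its density is approximate.

```python
import math

# (users, items, ratings) as quoted in the descriptions above.
DATASETS = {
    "MovieLens 100K": (943, 1_682, 100_000),
    "MovieLens 1M": (6_040, 3_900, 1_000_000),
    "MovieLens 10M": (71_567, 10_681, 10_000_000),
    "Book-Crossing": (278_858, 271_379, 1_149_780),
    "Jester": (73_496, 100, 4_100_000),  # rounded count, so approximate
    "Netflix": (480_189, 17_770, 100_480_507),
}


def density(users, items, ratings):
    """Fraction of the users x items matrix that holds a rating."""
    return ratings / (users * items)


def orders_of_magnitude_smaller(ratings, reference):
    """log10 of how many times larger `reference` is than `ratings`."""
    return math.log10(reference / ratings)


if __name__ == "__main__":
    netflix_ratings = DATASETS["Netflix"][2]
    for name, (users, items, ratings) in DATASETS.items():
        gap = orders_of_magnitude_smaller(ratings, netflix_ratings)
        print(f"{name:15s} density={density(users, items, ratings):.4%} "
              f"log10(Netflix/ratings)={gap:.2f}")
```

Density matters when choosing a benchmark: a collaborative-filtering method that does well on a dense matrix such as Jester may behave very differently on a very sparse one such as Book-Crossing.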


21. GPS trajectory data

GeoLife GPS Trajectories

http://research.microsoft.com/en-us/downloads/b16d359d-d164-469e-9fd4-daa38f2b2e13/default.aspx   


GPS Trajectories with transportation mode labels

http://research.microsoft.com/apps/pubs/?id=141896 


Movebank animal trajectories

http://www.movebank.org/

 

22. Mobile phone, Wi-Fi, and Bluetooth data

A Community Resource for Archiving Wireless Data At Dartmouth

http://crawdad.cs.dartmouth.edu/


crowdflow: mobile phone and Wi-Fi trajectories

http://crowdflow.net/ 


23. OpenStreetMap Data

planet.openstreetmap.org or http://metro.teczno.com/


24. OpenPaths: data upload plus an API

https://openpaths.cc/   


25. FOURSQUARE


26. GeoTime

http://www.geotime.com/GeoTime(s)/January-2012/Cupid-Strikes-Again--Time-Series---GIS--Together-a.aspx


27. Datatang (数据堂)

http://www.datatang.com/


28. http://www.kdnuggets.com/datasets/


29. http://appsrv.cse.cuhk.edu.hk/~kdd/data_collection.html



IBM Almaden Research Center Data Mining Projects

Data Sets:


· Synthetic Data Generation Code for Associations and Sequential Patterns

· Synthetic Data Generation Code for Classification

· "Dense" Data-Sets (apriori binary format, 3.2Mb)

· Enron Email Data Set

Demos:


· General Visualizations for Associations

· Visualization Demo: Market Basket Analysis


IBM Intelligent Miner:


· IBM Intelligent Miner for Data

· Video and image clips from IBM Data Mining T.V. Ad


IBM Data Mining Resources:


· Business Intelligence Solutions: our colleagues offering data mining consultancy and services.

· Data Abstraction Research Group: our colleagues in IBM Thomas J. Watson Research Center. Our colleagues in France.

· Data Mining: Extending the Information Warehouse Framework. IBM White Paper on Data Mining.

 

 

The Reuters dataset can be found at the following address:

http://www.research.att.com/~lewis/reuters21578.html


Sites on data mining for investment funds

http://www.gotofund.com/index.asp

http://lans.ece.utexas.edu/~strehl/




http://www-2.cs.cmu.edu/webkb


http://www.cs.auc.dk/research/DP/tdb/TimeCenter/TimeCenterPublications/TR-75.pdf

 


Related links:

http://flow.dl.sourceforge.net/sourceforge/weka/regression-datasets.jar


http://www.phys.uni.torun.pl/~duch/software.html


WEKA:

http://flow.dl.sourceforge.net/sourceforge/weka/regression-datasets.jar  


1. A jarfile containing 37 classification problems, originally obtained from the UCI repository

http://prdownloads.sourceforge.net/weka/datasets-UCI.jar   


2. A jarfile containing 37 regression problems, obtained from various sources

http://prdownloads.sourceforge.net/weka/datasets-numeric.jar  


3. A jarfile containing 30 regression datasets collected by Luis Torgo

http://prdownloads.sourceforge.net/weka/regression-datasets.jar    
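A jarfile is an ordinary ZIP archive, so the WEKA dataset bundles above can be inspected without a Java toolchain. The sketch below lists the ARFF files inside one using Python's standard `zipfile` module. It assumes the datasets are stored as `.arff` members, which is WEKA's usual format but is not stated on this page, and `list_arff_members` is a helper name made up for this example.

```python
import io
import zipfile


def list_arff_members(jar_source):
    """Return the sorted names of .arff members in a jar (ZIP) archive.

    `jar_source` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(jar_source) as jar:
        return sorted(
            name for name in jar.namelist()
            if name.lower().endswith(".arff")
        )


if __name__ == "__main__":
    # Build a tiny in-memory archive so the sketch runs without a download.
    # The member names here are placeholders, not the real jar contents.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as jar:
        jar.writestr("datasets/example-a.arff", "@relation a\n")
        jar.writestr("datasets/example-b.arff", "@relation b\n")
        jar.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")
    buffer.seek(0)
    print(list_arff_members(buffer))
```

For a downloaded file, pass its local path instead, for example `list_arff_members("datasets-UCI.jar")`, and use `ZipFile.extractall` to unpack it.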


Data mining competitions and their datasets

2005 University of California data mining contest, predicting bad accounts and their churn date using real-world CRM data, deadline June 30, 2005.


ILP 2005 Challenge, on the prediction of functional classes of genes.


KDD Cup 2005, on classifying internet user search queries, deadline July 8.


Data Mining Cup 2005 (Chemnitz, Germany), for students; topic: How data mining can ascertain the risk of loss of payments and reduce this risk.


KDD Cup 2004, focuses on data mining for several performance criteria using datasets from bioinformatics and quantum physics.


InfoVis 2004 Contest, The History of InfoVis.


DATA MINING CUP 2004 (Chemnitz, Germany), for students.


InfoVis 2003 Contest: Visualization and Pair Wise Comparison of Trees, results announced Sep 5, 2003.


KDD CUP 2003

http://www.cs.cornell.edu/projects/kddcup/index.html


KDD Cup 2003, focuses on problems motivated by network mining and the analysis of usage logs.


DATA MINING CUP 2003 (Chemnitz, Germany). The task is to identify spam emails before they reach the user's mailbox.


KDD Cup 2002, focuses on data mining in molecular biology.


Student Data Mining Cup (2002), Chemnitz University and Prudential Systems.


—————————————————————————

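[Copyright notice]

Unless otherwise noted, all articles on this site are original work by 数据小雄. When reposting, please credit the source (数据小雄博客) and include a link to this article. Thank you.

Article URL: http://zhangzhengxiong.com/?id=93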

—————————————————————————
