
A Survey of Machine Learning Algorithms for Big Data (大数据下的机器学习算法综述)

Cited by: 338
Abstract: With the explosive growth of industrial data, big data is attracting more and more attention. Because big data is massive, complex and diverse, and fast-changing, many traditional machine learning algorithms designed for small data are no longer applicable to big-data applications. Developing machine learning algorithms for big data has therefore become a shared concern of both academia and industry. This paper analyzes and summarizes the current state of research on machine learning algorithms for processing big data. Since parallelism is the mainstream approach to processing big data, several parallel algorithms are also introduced, and the problems facing machine learning research in the big-data setting are identified. Finally, research trends in machine learning for big data are pointed out.
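The point that parallelism is the mainstream way to scale learning algorithms to big data can be illustrated with a minimal sketch, assuming a data-parallel (MapReduce-style) decomposition of one k-means iteration. The choice of k-means and all function names (`assign_and_sum`, `merge`, `parallel_kmeans_step`) are hypothetical illustrations, not the survey's own method. Each data shard independently computes per-cluster partial sums and counts (map); the partials are then added together (reduce) before the centroids are recomputed.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def assign_and_sum(shard, centroids):
    """Map step (hypothetical): sum the points of one shard per nearest centroid."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for point in shard:
        # Nearest centroid by squared Euclidean distance.
        j = min(range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += point[d]
    return sums, counts


def merge(a, b):
    """Reduce step (hypothetical): add two partial (sums, counts) results element-wise."""
    sums_a, counts_a = a
    sums_b, counts_b = b
    sums = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(sums_a, sums_b)]
    counts = [x + y for x, y in zip(counts_a, counts_b)]
    return sums, counts


def parallel_kmeans_step(shards, centroids, workers=4):
    """One data-parallel k-means iteration: map over shards, reduce, recompute centroids."""
    # Threads stand in for cluster workers; they show the decomposition, not a real speedup.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda s: assign_and_sum(s, centroids), shards))
    sums, counts = reduce(merge, partials)
    # Keep the previous centroid if a cluster received no points.
    return [
        [s / n for s in row] if n > 0 else list(old)
        for row, n, old in zip(sums, counts, centroids)
    ]


if __name__ == "__main__":
    shards = [[(0.0, 0.0), (1.0, 0.0)], [(10.0, 10.0), (11.0, 10.0)], [(0.0, 1.0)]]
    print(parallel_kmeans_step(shards, [(0.0, 0.0), (10.0, 10.0)]))
```

Because the per-shard partial sums are additive, the merged result is the same as running the step on all the data at once, which is what lets the map phase run on separate machines. Under CPython's GIL the threads here only mimic that structure; a real deployment would place shards on separate processes or cluster nodes.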
Source: Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》), 2014, No. 4, pp. 327-336 (10 pages). Indexed in EI, CSCD, and the Peking University core journal list (北大核心).
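Funding: National Natural Science Foundation of China (No. 61175052, 61203297, 61035003, 61363058); National High Technology Research and Development Program of China (863 Program) (No. 2014AA012205, 2013AA01A606, 2012AA011003).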
Keywords: big data; machine learning; classification; clustering; parallel algorithm