Abstract
With the explosive growth of data in industry, big data has attracted increasing attention. Because big data is massive in volume, complex and diverse, and fast-changing, many traditional machine learning algorithms designed for small data are no longer applicable to problems in big data settings. Developing machine learning algorithms for big data has therefore become a topic of shared interest in academia and industry. This paper analyzes and summarizes the current state of research on machine learning algorithms for processing big data. Since parallelism is the mainstream approach to processing big data, several parallel algorithms and strategies are also described, and the challenges facing machine learning research in big data settings are identified. Finally, research trends in machine learning for big data are pointed out.
Source
《模式识别与人工智能》
EI
CSCD
Peking University Core Journals (北大核心)
2014, No. 4, pp. 327-336 (10 pages)
Pattern Recognition and Artificial Intelligence
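Funding
Supported by the National Natural Science Foundation of China (No. 61175052, 61203297, 61035003, 61363058) and the National High Technology Research and Development Program of China (863 Program) (No. 2014AA012205, 2013AA01A606, 2012AA011003)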
Keywords
Big Data
Machine Learning
Classification
Clustering
Parallel Algorithm