Abstract
By combining the simplicity of the MapReduce programming model with the high accuracy and consistent convergence of the expectation-maximization (EM) algorithm, this paper proposes a data processing algorithm that places no limit on the size of the data set. The algorithm's performance is tested by estimating the parameters of a Gaussian mixture model. The results show that the algorithm overcomes the low efficiency of the conventional EM algorithm on large-scale data sets and achieves good speedup and scalability.
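The abstract does not give the algorithm's details, so the following is only a minimal sketch of how one EM iteration for a Gaussian mixture can be split into map and reduce phases. It is not the authors' implementation. It assumes one-dimensional data, runs the "map" calls sequentially in a single process rather than on a cluster, and all function names (`map_e_step`, `reduce_stats`, `m_step`, `em_iteration`) are hypothetical.

```python
import math
from functools import reduce


def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def map_e_step(chunk, weights, means, variances):
    """Map phase (assumed design): E-step on one data chunk.

    Returns, per component, the partial sums [sum r, sum r*x, sum r*x^2],
    where r is the responsibility of that component for point x.
    """
    k = len(weights)
    stats = [[0.0, 0.0, 0.0] for _ in range(k)]
    for x in chunk:
        dens = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
        total = sum(dens)  # assumed non-zero, i.e. no numerical underflow
        for j in range(k):
            r = dens[j] / total
            stats[j][0] += r
            stats[j][1] += r * x
            stats[j][2] += r * x * x
    return stats


def reduce_stats(a, b):
    """Reduce phase: add two sets of partial sums element-wise."""
    return [[u + v for u, v in zip(sa, sb)] for sa, sb in zip(a, b)]


def m_step(stats):
    """M-step: turn the global sums into new mixture parameters."""
    n = sum(s[0] for s in stats)
    weights = [s[0] / n for s in stats]
    means = [s[1] / s[0] for s in stats]
    # A small floor keeps the variance positive despite rounding error.
    variances = [max(s[2] / s[0] - m * m, 1e-6) for s, m in zip(stats, means)]
    return weights, means, variances


def em_iteration(chunks, weights, means, variances):
    """One EM iteration over data that has been split into chunks."""
    # In a real MapReduce job each chunk would go to a separate mapper.
    partials = [map_e_step(c, weights, means, variances) for c in chunks]
    return m_step(reduce(reduce_stats, partials))
```

Because the per-chunk sums are additive, the updated parameters do not depend on how the data is partitioned. This is the property that lets the E-step be distributed over any number of mappers while only a small set of sums travels to the reducer.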
Source
《科学技术与工程》
Peking University Core Journal (北大核心)
2013, Issue 16, pp. 4603-4606 (4 pages)
Science Technology and Engineering
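Funding
Supported by the Henan Province Science and Technology Research Program (122102310412, 11210231058)
and by projects of the Zhengzhou Science and Technology Bureau (112PCXTD343, 114PYFZX504)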