Abstract
Traditional feature selection methods are suited only to small-scale datasets and run inefficiently. To address these shortcomings, a hybrid feature selection method for network traffic classification based on multilayer MapReduce is proposed, combining the characteristics of the Filter and Wrapper approaches. The method first preprocesses the data with the Fisher score, removing some irrelevant features to reduce the dimensionality of high-dimensional data. It then adopts a sequential forward search strategy, using multilayer MapReduce to repeatedly select the feature with the best classification ability. Experimental results show that the method maintains high classification accuracy while reducing feature selection time, achieves a good speedup ratio, and improves the efficiency of network traffic classification.
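The two stages named in the abstract (a Fisher score filter followed by a sequential forward search wrapper) can be illustrated with a small single-machine sketch. This is not the paper's implementation: the function names, the `keep` and `k` parameters, and the nearest-centroid classifier used to rate candidate subsets are all assumptions made for illustration, and the "map" and "reduce" steps are plain Python stand-ins for what would be distributed MapReduce jobs.

```python
from collections import defaultdict


def fisher_scores(X, y):
    """Fisher score per feature: between-class scatter over within-class scatter."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mean_all = sum(col) / len(col)
        between = within = 0.0
        for rows in by_class.values():
            vals = [r[j] for r in rows]
            mean_k = sum(vals) / len(vals)
            between += len(vals) * (mean_k - mean_all) ** 2
            within += sum((v - mean_k) ** 2 for v in vals)
        # Assumption: a feature with zero within-class scatter scores 0.0
        scores.append(between / within if within > 0 else 0.0)
    return scores


def centroid_accuracy(X, y, features):
    """Hypothetical wrapper criterion: training accuracy of a nearest-centroid classifier."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append([row[j] for j in features])
    centroids = {
        c: [sum(col) / len(col) for col in zip(*rows)]
        for c, rows in by_class.items()
    }
    correct = 0
    for row, label in zip(X, y):
        point = [row[j] for j in features]
        pred = min(
            centroids,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        correct += pred == label
    return correct / len(X)


def select_features(X, y, keep, k):
    """Filter to the `keep` best Fisher-scored features, then forward-select `k` of them."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    candidates = ranked[:keep]
    selected = []
    while len(selected) < k and candidates:
        # "Map": rate each candidate independently (parallelizable per feature)
        mapped = [(centroid_accuracy(X, y, selected + [j]), j) for j in candidates]
        # "Reduce": keep the candidate with the highest accuracy (lowest index on ties)
        _, best = max(mapped, key=lambda t: (t[0], -t[1]))
        selected.append(best)
        candidates.remove(best)
    return selected
```

Each round of the forward search evaluates candidates independently of one another, which is the property that makes the search step a natural fit for a map phase followed by a single reduce.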
Source
Journal of Guilin University of Electronic Technology (《桂林电子科技大学学报》)
2016, No. 2, pp. 123-128 (6 pages)
Funding
National Natural Science Foundation of China (61163058, 61363006)
Open Fund of the Guangxi Key Laboratory of Trusted Software (KX201306)