Abstract
This paper proposes a genetic-algorithm-based feature selection algorithm for big data. The algorithm first evaluates the features in each dimension and adjusts each feature's weight according to how much it differs between a sample's nearest same-class neighbor and its nearest different-class neighbor. These weights guide the genetic algorithm's search, improving its search ability and the accuracy of the selected features. The algorithm then combines the feature weights to compute the fitness of the features, uses that fitness as the evaluation metric, and runs the genetic algorithm to obtain the optimal feature subset, yielding efficient and accurate feature selection for big data. Experimental analysis shows that the algorithm effectively reduces the number of features used for classification and improves classification accuracy.
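The two stages described in the abstract can be illustrated with a minimal sketch. The abstract does not give the weight-update formula, the fitness function, or the genetic operators, so everything below is an assumption: the weights use a Relief-style update (nearest same-class "hit" versus nearest different-class "miss"), the fitness is the summed weight of the selected features minus a size penalty, and the names `relief_weights`, `fitness`, and `select_features` are hypothetical.

```python
import random

def relief_weights(X, y):
    """Relief-style weights (assumed form): a feature gains weight when it
    differs on the nearest other-class sample and loses weight when it
    differs on the nearest same-class sample."""
    n, d = len(X), len(X[0])
    # Per-feature value range, used to scale differences into [0, 1].
    spans = [max(r[j] for r in X) - min(r[j] for r in X) or 1.0 for j in range(d)]

    def dist(a, b):
        return sum(abs(a[j] - b[j]) / spans[j] for j in range(d))

    w = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue  # no same-class or other-class neighbor to compare with
        hit = min(hits, key=lambda k: dist(X[i], X[k]))
        miss = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            gap = abs(X[i][j] - X[miss][j]) - abs(X[i][j] - X[hit][j])
            w[j] += gap / (spans[j] * n)
    return w

def fitness(mask, w, penalty=0.05):
    # Assumed fitness: total weight of the selected features minus a
    # per-feature penalty that favors smaller subsets.
    return sum(wj for wj, m in zip(w, mask) if m) - penalty * sum(mask)

def select_features(X, y, pop_size=20, generations=40, p_mut=0.1, seed=0):
    rng = random.Random(seed)
    w = relief_weights(X, y)
    d = len(w)
    # Weight-guided initialization: higher-weight features are more likely
    # to start out selected (probabilities between 0.1 and 0.9).
    lo, hi = min(w), max(w)
    prob = [0.1 + 0.8 * ((wj - lo) / (hi - lo) if hi > lo else 0.5) for wj in w]
    pop = [[int(rng.random() < prob[j]) for j in range(d)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(m, w), reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection keeps the elite half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, d) if d > 1 else 0  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation with probability p_mut per gene.
            children.append([bit ^ int(rng.random() < p_mut) for bit in child])
        pop = parents + children
    best = max(pop, key=lambda m: fitness(m, w))
    return best, w
```

Calling `select_features(X, y)` on a list of numeric rows `X` and class labels `y` returns a 0/1 mask over the features together with the computed weights. A fitness based on classifier accuracy would be a natural alternative to the weight sum used here; the linear form was chosen only to keep the sketch dependency-free.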
Authors
张文杰
蒋烈辉
Zhang Wenjie; Jiang Liehui (Faculty of Cyberspace Security, PLA Information Engineering University, Zhengzhou 450001, China; State Key Laboratory of Mathematical Engineering & Advanced Computing, Zhengzhou 450001, China)
Source
《计算机应用研究》
CSCD
Peking University Core Journal
2020, No. 1, pp. 50-52, 56 (4 pages)
Application Research of Computers
Funding
Henan Province Basic and Frontier Research Project
Henan Province Key Science and Technology Research Program
Keywords
big data
feature selection
genetic algorithm
feature subset