Abstract
Multi-label data often contain many redundant and irrelevant features. To remove them and thereby improve the generalization performance of multi-label learning algorithms, this paper proposes SAML (simulated annealing based feature selection for multi-label data), a wrapper-style (卷积式) feature selection method. Existing algorithms rely only on genetic algorithms for the optimization step; SAML instead uses simulated annealing to search for the optimal feature subset, an approach that prior work has reported to perform better than genetic algorithms. Experiments on the Yahoo web page categorization data sets, which are widely used for benchmark evaluation, show that SAML outperforms several recently proposed, popular multi-label feature selection methods.
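The abstract does not give SAML's details, so the following is only a minimal sketch of the general idea it names: simulated annealing searching over feature subsets. The function names, the single-bit-flip neighbourhood, the geometric cooling schedule, all default parameters, and `toy_score` (a hypothetical stand-in for whatever multi-label evaluation criterion the paper uses) are assumptions made for illustration.

```python
import math
import random


def simulated_annealing_select(n_features, score, t_start=1.0, t_end=1e-3,
                               cooling=0.95, steps_per_temp=20, seed=0):
    """Search for a feature mask (list of booleans) that maximizes `score`.

    Generic simulated annealing; the move, schedule and defaults are
    illustrative assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    current = [rng.random() < 0.5 for _ in range(n_features)]
    current_score = score(current)
    best, best_score = current[:], current_score
    t = t_start
    while t > t_end:
        for _ in range(steps_per_temp):
            # Neighbour: flip the membership of one randomly chosen feature.
            candidate = current[:]
            i = rng.randrange(n_features)
            candidate[i] = not candidate[i]
            cand_score = score(candidate)
            delta = cand_score - current_score
            # Always accept improvements; accept a worse subset with
            # probability exp(delta / t) (Metropolis criterion).
            if delta >= 0 or rng.random() < math.exp(delta / t):
                current, current_score = candidate, cand_score
                if current_score > best_score:
                    best, best_score = current[:], current_score
        t *= cooling  # geometric cooling
    return best, best_score


def toy_score(mask, relevant=frozenset({0, 3, 5})):
    """Hypothetical criterion: +1 per relevant feature kept, -0.5 per extra.

    In a wrapper method this would instead be the validation performance of
    a multi-label learner trained on the selected features.
    """
    hits = sum(1 for i, m in enumerate(mask) if m and i in relevant)
    extras = sum(1 for i, m in enumerate(mask) if m and i not in relevant)
    return hits - 0.5 * extras


best_mask, best_value = simulated_annealing_select(8, toy_score)
selected = [i for i, m in enumerate(best_mask) if m]
```

On this toy criterion the search settles on features 0, 3 and 5. Early on, the high temperature lets the search accept worse subsets and escape poor regions; as the temperature falls it behaves more and more like greedy hill climbing.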
Source
Computer Engineering and Design (《计算机工程与设计》)
CSCD
Peking University Core Journals (北大核心)
2011, No. 7, pp. 2494-2496, 2500 (4 pages in total)
Funding
National Natural Science Foundation of China (61005006)
Keywords
multi-label learning
feature selection
simulated annealing
dimensionality reduction
Yahoo web pages