Abstract
Traditional feature selection algorithms cannot handle streaming feature data, their redundancy computation is complex, and they do not describe instances accurately enough. To address these problems, a Multi-label Distribution Learning Feature Selection with Streaming Data Using Rough Set (FSSRS) algorithm was proposed. First, the online streaming feature selection framework was introduced into multi-label learning. Second, the conditional probability used in the original framework was replaced by the dependency degree from rough set theory. The dependency degree is computed only from information contained in the data itself, which makes streaming feature selection more efficient. Finally, because in the real world each label describes an instance to a different degree, traditional logical labels were replaced by label distributions so that instances are described more accurately. Experimental results on multiple datasets show that the proposed algorithm retains features that are highly correlated with the label space, and that classification accuracy improves to a certain extent compared with no feature selection.
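As a rough illustration of the second step, the Python sketch below computes the classical Pawlak dependency degree γ_B(D) = |POS_B(D)| / |U| on a discrete decision table and plugs it into a simple online streaming selection loop. It is not the authors' FSSRS. The helper names `partition`, `dependency` and `streaming_select`, the relevance/significance/redundancy tests, and the choice to treat each distinct label vector as one decision class are all assumptions made for illustration. Label distributions are not modeled.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Sequence, Tuple

Row = Dict[str, Hashable]      # one object: feature name -> discrete value
LabelVec = Tuple[int, ...]     # logical (0/1) label vector of one object


def partition(rows: Sequence[Row], attrs: Sequence[str]) -> List[List[int]]:
    """Equivalence classes U/IND(attrs): objects with identical values on attrs."""
    blocks: Dict[Tuple[Hashable, ...], List[int]] = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows: Sequence[Row], attrs: Sequence[str],
               labels: Sequence[LabelVec]) -> float:
    """Pawlak dependency gamma_B(D) = |POS_B(D)| / |U|.

    Assumption: the decision D is the whole label vector, so each distinct
    label set forms one decision class.
    """
    pos = 0
    for block in partition(rows, attrs):
        # A block lies in the positive region iff all its objects share one label vector.
        if len({labels[i] for i in block}) == 1:
            pos += len(block)
    return pos / len(rows)


def streaming_select(rows: Sequence[Row], labels: Sequence[LabelVec],
                     feature_stream: Iterable[str]) -> List[str]:
    """Hypothetical online selection loop; features arrive one at a time."""
    selected: List[str] = []
    for f in feature_stream:
        # Relevance: f on its own must put at least one object in the positive region.
        if dependency(rows, [f], labels) == 0:
            continue
        # Significance: f must raise the dependency of the current subset.
        if dependency(rows, selected + [f], labels) <= dependency(rows, selected, labels):
            continue
        selected.append(f)
        full = dependency(rows, selected, labels)
        # Redundancy: drop earlier features whose removal leaves gamma unchanged.
        for g in list(selected[:-1]):
            reduced = [h for h in selected if h != g]
            if dependency(rows, reduced, labels) == full:
                selected = reduced
    return selected


if __name__ == "__main__":
    # Toy table: c duplicates a, b is noise, d is only partially informative.
    rows = [
        {"a": 0, "b": 0, "c": 0, "d": 0},
        {"a": 0, "b": 1, "c": 0, "d": 1},
        {"a": 1, "b": 0, "c": 1, "d": 1},
        {"a": 1, "b": 1, "c": 1, "d": 2},
    ]
    labels = [(1, 0), (1, 0), (0, 1), (0, 1)]
    print(dependency(rows, ["d"], labels))                       # 0.5
    print(streaming_select(rows, labels, ["b", "d", "a", "c"]))  # ['a']
```

Because γ only counts objects whose equivalence class is label-pure, no probabilities have to be estimated, which is the property the abstract relies on. One limitation of this simplified relevance test is that features that are informative only jointly (for example an XOR pattern) are discarded, since each of them alone yields γ = 0.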
Authors
CHENG Yusheng (程玉胜)
CHEN Fei (陈飞)
WANG Yibin (王一宾)
(School of Computer and Information, Anqing Normal University, Anqing, Anhui 246011, China; University Key Laboratory of Intelligent Perception and Computing of Anhui Province, Anqing, Anhui 246011, China; Key Laboratory of Data Science and Intelligence Application, Fujian Province University, Zhangzhou, Fujian 363000, China)
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journal (北大核心)
2018, Issue 11, pp. 3105-3111, 3118 (8 pages)
Funding
Key Scientific Research Project of Universities in Anhui Province (KJ2017A352)
Open Project of the Key Laboratory of Data Science and Intelligence Application, Fujian Province University (D1801)
Keywords
rough set
multi-label
streaming data
feature selection
label distribution