摘要
结合文本分类规则抽取的特点,给出了近似规则的定义。该方法首先利用CHI值进行特征选取并为下一步特征选取提供特征重要性信息,然后使用粗糙集对离散决策表继续进行特征选取,最后用粗糙集抽取出精确规则或近似规则。该方法将CHI值特征选取和粗糙集理论充分结合,避免了用粗糙集对大规模决策表进行特征约简,同时避免了决策表的离散化。该方法提高了文本规则抽取的效率,并使其更趋实用化。实验结果表明了这种方法的有效性和实用性。
The definition of proximate rule was proposed based on the characteristic of text classification rule extraction. Based on the CHI values, the features of text set were selected firstly and feature significance information was provided to the further feature selection. Then rough set was used to select further the attributes on the discrete decision table. Finally precise rules or proximate rules were extracted using rough set theory. The method combined CHI value feature selection and rough set theory fully so as to avoid both feature reduction on a large scale decision table and the discretization of the decision table. The method improved the effectiveness and the practicability of extracting text rule greatly. Experiment results demonstrate the effectiveness of the method.
出处
《计算机应用》
CSCD
北大核心
2005年第5期1026-1028,1033,共4页
journal of Computer Applications
基金
国家自然科学基金资助项目(60275020)
关键词
CHI值
特征选取
粗糙集
文本分类规则
CHI value
feature selection
rough set
text classification rule