Abstract
Text categorization is an active research topic in fields such as information retrieval and data mining, and feature weighting is an important step in the categorization process. To improve classification quality, this paper analyzes rough set theory and the inverse document frequency weighting used in TF-IDF, and proposes a feature weighting method based on rough sets. The method considers the global effect of a feature term on classification from two aspects, approximation accuracy and approximation quality, and introduces the category (decision) information of the texts into the weight so that the importance of each feature is reflected more fully. Text categorization experiments indicate that the proposed weighting method helps improve the classification performance of the system.
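The abstract does not state the weighting formula itself, so the following is only a minimal sketch of the two standard rough set quantities it names, computed for a single term. It assumes (this is not taken from the paper) that each term is treated as a binary attribute, present or absent in a document, and that the category labels form the decision partition. The function name `approximation_measures` and the toy corpus are hypothetical.

```python
from collections import defaultdict


def approximation_measures(docs, labels, term):
    """Return (accuracy, quality) of the category partition w.r.t. one term.

    Assumption: two documents are indiscernible when they agree on whether
    `term` occurs in them. `docs` must be non-empty.
    """
    # Equivalence classes induced by the binary attribute "term is present".
    blocks = defaultdict(set)
    for i, doc in enumerate(docs):
        blocks[term in doc].add(i)

    # Decision classes: documents grouped by category label.
    classes = defaultdict(set)
    for i, label in enumerate(labels):
        classes[label].add(i)

    lower_total = 0  # summed sizes of the lower approximations
    upper_total = 0  # summed sizes of the upper approximations
    for members in classes.values():
        for block in blocks.values():
            if block <= members:  # block lies entirely inside the category
                lower_total += len(block)
            if block & members:   # block overlaps the category
                upper_total += len(block)

    accuracy = lower_total / upper_total  # approximation accuracy
    quality = lower_total / len(docs)     # approximation quality
    return accuracy, quality


# Toy corpus (hypothetical): each document is a set of terms.
docs = [
    {"goal", "match"},
    {"goal", "team"},
    {"stock", "market"},
    {"stock", "team"},
]
labels = ["sports", "sports", "finance", "finance"]

print(approximation_measures(docs, labels, "goal"))  # (1.0, 1.0)
print(approximation_measures(docs, labels, "team"))  # (0.0, 0.0)
```

In this toy corpus, "goal" separates the two categories perfectly and scores 1.0 on both measures, while "team" occurs equally in both categories and scores 0.0. How the paper combines these quantities with term frequency into a final weight is not specified in the abstract.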
Source
Journal of Zhejiang Sci-Tech University (Natural Sciences) (《浙江理工大学学报(自然科学版)》)
2011, No. 4, pp. 544-548 (5 pages)
Funding
Zhejiang Province "Qianjiang Talent Program" project (2007R10013)
Keywords
rough set theory
feature weighting
text categorization
approximation accuracy
approximation quality