
Text Categorization by Feature Weighting Scheme Based on Rough Set

Cited by: 1
Abstract: Text categorization is an active research topic in information retrieval, data mining, and related fields, and feature weighting is an important step in the categorization process. To improve classification quality, this paper analyzes rough set theory and inverse document frequency (TF-IDF) weighting and proposes a feature weighting method based on rough sets. The method considers the global effect of each feature term on classification from two aspects, approximation accuracy and approximation quality, and introduces the category (decision) information of the documents into the term weight, so that the importance of each feature is reflected more fully. Text categorization experiments indicate that the weighting method helps improve the classification performance of the system.
Source: Journal of Zhejiang Sci-Tech University (Natural Sciences), 2011, No. 4, pp. 544-548 (5 pages)
Funding: Zhejiang Province "Qianjiang Talent Program" project (2007R10013)
Keywords: rough set theory; feature weighting; text categorization; approximation accuracy; approximation quality
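
The two rough set measures named in the abstract can be illustrated with a small sketch. It uses the standard rough set definitions: for a partition of the document set U into categories Y_1, ..., Y_n and an equivalence relation R, approximation accuracy is the sum of the lower-approximation sizes divided by the sum of the upper-approximation sizes, and approximation quality is the sum of the lower-approximation sizes divided by |U|. The choice of R here (documents are equivalent when they agree on whether they contain the term), the function name `rough_measures`, and the toy corpus are assumptions made for illustration. The abstract does not state how the paper combines these measures with TF-IDF, so that step is not shown.

```python
from collections import defaultdict


def rough_measures(docs, labels, term):
    """Return (approximation accuracy, approximation quality) of the category
    partition with respect to presence/absence of `term`.

    Assumes `docs` is a non-empty list of token sets and `labels` is a list
    of category labels of the same length (illustrative setup only).
    """
    # Equivalence classes: documents that contain the term vs. those that do not.
    blocks = defaultdict(set)
    for i, doc in enumerate(docs):
        blocks[term in doc].add(i)

    # Decision classes: documents grouped by category label.
    classes = defaultdict(set)
    for i, label in enumerate(labels):
        classes[label].add(i)

    lower_total = 0  # summed sizes of the lower approximations
    upper_total = 0  # summed sizes of the upper approximations
    for members in classes.values():
        for block in blocks.values():
            if block <= members:  # block lies entirely inside the category
                lower_total += len(block)
            if block & members:   # block overlaps the category
                upper_total += len(block)

    accuracy = lower_total / upper_total
    quality = lower_total / len(docs)
    return accuracy, quality


# Toy corpus (hypothetical): two "sport" and two "politics" documents.
docs = [
    {"goal", "match"},
    {"goal", "team"},
    {"vote", "party"},
    {"vote", "team"},
]
labels = ["sport", "sport", "politics", "politics"]

print(rough_measures(docs, labels, "goal"))  # (1.0, 1.0): "goal" separates the categories
print(rough_measures(docs, labels, "team"))  # (0.0, 0.0): "team" carries no category information
```

In this toy setting a term that occurs in exactly one category scores 1 on both measures, while a term spread evenly across categories scores 0, which is the kind of category-level information that plain inverse document frequency does not capture.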
