Text Categorization by Feature Weighting Scheme Based on Rough Set
Original title: 基于粗糙集特征加权的文本分类 (cited by: 1)


Abstract: Text categorization is an active research topic in information retrieval, data mining, and related fields, and feature weighting is an important step in the categorization process. To improve classification quality, the paper analyzes rough set theory together with the idea behind inverse document frequency (TF-IDF) weighting and proposes a feature weighting method based on rough sets. The method considers the global contribution of each feature term to classification from two aspects, approximation accuracy and approximation quality, and in doing so introduces the class (decision) information of the documents into the term weight, so that the weight reflects how important the feature is for categorization. Text categorization experiments indicate that the weighting method helps improve the performance of the classification system.
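The two rough-set measures named in the abstract can be illustrated with a small sketch. It uses the standard Pawlak definitions: for one term, documents are grouped by the term's value (here an assumed binary present/absent encoding), the lower and upper approximations of each class are built from those groups, and the accuracy is the ratio of total lower to total upper approximation size while the quality is the total lower approximation size divided by the number of documents. The function name `approximation_measures` and the toy data are hypothetical, and the abstract does not say how the paper combines these measures into the final weight, so that step is deliberately left out.

```python
from collections import defaultdict


def approximation_measures(term_values, labels):
    """Return (accuracy, quality) of approximating the class partition
    using a single term as the only condition attribute.

    term_values: one discrete value per document (assumed here: 1 = term
                 present, 0 = term absent).
    labels:      one class label per document.
    """
    if len(term_values) != len(labels):
        raise ValueError("term_values and labels must have the same length")
    if not labels:
        raise ValueError("at least one document is required")

    # Equivalence classes of the indiscernibility relation: documents with
    # the same term value cannot be told apart by this term.
    blocks = defaultdict(list)
    for idx, value in enumerate(term_values):
        blocks[value].append(idx)

    lower_total = 0  # sum of lower-approximation sizes over all classes
    upper_total = 0  # sum of upper-approximation sizes over all classes
    for cls in set(labels):
        for members in blocks.values():
            hits = sum(1 for i in members if labels[i] == cls)
            if hits == len(members):
                # Block lies entirely inside the class: lower approximation.
                lower_total += len(members)
            if hits > 0:
                # Block overlaps the class: upper approximation.
                upper_total += len(members)

    accuracy = lower_total / upper_total
    quality = lower_total / len(labels)
    return accuracy, quality


if __name__ == "__main__":
    # Toy corpus: the term occurs in exactly the three "sports" documents.
    term = [1, 1, 1, 0, 0, 0]
    cls = ["sports", "sports", "sports", "finance", "finance", "politics"]
    print(approximation_measures(term, cls))
```

On the toy data the term isolates the "sports" class but cannot separate "finance" from "politics", giving an accuracy of 1/3 and a quality of 0.5. A term that splits every class cleanly scores 1.0 on both measures, and a term that occurs in every document scores 0.0 on both.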

Authors: Xu Xin (徐欣), Huang Li-can (黄理灿), Zhao Yu-hong (赵玉虹)

Affiliation: School of Information and Electronics, Zhejiang Sci-Tech University (浙江理工大学信息电子学院)

Source: Journal of Zhejiang Sci-Tech University (Natural Sciences) (浙江理工大学学报（自然科学版）), 2011, No. 4, pp. 544-548 (5 pages)

Funding: Zhejiang Province "Qianjiang Talent Program" project (2007R10013)

Keywords: rough set theory; feature weighting; text categorization; approximation accuracy; approximation quality

Classification code: TP301 [Automation and Computer Technology: Computer System Architecture]


