Application of Sparse Autoencoders to Text Classification (cited by: 17)

Research of Text Categorization Based on Sparse Autoencoder Algorithm
Abstract: Traditional text classification algorithms build the feature set with statistical measures such as expected cross entropy, information gain, and mutual information, keeping the terms selected by a preset threshold. When the training set is large, this approach tends to produce ambiguous feature terms and to lose feature information. To address these problems, the paper uses the sparse autoencoder algorithm from deep learning to extract text features automatically, and then combines it with a deep belief network to form the SD algorithm for text classification. Experiments show that with a small training set the SD algorithm performs worse than a traditional support vector machine, but on high-dimensional data it achieves higher accuracy and recall than the support vector machine.
Source: Science Technology and Engineering (《科学技术与工程》), Peking University Core Journal, 2013, No. 31, pp. 9422-9426 (5 pages)
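Funding: Supported by the project "Research on the Integration of Industrialization and Informatization in Underdeveloped Regions and Its System Dynamics Mechanism" (11FJL007)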
Keywords: text classification; deep learning; sparse autoencoder; deep belief networks
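
The abstract describes using a sparse autoencoder to learn text features automatically, in place of threshold-based feature selection. This record gives no details of the paper's network, hyperparameters, or the deep belief network stage, so the following is only a generic sketch of the technique: a one-hidden-layer autoencoder with a KL-divergence sparsity penalty, trained by batch gradient descent on a toy binary term-document matrix. Every name and value here (`train_sparse_autoencoder`, `n_hidden`, `rho`, `beta`, `lr`, the toy matrix `X`) is a hypothetical choice for illustration and is not taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_sparse_autoencoder(X, n_hidden=4, rho=0.1, beta=0.5,
                             lr=0.5, epochs=200, seed=0):
    """Train a one-hidden-layer sparse autoencoder (illustrative only).

    X      : (n_samples, n_features) array with values in [0, 1]
    rho    : target mean activation of each hidden unit (assumed value)
    beta   : weight of the KL sparsity penalty (assumed value)
    Returns the encoder weights, encoder bias, and the loss per epoch.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        # Forward pass: encode, then reconstruct the input.
        H = sigmoid(X @ W1 + b1)
        Y = sigmoid(H @ W2 + b2)
        rho_hat = np.clip(H.mean(axis=0), 1e-6, 1 - 1e-6)

        # Loss = mean squared reconstruction error + beta * KL(rho || rho_hat).
        recon = 0.5 * np.sum((Y - X) ** 2) / n
        kl = np.sum(rho * np.log(rho / rho_hat)
                    + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
        losses.append(recon + beta * kl)

        # Backward pass (gradients of the loss above).
        dZ2 = (Y - X) * Y * (1 - Y) / n
        sparsity = beta * (-rho / rho_hat + (1 - rho) / (1 - rho_hat)) / n
        dZ1 = (dZ2 @ W2.T + sparsity) * H * (1 - H)

        W2 -= lr * (H.T @ dZ2)
        b2 -= lr * dZ2.sum(axis=0)
        W1 -= lr * (X.T @ dZ1)
        b1 -= lr * dZ1.sum(axis=0)
    return W1, b1, losses

# Toy binary term-document matrix: 6 documents x 6 terms (made-up data).
X = np.array([
    [1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 1],
], dtype=float)

W1, b1, losses = train_sparse_autoencoder(X)
features = sigmoid(X @ W1 + b1)  # learned hidden-layer features per document
print(features.shape)
```

In a pipeline like the one the abstract outlines, the hidden activations `features` would be passed to a downstream classifier; how the paper couples them to the deep belief network is not stated in this record.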
