A Text Classification Method Based on Hybrid Probability Model
Abstract: A class-based hybrid probabilistic classification method is proposed. For each class of text, the method independently selects the main features that capture that class's essential characteristics, so different classes are represented by different main features. A separate probability distribution model is then built for each class from its main features, and texts of unknown class are classified with the naive Bayes method. Experimental results show that the method is simple, effective, and easy to implement.
Author: Wu Xinling (吴新玲)
Source: Microelectronics & Computer (《微电子学与计算机》), CSCD, Peking University Core Journal, 2011, Issue 11, pp. 133-136 (4 pages)
Keywords: text data mining; text classification; feature selection; probability model; multinomial distribution
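
The pipeline outlined in the abstract (per-class feature selection, one probability model per class, naive Bayes decision) can be illustrated with a minimal sketch. The abstract does not give the paper's selection criterion or model details, so everything concrete here is an assumption: main features are taken to be the k most frequent terms of a class, each class model is a Laplace-smoothed multinomial, and tokens outside a class's main features are pooled into a hypothetical `<other>` bucket so that every class model remains a proper distribution over all tokens. The names `train`, `classify`, `OTHER`, `k`, and `alpha` are illustrative, not taken from the paper.

```python
import math
from collections import Counter

OTHER = "<other>"  # hypothetical bucket for tokens outside a class's main features


def train(docs_by_class, k=3, alpha=1.0):
    """Fit one multinomial model per class over that class's own top-k terms."""
    total_docs = sum(len(docs) for docs in docs_by_class.values())
    model = {}
    for label, docs in docs_by_class.items():
        counts = Counter(tok for doc in docs for tok in doc)
        # Assumption: "main features" = the k most frequent terms in the class.
        features = {tok for tok, _ in counts.most_common(k)}
        bucketed = Counter()
        for tok, n in counts.items():
            bucketed[tok if tok in features else OTHER] += n
        # Laplace smoothing over the selected features plus the OTHER bucket.
        denom = sum(bucketed.values()) + alpha * (len(features) + 1)
        log_prob = {tok: math.log((bucketed[tok] + alpha) / denom)
                    for tok in features | {OTHER}}
        model[label] = (math.log(len(docs) / total_docs), features, log_prob)
    return model


def classify(model, doc):
    """Return the class with the highest naive Bayes log-posterior score."""
    def score(label):
        log_prior, features, log_prob = model[label]
        return log_prior + sum(
            log_prob[tok if tok in features else OTHER] for tok in doc)
    return max(model, key=score)


# Toy corpus (made up for illustration): tokenized documents grouped by class.
train_docs = {
    "sports": [["match", "goal", "team"], ["team", "goal", "win"]],
    "finance": [["stock", "market", "bank"], ["bank", "stock", "rate"]],
}
model = train(train_docs, k=3)
print(classify(model, ["goal", "team", "bank"]))
```

Pooling out-of-feature tokens into one bucket is only one way to keep scores comparable across classes that use different feature sets; the paper may handle this differently.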