A Text Classification Method Based on Hybrid Probability Model

Abstract: A class-based hybrid probabilistic classification method is proposed. For each class of texts, the method independently selects the main features that represent the essential characteristics of that class, so that different classes are represented by different main features. Using each class's main features, it builds a separate probability distribution model for every class, and then classifies texts of unknown class with the naive Bayes method. Experimental results show that the method is simple, effective, and easy to implement.
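The abstract does not state how the main features of a class are chosen, or how a document is scored against class models built on different feature sets. The Python sketch below is one possible reading under stated assumptions, not the author's method. It assumes that a class's main features are its k most frequent terms (hypothetical parameter `k`), models each class as a multinomial over those terms plus a catch-all `<other>` symbol that absorbs every remaining token (an assumption made here so that each class model scores every token of a document), applies Laplace smoothing (`alpha`), and classifies with the naive Bayes rule argmax_c [log P(c) + Σ_t log P(φ_c(t) | c)], where φ_c(t) keeps t if it is a main feature of class c and maps it to `<other>` otherwise. The names `train`, `classify`, `OTHER`, `k`, and `alpha` are illustrative.

```python
import math
from collections import Counter

# Hypothetical catch-all symbol: tokens that are not main features of a class
# are pooled into this bucket in that class's model (an assumption, not from the paper).
OTHER = "<other>"


def train(docs, labels, k=2, alpha=1.0):
    """Fit one multinomial per class over its own main features plus OTHER.

    Assumption: a class's "main features" are its k most frequent terms.
    docs is a list of token lists; labels gives the class of each document.
    """
    term_counts = {}          # class -> Counter of term frequencies in that class
    doc_counts = Counter(labels)
    for doc, y in zip(docs, labels):
        term_counts.setdefault(y, Counter()).update(doc)

    model = {}
    for y, counts in term_counts.items():
        # Class-specific feature selection: each class keeps its own term set.
        feats = {w for w, _ in counts.most_common(k)}
        bucket = Counter({w: counts[w] for w in feats})
        bucket[OTHER] = sum(n for w, n in counts.items() if w not in feats)
        total = sum(bucket.values())
        # Laplace-smoothed multinomial over feats + OTHER.
        probs = {w: (n + alpha) / (total + alpha * len(bucket))
                 for w, n in bucket.items()}
        log_prior = math.log(doc_counts[y] / len(labels))
        model[y] = (feats, probs, log_prior)
    return model


def classify(model, doc):
    """Naive Bayes rule: argmax_c log P(c) + sum_t log P(phi_c(t) | c)."""
    best, best_score = None, -math.inf
    for y, (feats, probs, log_prior) in model.items():
        score = log_prior
        for w in doc:
            # phi_c: keep a main feature of class y, else map it to OTHER.
            score += math.log(probs[w if w in feats else OTHER])
        if score > best_score:
            best, best_score = y, score
    return best


if __name__ == "__main__":
    train_docs = [
        ["goal", "team", "goal", "match"],
        ["team", "goal", "coach"],
        ["market", "price", "market", "bank"],
        ["price", "market", "trade"],
    ]
    train_labels = ["sport", "sport", "finance", "finance"]
    m = train(train_docs, train_labels, k=2)
    print(classify(m, ["goal", "team", "win"]))        # sport
    print(classify(m, ["market", "market", "price"]))  # finance
```

Because every class model scores every token (either as one of its own main features or as `<other>`), each class's log-likelihood is a sum over the same number of terms, so a class is not favored merely because its feature set matches fewer tokens. Whether the paper treats non-feature terms this way is not stated in the abstract.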

Author: Wu Xinling (吴新玲)

Affiliation: School of Computer Science, Guangdong Polytechnic Normal University

Source: Microelectronics & Computer (《微电子学与计算机》; indexed in CSCD and the Peking University core journal list), 2011, No. 11, pp. 133-136 (4 pages)

Keywords: text data mining; text classification; feature selection; probability model; multinomial distribution

CLC number: TP391 [Automation and Computer Technology / Computer Application Technology]

References

[1] Sparck Jones K, Willett P. Readings in information retrieval[M]. San Francisco, CA, USA: Morgan Kaufmann, 1997.
[2] Sebastiani F. A tutorial on automated text categorization[C]// Proceedings of ASAI-99, 1st Argentinian Symposium on Artificial Intelligence. Buenos Aires, AR: IEEE, 1999: 7-35.
[3] Sebastiani F. Machine learning in automated text categorization[J]. ACM Comput Surv, 2002, 34(1): 1-47.
[4] Yang Jun, Chen Xianfu. Research on text classification based on KPCA and RBF network[J]. Microelectronics & Computer, 2010, 27(3): 122-125.
[5] Han J, Kamber M. Data mining: concepts and techniques[M]. [S.l.]: Morgan Kaufmann Publishers, 2006.
[6] Hand D, Mannila H, Smyth P. Principles of data mining[M]. Beijing: Publishing House of Machinery Industry, 2003.
[7] Huang Shuangfu, Chen Xianfu. Intrusion detection based on an improved SVM active learning algorithm[J]. Microelectronics & Computer, 2010, 27(3): 75-77.
[8] Markov Z, Larose D T. Data mining the web: uncovering patterns in web content, structure, and usage[M]. Hoboken, New Jersey: John Wiley & Sons, Inc., 2007.
