A Document Clustering Method Based on NMFSC

Abstract: By analyzing the characteristics of text documents, this paper proposes a new text clustering method based on non-negative matrix factorization with sparseness constraints (NMFSC). The method applies NMFSC to the term-document matrix to reduce the dimensionality of the feature space. The sparseness constraint lets it control the sparsity of the factors more precisely. It then refines the clusters further based on the similarity of the documents within each cluster. In experiments, the proposed method was compared with text clustering methods based on k-means and on standard NMF. It achieves a higher normalized mutual information (NMI) score than both, and therefore better clustering performance.
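The abstract does not spell out the NMFSC formulation, the cluster-assignment rule, or the NMI normalization. The Python sketch below illustrates these three ingredients under stated assumptions:

- The sparseness constraint is Hoyer's (2004), where sparseness(x) = (sqrt(n) - ||x||_1 / ||x||_2) / (sqrt(n) - 1). Each row of the coefficient matrix H is projected onto a target sparseness after an update.
- Each document is assigned to the latent topic with the largest coefficient in its column of H.
- NMI is normalized by the geometric mean of the two label entropies.

All function names are hypothetical. The paper's similarity-based refinement step is not reproduced, because the abstract gives no details of it.

```python
import math
from collections import Counter

# Hypothetical helpers illustrating Hoyer-style NMFSC building blocks;
# they are an assumption, not necessarily the paper's exact method.


def sparseness(x):
    """Hoyer's sparseness: 0 for a perfectly flat vector, 1 for a single nonzero entry."""
    n = len(x)
    l1 = sum(abs(v) for v in x)
    l2 = math.sqrt(sum(v * v for v in x))
    return (math.sqrt(n) - l1 / l2) / (math.sqrt(n) - 1)


def project(x, l1, l2):
    """Closest non-negative s to x with sum(s) == l1 and ||s||_2 == l2 (Hoyer's projection)."""
    n = len(x)
    # Start on the hyperplane sum(s) == l1.
    offset = (l1 - sum(x)) / n
    s = [v + offset for v in x]
    zero = set()  # indices clamped to 0
    while True:
        k = n - len(zero)
        # Point with equal free entries summing to l1 (center of the constraint set).
        m = [0.0 if i in zero else l1 / k for i in range(n)]
        d = [s[i] - m[i] for i in range(n)]
        # Solve ||m + alpha * d||^2 == l2^2 for the non-negative root alpha.
        a = sum(v * v for v in d)
        if a == 0.0:
            raise ValueError("input is flat on the free indices; direction undefined")
        b = 2.0 * sum(m[i] * d[i] for i in range(n))
        c = sum(v * v for v in m) - l2 * l2
        alpha = (-b + math.sqrt(max(b * b - 4.0 * a * c, 0.0))) / (2.0 * a)
        s = [m[i] + alpha * d[i] for i in range(n)]
        if all(v >= 0.0 for v in s):
            return s
        # Clamp negative entries to 0, then restore sum(s) == l1 on the free entries.
        zero |= {i for i in range(n) if s[i] < 0.0}
        s = [0.0 if i in zero else s[i] for i in range(n)]
        shift = (sum(s) - l1) / (n - len(zero))
        s = [0.0 if i in zero else s[i] - shift for i in range(n)]


def project_to_sparseness(x, target):
    """Unit-L2 projection whose Hoyer sparseness equals target (applied per row of H)."""
    n = len(x)
    l1 = math.sqrt(n) - (math.sqrt(n) - 1) * target
    return project(x, l1, 1.0)


def assign_clusters(H):
    """Document j joins the topic with the largest coefficient in column j of H (k x n)."""
    k, n = len(H), len(H[0])
    return [max(range(k), key=lambda r: H[r][j]) for j in range(n)]


def nmi(true_labels, pred_labels):
    """Normalized mutual information, normalized by sqrt(H(T) * H(P))."""
    n = len(true_labels)
    ct, cp = Counter(true_labels), Counter(pred_labels)
    joint = Counter(zip(true_labels, pred_labels))
    mi = sum(c / n * math.log(n * c / (ct[t] * cp[p])) for (t, p), c in joint.items())
    ht = -sum(c / n * math.log(c / n) for c in ct.values())
    hp = -sum(c / n * math.log(c / n) for c in cp.values())
    if ht == 0.0 or hp == 0.0:
        # Degenerate single-cluster partitions: NMI is 1 only if both are trivial.
        return 1.0 if ht == hp else 0.0
    return mi / math.sqrt(ht * hp)
```

For instance, `project_to_sparseness([3, 1, 0.5, 0.2], 0.8)` returns roughly `[0.978, 0.207, 0.015, 0.0]`. The result is non-negative, has unit L2 norm, and has sparseness 0.8.

In Hoyer's formulation, this projection follows each gradient step on the constrained factor, while an unconstrained factor is updated multiplicatively. That alternating loop is omitted here.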

Authors: Wang Yonggui, Gao Yue

Affiliation: School of Software, Liaoning Technical University

Source: Computer Systems & Applications, 2011, No. 9, pp. 78-81, 156 (5 pages)

Keywords: text clustering; cluster refinement; non-negative matrix factorization; sparse representation; normalized mutual information

CLC number: TP391.1 [Automation and Computer Technology / Computer Application Technology]

References

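1. Lee D D, Seung H. Learning the parts of objects by non-negative matrix factorization [J]. Nature, 1999, 401: 788-791.
2. Huang Gangshi, Lu Jianjiang, Zhang Yafei. Text clustering method based on NMF [J]. Computer Engineering, 2004, 30(11): 113-114.
3. Zhang Meng, Wang Daling, Yu Ge. A text clustering method based on automatic threshold discovery [J]. Journal of Computer Research and Development, 2004, 41(10): 1748-1753.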
