Identifying Spam Comments on Blog Posts with a Vector Space Relevance Model (cited by: 4)

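Abstract: Blog authors usually allow readers to post comments under their articles, and many of these comments are filled with spam of all kinds, which damages the harmony of the blog community. This paper constructs a relevance model on top of the vector space model. The blog post and each comment are word-segmented separately, the relevance between the comment and the post is computed with the model, and that relevance is used to decide whether the comment is spam. The model requires no training samples. On a Chinese blog test set, it reaches a recall of 82% and a precision of 91%.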
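Authors: 何海江 (He Haijiang), 凌云 (Ling Yun)
Source: Journal of Changsha University (《长沙大学学报》), 2008, No. 2, pp. 63-66 (4 pages)
Funding: Supported by the Changsha University Research Fund (grant no. CDJJ-07010110)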
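
The abstract describes the pipeline only in outline: segment the post and the comment, compute a relevance score in vector space, and flag low-relevance comments as spam. It does not give the relevance formula itself. As a rough illustration of the general idea, and not the authors' actual model, the sketch below uses plain cosine similarity over raw term counts. The function names, the threshold value of 0.1, and the sample tokens are all assumptions made for this example, and the inputs are assumed to be already word-segmented.

```python
import math
from collections import Counter


def cosine_relevance(post_tokens, comment_tokens):
    """Cosine similarity between the raw term-frequency vectors of a post and a comment.

    Illustrative stand-in only: the paper's own relevance model is not
    specified in the abstract.
    """
    post_tf = Counter(post_tokens)
    comment_tf = Counter(comment_tokens)
    # Dot product over the comment's terms; a Counter returns 0 for missing terms.
    dot = sum(count * post_tf[term] for term, count in comment_tf.items())
    post_norm = math.sqrt(sum(c * c for c in post_tf.values()))
    comment_norm = math.sqrt(sum(c * c for c in comment_tf.values()))
    if post_norm == 0 or comment_norm == 0:
        return 0.0  # an empty post or comment shares nothing with the other side
    return dot / (post_norm * comment_norm)


def is_spam(post_tokens, comment_tokens, threshold=0.1):
    """Flag a comment as spam when its relevance falls below a hypothetical threshold."""
    return cosine_relevance(post_tokens, comment_tokens) < threshold


# Hypothetical, pre-segmented sample data.
post = ["向量", "空间", "模型", "识别", "垃圾", "评论"]
on_topic_comment = ["模型", "识别", "评论", "效果", "不错"]
spam_comment = ["低价", "代购", "点击", "链接"]

print(is_spam(post, on_topic_comment))  # shares three terms with the post
print(is_spam(post, spam_comment))      # shares no terms with the post
```

Because the score depends only on the post and the comment themselves, a scheme of this kind needs no labeled training data, which matches the property the abstract highlights. In practice the threshold would have to be tuned on held-out data.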