
Implementation and evaluation of Chinese spam filtering system

Cited by: 5
Abstract: Spam has long plagued email users. Anti-spam techniques not only curb spam email but also offer lessons for combating spam text messages and unsolicited VoIP calls. This paper introduces Bayesian spam filtering, describes the implementation of a Chinese spam filtering system, and presents evaluation results. The results show that both the number of features used to compute the final probability and the number of training samples have an optimal value; once the number of training samples grows beyond this optimum, filtering performance declines slightly and then levels off.
Source: Journal of Dalian University of Technology (《大连理工大学学报》), indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2005, issue z1, pp. 189-195 (7 pages).
Keywords: spam; Bayes; filter
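
The abstract refers to "the number of features used to compute the final probability". As a rough illustration of what that parameter might control, the sketch below follows the general scheme popularized in Graham's "A plan for spam" (reference [1]): each token gets a spam probability from training counts, and only the tokens farthest from 0.5 are combined. This is not the paper's implementation; the function names, the clamping bounds, the 0.4 prior for unseen tokens, and the default of 15 features are all assumptions made for the example, and Chinese tokenization is omitted.

```python
import math


def token_spam_prob(spam_count, ham_count, n_spam, n_ham):
    """Estimate P(spam | token) from training counts (hypothetical helper).

    spam_count / ham_count: messages of each class that contain the token.
    n_spam / n_ham: total training messages of each class.
    """
    spam_freq = min(1.0, spam_count / max(n_spam, 1))
    ham_freq = min(1.0, ham_count / max(n_ham, 1))
    if spam_freq + ham_freq == 0:
        return 0.4  # assumed prior for a token never seen in training
    p = spam_freq / (spam_freq + ham_freq)
    # Clamp so that a single token can never force a certain verdict.
    return min(0.99, max(0.01, p))


def combined_spam_prob(token_probs, n_features=15):
    """Combine the n_features most decisive token probabilities.

    Tokens are ranked by distance from 0.5 and combined under a naive
    independence assumption; n_features is the tunable "number of features".
    """
    top = sorted(token_probs, key=lambda p: abs(p - 0.5), reverse=True)[:n_features]
    # Sum logs instead of multiplying probabilities to avoid underflow.
    log_spam = sum(math.log(p) for p in top)
    log_ham = sum(math.log(1.0 - p) for p in top)
    # Cap the exponent so math.exp cannot overflow for large n_features.
    return 1.0 / (1.0 + math.exp(min(log_ham - log_spam, 700.0)))
```

Varying `n_features` and the size of the training set in a sketch like this is one way to explore the kind of optimum the abstract reports, though the actual values depend on the corpus and on the paper's own method.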

References (16)

  • [1] PAUL G. A plan for spam[EB/OL]. 2002[2005-08-20]. http://www.paulgraham.com/spam.html.
  • [2] TOM F. In vivo spam filtering: A challenge problem for data mining[J]. KDD Explorations, 2003, 5(2): 140-148[2005-08-20]. http://arxiv.org/abs/cs.AI/0405007.
  • [3] WILLIAM Y. The spam-filtering accuracy plateau at 99.9% accuracy and how to get past it[EB/OL]. [S.l.]: MIT Spam Conference, 2004(2004-01-18)[2005-08-20]. http://crm114.sourceforge.net/Plateau_Paper.pdf.
  • [4] PATRICK P, DEKANG L. SpamCop: A spam classification & organization program[EB/OL]. Madison, Wisconsin: AAAI Technical Report WS-98-05, Proceedings of AAAI-98 Workshop on Learning for Text Categorization. 1998[2005-08-20]. http://citeseer.ist.psu.edu/pantel98spamcop.html.
  • [5] MEHRAN S, SUSAN D, DAVID H, et al. A Bayesian approach to filtering junk E-Mail[EB/OL]. Madison, Wisconsin: AAAI Technical Report WS-98-05, Proceedings of AAAI-98 Workshop on Learning for Text Categorization. 1998(2001-04-20)[2005-08-20]. http://citeseer.ist.psu.edu/sahami98bayesian.html.
  • [6] WILLIAM Y. Sparse binary polynomial hashing and the CRM114 discriminator[EB/OL]. [S.l.]: MIT Spam Conference, 2003(2003-01-20)[2005-08-20]. http://crm114.sourceforge.net/CRM114_paper.html.
  • [7] PAUL G. Better Bayesian filtering[EB/OL]. 2003[2005-08-20]. http://www.paulgraham.com/better.html.
  • [8] GARY R. Gary Robinson's rants on spam detection[EB/OL]. 2003-07-19[2005-08-20]. http://radio.weblogs.com/0101454/stories/2002/09/24/oldSpamDetection.html.
  • [9] RARMOND C, LEROY F. Asymptotic optimality of Fisher's method of combining independent tests[J]. Journal of the American Statistical Association, 1971, 66(336): 802-805.
  • [10] DAVID H. A tutorial on learning with Bayesian networks[EB/OL]. Redmond, Washington: Technical Report MSR-TR-95-06, Microsoft Research. 1996[2005-08-20]. http://citeseer.ist.psu.edu/heckerman96tutorial.html.

Co-cited references: 28

Citing articles: 5

Second-level citing articles: 6
