
Detection and Ranking of Significant Topics on Sina Weibo Based on Topic Model
Cited by: 12
Abstract Micro-blogging services such as Twitter and Sina Weibo have become popular worldwide in recent years. Based on the characteristics of micro-blogs, this paper proposes a new method for discovering and ranking topics related to a specific subject, such as a product, a company, or an organization. First, micro-blog posts that mention the item of interest are collected from the Internet and formatted. Next, every word that co-occurs with the item is assessed for importance using three factors (influence, burstiness, and relevance), which together balance the novelty and specificity of topics. The influence score of a keyword reflects its probability of being viewed by many people; the burstiness score reflects whether it has appeared more often recently than before; and the relevance score reflects its co-occurrence with the item of interest. After the keywords are ranked, the micro-blogs containing the same keyword are aggregated into a term profile, and these profiles serve as the input documents for training an LDA model. Words are then clustered and topics mined by inferring the probability distribution over topic keywords. This aggregation mitigates the data sparsity caused by the length limit on micro-blog posts. Experiments on a real-world dataset demonstrate the effectiveness of the method.
Source Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list, 2013, Issue S1, pp. 179-185 (7 pages)
Funding National High Technology Research and Development Program of China ("863" Program), grant 2012AA040911
Keywords micro-blog; keyword ranking; topic detection; latent Dirichlet allocation (LDA); topic model; text mining
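
The pipeline outlined in the abstract (score co-occurring keywords, then aggregate posts that share a keyword into a term profile) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract gives no formulas, so the three scoring expressions, their multiplicative combination, the `Post` fields, and the function names `score_keywords` and `build_term_profiles` are all hypothetical and are not the authors' definitions. The resulting profiles are the kind of pseudo-documents that could then be passed to any LDA implementation; that step is not shown.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Post:
    tokens: list   # pre-segmented words of one micro-blog post
    views: int     # hypothetical audience-size proxy (e.g. follower count)
    recent: bool   # True if the post falls inside the recent time window


def score_keywords(posts, target):
    """Score every word other than `target` (assumed formulas, not the paper's)."""
    stats = defaultdict(
        lambda: {"views": 0, "recent": 0, "old": 0, "with_target": 0, "total": 0}
    )
    for post in posts:
        words = set(post.tokens)
        for w in words - {target}:
            s = stats[w]
            s["total"] += 1
            s["views"] += post.views
            s["recent" if post.recent else "old"] += 1
            if target in words:
                s["with_target"] += 1

    scores = {}
    for w, s in stats.items():
        influence = math.log1p(s["views"])               # larger audience, higher score
        burstiness = (s["recent"] + 1) / (s["old"] + 1)  # smoothed recent-vs-earlier ratio
        relevance = s["with_target"] / s["total"]        # share of posts that mention target
        scores[w] = influence * burstiness * relevance   # assumed combination rule
    return scores


def build_term_profiles(posts, keywords):
    """Concatenate the tokens of all posts containing a keyword into one pseudo-document."""
    profiles = {k: [] for k in keywords}
    for post in posts:
        for k in keywords:
            if k in post.tokens:
                profiles[k].extend(post.tokens)
    return profiles


# Toy data, invented purely for illustration.
posts = [
    Post(["phone", "battery", "drain"], views=500, recent=True),
    Post(["phone", "battery", "recall"], views=900, recent=True),
    Post(["phone", "camera"], views=100, recent=False),
    Post(["weather", "camera"], views=50, recent=False),
]
scores = score_keywords(posts, target="phone")
top = sorted(scores, key=scores.get, reverse=True)[:2]
profiles = build_term_profiles(posts, top)
```

Each term profile is longer than any single post, which is the property the abstract relies on to reduce sparsity before topic modeling.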