Abstract
Topic discovery in web user reviews is an important means of information analysis in the Web 2.0 era, and extracting valuable information from large volumes of cluttered user reviews is an active research problem. Because individual web user reviews are short and carry little information, this paper proposes an information analysis method that combines the LDA (latent Dirichlet allocation) topic model with the HowNet knowledge base to discover the topics of web reviews. First, the review texts undergo part-of-speech tagging and semantic analysis to build a corpus. Next, HowNet is used to compute the semantic similarity between terms in the corpus, so that semantically duplicate terms are removed or merged. Finally, the LDA topic model maps the content of the user reviews onto topics, thereby discovering the topics of the reviews.
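The two core steps above (merging semantically similar terms, then mapping reviews onto topics with LDA) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: HowNet is not queried; a hypothetical toy table `TOY_SIMILARITY` stands in for a HowNet-based similarity score, and the 0.8 threshold, the "first-seen term becomes the representative" merging rule, and all function names are illustrative choices. The abstract does not say which LDA inference method was used; collapsed Gibbs sampling is swapped in here as one standard option. The reviews are assumed to be already tokenized and POS-filtered, and English tokens are used only for readability.

```python
import random
from collections import defaultdict

# HYPOTHETICAL stand-in for a HowNet-based word similarity score in [0, 1].
# HowNet is not queried here; this toy table only plays its role.
TOY_SIMILARITY = {
    frozenset(("price", "cost")): 0.92,
    frozenset(("screen", "display")): 0.88,
    frozenset(("battery", "screen")): 0.20,
}


def similarity(a, b):
    """Return the (toy) semantic similarity of two terms."""
    if a == b:
        return 1.0
    return TOY_SIMILARITY.get(frozenset((a, b)), 0.0)


def merge_similar_terms(docs, threshold=0.8):
    """Replace each term by the first previously seen term whose similarity
    is at least `threshold`, so near-synonyms collapse into one token."""
    representatives = []  # canonical terms, in order of first appearance
    canon = {}            # term -> canonical term
    for doc in docs:
        for w in doc:
            if w in canon:
                continue
            for r in representatives:
                if similarity(w, r) >= threshold:
                    canon[w] = r
                    break
            else:  # no sufficiently similar term: w becomes a new representative
                representatives.append(w)
                canon[w] = w
    return [[canon[w] for w in doc] for doc in docs]


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling and return theta[d][k],
    the estimated topic proportions of each document."""
    rng = random.Random(seed)
    V = len({w for doc in docs for w in doc})             # vocabulary size
    ndk = [[0] * num_topics for _ in docs]                # document-topic counts
    nkw = [defaultdict(int) for _ in range(num_topics)]   # topic-word counts
    nk = [0] * num_topics                                 # tokens per topic
    z = []                                                # topic of each token
    for d, doc in enumerate(docs):                        # random initialization
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                ndk[d][k] -= 1                            # remove token i
                nkw[k][w] -= 1
                nk[k] -= 1
                # p(z_i = t | all other assignments), up to a constant
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                           for t in range(num_topics)]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k                               # add token i back
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    return [[(ndk[d][t] + alpha) / (len(doc) + num_topics * alpha)
             for t in range(num_topics)]
            for d, doc in enumerate(docs)]


# Toy "reviews", assumed already tokenized and POS-filtered.
reviews = [
    ["price", "cost", "cheap"],
    ["screen", "display", "bright"],
    ["battery", "charge", "cost"],
]
merged = merge_similar_terms(reviews)
# merged == [["price", "price", "cheap"], ["screen", "screen", "bright"],
#            ["battery", "charge", "price"]]
theta = lda_gibbs(merged, num_topics=2)
for row in theta:
    print([round(p, 2) for p in row])
```

Merging before LDA shrinks the vocabulary, so near-synonyms such as "price" and "cost" share counts; this is one plausible reading of why semantic deduplication helps with short, sparse reviews.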
Source
Journal of Intelligence (《情报杂志》)
CSSCI
Peking University Core Journal
2014, No. 3, pp. 161-164 (4 pages)