Abstract
Micro-blog content is heterogeneous and its information is sparse. To address these problems, we study traditional automatic summarization techniques in depth and, taking the characteristics of micro-blog data into account, propose a hybrid summarization method based on statistics and comprehension that builds on micro-blog event extraction. First, an initial statistics-based summary is obtained from text features such as word frequency and sentence position. Then, using a semantic dictionary, sentence similarity is computed and the event subject is determined in order to carry out readability processing based on semantic understanding, which makes the final summary more readable. Finally, a suitable summary evaluation method is used to assess the resulting summary. Experimental results show that the method produces summaries of stable quality and good readability at different compression ratios.
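The first stage described in the abstract (an initial summary built from word frequency and sentence position) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `initial_summary`, the linear position score, the `position_weight` parameter, and the way the two features are combined are all assumptions made for illustration. The input is assumed to be already segmented into sentences and tokens.

```python
from collections import Counter


def initial_summary(sentences, ratio=0.3, position_weight=0.5):
    """Pick sentence indices for a statistics-based initial summary.

    sentences: list of token lists, in document order (segmentation is
               assumed to have been done upstream).
    ratio: compression ratio, i.e. the fraction of sentences to keep.
    position_weight: hypothetical weight of the position feature.
    Returns the selected sentence indices in document order.
    """
    if not sentences:
        return []

    # Word frequency over the whole document.
    freq = Counter(tok for sent in sentences for tok in sent)
    max_freq = max(freq.values()) if freq else 1
    n = len(sentences)

    scores = []
    for i, sent in enumerate(sentences):
        # Average word frequency of the sentence, normalised to [0, 1].
        tf = sum(freq[t] for t in sent) / (len(sent) * max_freq) if sent else 0.0
        # Earlier sentences receive a higher position score (an assumption).
        pos = 1.0 - i / n
        scores.append(tf + position_weight * pos)

    # Keep at least one sentence, then restore document order.
    k = max(1, round(n * ratio))
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)
```

Calling `initial_summary(tokenised_posts, ratio=0.5)` returns the indices of the retained sentences. The semantic stage described in the abstract (dictionary-based sentence similarity and event-subject resolution) would then operate on these sentences and is not sketched here.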
Source
Computer Engineering & Science (《计算机工程与科学》)
CSCD
PKU Core (北大核心)
2016, Issue 6, pp. 1257-1261 (5 pages)
Funding
National Natural Science Foundation of China (61163025)
Natural Science Foundation of Inner Mongolia Autonomous Region (2015MS0621)
Keywords
micro-blog event
event value
readability
automatic summarization