Abstract
This paper proposes an automatic summarization method for Web information retrieval that is guided by the structure of the Web document. While parsing the page source, the method uses the document's structural information to build a DOM (Document Object Model) tree, and on that basis partitions the document into several topic blocks by comparing semantic similarity. Tag information is fully exploited when extracting topic words and computing sentence importance. Finally, working topic block by topic block, the method adjusts sentence weights according to inter-sentence similarity and generates the abstract dynamically. Experimental results show that the method effectively addresses the imbalanced distribution of summary content across the document and reduces redundancy in the abstract.
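The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It makes several assumptions that the abstract does not state: heading tags (`h1` to `h6`) open a new topic block (the paper partitions by semantic similarity on the DOM tree), words inside headings or emphasis tags add a fixed `boost` to a sentence's weight, and Jaccard word overlap stands in for the paper's sentence similarity. All names (`BlockParser`, `summarize`, `boost`, `penalty`) are hypothetical.

```python
import re
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
EMPHASIS = {"b", "strong", "em"}


def tokens(text):
    """Lowercased word tokens of a text fragment."""
    return re.findall(r"\w+", text.lower())


def jaccard(a, b):
    """Jaccard overlap of two word sets (stand-in similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class BlockParser(HTMLParser):
    """Split a page into blocks at heading tags and record tagged words."""

    def __init__(self):
        super().__init__()
        self.blocks = [{"title": "", "text": "", "marked": set()}]
        self.stack = []  # currently open heading/emphasis tags

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            # Assumption: every heading starts a new topic block.
            self.blocks.append({"title": "", "text": "", "marked": set()})
        if tag in HEADINGS | EMPHASIS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            self.stack.remove(tag)

    def handle_data(self, data):
        block = self.blocks[-1]
        if any(t in HEADINGS for t in self.stack):
            block["title"] += data
            block["marked"] |= set(tokens(data))
        else:
            block["text"] += data
            if self.stack:  # text inside an emphasis tag
                block["marked"] |= set(tokens(data))


def summarize(html, per_block=1, boost=0.5, penalty=1.0):
    """Pick up to per_block sentences from each block, penalizing overlap."""
    parser = BlockParser()
    parser.feed(html)
    summary = []
    for block in parser.blocks:
        parts = re.split(r"(?<=[.!?])\s+", block["text"])
        sentences = [s.strip() for s in parts if s.strip()]
        # Initial weight: base score plus a boost per tagged word it contains.
        weights = {
            s: 1.0 + boost * len(set(tokens(s)) & block["marked"])
            for s in sentences
        }
        for _ in range(per_block):
            if not weights:
                break
            best = max(weights, key=weights.get)
            summary.append(best)
            del weights[best]
            # Lower the weight of sentences similar to the one just chosen.
            for s in weights:
                weights[s] -= penalty * jaccard(set(tokens(s)), set(tokens(best)))
    return summary
```

Selecting sentences block by block is what keeps the summary spread across the page's topics, and the similarity penalty is what discourages near-duplicate sentences within a block. Both `boost` and `penalty` are illustrative knobs, not values taken from the paper.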
Source
《计算机应用》
CSCD
Peking University Core Journals (北大核心)
2006, No. 3, pp. 641-644 (4 pages)
Journal of Computer Applications
Funding
Supported by the Natural Science Foundation of Jiangsu Higher Education Institutions (MB20022312)