Abstract
Common text plagiarism detection systems select the key information in a text inaccurately, which reduces the accuracy of their judgments. The Chinese text plagiarism checker developed here builds on the k-grams algorithm and improves it with statistics-based Chinese word segmentation. Experimental results show that the improvement lets the system select key information more effectively and raises the accuracy of its judgments.
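To make the idea concrete, here is a minimal sketch of k-gram comparison over word tokens versus raw characters. It is not the authors' implementation: the function names (`kgrams`, `jaccard`, `similarity`), the use of Jaccard overlap as the score, and the choice of k = 2 are all assumptions made for illustration, and the statistical segmenter itself is not shown. The sample documents are supplied already segmented by hand.

```python
from typing import Sequence, Set, Tuple


def kgrams(tokens: Sequence[str], k: int) -> Set[Tuple[str, ...]]:
    """Return the set of all contiguous windows of k tokens."""
    # range() is empty when the sequence is shorter than k.
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a: Set[Tuple[str, ...]], b: Set[Tuple[str, ...]]) -> float:
    """Jaccard overlap of two k-gram sets (an assumed scoring choice)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(tokens_a: Sequence[str], tokens_b: Sequence[str], k: int = 2) -> float:
    """Score two token sequences by the overlap of their k-gram sets."""
    return jaccard(kgrams(tokens_a, k), kgrams(tokens_b, k))


# Hand-segmented sample input; a real system would obtain these tokens
# from a statistical Chinese word segmenter (not shown here).
doc_a = ["中文", "文本", "抄袭", "检查", "系统"]
doc_b = ["文本", "抄袭", "检查", "算法"]

# k-grams over words: each unit is a segmented word.
word_score = similarity(doc_a, doc_b, k=2)

# k-grams over characters: the baseline without segmentation.
char_score = similarity(list("".join(doc_a)), list("".join(doc_b)), k=2)

print(f"word-level: {word_score:.3f}, character-level: {char_score:.3f}")
```

Character-level windows can straddle word boundaries and so match fragments that carry no meaning of their own, whereas word-level windows only match whole segmented units. That difference is one plausible reading of how segmentation helps select key information; the abstract does not spell out the mechanism.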
Source
《电脑编程技巧与维护》
2010, Issue 20, pp. 23-25 (3 pages)
Computer Programming Skills & Maintenance
Funding
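Zhejiang Province Science and Technology Plan Project: Integrated Network Management Service Platform for Small and Medium-Sized Enterprises and Institutions (2008C21093)
Zhejiang Gongshang University Student Innovation Project (1120XJ1709198)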
Keywords
improvement
key information
plagiarism checker system
k-grams algorithm
Chinese word segmentation