Abstract
This paper proposes a method for detecting program similarity based on compiler optimization and disassembly. The method can detect 12 plagiarism techniques commonly used by students, such as renaming identifiers, adding redundant statements, and replacing control structures with equivalent ones. Based on this method, a program similarity detection system called BuaaSim was designed and implemented. BuaaSim uses compiler optimization and disassembly to translate source programs into sets of assembly instructions, removes or replaces the volatile elements of those instructions that have little bearing on the essential characteristics of a program, and computes program similarity with a decision function that is independent of instruction order. A simple and effective clustering algorithm is also given to group similar programs within a program collection. In a comparative evaluation against the well-known JPlag system on two typical sets of plagiarized samples, the method shows a clear advantage in detection effectiveness.
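The pipeline described in the abstract (normalize the disassembled instructions, compare them without regard to order, then group similar programs) can be sketched roughly as follows. This is only an illustration under stated assumptions, not BuaaSim's implementation: the abstract does not give the decision function or the clustering algorithm, so the sketch substitutes a Dice coefficient over instruction multisets and threshold-based connected components. The names `normalize`, `similarity`, `cluster`, the 0.8 threshold, and the AT&T-style operand syntax are all hypothetical choices made for the example.

```python
import re
from collections import Counter

# Assumption: instructions use AT&T-style syntax (e.g. "movl $5, %eax").
REGISTER = re.compile(r"%[a-z0-9]+")
# Immediates, offsets and addresses; the lookbehind keeps digits that are
# part of an opcode name (e.g. "cvtsi2sd") intact.
NUMBER = re.compile(r"(?<![a-z_0-9])\$?-?(?:0x[0-9a-f]+|\d+)")


def normalize(instruction: str) -> str:
    """Replace volatile operands with placeholders, keeping the opcode."""
    text = instruction.strip().lower()
    text = REGISTER.sub("REG", text)
    text = NUMBER.sub("NUM", text)
    return " ".join(text.split())


def similarity(a: list[str], b: list[str]) -> float:
    """Order-independent score in [0, 1]: Dice coefficient over the
    multisets of normalized instructions (a stand-in decision function)."""
    ca = Counter(normalize(i) for i in a)
    cb = Counter(normalize(i) for i in b)
    total = sum(ca.values()) + sum(cb.values())
    if total == 0:
        return 0.0
    shared = sum((ca & cb).values())  # multiset intersection
    return 2 * shared / total


def cluster(programs: dict[str, list[str]], threshold: float = 0.8) -> list[set[str]]:
    """Group programs whose pairwise similarity reaches the threshold,
    using union-find to merge connected pairs (a stand-in clustering)."""
    names = list(programs)
    parent = {n: n for n in names}

    def find(n: str) -> str:
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for i, x in enumerate(names):
        for y in names[i + 1:]:
            if similarity(programs[x], programs[y]) >= threshold:
                parent[find(x)] = find(y)

    groups: dict[str, set[str]] = {}
    for n in names:
        groups.setdefault(find(n), set()).add(n)
    # Only groups with at least two members are suspicious.
    return [g for g in groups.values() if len(g) > 1]


# Hypothetical instruction listings for three submissions.
s1 = ["movl $5, %eax", "addl %ebx, %eax", "ret"]
s2 = ["addl %ecx, %edx", "movl $7, %esi", "ret"]  # reordered, renamed
s3 = ["call foo", "pushq %rbp"]
print(similarity(s1, s2))
print(cluster({"s1": s1, "s2": s2, "s3": s3}))
```

Because the comparison works on multisets, reordering instructions or renaming registers leaves the score unchanged, which is the property the abstract attributes to the order-independent decision function. In practice the instruction listings would come from compiling each submission with optimization enabled and disassembling the result; that step is omitted here.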
Source
Journal of Beijing University of Aeronautics and Astronautics
Indexed in: EI, CAS, CSCD, PKU Core
2008, No. 6, pp. 711-715 (5 pages)
Funding
National Natural Science Foundation of China (60703057)
Keywords
plagiarism
program similarity
similarity detection
compiler optimization