Code Clone Detection Method Based on the Combination of Tree-Based and Token-Based Methods

Abstract: This article proposes a fast and accurate code clone detection method that combines tree-based and token-based techniques. Duplicated program code, known as code clones, is one of the main factors that reduce the quality and maintainability of software. If a code fragment contains a fault (bug) and is copied and modified elsewhere, every copy must be corrected, yet finding all code clones in large, complex software is not easy. Much research has been devoted to code clone detection, and two main approaches exist: token-based and tree-based. The token-based method is fast and requires few resources, but it cannot detect all kinds of code clones. The tree-based method can detect all kinds of code clones, but it is slow and requires substantial computing resources. This paper combines the two methods to improve both the efficiency and the accuracy of code clone detection. First, clone candidates are extracted by the fast, lightweight token-based method. The selected candidates are then checked more precisely by the tree-based method. A prototype system was developed. It accepts source code and tokenizes it, then applies the token-based method to the token sequence to find clone candidates. The source code of the selected candidates is then converted into abstract syntax trees (ASTs), to which the tree-based method is applied. The proposed method was evaluated on sample source code, and the evaluation showed improved efficiency and precision of code clone detection.
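The two-stage pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' prototype: the abstract does not give their token matcher, tree algorithm, or thresholds, so everything here is an assumption made for illustration. Stage 1 compares normalized token sequences with `difflib.SequenceMatcher`; stage 2 applies a simplified top-down tree edit distance (whole subtrees are inserted or deleted, nodes are labeled by AST node type only) to the candidates that survive stage 1. The function names `normalized_tokens`, `tree_size`, `tree_distance`, and `detect_clones`, the sample `FRAGMENTS`, and both threshold values are hypothetical.

```python
import ast
import io
import keyword
import tokenize
from difflib import SequenceMatcher

# Token types that carry no structural information for clone matching.
_SKIP = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
         tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}


def normalized_tokens(source):
    """Tokenize Python source; replace identifiers with ID and literals with LIT."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIP:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            out.append("ID")
        elif tok.type in (tokenize.NUMBER, tokenize.STRING):
            out.append("LIT")
        else:
            out.append(tok.string)
    return out


def tree_size(node):
    """Number of nodes in the AST subtree rooted at node."""
    return 1 + sum(tree_size(c) for c in ast.iter_child_nodes(node))


def tree_distance(a, b):
    """Simplified top-down tree edit distance (an assumption, not the paper's).

    Roots are relabeled at cost 1 if their node types differ; child lists are
    aligned like strings, where inserting or deleting a child costs the size
    of its whole subtree.
    """
    ca = list(ast.iter_child_nodes(a))
    cb = list(ast.iter_child_nodes(b))
    dp = [[0] * (len(cb) + 1) for _ in range(len(ca) + 1)]
    for i in range(1, len(ca) + 1):
        dp[i][0] = dp[i - 1][0] + tree_size(ca[i - 1])
    for j in range(1, len(cb) + 1):
        dp[0][j] = dp[0][j - 1] + tree_size(cb[j - 1])
    for i in range(1, len(ca) + 1):
        for j in range(1, len(cb) + 1):
            dp[i][j] = min(
                dp[i - 1][j] + tree_size(ca[i - 1]),                   # delete subtree
                dp[i][j - 1] + tree_size(cb[j - 1]),                   # insert subtree
                dp[i - 1][j - 1] + tree_distance(ca[i - 1], cb[j - 1]),  # match children
            )
    return (0 if type(a) is type(b) else 1) + dp[-1][-1]


def detect_clones(fragments, token_threshold=0.7, tree_threshold=0.8):
    """Return (name_a, name_b, tree_similarity) for pairs passing both stages."""
    tokens = {name: normalized_tokens(src) for name, src in fragments.items()}
    names = sorted(fragments)
    clones = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            # Stage 1: cheap token-based filter selects candidates.
            matcher = SequenceMatcher(None, tokens[a], tokens[b], autojunk=False)
            if matcher.ratio() < token_threshold:
                continue
            # Stage 2: tree-based check, run only on the candidates.
            ta, tb = ast.parse(fragments[a]), ast.parse(fragments[b])
            sim = 1 - tree_distance(ta, tb) / (tree_size(ta) + tree_size(tb))
            if sim >= tree_threshold:
                clones.append((a, b, sim))
    return clones


# Hypothetical sample fragments: two renamed copies and one unrelated function.
FRAGMENTS = {
    "sum_list": "def sum_list(xs):\n    total = 0\n    for x in xs:\n"
                "        total += x\n    return total\n",
    "add_all": "def add_all(values):\n    acc = 0\n    for v in values:\n"
               "        acc += v\n    return acc\n",
    "greet": "def greet(name):\n    print('hello', name)\n",
}

print(detect_clones(FRAGMENTS))
```

On these sample fragments, only the two summation functions, which differ solely in identifier names, are reported as a clone pair. The expensive tree comparison runs only on pairs that pass the token filter, which is the efficiency argument the abstract makes; the thresholds would need tuning on real code.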

Authors: Ryota Ami, Hirohide Haga

Affiliation: Graduate School of Science and Engineering

Source: Journal of Software Engineering and Applications, 2017, Issue 13, pp. 891-906 (16 pages)

Keywords: Code Clone; Token-Based Detection; Tree-Based Detection; Tree Edit Distance

Classification code: R73 [Medicine and Health: Oncology]
