Abstract
To detect SQL injection (SQLI) effectively, this paper studies basic machine learning methods and applies the Naive Bayes classification algorithm to SQLI detection. Character sequences that a user might input undergo feature extraction and lexical analysis, producing a feature vector of tokens in a specific order; a Naive Bayes model then classifies each vector as SQLI or non-SQLI. The preprocessing stage is refined by improving the feature extraction method and atomizing the tokens produced by lexical analysis. In the machine learning stage, an SQLI detection algorithm capable of removing noise is proposed for the preprocessed feature vectors. Experimental results show that, given a data set in which the class of each SQL statement has been determined in advance, the scheme detects SQLI attacks effectively.
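The pipeline summarized in the abstract (lexical analysis into tokens, then Naive Bayes classification into SQLI and non-SQLI) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the token classes, the `KEYWORDS` set, the `tokenize` function, the `NaiveBayes` class, and the eight toy training samples are hypothetical and are not taken from the paper. The abstract gives no details of the denoising step, so it is not reproduced.

```python
import math
import re
from collections import Counter

# Hypothetical keyword list and token classes; the paper's token set is not given.
KEYWORDS = {"select", "union", "or", "and", "from", "where", "drop", "insert"}
TOKEN_RE = re.compile(r"""
    (?P<COMMENT>--|\#|/\*)        # SQL comment openers
  | (?P<NUMBER>\d+)
  | (?P<WORD>[A-Za-z_]\w*)
  | (?P<QUOTE>')
  | (?P<OP>[=<>!]+|[;(),*+])
""", re.VERBOSE)


def tokenize(text):
    """Lexical analysis: map raw input to a sequence of abstract tokens."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        kind, value = match.lastgroup, match.group()
        if kind == "WORD":
            word = value.lower()
            tokens.append("KW_" + word if word in KEYWORDS else "IDENT")
        elif kind == "OP":
            tokens.append("OP_" + value)
        else:
            tokens.append(kind)
    return tokens


class NaiveBayes:
    """Multinomial Naive Bayes over tokens with add-one (Laplace) smoothing."""

    def fit(self, samples, labels):
        self.class_counts = Counter(labels)
        self.token_counts = {label: Counter() for label in self.class_counts}
        for text, label in zip(samples, labels):
            self.token_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.token_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        scores = {}
        for label, n_samples in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_samples / total)  # log prior
            for token in tokenize(text):
                if token in self.vocab:  # skip tokens never seen in training
                    score += math.log((counts[token] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Toy, hand-labelled data (invented for this sketch, not the paper's data set).
samples = [
    "1' OR '1'='1", "admin'--", "1; DROP TABLE users",
    "' UNION SELECT password FROM users--",
    "alice", "john_smith42", "hello world", "42",
]
labels = ["SQLI"] * 4 + ["non-SQLI"] * 4
model = NaiveBayes().fit(samples, labels)

print(tokenize("admin'--"))
print(model.predict("x' OR 'a'='a"), model.predict("bob"))
```

Replacing concrete values with token classes such as `IDENT` and `NUMBER` lets the classifier generalize from the shape of an input instead of its literal text. With so little training data the decision boundary is fragile, so the sketch shows the mechanics only and says nothing about the accuracy reported in the paper.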
Authors
胡峰松
李苍
王冕
全夏杰
徐青云
HU Feng-song; LI Cang; WANG Mian; QUAN Xia-jie; XU Qing-yun (College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China; Digital Media Technology Lab, Hunan University, Changsha 410082, China)
Source
《计算机工程与设计》
Peking University Core Journal (北大核心)
2019, Issue 6, pp. 1554-1558 (5 pages)
Computer Engineering and Design
Keywords
SQL injection
feature extraction
lexical analysis
Token
Naive Bayes classification
machine learning