Abstract
Network traffic identification is important for network planning, network management, and security monitoring. Protocol feature detection at the application layer has become the mainstream method for network traffic identification. For high-speed traffic identification, however, traditional protocol feature extraction algorithms suffer from low efficiency and poor reliability. To address this, a multilevel T+ sequence tree mining algorithm suited to protocol feature extraction is proposed. The method first loads the sequence database into main memory and builds a multilevel T+ sequence tree, then prunes the tree, and then obtains the protocol feature sequences through operations such as constructing projected T+ sequence trees and joining. An example illustrates the execution of the algorithm. Experimental results show that, compared with the PrefixSpan-based protocol identification algorithm, the proposed algorithm effectively reduces the number of times the sequence database is scanned and generated, lowers the time spent on disk I/O, and improves running efficiency, thereby ensuring the correctness and reliability of the features extracted for different protocols.
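For context on the PrefixSpan baseline named in the abstract, the following is a minimal sketch of prefix-projected sequential pattern mining. It is not the paper's T+ sequence tree algorithm, whose node layout and pruning rules are not given here. It assumes each sequence is a list of single tokens (for example, payload bytes), and the names `prefixspan`, `mine`, and `min_support` are illustrative. The sketch also keeps (sequence index, offset) pairs in memory in place of materialized projected databases.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Return {pattern_tuple: support} for every frequent subsequence.

    Illustrative sketch only: items are single tokens, and projection is
    represented by (sequence_index, start_offset) pairs.
    """
    results = {}

    def mine(prefix, projected):
        # For each item, record its first position in each projected suffix.
        first_pos = defaultdict(dict)
        for sid, start in projected:
            seq = sequences[sid]
            for pos in range(start, len(seq)):
                item = seq[pos]
                if sid not in first_pos[item]:
                    first_pos[item][sid] = pos

        for item, occurrences in sorted(first_pos.items()):
            support = len(occurrences)  # number of distinct sequences
            if support < min_support:
                continue  # infrequent items cannot extend the prefix
            pattern = prefix + (item,)
            results[pattern] = support
            # Project onto the suffixes that follow the first occurrence.
            mine(pattern, [(sid, pos + 1) for sid, pos in occurrences.items()])

    mine((), [(i, 0) for i in range(len(sequences))])
    return results


# Hypothetical toy payloads, one token per character.
example = [list("GET"), list("GEX"), list("PUT")]
patterns = prefixspan(example, min_support=2)
```

With these toy payloads and `min_support=2`, the frequent patterns are `G`, `E`, `T`, and the two-token sequence `G, E`. Each recursion level rescans the projected suffixes, which is the repeated scanning and projection cost that the abstract says the tree-based method reduces.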
Source
Computer Technology and Development (《计算机技术与发展》)
2015, No. 10, pp. 71-75 (5 pages)
Funding
Hubei Province Educational Science "Twelfth Five-Year Plan" Project (2011B130)