Abstract
To address the low multi-class accuracy, weak generalization, and privacy risks of traditional encrypted traffic identification methods, a multi-class deep learning model combining the Attention mechanism (Attention) with a one-Dimensional Convolutional Neural Network (1DCNN) was proposed, named Attention-1DCNN-CE. The model consists of three core components: 1) in the dataset preprocessing stage, the spatial relationships among packets in the original data flow were retained, and a cost-sensitive matrix was constructed from the sample distribution; 2) building on a preliminary extraction of encrypted traffic features, the Attention and 1DCNN models were used to mine and compress the global and local features of the traffic; 3) to address data imbalance, the cost-sensitive matrix was combined with the Cross-Entropy (CE) loss function, which significantly improved the classification accuracy on minority-class samples and thereby the overall performance of the model. Experimental results show that the overall identification accuracy of the model exceeds 97% on the BOT-IOT and TON-IOT datasets. On the public datasets ISCX-VPN and USTC-TFC, the model also performs strongly, reaching performance close to that of ET-BERT (Encrypted Traffic BERT) without any pre-training. Compared with PERT (Payload Encoding Representation from Transformer), the model improves the F1 score for application type detection on the ISCX-VPN dataset by 29.9 percentage points. These results validate the effectiveness of the model and show that it provides a solution for encrypted traffic identification and malicious traffic detection.
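The abstract does not state how the cost-sensitive matrix is derived from the sample distribution or how it is combined with CE, so the following is only a minimal sketch of one common way to do it, under stated assumptions. Here `build_cost_matrix` and `cost_sensitive_ce` are hypothetical names; the cost of mistaking a true class i for any other class is assumed to be the "balanced" inverse-frequency weight N / (K * n_i), and the loss is assumed to be plain CE plus `lam` times the expected misclassification cost under the softmax output.

```python
import math
from collections import Counter


def build_cost_matrix(labels, num_classes):
    """Hypothetical cost matrix: cost[i][j] = cost of predicting j when the true class is i.

    Assumption: every off-diagonal cost in row i equals the inverse-frequency
    weight N / (K * n_i), so errors on rare true classes cost more;
    correct predictions (the diagonal) cost 0.
    """
    counts = Counter(labels)
    n = len(labels)
    cost = [[0.0] * num_classes for _ in range(num_classes)]
    for i in range(num_classes):
        # A class absent from `labels` never appears as a target, so its row is unused.
        weight = n / (num_classes * counts[i]) if counts[i] else 0.0
        for j in range(num_classes):
            if i != j:
                cost[i][j] = weight
    return cost


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cost_sensitive_ce(logits_batch, targets, cost, lam=1.0):
    """Batch mean of CE plus lam * expected misclassification cost (assumed combination)."""
    total = 0.0
    for logits, y in zip(logits_batch, targets):
        p = softmax(logits)
        ce = -math.log(p[y])
        # Expected cost of the predicted distribution, given the true class y.
        expected_cost = sum(cost[y][j] * p[j] for j in range(len(p)))
        total += ce + lam * expected_cost
    return total / len(targets)
```

With 90 majority samples and 10 minority samples, `build_cost_matrix([0] * 90 + [1] * 10, 2)` gives a cost of 5.0 for mistaking a minority sample for the majority class but only about 0.56 for the reverse. An equally confident mistake therefore produces a much larger loss on the minority class, which is the effect the abstract attributes to the cost-sensitive matrix. Setting `lam=0.0` recovers ordinary cross-entropy.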
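The division of labour between the two feature extractors (Attention for global features, 1DCNN for local features) can be illustrated with a toy dependency-free forward pass. This is not the paper's architecture, whose layer sizes, projections, and ordering are not given in the abstract. It assumes that the input is a sequence of per-packet feature vectors, that attention has no learned Q/K/V projections (Q = K = V = x), and that "compression" is a global max-pool. The functions `self_attention`, `conv1d`, `global_max_pool`, and `extract_features` are hypothetical names.

```python
import math


def softmax(v):
    """Numerically stable softmax over a list."""
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def self_attention(x):
    """Scaled dot-product self-attention over a seq_len x d input.

    Assumption: no learned projections. Each output row is a softmax-weighted
    mix of ALL input rows, which is what gives attention its global view.
    """
    d = len(x[0])
    scores = [[sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x] for q in x]
    weights = [softmax(row) for row in scores]
    # weights (seq_len x seq_len) times x (seq_len x d).
    return [[sum(w * row[c] for w, row in zip(wrow, x)) for c in range(d)] for wrow in weights]


def conv1d(x, kernels, bias):
    """'Valid' 1D convolution along the sequence axis, followed by ReLU.

    x: seq_len x in_ch; kernels: out_ch x k x in_ch; bias: out_ch.
    Each output position only sees a window of k neighbouring rows (local features).
    """
    k = len(kernels[0])
    in_ch = len(x[0])
    out = []
    for t in range(len(x) - k + 1):
        window = x[t:t + k]
        row = []
        for w, b in zip(kernels, bias):
            s = b + sum(w[i][c] * window[i][c] for i in range(k) for c in range(in_ch))
            row.append(max(0.0, s))
        out.append(row)
    return out


def global_max_pool(x):
    """Compress a seq_len x channels map to one value per channel."""
    return [max(col) for col in zip(*x)]


def extract_features(x, kernels, bias):
    """Global mixing by attention, local patterns by convolution, then pooling."""
    return global_max_pool(conv1d(self_attention(x), kernels, bias))
```

For example, with three 2-dimensional packet vectors and a single kernel of width 2, `extract_features` returns a single number per output channel. A real model would learn the kernels and the attention projections with a framework such as PyTorch rather than fixing them by hand.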
Authors
GENG Haijun
DONG Yun
HU Zhiguo
CHI Haotian
YANG Jing
YIN Xia
Affiliations: School of Automation and Software Engineering, Shanxi University, Taiyuan, Shanxi 030031, China; Shanxi Qingzhong Technology Company Limited, Taiyuan, Shanxi 030006, China; School of Computer and Information Technology, Shanxi University, Taiyuan, Shanxi 030006, China; Key Laboratory of Embedded System and Service Computing, Ministry of Education (Tongji University), Shanghai 201804, China; Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
Source
Journal of Computer Applications
Peking University Core Journal
2025, No. 3, pp. 872-882 (11 pages)
Funding
National Natural Science Foundation of China (62472267)
Applied Basic Research Program of Shanxi Province (20210302123444)
Keywords
cybersecurity
encrypted traffic
Attention mechanism (Attention)
one-dimensional convolutional neural network (1DCNN)
data imbalance
cost-sensitive matrix