Abstract
[Purpose/Significance] The black and gray market industry seriously endangers home network security through large numbers of malicious websites. Identifying these websites effectively is of great significance for combating black and gray market crime. Traditional malicious website identification algorithms, whether based on rule matching or machine learning, are bottlenecked by the scarcity of labeled malicious samples. [Method/Process] This paper proposes a pre-training method for home-network traffic that performs multimodal self-supervised learning on massive numbers of web pages. The method learns basic knowledge of web pages from hundreds of millions of pages and thereby obtains better vector representations of them. In the subsequent classification fine-tuning, a vector representation of web page structure is introduced and combined with multimodal cross-attention features of the web page and its text. [Results/Conclusion] Compared with traditional schemes, the multimodal pre-trained malicious website identification algorithm noticeably improves recognition performance. A method based on nearest neighbor search can respond promptly to adversarial evasion by malicious sites, improving the recognition rate of black and gray market traffic in home networks.
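The abstract does not specify how the nearest neighbor search is implemented. As a rough illustration only, the following sketch assumes that each page has already been mapped to an embedding vector and that a new page inherits the label of its most similar indexed page when the cosine similarity exceeds a threshold. The function names, the threshold value, and the toy vectors are all hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_label(query, index, threshold=0.9):
    """Return (label, similarity) of the closest indexed page.

    The label is None when no indexed page is similar enough.
    `threshold` is an assumed value chosen for illustration.
    """
    best_label, best_sim = None, -1.0
    for vector, label in index:
        sim = cosine_similarity(query, vector)
        if sim > best_sim:
            best_label, best_sim = label, sim
    if best_sim < threshold:
        return None, best_sim
    return best_label, best_sim


# Hypothetical toy index of (embedding, label) pairs.
index = [
    ([1.0, 0.0, 0.0], "malicious"),
    ([0.0, 1.0, 0.0], "benign"),
]

# A newly confirmed malicious page is appended to the index;
# no model retraining is needed for it to influence later lookups.
index.append(([0.0, 0.0, 1.0], "malicious"))

label, similarity = nearest_label([0.9, 0.1, 0.0], index)
```

Under these assumptions, appending newly confirmed samples to the index is what would allow quick feedback when malicious sites change their appearance, since lookups reflect the new sample immediately. A production system would likely replace the linear scan with an approximate nearest neighbor index.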
Authors
张昕
丰阳露
周志龙
路晓明
智绪龙
Zhang Xin; Feng Yanglu; Zhou Zhilong; Lu Xiaoming; Zhi Xulong (Tianjin University, Tianjin 300354; China Mobile Hangzhou R&D Center, Hangzhou, Zhejiang 310000; China Mobile Group Shandong Co., Ltd., Qingdao Branch, Qingdao, Shandong 266000)
Source
《网络空间安全》
2023, No. 2, pp. 52-56 (5 pages)
Cyberspace Security