Abstract
To address the heavy computational load and high resource requirements of convolutional neural networks (CNNs), this paper proposes a binary neural network (BNN) image classification model suited to deployment on low-power embedded devices in mobile terminals, together with a hardware acceleration design for an ARM (Advanced RISC Machines) + FPGA (Field Programmable Gate Array) heterogeneous system. Replacing the multiply-accumulate operations of convolution with simple exclusive-NOR (XNOR) and population-count (popcount) operations reduces both the computational complexity and the on-chip resource demands, while data reuse, pipelining, and parallel computation raise overall throughput. Taking image classification on the CIFAR-10 data set as the target task, the network model is deployed on an FPGA platform using the Vivado HLS tool. Test results on the PYNQ-Z2 platform show that, at a working frequency of 100 MHz, the model deployed on the FPGA side processes image inputs of arbitrary size, after cropping on the Processing System (PS) side, at approximately 631 FPS, with a total runtime of only about 1.58 ms.
Authors
WEI Xingjian, SUN Zeyu, WANG Zhengbin
(College of Electronic and Optical Engineering, College of Flexible Electronics (Future Technology), Nanjing University of Posts and Telecommunications, Nanjing 210023, China; National Joint Engineering Laboratory of RF Integration and Microassembly Technology, Nanjing 210023, China)
Source
Intelligent Computer and Applications (《智能计算机与应用》), 2025, No. 1, pp. 69-74 (6 pages)