Abstract
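In the deep learning frameworks used to build deep learning models, correct computation by operators is essential for the models to make correct predictions. However, existing defect detection methods for deep learning frameworks can only find, through comparison and speculation, operators whose computation results differ substantially across frameworks. They also cannot detect computation errors that arise while a model is being trained, which severely limits them. To address this problem, this paper designs and implements a meta-operator-based defect detection method for deep learning frameworks. The method abstracts the computation logic shared by operators in different deep learning frameworks into "meta operators" and allows the concrete implementation of a meta operator to be bound without changing the model code. The computation results of the same model under different deep learning frameworks can therefore be compared at a fine granularity to reveal defects. The method supports defect detection during both training and inference, and it can also verify the localization of computation errors. We verify the computational accuracy of meta operators and evaluate their runtime performance. We also collect operators known to compute incorrectly in deep learning frameworks and apply the method to deep learning models that contain these operators, demonstrating its effectiveness.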
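The meta-operator idea can be illustrated with a minimal Python sketch. It assumes vector-valued operators represented as plain lists and two hypothetical backends, `fw_a` and `fw_b`, that stand in for two frameworks, with a sign bug deliberately injected into `fw_b`'s sigmoid. `MetaOperator`, `OPS`, `model`, and `diff_per_operator` are illustrative names, not the paper's API. The model code refers to operators only by meta-operator name, so the bound backend can change without editing the model. Each recorded call is then replayed on the second backend with the same input, so the results are compared operator by operator.

```python
import math
from typing import Callable, Dict, List, Tuple

Vec = List[float]

class MetaOperator:
    """Hypothetical framework-independent operator: one name,
    one implementation per backend (i.e. per framework)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.impls: Dict[str, Callable[[Vec], Vec]] = {}

    def register(self, backend: str, fn: Callable[[Vec], Vec]) -> None:
        self.impls[backend] = fn

OPS: Dict[str, MetaOperator] = {n: MetaOperator(n) for n in ("relu", "sigmoid")}

# Two hypothetical backends; fw_b's sigmoid carries an injected sign bug.
OPS["relu"].register("fw_a", lambda xs: [max(0.0, x) for x in xs])
OPS["relu"].register("fw_b", lambda xs: [x if x > 0 else 0.0 for x in xs])
OPS["sigmoid"].register("fw_a", lambda xs: [1.0 / (1.0 + math.exp(-x)) for x in xs])
OPS["sigmoid"].register("fw_b", lambda xs: [1.0 / (1.0 + math.exp(x)) for x in xs])

def model(x: Vec, call: Callable[[str, Vec], Vec]) -> Vec:
    """Model code: names meta operators only, never a concrete backend."""
    h = call("relu", x)
    return call("sigmoid", h)

def diff_per_operator(x: Vec, ref: str, other: str,
                      tol: float = 1e-6) -> List[Tuple[str, float]]:
    """Run the model bound to `ref`, then replay every recorded operator
    call on `other` with the same input; report (name, max abs error)
    for operators whose outputs differ by more than `tol`."""
    trace: List[Tuple[str, Vec, Vec]] = []

    def call(name: str, xs: Vec) -> Vec:
        out = OPS[name].impls[ref](xs)
        trace.append((name, xs, out))
        return out

    model(x, call)
    suspects = []
    for name, xs, out in trace:
        alt = OPS[name].impls[other](xs)
        err = max(abs(a - b) for a, b in zip(out, alt))
        if err > tol:
            suspects.append((name, err))
    return suspects

# Both relu implementations agree here, so only "sigmoid" is reported.
print(diff_per_operator([-1.0, 2.0], "fw_a", "fw_b"))
```

Replaying each operator on the reference input, rather than comparing only end-to-end outputs, prevents an error in one operator from also being attributed to the operators downstream of it.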
Deep Learning (DL) has been widely adopted in various fields such as image recognition, machine translation, and autonomous driving. To better support deep learning tasks and promote the application of DL, more and more platforms and frameworks have emerged, such as TensorFlow, PyTorch, and Keras. These platforms and frameworks are known as deep learning frameworks. Using the programming interfaces provided by these frameworks, developers can easily design, train, and test deep learning models. Deep learning frameworks usually take the "operator" as the unit of calculation, and different operators define different types of numerical calculation. In deep learning frameworks, the correct calculation of operators is critical to the correctness of deep learning models. Calculation errors could affect the accuracy of the prediction results of deep learning models, or even result in serious consequences such as traffic accidents in autonomous driving. In recent years, attention has been paid to the testing and diagnosis of deep learning frameworks, but existing defect detection methods have great limitations. On the one hand, existing defect detection methods for deep learning frameworks can detect only large calculation differences of operators between different deep learning frameworks, through comparison and speculation. On the other hand, existing methods can diagnose only calculation errors of deep learning models in the inference process, and cannot diagnose calculation errors in the training process. To address these issues, we aim to automatically detect errors of deep learning models caused by defects of deep learning frameworks during training or inference, and to verify the accuracy of the detection results. Implementing such a defect detection method poses several challenges. First, a deep learning model usually has a complex network structure, so for any given input instance it is very difficult to determine the correct output. Second, a deep learning model usually consists of a large number of operators whose relationships in the model are very complex, which makes locating defective operators difficult. In addition, verifying the correctness of defect localization in a large and complicated deep learning model is challenging. In response to these challenges, in this paper we design and implement a defect detection method for deep learning frameworks based on meta operators. We abstract the common computing logic of operators in different deep learning frameworks, such as forward computation and gradient computation, as "meta operators". We bind the specific implementations of operators without changing the code of deep learning models, so users can make fine-grained replacements of operators in deep learning models. Through fine-grained operator replacement, not only can calculation errors of deep learning frameworks during inference be found, but calculation errors during training can also be found, and the localization of these errors can be verified by recording the running time and memory consumption of meta operators. We verify the accuracy of meta operator calculation and evaluate its performance. We collect operators in deep learning frameworks that are known to have calculation errors and apply the defect detection method to deep learning models containing these operators, demonstrating the effectiveness of the method.
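Training-time detection and localization verification can be sketched in the same spirit. The sketch assumes scalar operators, a one-weight model y = sigmoid(w * x) with loss 0.5 * (y - t)^2, and two hypothetical backends: `fw_a` as the reference and `fw_b` with an injected bug in the sigmoid gradient. It compares the weight gradient produced under the two bindings. It then confirms a suspected operator by replacing only that operator with its reference implementation: if the discrepancy disappears, the localization holds. `IMPLS`, `grad_w`, `numeric_grad_w`, and `locate` are illustrative names. This single-operator replacement check is one plausible reading of the verification step, not the paper's exact procedure.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

# (meta operator, backend) -> (forward, backward).
# backward receives the forward inputs plus the upstream gradient g.
IMPLS: Dict[Tuple[str, str], Tuple[Callable, Callable]] = {
    ("linear", "fw_a"): (lambda x, w: w * x, lambda x, w, g: (g * w, g * x)),
    ("linear", "fw_b"): (lambda x, w: x * w, lambda x, w, g: (w * g, x * g)),
    ("sigmoid", "fw_a"): (_sigmoid,
                          lambda z, g: g * _sigmoid(z) * (1.0 - _sigmoid(z))),
    # Injected bug (hypothetical): the (1 - s) factor of the derivative is missing.
    ("sigmoid", "fw_b"): (_sigmoid, lambda z, g: g * _sigmoid(z)),
}

OPS = ("linear", "sigmoid")

def grad_w(x: float, w: float, t: float, binding: Dict[str, str]) -> float:
    """dLoss/dw for y = sigmoid(w * x), loss = 0.5 * (y - t) ** 2,
    using the backend that `binding` selects for each meta operator."""
    lin_fwd, lin_bwd = IMPLS[("linear", binding["linear"])]
    sig_fwd, sig_bwd = IMPLS[("sigmoid", binding["sigmoid"])]
    z = lin_fwd(x, w)
    y = sig_fwd(z)
    dz = sig_bwd(z, y - t)       # y - t is dLoss/dy
    _, dw = lin_bwd(x, w, dz)    # keep only the gradient w.r.t. w
    return dw

def numeric_grad_w(x: float, w: float, t: float, eps: float = 1e-6) -> float:
    """Central finite difference, used only to sanity-check the reference."""
    loss = lambda v: 0.5 * (_sigmoid(v * x) - t) ** 2
    return (loss(w + eps) - loss(w - eps)) / (2 * eps)

def locate(x: float, w: float, t: float, ref: str = "fw_a", sut: str = "fw_b",
           tol: float = 1e-6) -> Optional[List[str]]:
    """Return None if the gradients agree; otherwise the operators whose
    single replacement by `ref` makes the `sut` gradient match again."""
    reference = grad_w(x, w, t, {op: ref for op in OPS})
    if abs(grad_w(x, w, t, {op: sut for op in OPS}) - reference) <= tol:
        return None
    confirmed = []
    for op in OPS:
        binding = {o: sut for o in OPS}
        binding[op] = ref          # replace only this operator
        if abs(grad_w(x, w, t, binding) - reference) <= tol:
            confirmed.append(op)
    return confirmed

print(locate(1.5, 0.8, 1.0))   # prints ['sigmoid']
```

The finite-difference check is an added assumption, not part of the described method. It gives some independent evidence that the reference backend is itself correct, which a pure cross-framework comparison cannot provide on its own.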
Authors
GU Dian-Dian (谷典典)
SHI Yi-Ning (石屹宁)
LIU Xuan-Zhe (刘譞哲)
WU Ge (吴格)
JIANG Hai-Ou (姜海鸥)
ZHAO Yao-Shuai (赵耀帅)
MA Yun (马郓)
GU Dian-Dian; SHI Yi-Ning; LIU Xuan-Zhe; WU Ge; JIANG Hai-Ou; ZHAO Yao-Shuai; MA Yun (Key Laboratory of High Confidence Software Technologies of Ministry of Education (Peking University), Beijing 100871; TravelSky Technology Limited, Beijing 101318; Key Laboratory of Intelligent Passenger Service of Civil Aviation, CAAC, Beijing 101318; Information Technology Institute (Tianjin Binhai), Peking University, Tianjin 300452; Institute for Artificial Intelligence, Peking University, Beijing 100871)
Source
Chinese Journal of Computers (《计算机学报》), 2022, No. 2, pp. 240-255 (16 pages)
Indexed in: EI, CAS, CSCD, Peking University Core Journals (北大核心)
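Funding
Supported by the National Key Research and Development Program of China, "High-Timeliness, Scalable Big Data Computing Models, Optimization Techniques and Systems" (2018YFB1004400), and the Beijing Outstanding Young Scientist Program, "Software-Defined Human-Cyber-Physical Fusion Computing Technologies and Systems" (BJJWZYJH01201910001004).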
Keywords
deep learning frameworks
meta operator
defect detection
deep learning
software testing