Journal Articles
5 articles found
1. MW-DLA: a dynamic bit width deep learning accelerator (Cited by 1)
Authors: Li Zhen, Zhi Tian, Liu Enhe, Liu Shaoli, Chen Tianshi. High Technology Letters (EI, CAS), 2020, No. 2, pp. 145-151 (7 pages)
Deep learning algorithms are the basis of many artificial intelligence applications. These algorithms are both computationally and memory intensive, making them difficult to deploy on embedded systems. Thus, various deep learning accelerators (DLAs) have been proposed and applied to achieve better performance and lower power consumption. However, most deep learning accelerators are unable to support multiple data formats. This research proposes MW-DLA, a deep learning accelerator supporting dynamically configurable data widths. This work analyzes the data distribution of different data types in different layers and trains a typical network with per-layer representation. As a result, the proposed MW-DLA achieves 2× performance and more than a 50% reduction in memory requirement for AlexNet, with less than 5.77% area overhead.
Keywords: deep learning accelerator (DLA); per-layer representation; multiple-precision arithmetic unit
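The per-layer representation the abstract describes — choosing a data width per layer from that layer's observed value distribution — can be illustrated with a minimal sketch. MW-DLA's actual analysis and hardware data paths are not public, so the max-scaled symmetric quantizer, the SQNR threshold, and all names below are illustrative assumptions rather than the paper's method.

```python
# A minimal sketch of per-layer bit-width selection: pick the smallest
# candidate width whose quantization SQNR clears a quality threshold.
import numpy as np

def quantize(x, bits):
    """Symmetric uniform quantization of x, scaled to the sample max."""
    scale = np.max(np.abs(x)) / (2 ** (bits - 1) - 1)
    if scale == 0:
        return x
    return np.round(x / scale) * scale

def pick_bit_width(x, candidates=(4, 8, 16), min_sqnr_db=30.0):
    """Return the smallest candidate bit width with acceptable SQNR."""
    for bits in candidates:
        err = x - quantize(x, bits)
        sqnr = 10 * np.log10(np.sum(x ** 2) / (np.sum(err ** 2) + 1e-12))
        if sqnr >= min_sqnr_db:
            return bits
    return candidates[-1]

rng = np.random.default_rng(0)
layers = {
    "conv1": rng.normal(0.0, 1.0, 10000),       # well-concentrated values
    "fc6":   rng.standard_t(df=2, size=10000),  # heavy-tailed values
}
for name, activations in layers.items():
    print(name, "->", pick_bit_width(activations), "bits")
```

Under a max-scaled quantizer, heavier-tailed layers tend to need more bits, which is why a per-layer decision can beat a single network-wide data width.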
2. NNL: a domain-specific language for neural networks (Cited by 1)
Authors: Wang Bingrui, Chen Yunji. High Technology Letters (EI, CAS), 2020, No. 2, pp. 160-167 (8 pages)
In recent years, neural networks (NNs) have received increasing attention from both academia and industry. The significant diversity among existing NNs, as well as among their hardware platforms, makes NN programming a daunting task. In this paper, a domain-specific language (DSL) for NNs, the neural network language (NNL), is proposed to deliver both productivity of NN programming and portable performance of NN execution on different hardware platforms. The productivity and flexibility of NN programming are enabled by abstracting NNs as a directed graph of blocks. The language describes 4 representative and widely used NNs and runs them on 3 different hardware platforms (CPU, GPU, and an NN accelerator). Experimental results show that NNs written in the proposed language perform, on average, 14.5% better than the baseline implementations across these 3 platforms. Moreover, compared with the Caffe framework, which specifically targets the GPU platform, the code achieves similar performance.
Keywords: artificial neural network (NN); domain-specific language (DSL); neural network (NN) accelerator
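The core abstraction here — an NN as a directed graph of blocks — can be sketched briefly. NNL's actual syntax and runtime are not reproduced in the abstract, so the Block/Graph classes below are a hypothetical Python analogue of that idea, not the language itself.

```python
# A toy "directed graph of blocks" abstraction: each block is a named
# operation wired to its input blocks, executed in insertion order.
class Block:
    def __init__(self, name, op, inputs=()):
        self.name, self.op, self.inputs = name, op, list(inputs)

class Graph:
    def __init__(self):
        self.blocks = []

    def add(self, name, op, inputs=()):
        block = Block(name, op, inputs)
        self.blocks.append(block)
        return block

    def run(self, feed):
        values = dict(feed)           # externally fed inputs
        for b in self.blocks:
            if b.name in values:      # placeholder already bound
                continue
            args = [values[i.name] for i in b.inputs]
            values[b.name] = b.op(*args)
        return values

# A toy two-operation network: y = 2 * x + 1.
g = Graph()
x = g.add("x", None)                  # input placeholder, fed at run time
h = g.add("h", lambda v: 2 * v, [x])
y = g.add("y", lambda v: v + 1, [h])
print(g.run({"x": 3})["y"])           # -> 7
```

Because programs only describe the graph, a backend for a CPU, GPU, or NN accelerator is free to schedule and lower the same blocks differently, which is what makes portable performance plausible.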
3. Assembly language and assembler for deep learning accelerators (Cited by 1)
Authors: Lan Huiying, Wu Linyang, Han Dong, Du Zidong. High Technology Letters (EI, CAS), 2019, No. 4, pp. 386-394 (9 pages)
Deep learning accelerators (DLAs) have proven to be efficient computational devices for processing deep learning algorithms. Various DLA architectures have been proposed and applied to different applications and tasks. However, for most DLAs, the programming interface is either difficult to use or not efficient enough. Most DLAs require programmers to write instructions directly, which is time-consuming and error-prone. Another prevailing programming interface for DLAs is high-performance libraries and deep learning frameworks, which are easy to use and friendly to users, but their high level of abstraction limits their control over hardware resources and thus compromises the efficiency of the accelerator. This work addresses the design of a programming interface for DLAs. First, various existing DLAs and their programming methods are analyzed, and a methodology for designing programming interfaces for DLAs is proposed: a high-level assembly language (called DLA-AL), together with an assembler and runtime for DLAs. DLA-AL is composed of a low-level assembly language and a set of high-level blocks. It allows experienced experts to fully exploit the potential of DLAs and achieve near-optimal performance. Meanwhile, by using DLA-AL, end users with little knowledge of the hardware can develop deep learning algorithms on DLAs with minimal programming effort.
Keywords: deep learning; deep learning accelerator (DLA); assembly language; programming language
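The two-level design — high-level blocks for end users, a low-level assembly language for experts — implies a lowering step from blocks to instructions. DLA-AL's actual mnemonics and ISA are not given in the abstract, so the LOAD/CONV/STORE opcodes and the tiling scheme below are invented purely for illustration.

```python
# A sketch of lowering one high-level "convolution block" into a tiled
# sequence of low-level pseudo-instructions for an accelerator.
def lower_conv_block(src, weights, dst, tile_rows, total_rows):
    """Expand a high-level conv block into per-tile low-level ops."""
    program = []
    for row in range(0, total_rows, tile_rows):
        rows = min(tile_rows, total_rows - row)
        program.append(f"LOAD  buf0, {src}+{row}, rows={rows}")  # activations
        program.append(f"LOAD  buf1, {weights}")                 # weights
        program.append(f"CONV  buf2, buf0, buf1")                # compute
        program.append(f"STORE {dst}+{row}, buf2, rows={rows}")  # writeback
    return program

for insn in lower_conv_block("act0", "w0", "act1",
                             tile_rows=64, total_rows=160):
    print(insn)
```

An expert can hand-tune the emitted instruction stream (tile sizes, buffer reuse), while an end user only ever touches the block-level call, which is the division of labor the paper's interface aims for.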
4. Optimizing deep learning inference on mobile devices with neural network accelerators
Authors: Zeng Xi, Xu Yunlong, Zhi Tian. High Technology Letters (EI, CAS), 2019, No. 4, pp. 417-425 (9 pages)
Deep learning is now widely used in the intelligent apps of mobile devices. In pursuit of ultra-low power and latency, integrating neural network accelerators (NNAs) into mobile phones has become a trend. However, conventional deep learning programming frameworks are not well developed to support such devices, leading to low computing efficiency and high memory occupation. To address this problem, a 2-stage pipeline is proposed for optimizing deep learning model inference on mobile devices with NNAs, in terms of both speed and memory footprint. The first stage reduces the computation workload via graph optimization, including splitting and merging nodes. The second stage goes further by optimizing at the compilation level, including kernel fusion and in-advance compilation. The proposed optimizations are evaluated on a commercial mobile phone with an NNA. The experimental results show that the proposed approaches achieve 2.8× to 26× speedup and reduce the memory footprint by up to 75%.
Keywords: machine learning inference; neural network accelerator (NNA); low latency; kernel fusion; in-advance compilation
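The node-merging part of the first stage can be illustrated with a small pass over a linear operator sequence. The paper's actual graph representation and merge rules are not given in the abstract; the ELEMENTWISE set and the fused-node naming below are illustrative assumptions.

```python
# A sketch of graph-level node merging: fuse runs of consecutive
# elementwise ops into one node, so the accelerator launches fewer
# kernels and keeps intermediates out of memory.
ELEMENTWISE = {"relu", "add_bias", "scale"}  # assumed fusible ops

def merge_elementwise(nodes):
    """Fuse consecutive elementwise ops in a linear op sequence."""
    merged, run = [], []
    for op in nodes:
        if op in ELEMENTWISE:
            run.append(op)
        else:
            if run:
                merged.append("fused(" + "+".join(run) + ")")
                run = []
            merged.append(op)
    if run:
        merged.append("fused(" + "+".join(run) + ")")
    return merged

print(merge_elementwise(["conv", "add_bias", "relu",
                         "conv", "scale", "relu"]))
# -> ['conv', 'fused(add_bias+relu)', 'conv', 'fused(scale+relu)']
```

Fewer nodes means fewer kernel launches and fewer materialized intermediate tensors, which is how a graph pass like this helps both latency and memory footprint.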
5. BENCHIP: Benchmarking Intelligence Processors (Cited by 3)
Authors: Jin-Hua Tao, Zi-Dong Du, Qi Guo, Hui-Ying Lan, Lei Zhang, Sheng-Yuan Zhou, Ling-Jie Xu, Cong Liu, Hai-Feng Liu, Shah Tang, Allen Rush, Willian Chen, Shao-Li Liu, Yun-Ji Chen, Tian-Shi Chen. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2018, No. 1, pp. 1-23 (23 pages)
The increasing attention on deep learning has tremendously spurred the design of intelligence processing hardware. The variety of emerging intelligence processors requires standard benchmarks for fair comparison and system optimization (in both software and hardware). However, existing benchmarks are unsuitable for benchmarking intelligence processors due to their lack of diversity and representativeness. Also, the lack of a standard benchmarking methodology further exacerbates this problem. In this paper, we propose BENCHIP, a benchmark suite and benchmarking methodology for intelligence processors. The benchmark suite in BENCHIP consists of two sets of benchmarks: microbenchmarks and macrobenchmarks. The microbenchmarks consist of single-layer networks; they are mainly designed for bottleneck analysis and system optimization. The macrobenchmarks contain state-of-the-art industrial networks, so as to offer a realistic comparison of different platforms. We also propose a standard benchmarking methodology built upon an industrial software stack, with evaluation metrics that comprehensively reflect various characteristics of the evaluated intelligence processors. BENCHIP is utilized for evaluating various hardware platforms, including CPUs, GPUs, and accelerators. BENCHIP will be open-sourced soon.
Keywords: deep learning; intelligence processor; benchmark
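The spirit of the microbenchmarks — timing a single layer in isolation to expose bottlenecks — is easy to sketch. The BENCHIP suite itself is not reproduced here; the timing harness, layer shape, and iteration counts below are our own illustrative choices.

```python
# A sketch of a single-layer microbenchmark: warm up, time several
# iterations, and report the median to suppress scheduling noise.
import time
import numpy as np

def time_layer(fn, warmup=3, iters=10):
    """Median wall-clock time of fn over iters runs, after warmup."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - t0)
    return sorted(samples)[len(samples) // 2]

rng = np.random.default_rng(0)
x = rng.standard_normal((256, 1024), dtype=np.float32)  # batch of inputs
w = rng.standard_normal((1024, 1024), dtype=np.float32) # layer weights
ms = time_layer(lambda: x @ w) * 1e3
print(f"fc 256x1024x1024: {ms:.2f} ms")
```

Running such single-layer probes across layer types and sizes is what lets a micro-level suite attribute a platform's slowdown to a specific operator rather than to a whole network.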