期刊文献+
共找到2篇文章
< 1 >
每页显示 20 50 100
Design of efficient parallel algorithms on shared memory multiprocessors
1
作者 Qiao Xiangzhen (Institute of Computing Technology, Chinese Academg of Science Beijing 100080, P. R. China) 《Wuhan University Journal of Natural Sciences》 CAS 1996年第Z1期344-349,共6页
The design of parallel algorithms is studied in this paper. These algorithms are applicable to shared memory MIMD machines In this paper, the emphasis is put on the methods for design of the efficient parallel algori... The design of parallel algorithms is studied in this paper. These algorithms are applicable to shared memory MIMD machines In this paper, the emphasis is put on the methods for design of the efficient parallel algorithms. The design of efficient parallel algorithms should be based on the following considerationst algorithm parallelism and the hardware-parallelism; granularity of the parallel algorithm, algorithm optimization according to the underling parallel machine. In this paper , these principles are applied to solve a model problem of the PDE. The speedup of the new method is high. The results were tested and evaluated on a shared memory MIMD machine. The practical results were agree with the predicted performance. 展开更多
关键词 parallel algorithm shared memory multiprocessor parallel granularity optimization.
在线阅读 下载PDF
Parallelization and Performance Optimization on Face Detection Algorithm with OpenCL: A Case Study 被引量:1
2
作者 Weiyan Wang Yunquan Zhang +2 位作者 Shengen Yan Ying Zhang Haipeng Jia 《Tsinghua Science and Technology》 EI CAS 2012年第3期287-295,共9页
Face detect application has a real time need in nature. Although Viola-Jones algorithm can handle it elegantly, today's bigger and bigger high quality images and videos still bring in the new challenge of real time n... Face detect application has a real time need in nature. Although Viola-Jones algorithm can handle it elegantly, today's bigger and bigger high quality images and videos still bring in the new challenge of real time needs. It is a good idea to parallel the Viola-Jones algorithm with OpenCL to achieve high performance across both AMD and NVidia GPU platforms without bringing up new algorithms. This paper presents the bottleneck of this application and discusses how to optimize the face detection step by step from a very naive implementation. Some brilliant tricks and methods like CPU execution time hidden, stubbles usage of local memory as high speed scratchpad and manual cache, and variable granularity were used to improve the performance. Those technologies result in 4-13 times speedup varying with the image size. Furthermore those ideas may throw on some light on the way to parallel applications efficiently with OpenCL. Taking face detection as an example, this paper also summarizes some universal advice on how to optimize OpenCL program, trying to help other applications do better on GPU. 展开更多
关键词 Viola-Jones OPENCL time cost hidden local memory usage parallel granularity
原文传递
上一页 1 下一页 到第
使用帮助 返回顶部