Abstract
The in-loop deblocking filter of H.264/AVC is both computation- and control-intensive because of its high branch density, the complexity of generating the branch conditions, and the cost of its variable-order (adaptive) finite impulse response (FIR) filtering. To address these bottlenecks, add-round-shift and two-level conditional comparison instructions are proposed for the application-specific instruction set processor (ASIP) platform Schubert, together with the design and implementation of their dedicated data path. The branch-selection code is optimized according to the run-time distribution of the branches, which keeps the code highly parallel. Results from the cycle-accurate instruction set simulator show that filtering one 4×4 block boundary takes 140 cycles when the boundary strength equals 4 and 100 cycles when it is less than 4. On quarter common intermediate format (QCIF) test sequences, the proposed implementation performs 48% to 63% better than the Intel MMX implementation in x264. The experiments show that an ASIP implementation can significantly improve deblocking filter performance and, because it is programmable, can also be adapted to multiple video standards.
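To make the two operation patterns named in the abstract concrete, the following is a minimal Python sketch and not the Schubert instruction semantics, which the abstract does not give. It assumes that "add-round-shift" means summing the taps, adding the rounding offset 2^(shift-1) and shifting right, and it uses the luma edge test and the strength-4 filter equations as defined in the H.264/AVC standard. The function names `add_round_shift`, `edge_needs_filtering` and `strong_filter_p` are hypothetical.

```python
def add_round_shift(terms, shift):
    """Assumed semantics: sum the terms, add 2**(shift-1), shift right."""
    return (sum(terms) + (1 << (shift - 1))) >> shift


def edge_needs_filtering(p1, p0, q0, q1, alpha, beta):
    """H.264 sample-level edge test: three threshold comparisons ANDed together."""
    return (abs(p0 - q0) < alpha
            and abs(p1 - p0) < beta
            and abs(q1 - q0) < beta)


def strong_filter_p(p3, p2, p1, p0, q0, q1, alpha, beta):
    """Luma filter for boundary strength 4, P side only.

    Returns the filtered (p0, p1, p2). The Q side is symmetric.
    """
    # Second-level condition selects between the long and the short filter.
    if abs(p2 - p0) < beta and abs(p0 - q0) < (alpha >> 2) + 2:
        new_p0 = add_round_shift([p2, 2 * p1, 2 * p0, 2 * q0, q1], 3)
        new_p1 = add_round_shift([p2, p1, p0, q0], 2)
        new_p2 = add_round_shift([2 * p3, 3 * p2, p1, p0, q0], 3)
        return new_p0, new_p1, new_p2
    # Short filter: only p0 is modified.
    return add_round_shift([2 * p1, p0, q1], 2), p1, p2
```

Every output sample is a weighted sum followed by rounding and a shift, and every filter choice depends on nested threshold comparisons, which is the pattern the proposed instructions target.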
Source
《浙江大学学报(工学版)》
EI
CAS
CSCD
PKU Core (Peking University Core Journals)
2008, No. 4, pp. 608-611, 666 (5 pages in total)
Journal of Zhejiang University: Engineering Science
Funding
Supported by the National High Technology Research and Development Program of China (863 Program), Grant No. 2005AA1Z1271