Study on Preprocessing Algorithm for Partition Reconnection of Unstructured-grid Based on Domestic Many-core Architecture

Abstract: Efficiently handling the discrete (irregular) memory accesses of unstructured grids has long been one of the central issues in parallel algorithms and applications for scientific and engineering computing. The distributed block reconnection optimization algorithm, designed for the domestic Sunway heterogeneous many-core architecture, maintains high computing performance when solving unstructured sparse problems in applications. Based on an in-depth analysis of the on-chip communication mechanism of the many-core architecture, an efficient message grouping strategy is designed to improve the bandwidth utilization of the on-chip slave-core array. It is combined with a barrier-free data distribution algorithm to make full use of the network performance of the domestic heterogeneous many-core architecture. Performance modeling and experimental analysis show that the average memory bandwidth of the proposed algorithm reaches more than 70% of the theoretical value under different memory access patterns. Compared with the serial algorithm on the master core, it achieves an average speedup of 10 times and a maximum speedup of 45 times. Tests on applications from several different fields also demonstrate the general applicability of the algorithm.
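The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea behind grouping irregular accesses by block, not the authors' method. It assumes an indirect gather `data[indices]`, as is typical for unstructured grids, and buckets the requested indices by the block they fall in, so that each touched block is loaded once as a contiguous slice. The names `group_by_block`, `gather_blocked`, and `block_size` are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def group_by_block(indices, block_size):
    """Bucket irregular indices by block id (hypothetical helper).

    Returns a dict: block id -> list of (output position, global index).
    """
    groups = defaultdict(list)
    for pos, idx in enumerate(indices):
        groups[idx // block_size].append((pos, idx))
    return groups


def gather_blocked(data, indices, block_size):
    """Compute [data[i] for i in indices], loading each touched block once."""
    out = [None] * len(indices)
    groups = group_by_block(indices, block_size)
    for block_id in sorted(groups):
        start = block_id * block_size
        # One contiguous load per block stands in for a bulk transfer.
        block = data[start:start + block_size]
        for pos, idx in groups[block_id]:
            out[pos] = block[idx - start]
    return out


if __name__ == "__main__":
    data = [x * 10 for x in range(16)]
    indices = [13, 2, 7, 3, 12, 6]  # irregular, mesh-like access pattern
    assert gather_blocked(data, indices, 4) == [data[i] for i in indices]
```

In this toy version the six scattered reads touch only three blocks, so three contiguous loads replace six individual ones. How the paper forms message groups on the slave-core array and distributes data without barriers is not specified in the abstract and is not modeled here.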

Authors: YE Yue-jin; LI Fang; CHEN De-xun; GUO Heng; CHEN Xin (National Supercomputing Center in Wuxi, Wuxi, Jiangsu 214000, China; Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China)

Affiliations: National Supercomputing Center in Wuxi; Department of Computer Science and Technology, Tsinghua University

Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2022, Issue 6, pp. 73-80 (8 pages)

Funding: National Key Research and Development Program of China, "High Performance Computing" key special project (2020YFB0204804, 2016YFB0201100).

Keywords: Domestic many-core architecture; Unstructured grid; On-chip communication; Message grouping; Barrier-free data distribution

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]
