Abstract
When the traditional MapReduce framework processes skewed data, load imbalance arises and lowers the framework's execution efficiency. Partitioning with a greedy algorithm alleviates data skew in MapReduce applications, but it ignores heterogeneity among Reduce nodes. MapReduce computing environments are usually heterogeneous, so even when the intermediate data is not skewed, the same task takes different amounts of time on different nodes because their computing power differs. To avoid the Reduce performance degradation caused by this heterogeneity, a dynamic smooth weighted round-robin scheduling algorithm for heterogeneous environments is proposed. The algorithm selects Reduce computing nodes according to two factors, node computing power and data locality, in order to improve the execution efficiency of Reduce tasks. The optimized framework is further applied to parallel image processing. Experimental results show that the dynamic smooth weighted round-robin scheduling algorithm reduces the network bandwidth consumed by cross-node Reduce transfers and also shortens Reduce task execution time.
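The abstract does not spell out the scheduling rule, so the following is only a minimal sketch of the classic smooth weighted round-robin selection step that the algorithm's name refers to, not the authors' implementation. The `Node` class, the `pick_node` function, and especially `effective_weight` (with its `alpha` blend of computing power and data locality and its scale factor of 10) are hypothetical names and assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    weight: int       # effective weight; assumed to reflect compute power and locality
    current: int = 0  # running counter maintained by smooth weighted round-robin


def effective_weight(compute_power: float, local_fraction: float,
                     alpha: float = 0.5) -> int:
    """Hypothetical weight formula (an assumption, not taken from the paper).

    compute_power and local_fraction are both expected in [0, 1];
    alpha sets how much computing power counts relative to data locality.
    """
    score = alpha * compute_power + (1 - alpha) * local_fraction
    return max(1, round(10 * score))


def pick_node(nodes: List[Node]) -> Node:
    """Run one selection round of classic smooth weighted round-robin."""
    total = sum(n.weight for n in nodes)
    # Every node accumulates its own weight.
    for n in nodes:
        n.current += n.weight
    # The node with the largest counter is chosen (first one wins ties) ...
    best = max(nodes, key=lambda n: n.current)
    # ... and is pushed back by the total weight, which spreads its turns out.
    best.current -= total
    return best


if __name__ == "__main__":
    cluster = [Node("a", 5), Node("b", 1), Node("c", 1)]
    print([pick_node(cluster).name for _ in range(7)])
```

With weights 5, 1 and 1, seven consecutive rounds select a, a, b, a, c, a, a: the heavier node receives five of the seven Reduce tasks, but its turns are interleaved rather than bunched together. A dynamic variant could recompute each `weight` between rounds as node load or data placement changes; how the paper does this is not stated in the abstract.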
Authors
HUANG Weijian (黄伟建)
JIA Mengyu (贾孟玉)
HUANG Liang (黄亮)
Affiliations: School of Information and Electrical Engineering, Hebei University of Engineering, Handan 056038, China; Hebei Information Security Testing Evaluation Center, Shijiazhuang 050071, China
Source
Modern Electronics Technique (《现代电子技术》)
Peking University Core Journal (北大核心)
2020, Issue 23, pp. 139-142 (4 pages)
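Funding
Natural Science Foundation of Hebei Province: Establishment and Optimization of a Distributed JobTracker Node Model in Cloud Computing (F2015402077)
Science and Technology Research Project of Higher Education Institutions of Hebei Province: Research on Dynamic Analysis and Prediction Methods for Air Quality Based on Complex Networks (QN2018073)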
Keywords
Reduce task scheduling
load balancing
heterogeneous cluster
smooth weighted round-robin algorithm
node selection
parallel image processing