Abstract
Statistical analysis of geographical conditions is a prerequisite for in-depth study and application of national geographical conditions census data. Conventional data storage and processing, centralized on a single machine, is time-consuming, inefficient, and in some cases cannot support the workload at all. To address these problems, this paper proposes a "Grid Index + MapReduce" strategy. First, the census data files are organized into blocks on a regular square grid and stored in a distributed manner. Second, a two-layer filtering mechanism is developed that combines the grid index with exact spatial analysis. Third, a MapReduce-based parallel algorithm for geographical conditions statistics is constructed. Finally, performance is compared against index-free MapReduce and the ArcGIS platform. The results show that the "Grid Index + MapReduce" method is far more efficient than the ArcGIS platform and also has a clear efficiency advantage over index-free MapReduce. The study aims to provide a preferred approach for high-performance, multi-type, high-volume statistical analysis of geographical conditions census data.
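The two-layer filtering idea described in the abstract can be illustrated with a minimal single-process sketch. This is not the paper's implementation: the cell size `CELL`, the use of axis-aligned rectangles in place of real polygons, the per-class area statistic, and every function name below are assumptions made for illustration. The grid index selects only the blocks that overlap the query region (layer 1), an exact clip computes the area inside each selected cell (layer 2), and the results are summed by class in a map/reduce style.

```python
from collections import defaultdict
from math import floor

CELL = 10.0  # assumed grid cell size; the paper's value is not given here


def intersect(a, b):
    """Intersection of two (xmin, ymin, xmax, ymax) boxes, or None if empty."""
    xmin, ymin = max(a[0], b[0]), max(a[1], b[1])
    xmax, ymax = min(a[2], b[2]), min(a[3], b[3])
    return (xmin, ymin, xmax, ymax) if xmin < xmax and ymin < ymax else None


def area(box):
    return (box[2] - box[0]) * (box[3] - box[1])


def cells_of(box):
    """Yield the (col, row) grid cells that a box may overlap."""
    c0, r0 = floor(box[0] / CELL), floor(box[1] / CELL)
    c1, r1 = floor(box[2] / CELL), floor(box[3] / CELL)
    for col in range(c0, c1 + 1):
        for row in range(r0, r1 + 1):
            yield (col, row)


def cell_box(cell):
    col, row = cell
    return (col * CELL, row * CELL, (col + 1) * CELL, (row + 1) * CELL)


def build_blocks(features):
    """Block organization: store each feature under every cell it touches."""
    blocks = defaultdict(list)
    for cls, box in features:
        for cell in cells_of(box):
            blocks[cell].append((cls, box))
    return blocks


def map_block(cell, records, query):
    """Map step for one block: emit (class, area) pairs.

    Each feature is clipped to cell-and-query, so a feature that spans
    several cells is never counted twice.
    """
    window = intersect(cell_box(cell), query)
    if window is None:
        return
    for cls, box in records:
        clipped = intersect(box, window)  # layer 2: exact test and clip
        if clipped is not None:
            yield cls, area(clipped)


def reduce_by_class(pairs):
    """Reduce step: sum the emitted areas per class."""
    totals = defaultdict(float)
    for cls, a in pairs:
        totals[cls] += a
    return dict(totals)


def grid_statistics(features, query):
    blocks = build_blocks(features)
    # Layer 1: the grid index keeps only non-empty cells overlapping the query.
    candidates = [c for c in cells_of(query) if c in blocks]
    pairs = (p for c in candidates for p in map_block(c, blocks[c], query))
    return reduce_by_class(pairs)
```

In a real deployment each block would be a file in distributed storage and `map_block` would run as a separate map task; here the blocks live in a dictionary so the sketch stays self-contained.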
Source
《浙江大学学报(理学版)》
CAS
CSCD
PKU Core (北大核心)
2017, No. 6, pp. 660-665 (6 pages)
Journal of Zhejiang University (Science Edition)
Funding
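National Natural Science Foundation of China (41471313, 41671391)
National Science and Technology Basic Work Special Program of China (2012FY112300)
National Special Research Program for Marine Public Welfare Industry (201505003)
Science and Technology Key Project of Zhejiang Province (2015C33021)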
Keywords
statistical analysis of geographical conditions
geographical conditions census data
grid index
MapReduce