Abstract
Conventional methods cannot meet the demands of storing large data volumes and indexing them quickly. To address this, the paper designs a multidimensional retrieval strategy for massive spatio-temporal data on top of the distributed database HBase. First, the data are partitioned and encoded with a three-dimensional spatio-temporal grid. Then the grid code, the attribute dimensions of interest, and the HBase row key are fused into a customizable multidimensional index structure named "spatio-temporal+". Finally, a corresponding retrieval strategy and algorithm are given based on the operating mechanism of HBase, achieving fast retrieval of multidimensional data. Experiments on taxi trajectory data from Chengdu for August 2014 show that, compared with traditional methods, the proposed method greatly improves the efficiency of multidimensional data retrieval: at a data scale of 100 million rows, the time-consumption ratio reaches 319.79 times, and the advantage grows as the data scale increases. Compared with Geohash spatial dimension-reduction encoding, the retrieval hit rate is significantly higher and the retrieval time significantly lower.
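The idea of fusing a grid code and an attribute into a row key can be illustrated with a minimal sketch. This is not the paper's encoding: the grid origin, cell sizes, field order, field widths, `_` separator, and the names `grid_code`, `row_key`, and `scan_range` are all assumptions chosen for illustration. It relies only on the fact that HBase stores rows sorted by key and scans a range with an inclusive start row and an exclusive stop row.

```python
import math
from datetime import datetime, timezone

# Hypothetical grid parameters (assumptions, not taken from the paper).
LON_MIN, LAT_MIN = 103.0, 30.0                   # assumed lower-left corner of the study area
CELL_DEG = 0.01                                  # assumed spatial cell size in degrees
T0 = datetime(2014, 8, 1, tzinfo=timezone.utc)   # assumed time origin
CELL_SEC = 3600                                  # assumed temporal cell size in seconds


def grid_code(lon, lat, ts):
    """Fixed-width code of the 3D (time, x, y) grid cell containing a point."""
    it = math.floor((ts - T0).total_seconds() / CELL_SEC)
    ix = math.floor((lon - LON_MIN) / CELL_DEG)
    iy = math.floor((lat - LAT_MIN) / CELL_DEG)
    if min(it, ix, iy) < 0 or max(it, ix, iy) > 9999:
        raise ValueError("point lies outside the assumed grid")
    # Zero padding keeps lexicographic order consistent with numeric order.
    return f"{it:04d}{ix:04d}{iy:04d}"


def row_key(lon, lat, ts, attr, uid):
    """Hypothetical row key: grid code, then attribute of interest, then a unique suffix."""
    return f"{grid_code(lon, lat, ts)}_{attr}_{uid}"


def scan_range(code, attr):
    """Start (inclusive) and stop (exclusive) keys covering one cell and one attribute value."""
    prefix = f"{code}_{attr}_"
    # Incrementing the last character gives the smallest key greater than every key with this prefix.
    stop = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return prefix, stop


if __name__ == "__main__":
    ts = datetime(2014, 8, 1, 8, 30, tzinfo=timezone.utc)
    key = row_key(104.065, 30.655, ts, "occupied", "taxi42")
    start, stop = scan_range(grid_code(104.065, 30.655, ts), "occupied")
    print(key, start <= key < stop)
```

Because the attribute follows the grid code in the key, a single contiguous range covers "this cell, this attribute value"; a query over several cells would issue one such range per cell under this layout.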
Authors
赵英豪
吕亮
徐青
施群山
卢万杰
ZHAO Yinghao; LYU Liang; XU Qing; SHI Qunshan; LU Wanjie (PLA Strategic Force Information Engineering University, Zhengzhou 450001, China)
Source
《测绘科学》
CSCD
Peking University Core Journal
2020, No. 6, pp. 199-204 (6 pages)
Science of Surveying and Mapping
Funding
National Natural Science Foundation of China (41701463, 41371436)
Henan Province Science and Technology Research Program (172102210020).
Keywords
spatio-temporal big data
HBase
grid coding
spatio-temporal+
multidimensional retrieval