Replication is an approach often used to speed up the execution of queries submitted to a large dataset. A compile-time/run-time approach is presented for minimizing the response time of 2-dimensional range queries when a distributed replica of a dataset exists. The aim is to partition the query payload (and its range) into subsets and distribute those to the replica nodes in a way that minimizes a client's response time. However, since query size and the distribution characteristics of the data (dense/sparse regions) in varying ranges are not known a priori, performing efficient load balancing and parallel processing over the unpredictable workload is difficult. A technique based on the creation and manipulation of dynamic spatial indexes for query payload estimation in distributed queries is proposed. The effectiveness of this technique is demonstrated on queries for the analysis of archived earthquake-generated seismic data records.
In a database-as-a-service (DaaS) model, a data owner stores data in a database server of a service provider, and the DaaS adopts encryption for data privacy and indexing for data queries. However, an attacker can obtain the original data's statistical information and distribution via the index distribution in the service provider's database. In this work, a novel indexing schema is proposed to satisfy privacy-preserving data management requirements, in which an attacker cannot obtain the source data distribution or statistical information from the index. The approach includes 2 parts: hash-based indexing for encrypted data and correctness verification for range queries. The evaluation results demonstrate that the approach can hide the statistical information of the encrypted data distribution while still returning correct answers for range queries. Meanwhile, the approach achieves nearly 10 times and 35 times improvement on encrypted data publishing and indexing respectively, compared with the state-of-the-art method, the order-preserving hash-based function (OPHF).
Multidimensional data querying has been gaining much interest in database research communities in recent years, yet many of the existing studies focus mainly on centralized systems. A solution to querying in a Peer-to-Peer (P2P) environment is proposed to achieve both low processing cost, in terms of the number of peers accessed and search messages, and balanced query loads among peers. The system is based on a balanced-tree-structured P2P network. By partitioning the query space intelligently, the amount of query forwarding is effectively controlled, and the number of peers involved and search messages are also limited. Dynamic load balancing can be achieved during space partitioning and query resolving. Extensive experiments confirm the effectiveness and scalability of the algorithms on P2P networks.
Distance-based range search is crucial in many real applications. In particular, given a database and a query issuer, a distance-based range search retrieves all the objects in the database whose distances from the query issuer are less than or equal to a given threshold. Often, due to the accuracy of positioning devices, updating protocols or characteristics of applications (for example, location privacy protection), data obtained from the real world are imprecise or uncertain. Therefore, existing approaches over exact databases cannot be directly applied to the uncertain scenario. In this paper, we redefine the distance-based range query in the context of uncertain databases, namely the probabilistic uncertain distance-based range (PUDR) queries, which obtain objects with confidence guarantees. We categorize the topological relationships between uncertain objects and uncertain search ranges into six cases and present the probability evaluation in each case. Experiments verify that our approach outperforms the Monte Carlo method used in most existing work in precision and time cost for the uniform uncertainty distribution. The approach also approximates, with acceptable errors, the probabilities of objects following other practical uncertainty distributions, such as the Gaussian distribution. Since the retrieval of a PUDR query requires accessing all the objects in the database, which is quite costly, we propose spatial pruning and probabilistic pruning techniques to reduce the search space. Two metrics, false positive rate and false negative rate, are introduced to measure the quality of query results. An extensive empirical study has been conducted to demonstrate the efficiency and effectiveness of our proposed algorithms under various experimental settings.
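As a rough illustration of the definitions in this abstract, the following Python sketch shows an exact distance-based range search and the kind of Monte Carlo baseline the abstract compares against. It is not the PUDR algorithm itself: the function names, the disk-shaped uniform uncertainty region, and the sample count are all assumptions made for the example.

```python
import math
import random


def range_search(points, q, r):
    """Exact case: return all points whose Euclidean distance from q is <= r."""
    return [p for p in points if math.dist(p, q) <= r]


def mc_probability(center, radius, q, r, samples=20000, seed=0):
    """Monte Carlo estimate of P(dist(X, q) <= r), where the object's true
    location X is assumed to be uniform over a disk (hypothetical model)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        # Taking the square root makes the sample uniform over the disk's area.
        rho = radius * math.sqrt(rng.random())
        theta = 2 * math.pi * rng.random()
        x = center[0] + rho * math.cos(theta)
        y = center[1] + rho * math.sin(theta)
        if math.dist((x, y), q) <= r:
            hits += 1
    return hits / samples


def probabilistic_range_search(objects, q, r, threshold):
    """Keep the objects (center, radius) whose estimated qualification
    probability is at least the given confidence threshold."""
    return [o for o in objects if mc_probability(o[0], o[1], q, r) >= threshold]
```

The sampling loop is what makes this baseline expensive: every candidate object costs thousands of distance evaluations, which is the motivation for closed-form case analysis and pruning.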
I/O parallelism is considered to be a promising approach to achieving high performance in parallel data warehousing systems, where huge amounts of data and complex analytical queries have to be processed. This paper proposes a parallel secondary data cube storage structure (PHC for short) to efficiently support the processing of range sum queries and dynamic updates on data cubes using parallel computing systems. Based on PHC, two parallel algorithms for processing range sum queries and updates are also proposed. Both algorithms have the same time complexity, O(log^d n/P). The analytical and experimental results show that PHC and the parallel algorithms have high performance and achieve optimum speedup.
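To make the notion of a range sum query concrete, here is a minimal Python sketch of the classic prefix-sum technique on a 2-dimensional cube. This is a sequential, static stand-in and not the PHC structure from the abstract, which additionally targets dynamic updates and parallel execution; the function names are chosen for the example.

```python
def build_prefix(cube):
    """Build a prefix-sum table P where P[i][j] is the sum of cube[0..i-1][0..j-1]."""
    rows, cols = len(cube), len(cube[0])
    prefix = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            prefix[i + 1][j + 1] = (
                cube[i][j] + prefix[i][j + 1] + prefix[i + 1][j] - prefix[i][j]
            )
    return prefix


def range_sum(prefix, r1, c1, r2, c2):
    """Sum of cube[r1..r2][c1..c2] (inclusive bounds) by inclusion-exclusion."""
    return (
        prefix[r2 + 1][c2 + 1]
        - prefix[r1][c2 + 1]
        - prefix[r2 + 1][c1]
        + prefix[r1][c1]
    )
```

With the table built once, each query touches four cells, but a single cell update can force a large part of the table to be recomputed. That trade-off is the usual reason for tree-like cube structures with polylogarithmic query and update costs.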
Data obtained from the real world are imprecise or uncertain due to the accuracy of positioning devices, updating protocols or characteristics of applications. On the other hand, users sometimes prefer to express their requests qualitatively with vague conditions, and in some applications different parts of the search region are not equally important. We address the problem of efficiently processing fuzzy range queries for uncertain moving objects whose whereabouts in time are not known exactly, for which the basic syntax is: find objects always/sometimes near to the query issuer, with qualifying guarantees no less than a given threshold, during a given temporal interval. We model the location uncertainty of moving objects with probability density functions and describe the indeterminate boundary of the query range with a fuzzy set. We present the qualifying guarantee evaluation of objects, and propose pruning techniques based on the α-cut of the fuzzy set to shrink the search space efficiently. We also design rules to reject non-qualifying objects and validate qualifying objects in order to avoid unnecessary, costly numeric integrations in the refinement step. An extensive empirical study has been conducted to demonstrate the efficiency and effectiveness of the algorithms under various experimental...
We present a study to show the possibility of using two well-known space partitioning and indexing techniques, kd trees and quad trees, in declustering applications to increase input/output (I/O) parallelization and reduce spatial data processing times. This parallelization enables time-consuming computational geometry algorithms to be applied efficiently to big spatial data rendering and querying. The key challenge is how to balance the spatial processing load across a large number of worker nodes, given significant performance heterogeneity in nodes and processing skews in the workload.
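One simple way to picture kd-tree-based declustering is sketched below in Python: points are split recursively at the median on alternating axes, and the resulting leaves are dealt round-robin to workers so that spatially adjacent leaves tend to land on different nodes. The round-robin assignment, the function names, and the parameters are assumptions for illustration; the abstract does not state the study's actual assignment strategy, and this sketch ignores node heterogeneity.

```python
def kd_partition(points, leaf_size, depth=0):
    """Recursively split 2-D points at the median, alternating x and y axes.
    Returns a list of leaves, each holding at most leaf_size points."""
    if leaf_size < 1:
        raise ValueError("leaf_size must be at least 1")
    if len(points) <= leaf_size:
        return [list(points)]
    axis = depth % 2
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return (
        kd_partition(ordered[:mid], leaf_size, depth + 1)
        + kd_partition(ordered[mid:], leaf_size, depth + 1)
    )


def decluster(points, leaf_size, num_workers):
    """Deal kd-tree leaves round-robin to workers (hypothetical policy)."""
    workers = [[] for _ in range(num_workers)]
    for i, leaf in enumerate(kd_partition(points, leaf_size)):
        workers[i % num_workers].extend(leaf)
    return workers
```

Because consecutive leaves in the returned list are siblings or near neighbors in space, a range query that covers a compact region is likely to hit several workers at once, which is the source of the I/O parallelism.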
In order to reduce disk access time, a database can be stored on several simultaneously accessible disks. In this paper, we are concerned with the dynamic d-attribute database allocation problem for range queries. An allocation method, called the coordinate modulo allocation method, is proposed to allocate data in a d-attribute database among disks so that the maximum disk access concurrency can be achieved for range queries. Our analysis and experiments show that the method achieves optimum or near-optimum parallelism for range queries. The paper gives the conditions under which the method is optimal. The worst-case bounds on the performance of the method are also given. In addition, the parallel algorithm for processing range queries is described at the end of the paper. The method has been used in the statistical and scientific database management system that we are designing.
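The abstract does not define the coordinate modulo allocation rule, so the Python sketch below uses the classic disk-modulo rule (a cell goes to the disk numbered by the sum of its coordinates modulo the number of disks) purely as an assumed stand-in. It shows how the parallelism of a range query can be measured: the response time is driven by the most heavily loaded disk, and the best possible value is the number of cells divided by the number of disks, rounded up. All names here are hypothetical.

```python
from collections import Counter
from itertools import product
from math import ceil


def disk_of(cell, num_disks):
    """Assumed rule: disk index is the sum of the cell's coordinates mod num_disks."""
    return sum(cell) % num_disks


def query_cost(ranges, num_disks):
    """ranges holds one inclusive (lo, hi) pair per attribute.
    Returns (cells on the busiest disk, optimal lower bound)."""
    cells = product(*(range(lo, hi + 1) for lo, hi in ranges))
    loads = Counter(disk_of(cell, num_disks) for cell in cells)
    total = sum(loads.values())
    return max(loads.values()), ceil(total / num_disks)
```

For example, `query_cost([(0, 3), (0, 3)], 4)` gives `(4, 4)`, so the 4 x 4 query is spread perfectly over 4 disks, while `query_cost([(0, 1), (0, 1)], 4)` gives `(2, 1)`, a small query where this rule is not optimal. Conditions of this kind are what the abstract refers to when it mentions optimality conditions and worst-case bounds.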
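By adding spatio-temporal factors such as timestamps and velocity vectors to the U-tree, an index structure called the TPU-tree is proposed for efficient retrieval of current and future uncertain location information; it supports the indexing of uncertain moving objects in multidimensional space. An improved p-bound-based range query processing algorithm, MP_BBRQ (modified p-bound based range query), is also proposed, which introduces the search region for pre-pruning in order to reduce the costly integral computations required in the query refinement stage. Simulation experiments show that the TPU-tree with the MP_BBRQ algorithm greatly outperforms the traditional TPR-tree index in probabilistic query performance, while its update performance is roughly on par with the traditional index, giving it good practical value.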
Funding (listed with the abstract on hash-based indexing for encrypted DaaS data): the National Natural Science Foundation of China (No. 61931019).
Funding (listed with the abstract on multidimensional queries in P2P networks): supported by the Natural Science Foundation of Jiangsu Province (BG2004034).
Funding (listed with the abstract on PUDR queries): supported by the National High Technology Research and Development 863 Program of China under Grant No. 2007AA01Z404; the Program of Jiangsu Province under Grant No. BE2008135.
Funding (listed with the abstract on fuzzy range queries for uncertain moving objects): supported by the National High Technology Research and Development 863 Program of China under Grant No. 2007AA01Z404; the National Research Foundation for the Doctoral Program of Higher Education of China under Grant No. 20103218110017; the Science & Technology Pillar Program of Jiangsu Province of China under Grant No. BE2008135; the Postdoctoral Science Foundation of China under Grant No. 20100481133.