Funding: Supported by the National Natural Science Foundation of China National (9846-004) '863' High-Technique Program of China (8
Abstract: The author investigates the query optimization problem for parallel relational databases and proposes a query optimization method based on multi-weighted trees. The method consists of a multi-weighted tree based parallel query plan model, a cost model for parallel query plans, and a query optimizer. The parallel query plan model is the first to model all basic relational operations, all three types of parallelism in query execution, processor and memory allocation to operations, memory allocation to the buffers between operations in pipelines, and data redistribution among processors. The cost model takes into account the waiting time of operations in pipelined execution and can be computed bottom-up. The query optimizer addresses Select-Project-Join queries, which are widely used in commercial DBMSs. Several heuristics for allocating processors to operations are derived and used in the optimizer. To generate good-quality plans, the optimizer is memory-aware: it includes heuristics for allocating memory to operations and to the buffers between operations in pipelines so that memory resources are fully exploited. In addition, the optimizer considers multiple join algorithms and can make an optimal choice of join algorithm for each join operation in a query. The proposed method has been used in a prototype parallel database management system designed and implemented by the author.
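To illustrate what "computable in a bottom-up fashion" can mean for a plan tree, the following Python sketch estimates a finish time for each operator from the finish times of its children. It is a toy model under stated assumptions, not the author's multi-weighted tree or cost formulas: the `PlanNode` class, the `work` and `pipelined` fields, and the two combination rules are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    """Hypothetical plan-tree node; none of these fields come from the paper."""
    name: str
    work: float               # time the operator needs by itself (arbitrary units)
    pipelined: bool = False   # True: consumes its inputs while they are produced
    children: List["PlanNode"] = field(default_factory=list)


def finish_time(node: PlanNode) -> float:
    """Estimate when `node` finishes, computed from the leaves upward.

    Assumption: all children of a node run in parallel with each other.
    """
    if not node.children:
        return node.work
    slowest_child = max(finish_time(child) for child in node.children)
    if node.pipelined:
        # A pipelined operator overlaps with its producers, but it cannot
        # finish before the slowest of them; any gap is waiting time.
        return max(slowest_child, node.work)
    # A blocking operator starts only after all of its inputs are complete.
    return slowest_child + node.work


scan_r = PlanNode("scan R", work=4.0)
scan_s = PlanNode("scan S", work=6.0)
join = PlanNode("join", work=5.0, pipelined=True, children=[scan_r, scan_s])
sort = PlanNode("sort", work=3.0, children=[join])

total = finish_time(sort)
waiting = finish_time(join) - join.work  # time the join spends waiting on scans
```

Here the pipelined join finishes with its slowest scan rather than after its own five units of work, so the one unit of waiting time is carried up into the cost of the blocking sort above it.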
Funding: This work is supported by the National High Technology Research and Development Program of China (2002AA135230) and the Major Project of National Natural Science Foundation of Beijing (4011002).
Abstract: Spatial query languages, which are used to query spatial databases, have recently attracted attention. This paper presents the design of a spatial query language that extends SQL, the standard relational database query language. The design recognizes the significantly different requirements of spatial data handling and overcomes the inherent problems of applying conventional database query languages to such data. It is based on an extended spatial data model that includes spatial data types and the spatial operators on them. The processing and optimization of spatial queries are also discussed. Finally, an implementation of the design in a spatial query subsystem is described.
Abstract: Dynamic programming (DP) is an effective query optimization approach for selecting an appropriate join order for multi-table joins in a relational database management system (RDBMS). In this work the method is extended to distributed DBMSs (D-DBMSs). The structure of the optimal solution is first characterized according to the distribution of tables and data, and the recurrence relations between a problem and its sub-problems are then defined recursively. DP in a D-DBMS has the same time complexity as in a centralized DBMS, yet it can solve the considerably more complex optimization problem of multi-table joins in a D-DBMS. Experiments confirm the effectiveness of this optimization strategy.
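The recurrence between a join problem and its sub-problems can be sketched in Python as follows. This is the classic centralized form of DP join ordering, shown only to make the idea concrete; it does not model table placement or data transfer, which is what the paper's distributed extension adds. The function names, the cost measure (sum of intermediate result sizes), and the statistics are assumptions for illustration.

```python
from itertools import combinations


def result_size(tables, cardinalities, selectivity):
    """Estimated rows after joining `tables`, assuming independent predicates."""
    size = 1.0
    for t in tables:
        size *= cardinalities[t]
    for a, b in combinations(sorted(tables), 2):
        # A missing entry means no join predicate, i.e. a cross product.
        size *= selectivity.get(frozenset((a, b)), 1.0)
    return size


def best_join_order(cardinalities, selectivity):
    """DP over table subsets; cost = sum of intermediate result sizes."""
    names = sorted(cardinalities)
    # best[S] = (cost, plan) for the cheapest way to join the tables in S
    best = {frozenset([t]): (0.0, t) for t in names}
    for k in range(2, len(names) + 1):
        for subset in combinations(names, k):
            s = frozenset(subset)
            out = result_size(s, cardinalities, selectivity)
            candidates = []
            # Split S into two non-empty parts; both were solved in earlier rounds.
            for r in range(1, k):
                for left in combinations(subset, r):
                    lhs = frozenset(left)
                    rhs = s - lhs
                    cost = best[lhs][0] + best[rhs][0] + out
                    candidates.append((cost, (best[lhs][1], best[rhs][1])))
            best[s] = min(candidates, key=lambda c: c[0])
    return best[frozenset(names)]


# Hypothetical statistics: three tables and two join predicates.
cards = {"A": 1000, "B": 100, "C": 10}
sel = {frozenset(("A", "B")): 0.01, frozenset(("B", "C")): 0.1}
cost, plan = best_join_order(cards, sel)
```

With these made-up statistics the cheapest plan joins B and C first, because that intermediate result is far smaller than the result of joining A with B or the cross product of A and C.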
Abstract: This paper proposes a systematic, efficient compilation method for query evaluation in deductive databases (DeDB). Eliminating redundancy and minimizing the set of potentially relevant facts are two key issues for the efficiency of a DeDB, so the compilation process is decomposed into two phases. The first, the pre-compilation phase, is responsible for minimizing the potentially relevant facts. The second, which we refer to as the general compilation phase, is responsible for eliminating redundancy. The rule/goal graph devised by J. D. Ullman is appropriately extended and used as a uniform formalism. Two general algorithms, one for each phase, are described both intuitively and formally.
Abstract: Fundamentally, a semantic grid database brings globally distributed databases together to coordinate resource sharing and problem solving, with information given well-defined meaning. DartGrid II is an implemented database grid system whose goal is to provide a semantic solution for integrating database resources on the Web. Although many algorithms have been proposed for optimizing query processing to minimize the cost and/or response time of answering a query in a distributed database system, the query optimization problem in a database grid is fundamentally different from traditional distributed query optimization. These differences are shown to be consequences of the autonomy and heterogeneity of the database nodes in a database grid, so query optimization faces more challenges in a database grid than in a traditional distributed database. Following this observation, the design of the query optimizer in DartGrid II is presented, and a heuristic, dynamic, and parallel query optimization approach for processing queries in a database grid is proposed. A set of semantic tools supporting relational database integration and semantic-based information browsing has also been implemented to realize this vision.
Abstract: The query optimizer uses cost-based optimization to create the execution plan with the least cost, which also consumes the least amount of resources. Query optimization for relational database systems is a combinatorial optimization problem, which makes exhaustive search impossible as query sizes grow. In recent decades, gains in CPU performance have outpaced those in main-memory and disk access speeds, making data compression a viable strategy for improving database system performance. Compression and query optimization are two of the most important factors in performance enhancement: compression reduces the volume of data, whereas query optimization minimizes execution time. Compressing the database reduces memory requirements, so data takes less time to load into memory, fewer buffer misses occur, and intermediate results are smaller. This paper performs query optimization on a graph database in a cloud dew environment with the aim of reducing query execution time. Together, compression and query optimization improve database performance. The study compares the performance of MySQL and Neo4j databases running on cloud dew servers in terms of memory usage and execution time.
Abstract: As the popularity of XML (Extensible Markup Language) keeps growing rapidly, the management of XML-compliant structured-document databases has become a very interesting and compelling research area. Query optimization for XML structured documents stands out as one of the most challenging research issues in this area because of the much enlarged optimization (search) space, which is a consequence of the intrinsic complexity of the underlying data model of XML data. We therefore propose applying deterministic transformations to query expressions in order to prune the search space as aggressively as possible and quickly reach a sufficiently improved (if not optimal) alternative for each incoming query expression. This idea is not just exciting but practically attainable. This paper first provides an overview of our optimization strategy and then focuses on the key implementation issues of our rule-based transformation system for XML query optimization in a database environment. The performance results obtained from our experiments show that the approach is valid and effective.
Funding: The National High Technology Research and Development Program of China (863 Program) (No. 2006AA01Z430)
Abstract: Based on a mapping from UMQL (unified multimedia query language) conditional expressions to UMQA (unified multimedia query algebra) query operations, a translation algorithm from a UMQL query to a UMQA query plan is put forward, which can generate an equivalent UMQA internal query plan for any UMQL query. Then, to reduce the execution costs of UMQA query plans effectively, equivalent UMQA translation formulae and general optimization strategies are studied, and an optimization algorithm for UMQA internal query plans is presented. This algorithm uses the equivalent UMQA translation formulae to optimize query plans and makes the optimized plans conform to the optimization strategies as far as possible. Finally, the logic implementation methods of UMQA plans, i.e., the logic implementation methods of UMQA operators, are discussed as a means of obtaining useful target data from a multimedia database. All of these algorithms are implemented in a UMQL prototype system. Application results show that these query processing techniques are feasible and applicable.
Abstract: Efficient data management in healthcare is essential for providing timely and accurate patient care, yet traditional partitioning methods in relational databases often struggle with the high volume, heterogeneity, and regulatory complexity of healthcare data. This research introduces a tailored partitioning strategy leveraging the MD5 hashing algorithm to enhance data insertion, query performance, and load balancing in healthcare systems. By applying a consistent hash function to patient IDs, our approach achieves uniform distribution of records across partitions, optimizing retrieval paths and reducing access latency while ensuring data integrity and compliance. We evaluated the method through experiments focusing on partitioning efficiency, scalability, and fault tolerance. The partitioning efficiency analysis compared our MD5-based approach with standard round-robin methods, measuring insertion times, query latency, and data distribution balance. Scalability tests assessed system performance across increasing dataset sizes and varying partition counts, while fault tolerance experiments examined data integrity and retrieval performance under simulated partition failures. The experimental results demonstrate that the MD5-based partitioning strategy significantly reduces query retrieval times by optimizing data access patterns, achieving up to X% better performance compared to round-robin methods. It also scales effectively with larger datasets, maintaining low latency and ensuring robust resilience under failure scenarios. This novel approach offers a scalable, efficient, and fault-tolerant solution for healthcare systems, facilitating faster clinical decision-making and improved patient care in complex data environments.
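A minimal Python sketch of hash partitioning on patient IDs, contrasted with round-robin placement, is given below. It is an illustration under assumptions rather than the paper's implementation: the function names, the `P000123`-style ID format, the modulo mapping, and the partition count of 8 are all hypothetical.

```python
import hashlib
from collections import Counter


def md5_partition(patient_id: str, num_partitions: int) -> int:
    """Map a patient ID to a partition using the MD5 digest of the ID."""
    digest = hashlib.md5(patient_id.encode("utf-8")).digest()
    # Interpret the 16-byte digest as an integer and reduce it modulo
    # the partition count (an assumed mapping, not taken from the paper).
    return int.from_bytes(digest, "big") % num_partitions


def round_robin_partition(insert_index: int, num_partitions: int) -> int:
    """Round-robin placement depends on insertion order, not on the key."""
    return insert_index % num_partitions


# Hypothetical patient IDs, used only to inspect the distribution.
patient_ids = [f"P{n:06d}" for n in range(10_000)]
counts = Counter(md5_partition(pid, 8) for pid in patient_ids)

# A lookup by ID recomputes the hash and touches exactly one partition.
target = md5_partition("P000123", 8)
```

Because the partition is a pure function of the ID, a point lookup goes straight to one partition, whereas under round-robin placement the partition of a record cannot be derived from its ID without an extra index or a scan of every partition. One design caveat of the modulo mapping is that changing the partition count remaps most keys.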
Abstract: We developed a parallel object-relational DBMS named PORLES. It uses the BSP model as its parallel computing model and monoid calculus as the basis of its data model. In this paper, we describe in detail its data model, parallel query optimization, transaction processing system, and parallel access method.
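Abstract: Traditional database management systems cannot adequately learn the interactions between predicates and cannot accurately estimate the cardinality of complex queries. To address these problems, a tree-structured long short-term memory neural network (Tree Long Short Term Memory, TreeLSTM) model is proposed to model queries, and the model is used to estimate the cardinality of new queries. The proposed model takes into account the conjunction and disjunction operations contained in query statements, builds sub-expressions into a tree structure according to the type of operator between predicates, and represents arbitrary logical expressions in a continuous vector space by composing sub-expression vectors. By capturing the order dependencies between query predicates, the TreeLSTM model improves the performance and accuracy of cardinality estimation. TreeLSTM is compared with a histogram-based method and with the learning-based MSCN and TreeRNN methods. Experimental results show that the estimation error of TreeLSTM is 60.41%, 33.33%, and 11.57% lower than that of the histogram, MSCN, and TreeRNN methods, respectively, and that the method significantly improves the performance of the cardinality estimator.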