To solve the query processing correctness problem for semantic-based relational data integration, the semantics of SPARQL (SPARQL Protocol and RDF Query Language) queries is defined. In the course of query rewriting, all relevant tables are found and decomposed into minimal connectable units. Minimal connectable units are joined according to semantic queries to produce semantically correct query plans. Algorithms for query rewriting and transforming are presented, and their computational complexity is discussed. In the worst case, the query decomposing algorithm finishes in O(n^2) time and the query rewriting algorithm requires O(nm) time. The performance of the algorithms is verified by experiments, whose results show that when the query length is less than 8, the query processing algorithms provide satisfactory performance.
With the rapid development of the Web, more and more Web databases are available for users to access. At the same time, job searchers often have difficulty first finding the right sources and then querying over them, so an integrated job search system over Web databases has become a Web application in high demand. Based on this consideration, we build a deep Web data integration system that gives users unified access to multiple job Web sites as a job meta-search engine. In this paper, the architecture of the system is given first, and the key components of the system are then introduced.
Currently, ocean data portals are being developed around the world based on Geographic Information Systems (GIS) as a source of ocean data and information. However, given the relatively high temporal frequency and the intrinsic spatial nature of ocean data and information, no current GIS software is adequate to deal effectively and efficiently with spatiotemporal data. Furthermore, while existing ocean data portals are generally designed to meet the basic needs of a broad range of users, they are sometimes very complicated for general audiences, especially for those without training in GIS. In this paper, a new technical architecture for an ocean data integration and service system is put forward that consists of four layers: the operation layer, the extract, transform, and load (ETL) layer, the data warehouse layer, and the presentation layer. The integration technology based on XML, ontology, and a spatiotemporal data organization scheme for the data warehouse layer is then discussed. In addition, the ocean observing data service technology realized in the presentation layer is discussed in detail, including the development of the web portal and ocean data sharing platform. The application to the Taiwan Strait shows that the technology studied in this paper can facilitate sharing, access, and use of ocean observation data. The paper is based on an ongoing research project for the development of an ocean observing information system for the Taiwan Strait that will facilitate the prevention of ocean disasters.
In e-commerce, multidimensional data analysis for OLAP (on-line analytical processing) based on web data requires integrating various data sources, such as XML (extensible markup language) data and relational data, at the conceptual level. A conceptual data description approach for the multidimensional data model was presented in order to conduct multidimensional OLAP data analysis for multiple subjects. The UML (unified modeling language) galaxy diagram, which describes the multidimensional structure of the integrated data at the conceptual level, was constructed. The approach was illustrated using a case of a 2_roots UML galaxy diagram that takes one retailer and several suppliers of PC products into consideration.
In e-commerce, multidimensional data analysis based on Web data requires integrating various data sources, such as XML data and relational data, at the conceptual level. A conceptual data description approach to the multidimensional data model, the UML galaxy diagram, is presented in order to conduct multidimensional data analysis for multiple subjects. The approach is illustrated using a case of a 2_roots UML galaxy diagram for the marketing analysis of TV products involving one retailer and several suppliers.
Guyana’s capacity to address the impacts of climate change on its coastal environment requires the ability to monitor, quantify and understand coastal change over the short, medium and long term. Understanding the drivers of change in the coastal and marine environment can be achieved through the accurate measurement and critical analysis of morphologies, flows, processes and responses. This manuscript presents a strategy developed to create a central resource, database and web-based platform to integrate data and information on the drivers and the changes within Guyana's coastal and marine environment. The strategy involves four complementary work packages: data collection, development of a platform for data integration, application of the data for coastal change analyses, and consultation with stakeholders. The last aims to assess the role of the integrated data systems in supporting strategic governance and sustainable decision-making. It is hoped that the output of this strategy will support the country’s climate-focused agencies, organisations, decision-makers, and researchers in their tasks and endeavours.
At present, with the sustainable development of society, the value of forestry resources has gradually attracted people's attention. The unified registration and management of forest property rights can make ownership clearer and fully stimulate the enthusiasm of employees. Taking unified registration of real estate as the starting point, this paper first introduces the background of registering real estate with forest property rights, then analyzes the advantages and disadvantages of registration methods, and points out that the key to carrying out all work in an orderly way is to combine actual measurement with illustration. Finally, it discusses how to integrate the data obtained from actual measurement and illustration, and summarizes the process of data integration and the matters needing attention, based on experience accumulated in practice. It is hoped that this can help relevant personnel and provide a theoretical basis for future work such as forest right confirmation and registration.
Accurately evaluating the lifespan of the Printed Circuit Board (PCB) in airborne equipment is an essential issue for aircraft design and operation in the marine atmospheric environment. This paper presents a novel evaluation method that fuses Accelerated Degradation Testing (ADT) data, degradation data, and life data of small samples based on the uncertainty degradation process. An uncertain life model of the PCB in airborne equipment is constructed by employing the uncertain distribution that considers the accelerated factor of multiple environmental conditions such as temperature, humidity, and salinity. In addition, a degradation process model of the PCB in airborne equipment is constructed by employing the uncertain process of fusing ADT data and field data, in which the performance characteristics of dynamic cumulative change are included. Based on minimizing the pth sample moments, an integrated method for parameter estimation of the PCB in airborne equipment is proposed by fusing the multi-source data of life, degradation, and ADT. An engineering case illustrates the effectiveness and advantage of the proposed method.
Plant morphogenesis relies on precise gene expression programs at the proper time and position, which are orchestrated by transcription factors (TFs) in intricate regulatory networks in a cell-type specific manner. Here we introduce a comprehensive single-cell transcriptomic atlas of Arabidopsis seedlings. This atlas is the result of meticulous integration of 63 previously published scRNA-seq datasets, addressing batch effects and conserving biological variance. This integration spans a broad spectrum of tissues, including both below- and above-ground parts. Utilizing a rigorous approach for cell type annotation, we identified 47 distinct cell types or states, largely expanding our current view of plant cell compositions. We systematically constructed cell-type specific gene regulatory networks and uncovered key regulators that act in a coordinated manner to control cell-type specific gene expression. Taken together, our study not only offers an extensive plant cell atlas that serves as a valuable resource, but also provides molecular insights into gene-regulatory programs that vary across cell types.
Efficient data management in healthcare is essential for providing timely and accurate patient care, yet traditional partitioning methods in relational databases often struggle with the high volume, heterogeneity, and regulatory complexity of healthcare data. This research introduces a tailored partitioning strategy leveraging the MD5 hashing algorithm to enhance data insertion, query performance, and load balancing in healthcare systems. By applying a consistent hash function to patient IDs, our approach achieves uniform distribution of records across partitions, optimizing retrieval paths and reducing access latency while ensuring data integrity and compliance. We evaluated the method through experiments focusing on partitioning efficiency, scalability, and fault tolerance. The partitioning efficiency analysis compared our MD5-based approach with standard round-robin methods, measuring insertion times, query latency, and data distribution balance. Scalability tests assessed system performance across increasing dataset sizes and varying partition counts, while fault tolerance experiments examined data integrity and retrieval performance under simulated partition failures. The experimental results demonstrate that the MD5-based partitioning strategy significantly reduces query retrieval times by optimizing data access patterns, achieving up to X% better performance compared to round-robin methods. It also scales effectively with larger datasets, maintaining low latency and ensuring robust resilience under failure scenarios. This novel approach offers a scalable, efficient, and fault-tolerant solution for healthcare systems, facilitating faster clinical decision-making and improved patient care in complex data environments.
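The idea of mapping patient IDs to partitions through MD5 can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `partition_for`, the ID format `P000001`, and the partition count of 8 are all hypothetical, and the mapping shown is plain hash-modulo rather than a consistent-hashing ring.

```python
import hashlib
from collections import Counter


def partition_for(patient_id: str, num_partitions: int) -> int:
    """Map a patient ID to a partition index via its MD5 digest (hypothetical helper)."""
    digest = hashlib.md5(patient_id.encode("utf-8")).digest()
    # Interpret the 16-byte digest as an integer and reduce it modulo the partition count.
    return int.from_bytes(digest, "big") % num_partitions


# Hypothetical, synthetic patient IDs used only to inspect the distribution.
ids = [f"P{n:06d}" for n in range(10_000)]
counts = Counter(partition_for(pid, 8) for pid in ids)

for part in sorted(counts):
    print(part, counts[part])
```

Because the partition depends only on the ID, a lookup by patient ID can go straight to one partition, whereas a round-robin placement would need an extra index or a scan of every partition. Note that with hash-modulo, changing the partition count remaps most keys.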
Effective integration and wide sharing of geospatial data is an important and basic premise to facilitate the research and applications of geographic information science. However, the semantic heterogeneity of geospatial data is a major problem that significantly hinders geospatial data integration and sharing. Ontologies are regarded as a promising way to solve semantic problems by providing a formalized representation of geographic entities and relationships between them in a manner understandable to machines. Thus, many efforts have been made to explore ontology-based geospatial data integration and sharing. However, there is a lack of a specialized ontology that would provide a unified description for geospatial data. In this paper, with a focus on the characteristics of geospatial data, we propose a unified framework for geospatial data ontology, denoted GeoDataOnt, to establish a semantic foundation for geospatial data integration and sharing. First, we provide a characteristics hierarchy of geospatial data. Next, we analyze the semantic problems for each characteristic of geospatial data. Subsequently, we propose the general framework of GeoDataOnt, targeting these problems according to the characteristics of geospatial data. GeoDataOnt is then divided into multiple modules, and we show a detailed design and implementation for each module. Key limitations and challenges of GeoDataOnt are identified, and broad applications of GeoDataOnt are discussed.
New challenges, including how to share information on heterogeneous devices, appear in data-intensive pervasive computing environments. Data integration is a practical approach for these applications, and dealing with inconsistencies is one of its important problems. In this paper we motivate the problem of resolving data inconsistency for data integration in pervasive environments. We define data quality criteria and expense quality criteria for data sources to resolve data inconsistency. In our solution, data sources from which data is expensive to obtain are first discarded by using the expense quality criteria and a utility function. Since it is difficult to obtain the actual quality of data sources in a pervasive computing environment, we introduce a fuzzy multi-attribute group decision making approach to select the appropriate data sources. The experimental results show that our solution is effective.
In this paper we propose a service-oriented architecture for spatial data integration (SOA-SDI) in the context of a large number of available spatial data sources that physically reside at different places, and we develop web-based GIS systems based on SOA-SDI that allow client applications to pull in, analyze and present spatial data from those sources. The proposed architecture logically includes four layers or components: a layer of multiple data provider services, a data integration layer, a layer of backend services, and a front-end graphical user interface (GUI) for spatial data presentation. On the basis of the four-layer SOA-SDI framework, WebGIS applications can be quickly deployed, which shows that SOA-SDI has the potential to reduce software development effort and shorten the development period.
Background: More and more high-throughput datasets are available from multiple levels of measuring gene regulation. The reverse engineering of gene regulatory networks from these data offers a valuable research paradigm to decipher regulatory mechanisms. So far, numerous methods have been developed for reconstructing gene regulatory networks. Results: In this paper, we provide a review of bioinformatics methods for inferring gene regulatory networks from omics data. To achieve precise reconstruction of gene regulatory networks, an intuitive alternative is to integrate the available resources in a rational framework. We also provide computational perspectives on the endeavors of inferring gene regulatory networks from heterogeneous data. We highlight the importance of multi-omics data integration with prior knowledge in gene regulatory network inference. Conclusions: We provide computational perspectives on inferring gene regulatory networks from multiple omics data and present theoretical analyses of existing challenges and possible solutions. We emphasize prior knowledge and data integration in network inference owing to their ability to identify regulatory causality.
Land cover is recognized as one of the fundamental terrestrial datasets required in research on land system change and other ecosystem-related topics across the globe. The regional differentiation and spatial-temporal variation of land cover have a significant impact on the regional natural environment and on sustainable socio-economic development. In this context, we reconstructed historical land cover data for Siberia to provide a dataset comparable to the land cover datasets in China and abroad. In this paper, the European Space Agency (ESA) Global Land Cover Map (GlobCover), Landsat Thematic Mapper (TM), Enhanced Thematic Mapper (ETM) and Multispectral Scanner (MSS) images, Google Earth images and other additional data were used to produce land cover datasets for Siberia in 1975 and 2010. Data evaluation shows that the total user's accuracy of the 2010 land cover data was 86.96%, which was higher than that of the ESA GlobCover data in Siberia. The analysis of land cover changes found that there were no big land cover changes in Siberia from 1975 to 2010, with only a few conversions between different natural forest types. The main changes are the conversions from deciduous needleleaf forest to deciduous broadleaf forest, from deciduous needleleaf forest to mixed forest, and from savannas to deciduous needleleaf forest, indicating that the dominant driving factor of land cover changes in Siberia was, to some extent, natural rather than human activity, which is very different from China.
However, our purpose was not just to produce the land cover datasets for two time periods or to explore the driving factors of land cover changes in Siberia; we also paid attention to the significance and application of the datasets in various fields such as global climate change, geopolitics, and cross-border cooperation.
To construct mediators for data integration systems that integrate structured and semi-structured data, and to facilitate the reformulation and decomposition of the query, the presented system uses the XML processing language (XPL) for the mediator. With XPL, it is easy to construct mediators for data integration based on XML, and it can accelerate the work in the mediator.
Genome-wide association mapping studies (GWAS) based on Big Data are a potential approach to improve marker-assisted selection in plant breeding. The number of available phenotypic and genomic data sets in which medium-sized populations of several hundred individuals have been studied is rapidly increasing. Combining these data and using them in GWAS could increase both the power of QTL discovery and the accuracy of estimation of underlying genetic effects, but is hindered by data heterogeneity and lack of interoperability. In this study, we used genomic and phenotypic data sets, focusing on Central European winter wheat populations evaluated for heading date. We explored strategies for integrating these data and subsequently the resulting potential for GWAS. Establishing interoperability between data sets was greatly aided by some overlapping genotypes and a linear relationship between the different phenotyping protocols, resulting in high quality integrated phenotypic data. In this context, genomic prediction proved to be a suitable tool to study the relevance of interactions between genotypes and experimental series, which was low in our case. Contrary to expectations, fewer associations between markers and traits were found in the larger combined data than in the individual experimental series. However, the predictive power based on the marker-trait associations of the integrated data set was higher across data sets. Therefore, the results show that the integration of medium-sized to Big Data is an approach to increase the power to detect QTL in GWAS. The results encourage further efforts to standardize and share data in the plant breeding community.
Background: Various blood metabolites are known to be useful indicators of health status in dairy cattle, but their routine assessment at the herd level is time-consuming, expensive, and stressful for the cows. Thus, we evaluated the effectiveness of combining in-line near infrared (NIR) milk spectra with on-farm information (days in milk [DIM] and parity) and genetic markers for predicting blood metabolites in Holstein cattle. Data were obtained from 388 Holstein cows from a farm with an AfiLab system. NIR spectra, on-farm information, and single nucleotide polymorphism (SNP) markers were blended to develop calibration equations for blood metabolites using the elastic net (ENet) approach, considering 3 models: (1) Model 1 (M1) including only NIR information, (2) Model 2 (M2) with both NIR and on-farm information, and (3) Model 3 (M3) combining NIR, on-farm and genomic information. Dimension reduction was considered for M3 by preselecting SNP markers from genome-wide association study (GWAS) results. Results: Results indicate that M2 improved the predictive ability by an average of 19% for energy-related metabolites (glucose, cholesterol, NEFA, BHB, urea, and creatinine), 20% for liver function/hepatic damage, 7% for inflammation/innate immunity, 24% for oxidative stress metabolites, and 23% for minerals compared to M1. Meanwhile, M3 further enhanced the predictive ability by 34% for energy-related metabolites, 32% for liver function/hepatic damage, 22% for inflammation/innate immunity, 42.1% for oxidative stress metabolites, and 41% for minerals compared to M1. We found improved predictive ability of M3 using SNP markers selected from GWAS results with a threshold of >2.0: by 5% for energy-related metabolites, 9% for liver function/hepatic damage, 8% for inflammation/innate immunity, 22% for oxidative stress metabolites, and 9% for minerals. Slight reductions were observed for phosphorus (2%), ferric reducing antioxidant power (1%), and glucose (3%). Furthermore, prediction accuracies were influenced by using more restrictive thresholds (-log10(P-value) > 2.5 and 3.0), with a lower increase in the predictive ability. Conclusion: Our results highlight that combining several sources of information, such as genetic markers, on-farm information, and in-line NIR data, improves the predictive ability for blood metabolites in dairy cattle, representing an effective strategy for large-scale in-line health monitoring in commercial herds.
Accurately and efficiently predicting the permeability of porous media is essential for addressing a wide range of hydrogeological issues. However, the complexity of porous media often limits the effectiveness of individual prediction methods. This study introduces a novel Particle Swarm Optimization-based Permeability Integrated Prediction model (PSO-PIP), which incorporates a particle swarm optimization algorithm enhanced with dynamic clustering and adaptive parameter tuning (KGPSO). The model integrates multi-source data from the Lattice Boltzmann Method (LBM), Pore Network Modeling (PNM), and Finite Difference Method (FDM). By assigning optimal weight coefficients to the outputs of these methods, the model minimizes deviations from actual values and enhances permeability prediction performance. Initially, the computational performances of the LBM, PNM, and FDM are comparatively analyzed on datasets consisting of sphere packings and real rock samples. It is observed that these methods exhibit computational biases in certain permeability ranges. The PSO-PIP model is proposed to combine the strengths of each computational approach and mitigate their limitations. The PSO-PIP model consistently produces predictions that are highly congruent with actual permeability values across all prediction intervals, significantly enhancing prediction accuracy. The outcomes of this study provide a new tool and perspective for the comprehensive, rapid, and accurate prediction of permeability in porous media.
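The weighting step described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: it uses a plain global-best particle swarm rather than the paper's KGPSO variant (no dynamic clustering or adaptive tuning), the permeability values are synthetic, and the names `mse` and `pso_weights` are hypothetical. The swarm is seeded with the three single-method solutions so the combined result can never be worse than the best individual method on the fitting data.

```python
import random


def mse(weights, preds, actual):
    """Mean squared error of the weight-normalised combination of method outputs."""
    total = sum(weights)
    if total <= 0:
        return float("inf")
    w = [x / total for x in weights]  # normalise so the weights sum to 1
    err = 0.0
    for row, y in zip(preds, actual):
        est = sum(wi * pi for wi, pi in zip(w, row))
        err += (est - y) ** 2
    return err / len(actual)


def pso_weights(preds, actual, n_particles=20, iters=100, seed=0):
    """Plain global-best PSO over non-negative weights (not the paper's KGPSO)."""
    rng = random.Random(seed)
    dim = len(preds[0])
    # Seed the swarm with the single-method solutions, then add random particles.
    pos = [[1.0 if j == i else 0.0 for j in range(dim)] for i in range(dim)]
    pos += [[rng.random() for _ in range(dim)] for _ in range(n_particles - dim)]
    vel = [[0.0] * dim for _ in pos]
    pbest = [p[:] for p in pos]
    pbest_err = [mse(p, preds, actual) for p in pos]
    g = min(range(len(pos)), key=pbest_err.__getitem__)
    gbest, gbest_err = pbest[g][:], pbest_err[g]
    for _ in range(iters):
        for k, p in enumerate(pos):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[k][d] = (0.7 * vel[k][d]
                             + 1.5 * r1 * (pbest[k][d] - p[d])
                             + 1.5 * r2 * (gbest[d] - p[d]))
                p[d] = min(1.0, max(0.0, p[d] + vel[k][d]))  # keep weights in [0, 1]
            err = mse(p, preds, actual)
            if err < pbest_err[k]:
                pbest[k], pbest_err[k] = p[:], err
                if err < gbest_err:
                    gbest, gbest_err = p[:], err
    total = sum(gbest)
    return [w / total for w in gbest], gbest_err


# Synthetic example: each row holds hypothetical (LBM, PNM, FDM) estimates for one sample.
actual = [1.0, 2.0, 4.0, 8.0, 16.0]
preds = [(1.10 * y, 0.85 * y, y + 0.3) for y in actual]

weights, err = pso_weights(preds, actual)
print("weights (LBM, PNM, FDM):", weights)
print("combined MSE:", err)
```

In practice the weights would be fitted on one set of samples and checked on held-out samples, since weights tuned and evaluated on the same data overstate the gain.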
An 8×10 GHz receiver optical sub-assembly (ROSA) consisting of an 8-channel arrayed waveguide grating (AWG) and an 8-channel PIN photodetector (PD) array is designed and fabricated based on silica hybrid integration technology. Multimode output waveguides in the silica AWG with 2% refractive index difference are used to obtain flat-top spectra. The output waveguide facet is polished to a 45° bevel to redirect the light into the mesa-type PIN PD, which simplifies the packaging process. The experimental results show that the single-channel 1 dB bandwidth of the AWG ranges from 2.12 nm to 3.06 nm, the ROSA responsivity ranges from 0.097 A/W to 0.158 A/W, and the 3 dB bandwidth is up to 11 GHz. It is promising for application in eight-lane WDM transmission systems for data center interconnection.
Funding: Weaponry Equipment Pre-Research Foundation of PLA Equipment Ministry (No. 9140A06050409JB8102); Pre-Research Foundation of PLA University of Science and Technology (No. 2009JSJ11)
Funding: Supported by the Natural Science Foundation of China (60573091, 60273018), the National Basic Research and Development Program of China (2003CB317000), and the Key Project of the Ministry of Education of China (03044).
Funding: Supported by the National High Technology Research and Development Program of China (863 Program) (Nos. 2009AA12Z225, 2009AA12Z208) and the National Natural Science Foundation of China (No. 61074132)
Funding: This project was supported by the China Postdoctoral Science Foundation (2005037506) and the National Natural Science Foundation of China (70472029).
Abstract: In e-commerce, multidimensional data analysis based on Web data requires integrating various data sources, such as XML data and relational data, at the conceptual level. A conceptual data description approach to the multidimensional data model, the UML galaxy diagram, is presented in order to conduct multidimensional data analysis for multiple subjects. The approach is illustrated using a case of a 2_roots UML galaxy diagram that considers the marketing analysis of TV products involving one retailer and several suppliers.
Funding: We thank the United Nations Development Programme-Indonesia and the Archipelagic & Island States (AIS) Forum for the 2021 Archipelagic & Island States Innovation Challenges Award given for this idea on a Joint Research Programme in Climate Change Mitigation and Adaptation.
Abstract: Guyana's capacity to address the impacts of climate change on its coastal environment requires the ability to monitor, quantify and understand coastal change over the short, medium and long term. Understanding the drivers of change in the coastal and marine environment can be achieved through the accurate measurement and critical analysis of morphologies, flows, processes and responses. This manuscript presents a strategy developed to create a central resource, database and web-based platform to integrate data and information on the drivers and the changes within Guyana's coastal and marine environment. The strategy involves four complementary work packages: data collection, development of a platform for data integration, application of the data for coastal change analyses, and consultation with stakeholders. The last aims to assess the role of the integrated data systems in supporting strategic governance and sustainable decision-making. It is hoped that the output of this strategy will support the country's climate-focused agencies, organisations, decision-makers, and researchers in their tasks and endeavours.
Abstract: With the sustainable development of society, the value of forestry resources has gradually attracted people's attention. Unified registration and management of forest property rights can clarify ownership and fully stimulate the enthusiasm of employees. Taking unified registration of real estate as the starting point, this paper first introduces the background of registering real estate with forest property rights, then analyzes the advantages and disadvantages of registration methods, and points out that the key to carrying out all work in an orderly manner is to combine actual measurement with illustration. Finally, it discusses how to integrate the data obtained from actual measurement and illustration, and summarizes the process of data integration and the matters needing attention, based on experience accumulated in practice. It is hoped that this can help relevant personnel and provide a theoretical basis for future work such as forest right confirmation and registration.
Funding: Supported by the National Natural Science Foundation of China (No. 62073009).
Abstract: Accurately evaluating the lifespan of the Printed Circuit Board (PCB) in airborne equipment is an essential issue for aircraft design and operation in the marine atmospheric environment. This paper presents a novel evaluation method that fuses Accelerated Degradation Testing (ADT) data, degradation data, and life data of small samples based on the uncertain degradation process. An uncertain life model of the PCB in airborne equipment is constructed by employing an uncertain distribution that considers the acceleration factor of multiple environmental conditions such as temperature, humidity, and salinity. In addition, a degradation process model of the PCB in airborne equipment is constructed by employing an uncertain process that fuses ADT data and field data, in which the performance characteristics of dynamic cumulative change are included. Based on minimizing the pth sample moments, an integrated method for parameter estimation of the PCB in airborne equipment is proposed by fusing the multi-source data of life, degradation, and ADT. An engineering case illustrates the effectiveness and advantage of the proposed method.
Funding: Supported by the National Natural Science Foundation of China (No. 32070656); the Nanjing University Deng Feng Scholars Program; the Priority Academic Program Development (PAPD) of Jiangsu Higher Education Institutions; the China Postdoctoral Science Foundation funded project (No. 2022M711563); and the Jiangsu Funding Program for Excellent Postdoctoral Talent (No. 2022ZB50).
Abstract: Plant morphogenesis relies on precise gene expression programs at the proper time and position, which are orchestrated by transcription factors (TFs) in intricate regulatory networks in a cell-type-specific manner. Here we introduce a comprehensive single-cell transcriptomic atlas of Arabidopsis seedlings. This atlas is the result of meticulous integration of 63 previously published scRNA-seq datasets, addressing batch effects and conserving biological variance. This integration spans a broad spectrum of tissues, including both below- and above-ground parts. Utilizing a rigorous approach for cell type annotation, we identified 47 distinct cell types or states, largely expanding our current view of plant cell compositions. We systematically constructed cell-type-specific gene regulatory networks and uncovered key regulators that act in a coordinated manner to control cell-type-specific gene expression. Taken together, our study not only offers an extensive plant cell atlas that serves as a valuable resource, but also provides molecular insights into gene-regulatory programs that vary among cell types.
Abstract: Efficient data management in healthcare is essential for providing timely and accurate patient care, yet traditional partitioning methods in relational databases often struggle with the high volume, heterogeneity, and regulatory complexity of healthcare data. This research introduces a tailored partitioning strategy leveraging the MD5 hashing algorithm to enhance data insertion, query performance, and load balancing in healthcare systems. By applying a consistent hash function to patient IDs, our approach achieves uniform distribution of records across partitions, optimizing retrieval paths and reducing access latency while ensuring data integrity and compliance. We evaluated the method through experiments focusing on partitioning efficiency, scalability, and fault tolerance. The partitioning efficiency analysis compared our MD5-based approach with standard round-robin methods, measuring insertion times, query latency, and data distribution balance. Scalability tests assessed system performance across increasing dataset sizes and varying partition counts, while fault tolerance experiments examined data integrity and retrieval performance under simulated partition failures. The experimental results demonstrate that the MD5-based partitioning strategy significantly reduces query retrieval times by optimizing data access patterns, achieving up to X% better performance compared to round-robin methods. It also scales effectively with larger datasets, maintaining low latency and remaining resilient under failure scenarios. This novel approach offers a scalable, efficient, and fault-tolerant solution for healthcare systems, facilitating faster clinical decision-making and improved patient care in complex data environments.
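The hash-based routing described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `partition_for` and `distribution`, the ID format, and the partition count are all hypothetical, and the sketch uses plain hash-modulo placement rather than a consistent-hashing ring. MD5 is used here only to spread keys evenly, not for security.

```python
import hashlib
from collections import Counter


def partition_for(patient_id: str, num_partitions: int) -> int:
    """Map a patient ID to a partition index in [0, num_partitions).

    Hypothetical helper: hashes the ID with MD5 and reduces the
    128-bit digest modulo the partition count.
    """
    digest = hashlib.md5(patient_id.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_partitions


def distribution(patient_ids, num_partitions: int) -> Counter:
    """Count how many IDs land in each partition (a simple balance check)."""
    return Counter(partition_for(pid, num_partitions) for pid in patient_ids)


if __name__ == "__main__":
    # Made-up IDs purely for illustration.
    ids = [f"PAT-{i:06d}" for i in range(10_000)]
    for part, count in sorted(distribution(ids, 8).items()):
        print(part, count)
```

Because the partition depends only on the ID, a lookup by patient ID touches a single partition. One design caveat of the modulo form: changing the partition count remaps most keys, which is the problem a consistent-hashing ring is meant to reduce.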
Funding: This work was supported by the Strategic Priority Research Program of the Chinese Academy of Sciences [grant number XDA23100100]; the National Natural Science Foundation of China [grant numbers 41771430, 41631177]; and the China Scholarship Council [grant number 201804910732].
Abstract: Effective integration and wide sharing of geospatial data is an important and basic premise for facilitating the research and applications of geographic information science. However, the semantic heterogeneity of geospatial data is a major problem that significantly hinders geospatial data integration and sharing. Ontologies are regarded as a promising way to solve semantic problems by providing a formalized representation of geographic entities and the relationships between them in a manner understandable to machines. Thus, many efforts have been made to explore ontology-based geospatial data integration and sharing. However, there is a lack of a specialized ontology that would provide a unified description of geospatial data. In this paper, with a focus on the characteristics of geospatial data, we propose a unified framework for geospatial data ontology, denoted GeoDataOnt, to establish a semantic foundation for geospatial data integration and sharing. First, we provide a characteristics hierarchy of geospatial data. Next, we analyze the semantic problems for each characteristic of geospatial data. Subsequently, we propose the general framework of GeoDataOnt, targeting these problems according to the characteristics of geospatial data. GeoDataOnt is then divided into multiple modules, and we show a detailed design and implementation for each module. Key limitations and challenges of GeoDataOnt are identified, and broad applications of GeoDataOnt are discussed.
Funding: Supported by the National Natural Science Foundation of China under Grant No. 60970010, the National Basic Research 973 Program of China under Grant No. 2009CB320705, and the Specialized Research Fund for the Doctoral Program of Higher Education of China under Grant No. 20090073110026.
Abstract: New challenges, including how to share information across heterogeneous devices, appear in data-intensive pervasive computing environments. Data integration is a practical approach for these applications, and dealing with inconsistencies is one of its important problems. In this paper we motivate the problem of resolving data inconsistency for data integration in pervasive environments. We define data quality criteria and expense quality criteria for data sources to resolve data inconsistency. In our solution, data sources from which data is expensive to obtain are first discarded using the expense quality criteria and a utility function. Since it is difficult to obtain the actual quality of data sources in a pervasive computing environment, we then introduce a fuzzy multi-attribute group decision making approach to select the appropriate data sources. The experimental results show that our solution achieves ideal effectiveness.
Funding: Supported by the Research Fund of the Key GIS Lab of the Education Ministry (No. 200610).
Abstract: In this paper we propose a service-oriented architecture for spatial data integration (SOA-SDI) in the context of a large number of available spatial data sources that are physically located at different places, and develop web-based GIS systems based on SOA-SDI, allowing client applications to pull in, analyze and present spatial data from those sources. The proposed architecture logically includes four layers or components: a layer of multiple data provider services, a data integration layer, a layer of backend services, and a front-end graphical user interface (GUI) for spatial data presentation. On the basis of the four-layer SOA-SDI framework, WebGIS applications can be quickly deployed, which shows that SOA-SDI has the potential to reduce software development effort and shorten the development period.
Funding: Thanks are due to the three anonymous reviewers for their constructive comments. This work was partially supported by the National Natural Science Foundation of China (Nos. 61572287 and 61533011), the Shandong Provincial Key Research and Development Program (2018GSF118043), the Natural Science Foundation of Shandong Province, China (ZR2015FQ001), the Fundamental Research Funds of Shandong University (Nos. 2015QY001 and 2016JC007), and the Scientific Research Foundation for the Returned Overseas Chinese Scholars, Ministry of Education of China.
Abstract: Background: More and more high-throughput datasets are available from multiple levels of measuring gene regulation. The reverse engineering of gene regulatory networks from these data offers a valuable research paradigm for deciphering regulatory mechanisms. So far, numerous methods have been developed for reconstructing gene regulatory networks. Results: In this paper, we provide a review of bioinformatics methods for inferring gene regulatory networks from omics data. To achieve precise reconstruction of gene regulatory networks, an intuitive alternative is to integrate these available resources in a rational framework. We also provide computational perspectives on inferring gene regulatory networks from heterogeneous data. We highlight the importance of multi-omics data integration with prior knowledge in gene regulatory network inference. Conclusions: We provide computational perspectives on inferring gene regulatory networks from multiple omics data and present theoretical analyses of existing challenges and possible solutions. We emphasize prior knowledge and data integration in network inference owing to their ability to identify regulatory causality.
Funding: Under the auspices of the National Natural Science Foundation of China (No. 41271416) and the Strategic Priority Research Program of the Chinese Academy of Sciences (No. XDA05090310).
Abstract: Land cover is recognized as one of the fundamental terrestrial datasets required in research on land system change and other ecosystem-related topics across the globe. The regional differentiation and spatio-temporal variation of land cover have a significant impact on the regional natural environment and on sustainable socio-economic development. In this context, we reconstructed historical land cover data for Siberia to provide datasets comparable to the land cover datasets of China and other countries. In this paper, the European Space Agency (ESA) Global Land Cover Map (GlobCover), Landsat Thematic Mapper (TM), Enhanced Thematic Mapper (ETM), and Multispectral Scanner (MSS) images, Google Earth images and other additional data were used to produce land cover datasets for Siberia in 1975 and 2010. Data evaluation shows that the total user's accuracy of the 2010 land cover data was 86.96%, which was higher than that of the ESA GlobCover data in Siberia. The analysis of land cover changes found no large changes in Siberia from 1975 to 2010, with only a few conversions between different natural forest types. The main changes are conversions from deciduous needleleaf forest to deciduous broadleaf forest, from deciduous needleleaf forest to mixed forest, and from savannas to deciduous needleleaf forest, indicating that, to some extent, the dominant driving factors of land cover change in Siberia were natural rather than human, which is very different from China. However, our purpose was not only to produce land cover datasets for two time periods or to explore the driving factors of land cover change in Siberia; we also paid attention to the significance and application of the datasets in fields such as global climate change, geopolitics, and cross-border cooperation.
Abstract: To construct mediators for data integration systems that integrate structured and semi-structured data, and to facilitate the reformulation and decomposition of queries, the presented system uses the XML processing language (XPL) for the mediator. With XPL, it is easy to construct mediators for XML-based data integration, and the work in the mediator can be accelerated.
Funding: Funding within the Wheat BigData Project (German Federal Ministry of Food and Agriculture, FKZ 2818408B18).
Abstract: Genome-wide association mapping studies (GWAS) based on Big Data are a potential approach to improve marker-assisted selection in plant breeding. The number of available phenotypic and genomic data sets in which medium-sized populations of several hundred individuals have been studied is rapidly increasing. Combining these data and using them in GWAS could increase both the power of QTL discovery and the accuracy of estimation of underlying genetic effects, but is hindered by data heterogeneity and lack of interoperability. In this study, we used genomic and phenotypic data sets, focusing on Central European winter wheat populations evaluated for heading date. We explored strategies for integrating these data and subsequently the resulting potential for GWAS. Establishing interoperability between data sets was greatly aided by some overlapping genotypes and a linear relationship between the different phenotyping protocols, resulting in high-quality integrated phenotypic data. In this context, genomic prediction proved to be a suitable tool to study the relevance of interactions between genotypes and experimental series, which was low in our case. Contrary to expectations, fewer associations between markers and traits were found in the larger combined data than in the individual experimental series. However, the predictive power based on the marker-trait associations of the integrated data set was higher across data sets. Therefore, the results show that the integration of medium-sized data sets into Big Data is an approach to increase the power to detect QTL in GWAS. The results encourage further efforts to standardize and share data in the plant breeding community.
Funding: Funding provided by Università degli Studi di Padova; part of the PROH-DAIRY project (Development of precision livestock breeding tools toward One Health in Italian and Israeli dairy chains), funded by the Ministry of Foreign Affairs and International Cooperation (MAECI) within the Italy-Israel R&D Cooperation Program (Roma, Italy); the Agritech National Research Center, with funding from the European Union Next-GenerationEU (PIANO NAZIONALE DI RIPRESA E RESILIENZA (PNRR) - MISSIONE 4 COMPONENTE 2, INVESTIMENTO 1.4 - D.D. 1032 17/06/2022, CN00000022).
Abstract: Background: Various blood metabolites are known to be useful indicators of health status in dairy cattle, but their routine assessment at the herd level is time-consuming, expensive, and stressful for the cows. Thus, we evaluated the effectiveness of combining in-line near infrared (NIR) milk spectra with on-farm information (days in milk [DIM] and parity) and genetic markers for predicting blood metabolites in Holstein cattle. Data were obtained from 388 Holstein cows from a farm with an AfiLab system. NIR spectra, on-farm information, and single nucleotide polymorphism (SNP) markers were blended to develop calibration equations for blood metabolites using the elastic net (ENet) approach, considering 3 models: (1) Model 1 (M1), including only NIR information; (2) Model 2 (M2), with both NIR and on-farm information; and (3) Model 3 (M3), combining NIR, on-farm and genomic information. Dimension reduction was considered for M3 by preselecting SNP markers from genome-wide association study (GWAS) results. Results: M2 improved the predictive ability compared to M1 by an average of 19% for energy-related metabolites (glucose, cholesterol, NEFA, BHB, urea, and creatinine), 20% for liver function/hepatic damage, 7% for inflammation/innate immunity, 24% for oxidative stress metabolites, and 23% for minerals. Meanwhile, M3 further enhanced the predictive ability compared to M1 by 34% for energy-related metabolites, 32% for liver function/hepatic damage, 22% for inflammation/innate immunity, 42.1% for oxidative stress metabolites, and 41% for minerals. Using SNP markers selected from GWAS results with a threshold of -log10(P-value) > 2.0 improved the predictive ability of M3 by 5% for energy-related metabolites, 9% for liver function/hepatic damage, 8% for inflammation/innate immunity, 22% for oxidative stress metabolites, and 9% for minerals. Slight reductions were observed for phosphorus (2%), ferric reducing antioxidant power (1%), and glucose (3%). Furthermore, prediction accuracies were influenced by using more restrictive thresholds (-log10(P-value) > 2.5 and 3.0), with a lower increase in predictive ability. Conclusion: Our results highlight that combining several sources of information, such as genetic markers, on-farm information, and in-line NIR data, improves the predictive ability for blood metabolites in dairy cattle, representing an effective strategy for large-scale in-line health monitoring in commercial herds.
Funding: Supported by the National Key Research and Development Program of China (Grant No. 2022YFC3005503); the National Natural Science Foundation of China (Grant Nos. 52322907, 52179141, U23B20149, U2340232); the Fundamental Research Funds for the Central Universities (Grant Nos. 2042024kf1031, 2042024kf0031); and the Key Program of Science and Technology of Yunnan Province (Grant Nos. 202202AF080004, 202203AA080009).
Abstract: Accurately and efficiently predicting the permeability of porous media is essential for addressing a wide range of hydrogeological issues. However, the complexity of porous media often limits the effectiveness of individual prediction methods. This study introduces a novel Particle Swarm Optimization-based Permeability Integrated Prediction model (PSO-PIP), which incorporates a particle swarm optimization algorithm enhanced with dynamic clustering and adaptive parameter tuning (KGPSO). The model integrates multi-source data from the Lattice Boltzmann Method (LBM), Pore Network Modeling (PNM), and the Finite Difference Method (FDM). By assigning optimal weight coefficients to the outputs of these methods, the model minimizes deviations from actual values and enhances permeability prediction performance. Initially, the computational performances of the LBM, PNM, and FDM are comparatively analyzed on datasets consisting of sphere packings and real rock samples. It is observed that these methods exhibit computational biases in certain permeability ranges. The PSO-PIP model is proposed to combine the strengths of each computational approach and mitigate their limitations. The PSO-PIP model consistently produces predictions that are highly congruent with actual permeability values across all prediction intervals, significantly enhancing prediction accuracy. The outcomes of this study provide a new tool and perspective for the comprehensive, rapid, and accurate prediction of permeability in porous media.
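The idea of assigning weight coefficients to the outputs of several methods so that the combined prediction deviates least from measured values can be sketched with a plain particle swarm. This is only an illustration under stated assumptions: it uses a textbook PSO, not the KGPSO variant with dynamic clustering and adaptive tuning named in the abstract; the function names, swarm parameters (0.7, 1.5, 1.5), and the mean-squared-error objective are hypothetical choices. Each row of `preds` is assumed to hold one sample's predictions from the individual methods (for example LBM, PNM, FDM).

```python
import random


def normalize(w):
    """Map an arbitrary vector to non-negative weights that sum to 1."""
    w = [abs(x) for x in w]
    s = sum(w)
    return [x / s for x in w] if s > 0 else [1.0 / len(w)] * len(w)


def mse(weights, preds, actual):
    """Mean squared error of the weighted combination of method outputs."""
    w = normalize(weights)
    total = 0.0
    for row, y in zip(preds, actual):
        estimate = sum(wi * p for wi, p in zip(w, row))
        total += (estimate - y) ** 2
    return total / len(actual)


def pso_weights(preds, actual, n_particles=20, n_iter=100, seed=0):
    """Search for combination weights with a basic particle swarm."""
    rng = random.Random(seed)
    dim = len(preds[0])
    # Seed the swarm with one particle per single method (unit vectors),
    # so the result is never worse than the best individual method.
    pos = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    pos += [[rng.random() for _ in range(dim)] for _ in range(n_particles - dim)]
    vel = [[0.0] * dim for _ in pos]
    pbest = [p[:] for p in pos]
    pbest_err = [mse(p, preds, actual) for p in pos]
    g = min(range(len(pos)), key=lambda i: pbest_err[i])
    gbest, gbest_err = pbest[g][:], pbest_err[g]
    for _ in range(n_iter):
        for i, p in enumerate(pos):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * r1 * (pbest[i][d] - p[d])
                             + 1.5 * r2 * (gbest[d] - p[d]))
                p[d] += vel[i][d]
            err = mse(p, preds, actual)
            if err < pbest_err[i]:
                pbest[i], pbest_err[i] = p[:], err
                if err < gbest_err:
                    gbest, gbest_err = p[:], err
    return normalize(gbest), gbest_err
```

Calling `pso_weights(preds, actual)` returns the normalized weights and the error they achieve on the supplied samples. In practice the weights would be fitted on one set of samples and checked on another, since an error measured on the fitting data is optimistic.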
Funding: Supported by the National High Technology Research and Development Program of China under Grant No 2015AA016902, the National Natural Science Foundation of China under Grant Nos 61435013 and 61405188, and the K. C. Wong Education Foundation.
Abstract: An 8×10 GHz receiver optical sub-assembly (ROSA) consisting of an 8-channel arrayed waveguide grating (AWG) and an 8-channel PIN photodetector (PD) array is designed and fabricated based on silica hybrid integration technology. Multimode output waveguides in the silica AWG with 2% refractive index difference are used to obtain flat-top spectra. The output waveguide facet is polished to a 45° bevel to redirect the light into the mesa-type PIN PD, which simplifies the packaging process. The experimental results show that the single-channel 1 dB bandwidth of the AWG ranges from 2.12 nm to 3.06 nm, the ROSA responsivity ranges from 0.097 A/W to 0.158 A/W, and the 3 dB bandwidth is up to 11 GHz. The device is promising for application in eight-lane WDM transmission systems for data center interconnection.