With the rapid advancement of machine learning technology and its growing adoption in research and engineering applications, an increasing number of studies have embraced data-driven approaches for modeling wind turbine wakes. These models leverage the ability to capture complex, high-dimensional characteristics of wind turbine wakes while offering significantly greater efficiency in the prediction process than physics-driven models. As a result, data-driven wind turbine wake models are regarded as powerful and effective tools for predicting wake behavior and turbine power output. This paper aims to provide a concise yet comprehensive review of existing studies on wind turbine wake modeling that employ data-driven approaches. It begins by defining and classifying machine learning methods to facilitate a clearer understanding of the reviewed literature. Subsequently, the related studies are categorized into four key areas: wind turbine power prediction, data-driven analytic wake models, wake field reconstruction, and the incorporation of explicit physical constraints. The accuracy of data-driven models is influenced by two primary factors: the quality of the training data and the performance of the model itself. Accordingly, both data accuracy and model structure are discussed in detail within the review.
This study explores the effectiveness of machine learning models in predicting the air-side performance of microchannel heat exchangers. The data were generated by experimentally validated Computational Fluid Dynamics (CFD) simulations of air-to-water microchannel heat exchangers. A distinctive aspect of this research is the comparative analysis of four diverse machine learning algorithms: Artificial Neural Networks (ANN), Support Vector Machines (SVM), Random Forest (RF), and Gaussian Process Regression (GPR). These models are adeptly applied to predict air-side heat transfer performance with high precision, with ANN and GPR exhibiting notably superior accuracy. Additionally, this research further delves into the influence of both geometric and operational parameters, including louvered angle, fin height, fin spacing, air inlet temperature, velocity, and tube temperature, on model performance. Moreover, it innovatively incorporates dimensionless numbers such as aspect ratio, fin height-to-spacing ratio, Reynolds number, Nusselt number, normalized air inlet temperature, temperature difference, and louvered angle into the input variables. This strategic inclusion significantly refines the predictive capabilities of the models by establishing a robust analytical framework supported by the CFD-generated database. The results show the enhanced prediction accuracy achieved by integrating dimensionless numbers, highlighting the effectiveness of data-driven approaches in precisely forecasting heat exchanger performance. This advancement is pivotal for the geometric optimization of heat exchangers, illustrating the considerable potential of integrating sophisticated modeling techniques with traditional engineering metrics.
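The abstract above names GPR as one of the best-performing regressors but gives no implementation. As a rough, self-contained sketch of how a Gaussian process regressor maps a dimensionless input (e.g. a normalized Reynolds number) to a heat-transfer output (e.g. a Nusselt-like quantity), here is a minimal NumPy version; the RBF kernel, length scale, and synthetic data points are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def rbf_kernel(a, b, length=0.1):
    """Squared-exponential kernel between two 1-D input arrays."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def gpr_predict(x_train, y_train, x_test, noise=1e-6, length=0.1):
    """Gaussian process posterior mean and standard deviation at x_test,
    assuming a zero prior mean and unit signal variance."""
    K = rbf_kernel(x_train, x_train, length) + noise * np.eye(len(x_train))
    Ks = rbf_kernel(x_test, x_train, length)
    mean = Ks @ np.linalg.solve(K, y_train)
    Kss = rbf_kernel(x_test, x_test, length)
    var = np.diag(Kss - Ks @ np.linalg.solve(K, Ks.T))
    return mean, np.sqrt(np.maximum(var, 0.0))

# Normalized Reynolds numbers and a stand-in Nusselt-like response.
re = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
nu = np.array([2.0, 2.6, 3.1, 3.3, 3.2])
mean, std = gpr_predict(re, nu, np.array([0.25, 3.0]))
```

A useful property visible here is that the posterior standard deviation collapses near training points and reverts to the prior far from them, which is why the paper can pair GPR with ANN yet still get uncertainty estimates from GPR alone.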
Increasing the production and utilization of shale gas is of great significance for building a clean and low-carbon energy system. A sharp decline of gas production has been widely observed in shale gas reservoirs. How to forecast shale gas production is still challenging due to complex fracture networks, dynamic fracture properties, frac hits, complicated multiphase flow, and multi-scale flow, as well as data quality and uncertainty. This work develops an integrated framework for evaluating shale gas well production based on data-driven models. Firstly, a comprehensive dominated-factor system has been established, including geological, drilling, fracturing, and production factors. Data processing and visualization are required to ensure data quality and determine the final data set. A shale gas production evaluation model is developed to evaluate shale gas production levels. Finally, the random forest algorithm is used to forecast shale gas production. The prediction accuracy of shale gas production level is higher than 95% based on the shale gas reservoirs in China. Forty-one wells are randomly selected to predict cumulative gas production using the optimal regression model. The proposed shale gas production evaluation framework avoids the numerous assumptions of analytical or semi-analytical models, as well as the huge computational cost and poor generalization of numerical modelling.
Vortex-induced vibration (VIV) is a challenge in ocean engineering. Several devices, including fairings, have been designed to suppress VIV. However, how to optimize the design of suppression devices is still an open problem. In this paper, an optimization design methodology is presented based on data-driven models and a genetic algorithm (GA). Data-driven models are introduced to substitute complex physics-based equations. The GA is used to rapidly search for the optimal suppression device among all possible solutions. Taking fairings as an example, a VIV response database for different fairings is established based on parameterized models in which the fairing sections are controlled by several control points and Bezier curves. Then a data-driven model, which can predict the VIV response of fairings with different sections accurately and efficiently, is trained through a BP neural network. Finally, a comprehensive optimization method and process is proposed based on the GA and the data-driven model. The proposed method is demonstrated by its application to a case study. It turns out that the proposed method can perform the optimization design of fairings effectively, and VIV can be reduced markedly through the optimized design.
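The GA-plus-surrogate loop described above can be sketched in a few lines. In this toy version the trained BP network is replaced by a hypothetical quadratic "VIV response" function of two fairing control-point offsets (an assumption for illustration only, with its minimum placed at (0.3, -0.2)); the GA machinery (selection, crossover, mutation) is the part being demonstrated.

```python
import random

def surrogate_viv(params):
    """Stand-in for the trained data-driven model: predicted VIV
    amplitude as a function of two fairing control-point offsets.
    The minimum of this toy function is at (0.3, -0.2)."""
    x, y = params
    return (x - 0.3) ** 2 + (y + 0.2) ** 2

def genetic_minimize(fitness, bounds, pop_size=40, generations=60,
                     mutation=0.1, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)                 # rank by predicted response
        elite = pop[: pop_size // 2]          # selection: keep the best half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = [(ai + bi) / 2 for ai, bi in zip(a, b)]      # crossover
            child = [c + rng.gauss(0, mutation) for c in child]  # mutation
            child = [min(max(c, lo), hi)
                     for c, (lo, hi) in zip(child, bounds)]      # clip to bounds
            children.append(child)
        pop = elite + children
    return min(pop, key=fitness)

best = genetic_minimize(surrogate_viv, bounds=[(-1, 1), (-1, 1)])
```

Because every fitness call hits the cheap surrogate rather than a CFD or experimental evaluation, the GA can afford thousands of evaluations, which is exactly the efficiency argument the abstract makes.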
The dynamical modeling of projectile systems with sufficient accuracy is of great difficulty due to the high-dimensional state space and various perturbations. With the rapid development of data science and scientific tools of measurement, numerous data-driven methods have been devoted to discovering governing laws from data. In this work, a data-driven method is employed to model the projectile based on the Kramers–Moyal formulas. More specifically, the four-dimensional projectile system is assumed to follow an Itô stochastic differential equation. Then the least squares method and sparse learning are applied to identify the drift coefficient and diffusion matrix from sample path data, which agree well with the real system. The effectiveness of the data-driven method demonstrates that it will become a powerful tool for extracting governing equations and predicting complex dynamical behaviors of the projectile.
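The core identification step, estimating drift and diffusion from sample paths via the Kramers–Moyal formulas, can be illustrated on a far simpler system than the four-dimensional projectile. The sketch below simulates a 1-D Ornstein–Uhlenbeck process (an assumption chosen so the true coefficients are known) and recovers the linear drift slope by least squares and the diffusion constant from the second conditional moment.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate a 1-D Ornstein-Uhlenbeck process dX = -theta*X dt + sigma dW
# with the Euler-Maruyama scheme.
theta, sigma, dt, n = 1.0, 0.5, 0.01, 200_000
x = np.empty(n)
x[0] = 0.0
noise = rng.standard_normal(n - 1)
for i in range(n - 1):
    x[i + 1] = x[i] - theta * x[i] * dt + sigma * np.sqrt(dt) * noise[i]

dx = np.diff(x)

# First Kramers-Moyal coefficient (drift): E[dX | X=x] / dt.
# With a linear drift ansatz a*x, least squares gives a in closed form.
a_hat = np.sum(x[:-1] * dx) / (np.sum(x[:-1] ** 2) * dt)

# Second Kramers-Moyal coefficient (diffusion): E[dX^2 | X=x] / dt.
sigma2_hat = np.mean(dx ** 2) / dt
```

The same recipe generalizes to the paper's setting by replacing the scalar ansatz with a sparse library of candidate terms in all four state variables, at which point the least squares step becomes sparse regression.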
This work addresses the multiscale optimization of the purification processes of antibody fragments. Chromatography decisions in the manufacturing processes are optimized, including the number of chromatography columns and their sizes, the number of cycles per batch, and the operational flow velocities. Data-driven models of chromatography throughput are developed considering loaded mass, flow velocity, and column bed height as the inputs, using manufacturing-scale simulated datasets based on microscale experimental data. The piecewise linear regression modeling method is adopted due to its simplicity and better prediction accuracy in comparison with other methods. Two alternative mixed-integer nonlinear programming (MINLP) models are proposed to minimize the total cost of goods per gram of the antibody purification process, incorporating the data-driven models. These MINLP models are then reformulated as mixed-integer linear programming (MILP) models using linearization techniques and multiparametric disaggregation. Two industrially relevant cases with different chromatography column size alternatives are investigated to demonstrate the applicability of the proposed models.
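Piecewise linear regression of the kind adopted above can be fit by a simple breakpoint search: for each candidate breakpoint, solve an ordinary least squares problem with a hinge feature and keep the breakpoint with the smallest error. The sketch below does this on a synthetic one-breakpoint curve (the data are a made-up stand-in, not the chromatography throughput dataset).

```python
import numpy as np

def fit_piecewise_linear(x, y, candidates):
    """Fit y = a + b*x + c*max(0, x - k) for each candidate breakpoint k
    and keep the fit with the smallest sum of squared errors."""
    best = None
    for k in candidates:
        X = np.column_stack([np.ones_like(x), x, np.maximum(0.0, x - k)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        sse = float(np.sum((X @ coef - y) ** 2))
        if best is None or sse < best[0]:
            best = (sse, k, coef)
    return best  # (sse, breakpoint, [a, b, c])

# Synthetic throughput-like curve whose slope changes at x = 4.
x = np.linspace(0.0, 10.0, 101)
y = 2.0 + 1.0 * x + np.where(x > 4.0, -0.8 * (x - 4.0), 0.0)
sse, k, coef = fit_piecewise_linear(x, y, candidates=np.linspace(1, 9, 81))
```

A fit of this hinge form is convenient downstream because each linear segment drops directly into a MILP as a set of linear constraints, which is precisely why the paper's MINLP-to-MILP reformulation works.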
The curse of dimensionality refers to the problem of increased sparsity and computational complexity when dealing with high-dimensional data. In recent years, the types and variables of industrial data have increased significantly, making data-driven models more challenging to develop. To address this problem, data augmentation technology has been introduced as an effective tool to solve the sparsity problem of high-dimensional industrial data. This paper systematically explores and discusses the necessity, feasibility, and effectiveness of augmented industrial data-driven modeling in the context of the curse of dimensionality and virtual big data. Then, the process of data augmentation modeling is analyzed, and the concept of data boosting augmentation is proposed. The data boosting augmentation involves designing the reliability weight and actual-virtual weight functions, and developing a double weighted partial least squares model to optimize the three stages of data generation, data fusion, and modeling. This approach significantly improves the interpretability, effectiveness, and practicality of data augmentation in industrial modeling. Finally, the proposed method is verified using practical examples of fault diagnosis systems and virtual measurement systems in industry. The results demonstrate the effectiveness of the proposed approach in improving the accuracy and robustness of data-driven models, making them more suitable for real-world industrial applications.
With the continual deployment of power-electronics-interfaced renewable energy resources, increasing privacy concerns due to deregulation of electricity markets, and the diversification of demand-side activities, traditional knowledge-based power system dynamic modeling methods are faced with unprecedented challenges. Data-driven modeling has been increasingly studied in recent years because of its lesser need for prior knowledge, higher capability of handling large-scale systems, and better adaptability to variations of system operating conditions. This paper discusses the motivations and the generalized process of data-driven modeling, and provides a comprehensive overview of various state-of-the-art techniques and applications. It also comparatively presents the advantages and disadvantages of these methods and provides insight into outstanding challenges and possible future research directions.
In the synthesis of control algorithms for complex systems, we are often faced with imprecise or unknown mathematical models of the dynamical systems, or even with problems in finding a mathematical model of the system in the open loop. To tackle these difficulties, an approach to data-driven model identification and control algorithm design based on the maximum stability degree criterion is proposed in this paper. The data-driven model identification procedure finds the mathematical model of the system from the underdamped transient response of the closed-loop system. The system is approximated with an inertial model whose coefficients are calculated from the values of the critical transfer coefficient and the oscillation amplitude and period of the underdamped response of the closed-loop system. In the data-driven control design, the tuning parameters of the controller are calculated from the parameters obtained in the identification step, and expressions for calculating the tuning parameters are presented. The obtained results of data-driven model identification and controller synthesis were verified by computer simulation.
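Computing controller gains from a critical gain and an oscillation period is the same data flow as the classic closed-loop Ziegler–Nichols rules. The sketch below uses those classic rules as an illustration of the identify-then-tune pattern; the paper derives its own expressions from the maximum stability degree criterion, so the coefficients here (0.6, 0.5, 0.125) are the textbook ZN values, not the paper's.

```python
def ziegler_nichols_pid(k_critical, t_oscillation):
    """Classic closed-loop Ziegler-Nichols PID rules: gains computed
    directly from the critical (ultimate) gain Ku and the sustained
    oscillation period Tu observed on the closed-loop system."""
    return {
        "Kp": 0.6 * k_critical,        # proportional gain
        "Ti": 0.5 * t_oscillation,     # integral time
        "Td": 0.125 * t_oscillation,   # derivative time
    }

# Hypothetical measurements from a closed-loop oscillation experiment.
gains = ziegler_nichols_pid(k_critical=8.0, t_oscillation=2.0)
```

The appeal of this family of methods, shared by the paper's approach, is that tuning requires only two or three numbers measured on the running closed loop, with no explicit plant model in hand.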
Sub-Saharan Africa (SSA) has the highest maternal and under-five mortality rates in the world. The advent of the coronavirus disease 2019 exacerbated the region's problems by overwhelming the health systems and affecting access to healthcare through travel restrictions and the rechannelling of resources towards containment of the pandemic. The region failed to achieve the Millennium Development Goals on maternal and child mortality, and is poised to miss the corresponding targets of the Sustainable Development Goals. To improve maternal and child health outcomes, many SSA countries introduced digital technologies for educating pregnant and nursing women, making doctors' appointments and sending reminders to mothers and expectant mothers, as well as capturing information about patients and their illnesses. However, the collected epidemiological data are not being utilised to inform patient care and improve the quality, efficiency and accessibility of maternal, neonatal and child health (MNCH) care. To the researchers' best knowledge, no review paper has been published that focuses on digital health for MNCH care in SSA and proposes data-driven approaches to the same. Therefore, this study sought to: (1) identify digital systems for MNCH in SSA; (2) identify the applicability and weaknesses of the digital MNCH systems in SSA; and (3) propose a data-driven model for integrating emerging technologies into MNCH services in SSA to make better use of data to improve MNCH care coverage, efficiency and quality. The PRISMA methodology was used in this study. The study revealed that there are no data-driven models for monitoring pregnant women and under-five children in Sub-Saharan Africa, with the available digital health technologies mainly based on SMS and websites. Thus, the current digital health systems in SSA do not support real-time, ubiquitous, pervasive and data-driven healthcare. Their main applicability is in non-real-time pregnancy monitoring, education and information dissemination. Unless new and more effective approaches are implemented, SSA might remain with the highest and unacceptable maternal and under-five mortality rates globally. The study proposes feasible emerging technologies that can be used to provide data-driven healthcare for MNCH in SSA, recommendations on how to make the transition successful, and lessons learnt from other regions.
Using stochastic dynamic simulation for railway vehicle collisions still faces many challenges, such as high modelling complexity and long computation times. To address these challenges, we introduce a novel data-driven stochastic process modelling (DSPM) approach into dynamic simulation of railway vehicle collisions. The DSPM approach consists of two steps: (i) process description, in which four kinds of kernels are used to describe the uncertainty inherent in collision processes; and (ii) solving, in which stochastic variational inference and mini-batch algorithms are used to accelerate computations of stochastic processes. By applying the DSPM, Gaussian process regression (GPR) and finite element (FE) methods to two collision scenarios (a lead car colliding with a rigid wall, and a lead car colliding with another lead car), we are able to achieve a comprehensive analysis. The comparison between the DSPM approach and the FE method revealed that the DSPM approach is capable of calculating the corresponding confidence interval while simultaneously improving the overall computational efficiency. Comparing the DSPM approach with the GPR method indicates that the DSPM approach has the ability to accurately describe the dynamic response under unknown conditions. Overall, this research demonstrates the feasibility and usability of the proposed DSPM approach for stochastic dynamics simulation of railway vehicle collisions.
Chlorine-based disinfection is ubiquitous in conventional drinking water treatment (DWT) and serves to mitigate threats of acute microbial disease caused by pathogens that may be present in source water. An important index of disinfection efficiency is the free chlorine residual (FCR), a regulated disinfection parameter in the US that indirectly measures disinfectant power for prevention of microbial recontamination during DWT and distribution. This work demonstrates how machine learning (ML) can be implemented to improve FCR forecasting when supplied with water quality data from a real, full-scale chlorine disinfection system in Georgia, USA. More precisely, a gradient-boosting ML method (CatBoost) was developed from a full year of DWT plant-generated chlorine disinfection data, including water quality parameters (e.g., temperature, turbidity, pH) and operational process data (e.g., flowrates), to predict FCR. Four gradient-boosting models were implemented, with the highest performance achieving a coefficient of determination, R2, of 0.937. Shapley additive explanation (SHAP) values were used to interpret the models' results, uncovering that standard DWT operating parameters, although non-intuitive and theoretically non-causal, vastly improved prediction performance. These results provide a base case for data-driven DWT disinfection supervision and suggest process monitoring methods to provide better information to plant operators for implementation of safe chlorine dosing to maintain optimum FCR.
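Gradient boosting of the kind CatBoost performs can be demystified with a from-scratch miniature: repeatedly fit a regression stump to the current residuals and add it with a shrinkage factor. The sketch below does this on a synthetic one-feature, FCR-like curve; it is a mechanism demo under those assumptions, not the paper's CatBoost pipeline or plant data.

```python
import numpy as np

def fit_stump(x, residual):
    """Best single-split regression stump on one feature."""
    best = None
    for t in np.unique(x)[:-1]:
        left, right = residual[x <= t], residual[x > t]
        pred_l, pred_r = left.mean(), right.mean()
        sse = ((left - pred_l) ** 2).sum() + ((right - pred_r) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, pred_l, pred_r)
    return best[1:]  # (threshold, left value, right value)

def boost(x, y, n_rounds=50, lr=0.3):
    """Gradient boosting for squared loss: each round fits a stump to
    the residuals and adds it, shrunk by the learning rate."""
    pred = np.full_like(y, y.mean())
    stumps = []
    for _ in range(n_rounds):
        t, pl, pr = fit_stump(x, y - pred)
        pred = pred + lr * np.where(x <= t, pl, pr)
        stumps.append((t, pl, pr))
    return pred, stumps

# Synthetic chlorine-residual-like response to one water quality input.
x = np.linspace(0.0, 1.0, 200)
y = np.sin(3 * x) + 0.5 * x
pred, stumps = boost(x, y)
```

Real implementations add many refinements (multi-feature trees, regularization, CatBoost's ordered boosting for categorical features), but the residual-fitting loop above is the part SHAP values then attribute back to individual inputs.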
Conventional automated machine learning (AutoML) technologies fall short in preprocessing low-quality raw data and adapting to varying indoor and outdoor environments, leading to reduced accuracy in forecasting short-term building energy loads. Moreover, their predictions are not transparent because of their black-box nature. Hence, the building field currently lacks an AutoML framework capable of data quality enhancement, environment self-adaptation, and model interpretation. To address this research gap, an improved AutoML-based end-to-end data-driven modeling framework is proposed. Bayesian optimization is applied by this framework to find an optimal data preprocessing process for quality improvement of raw data, bridging the gap whereby conventional AutoML technologies cannot automatically handle missing data and outliers. A sliding window-based model retraining strategy is utilized to achieve environment self-adaptation, contributing to the accuracy enhancement of AutoML technologies. Moreover, a local interpretable model-agnostic explanations-based approach is developed to interpret predictions made by the improved framework, overcoming the poor interpretability of conventional AutoML technologies. The performance of the improved framework in forecasting one-hour-ahead cooling loads is evaluated using two years of operational data from a real building. It is discovered that the accuracy of the improved framework increases by 4.24%-8.79% compared with four conventional frameworks for buildings with not only high-quality but also low-quality operational data. Furthermore, it is demonstrated that the developed model interpretation approach can effectively explain the predictions of the improved framework. The improved framework offers a novel perspective on creating accurate and reliable AutoML frameworks tailored to building energy load prediction tasks and other similar tasks.
The complex sand-casting process, combined with interactions between process parameters, makes it difficult to control casting quality, resulting in a high scrap rate. A strategy based on a data-driven model was proposed to reduce casting defects and improve production efficiency, which includes a random forest (RF) classification model, feature importance analysis, and process parameter optimization with Monte Carlo simulation. The collected data, covering four types of defects and the corresponding process parameters, were used to construct the RF model. Classification results show a recall rate above 90% for all categories. The Gini index was used to assess the importance of the process parameters in the formation of various defects in the RF model. Finally, the classification model was applied to different production conditions for quality prediction. In the case of process parameter optimization for gas porosity defects, the model serves as the virtual experiment within the Monte Carlo method to estimate a better temperature distribution. The prediction model, when applied in the factory, greatly improved the efficiency of defect detection. Results show that the scrap rate decreased from 10.16% to 6.68%.
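The Gini-index importance mentioned above reduces to a simple computation per split: the impurity of the parent node minus the weighted impurity of its children. The sketch below shows that calculation on a toy example (the defect labels and temperature values are invented for illustration, not factory data).

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    props = [labels.count(c) / n for c in set(labels)]
    return 1.0 - sum(p * p for p in props)

def gini_decrease(labels, feature, threshold):
    """Impurity decrease achieved by splitting on feature <= threshold.
    An RF accumulates this quantity over all splits that use a feature
    to produce that feature's importance score."""
    left = [l for l, f in zip(labels, feature) if f <= threshold]
    right = [l for l, f in zip(labels, feature) if f > threshold]
    n = len(labels)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted

# Toy example: a temperature split that perfectly separates porosity defects.
defect = ["porosity", "porosity", "ok", "ok"]
temp = [1420.0, 1415.0, 1380.0, 1375.0]
drop = gini_decrease(defect, temp, threshold=1400.0)
```

A perfect split drives both child impurities to zero, so the decrease equals the parent impurity (0.5 here); process parameters that repeatedly produce large decreases across the forest rank as the most defect-relevant.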
The optimization of two-scale structures can adapt to the different needs of materials in various regions by reasonably arranging different microstructures at the macro scale, thereby considerably improving structural performance. Here, a multiple variable cutting (M-VCUT) level set-based data-driven model of microstructures is presented, and a method based on this model is proposed for the optimal design of two-scale structures. The geometry of the microstructure is described using the M-VCUT level set method, and the effective mechanical properties of microstructures are computed by the homogenization method. Then, a database of microstructures containing their geometric and mechanical parameters is constructed. The two sets of parameters are adopted as input and output datasets, and a mapping relationship between the two datasets is established to build the data-driven model of microstructures. During the optimization of two-scale structures, the data-driven model is used for macroscale finite element and sensitivity analyses. The efficiency of the analysis and optimization of two-scale structures is improved because the computational cost of invoking such a data-driven model is much smaller than that of homogenization.
During the boreal summer, intraseasonal oscillations exhibit significant interannual variations in intensity over two key regions: the central-western equatorial Pacific (5°S-5°N, 150°E-150°W) and the subtropical Northwestern Pacific (10°-20°N, 130°E-175°W). The former is well documented and considered to be influenced by ENSO, while the latter has received comparatively less attention and is likely influenced by the Pacific Meridional Mode (PMM), as suggested by partial correlation analysis results. To elucidate the physical processes responsible for the enhanced (weakened) intraseasonal convection over the subtropical northwestern Pacific during warm (cold) PMM years, the authors employed a moisture budget analysis. The findings reveal that during warm PMM years, there is an increase in summer-mean moisture over the subtropical northwestern Pacific. This increase interacts with intensified vertical motion perturbations in the region, leading to greater vertical moisture advection in the lower troposphere and consequently resulting in convective instability. Such a process is pivotal in amplifying intraseasonal convection anomalies. The observational findings were further verified by model experiments forced by PMM-like sea surface temperature patterns.
Objective: To develop a best-evidence-based optimal nutrition management plan for patients with chronic heart failure, apply it in clinical practice, and evaluate its effectiveness. Methods: The KTA knowledge translation model was used to guide evidence-based practice in nutrition management, and the nutritional status, cardiac function status, quality of life, and quality review indicators of chronic heart failure patients were compared before and after the application of evidence. Results: After the application of evidence, the nutritional status indicators (MNA-SF score, albumin, hemoglobin) of the two groups of heart failure patients increased significantly compared to before the application of evidence, with statistically significant differences (p < 0.05). Conclusion: The KTA knowledge translation model provides methodological guidance for the implementation of evidence-based practice for heart failure patients. This evidence-based practice project is beneficial for improving malnutrition outcomes in chronic heart failure patients and helps standardize nursing pathways, thereby promoting the improvement of nursing quality.
In order to enhance the control performance of a piezo-positioning system, the influence of hysteresis characteristics and a compensation method are studied. A Hammerstein model is used to represent the dynamic hysteresis nonlinearity of the piezo-positioning actuator. The static nonlinear part and the dynamic linear part of the Hammerstein model are represented by models obtained through the Prandtl-Ishlinskii (PI) model and the Hankel matrix system identification method, respectively. This model demonstrates good generalization capability for typical input frequencies below 200 Hz. A sliding mode inverse compensation tracking control strategy based on the PI inverse model and integral augmentation is proposed. Experimental results show that compared with PID inverse compensation control and sliding mode control without inverse compensation, the sliding mode inverse compensation control achieves a more ideal step response with no overshoot, and the settling time is only 6.2 ms. In the frequency domain, the closed-loop tracking bandwidth reaches 119.9 Hz, and the disturbance rejection bandwidth reaches 86.2 Hz. The proposed control strategy can effectively compensate the hysteresis nonlinearity and improve the tracking accuracy and anti-disturbance capability of the piezo-positioning system.
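The static nonlinear part above, the Prandtl–Ishlinskii model, is a weighted superposition of elementary play (backlash) operators. A minimal sketch follows; the thresholds and weights are arbitrary placeholders, not parameters identified from the actuator, and real use would fit them (and build the inverse model) from measured hysteresis loops.

```python
def play_operator(signal, r, y0=0.0):
    """One backlash (play) operator with threshold r: the output follows
    the input only once the input has moved more than r away from it."""
    out, y = [], y0
    for u in signal:
        y = max(u - r, min(u + r, y))
        out.append(y)
    return out

def prandtl_ishlinskii(signal, thresholds, weights):
    """PI hysteresis model: weighted sum of play operators with
    different thresholds, which shapes the hysteresis loop."""
    branches = [play_operator(signal, r) for r in thresholds]
    return [sum(w * b[i] for w, b in zip(weights, branches))
            for i in range(len(signal))]

# Monotonically increasing drive: each branch lags by its own threshold.
u = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
y = prandtl_ishlinskii(u, thresholds=[0.0, 0.1, 0.3],
                       weights=[1.0, 0.5, 0.25])
```

On an up-down drive cycle the branches release at different points, producing the characteristic loop; the PI model's practical appeal, exploited by the paper's inverse compensation, is that this structure admits an analytical inverse.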
When assessing seismic liquefaction potential with data-driven models, addressing the uncertainties of model establishment, cone penetration test (CPT) data interpretation, and decision thresholds is crucial for avoiding biased data selection, ameliorating overconfident models, and remaining flexible to varying practical objectives, especially when the training and testing data are not identically distributed. A workflow characterized by leveraging Bayesian methodology was proposed to address these issues. Employing a multi-layer perceptron (MLP) as the foundational model, this approach was benchmarked against empirical methods and advanced algorithms for its efficacy in simplicity, accuracy, and resistance to overfitting. The analysis revealed that, while MLP models optimized via the maximum a posteriori algorithm suffice for straightforward scenarios, Bayesian neural networks showed great potential for preventing overfitting. Additionally, integrating decision thresholds through various evaluative principles offers insights for challenging decisions. Two case studies demonstrate the framework's capacity for nuanced interpretation of in situ data, employing a model committee for a detailed evaluation of liquefaction potential via Monte Carlo simulations and basic statistics. Overall, the proposed step-by-step workflow for analyzing seismic liquefaction incorporates multifold testing and real-world data validation, showing improved robustness against overfitting and greater versatility in addressing practical challenges. This research contributes to the seismic liquefaction assessment field by providing a structured, adaptable methodology for accurate and reliable analysis.
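The model-committee idea above, that predictive uncertainty comes from the spread of predictions across parameter samples, can be shown schematically. In this sketch the committee members are one-feature logistic models with jittered parameters standing in for posterior samples of a Bayesian network; the feature, parameters, and jitter scale are all invented for illustration, not the paper's CPT-based models.

```python
import math
import random

def committee_predict(x, members):
    """Each member is a (weight, bias) logistic model; the mean across
    members is the committee prediction and the spread approximates
    predictive uncertainty."""
    probs = [1.0 / (1.0 + math.exp(-(w * x + b))) for w, b in members]
    mean = sum(probs) / len(probs)
    var = sum((p - mean) ** 2 for p in probs) / len(probs)
    return mean, var ** 0.5

# Simulated posterior samples of model parameters (as might come from
# MCMC or a Bayesian NN), jittered around a nominal decision boundary.
rng = random.Random(7)
members = [(2.0 + rng.gauss(0, 0.2), -1.0 + rng.gauss(0, 0.2))
           for _ in range(200)]

p_mid, s_mid = committee_predict(0.5, members)   # near the decision boundary
p_far, s_far = committee_predict(3.0, members)   # far from the boundary
```

The instructive behavior is that uncertainty concentrates near the decision boundary: cases far from it get confident, low-spread predictions, while borderline cases expose their ambiguity, which is what makes threshold selection a separate, principled step in the workflow.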
To enhance energy interaction among low-voltage stations (LVSs) and reduce the line loss of the distribution network, a novel operation mode of the micro-pumped storage system (mPSS) is proposed based on a common reservoir. First, operation modes of the mPSS are analyzed, including the separated reservoir mode (SRM) and the common reservoir mode (CRM). Then, based on the SRM and CRM, an energy mutual assistance control model between LVSs is built to optimize energy loss. Finally, in simulation, compared to the model without pumped storage in the LVS, the SRM and CRM decrease the total energy loss by 294.377 and 432.578 kWh, respectively. The configuration of the mPSS can improve the utilization rate of the new energy generation system and relieve the pressure on transformer capacity in the LVS. Compared with the SRM, the proposed CRM reduces the total energy loss by a further 138.201 kWh, increases new energy consumption by 161.642 kWh, and decreases the line loss by 7.271 kWh. As the efficiency of the mPSS improves, the total energy loss reduction of the CRM will be 3.5 times that of the SRM. Further, the CRM can significantly reduce the reservoir capacity construction of the mPSS and is more suitable for scenarios where the capacity configuration of the mPSS is limited.
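The reported figures are internally consistent: the CRM-over-SRM saving equals the difference between the two modes' total energy loss reductions. A quick arithmetic check of the abstract's own numbers:

```python
# Energy loss reductions vs. the no-storage baseline, from the abstract (kWh).
srm_saving = 294.377   # separated reservoir mode
crm_saving = 432.578   # common reservoir mode

# The extra reduction CRM delivers over SRM, rounded to the abstract's
# precision; this should reproduce the reported 138.201 kWh figure.
extra = round(crm_saving - srm_saving, 3)
```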
Funding: Supported by the National Natural Science Foundation of China under Grant No. 52131102.
Abstract: With the rapid advancement of machine learning technology and its growing adoption in research and engineering applications, an increasing number of studies have embraced data-driven approaches for modeling wind turbine wakes. These models capture the complex, high-dimensional characteristics of wind turbine wakes while offering significantly greater prediction efficiency than physics-driven models. As a result, data-driven wind turbine wake models are regarded as powerful and effective tools for predicting wake behavior and turbine power output. This paper provides a concise yet comprehensive review of existing studies on data-driven wind turbine wake modeling. It begins by defining and classifying machine learning methods to facilitate a clearer understanding of the reviewed literature. The related studies are then categorized into four key areas: wind turbine power prediction, data-driven analytical wake models, wake field reconstruction, and the incorporation of explicit physical constraints. The accuracy of data-driven models is governed by two primary factors: the quality of the training data and the performance of the model itself. Accordingly, both data accuracy and model structure are discussed in detail within the review.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 52306026), the Wenzhou Municipal Science and Technology Research Program (Grant No. G20220012), the Special Innovation Project Fund of the Institute of Wenzhou, Zhejiang University (XMGL-KJZX-202205), and the State Key Laboratory of Air-Conditioning Equipment and System Energy Conservation Open Project (Project No. ACSKL2021KT01).
Abstract: This study explores the effectiveness of machine learning models in predicting the air-side performance of microchannel heat exchangers. The data were generated by experimentally validated Computational Fluid Dynamics (CFD) simulations of air-to-water microchannel heat exchangers. A distinctive aspect of this research is the comparative analysis of four machine learning algorithms: Artificial Neural Networks (ANN), Support Vector Machines (SVM), Random Forest (RF), and Gaussian Process Regression (GPR). These models are applied to predict air-side heat transfer performance with high precision, with ANN and GPR exhibiting notably superior accuracy. The research further examines the influence of geometric and operational parameters, including louver angle, fin height, fin spacing, air inlet temperature, velocity, and tube temperature, on model performance. Moreover, it incorporates dimensionless numbers such as aspect ratio, fin height-to-spacing ratio, Reynolds number, Nusselt number, normalized air inlet temperature, temperature difference, and louver angle into the input variables. This inclusion significantly refines the predictive capability of the models by establishing a robust analytical framework supported by the CFD-generated database. The results show the enhanced prediction accuracy achieved by integrating dimensionless numbers, highlighting the effectiveness of data-driven approaches in forecasting heat exchanger performance. This advancement supports the geometric optimization of heat exchangers and illustrates the considerable potential of integrating modern modeling techniques with traditional engineering metrics.
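The feature-engineering step this abstract describes, mapping raw geometry and operating conditions to dimensionless inputs, can be sketched as below. This is a hypothetical illustration: the function name, default air properties, and the rectangular-channel hydraulic-diameter approximation are assumptions, not values from the paper.

```python
def dimensionless_features(fin_height_mm, fin_spacing_mm, louver_angle_deg,
                           air_velocity, t_air_in, t_tube,
                           rho=1.2, mu=1.8e-5):
    """Map raw geometric/operational parameters to dimensionless model inputs.

    rho and mu are illustrative air properties (kg/m^3, Pa*s)."""
    # Hydraulic diameter of a rectangular channel: Dh = 2ab/(a+b), in metres.
    dh = 2.0 * fin_height_mm * fin_spacing_mm / (fin_height_mm + fin_spacing_mm) * 1e-3
    re = rho * air_velocity * dh / mu                 # Reynolds number
    return {
        "fin_ratio": fin_height_mm / fin_spacing_mm,  # fin height-to-spacing ratio
        "re": re,
        "theta": (t_air_in - t_tube) / t_tube,        # normalized temperature difference
        "louver_angle_deg": louver_angle_deg,
    }
```

Feeding such scale-free quantities to the regressors, rather than raw dimensions, is what the abstract credits for the improved accuracy.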
Funding: Funded by the National Natural Science Foundation of China (52004238) and the China Postdoctoral Science Foundation (2019M663561).
Abstract: Increasing the production and utilization of shale gas is of great significance for building a clean and low-carbon energy system. Sharp declines in gas production have been widely observed in shale gas reservoirs, and forecasting shale gas production remains challenging due to complex fracture networks, dynamic fracture properties, frac hits, complicated multiphase and multi-scale flow, and issues of data quality and uncertainty. This work develops an integrated framework for evaluating shale gas well production based on data-driven models. First, a comprehensive system of dominant factors is established, covering geological, drilling, fracturing, and production factors. Data processing and visualization are applied to ensure data quality and determine the final data set. A shale gas production evaluation model is then developed to evaluate production levels. Finally, the random forest algorithm is used to forecast shale gas production. The prediction accuracy of the production level exceeds 95% for the shale gas reservoirs studied in China. Forty-one wells are randomly selected to predict cumulative gas production using the optimal regression model. The proposed evaluation framework avoids the restrictive assumptions of analytical and semi-analytical models as well as the high computational cost and poor generalization of numerical modelling.
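The classify-then-forecast step can be sketched with scikit-learn's random forest on synthetic data. Everything here is illustrative: the four "dominant factors", the labeling rule, and the train/test split are invented stand-ins, not the paper's data set.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the dominated-factor data set: four hypothetical
# factors (e.g. fracture stage count, lateral length, porosity, pressure),
# scaled to [0, 1].
rng = np.random.default_rng(0)
X = rng.uniform(size=(400, 4))
# Hypothetical rule assigning a binary "production level" label.
y = (X[:, 0] + 0.5 * X[:, 1] > 0.9).astype(int)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X[:300], y[:300])               # train on 300 "wells"
accuracy = clf.score(X[300:], y[300:])  # evaluate on 100 held-out "wells"
```

The same ensemble, switched to a regressor, would play the role of the cumulative-production model described above.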
Funding: Supported by the National Natural Science Foundation of China (Grant No. 51809279), the Major National Science and Technology Program (Grant No. 2016ZX05028-001-05), the Program for Changjiang Scholars and Innovative Research Team in University (Grant No. IRT14R58), and the Fundamental Research Funds for the Central Universities, i.e., the Opening Fund of the National Engineering Laboratory of Offshore Geophysical and Exploration Equipment (Grant No. 20CX02302A).
Abstract: Vortex-induced vibration (VIV) is a challenge in ocean engineering. Several devices, including fairings, have been designed to suppress VIV; however, how to optimize the design of suppression devices remains an open problem. In this paper, an optimization design methodology is presented based on data-driven models and a genetic algorithm (GA). Data-driven models are introduced to substitute for complex physics-based equations, and the GA is used to search rapidly for the optimal suppression device among all possible solutions. Taking fairings as an example, a VIV response database for different fairings is established based on parameterized models in which the fairing cross-sections are controlled by several control points and Bezier curves. A data-driven model, which can predict the VIV response of fairings with different sections accurately and efficiently, is then trained with a BP neural network. Finally, a comprehensive optimization method and process is proposed based on the GA and the data-driven model. The proposed method is demonstrated through its application to a case study, which shows that it can perform the optimization design of fairings effectively and that VIV is reduced markedly by the optimized design.
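The surrogate-plus-GA loop can be sketched in a few lines. The surrogate below is a made-up smooth function standing in for the trained BP-network VIV predictor, and the single shape parameter, population size, and mutation scale are all assumptions for illustration.

```python
import numpy as np

def surrogate_viv(x):
    # Stand-in for the trained BP-network surrogate: a smooth function of a
    # single (hypothetical) section-shape parameter, minimized near x = 0.3.
    return (x - 0.3) ** 2 + 0.05 * np.sin(20.0 * x) ** 2

rng = np.random.default_rng(1)
pop = rng.uniform(0.0, 1.0, size=50)            # initial population of designs
for _ in range(60):                             # generations
    fitness = surrogate_viv(pop)                # cheap surrogate evaluation
    parents = pop[np.argsort(fitness)[:10]]     # selection: keep the 10 best
    # reproduction with Gaussian mutation, clipped to the design range
    pop = np.clip(np.repeat(parents, 5) + rng.normal(0.0, 0.02, 50), 0.0, 1.0)
best = pop[np.argmin(surrogate_viv(pop))]
```

Because each fitness call hits the surrogate rather than a CFD solver, thousands of candidate fairing sections can be screened cheaply, which is the point of pairing the data-driven model with the GA.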
Funding: The Six Talent Peaks Project in Jiangsu Province, China (Grant No. JXQC-002).
Abstract: Modeling the dynamics of projectile systems with sufficient accuracy is difficult due to the high-dimensional state space and various perturbations. With the recent rapid development of data science and scientific measurement tools, numerous data-driven methods have been devoted to discovering governing laws from data. In this work, a data-driven method is employed to model the projectile based on the Kramers–Moyal formulas. More specifically, the four-dimensional projectile system is treated as an Itô stochastic differential equation. The least squares method and sparse learning are then applied to identify the drift coefficient and diffusion matrix from sample path data, and the identified model agrees well with the real system. The effectiveness of the data-driven method demonstrates that it will become a powerful tool for extracting governing equations and predicting complex dynamical behaviors of the projectile.
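The identification idea can be shown on a one-dimensional toy system instead of the four-dimensional projectile. The sketch below simulates an Ornstein–Uhlenbeck SDE (all parameter values illustrative) and recovers its drift and diffusion coefficients from sample-path increments, i.e. the first two Kramers–Moyal coefficients estimated by least squares.

```python
import numpy as np

# Simulate a 1-D Ito SDE  dX = -theta*X dt + sigma dW  by Euler-Maruyama,
# a toy stand-in for the four-dimensional projectile system.
theta_true, sigma_true, dt, n = 1.5, 0.3, 0.01, 20000
rng = np.random.default_rng(2)
x = np.empty(n)
x[0] = 0.5
for k in range(n - 1):
    x[k + 1] = x[k] - theta_true * x[k] * dt + sigma_true * np.sqrt(dt) * rng.standard_normal()

dx = np.diff(x)
# First Kramers-Moyal coefficient (drift): least squares of dx/dt on x,
# assuming a linear drift -theta*x.
theta_hat = -np.sum(x[:-1] * dx / dt) / np.sum(x[:-1] ** 2)
# Second Kramers-Moyal coefficient (diffusion): mean squared increment per dt.
sigma2_hat = np.mean(dx ** 2) / dt
```

In the paper's setting the same regression is done over a dictionary of candidate terms with sparsity promotion; here the dictionary is just the single linear term.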
Abstract: This work addresses the multiscale optimization of the purification processes of antibody fragments. Chromatography decisions in the manufacturing processes are optimized, including the number of chromatography columns and their sizes, the number of cycles per batch, and the operational flow velocities. Data-driven models of chromatography throughput are developed considering loaded mass, flow velocity, and column bed height as the inputs, using manufacturing-scale simulated datasets based on microscale experimental data. The piecewise linear regression modeling method is adopted for its simplicity and better prediction accuracy in comparison with other methods. Two alternative mixed-integer nonlinear programming (MINLP) models are proposed to minimize the total cost of goods per gram of the antibody purification process, incorporating the data-driven models. These MINLP models are then reformulated as mixed-integer linear programming (MILP) models using linearization techniques and multiparametric disaggregation. Two industrially relevant cases with different chromatography column size alternatives are investigated to demonstrate the applicability of the proposed models.
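The piecewise linear regression step can be sketched as a breakpoint grid search with an ordinary least squares fit on each side. The "throughput vs. loaded mass" data below are synthetic with a known slope change at mass = 2; the breakpoint grid and variable names are assumptions for illustration.

```python
import numpy as np

# Toy throughput-style data: piecewise linear response of "throughput" to
# "loaded mass" with a slope change at mass = 2 (all values hypothetical).
mass = np.linspace(0.0, 4.0, 81)
throughput = np.where(mass < 2.0, 1.0 * mass, 2.0 + 3.0 * (mass - 2.0))

def fit_two_piece(x, y, breakpoints):
    """Grid-search the breakpoint; fit each side by least squares;
    return the breakpoint with the smallest total squared error."""
    best = None
    for b in breakpoints:
        sse = 0.0
        for m in (x <= b, x > b):
            A = np.column_stack([x[m], np.ones(m.sum())])
            coef, *_ = np.linalg.lstsq(A, y[m], rcond=None)
            sse += np.sum((A @ coef - y[m]) ** 2)
        if best is None or sse < best[1]:
            best = (b, sse)
    return best[0]

b_hat = fit_two_piece(mass, throughput, np.linspace(0.5, 3.5, 31))
```

Each fitted segment is linear in the decision variables, which is what lets the MINLP cost model be reformulated exactly as an MILP.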
Funding: Supported in part by the National Natural Science Foundation of China (NSFC) (92167106, 61833014) and the Key Research and Development Program of Zhejiang Province (2022C01206).
Abstract: The curse of dimensionality refers to the problem of increased sparsity and computational complexity when dealing with high-dimensional data. In recent years, the types and variables of industrial data have increased significantly, making data-driven models more challenging to develop. To address this problem, data augmentation technology has been introduced as an effective tool to solve the sparsity problem of high-dimensional industrial data. This paper systematically explores and discusses the necessity, feasibility, and effectiveness of augmented industrial data-driven modeling in the context of the curse of dimensionality and virtual big data. The process of data augmentation modeling is then analyzed, and the concept of data boosting augmentation is proposed. Data boosting augmentation involves designing the reliability weight and actual-virtual weight functions and developing a double-weighted partial least squares model to optimize the three stages of data generation, data fusion, and modeling. This approach significantly improves the interpretability, effectiveness, and practicality of data augmentation in industrial modeling. Finally, the proposed method is verified using practical examples of fault diagnosis systems and virtual measurement systems in industry. The results demonstrate the effectiveness of the proposed approach in improving the accuracy and robustness of data-driven models, making them more suitable for real-world industrial applications.
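The double-weighting idea can be illustrated with a weighted least squares fit standing in for the paper's double-weighted partial least squares model. The data, the actual-virtual weight value of 0.3, and the Gaussian reliability weight are all invented for the sketch.

```python
import numpy as np

rng = np.random.default_rng(3)
# 40 real samples and 60 "virtual" (augmented) samples of a toy process
# y = 2*x1 - x2 + noise; virtual samples are deliberately noisier.
true_coef = np.array([2.0, -1.0])
X_real = rng.normal(size=(40, 2))
y_real = X_real @ true_coef + 0.05 * rng.normal(size=40)
X_virt = rng.normal(size=(60, 2))
y_virt = X_virt @ true_coef + 0.5 * rng.normal(size=60)

X = np.vstack([X_real, X_virt])
y = np.concatenate([y_real, y_virt])
# Double weighting: an actual-virtual weight (real = 1.0, virtual = 0.3)
# times a reliability weight that down-weights samples far from the
# real-data centre.
av_w = np.concatenate([np.ones(40), 0.3 * np.ones(60)])
rel_w = np.exp(-0.5 * np.sum((X - X_real.mean(axis=0)) ** 2, axis=1))
w = av_w * rel_w

# Weighted least squares via row scaling (linear stand-in for weighted PLS).
sw = np.sqrt(w)
coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
```

The virtual samples enlarge the sparse training set, while the two weight functions keep the noisier generated data from dominating the fit.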
Funding: Supported by the U.S. Department of Energy's Office of Energy Efficiency and Renewable Energy (EERE) under Solar Energy Technologies Office Award Number 38456.
Abstract: With the continual deployment of power-electronics-interfaced renewable energy resources, increasing privacy concerns due to the deregulation of electricity markets, and the diversification of demand-side activities, traditional knowledge-based power system dynamic modeling methods face unprecedented challenges. Data-driven modeling has been increasingly studied in recent years because of its lesser need for prior knowledge, higher capability of handling large-scale systems, and better adaptability to variations in system operating conditions. This paper discusses the motivations for and the generalized process of data-driven modeling, and provides a comprehensive overview of various state-of-the-art techniques and applications. It also compares the advantages and disadvantages of these methods and provides insight into outstanding challenges and possible research directions for the future.
Abstract: In the synthesis of control algorithms for complex systems, we are often faced with imprecise or unknown mathematical models of the dynamical systems, or even with problems in finding a mathematical model of the system in the open loop. To tackle these difficulties, an approach to data-driven model identification and control algorithm design based on the maximum stability degree criterion is proposed in this paper. The data-driven model identification procedure finds the mathematical model of the system from the underdamped transient response of the closed-loop system. The system is approximated with an inertial model whose coefficients are calculated from the values of the critical transfer coefficient and the oscillation amplitude and period of the underdamped response of the closed-loop system. In the data-driven control design, the tuning parameters of the controller are calculated from the parameters obtained in the identification step, and expressions for calculating the tuning parameters are presented. The obtained results of data-driven model identification and controller synthesis were verified by computer simulation.
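The paper derives its own tuning expressions from the maximum stability degree criterion; since those formulas are not reproduced in the abstract, the classical Ziegler–Nichols ultimate-cycle rules are shown instead, purely to illustrate the same kind of mapping from measured closed-loop oscillation data (critical gain, oscillation period) to controller parameters.

```python
def zn_pid_from_oscillation(k_critical, period):
    """Classical Ziegler-Nichols ultimate-cycle PID rules: map the critical
    (ultimate) transfer coefficient and the oscillation period, both measured
    on the closed loop, to PID tuning parameters. An analogous illustration
    of the identification-to-tuning step, NOT the paper's formulas."""
    kp = 0.6 * k_critical   # proportional gain
    ti = 0.5 * period       # integral time
    td = 0.125 * period     # derivative time
    return kp, ti, td
```

Both approaches share the appeal described above: the controller is tuned from quantities observable in a single closed-loop experiment, with no open-loop model required up front.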
Abstract: Sub-Saharan Africa (SSA) has the highest maternal and under-five mortality rates in the world. The advent of the coronavirus disease 2019 pandemic exacerbated the region's problems by overwhelming health systems and affecting access to healthcare through travel restrictions and the rechannelling of resources towards containment of the pandemic. The region failed to achieve the Millennium Development Goals on maternal and child mortality and is poised to miss the corresponding Sustainable Development Goals. To improve maternal and child health outcomes, many SSA countries introduced digital technologies for educating pregnant and nursing women, making doctors' appointments and sending reminders to mothers and expectant mothers, as well as capturing information about patients and their illnesses. However, the collected epidemiological data are not being utilised to inform patient care and improve the quality of, efficiency of, and access to maternal, neonatal and child health (MNCH) care. To the researchers' best knowledge, no review paper has been published that focuses on digital health for MNCH care in SSA and proposes data-driven approaches to it. Therefore, this study sought to: (1) identify digital systems for MNCH in SSA; (2) identify the applicability and weaknesses of the digital MNCH systems in SSA; and (3) propose a data-driven model for integrating emerging technologies into MNCH services in SSA to make better use of data to improve MNCH care coverage, efficiency and quality. The PRISMA methodology was used in this study. The study revealed that there are no data-driven models for monitoring pregnant women and under-five children in Sub-Saharan Africa, with the available digital health technologies mainly based on SMS and websites. Thus, the current digital health systems in SSA do not support real-time, ubiquitous, pervasive and data-driven healthcare; their main applicability is in non-real-time pregnancy monitoring, education and information dissemination. Unless new and more effective approaches are implemented, SSA might retain the highest and unacceptable maternal and under-five mortality rates globally. The study proposes feasible emerging technologies that can be used to provide data-driven healthcare for MNCH in SSA, along with recommendations on how to make the transition successful and lessons learnt from other regions.
Funding: Supported by the National Key Research and Development Project (No. 2019YFB1405401) and the National Natural Science Foundation of China (No. 5217120056).
Abstract: Stochastic dynamic simulation of railway vehicle collisions still faces many challenges, such as high modelling complexity and long computation times. To address these challenges, we introduce a novel data-driven stochastic process modelling (DSPM) approach into the dynamic simulation of railway vehicle collisions. The DSPM approach consists of two steps: (i) process description, in which four kinds of kernels are used to describe the uncertainty inherent in collision processes; and (ii) solving, in which stochastic variational inference and mini-batch algorithms are used to accelerate the computation of stochastic processes. By applying the DSPM, Gaussian process regression (GPR) and finite element (FE) methods to two collision scenarios (a lead car colliding with a rigid wall, and a lead car colliding with another lead car), we are able to achieve a comprehensive analysis. The comparison between the DSPM approach and the FE method reveals that DSPM can calculate the corresponding confidence interval while improving overall computational efficiency. Comparing the DSPM approach with the GPR method indicates that DSPM can accurately describe the dynamic response under unknown conditions. Overall, this research demonstrates the feasibility and usability of the proposed DSPM approach for stochastic dynamics simulation of railway vehicle collisions.
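The confidence-interval output that distinguishes such kernel-based models from a deterministic FE run can be illustrated with plain Gaussian process regression in numpy. The "collision response" curve, the squared-exponential kernel choice, and the length scale are all assumptions for the sketch, not the paper's kernels or data.

```python
import numpy as np

def rbf(a, b, ls=0.5):
    # Squared-exponential kernel; the paper uses four kernel choices, this
    # sketch uses just one illustrative RBF with an assumed length scale.
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

# Toy "collision response" curve: a damped oscillation vs. time (illustrative).
x_train = np.linspace(0.0, 1.0, 15)
y_train = np.sin(2.0 * np.pi * x_train) * np.exp(-x_train)

noise = 1e-6                                   # jitter / observation noise
K = rbf(x_train, x_train) + noise * np.eye(15)
alpha = np.linalg.solve(K, y_train)

x_test = np.array([0.25, 0.6])
Ks = rbf(x_test, x_train)                      # cross-covariance
mean = Ks @ alpha                              # predictive mean
var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)  # predictive variance
ci_half = 1.96 * np.sqrt(np.maximum(var, 0.0))  # 95% confidence half-width
```

The predictive variance is what yields the confidence interval described above; the stochastic variational inference and mini-batching in the paper are accelerations of this same posterior computation for large data.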
Funding: Supported by the US Department of Agriculture's National Institute of Food and Agriculture, Agriculture and Food Research Initiative, Water for Food Production Systems (No. 2018-68011-28371); the National Science Foundation (USA) (Nos. 1936928, 2112533); the US Department of Agriculture's National Institute of Food and Agriculture (No. 2020-67021-31526); and the US Environmental Protection Agency (No. 840080010).
Abstract: Chlorine-based disinfection is ubiquitous in conventional drinking water treatment (DWT) and serves to mitigate threats of acute microbial disease caused by pathogens that may be present in source water. An important index of disinfection efficiency is the free chlorine residual (FCR), a regulated disinfection parameter in the US that indirectly measures disinfectant power for prevention of microbial recontamination during DWT and distribution. This work demonstrates how machine learning (ML) can be implemented to improve FCR forecasting when supplied with water quality data from a real, full-scale chlorine disinfection system in Georgia, USA. More precisely, a gradient-boosting ML method (CatBoost) was developed from a full year of DWT plant-generated chlorine disinfection data, including water quality parameters (e.g., temperature, turbidity, pH) and operational process data (e.g., flowrates), to predict FCR. Four gradient-boosting models were implemented, with the highest-performing model achieving a coefficient of determination, R2, of 0.937. Shapley additive explanation (SHAP) values were used to interpret the model's results, uncovering that standard DWT operating parameters, although non-intuitive and theoretically non-causal, vastly improved prediction performance. These results provide a base case for data-driven DWT disinfection supervision and suggest process monitoring methods that give plant operators better information for safe chlorine dosing to maintain the optimum FCR.
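The modeling setup can be sketched with scikit-learn's GradientBoostingRegressor as a stand-in for CatBoost. The feature columns mimic the parameter types named above, but the data and the FCR response rule are synthetic inventions, not plant measurements.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic stand-in for a year of plant data (columns and ranges illustrative).
rng = np.random.default_rng(4)
n = 1000
X = np.column_stack([
    rng.uniform(5.0, 30.0, n),    # water temperature, degC
    rng.uniform(0.1, 5.0, n),     # turbidity, NTU
    rng.uniform(6.5, 8.5, n),     # pH
    rng.uniform(50.0, 200.0, n),  # flowrate
])
# Hypothetical FCR response: decays with temperature and turbidity, plus noise.
fcr = 2.0 - 0.03 * X[:, 0] - 0.1 * X[:, 1] + 0.05 * rng.normal(size=n)

# Gradient boosting regressor (scikit-learn stand-in for CatBoost).
model = GradientBoostingRegressor(random_state=0).fit(X[:800], fcr[:800])
r2 = model.score(X[800:], fcr[800:])   # held-out coefficient of determination
```

On the real plant data the same pipeline would be followed by SHAP analysis to rank which water-quality and operational inputs drive the FCR prediction.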
Funding: Funded by the National Natural Science Foundation of China (No. 52161135202) and the Hangzhou Key Scientific Research Plan Project (No. 2023SZD0028).
Abstract: Conventional automated machine learning (AutoML) technologies fall short in preprocessing low-quality raw data and adapting to varying indoor and outdoor environments, leading to reduced accuracy in forecasting short-term building energy loads. Moreover, their predictions are not transparent because of their black-box nature. The building field therefore lacks an AutoML framework capable of data quality enhancement, environmental self-adaptation, and model interpretation. To address this research gap, an improved AutoML-based end-to-end data-driven modeling framework is proposed. Bayesian optimization is applied by this framework to find an optimal data preprocessing process for quality improvement of raw data, bridging the gap left by conventional AutoML technologies, which cannot automatically handle missing data and outliers. A sliding-window-based model retraining strategy is utilized to achieve environmental self-adaptation, contributing to the accuracy enhancement of AutoML technologies. Moreover, an approach based on local interpretable model-agnostic explanations is developed to interpret predictions made by the improved framework, overcoming the poor interpretability of conventional AutoML technologies. The performance of the improved framework in forecasting one-hour-ahead cooling loads is evaluated using two years of operational data from a real building. The accuracy of the improved framework increases by 4.24%–8.79% compared with four conventional frameworks, for buildings with not only high-quality but also low-quality operational data. Furthermore, the developed model interpretation approach is shown to effectively explain the predictions of the improved framework. The improved framework offers a novel perspective on creating accurate and reliable AutoML frameworks tailored to building energy load prediction and similar tasks.
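The sliding-window retraining idea can be shown on a toy drifting process: when the input-output relation changes, refitting on only the most recent window tracks the new regime. The one-parameter model, window length, and drift scenario are all assumptions for illustration.

```python
import numpy as np

# Toy load data whose input-output "gain" drifts from 1.0 to 2.0 halfway
# through the record (values illustrative).
rng = np.random.default_rng(5)
t = np.arange(1000)
x = rng.uniform(0.0, 1.0, 1000)                 # e.g. a weather-driven input
gain = np.where(t < 500, 1.0, 2.0)
y = gain * x + 0.01 * rng.normal(size=1000)     # e.g. a cooling-load proxy

def fit_window(x_w, y_w):
    """Refit a one-parameter model y = a*x on the given window only."""
    return np.sum(x_w * y_w) / np.sum(x_w ** 2)

window = 100
a_early = fit_window(x[:window], y[:window])      # model before the drift
a_latest = fit_window(x[-window:], y[-window:])   # model after retraining
```

A model trained once on the early data would keep predicting with the stale gain; periodic retraining on the sliding window is what gives the framework its environmental self-adaptation.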
Funding: Financially supported by the National Key Research and Development Program of China (2022YFB3706800, 2020YFB1710100) and the National Natural Science Foundation of China (51821001, 52090042, 52074183).
Abstract: The complex sand-casting process, combined with interactions among process parameters, makes casting quality difficult to control, resulting in a high scrap rate. A strategy based on a data-driven model is proposed to reduce casting defects and improve production efficiency; it comprises a random forest (RF) classification model, feature importance analysis, and process parameter optimization with Monte Carlo simulation. The collected data, covering four types of defects and the corresponding process parameters, were used to construct the RF model. Classification results show a recall rate above 90% for all categories. The Gini index was used to assess the importance of the process parameters in the formation of the various defects in the RF model. Finally, the classification model was applied to different production conditions for quality prediction. In the case of process parameter optimization for gas porosity defects, the model serves as the evaluation step in a Monte Carlo procedure to estimate a better temperature distribution. The prediction model, when deployed in the factory, greatly improved the efficiency of defect detection: the scrap rate decreased from 10.16% to 6.68%.
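The RF-plus-Gini-importance step can be sketched with scikit-learn. The parameter names and the defect rule below are invented for illustration; only one feature actually drives the toy label, so the Gini-based ranking should recover it.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic casting data: three hypothetical process parameters (e.g. pouring
# temperature, sand moisture, pressing speed), scaled to [0, 1].
rng = np.random.default_rng(6)
X = rng.uniform(size=(500, 3))
defect = (X[:, 0] > 0.7).astype(int)   # hypothetical gas-porosity rule

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, defect)
# Gini-based importances, as used in the paper to rank process parameters.
importances = rf.feature_importances_
ranking = np.argsort(importances)[::-1]
```

In the Monte Carlo optimization described above, the fitted classifier is then evaluated on sampled parameter combinations to find settings with a low predicted defect probability.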
Funding: Supported by the National Natural Science Foundation of China (Grant No. 12272144).
Abstract: The optimization of two-scale structures can adapt to the different material needs of various regions by reasonably arranging different microstructures at the macro scale, thereby considerably improving structural performance. Here, a multiple variable cutting (M-VCUT) level-set-based data-driven model of microstructures is presented, and a method based on this model is proposed for the optimal design of two-scale structures. The geometry of the microstructure is described using the M-VCUT level set method, and the effective mechanical properties of microstructures are computed by the homogenization method. A database of microstructures containing their geometric and mechanical parameters is then constructed. The two sets of parameters are adopted as input and output datasets, and a mapping between them is established to build the data-driven model of microstructures. During the optimization of two-scale structures, the data-driven model is used for macroscale finite element and sensitivity analyses. The efficiency of the analysis and optimization of two-scale structures is improved because the computational cost of invoking such a data-driven model is much smaller than that of homogenization.
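The database-then-surrogate idea can be sketched in miniature: precompute a table of (geometric parameter, effective property) pairs, then answer macroscale queries by interpolation instead of re-running homogenization. The volume-fraction parameter and the power-law "effective modulus" below are illustrative stand-ins for the paper's M-VCUT geometric parameters and homogenized properties.

```python
import numpy as np

# Toy microstructure database: a single geometric parameter (solid volume
# fraction) mapped to a hypothetical effective Young's modulus in GPa.
vol_frac = np.linspace(0.2, 0.8, 13)
e_eff = 200.0 * vol_frac ** 2   # stand-in for homogenization results

def query(v):
    """Data-driven surrogate: interpolate the precomputed database instead of
    re-running homogenization for every macroscale element."""
    return np.interp(v, vol_frac, e_eff)
```

A macroscale finite element loop can call `query` thousands of times per iteration at negligible cost, which is the efficiency gain the abstract describes; the paper's actual mapping is multidimensional rather than this one-parameter table.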
Funding: Supported by the National Natural Science Foundation of China (Grant No. 42088101).
Abstract: During the boreal summer, intraseasonal oscillations exhibit significant interannual variations in intensity over two key regions: the central-western equatorial Pacific (5°S-5°N, 150°E-150°W) and the subtropical northwestern Pacific (10°-20°N, 130°E-175°W). The former is well documented and considered to be influenced by ENSO, while the latter has received comparatively less attention and is likely influenced by the Pacific Meridional Mode (PMM), as suggested by partial correlation analysis. To elucidate the physical processes responsible for the enhanced (weakened) intraseasonal convection over the subtropical northwestern Pacific during warm (cold) PMM years, the authors employed a moisture budget analysis. The findings reveal that during warm PMM years, there is an increase in summer-mean moisture over the subtropical northwestern Pacific. This increase interacts with intensified vertical motion perturbations in the region, leading to greater vertical moisture advection in the lower troposphere and consequently to convective instability. Such a process is pivotal in amplifying intraseasonal convection anomalies. The observational findings were further verified by model experiments forced by PMM-like sea surface temperature patterns.
Abstract: Objective: To develop a best-evidence-based optimal nutrition management plan for patients with chronic heart failure, apply it in clinical practice, and evaluate its effectiveness. Methods: The KTA knowledge translation model was used to guide evidence-based practice in nutrition management, and the nutritional status, cardiac function status, quality of life, and quality review indicators of chronic heart failure patients were compared before and after the application of evidence. Results: After the application of evidence, the nutritional status indicators (MNA-SF score, albumin, hemoglobin) of the two groups of heart failure patients increased significantly compared with before the application of evidence, with statistically significant differences (p ...). Conclusion: The KTA knowledge translation model provides methodological guidance for the implementation of evidence-based practice for heart failure patients. This evidence-based practice project is beneficial for improving malnutrition outcomes in chronic heart failure patients and helps standardize nursing pathways, thereby promoting improvement in nursing quality.
Abstract: To enhance the control performance of a piezo-positioning system, the influence of hysteresis and its compensation are studied. A Hammerstein model is used to represent the dynamic hysteresis nonlinearity of the piezo-positioning actuator. The static nonlinear part and the dynamic linear part of the Hammerstein model are represented by models obtained through the Prandtl-Ishlinskii (PI) model and the Hankel matrix system identification method, respectively. This model demonstrates good generalization capability for typical input frequencies below 200 Hz. A sliding mode inverse compensation tracking control strategy based on the P-I inverse model and integral augmentation is proposed. Experimental results show that, compared with PID inverse compensation control and sliding mode control without inverse compensation, the sliding mode inverse compensation control has a more ideal step response with no overshoot, and the settling time is only 6.2 ms. In the frequency domain, the closed-loop tracking bandwidth reaches 119.9 Hz and the disturbance rejection bandwidth reaches 86.2 Hz. The proposed control strategy can effectively compensate for the hysteresis nonlinearity and improve the tracking accuracy and disturbance rejection capability of the piezo-positioning system.
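The static Prandtl-Ishlinskii part can be sketched as a weighted superposition of play operators. The threshold and weight values below are illustrative, not the parameters identified in the paper; running a rising-then-falling input through the model shows the hysteresis (the output does not return to zero).

```python
import numpy as np

def pi_hysteresis(u, thresholds, weights):
    """Discrete Prandtl-Ishlinskii model: weighted superposition of play
    operators y_k = max(u_k - r, min(u_k + r, y_{k-1})) for each threshold r.
    Thresholds and weights here are illustrative, not identified values."""
    states = np.zeros(len(thresholds))
    out = []
    for uk in u:
        states = np.maximum(uk - thresholds, np.minimum(uk + thresholds, states))
        out.append(float(weights @ states))
    return np.array(out)

# Triangular input 0 -> 1 -> 0 to expose the hysteresis loop.
u = np.concatenate([np.linspace(0.0, 1.0, 50), np.linspace(1.0, 0.0, 50)])
y = pi_hysteresis(u, np.array([0.0, 0.2, 0.4]), np.array([1.0, 0.5, 0.5]))
```

Because each play operator is invertible, the model admits the analytic P-I inverse used in the paper's inverse compensation stage.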
Abstract: When assessing seismic liquefaction potential with data-driven models, addressing the uncertainties of model establishment, the interpretation of cone penetration test (CPT) data, and the decision threshold is crucial for avoiding biased data selection, ameliorating overconfident models, and remaining flexible to varying practical objectives, especially when the training and testing data are not identically distributed. A workflow leveraging Bayesian methodology is proposed to address these issues. Employing a multi-layer perceptron (MLP) as the foundational model, the approach was benchmarked against empirical methods and advanced algorithms for its simplicity, accuracy, and resistance to overfitting. The analysis revealed that, while MLP models optimized via the maximum a posteriori algorithm suffice for straightforward scenarios, Bayesian neural networks show great potential for preventing overfitting. Additionally, integrating decision thresholds through various evaluative principles offers insights for challenging decisions. Two case studies demonstrate the framework's capacity for nuanced interpretation of in situ data, employing a model committee for a detailed evaluation of liquefaction potential via Monte Carlo simulations and basic statistics. Overall, the proposed step-by-step workflow for analyzing seismic liquefaction incorporates multifold testing and real-world data validation, showing improved robustness against overfitting and greater versatility in addressing practical challenges. This research contributes to the seismic liquefaction assessment field by providing a structured, adaptable methodology for accurate and reliable analysis.
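The model-committee idea can be illustrated with bootstrap-trained logistic classifiers standing in for the paper's Bayesian/MLP committee: each member gives a liquefaction probability, and the committee's mean and spread quantify both the prediction and its uncertainty. The two CPT-like features and the labeling rule are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
# Synthetic site data: two hypothetical CPT-derived features; a simple linear
# rule stands in for the true liquefaction boundary.
X = rng.normal(size=(200, 2))
liquefied = (X[:, 0] + X[:, 1] > 0.0).astype(float)

def train_logistic(Xb, yb, steps=300, lr=0.1):
    """Plain gradient-ascent logistic regression (stand-in for an MLP)."""
    w = np.zeros(2)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
        w += lr * Xb.T @ (yb - p) / len(yb)
    return w

# Committee: each member is trained on a bootstrap resample of the data.
committee = [train_logistic(X[(i := rng.integers(0, 200, 200))], liquefied[i])
             for _ in range(20)]

x_new = np.array([1.5, 1.5])   # a clearly "liquefiable" hypothetical case
probs = np.array([1.0 / (1.0 + np.exp(-(x_new @ w))) for w in committee])
p_mean, p_std = probs.mean(), probs.std()   # committee estimate and spread
```

The spread `p_std` plays the role of the epistemic uncertainty that the paper's Bayesian treatment exposes, and the decision threshold applied to `p_mean` can then be chosen per the evaluative principle at hand.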
Funding: Sponsored by the State Grid Corporation of China Technology Project (Research on Key Technologies and Equipment Development of Micro Pumped Storage for Distributed New Energy Consumption in Distribution Networks, 5400-202324196A-1-1-ZN).
Abstract: To enhance energy interaction among low-voltage stations (LVSs) and reduce the line loss of the distribution network, a novel operation mode of the micro-pumped storage system (mPSS) based on a common reservoir is proposed. First, the operation modes of the mPSS are analyzed, including the separated reservoir mode (SRM) and the common reservoir mode (CRM). Then, based on the SRM and CRM, an energy mutual assistance control model between LVSs is built to optimize energy loss. Finally, in simulation, compared with the model without pumped storage in the LVS, the SRM and CRM decrease the total energy loss by 294.377 and 432.578 kWh, respectively. The configuration of the mPSS can improve the utilization rate of the new-energy generation system and relieve the pressure on transformer capacity in the LVS. Compared with the SRM, the proposed CRM reduces the total energy loss by 138.201 kWh, increases new energy consumption by 161.642 kWh, and decreases the line loss by 7.271 kWh. As the efficiency of the mPSS improves, the total energy loss reduction of the CRM reaches 3.5 times that of the SRM. Furthermore, the CRM can significantly reduce the required reservoir capacity of the mPSS and is more suitable for scenarios where the capacity configuration of the mPSS is limited.