Remaining useful life(RUL) prediction is one of the most crucial elements in prognostics and health management(PHM). Aiming at the imperfect prior information, this paper proposes an RUL prediction method based on a n...Remaining useful life(RUL) prediction is one of the most crucial elements in prognostics and health management(PHM). Aiming at the imperfect prior information, this paper proposes an RUL prediction method based on a nonlinear random coefficient regression(RCR) model with fusing failure time data.Firstly, some interesting natures of parameters estimation based on the nonlinear RCR model are given. Based on these natures,the failure time data can be fused as the prior information reasonably. Specifically, the fixed parameters are calculated by the field degradation data of the evaluated equipment and the prior information of random coefficient is estimated with fusing the failure time data of congeneric equipment. Then, the prior information of the random coefficient is updated online under the Bayesian framework, the probability density function(PDF) of the RUL with considering the limitation of the failure threshold is performed. Finally, two case studies are used for experimental verification. Compared with the traditional Bayesian method, the proposed method can effectively reduce the influence of imperfect prior information and improve the accuracy of RUL prediction.展开更多
Chaos theory has taught us that a system which has both nonlinearity and random input will most likely produce irregular data. If random errors are irregular data, then random error process will raise nonlinearity (K...Chaos theory has taught us that a system which has both nonlinearity and random input will most likely produce irregular data. If random errors are irregular data, then random error process will raise nonlinearity (Kantz and Schreiber (1997)). Tsai (1986) introduced a composite test for autocorrelation and heteroscedasticity in linear models with AR(1) errors. Liu (2003) introduced a composite test for correlation and heteroscedasticity in nonlinear models with DBL(p, 0, 1) errors. Therefore, the important problems in regression model axe detections of bilinearity, correlation and heteroscedasticity. In this article, the authors discuss more general case of nonlinear models with DBL(p, q, 1) random errors by score test. Several statistics for the test of bilinearity, correlation, and heteroscedasticity are obtained, and expressed in simple matrix formulas. The results of regression models with linear errors are extended to those with bilinear errors. The simulation study is carried out to investigate the powers of the test statistics. All results of this article extend and develop results of Tsai (1986), Wei, et al (1995), and Liu, et al (2003).展开更多
(Co) variance components and genetic parameters were estimated for milk yield of Iranian Holstein cows. A total number of 68,945 milk test-day records of first, second and third lactations of 8515 animals from 100 sir...(Co) variance components and genetic parameters were estimated for milk yield of Iranian Holstein cows. A total number of 68,945 milk test-day records of first, second and third lactations of 8515 animals from 100 sires and 7743 dams originated from 34 herds collected during 2007 to 2009 by Iranian animal breeding center were used. The ASReml computer program was used to analyze the milk test-day records using the random regression procedure. Herd test date (HTD), milking times per day (milking frequency), number of lactations, year of birth, year of calving, age of animal at calving and days in milk (DIM) considered as fixed effects and additive genetic effects and animal permanent environmental effects were considered as the random effects. Additive genetic variance, animal permanent environment variance, residual variance, phenotypic variance, heritability and repeatability were estimated during different months of lactation between 5.7 - 19.6, 15.3 - 27.1, 31.4 - 17.2, 45.8 - 64.83, 0.1 - 0.32 and 0.4 - 0.6, respectively. Genetic correlation and phenotypic correlation were also estimated between months of lactation in range of -0.35 - 0.98 and 0.03 - 0.67, respectively. Genetic correlation and phenotypic correlation both showed the same changing pattern and they decreased as the interval between months of lactation increased.展开更多
In this paper, auxiliary information is used to determine an estimator of finite population total using nonparametric regression under stratified random sampling. To achieve this, a model-based approach is adopted by ...In this paper, auxiliary information is used to determine an estimator of finite population total using nonparametric regression under stratified random sampling. To achieve this, a model-based approach is adopted by making use of the local polynomial regression estimation to predict the nonsampled values of the survey variable y. The performance of the proposed estimator is investigated against some design-based and model-based regression estimators. The simulation experiments show that the resulting estimator exhibits good properties. Generally, good confidence intervals are seen for the nonparametric regression estimators, and use of the proposed estimator leads to relatively smaller values of RE compared to other estimators.展开更多
Adaptive fractional polynomial modeling of general correlated outcomes is formulated to address nonlinearity in means, variances/dispersions, and correlations. Means and variances/dispersions are modeled using general...Adaptive fractional polynomial modeling of general correlated outcomes is formulated to address nonlinearity in means, variances/dispersions, and correlations. Means and variances/dispersions are modeled using generalized linear models in fixed effects/coefficients. Correlations are modeled using random effects/coefficients. Nonlinearity is addressed using power transforms of primary (untransformed) predictors. Parameter estimation is based on extended linear mixed modeling generalizing both generalized estimating equations and linear mixed modeling. Models are evaluated using likelihood cross-validation (LCV) scores and are generated adaptively using a heuristic search controlled by LCV scores. Cases covered include linear, Poisson, logistic, exponential, and discrete regression of correlated continuous, count/rate, dichotomous, positive continuous, and discrete numeric outcomes treated as normally, Poisson, Bernoulli, exponentially, and discrete numerically distributed, respectively. Example analyses are also generated for these five cases to compare adaptive random effects/coefficients modeling of correlated outcomes to previously developed adaptive modeling based on directly specified covariance structures. Adaptive random effects/coefficients modeling substantially outperforms direct covariance modeling in the linear, exponential, and discrete regression example analyses. It generates equivalent results in the logistic regression example analyses and it is substantially outperformed in the Poisson regression case. Random effects/coefficients modeling of correlated outcomes can provide substantial improvements in model selection compared to directly specified covariance modeling. However, directly specified covariance modeling can generate competitive or substantially better results in some cases while usually requiring less computation time.展开更多
In this paper, by using some inequalities of negatively orthant dependent(NOD,in short) random variables and the truncated method of random variables, we investigate the nonparametric regression model. The complete co...In this paper, by using some inequalities of negatively orthant dependent(NOD,in short) random variables and the truncated method of random variables, we investigate the nonparametric regression model. The complete consistency result for the estimator of g(x) is presented.展开更多
In this article, to improve the doubly robust estimator, the nonlinear regression models with missing responses are studied. Based on the covariate balancing propensity score (CBPS), estimators for the regression coef...In this article, to improve the doubly robust estimator, the nonlinear regression models with missing responses are studied. Based on the covariate balancing propensity score (CBPS), estimators for the regression coefficients and the population mean are obtained. It is proved that the proposed estimators are asymptotically normal. In simulation studies, the proposed estimators show improved performance relative to usual augmented inverse probability weighted estimators.展开更多
Cost effective sampling design is a major concern in some experiments especially when the measurement of the characteristic of interest is costly or painful or time consuming.Ranked set sampling(RSS)was first proposed...Cost effective sampling design is a major concern in some experiments especially when the measurement of the characteristic of interest is costly or painful or time consuming.Ranked set sampling(RSS)was first proposed by McIntyre[1952.A method for unbiased selective sampling,using ranked sets.Australian Journal of Agricultural Research 3,385-390]as an effective way to estimate the pasture mean.In the current paper,a modification of ranked set sampling called moving extremes ranked set sampling(MERSS)is considered for the best linear unbiased estimators(BLUEs)for the simple linear regression model.The BLUEs for this model under MERSS are derived.The BLUEs under MERSS are shown to be markedly more efficient for normal data when compared with the BLUEs under simple random sampling.展开更多
BACKGROUND Severe dengue children with critical complications have been attributed to high mortality rates,varying from approximately 1%to over 20%.To date,there is a lack of data on machine-learning-based algorithms ...BACKGROUND Severe dengue children with critical complications have been attributed to high mortality rates,varying from approximately 1%to over 20%.To date,there is a lack of data on machine-learning-based algorithms for predicting the risk of inhospital mortality in children with dengue shock syndrome(DSS).AIM To develop machine-learning models to estimate the risk of death in hospitalized children with DSS.METHODS This single-center retrospective study was conducted at tertiary Children’s Hospital No.2 in Viet Nam,between 2013 and 2022.The primary outcome was the in-hospital mortality rate in children with DSS admitted to the pediatric intensive care unit(PICU).Nine significant features were predetermined for further analysis using machine learning models.An oversampling method was used to enhance the model performance.Supervised models,including logistic regression,Naïve Bayes,Random Forest(RF),K-nearest neighbors,Decision Tree and Extreme Gradient Boosting(XGBoost),were employed to develop predictive models.The Shapley Additive Explanation was used to determine the degree of contribution of the features.RESULTS In total,1278 PICU-admitted children with complete data were included in the analysis.The median patient age was 8.1 years(interquartile range:5.4-10.7).Thirty-nine patients(3%)died.The RF and XGboost models demonstrated the highest performance.The Shapley Addictive Explanations model revealed that the most important predictive features included younger age,female patients,presence of underlying diseases,severe transaminitis,severe bleeding,low platelet counts requiring platelet transfusion,elevated levels of international normalized ratio,blood lactate and serum creatinine,large volume of resuscitation fluid and a high vasoactive inotropic score(>30).CONCLUSION We developed robust machine learning-based models to estimate the risk of death in hospitalized children with DSS.The study findings are applicable to the design of management schemes to enhance survival outcomes of patients with DSS.展开更多
In this paper, three smoothed empirical log-likelihood ratio functions for the parameters of nonlinear models with missing response are suggested. Under some regular conditions, the corresponding Wilks phenomena are o...In this paper, three smoothed empirical log-likelihood ratio functions for the parameters of nonlinear models with missing response are suggested. Under some regular conditions, the corresponding Wilks phenomena are obtained and the confidence regions for the parameter can be constructed easily.展开更多
In this paper, it is discussed that two tests for varying dispersion of binomial data in the framework of nonlinear logistic models with random effects, which are widely used in analyzing longitudinal binomial data. O...In this paper, it is discussed that two tests for varying dispersion of binomial data in the framework of nonlinear logistic models with random effects, which are widely used in analyzing longitudinal binomial data. One is the individual test and power calculation for varying dispersion through testing the randomness of cluster effects, which is extensions of Dean(1992) and Commenges et al (1994). The second test is the composite test for varying dispersion through simultaneously testing the randomness of cluster effects and the equality of random-effect means. The score test statistics are constructed and expressed in simple, easy to use, matrix formulas. The authors illustrate their test methods using the insecticide data (Giltinan, Capizzi & Malani (1988)).展开更多
Machine learning is currently one of the research hotspots in the field of landslide prediction.To clarify and evaluate the differences in characteristics and prediction effects of different machine learning models,Co...Machine learning is currently one of the research hotspots in the field of landslide prediction.To clarify and evaluate the differences in characteristics and prediction effects of different machine learning models,Conghua District,which is the most prone to landslide disasters in Guangzhou,was selected for landslide susceptibility evaluation.The evaluation factors were selected by using correlation analysis and variance expansion factor method.Applying four machine learning methods namely Logistic Regression(LR),Random Forest(RF),Support Vector Machines(SVM),and Extreme Gradient Boosting(XGB),landslide models were constructed.Comparative analysis and evaluation of the model were conducted through statistical indices and receiver operating characteristic(ROC)curves.The results showed that LR,RF,SVM,and XGB models have good predictive performance for landslide susceptibility,with the area under curve(AUC)values of 0.752,0.965,0.996,and 0.998,respectively.XGB model had the highest predictive ability,followed by RF model,SVM model,and LR model.The frequency ratio(FR)accuracy of LR,RF,SVM,and XGB models was 0.775,0.842,0.759,and 0.822,respectively.RF and XGB models were superior to LR and SVM models,indicating that the integrated algorithm has better predictive ability than a single classification algorithm in regional landslide classification problems.展开更多
Driven piles are used in many geological environments as a practical and convenient structural component.Hence,the determination of the drivability of piles is actually of great importance in complex geotechnical appl...Driven piles are used in many geological environments as a practical and convenient structural component.Hence,the determination of the drivability of piles is actually of great importance in complex geotechnical applications.Conventional methods of predicting pile drivability often rely on simplified physicalmodels or empirical formulas,whichmay lack accuracy or applicability in complex geological conditions.Therefore,this study presents a practical machine learning approach,namely a Random Forest(RF)optimized by Bayesian Optimization(BO)and Particle Swarm Optimization(PSO),which not only enhances prediction accuracy but also better adapts to varying geological environments to predict the drivability parameters of piles(i.e.,maximumcompressive stress,maximum tensile stress,and blow per foot).In addition,support vector regression,extreme gradient boosting,k nearest neighbor,and decision tree are also used and applied for comparison purposes.In order to train and test these models,among the 4072 datasets collected with 17model inputs,3258 datasets were randomly selected for training,and the remaining 814 datasets were used for model testing.Lastly,the results of these models were compared and evaluated using two performance indices,i.e.,the root mean square error(RMSE)and the coefficient of determination(R2).The results indicate that the optimized RF model achieved lower RMSE than other prediction models in predicting the three parameters,specifically 0.044,0.438,and 0.146;and higher R^(2) values than other implemented techniques,specifically 0.966,0.884,and 0.977.In addition,the sensitivity and uncertainty of the optimized RF model were analyzed using Sobol sensitivity analysis and Monte Carlo(MC)simulation.It can be concluded that the optimized RF model could be used to predict the performance of the pile,and it may provide a useful reference for solving some problems under similar engineering conditions.展开更多
In this paper,we consider the partial linear regression model y_(i)=x_(i)β^(*)+g(ti)+ε_(i),i=1,2,...,n,where(x_(i),ti)are known fixed design points,g(·)is an unknown function,andβ^(*)is an unknown parameter to...In this paper,we consider the partial linear regression model y_(i)=x_(i)β^(*)+g(ti)+ε_(i),i=1,2,...,n,where(x_(i),ti)are known fixed design points,g(·)is an unknown function,andβ^(*)is an unknown parameter to be estimated,random errorsε_(i)are(α,β)-mix_(i)ng random variables.The p-th(p>1)mean consistency,strong consistency and complete consistency for least squares estimators ofβ^(*)and g(·)are investigated under some mild conditions.In addition,a numerical simulation is carried out to study the finite sample performance of the theoretical results.Finally,a real data analysis is provided to further verify the effect of the model.展开更多
基金supported by National Natural Science Foundation of China (61703410,61873175,62073336,61873273,61773386,61922089)。
文摘Remaining useful life(RUL) prediction is one of the most crucial elements in prognostics and health management(PHM). Aiming at the imperfect prior information, this paper proposes an RUL prediction method based on a nonlinear random coefficient regression(RCR) model with fusing failure time data.Firstly, some interesting natures of parameters estimation based on the nonlinear RCR model are given. Based on these natures,the failure time data can be fused as the prior information reasonably. Specifically, the fixed parameters are calculated by the field degradation data of the evaluated equipment and the prior information of random coefficient is estimated with fusing the failure time data of congeneric equipment. Then, the prior information of the random coefficient is updated online under the Bayesian framework, the probability density function(PDF) of the RUL with considering the limitation of the failure threshold is performed. Finally, two case studies are used for experimental verification. Compared with the traditional Bayesian method, the proposed method can effectively reduce the influence of imperfect prior information and improve the accuracy of RUL prediction.
文摘Chaos theory has taught us that a system which has both nonlinearity and random input will most likely produce irregular data. If random errors are irregular data, then random error process will raise nonlinearity (Kantz and Schreiber (1997)). Tsai (1986) introduced a composite test for autocorrelation and heteroscedasticity in linear models with AR(1) errors. Liu (2003) introduced a composite test for correlation and heteroscedasticity in nonlinear models with DBL(p, 0, 1) errors. Therefore, the important problems in regression model axe detections of bilinearity, correlation and heteroscedasticity. In this article, the authors discuss more general case of nonlinear models with DBL(p, q, 1) random errors by score test. Several statistics for the test of bilinearity, correlation, and heteroscedasticity are obtained, and expressed in simple matrix formulas. The results of regression models with linear errors are extended to those with bilinear errors. The simulation study is carried out to investigate the powers of the test statistics. All results of this article extend and develop results of Tsai (1986), Wei, et al (1995), and Liu, et al (2003).
文摘(Co) variance components and genetic parameters were estimated for milk yield of Iranian Holstein cows. A total number of 68,945 milk test-day records of first, second and third lactations of 8515 animals from 100 sires and 7743 dams originated from 34 herds collected during 2007 to 2009 by Iranian animal breeding center were used. The ASReml computer program was used to analyze the milk test-day records using the random regression procedure. Herd test date (HTD), milking times per day (milking frequency), number of lactations, year of birth, year of calving, age of animal at calving and days in milk (DIM) considered as fixed effects and additive genetic effects and animal permanent environmental effects were considered as the random effects. Additive genetic variance, animal permanent environment variance, residual variance, phenotypic variance, heritability and repeatability were estimated during different months of lactation between 5.7 - 19.6, 15.3 - 27.1, 31.4 - 17.2, 45.8 - 64.83, 0.1 - 0.32 and 0.4 - 0.6, respectively. Genetic correlation and phenotypic correlation were also estimated between months of lactation in range of -0.35 - 0.98 and 0.03 - 0.67, respectively. Genetic correlation and phenotypic correlation both showed the same changing pattern and they decreased as the interval between months of lactation increased.
文摘In this paper, auxiliary information is used to determine an estimator of finite population total using nonparametric regression under stratified random sampling. To achieve this, a model-based approach is adopted by making use of the local polynomial regression estimation to predict the nonsampled values of the survey variable y. The performance of the proposed estimator is investigated against some design-based and model-based regression estimators. The simulation experiments show that the resulting estimator exhibits good properties. Generally, good confidence intervals are seen for the nonparametric regression estimators, and use of the proposed estimator leads to relatively smaller values of RE compared to other estimators.
文摘Adaptive fractional polynomial modeling of general correlated outcomes is formulated to address nonlinearity in means, variances/dispersions, and correlations. Means and variances/dispersions are modeled using generalized linear models in fixed effects/coefficients. Correlations are modeled using random effects/coefficients. Nonlinearity is addressed using power transforms of primary (untransformed) predictors. Parameter estimation is based on extended linear mixed modeling generalizing both generalized estimating equations and linear mixed modeling. Models are evaluated using likelihood cross-validation (LCV) scores and are generated adaptively using a heuristic search controlled by LCV scores. Cases covered include linear, Poisson, logistic, exponential, and discrete regression of correlated continuous, count/rate, dichotomous, positive continuous, and discrete numeric outcomes treated as normally, Poisson, Bernoulli, exponentially, and discrete numerically distributed, respectively. Example analyses are also generated for these five cases to compare adaptive random effects/coefficients modeling of correlated outcomes to previously developed adaptive modeling based on directly specified covariance structures. Adaptive random effects/coefficients modeling substantially outperforms direct covariance modeling in the linear, exponential, and discrete regression example analyses. It generates equivalent results in the logistic regression example analyses and it is substantially outperformed in the Poisson regression case. Random effects/coefficients modeling of correlated outcomes can provide substantial improvements in model selection compared to directly specified covariance modeling. However, directly specified covariance modeling can generate competitive or substantially better results in some cases while usually requiring less computation time.
基金Supported by the Research Teaching Model Curriculum of Anhui University(xjyjkc1407)Supported by the Students Innovative Training Project of Anhui University(201310357004,201410357117,201410357249)Supported by the Quality Improvement Projects for Undergraduate Education of Anhui University(ZLTS2015035)
文摘In this paper, by using some inequalities of negatively orthant dependent(NOD,in short) random variables and the truncated method of random variables, we investigate the nonparametric regression model. The complete consistency result for the estimator of g(x) is presented.
文摘In this article, to improve the doubly robust estimator, the nonlinear regression models with missing responses are studied. Based on the covariate balancing propensity score (CBPS), estimators for the regression coefficients and the population mean are obtained. It is proved that the proposed estimators are asymptotically normal. In simulation studies, the proposed estimators show improved performance relative to usual augmented inverse probability weighted estimators.
基金Supported by the National Natural Science Foundation of China(11901236)the Scientific Research Fund of Hunan Provincial Science and Technology Department(2019JJ50479)+3 种基金the Scientific Research Fund of Hunan Provincial Education Department(18B322)the Winning Bid Project of Hunan Province for the 4th National Economic Census([2020]1)the Young Core Teacher Foundation of Hunan Province([2020]43)the Funda-mental Research Fund of Xiangxi Autonomous Prefecture(2018SF5026)。
文摘Cost effective sampling design is a major concern in some experiments especially when the measurement of the characteristic of interest is costly or painful or time consuming.Ranked set sampling(RSS)was first proposed by McIntyre[1952.A method for unbiased selective sampling,using ranked sets.Australian Journal of Agricultural Research 3,385-390]as an effective way to estimate the pasture mean.In the current paper,a modification of ranked set sampling called moving extremes ranked set sampling(MERSS)is considered for the best linear unbiased estimators(BLUEs)for the simple linear regression model.The BLUEs for this model under MERSS are derived.The BLUEs under MERSS are shown to be markedly more efficient for normal data when compared with the BLUEs under simple random sampling.
文摘BACKGROUND Severe dengue children with critical complications have been attributed to high mortality rates,varying from approximately 1%to over 20%.To date,there is a lack of data on machine-learning-based algorithms for predicting the risk of inhospital mortality in children with dengue shock syndrome(DSS).AIM To develop machine-learning models to estimate the risk of death in hospitalized children with DSS.METHODS This single-center retrospective study was conducted at tertiary Children’s Hospital No.2 in Viet Nam,between 2013 and 2022.The primary outcome was the in-hospital mortality rate in children with DSS admitted to the pediatric intensive care unit(PICU).Nine significant features were predetermined for further analysis using machine learning models.An oversampling method was used to enhance the model performance.Supervised models,including logistic regression,Naïve Bayes,Random Forest(RF),K-nearest neighbors,Decision Tree and Extreme Gradient Boosting(XGBoost),were employed to develop predictive models.The Shapley Additive Explanation was used to determine the degree of contribution of the features.RESULTS In total,1278 PICU-admitted children with complete data were included in the analysis.The median patient age was 8.1 years(interquartile range:5.4-10.7).Thirty-nine patients(3%)died.The RF and XGboost models demonstrated the highest performance.The Shapley Addictive Explanations model revealed that the most important predictive features included younger age,female patients,presence of underlying diseases,severe transaminitis,severe bleeding,low platelet counts requiring platelet transfusion,elevated levels of international normalized ratio,blood lactate and serum creatinine,large volume of resuscitation fluid and a high vasoactive inotropic score(>30).CONCLUSION We developed robust machine learning-based models to estimate the risk of death in hospitalized children with DSS.The study findings are applicable to the design of management schemes to enhance survival outcomes of patients with DSS.
文摘In this paper, three smoothed empirical log-likelihood ratio functions for the parameters of nonlinear models with missing response are suggested. Under some regular conditions, the corresponding Wilks phenomena are obtained and the confidence regions for the parameter can be constructed easily.
基金The project supported by NNSFC (19631040), NSSFC (04BTJ002) and the grant for post-doctor fellows in SELF.
文摘In this paper, it is discussed that two tests for varying dispersion of binomial data in the framework of nonlinear logistic models with random effects, which are widely used in analyzing longitudinal binomial data. One is the individual test and power calculation for varying dispersion through testing the randomness of cluster effects, which is extensions of Dean(1992) and Commenges et al (1994). The second test is the composite test for varying dispersion through simultaneously testing the randomness of cluster effects and the equality of random-effect means. The score test statistics are constructed and expressed in simple, easy to use, matrix formulas. The authors illustrate their test methods using the insecticide data (Giltinan, Capizzi & Malani (1988)).
基金supported by the projects of the China Geological Survey(DD20221729,DD20190291)Zhuhai Urban Geological Survey(including informatization)(MZCD–2201–008).
文摘Machine learning is currently one of the research hotspots in the field of landslide prediction.To clarify and evaluate the differences in characteristics and prediction effects of different machine learning models,Conghua District,which is the most prone to landslide disasters in Guangzhou,was selected for landslide susceptibility evaluation.The evaluation factors were selected by using correlation analysis and variance expansion factor method.Applying four machine learning methods namely Logistic Regression(LR),Random Forest(RF),Support Vector Machines(SVM),and Extreme Gradient Boosting(XGB),landslide models were constructed.Comparative analysis and evaluation of the model were conducted through statistical indices and receiver operating characteristic(ROC)curves.The results showed that LR,RF,SVM,and XGB models have good predictive performance for landslide susceptibility,with the area under curve(AUC)values of 0.752,0.965,0.996,and 0.998,respectively.XGB model had the highest predictive ability,followed by RF model,SVM model,and LR model.The frequency ratio(FR)accuracy of LR,RF,SVM,and XGB models was 0.775,0.842,0.759,and 0.822,respectively.RF and XGB models were superior to LR and SVM models,indicating that the integrated algorithm has better predictive ability than a single classification algorithm in regional landslide classification problems.
基金supported by the National Science Foundation of China(42107183).
文摘Driven piles are used in many geological environments as a practical and convenient structural component.Hence,the determination of the drivability of piles is actually of great importance in complex geotechnical applications.Conventional methods of predicting pile drivability often rely on simplified physicalmodels or empirical formulas,whichmay lack accuracy or applicability in complex geological conditions.Therefore,this study presents a practical machine learning approach,namely a Random Forest(RF)optimized by Bayesian Optimization(BO)and Particle Swarm Optimization(PSO),which not only enhances prediction accuracy but also better adapts to varying geological environments to predict the drivability parameters of piles(i.e.,maximumcompressive stress,maximum tensile stress,and blow per foot).In addition,support vector regression,extreme gradient boosting,k nearest neighbor,and decision tree are also used and applied for comparison purposes.In order to train and test these models,among the 4072 datasets collected with 17model inputs,3258 datasets were randomly selected for training,and the remaining 814 datasets were used for model testing.Lastly,the results of these models were compared and evaluated using two performance indices,i.e.,the root mean square error(RMSE)and the coefficient of determination(R2).The results indicate that the optimized RF model achieved lower RMSE than other prediction models in predicting the three parameters,specifically 0.044,0.438,and 0.146;and higher R^(2) values than other implemented techniques,specifically 0.966,0.884,and 0.977.In addition,the sensitivity and uncertainty of the optimized RF model were analyzed using Sobol sensitivity analysis and Monte Carlo(MC)simulation.It can be concluded that the optimized RF model could be used to predict the performance of the pile,and it may provide a useful reference for solving some problems under similar engineering conditions.
基金Supported by the National Social Science Foundation of China(Grant No.22BTJ059)。
文摘In this paper,we consider the partial linear regression model y_(i)=x_(i)β^(*)+g(ti)+ε_(i),i=1,2,...,n,where(x_(i),ti)are known fixed design points,g(·)is an unknown function,andβ^(*)is an unknown parameter to be estimated,random errorsε_(i)are(α,β)-mix_(i)ng random variables.The p-th(p>1)mean consistency,strong consistency and complete consistency for least squares estimators ofβ^(*)and g(·)are investigated under some mild conditions.In addition,a numerical simulation is carried out to study the finite sample performance of the theoretical results.Finally,a real data analysis is provided to further verify the effect of the model.