For the composition analysis and identification of ancient glass products, L1 regularization, K-Means cluster analysis, the elbow rule and other methods were used together to build logistic regression, cluster analysis and hyper-parameter test models, and tools such as SPSS and Python were used to obtain the classification rules of glass products under different fluxes, the sub-classification under different chemical compositions, a test of the hyper-parameter K value and a rationality analysis. This research can provide theoretical support for the protection and restoration of ancient glass relics.
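A minimal sketch (made-up composition data, not the study's dataset) of the two techniques named above: an L1-regularized logistic regression for the flux-based classification rule, and K-Means with the elbow rule for choosing the number of sub-classes K.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.random((60, 8))                    # 60 samples x 8 chemical-composition features (placeholders)
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)  # placeholder flux-type label

X_std = StandardScaler().fit_transform(X)

# L1 regularization drives the coefficients of uninformative compositions to zero.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0).fit(X_std, y)
print("compositions kept by the L1 penalty:", np.flatnonzero(clf.coef_[0]))

# Elbow rule: inspect the within-cluster sum of squares (inertia) as K grows
# and pick the K where the curve bends.
for k in range(1, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_std)
    print(k, round(km.inertia_, 2))
```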
Detecting plant health conditions plays a key role in farm pest management and crop protection. In this study, hyperspectral leaf reflectance in rice (Oryza sativa L.) was measured on groups of healthy leaves and leaves infected by the fungus Bipolaris oryzae (Helminthosporium oryzae Breda de Haan) over the wavelength range from 350 to 2,500 nm. The percentage of the leaf surface covered by lesions was estimated and defined as the disease severity. Statistical methods including multiple stepwise regression, principal component analysis and partial least-squares regression were used to estimate the severity of rice brown spot at the leaf level. Our results revealed that multiple stepwise linear regression could efficiently estimate disease severity with three wavebands selected in seven steps. The root mean square errors (RMSEs) for the training (n=210) and testing (n=53) datasets were 6.5% and 5.8%, respectively. Principal component analysis showed that the first principal component could explain approximately 80% of the variance of the original hyperspectral reflectance. The regression model with the first two principal components predicted disease severity with RMSEs of 16.3% and 13.9% for the training and testing datasets, respectively. Partial least-squares regression with seven extracted factors predicted disease severity most effectively among the compared methods, with RMSEs of 4.1% and 2.0% for the training and testing datasets, respectively. Our research demonstrates that it is feasible to estimate the severity of rice brown spot from hyperspectral reflectance data at the leaf level.
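A toy sketch of the partial least-squares step under assumed data shapes (263 leaves, 1-nm sampling from 350 to 2,500 nm, a 210/53 train/test split as in the abstract); the reflectance and severity values below are random placeholders, not the study's measurements.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
reflectance = rng.random((263, 2151))   # 263 leaves x wavelengths 350-2500 nm at 1 nm (placeholder values)
severity = rng.random(263) * 100.0      # percent leaf area with lesions (placeholder values)

X_tr, X_te, y_tr, y_te = train_test_split(reflectance, severity, test_size=53, random_state=1)

pls = PLSRegression(n_components=7)     # seven extracted factors, as in the abstract
pls.fit(X_tr, y_tr)
rmse = mean_squared_error(y_te, pls.predict(X_te)) ** 0.5
print(f"test RMSE: {rmse:.1f}% severity")
```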
Simultaneous-source acquisition has been recognized as an economic and efficient acquisition method, but direct imaging of simultaneous-source data produces migration artifacts because of the interference of adjacent sources. To overcome this problem, we propose a regularized least-squares reverse time migration method (RLSRTM) that uses the singular spectrum analysis technique to impose sparseness constraints on the inverted model. Additionally, the difference spectrum theory of singular values is presented so that RLSRTM can be implemented adaptively to eliminate the migration artifacts. With numerical tests on a flat layer model and a Marmousi model, we validate the superior imaging quality, efficiency and convergence of RLSRTM compared with LSRTM when dealing with simultaneous-source data, incomplete data and noisy data.
To improve classification accuracy, regularized logistic regression is used to classify single-trial electroencephalogram (EEG) signals. A novel approach, named local sparse logistic regression (LSLR), is proposed. LSLR integrates a locality preserving projection regularization term into the framework of sparse logistic regression; it tries to maintain the neighborhood information of the original feature space while keeping sparsity. The bound optimization algorithm and component-wise updates are used to compute the weight vector on the training data, thus overcoming the disadvantages of the Newton-Raphson method and iteratively reweighted least squares (IRLS). A classification accuracy of 80% is achieved using ten-fold cross-validation on the self-paced finger tapping data set. The results of LSLR are compared with those of sparse logistic regression (SLR), showing the effectiveness of the proposed method.
In several LUCC studies, statistical methods are used to analyze land use data. A problem with conventional statistical methods in land use analysis is that they assume the data to be statistically independent; in fact, the data tend to be dependent, a phenomenon known as multicollinearity, especially when there are few observations. In this paper, a partial least-squares (PLS) regression approach is developed to study the relationships between land use and its influencing factors through a case study of the Suzhou-Wuxi-Changzhou region in China. Multicollinearity exists in the dataset and the number of variables is high compared with the number of observations. Four PLS factors are selected through a preliminary analysis. The correlation analyses between land use and its influencing factors demonstrate the land use character of rural industrialization and urbanization in the Suzhou-Wuxi-Changzhou region, and illustrate that the first PLS factor best describes land use patterns quantitatively, with most of the statistical relations derived from it according with the facts. As the explanatory capacity of the PLS factors decreases, the reliability of the model outcome decreases correspondingly.
Background: Genomic growth curves are generally defined only in terms of the population mean; an alternative approach that has not yet been exploited in genomic analyses of growth curves is quantile regression (QR). This methodology allows marker effects to be estimated at different levels of the variable of interest. We aimed to propose and evaluate a regularized quantile regression for estimating SNP marker effects on pig growth curves, to identify the chromosome regions of the most relevant markers, and to estimate the genetic individual weight trajectory over time (genomic growth curve) at different quantiles (levels). Results: The regularized quantile regression (RQR) enabled the discovery, at different quantiles of interest, of the most relevant markers, allowing for the identification of QTL regions. We found the same relevant markers simultaneously affecting different growth curve parameters (mature weight and maturity rate): two (ALGA0096701 and ALGA0029483) for RQR(0.2), one (ALGA0096701) for RQR(0.5), and one (ALGA0003761) for RQR(0.8). Three average genomic growth curves were obtained, and the behavior was explained by the curve at quantile 0.2, which differed from the others. Conclusions: RQR allowed for the construction of genomic growth curves, which is the key to identifying and selecting the most desirable animals for breeding purposes. Furthermore, the proposed model enabled us to find, at different quantiles of interest, the most relevant markers for each trait (growth curve parameter estimates) and their respective chromosomal positions (identification of new QTL regions for growth curves in pigs). These markers can be exploited in the context of marker-assisted selection aimed at changing the shape of pig growth curves.
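For illustration only, a sketch of L1-regularized quantile regression at the three quantile levels used above (0.2, 0.5, 0.8) on made-up SNP genotype data; nonzero coefficients play the role of the "relevant markers", and the marker count, effect sizes and penalty strength are placeholders.

```python
import numpy as np
from sklearn.linear_model import QuantileRegressor

rng = np.random.default_rng(2)
genotypes = rng.integers(0, 3, size=(300, 200)).astype(float)  # 300 animals x 200 SNPs coded 0/1/2 (placeholders)
weight = 2.0 * genotypes[:, 10] - 1.5 * genotypes[:, 50] + rng.normal(0.0, 1.0, 300)  # placeholder trait

for tau in (0.2, 0.5, 0.8):
    # L1-penalized quantile regression at level tau; alpha controls the sparsity.
    rqr = QuantileRegressor(quantile=tau, alpha=0.1, solver="highs").fit(genotypes, weight)
    relevant = np.flatnonzero(np.abs(rqr.coef_) > 1e-6)
    print(f"quantile {tau}: {relevant.size} markers with nonzero estimated effect")
```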
In recent years, variable selection based on penalized likelihood methods has attracted considerable attention. Based on a Gibbs sampling algorithm for the asymmetric Laplace distribution, this paper considers quantile regression with adaptive Lasso and Lasso penalties from a Bayesian point of view. Under both non-Bayesian and Bayesian frameworks, several regularized quantile regression methods are systematically compared for error terms with different distributions and heteroscedasticity. When the error term follows an asymmetric Laplace distribution, simulation results show that Bayesian regularized quantile regression is superior at all quantiles; moreover, under the asymmetric Laplace distribution, the Bayesian regularized quantile regression approach performs better than the non-Bayesian approach in parameter estimation and prediction. Real data analyses also confirm these conclusions.
The UV absorption spectra of o-naphthol, α-naphthylamine, 2,7-dihydroxynaphthalene, 2,4-dimethoxybenzaldehyde and methyl salicylate overlap severely; it is therefore impossible to determine them in mixtures by traditional spectrophotometric methods. In this paper, partial least-squares (PLS) regression is applied to the simultaneous determination of these compounds in mixtures by UV spectrophotometry without any pretreatment of the samples. Ten synthetic mixture samples were analyzed by the proposed method. The mean recoveries are 99.4%, 99.6%, 100.2%, 99.3% and 99.1%, and the relative standard deviations (RSDs) are 1.87%, 1.98%, 1.94%, 0.960% and 0.672%, respectively.
In this paper, we consider regularized learning schemes based on an l1-regularizer and the pinball loss in a data-dependent hypothesis space. The target is an error analysis for quantile regression learning. No regularity condition is imposed on the kernel function except continuity and boundedness. The graph-based semi-supervised algorithm leads to an extra error term called the manifold error. New error bounds and convergence rates are derived exactly using techniques based on the l1-empirical covering number and a bound decomposition.
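For reference, a small sketch of the pinball (quantile) loss that this learning scheme is built on; tau denotes the target quantile level.

```python
import numpy as np

def pinball_loss(y_true, y_pred, tau):
    """Mean pinball loss: tau*(y - f) when y >= f, (1 - tau)*(f - y) otherwise."""
    residual = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.where(residual >= 0, tau * residual, (tau - 1.0) * residual)))

# At tau = 0.5 the pinball loss is half the mean absolute error.
print(pinball_loss([1.0, 2.0, 3.0], [1.5, 1.5, 1.5], tau=0.5))
```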
This paper presents a semiparametric adjustment method suitable for general cases. Assuming that the regularizer matrix is positive definite, the calculation method is discussed and the corresponding formulae are presented. Finally, a simulated adjustment problem is constructed to illustrate the method given in this paper. The results from the semiparametric model and the G-M model are compared; they demonstrate that the model errors or the systematic errors of the observations can be detected correctly with the semiparametric estimation method.
A latent variable regression algorithm with a regularization term (rLVR) is proposed in this paper to extract latent relations between process data X and quality data Y. In rLVR, the prediction error between X and Y is minimized, which is proved to be equivalent to maximizing the projection of the quality variables in the latent space. The geometric properties and model relations of rLVR are analyzed, and the geometric and theoretical relations among rLVR, partial least squares, and canonical correlation analysis are also presented. An rLVR-based monitoring framework is developed to monitor process-relevant and quality-relevant variations simultaneously. The prediction and monitoring effectiveness of the rLVR algorithm is demonstrated through both numerical simulations and the Tennessee Eastman (TE) process.
To predict the economic loss of crops caused by acid rain, we used partial least-squares (PLS) regression to build a model with a single dependent variable, the economic loss calculated from the decrease in yield, related to the pH value and the levels of Ca2+, NH4+, Na+, K+, Mg2+, SO42-, NO3- and Cl- in acid rain. We selected vegetables sensitive to acid rain as the sample crops and collected 12 groups of data, of which 8 groups were used for modeling and 4 groups for testing. Evaluating the performance of this prediction model by cross validation indicates that the optimum number of principal components is 3, determined by the minimum of the prediction residual error sum of squares, and that the prediction error of the regression equation ranges from -2.25% to 4.32%. The model predicts that the economic loss of vegetables from acid rain is negatively correlated with pH and the concentrations of NH4+, SO42-, NO3- and Cl- in the rain, and positively correlated with the concentrations of Ca2+, Na+, K+ and Mg2+. The precision of the model may be improved if the non-linearity of the original data is addressed.
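A sketch, with synthetic stand-in data, of the component-selection step described above: choose the number of PLS components that minimizes the leave-one-out prediction residual error sum of squares (PRESS).

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(3)
X = rng.random((12, 9))                           # 12 rain samples x 9 predictors (pH and 8 ion levels), placeholders
y = X @ rng.random(9) + rng.normal(0.0, 0.1, 12)  # placeholder economic-loss response

# PRESS(n) = sum of squared leave-one-out prediction errors with n components.
press = {}
for n in range(1, 6):
    pred = cross_val_predict(PLSRegression(n_components=n), X, y, cv=LeaveOneOut())
    press[n] = float(np.sum((y - pred.ravel()) ** 2))
best = min(press, key=press.get)
print("PRESS by component count:", press, "-> optimum:", best)
```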
Least-squares migration (LSM) is applied to image subsurface structures and lithology by minimizing an objective function based on the residual between the observed seismic data and the data predicted from various underground reflectivity models. LSM reduces migration artifacts, enhances the spatial resolution of the migrated images, and yields a more accurate subsurface reflectivity distribution than standard migration. The introduction of regularization constraints effectively improves the stability of least-squares migration. The commonly used regularization terms are based on the L2-norm, which smooths the migration results, e.g., by smearing the reflectivities, while providing stability. However, in exploration geophysics, reflection structures based on velocity and density are generally observed to be discontinuous in depth, indicating sparse reflectivity. To obtain a sparse migration profile, we propose a super-resolution least-squares Kirchhoff prestack depth migration that solves an L0-norm-constrained optimization problem. Additionally, we introduce a two-stage iterative soft and hard thresholding algorithm to retrieve the super-resolution reflectivity distribution. The proposed algorithm is applied to complex synthetic data, and its sensitivity to noise and to the dominant frequency of the source wavelet is evaluated. We conclude that the proposed method improves the spatial resolution, achieves an impulse-like reflectivity distribution, and can be applied to structural interpretation and complex subsurface imaging.
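The migration operator itself is beyond a short example, but the two-stage thresholding idea can be illustrated on a toy 1-D sparse least-squares problem; the operator A, the penalty lam and the sparsity level k below are arbitrary stand-ins, not the paper's settings.

```python
import numpy as np

def two_stage_thresholding(A, d, lam=0.05, k=3, n_soft=200, n_hard=200):
    """Gradient steps on ||A m - d||^2 with soft thresholding first, then hard thresholding."""
    m = np.zeros(A.shape[1])
    step = 1.0 / np.linalg.norm(A, 2) ** 2             # step size from the spectral norm of A
    for it in range(n_soft + n_hard):
        m = m + step * A.T @ (d - A @ m)                # gradient step on the data misfit
        if it < n_soft:                                 # stage 1: soft thresholding (ISTA-like)
            m = np.sign(m) * np.maximum(np.abs(m) - lam * step, 0.0)
        else:                                           # stage 2: keep only the k largest entries
            m[np.argsort(np.abs(m))[:-k]] = 0.0
    return m

rng = np.random.default_rng(4)
A = rng.normal(size=(80, 200))
m_true = np.zeros(200)
m_true[[20, 70, 150]] = [1.0, -0.7, 0.5]                # sparse "reflectivity" to recover
d = A @ m_true + rng.normal(0.0, 0.01, 80)
print("recovered support:", np.flatnonzero(two_stage_thresholding(A, d, k=3)))
```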
This paper presents a meshless method for the nonlinear generalized regularized long wave (GRLW) equation based on the moving least-squares approximation. The nonlinear discrete scheme of the GRLW equation is obtained and solved using an iteration method. A theorem on the convergence of the iterative process is presented and proved using theorems on the infinity norm. Compared with mesh-based numerical methods, the meshless method for the GRLW equation requires only scattered nodes instead of meshing the domain of the problem. Examples such as the propagation of a single soliton and the interaction of two solitary waves are given to show the effectiveness of the meshless method.
Medical research data are often skewed and heteroscedastic. It has therefore become common practice to log-transform data in regression analysis, in order to stabilize the variance. Regression analysis on log-transformed data estimates the relative effect, whereas it is often the absolute effect of a predictor that is of interest. We propose a maximum likelihood (ML)-based approach to estimate a linear regression model on log-normal, heteroscedastic data. The new method was evaluated with a large simulation study. Log-normal observations were generated according to the simulation models, and parameters were estimated using the new ML method, ordinary least-squares regression (LS) and weighted least-squares regression (WLS). All three methods produced unbiased estimates of the parameters and the expected response, and ML and WLS yielded smaller standard errors than LS. The approximate normality of the Wald statistic, used for tests of the ML estimates, produced the correct type I error risk in most situations. Only ML and WLS produced correct confidence intervals for the estimated expected value. ML had the highest power for tests regarding β1.
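A simplified sketch of the ML idea under an assumed constant log-scale variance (the paper's model also allows the variance to vary): fit E[Y] = b0 + b1*x directly by maximizing a log-normal likelihood instead of regressing on log(Y). The data and starting values are placeholders.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import lognorm

rng = np.random.default_rng(5)
x = rng.uniform(0.0, 10.0, 200)
sigma = 0.4
mean_y = 2.0 + 1.5 * x                      # absolute (untransformed) effect we want to recover
mu = np.log(mean_y) - sigma**2 / 2          # log-scale location that yields that mean
y = rng.lognormal(mean=mu, sigma=sigma)

def negloglik(params):
    b0, b1, s = params
    m = b0 + b1 * x
    if np.any(m <= 0) or s <= 0:
        return np.inf
    loc = np.log(m) - s**2 / 2              # chosen so that E[Y] = b0 + b1*x
    return -np.sum(lognorm.logpdf(y, s=s, scale=np.exp(loc)))

fit = minimize(negloglik, x0=[1.0, 1.0, 0.5], method="Nelder-Mead")
print("ML estimates (b0, b1, sigma):", np.round(fit.x, 3))
```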
The pricing of moving-window Asian options with an early exercise feature is considered a challenging problem in option pricing. The computational challenge lies in the unknown optimal exercise strategy and in the high dimensionality required for approximating the early exercise boundary. We use sparse grid basis functions in the Least Squares Monte Carlo approach to address this "curse of dimensionality". The resulting algorithm provides a general and convergent method for pricing moving-window Asian options. The sparse grid technique presented in this paper can be generalized to pricing other high-dimensional, early-exercisable derivatives.
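The sparse-grid basis is beyond a short sketch, but the underlying Least Squares Monte Carlo (Longstaff-Schwartz) idea can be illustrated on a plain Bermudan put with a cubic polynomial basis; all market parameters below are made up, and this is not the moving-window Asian payoff of the paper.

```python
import numpy as np

rng = np.random.default_rng(6)
S0, K, r, sigma, T, n_steps, n_paths = 100.0, 100.0, 0.05, 0.2, 1.0, 50, 20000
dt = T / n_steps

# Simulate geometric Brownian motion paths.
z = rng.standard_normal((n_paths, n_steps))
S = S0 * np.exp(np.cumsum((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z, axis=1))
S = np.hstack([np.full((n_paths, 1), S0), S])

cashflow = np.maximum(K - S[:, -1], 0.0)                # exercise value at maturity
for t in range(n_steps - 1, 0, -1):
    cashflow *= np.exp(-r * dt)                         # discount one step back
    itm = K - S[:, t] > 0                               # regress only on in-the-money paths
    if itm.sum() > 3:
        basis = np.vander(S[itm, t], 4)                 # cubic polynomial basis in the spot price
        coef, *_ = np.linalg.lstsq(basis, cashflow[itm], rcond=None)
        continuation = basis @ coef
        exercise = K - S[itm, t]
        ex_now = exercise > continuation                # early-exercise decision
        cashflow[np.flatnonzero(itm)[ex_now]] = exercise[ex_now]
print("LSMC price of the Bermudan put:", round(float(np.mean(cashflow * np.exp(-r * dt))), 3))
```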
Fuzzy regression provides additional approaches for dealing with imprecise or vague problems. Traditional fuzzy regression is established on triangular fuzzy numbers, which can be represented by trapezoidal numbers. In the regression model, the independent variables, their coefficients and the dependent variable are fuzzy numbers at different times, and TW, the shape-preserving operator, is the only T-norm that induces a shape-preserving multiplication of LL-type fuzzy numbers. In this paper, we therefore propose a new fuzzy regression model based on LL-type trapezoidal fuzzy numbers and TW. Firstly, we introduce the basic fuzzy set theory, the basic arithmetic propositions of the shape-preserving operator and a new distance measure between trapezoidal numbers. Secondly, we investigate the model algorithms for the FIFCFO model (fuzzy input-fuzzy coefficient-fuzzy output model) and introduce three goodness-of-fit criteria: the Error Index, the Similarity Measure and the Distance Criterion. Thirdly, we use a design set and two reference sets to compare our proposed model with the reference models and assess them with the above three criteria. Finally, we conclude that, by these goodness-of-fit criteria, the proposed model is reasonable and has better prediction accuracy than the reference models, though it is less robust. The traditional fuzzy regression model can thus be extended to the proposed new model.
We construct a fuzzy varying-coefficient bilinear regression model to deal with interval financial data and then adopt a least-squares method based on symmetric fuzzy number space. Firstly, we propose a varying-coefficient model on the basis of the fuzzy bilinear regression model. Secondly, we develop the least-squares method according to the complete distance between fuzzy numbers to estimate the coefficients, and test the adaptability of the proposed model by means of a generalized likelihood ratio test with the SSE composite index. Finally, mean square errors and mean absolute errors are employed to evaluate and compare the fitting and forecasting of the fuzzy autoregression, fuzzy bilinear regression and fuzzy varying-coefficient bilinear regression models. Empirical analysis shows that the proposed model has good fitting and forecasting accuracy relative to the other regression models for the capital market.
Objective: Challenges remain in current practices of colorectal cancer (CRC) screening, such as low compliance, low specificity and high cost. This study aimed to identify high-risk groups for CRC in the general population using regular health examination data. Methods: The study population consists of more than 7,000 CRC cases and more than 140,000 controls. Using regular health examination data, a model detecting CRC cases was derived by the classification and regression tree (CART) algorithm. Receiver operating characteristic (ROC) curves were applied to evaluate the performance of the models. The robustness and generalization of the CART model were validated on independent datasets. In addition, the effectiveness of CART-based screening was compared with stool-based screening. Results: After data quality control, 4,647 CRC cases and 133,898 controls free of colorectal neoplasms were used for downstream analysis. The final CART model was constructed from four biomarkers (age, albumin, hematocrit and percent lymphocytes). In the test set, the area under the ROC curve (AUC) of the CART model was 0.88 [95% confidence interval (95% CI), 0.87-0.90] for detecting CRC. At the cutoff yielding 99.0% specificity, the model's sensitivity was 62.2% (95% CI, 58.1%-66.2%), achieving a 63-fold enrichment of CRC cases. We validated the robustness of the method across subsets of the test set with diverse CRC incidences, aging rates, gender ratios, distributions of tumor stages and locations, and data sources. Importantly, CART-based screening had a higher positive predictive value (1.6%) than the fecal immunochemical test (0.3%). Conclusions: As an alternative approach for the early detection of CRC, this study provides a low-cost method using regular health examination data to identify high-risk individuals for further examination. The approach can promote early detection of CRC, especially in developing countries such as China, where annual health examinations are common but regular CRC-specific screening is rare.
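A schematic sketch (synthetic records, not the study's examination data) of the screening pipeline described above: fit a classification tree on the four routine biomarkers, then read off the AUC and the sensitivity at the 99%-specificity cutoff. The risk model, tree depth and class balance below are placeholders.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(7)
n = 20000
X = np.column_stack([
    rng.normal(55, 12, n),   # age (years)
    rng.normal(45, 4, n),    # albumin (g/L)
    rng.normal(42, 5, n),    # hematocrit (%)
    rng.normal(30, 7, n),    # percent lymphocytes
])
risk = 0.05 * (X[:, 0] - 55) - 0.2 * (X[:, 1] - 45) - 0.1 * (X[:, 2] - 42)   # placeholder risk score
y = (risk + rng.normal(0.0, 1.0, n) > 2.0).astype(int)                       # placeholder CRC labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=7, stratify=y)
tree = DecisionTreeClassifier(max_depth=4, min_samples_leaf=50).fit(X_tr, y_tr)
score = tree.predict_proba(X_te)[:, 1]
fpr, tpr, _ = roc_curve(y_te, score)
print("AUC:", round(roc_auc_score(y_te, score), 3))
print("sensitivity at 99% specificity:", round(float(tpr[fpr <= 0.01].max()), 3))
```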