Abstract: Handling missing data properly is a prerequisite for applying statistical methods to a dataset. Statisticians and researchers may draw inaccurate inferences if missing data are not handled properly. Python and R now provide diverse packages for handling missing data. In this study, an imputation algorithm, cumulative linear regression, is proposed. The algorithm is based on linear regression. It differs from existing methods in that it accumulates the imputed variables: each imputed variable is incorporated into the linear regression equation used to fill in the missing values of the next incomplete variable. The author performed a comparative study of the proposed method and those packages. Performance was measured in terms of imputation time, root-mean-square error, mean absolute error, and coefficient of determination (R^2). An analysis of five datasets with different missing values generated from different mechanisms showed that performance varies with dataset size, missing percentage, and missingness mechanism. The results showed that the proposed method performs slightly better.
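The cumulative idea described in this abstract can be illustrated with a minimal sketch. This is not the author's implementation: the function name `cumulative_lr_impute`, the choice to visit incomplete columns from fewest to most missing values, and the use of plain least squares with an intercept are all assumptions made for illustration.

```python
import numpy as np

def cumulative_lr_impute(X):
    """Impute NaNs column by column, reusing each imputed column as a predictor."""
    X = np.array(X, dtype=float)  # work on a copy
    missing = np.isnan(X)
    n_missing = missing.sum(axis=0)
    # Predictors start as the fully observed columns.
    predictors = [j for j in range(X.shape[1]) if n_missing[j] == 0]
    # Assumption: visit incomplete columns from fewest to most missing values.
    incomplete = sorted((j for j in range(X.shape[1]) if n_missing[j] > 0),
                        key=lambda j: n_missing[j])
    for j in incomplete:
        # Design matrix: intercept plus every column accumulated so far.
        A = np.column_stack([np.ones(len(X)), X[:, predictors]])
        obs = ~missing[:, j]
        beta, *_ = np.linalg.lstsq(A[obs], X[obs, j], rcond=None)
        X[~obs, j] = A[~obs] @ beta   # fill the gaps with fitted values
        predictors.append(j)          # the imputed column joins the next regression
    return X

# Hypothetical toy data: column 1 = 2*x + 1 and column 2 = 3*x - 2, with gaps.
nan = np.nan
data = np.array([[0, 1, -2], [1, 3, nan], [2, nan, 4],
                 [3, 7, 7], [4, 9, nan], [5, 11, 13]], dtype=float)
filled = cumulative_lr_impute(data)
```

Because each filled column is appended to `predictors`, later regressions use both the originally complete variables and the variables imputed earlier, which is the accumulation step the abstract highlights.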
Abstract: Rivers are important systems that provide water to meet human needs. However, excessive human use over the years has degraded river water quality, causing health problems from contaminated water. This study applies statistical techniques, a Multiple Linear Regression model and MANOVA, to assess the health impacts of pollution in the Cauvery river stretch in Srirangapatna. Using Multiple Linear Regression, it is found that the health impact level is 60.8% dependent on the water quality parameters BOD, COD, TDS, TC and FC. The t-statistics and their associated two-tailed p-values indicate that COD and TDS produce health impacts, compared with BOD, TC and FC, when their effects are considered together across all six sampling stations in Srirangapatna. Further, the Pearson correlation matrix shows highly significant positive correlations among the parameters across all stations, indicating possible common sources of origin that might be anthropogenic. Graphs plotted for individual parameters across all stations reveal that COD and TDS values are significant at all sampling stations, though they are higher at the impact stations, causing health impacts.
Abstract: In basketball, each player’s skill level is key to a team’s success or failure, and skill level is affected by many personal and environmental factors. Physics-informed AI statistics has become extremely important. In this article, a complex non-linear process is considered by taking into account each player’s average points per game, playing time, shooting percentage, and other factors. The physics-informed statistical approach constructs a multiple linear regression model with physics-informed neural networks. Based on official data provided by the American Basketball League, combined with specific methods of analysis in R, the regression model for a player’s average points per game is verified, and the key factors affecting it are elucidated. The paper provides a novel window for coaches to make meaningful in-game adjustments to team members.
Abstract: The aim of this paper is to propose some diagnostic methods for stochastic restricted linear regression models. A review of stochastic restricted linear regression models is given. The paper focuses mainly on diagnostic methods for this model and their application. First, the estimators of the model are reviewed. Second, the case deletion model is shown to be equivalent to the mean shift outlier model for diagnostic purposes. Then, some diagnostic statistics are given. Finally, an example is given to illustrate the results.
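The equivalence between the case deletion model and the mean shift outlier model can be checked numerically. The sketch below uses ordinary least squares as a stand-in, since the abstract does not give the stochastic restricted estimator; the helper names and the simulated data are hypothetical.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y ~ X (X already includes an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def case_deletion_fit(X, y, i):
    """Case deletion model: refit after removing observation i."""
    keep = np.arange(len(y)) != i
    return ols(X[keep], y[keep])

def mean_shift_fit(X, y, i):
    """Mean shift outlier model: add an indicator column for observation i."""
    d = np.zeros(len(y))
    d[i] = 1.0
    coef = ols(np.column_stack([X, d]), y)
    return coef[:-1], coef[-1]  # regression coefficients, estimated shift

# Simulated data (an assumption for illustration) with an outlier planted in case 7.
rng = np.random.default_rng(0)
x = rng.normal(size=20)
X = np.column_stack([np.ones(20), x])
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=20)
y[7] += 5.0

beta_del = case_deletion_fit(X, y, 7)
beta_shift, gamma = mean_shift_fit(X, y, 7)
```

In this ordinary least-squares setting the two coefficient vectors agree, and the estimated shift `gamma` equals the prediction error of case 7 under the deleted fit, so a test on `gamma` serves as an outlier test for that case.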
Abstract: This paper presents simple and multiple linear regression models of verbs in the Quran. The model analyses how the frequency of words of the form (—un, ---) is influenced by the frequency of plural present verbs (t—un, ---) or (y—un, ---), and models the relationship between the independent variables and the dependent variable by fitting a linear equation to the observed data with a simple linear regression model. A MATLAB function is used to find the parameters of the linear regression model and to plot the fits. The results show that the parameter vector of the model is (1, 1) and the mean of the dataset is (6, 7). This point corresponds to the verb whose inputs are the frequency of the verb "they enter" and the frequency of "enter" (yadkolun ?dakilun); 17 other points also lie on the line, in a dataset of 387 verbs and their derived verbs in the Quran. The name of Allah appeared when three variables were used and plotted in 3D with the option “Show Text” for a multiple regression model.
Abstract: This study aims to develop a statistical model for forecasting heavy rain in South Korea. For the 3-hour weather forecast system, the 10 km×10 km area-mean rainfall amounts at 6 stations (Seoul, Daejeon, Gangreung, Gwangju, Busan, and Jeju) in South Korea are used, and the corresponding 45 synoptic factors generated by the numerical model are used as potential predictors. Four statistical forecast models (linear regression, logistic regression, neural network, and decision tree) for the occurrence of heavy rain are built on the model output statistics (MOS) method. They are estimated separately on the same training data. Thresholds are used to forecast the occurrence of heavy rain because the distribution of the estimated values generated by each model is too skewed. The results of the four models are compared via Heidke skill scores. As a result, the logistic regression model is recommended.
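The comparison step can be made concrete with a short sketch of the Heidke skill score for a yes/no event. It uses the standard 2×2 contingency-table formula; the function names, the thresholding helper, and the sample numbers are illustrative assumptions, not values from the study.

```python
def forecast_events(scores, threshold):
    """Turn continuous model outputs into yes/no heavy-rain forecasts."""
    return [s >= threshold for s in scores]

def heidke_skill_score(forecast, observed):
    """Heidke skill score from paired boolean forecasts and observations."""
    a = b = c = d = 0
    for f, o in zip(forecast, observed):
        if f and o:
            a += 1          # hit
        elif f and not o:
            b += 1          # false alarm
        elif not f and o:
            c += 1          # miss
        else:
            d += 1          # correct negative
    denom = (a + c) * (c + d) + (a + b) * (b + d)
    # Convention chosen here: return 0 when the score is undefined.
    return 2.0 * (a * d - b * c) / denom if denom else 0.0

# Hypothetical model outputs and observations for four forecast periods.
scores = [0.9, 0.2, 0.6, 0.1]
observed = [True, False, False, False]
hss = heidke_skill_score(forecast_events(scores, 0.5), observed)
```

A score of 1 indicates a perfect forecast and 0 indicates no skill beyond chance, so the threshold for each model can be chosen to maximise the score on the training data before the models are compared.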
Abstract: The objective of this study is to present a simple statistical calculation method for determining the relationships among the data obtained from the characterization of synthetic carbonated apatites containing sodium, in order to find the fundamental substitution mechanism(s) for incorporation of Na+ and CO3^2- and to establish the general formula. To this end, a series of hydroxyapatites containing carbonate and sodium (Na-CO3HAps) was obtained by the precipitation method. All compounds were characterized by infrared spectroscopy (IR), powder X-ray diffraction (PXRD) and elemental analysis. Statistical treatment of the experimental results allows us to determine the relationship between one variable and the change in the other, and to find the fundamental substitution mechanism(s) for incorporation of Na+ and CO3^2-. Analysis of variance (ANOVA) allows us to test the proposed models.
Funding: Project supported by the National Natural Science Foundation of China (No. 69673044) and the Environmental Protection Bureau of Hangzhou (No. 9901), China.
Abstract: This research considers the mathematical relationship between the concentration of Chla and seven environmental factors: lake water temperature (T), Secchi depth (SD), pH, DO, CODMn, total nitrogen (TN), and total phosphorus (TP). Stepwise linear regression of 1997 to 1999 monitoring data at each sampling point of Qiandaohu Lake yielded the multivariate regression models presented in this paper. The Chla concentration simulated for the year 2000 by the regression model was similar to the observed value. The suggested mathematical relationship could be used to predict changes in the lake water environment at any point in time. The results showed that SD, TP and pH were the most significant factors affecting Chla concentration.
Abstract: Tropospheric ozone (O3) is one of the pollutants that have a significant impact on human health. It can increase the rate of asthma crises and cause permanent lung infections and death. Predicting its concentration levels is therefore important for planning atmospheric protection strategies. The aim of this study is to predict the daily mean O3 concentration one day ahead in the Grand Casablanca area of Morocco using primary pollutants and meteorological variables. Since the available explanatory variables are multicollinear, multiple linear regression is likely to lead to unstable models. To counteract the multicollinearity problem, we compared several alternative regression methods: 1) Continuum Regression; 2) Ridge & Lasso Regressions; 3) Principal Component Regression (PCR); 4) Partial Least Squares regression & sparse PLS; and 5) Biased Power Regression. The aim is to set up a good prediction model of daily ozone in the Grand Casablanca area. These models are fitted on a training data set (from the years 2013 and 2014), tested on a data set (from 2015) and validated on yet another data set (from 2015). The Lasso model showed better performance for the prediction of ozone concentrations than multiple linear regression and the other alternative methods.
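Why multicollinearity destabilises multiple linear regression, and how a penalised alternative helps, can be seen in a small sketch. It uses ridge regression in closed form because it is the simplest of the compared methods to write down; it is not the Lasso model the study favoured, and the simulated predictors, noise levels, and penalty value are assumptions for illustration only.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression on centred data; lam = 0 gives ordinary least squares."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    p = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta  # the intercept is not penalised
    return intercept, beta

# Simulated predictors: x2 is nearly a copy of x1, so the two are strongly collinear.
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(scale=0.5, size=n)

_, beta_ols = ridge_fit(X, y, lam=0.0)
_, beta_ridge = ridge_fit(X, y, lam=1.0)
```

With nearly identical predictors the least-squares coefficients can take large values of opposite sign that almost cancel, whereas the penalty shrinks the coefficient vector toward a more stable solution; comparing `beta_ols` with `beta_ridge` shows the effect.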
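Abstract: The existing sparse linear regression (SLR) method for estimating the parameters of multiple frequency-hopping signals suffers from a heavy computational load and high memory consumption. In fact, frequency hops occur at only a few data points, and most of the data contain no hop information. On this basis, a frequency-hopping signal parameter estimation method combining orthogonal matching pursuit (OMP) with SLR is proposed. The method divides the received sample data into uniform segments and preprocesses each segment with the OMP algorithm, detecting the segments in which a frequency hop occurs and estimating the frequency of the segments in which no hop occurs. The SLR algorithm is then applied separately to the segments containing hops to estimate the hop time and frequencies of each segment. Concatenating the results yields the hop times, hopping pattern, and other parameters of the whole sample. Simulation results show that the method effectively reduces the computational load while maintaining the accurate estimation performance of SLR.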
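The OMP preprocessing step can be sketched as follows. The abstract does not specify the dictionary, segment length, or hop-detection rule, so the cosine dictionary on integer frequency bins, the 64-sample segment, and the function name `omp` are assumptions made purely for illustration.

```python
import numpy as np

def omp(D, y, n_nonzero):
    """Orthogonal matching pursuit: greedy sparse solution of y ≈ D @ coef."""
    y = np.asarray(y, dtype=float)
    residual = y.copy()
    support = []
    coef = np.zeros(D.shape[1])
    for _ in range(n_nonzero):
        # Pick the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k not in support:
            support.append(k)
        # Re-fit all selected atoms jointly, then update the residual.
        sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        coef = np.zeros(D.shape[1])
        coef[support] = sol
        residual = y - D[:, support] @ sol
    return coef, residual

# Assumed dictionary: unit-norm cosines at integer frequency bins 1..31 over 64 samples.
N = 64
bins = np.arange(1, N // 2)
D = np.cos(2 * np.pi * np.outer(np.arange(N), bins) / N)
D /= np.linalg.norm(D, axis=0)

# A segment containing two tones (bins 5 and 12), as might occur across a hop.
segment = 3.0 * D[:, 4] + 1.0 * D[:, 11]
coef, residual = omp(D, segment, n_nonzero=2)
detected_bins = bins[np.nonzero(coef)[0]]
```

Under these assumptions, a segment that is well explained by a single atom would be treated as hop-free and its frequency read off directly, while a segment needing more than one atom would be passed on to the more expensive SLR stage.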
Funding: Supported by the National Natural Science Foundation of China (No. 40604001) and the National High Technology Research and Development Program of China (No. 2007AA12Z312). Acknowledgement: The authors thank Prof. Tao Benzao and Prof. Wang Xingzhou for several helpful suggestions during the preparation of this manuscript.
Abstract: This paper systematically studies statistical diagnosis and hypothesis testing for the semiparametric linear regression model, building on the theories and methods of statistical diagnosis and hypothesis testing for the parametric regression model. Several diagnostic measures and methods for gross error testing are derived. In particular, the global and local influence of gross errors on the parameter X and the nonparameter s is discussed in detail; the paper also proves that the data point deletion model is equivalent to the mean shift model for the semiparametric regression model. Finally, some helpful conclusions are drawn from a simulated computing example.
Funding: Supported by a General Research Fund from the Hong Kong Research Grants Council (Grant No. City U-102709), the National Natural Science Foundation of China (Grant Nos. 11331011 and 11271355), and the Hundred Talents Program of the Chinese Academy of Sciences.
Abstract: This paper considers post-J test inference in non-nested linear regression models. Post-J test inference means that the inference problem is considered by taking the first-stage J test into account. We first propose a post-J test estimator and derive its asymptotic distribution. We then consider the problem of testing the unknown parameters, and propose a Wald statistic based on the post-J test estimator. A simulation study shows that the proposed Wald statistic performs as well as the two-stage test in terms of empirical size and power in large samples, and even better when the sample size is small. As a result, the new Wald statistic can be used directly to test hypotheses on the unknown parameters in non-nested linear regression models.