Abstract: Ridge regression is an effective tool for handling multicollinearity in regression. It is also an important shrinkage and regularization method and is widely used in big data and distributed data applications. The divide-and-conquer approach, which combines the estimators from each subset with equal weights, is commonly applied to distributed data. To overcome multicollinearity and improve estimation accuracy for distributed data, we propose a Mallows-type model averaging method for ridge regression that combines the estimators from all subsets. We prove that the method is asymptotically optimal while allowing the number of subsets and the dimension of the variables to diverge. We also establish the consistency of the estimated weights, which converge to the theoretically optimal weights. Furthermore, we demonstrate the asymptotic normality of the model averaging estimator. Our simulation study and real data analysis show that the proposed method often outperforms commonly used model selection and model averaging methods in distributed data settings.
Funding: partially supported by the Research Foundation of Shenzhen Polytechnic University (Grant No. 6023312034K); the Post-doctoral Later-stage of Shenzhen Polytechnic University (Grant No. 6023271021K); the National Natural Science Foundation of China (Grant No. 71973116); the National Natural Science Foundation of China (Grant Nos. 11971323 and 12031016); and the Beijing Natural Science Foundation (Grant No. Z210003).
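The divide-and-conquer baseline mentioned in the abstract, in which a ridge estimator is fitted on each subset and the subset estimators are combined with equal weights, can be illustrated with a minimal NumPy sketch. The function names (`ridge_fit`, `combine_subset_ridge`), the penalty value, and the simulated data are assumptions made for illustration only and do not come from the paper. The optional `weights` argument only shows where data-driven weights, such as those chosen by the proposed Mallows-type criterion, would enter; the criterion itself is not specified in the abstract and is not implemented here.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge estimator (X'X + lam * I)^{-1} X'y, without intercept."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def combine_subset_ridge(X, y, n_subsets, lam, weights=None):
    """Fit ridge on each data subset and return a weighted combination.

    With weights=None every subset gets weight 1 / n_subsets, which is the
    equal-weight divide-and-conquer estimator. Passing other weights is a
    hypothetical hook for a model averaging scheme.
    """
    X_parts = np.array_split(X, n_subsets)
    y_parts = np.array_split(y, n_subsets)
    betas = np.stack(
        [ridge_fit(Xk, yk, lam) for Xk, yk in zip(X_parts, y_parts)]
    )  # shape: (n_subsets, p)
    if weights is None:
        weights = np.full(n_subsets, 1.0 / n_subsets)
    weights = np.asarray(weights, dtype=float)
    return weights @ betas


# Simulated data with strongly correlated columns (illustrative assumption).
rng = np.random.default_rng(0)
n, p = 600, 5
z = rng.normal(size=(n, 1))
X = z + 0.1 * rng.normal(size=(n, p))
y = X @ np.ones(p) + rng.normal(size=n)

beta_dc = combine_subset_ridge(X, y, n_subsets=6, lam=1.0)
print(beta_dc)
```

Splitting by row order with `np.array_split` stands in for data that already reside on separate machines; in a real distributed setting each subset estimator would be computed locally and only the coefficient vectors would be communicated.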