Funding: Supported by the National Natural Science Foundation of China (Grant No. 12071166), the Fundamental Research Funds for the Central Universities of China (Nos. 2662023LXPY005, 2662022XXYJ005), and the HZAU-AGIS Cooperation Fund (No. SZYJY2023010).
Abstract: Interpretability has drawn increasing attention in machine learning. Most work focuses on post-hoc explanations rather than on building self-explaining models. We therefore propose the Neural Partially Linear Additive Model (NPLAM), which automatically distinguishes insignificant, linear, and nonlinear features in neural networks. On the one hand, the neural network construction fits data better than spline functions with the same number of parameters; on the other hand, the learnable gate design and sparsity regularization term preserve the capacity for feature selection and structure discovery. We establish generalization error bounds for the proposed method using Rademacher complexity. Experiments on both simulated and real-world datasets verify its good performance and interpretability.
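To make the idea of gates that separate insignificant, linear, and nonlinear features concrete, here is a minimal sketch. It is not the NPLAM architecture: the abstract does not specify the gate design, so the two-gate form, the function names, the threshold `tol`, and the L1-style penalty are all assumptions chosen for illustration.

```python
import numpy as np

def gated_feature_effect(x, w_lin, nonlinear_fn, gate_select, gate_linear):
    """Contribution of one feature under a hypothetical two-gate design.

    gate_select in [0, 1]: a value of 0 drops the feature entirely.
    gate_linear in [0, 1]: 1 keeps only the linear term, 0 only the nonlinear one.
    """
    linear_part = w_lin * x
    nonlinear_part = nonlinear_fn(x)  # stand-in for a small per-feature subnetwork
    mixed = gate_linear * linear_part + (1.0 - gate_linear) * nonlinear_part
    return gate_select * mixed

def classify_feature(gate_select, gate_linear, tol=0.5):
    """Read the feature type off the gate values (assumed thresholding rule)."""
    if gate_select < tol:
        return "insignificant"
    return "linear" if gate_linear >= tol else "nonlinear"

def gate_sparsity_penalty(gates, lam):
    """L1-style penalty that pushes gate values toward zero (assumed form)."""
    return lam * float(np.sum(np.abs(gates)))

x = np.array([0.0, 1.0, 2.0])
print(gated_feature_effect(x, 2.0, np.tanh, gate_select=1.0, gate_linear=1.0))
print(classify_feature(gate_select=0.9, gate_linear=0.1))
```

In a trained model the gates would be learned jointly with the network weights, and the penalty would be added to the data-fitting loss; the sketch only shows how gate values could map to the three feature types.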
Funding: Supported by the National Natural Science Foundation of China under Grant No. 11771250, the Natural Science Foundation of Shandong Province under Grant No. ZR2019MA002, and the Program for Scientific Research Innovation of Graduate Dissertation under Grant No. LWCXB201803.
Abstract: This paper considers partially linear additive models with a diverging number of parameters when some linear constraints on the parametric part are available. It proposes a constrained profile least-squares estimator for the parametric components, with the nonparametric functions estimated by basis function approximations. The consistency and asymptotic normality of the restricted estimator are established under certain conditions. The authors construct a profile likelihood ratio test statistic to test the validity of the linear constraints on the parametric components, and demonstrate that it asymptotically follows a chi-squared distribution under the null and alternative hypotheses. The finite-sample performance of the proposed method is illustrated by simulation studies and a data analysis.
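The setting described in this abstract can be written out as follows. The notation is assumed for illustration (the abstract gives no symbols), and the closed form shown is the textbook restricted least-squares solution applied to the profiled data, which may differ in detail from the paper's estimator.

```latex
% Partially linear additive model; p_n may grow with the sample size n
Y_i = X_i^{\top}\beta + \sum_{k=1}^{d} f_k(Z_{ik}) + \varepsilon_i,
\qquad \beta \in \mathbb{R}^{p_n}, \quad i = 1, \dots, n.

% Basis approximation of each nonparametric component
f_k(z) \approx B_k(z)^{\top}\gamma_k .

% Profiling: project out the basis matrix B, with P_B = B (B^{\top}B)^{-1} B^{\top}
\tilde{X} = (I - P_B)\,X, \qquad \tilde{Y} = (I - P_B)\,Y, \qquad
\hat{\beta} = (\tilde{X}^{\top}\tilde{X})^{-1}\tilde{X}^{\top}\tilde{Y}.

% Linear constraints and the restricted (constrained) estimator
A\beta = b, \qquad
\hat{\beta}_R = \hat{\beta}
  - (\tilde{X}^{\top}\tilde{X})^{-1} A^{\top}
    \bigl[ A (\tilde{X}^{\top}\tilde{X})^{-1} A^{\top} \bigr]^{-1}
    (A\hat{\beta} - b).

% Hypothesis examined by the profile likelihood ratio test
H_0 : A\beta = b \quad \text{versus} \quad H_1 : A\beta \neq b .
```

By construction $A\hat{\beta}_R = b$, so the restricted estimator satisfies the constraints exactly whenever $A(\tilde{X}^{\top}\tilde{X})^{-1}A^{\top}$ is invertible.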
Funding: Supported by the Hangzhou Joint Fund of the Zhejiang Provincial Natural Science Foundation of China (LHZY24A010002) and the MOE Project of Humanities and Social Sciences (21YJCZH235).
Abstract: High-dimensional heterogeneous data have attracted increasing attention and discussion over the past decade. In the context of heterogeneity, semiparametric regression has emerged as a popular method for modeling this type of data in statistics. In this paper, we leverage the computational efficiency and analytical robustness of expectile regression under heterogeneity, and propose a regularized partially linear additive expectile regression model with a nonconvex penalty, such as SCAD or MCP, for high-dimensional heterogeneous data. We focus on a more realistic scenario in which the regression error follows a heavy-tailed distribution with only finite moments. This scenario challenges the classical sub-Gaussian assumption and is more prevalent in practical applications. Under certain regularity conditions, we demonstrate that, with probability tending to one, the oracle estimator is one of the local minima of the induced optimization problem. Our theoretical analysis suggests that the dimensionality of linear covariates that our estimation procedure can handle is fundamentally limited by the moment condition on the regression error. Computationally, given the nonconvex and nonsmooth nature of the induced optimization problem, we develop a two-step algorithm. Finally, the method's effectiveness is demonstrated by its high estimation accuracy and effective model selection in Monte Carlo simulation studies and a real-data application. Furthermore, by varying the expectile weights, our method effectively detects heterogeneity and explores the complete conditional distribution of the response variable, underscoring its utility in analyzing high-dimensional heterogeneous data.
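The two building blocks named in this abstract, the expectile (asymmetric squared) loss and the SCAD penalty, can be sketched as below. This is an illustration of the standard definitions only, not the paper's two-step algorithm; the function names are hypothetical, and `a=3.7` is the conventional SCAD default rather than a value taken from the paper.

```python
import numpy as np

def expectile_loss(residuals, tau):
    """Mean asymmetric squared loss: weight tau on r >= 0, (1 - tau) on r < 0."""
    r = np.asarray(residuals, dtype=float)
    weights = np.where(r < 0, 1.0 - tau, tau)
    return float(np.mean(weights * r ** 2))

def scad_penalty(beta, lam, a=3.7):
    """Elementwise SCAD penalty (standard three-piece form, requires a > 2)."""
    t = np.abs(np.asarray(beta, dtype=float))
    small = lam * t                                   # |beta| <= lam
    middle = (2.0 * a * lam * t - t ** 2 - lam ** 2) / (2.0 * (a - 1.0))
    large = np.full_like(t, lam ** 2 * (a + 1.0) / 2.0)  # |beta| > a * lam
    return np.where(t <= lam, small, np.where(t <= a * lam, middle, large))

def sample_expectile(y, tau, n_iter=100):
    """Sample tau-expectile via iteratively reweighted means."""
    y = np.asarray(y, dtype=float)
    m = y.mean()
    for _ in range(n_iter):
        w = np.where(y < m, 1.0 - tau, tau)
        m = np.sum(w * y) / np.sum(w)
    return float(m)

y = np.array([0.0, 1.0, 2.0, 10.0])
for tau in (0.1, 0.5, 0.9):
    print(tau, sample_expectile(y, tau))
```

At `tau = 0.5` the expectile coincides with the mean; moving `tau` toward 0 or 1 shifts the fit toward the lower or upper part of the distribution, which is how a set of expectile weights can probe heterogeneity. The SCAD penalty is constant beyond `a * lam`, so large coefficients are not shrunk further.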
Abstract: Generalized additive partial linear models (GAPLM) have been widely used for flexible modeling of various types of response. In practice, missing data commonly occur in studies of economics, medicine, and public health. We address the problem of identifying and estimating GAPLM when the response variable is nonignorably missing. Three types of monotone missing-data mechanism are assumed: the logistic model, the probit model, and the complementary log-log model. In this situation, the likelihood based on the observed data may not be identifiable. In this article, we show that the parameters of interest are identifiable under very mild conditions, and then construct estimators of the unknown parameters and unknown functions using a likelihood-based approach, expanding the unknown functions as linear combinations of polynomial spline functions. We establish asymptotic normality for the estimators of the parametric components. Simulation studies demonstrate that the proposed inference procedure performs well in many settings. We apply the proposed method to the household income dataset from the Chinese Household Income Project Survey 2013.
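The three missing-data mechanisms named in this abstract correspond to three standard inverse link functions. The sketch below evaluates a response probability under each; the index `alpha + gamma * y` and all function names are assumptions for illustration, since the abstract does not give the paper's parameterization.

```python
import math

def logistic_cdf(u):
    """Inverse logit, written to avoid overflow for large |u|."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)

def probit_cdf(u):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def cloglog_cdf(u):
    """Inverse complementary log-log link: 1 - exp(-exp(u))."""
    return 1.0 - math.exp(-math.exp(min(u, 700.0)))  # cap avoids overflow

LINKS = {"logit": logistic_cdf, "probit": probit_cdf, "cloglog": cloglog_cdf}

def response_probability(y, alpha, gamma, link):
    """P(response observed | y) = F(alpha + gamma * y) (assumed index form).

    gamma != 0 makes the mechanism nonignorable: the chance of observing
    y depends on the possibly unobserved value of y itself.
    """
    return LINKS[link](alpha + gamma * y)

for name in LINKS:
    print(name, response_probability(1.0, alpha=0.0, gamma=0.5, link=name))
```

Each of the three functions is strictly increasing in its argument, so the response probability moves monotonically with `y` whenever `gamma` is nonzero.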