Structured Dimensionality Reduction for Additive Model Regression
Abstract: Additive models are regression methods that model the response variable as the sum of univariate transfer functions of the input variables. Their key benefits are accuracy and interpretability on many real-world tasks. Additive models are, however, poorly suited to problems involving a large number (e.g., hundreds) of input variables, where they become prone to overfitting and lose interpretability. In this paper, we introduce a novel framework for applying additive models to a large number of input variables. The key idea is to reduce the dimensionality of the task by deriving a small number of new covariates, each obtained as a linear combination of the inputs, where the linear weights are estimated with regard to the regression problem at hand. The weights are moreover constrained, both to prevent overfitting and to facilitate the interpretation of the derived covariates. We establish identifiability of the proposed model under mild assumptions and present an efficient approximate learning algorithm. Experiments on synthetic and real-world data demonstrate that our approach compares favorably to baseline methods in terms of accuracy, while resulting in models of lower complexity and yielding practical insights into high-dimensional real-world regression tasks. Our framework thus broadens the applicability of additive models to high-dimensional problems while maintaining their interpretability.
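The model structure summarized in the abstract can be written out as a short sketch. The notation is assumed here for illustration and is not taken from the paper: x in R^p is the input vector, w_k are the weight vectors defining the derived covariates z_k, g_k are univariate transfer functions, K is the number of derived covariates, and C stands for the constraint set on the weights, which the abstract does not specify. The intercept and noise terms are likewise assumptions.

```latex
% Standard additive model: one univariate transfer function per input variable.
y = \beta_0 + \sum_{j=1}^{p} f_j(x_j) + \varepsilon

% Sketch of the structured variant (assumed notation):
% K derived covariates, each a constrained linear combination of the inputs.
z_k = w_k^\top x, \qquad w_k \in \mathcal{C}, \qquad k = 1, \dots, K, \qquad K \ll p

% The response is then additive in the derived covariates rather than in the raw inputs.
y = \beta_0 + \sum_{k=1}^{K} g_k(z_k) + \varepsilon
```

Under this reading, the additive part involves only K univariate functions instead of p, which is where the reduction in model complexity would come from, while each z_k remains interpretable as a weighted combination of the original inputs.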