Penalised regression with multiple sources of prior effects

Armin Rauschenberger^{1,*}, Zied Landoulsi^{1}, Mark A. van de Wiel^{2,\dagger}, and Enrico Glaab^{1,\dagger}

^{1}Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette, Luxembourg.

^{2}Department of Epidemiology and Data Science (EDS), Amsterdam University Medical Centers (Amsterdam UMC), Amsterdam, The Netherlands.

^{*}To whom correspondence should be addressed.

^{\dagger}Mark A. van de Wiel and Enrico Glaab share senior authorship.

Abstract

In many high-dimensional prediction or classification tasks, complementary data on the features are available, e.g. prior biological knowledge on (epi)genetic markers. Here we consider tasks with numerical prior information that provides insight into the importance (weight) and the direction (sign) of the feature effects, e.g. regression coefficients from previous studies. We propose an approach for integrating multiple sources of such prior information into penalised regression. If suitable co-data are available, this improves the predictive performance, as shown by simulation and application. The proposed method is implemented in the R package ‘transreg’ (https://github.com/lcsb-bds/transreg, https://cran.r-project.org/package=transreg).

Full text (open access)

Rauschenberger et al. (2023). “Penalized regression with multiple sources of prior effects”. Bioinformatics 39(12):btad680. doi: 10.1093/bioinformatics/btad680.