22–25 Jul 2025
EAM2025
Atlantic/Canary timezone

Incorporating longitudinal variability in prediction models: a comparison of machine learning and logistic regression

24 Jul 2025, 11:00
15m
Faculty of Social Sciences and Communication (The Pyramid), Room 13

Oral Presentation · Statistical analyses · Session 20: "IA y M learning"

Speaker

Liza de Groot (Amsterdam UMC)

Abstract

Background: Clinical prediction models estimate the probability of health outcomes, thereby aiding decision-making. Incorporating longitudinal data can improve predictive accuracy, but challenges of complexity and interpretability often limit its use. While the predictive value of a predictor's mean and its change over time is well documented, the variability around this change remains underexplored. Traditional regression analyses, though interpretable, struggle with repeated-measurements data because of issues such as collinearity and the large number of potential predictors. Machine Learning (ML) methods, in this case Random Forest and Lasso regression, may handle repeated-measurements data better. This study evaluated the predictive value of three longitudinal parameters (mean, change, and variability) for a time-independent binary outcome and compared ML methods with logistic regression.
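For concreteness, a minimal sketch of how the three longitudinal parameters could be derived per subject from repeated measurements. Python/pandas is an assumption (the abstract does not specify software), and the column names id, time, and value are hypothetical:

```python
import numpy as np
import pandas as pd

def longitudinal_parameters(df, subject_col="id", time_col="time", value_col="value"):
    """Per-subject mean, change (slope of a linear trend over time), and
    variability (SD of the residuals around that subject-specific trend).
    Assumes at least three measurements per subject."""
    rows = []
    for subject, grp in df.groupby(subject_col):
        t = grp[time_col].to_numpy(dtype=float)
        y = grp[value_col].to_numpy(dtype=float)
        slope, intercept = np.polyfit(t, y, deg=1)          # linear change over time
        residuals = y - (intercept + slope * t)
        rows.append({subject_col: subject,
                     "mean": y.mean(),                       # overall level
                     "change": slope,                        # change over time
                     "variability": residuals.std(ddof=1)})  # variability around change
    return pd.DataFrame(rows)
```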
Methods: Random Forest, Lasso regression, and logistic regression were compared with respect to the predictors they selected, the interpretability of their results (predictor-outcome relationships), and their predictive performance (AUC and calibration curves). Depression (clinically significant symptoms) was the binary outcome, with 81 longitudinal parameters (mean, change, variability) as candidate predictors. Using data from the Longitudinal Aging Study Amsterdam (LASA), models were trained on 70% of the data and internally validated on the remaining 30%.
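A hedged sketch of such a comparison, assuming scikit-learn and placeholder data (X holding the 81 longitudinal parameters, y the binary depression outcome); the model settings shown here are illustrative, not the ones used in the study:

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

# 70% training / 30% internal validation split, stratified on the outcome.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=1)

models = {
    "logistic regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=5000)),
    "Lasso regression": make_pipeline(StandardScaler(),
                                      LogisticRegressionCV(penalty="l1", solver="saga",
                                                           Cs=20, max_iter=5000)),
    "Random Forest": RandomForestClassifier(n_estimators=500, random_state=1),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    p = model.predict_proba(X_test)[:, 1]
    print(f"{name}: test AUC = {roc_auc_score(y_test, p):.3f}")
```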
Results: All methods identified similar important predictors, including variability parameters. Analyses incorporating variability parameters achieved slightly higher AUCs than those without. The regression coefficients from Lasso regression and logistic regression were consistent with the predictor-outcome relationships reflected in the Partial Dependence Plots from Random Forest. Predictive performance was comparable across methods (test AUC: 0.768–0.775). Calibration curves revealed overestimation, with predicted probabilities remaining low (<0.6).
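The coefficient-versus-partial-dependence comparison could be reproduced roughly as follows, reusing the objects from the previous snippet; "gds_variability" is a hypothetical predictor name, and X_test/X_train are assumed to be pandas DataFrames with named columns:

```python
import pandas as pd
from sklearn.inspection import PartialDependenceDisplay

# Shape of the Random Forest's predictor-outcome relationship for one predictor ...
PartialDependenceDisplay.from_estimator(models["Random Forest"], X_test,
                                        features=["gds_variability"])

# ... compared with the sign of the corresponding Lasso coefficient
# (on the standardized scale, because of the StandardScaler in the pipeline).
lasso = models["Lasso regression"][-1]
coefs = pd.Series(lasso.coef_.ravel(), index=X_train.columns)
print(coefs["gds_variability"])
```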
Discussion: The high false-negative rates and low predicted probabilities are probably a result of the imbalanced dataset (13.05% depression prevalence). Sensitivity analyses with Random Under-Sampling of the majority class yielded predicted prevalences closer to the observed prevalence and greater variation in the predicted probabilities on the calibration curves; however, AUCs did not improve.
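Such a sensitivity analysis could look roughly like this, assuming the imbalanced-learn package (the abstract does not name the implementation used) and the objects from the snippets above:

```python
from imblearn.under_sampling import RandomUnderSampler
from sklearn.calibration import calibration_curve
from sklearn.metrics import roc_auc_score

# Under-sample the majority (non-depressed) class in the training set only.
rus = RandomUnderSampler(random_state=1)
X_rus, y_rus = rus.fit_resample(X_train, y_train)

model = models["logistic regression"]
model.fit(X_rus, y_rus)
p = model.predict_proba(X_test)[:, 1]

print("observed prevalence :", y_test.mean())
print("predicted prevalence:", p.mean())
print("test AUC            :", roc_auc_score(y_test, p))
prob_true, prob_pred = calibration_curve(y_test, p, n_bins=10)  # calibration-curve points
```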
Conclusion: Advanced ML techniques did not outperform logistic regression in predictive performance. However, incorporating the variability of longitudinal predictors around their change over time is critical for improving clinical prediction models. This is particularly relevant in contexts with long follow-up periods, where predictors are likely to change over time on average, e.g., when following an aging population.

Oral presentation Incorporating longitudinal variability in prediction models: a comparison of machine learning and logistic regression
Author Liza de Groot
Affiliation Department of Epidemiology and Data Science, Amsterdam UMC, Location VUmc, Amsterdam, the Netherlands
Keywords Longitudinal predictors, Machine learning, Prediction

Primary author

Liza de Groot (Amsterdam UMC)
