Statistical issues in survival analysis (Part XVVIII)

February 26, 2024

TABLE 1. Summary of the available methods for survival regression with competing risks (CR).

Model                                            Type             Proportional hazards (PH)   High dimensions   Missing data

Approaches based on a cause-specific hazard specification
Cox proportional CS hazard                       Semiparametric   ✓                           ✗                 ✗
Lunn–McNeil                                      Semiparametric   ✓                           ✗                 ✗
Penalized Cox PH                                 Semiparametric   ✓                           ✓                 ✗
Cox model-based boosting                         Semiparametric   ✓                           ✓                 ✗
Cox likelihood-based boosting                    Semiparametric   ✓                           ✓                 ✗

Approaches based on the CIF
Fine–Gray                                        Semiparametric   ✓                           ✗                 ✗
Penalized proportional subdistribution hazard    Semiparametric   ✓                           ✓                 ✗

In an article that appeared in Biometrical Journal, Monterrubio-Gomez et al. presented a review of statistical and machine learning methods for competing risks (CR) in survival analysis. Competing risks models have traditionally been used in survival analysis when there is more than one mutually exclusive event of interest. Although this has been an active area of research, software implementations of the methods have been limited. The authors aimed to summarize the current landscape of CR approaches developed in the statistics and machine learning communities.

They first briefly reviewed the methodology, from cumulative incidence functions to subdistribution hazard functions, with particular attention to the Fine and Gray method. Regression models based on latent failure times also exist. In Table 1, they summarized the available methods for survival regression with CR.

They then discussed approaches based on a cause-specific (CS) hazard specification. These include penalized regression with different penalties, as well as boosting, which aims to turn a weak learner into a strong learner and can be applied to CR data under a Cox proportional hazards (CPH) model with a CS specification. Under the same paradigm, they discussed the Lunn–McNeil joint model, which comes in two versions: a stratified approach, equivalent to fitting a separate CPH model for each event type, and an unstratified approach, which captures event-specific deviations with respect to a reference event but assumes the same baseline hazard across event types. Finally, they discussed approaches based on the cumulative incidence function (CIF), using one of the following methods: penalized regression, boosting (with a subdistribution hazard boosting approach), pseudo-values, direct binomial regression, parametric-constrained CIF, dependent Dirichlet processes, survival multitask boosting (SMTBoost), and derivative-based neural network modeling (DeSurv).
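The CIF-based approaches above all target the cumulative incidence function F_k(t) = P(T ≤ t, cause = k). As a point of reference, here is a minimal sketch of its standard nonparametric (Aalen–Johansen) estimate; it is not taken from the review. It assumes status coding 0 = censored and 1, 2, ... = event type, and computes F_k(t) as the running sum of S(t_j−) d_kj / n_j, where S is the all-cause Kaplan–Meier curve, d_kj the number of type-k events and n_j the number at risk at time t_j. For contrast, it also computes the naive 1 − Kaplan–Meier estimate that treats competing events as censored, which overstates the cumulative incidence.

```python
from collections import Counter


def aalen_johansen_cif(times, causes, cause):
    """Nonparametric (Aalen-Johansen) cumulative incidence for one event type.

    times  -- observed follow-up times
    causes -- 0 = censored, 1, 2, ... = event type observed at that time
    cause  -- the event type whose CIF is estimated
    Returns (t, F_cause(t)) at every time where an event of any type occurs.
    """
    leaving = Counter(times)  # subjects leaving the risk set at each time
    d_any = Counter(t for t, c in zip(times, causes) if c != 0)
    d_k = Counter(t for t, c in zip(times, causes) if c == cause)
    n_at_risk = len(times)
    surv = 1.0  # all-cause Kaplan-Meier estimate S(t-)
    cif = 0.0
    path = []
    for t in sorted(leaving):
        if d_any[t]:
            cif += surv * d_k[t] / n_at_risk    # jump in F_k at time t
            surv *= 1.0 - d_any[t] / n_at_risk  # update all-cause survival
            path.append((t, cif))
        n_at_risk -= leaving[t]
    return path


def naive_one_minus_km(times, causes, cause):
    """1 - Kaplan-Meier for one cause, treating competing events as censored (biased upward)."""
    leaving = Counter(times)
    d_k = Counter(t for t, c in zip(times, causes) if c == cause)
    n_at_risk = len(times)
    surv = 1.0
    path = []
    for t in sorted(leaving):
        if d_k[t]:
            surv *= 1.0 - d_k[t] / n_at_risk
            path.append((t, 1.0 - surv))
        n_at_risk -= leaving[t]
    return path


times = [1, 2, 3, 4, 5, 6]
causes = [1, 2, 1, 0, 2, 1]
print(aalen_johansen_cif(times, causes, 1)[-1])  # final CIF for cause 1
print(naive_one_minus_km(times, causes, 1)[-1])  # naive estimate for cause 1
```

On this toy dataset (six subjects with times 1 to 6 and causes 1, 2, 1, censored, 2, 1), the Aalen–Johansen estimates end at 7/12 for cause 1 and 5/12 for cause 2, summing to 1 − S(6) = 1, whereas the naive 1 − Kaplan–Meier curve for cause 1 alone climbs to 1.0.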

They also discussed approaches based on a latent survival times specification. The first was the deep multitask Gaussian process, which can be used to infer a posterior distribution for the survival times. The second was deep survival machines, which embeds a parametric model for survival times in a deep learning framework. The third was Bayesian Lomax delegate racing, in which latent times are assumed to be independent and exponentially distributed. It was not clear exactly how CR fits into each of these methods. They then discussed methods that do not fall into the categories above: mixture models, which tend to rely on marginal probabilities; tree-based mixture models, a Bayesian semiparametric approach; the vertical modeling approach, which fits two models (one for the hazard function and one for the CS hazard); and random survival forests, which can also use a splitting rule based on Gray's test that maximizes differences in the CIF.
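The latent-failure-time view behind these methods can be made concrete with a small simulation. The sketch assumes, as in the description of Bayesian Lomax delegate racing above, that each subject has one independent, exponentially distributed latent time per cause, and that only the smallest latent time and the cause that produced it are observed. It is a data-generating illustration only, not an implementation of any of the three models, and the rates are hypothetical. Under these assumptions the CIF has the closed form F_k(t) = (λ_k / Λ)(1 − exp(−Λt)) with Λ = Σ_j λ_j, which the simulation can be checked against.

```python
import math
import random


def simulate_latent_cr(rates, n, seed=0):
    """Draw n (time, cause) pairs from independent exponential latent failure times.

    rates -- one hazard rate per cause; causes are numbered 1, 2, ...
    Only the earliest latent time is observed, together with the cause that produced it.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        latent = [rng.expovariate(lam) for lam in rates]
        t = min(latent)
        data.append((t, latent.index(t) + 1))
    return data


def exact_cif(rates, cause, t):
    """Closed-form CIF under independent exponentials: (lam_k / Lam) * (1 - exp(-Lam * t))."""
    total = sum(rates)
    return rates[cause - 1] / total * (1.0 - math.exp(-total * t))


def empirical_cif(data, cause, t):
    """Share of simulated subjects who failed from `cause` by time t (no censoring)."""
    return sum(1 for s, k in data if s <= t and k == cause) / len(data)


rates = [0.5, 1.5]  # hypothetical rates for causes 1 and 2
data = simulate_latent_cr(rates, 20000, seed=1)
print(exact_cif(rates, 1, 1.0), empirical_cif(data, 1, 1.0))
```

With rates 0.5 and 1.5, about a quarter of subjects should fail from cause 1 (λ_1 / Λ = 0.25), and the empirical and exact CIF at t = 1 should agree to within simulation error.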

They also discussed CR survival models for discrete time-to-event data, which are essentially a CR extension of the proportional odds model. In machine learning, these have been extended via neural networks or Bayesian additive regression trees. Providing open-source software has been critical, and the authors discussed several R libraries. The article reads more like a catalog of all available methods, and the section on practical considerations does not really guide the reader toward a final choice. That guidance comes only with the final example, in which the authors compared the approaches from Sections 4 and 5 and found that most agreed with one another. Among the methods that allowed for risk prediction, most predictions were identical, with the strongest discrepancies arising from the random survival forest (RSF) and the dependent Dirichlet process. These conclusions were based on a single dataset rather than on simulations. The authors then discussed goodness-of-fit measures: calibration, discrimination, and the Brier score.
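For the discrete-time models, one common formulation (assumed here; not necessarily the one used in the review) gives each interval a multinomial logit hazard h_k(t) = exp(η_k) / (1 + Σ_j exp(η_j)), with "no event" as the reference category. With a single cause this reduces to the logit (proportional odds) discrete hazard model. The CIF then accumulates as F_k(t) = Σ_{s ≤ t} h_k(s) S(s − 1), where S(t) = Π_{s ≤ t} (1 − Σ_k h_k(s)). The linear predictor values below are hypothetical.

```python
import math


def multinomial_logit_hazards(etas):
    """Cause-specific discrete hazards for one interval from linear predictors.

    etas -- one linear predictor eta_k per cause; 'no event' is the reference category.
    """
    exps = [math.exp(eta) for eta in etas]
    denom = 1.0 + sum(exps)
    return [e / denom for e in exps]


def discrete_cif(hazards_by_interval):
    """Accumulate S(t) and each cause's CIF over consecutive intervals.

    hazards_by_interval[t][k] -- P(cause k event in interval t | at risk at its start)
    Returns a list of (S(t), [F_1(t), ..., F_K(t)]) for each interval.
    """
    n_causes = len(hazards_by_interval[0])
    surv = 1.0
    cif = [0.0] * n_causes
    path = []
    for h in hazards_by_interval:
        for k in range(n_causes):
            cif[k] += surv * h[k]  # cause-k event in this interval, given survival so far
        surv *= 1.0 - sum(h)       # no event of any cause in this interval
        path.append((surv, list(cif)))
    return path


# Two intervals with hypothetical linear predictors (eta_1, eta_2) = (0, 0)
h = multinomial_logit_hazards([0.0, 0.0])
path = discrete_cif([h, h])
print(path[-1])
```

With both linear predictors at 0, each cause-specific hazard is 1/3 per interval, so after two intervals S = 1/9 and each CIF equals 4/9; S plus the CIFs sums to 1, as it must. In a regression model, η_k would typically be an interval-specific intercept plus covariate effects for cause k.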

Written by Usha Govindarajulu

Keywords: survival, competing risks, Cox model, machine learning, random survival forests, boosting, cause-specific hazard, cumulative incidence function

References

Monterrubio-Gomez K, Constantine-Cooke N, Vallejos CA (2024). A review on statistical and machine learning competing risks methods. Biometrical Journal. https://doi.org/10.1002/bimj.202300060
