TY - JOUR
T1 - Machine learning does not outperform traditional statistical modelling for kidney allograft failure prediction
AU - Truchot, Agathe
AU - Raynaud, Marc
AU - Kamar, Nassim
AU - Naesens, Maarten
AU - Legendre, Christophe
AU - Delahousse, Michel
AU - Thaunat, Olivier
AU - Buchler, Matthias
AU - Crespo, Marta
AU - Linhares, Kamilla
AU - Orandi, Babak J.
AU - Akalin, Enver
AU - Pujol, Gervacio Soler
AU - Silva, Helio Tedesco
AU - Gupta, Gaurav
AU - Segev, Dorry L.
AU - Jouven, Xavier
AU - Bentall, Andrew J.
AU - Stegall, Mark D.
AU - Lefaucheur, Carmen
AU - Aubert, Olivier
AU - Loupy, Alexandre
N1 - Funding Information:
The study was funded by an MSD Avenir grant. AL received funds from the French National Institute for Health and Medical Research (INSERM, action thématique incitative sur programme ATIP-Avenir 2016). OA received a grant from the Fondation Bettencourt Schueller.
Publisher Copyright:
© 2022 International Society of Nephrology
PY - 2023
Y1 - 2023
AB - Machine learning (ML) models have recently shown potential for predicting kidney allograft outcomes. However, their ability to outperform traditional approaches remains poorly investigated. Therefore, using large cohorts of kidney transplant recipients from 14 centers worldwide, we developed ML-based prediction models for kidney allograft survival and compared their prediction performance with that of a validated Cox-Based Prognostication System (CBPS). In a French derivation cohort of 4000 patients, candidate determinants of allograft failure, including donor, recipient and transplant-related parameters, were used as predictors to develop tree-based models (RSF, RSF-ERT, CIF), Support Vector Machine models (LK-SVM, AK-SVM) and a gradient boosting model (XGBoost). Models were externally validated in cohorts of 2214 patients from Europe, 1537 from North America, and 671 from South America. Among these 8422 kidney transplant recipients, 1081 (12.84%) lost their grafts after a median post-transplant follow-up time of 6.25 years (interquartile range 4.33-8.73). At seven years post-risk evaluation, the ML models achieved a C-index of 0.788 (95% bootstrap percentile confidence interval 0.736-0.833), 0.779 (0.724-0.825), 0.786 (0.735-0.832), 0.527 (0.456-0.602), 0.704 (0.648-0.759) and 0.767 (0.711-0.815) for RSF, RSF-ERT, CIF, LK-SVM, AK-SVM and XGBoost respectively, compared with 0.808 (0.792-0.829) for the CBPS. In validation cohorts, the ML models’ discrimination performance was in a range similar to that of the CBPS. Calibration of the ML models was similar to or less accurate than that of the CBPS. Thus, when using a transparent methodological pipeline in validated international cohorts, ML models, despite overall good performance, do not outperform a traditional CBPS in predicting kidney allograft failure. Hence, our current study supports the continued use of traditional statistical approaches for kidney graft prognostication.
KW - artificial intelligence
KW - prediction
KW - transplantation
UR - http://www.scopus.com/inward/record.url?scp=85146460351&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85146460351&partnerID=8YFLogxK
U2 - 10.1016/j.kint.2022.12.011
DO - 10.1016/j.kint.2022.12.011
M3 - Article
C2 - 36572246
AN - SCOPUS:85146460351
SN - 0085-2538
JO - Kidney International
JF - Kidney International
ER -