Article title:
Distributed learning on 20 000+ lung cancer patients – The Personal Health Train
ScienceDirect - Elsevier - Radiotherapy and Oncology 144 (2020) 189-200. doi:10.1016/j.radonc.2019.11.019
Timo M. Deist a,b,1, Frank J.W.M. Dankers a,c,1, Priyanka Ojha d, M. Scott Marshall d, Tomas Janssen d, Corinne Faivre-Finn e, Carlotta Masciocchi g, Vincenzo Valentini f,g, Jiazhou Wang h, Jiayan Chen h, Zhen Zhang h, Emiliano Spezi i,j, Mick Button j, Joost Jan Nuyttens k, René Vernhout k, Johan van Soest a, Arthur Jochems b, René Monshouwer c, Johan Bussink c, Gareth Price e,2, Philippe Lambin b,2, Andre Dekker a
Background and purpose: Access to healthcare data is indispensable for scientific progress and innovation.
Sharing healthcare data is time-consuming and notoriously difficult due to privacy and regulatory concerns.
The Personal Health Train (PHT) provides a privacy-by-design infrastructure connecting FAIR
(Findable, Accessible, Interoperable, Reusable) data sources and allows distributed data analysis and
machine learning. Patient data never leaves a healthcare institute.
Materials and methods: Lung cancer patient-specific databases (tumor staging and post-treatment survival
information) of oncology departments were translated according to a FAIR data model and stored
locally in a graph database. Software was installed locally to enable deployment of distributed machine
learning algorithms via a central server. The algorithms (implemented in MATLAB; code and documentation are publicly
available) preserve patient privacy because only summary statistics and regression coefficients are exchanged
with the central server. A logistic regression model to predict post-treatment two-year survival was
trained and evaluated by receiver operating characteristic curves (ROC), root mean square prediction
error (RMSE) and calibration plots.
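
The exchange of summary statistics and regression coefficients described above can be illustrated with a minimal Python sketch. It is not the authors' MATLAB implementation: it assumes a Newton-Raphson scheme in which each site returns only the gradient and Hessian of its local log-likelihood, and the function names (`local_summary`, `central_fit`), the single predictor, and the toy data are all hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def local_summary(rows, beta):
    """Runs at one site. Returns only aggregate statistics (gradient and
    Hessian terms of the local log-likelihood); patient rows stay local."""
    b0, b1 = beta
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y in rows:
        p = sigmoid(b0 + b1 * x)
        r = y - p            # residual
        w = p * (1.0 - p)    # Newton weight
        g0 += r
        g1 += r * x
        h00 += w
        h01 += w * x
        h11 += w * x * x
    return (g0, g1), (h00, h01, h11)

def central_fit(sites, iterations=25):
    """Runs at the central server. Sends the current coefficients to every
    site, sums the returned aggregates, and takes one Newton step per round."""
    beta = (0.0, 0.0)
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for rows in sites:  # in a real deployment this would be a remote call
            (a0, a1), (c00, c01, c11) = local_summary(rows, beta)
            g0 += a0
            g1 += a1
            h00 += c00
            h01 += c01
            h11 += c11
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det   # solve the 2x2 system H d = g
        d1 = (h00 * g1 - h01 * g0) / det
        beta = (beta[0] + d0, beta[1] + d1)
    return beta

# Hypothetical toy data: (predictor value, survived two years: 1 or 0)
site_a = [(0.0, 0), (1.0, 0), (2.0, 1), (3.0, 0), (4.0, 1)]
site_b = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1), (5.0, 1), (0.5, 1)]

beta_federated = central_fit([site_a, site_b])
print("intercept, slope:", beta_federated)
```

Because the log-likelihood is a sum over patients, summing the per-site gradients and Hessians gives the same Newton step as pooling the data, so in this sketch the distributed fit matches a centralized fit up to floating-point rounding.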
Results: In 4 months, we connected databases with 23 203 patient cases across 8 healthcare institutes in
5 countries (institutes in Amsterdam, Cardiff, Maastricht, Manchester, Nijmegen, Rome, Rotterdam and Shanghai) using
the PHT. Summary statistics were computed across databases. A distributed logistic regression model
predicting post-treatment two-year survival was trained on 14 810 patients treated between 1978 and
2011 and validated on 8 393 patients treated between 2012 and 2015.
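
As an illustration of how validation metrics such as RMSE and the area under the ROC curve might be obtained without moving patient records, the following Python sketch computes a pooled RMSE from per-site sums of squared errors and a per-site AUC. The aggregation scheme, the function names, and the toy predictions are assumptions for illustration and are not taken from the paper.

```python
import math

def local_sse(y_true, y_prob):
    """Runs at one site: returns the sum of squared errors and the case count,
    which are aggregates rather than patient-level records."""
    sse = sum((y - p) ** 2 for y, p in zip(y_true, y_prob))
    return sse, len(y_true)

def pooled_rmse(summaries):
    """Runs centrally: combines the per-site (sse, n) pairs into one RMSE."""
    total_sse = sum(s for s, _ in summaries)
    total_n = sum(n for _, n in summaries)
    return math.sqrt(total_sse / total_n)

def auc(y_true, y_prob):
    """Area under the ROC curve as the fraction of (positive, negative) pairs
    ranked correctly; ties count as half. Computed within a single site."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = 0.0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                wins += 1.0
            elif pp == pn:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical outcomes (1 = survived two years) and predicted probabilities
site_a_y, site_a_p = [1, 0, 1, 0], [0.9, 0.2, 0.6, 0.7]
site_b_y, site_b_p = [1, 1, 0], [0.8, 0.4, 0.3]

rmse_all = pooled_rmse([local_sse(site_a_y, site_a_p),
                        local_sse(site_b_y, site_b_p)])
auc_a = auc(site_a_y, site_a_p)
auc_b = auc(site_b_y, site_b_p)
print("pooled RMSE:", rmse_all, "AUC per site:", auc_a, auc_b)
```

RMSE pools exactly from sums and counts, whereas a pooled AUC depends on ranking cases across sites, which is why this sketch reports AUC per site only.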
Conclusion: The PHT infrastructure demonstrably overcomes patient privacy barriers to healthcare data
sharing and enables fast data analyses across multiple institutes from different countries with different
regulatory regimes. This infrastructure promotes global evidence-based medicine while prioritizing patient privacy.
Keywords: Lung cancer | Big data | Distributed learning | Federated learning | Machine learning | Survival analysis | Prediction modeling | FAIR data