Automated real-world data integration improves cancer outcome prediction
The Use of Machine Learning for Analyzing Real-World Data in Disease Prediction and Management: Systematic Review
Review
1College of Medicine, Qassim University, Buraidah, Saudi Arabia
2Applied Biotechnology, Faculty of Chemistry, Warsaw University of Technology, Warsaw, Poland
3Malaysian Health Technology Assessment Section, Medical Development Division, Ministry of Health Malaysia, Wilayah Persekutuan Putrajaya, Malaysia
4Health Economics and Health Technology Assessment, School of Health and Wellbeing, University of Glasgow, Glasgow, United Kingdom
5Health Sciences Research Center, Imam Mohammad ibn Saud Islamic University, Riyadh, Saudi Arabia
*these authors contributed equally
Corresponding Author:
Nasser Alotaiq, PhD
Health Sciences Research Center
Imam Mohammad ibn Saud Islamic University
Othman Bin Affan Rd. Al-Nada 13317
Riyadh
Saudi Arabia
Phone: 966 50 411 9153
Email: naalotaiq@imamu.edu.sa
Abstract
Background: Machine learning (ML) and big data analytics are rapidly transforming health care, particularly disease prediction, management, and personalized care. With the increasing availability of real-world data (RWD) from diverse sour
2025 Agenda
June 25, 2025
Biodata Stage
- We'll discuss the importance of making multi-modal, real-world cancer patient data available and interpretable to researchers and physician scientists, and the computational tools required to analyse the data effectively.
- Discuss methods of automated real-world data integration.
- Demonstrate how state-of-the-art platforms are allowing the assimilation, storage and access of huge clinical data sets. See how this data is being used to discover new clinically actionable cancer drivers and identify new opportunities for precision medicine.
- Cover the importance of having structured clinical data, the tools being used to structure data, and the role of AI/ML (e.g. LLMs).
Automated real-world data integration improves cancer outcome prediction
(Memorial Sloan Kettering Cancer Center)
(Memorial Sloan Kettering Cancer Center; Dana-Farber Cancer Institute)
A research team from Memorial Sloan Kettering Cancer Center (MSK) is demonstrating that cancer outcome predictions can be improved by breaking down hospitals' traditional data silos and analyzing the information—including physicians' clinical notes—with the help of artificial intelligence (AI).
A new study describes a real-time, automated approach developed at MSK that brings together doctors' free-text notes, clinical treatment and outcomes data, patient demographic data, and tumor genomic data from the MSK-IMPACT platform to identify biomarkers that can predict outcomes and likely responses to therapy. Dubbed MSK-CHORD (for Clinicogenomic Harmonized Oncologic Real-World Dataset), the effort is the largest of its kind, combining data from nearly 25,000 patients with non-small cell lung, breast, colorectal, prostate, and pancreatic cancers.
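To make the idea of bringing separate data sources together concrete, here is a minimal sketch of joining per-patient records from several modalities on a shared identifier. It is an illustration only, not the MSK-CHORD pipeline: the record layouts, field names, and the `harmonize` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical per-modality records keyed by a shared patient identifier.
demographics = [
    {"patient_id": "P1", "age": 64, "sex": "F"},
    {"patient_id": "P2", "age": 71, "sex": "M"},
]
genomics = [
    {"patient_id": "P1", "altered_genes": ["KRAS", "TP53"]},
]
note_annotations = [
    {"patient_id": "P1", "progression": True},
    {"patient_id": "P2", "progression": False},
]


def harmonize(**modalities):
    """Merge lists of records from each modality into one dict per patient."""
    merged = defaultdict(dict)
    for name, records in modalities.items():
        for record in records:
            # Store every field except the join key under the modality name.
            fields = {k: v for k, v in record.items() if k != "patient_id"}
            merged[record["patient_id"]][name] = fields
    return dict(merged)


cohort = harmonize(
    demographics=demographics,
    genomics=genomics,
    notes=note_annotations,
)
```

In this sketch a patient with no record in a given modality simply lacks that key (here, `P2` has no `genomics` entry), which keeps missing data explicit rather than silently filled in.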
The study was led by co-first authors Justin Jee, MD, Ph.D., Christopher Fong, Ph.D., Karl Pichotta, Ph.D., Thinh Ngoc Tran, Ph.D., and Anisha Luthra, and overseen by senior author Nikolaus Schultz, Ph.D., Director of MSK's Cancer Data Science Initiative. It is published in the journal Nature.
The team found that cancer outcome predict
Structured, Multimodal Real-World Data Can Improve Cancer Outcome Prediction
Machine learning models are revolutionizing cancer outcome predictions by analyzing vast amounts of patient data to identify patterns and insights that traditional methods often miss. However, efforts to build these models are limited by manual extraction of key data elements from unstructured data such as clinical notes and pathology reports. This process is time-consuming and error-prone, and it limits scalability.
A recent study published in Nature demonstrates a promising way past these obstacles: using AI to automatically annotate free-text clinician notes and reports, so that robust, high-performing machine learning models can be built from clinicogenomic data.
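As a rough picture of what "annotating" a free-text note means, the sketch below turns a sentence into a structured label. It uses simple keyword rules as a stand-in for a trained language model and is not the method used in the study; the patterns, the `annotate_note` function, and the example notes are hypothetical.

```python
import re

# Hypothetical phrase patterns; a real system would use a trained model.
FINDING = r"(progression|progressive disease|new metastas[ie]s)"
PROGRESSION_PATTERN = re.compile(rf"\b{FINDING}\b", re.IGNORECASE)
NEGATION_PATTERN = re.compile(rf"\bno (evidence of )?{FINDING}\b", re.IGNORECASE)


def annotate_note(text):
    """Return a structured progression label for one free-text note."""
    # Check negated mentions first so "no progression" is not read as positive.
    if NEGATION_PATTERN.search(text):
        return {"progression": False}
    if PROGRESSION_PATTERN.search(text):
        return {"progression": True}
    # No mention either way: leave the label unknown.
    return {"progression": None}


notes = [
    "No evidence of progression on CT.",
    "Interval progression of liver lesions.",
    "Patient tolerating therapy well.",
]
labels = [annotate_note(note) for note in notes]
```

Rules like these break down quickly on real clinical language (hedging, history, family mentions), which is part of why automated annotation at scale relies on learned models instead.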
Using Multimodal, Structured Data to Improve Model Performance and Identify Biomarkers
The study, conducted by a team of researchers at Memorial Sloan Kettering Cancer Center (MSK), introduces the MSK-CHORD dataset, a compilation of real-world clinical, radiographic, histopathologic, laboratory, and genomic sequencing data from 24,950 patients. The researchers achieved this by combining automatically generated natural