The mission of the Health Data Science group is to produce clinically actionable insights from observational health data by enabling data-driven healthcare. Improved interoperability of data is a prerequisite for this mission.

The Health Data Science (HDS) group at the Department of Medical Informatics at the Erasmus MC aims to develop analytical methods and tools to enable data-driven healthcare. We perform research in clinical characterisation and population-level effect estimation, and we apply advanced machine learning and statistical methods to develop clinical prediction models at scale in distributed data networks. A prerequisite for this work is improved interoperability of health data.

Improving interoperability of data

The use of multiple databases in a multi-center study is severely hampered by a variety of challenges: for example, each database has its own structure and uses different terminology systems. In an ideal world, a harmonized approach would be available by which data and results from different databases could be combined to answer a specific research question. Standardized data models and common analytical tools should become a de facto standard. Our group therefore collaborates closely with the Observational Health Data Sciences and Informatics (OHDSI) initiative, which is responsible for the development of the OMOP Common Data Model (OMOP-CDM), and leads the OHDSI European Chapter to support adoption of the OMOP-CDM in Europe. Moreover, we lead the European Health Data and Evidence Network (EHDEN) project, which is standardizing a large volume of data sources in Europe to the OMOP-CDM.

Clinical Characterisation

This research domain focusses on the question: "What happened to them?". We develop and apply methods to describe the patient journey over time. Examples of this type of study are treatment utilization, disease trajectory, and descriptive co-morbidity analyses. We believe descriptive analyses have great value for increasing insight into current practice across the globe and for achieving more outcome-driven healthcare. Our group is currently involved in methodological research on disease trajectories and treatment patterns, and we perform clinical studies such as drug utilization studies.

Patient-Level Prediction

This research domain focusses on the question: "What will happen to them?". Healthcare currently generates a huge amount of patient-specific information contained in Electronic Health Records (EHR). This includes structured data in the form of diagnoses, medications, and laboratory test results, as well as unstructured data contained in clinical narratives. This opens unprecedented possibilities for research and ultimately patient care. Effective exploitation of these massive datasets demands novel methodology and an interdisciplinary approach. This is where our group wants to play an important role. We aim to assess how much predictive performance can be gained by leveraging the large amount of data originating from the complete EHR of a patient.

Population-Level Effect Estimation

This research domain focusses on the question: "What are the causal effects?". This field, also called counterfactual prediction or causal inference, aims to assess the safety and effectiveness of treatments. It requires proper correction for confounding, e.g., by applying large-scale propensity score modelling. Examples are questions like: "Does metformin cause lactic acidosis?" or "Does metformin cause lactic acidosis more than glyburide?". We apply these methods at a large scale in different disease areas. Methodological work in our group focusses on the heterogeneity of treatment effect, for which we are building pipelines against the OMOP-CDM.
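To illustrate why correction for confounding matters, the following is a minimal sketch of inverse probability of treatment weighting on a made-up cohort. It is not the group's pipeline: the data, the function names, and the single binary confounder are all assumptions chosen for illustration, and the propensity score is simply the treated fraction within each confounder stratum rather than a large-scale regularized model. In this toy cohort the treatment has no effect within either stratum, yet the crude comparison suggests harm.

```python
from collections import defaultdict

# Hypothetical toy cohort, NOT real data.
# Each row: (confounder x, treated t, outcome y, number of patients).
# Within each stratum of x, treated and untreated share the same risk
# (0.1 for x=0, 0.4 for x=1), so the true treatment effect is zero.
ROWS = [
    (0, 1, 1, 2), (0, 1, 0, 18),
    (0, 0, 1, 8), (0, 0, 0, 72),
    (1, 1, 1, 32), (1, 1, 0, 48),
    (1, 0, 1, 8), (1, 0, 0, 12),
]


def crude_risk_difference(rows):
    """Risk in treated minus risk in untreated, ignoring the confounder."""
    n = {0: 0, 1: 0}
    events = {0: 0, 1: 0}
    for _, t, y, count in rows:
        n[t] += count
        events[t] += y * count
    return events[1] / n[1] - events[0] / n[0]


def propensity_scores(rows):
    """Estimate P(treated | x) as the treated fraction in each stratum of x."""
    treated = defaultdict(int)
    total = defaultdict(int)
    for x, t, _, count in rows:
        total[x] += count
        treated[x] += t * count
    return {x: treated[x] / total[x] for x in total}


def ipw_risk_difference(rows):
    """Risk difference after inverse probability of treatment weighting."""
    ps = propensity_scores(rows)
    weight_sum = {0: 0.0, 1: 0.0}
    weighted_events = {0: 0.0, 1: 0.0}
    for x, t, y, count in rows:
        # Treated patients get weight 1/ps, untreated get 1/(1 - ps).
        w = 1.0 / ps[x] if t == 1 else 1.0 / (1.0 - ps[x])
        weight_sum[t] += w * count
        weighted_events[t] += w * y * count
    return weighted_events[1] / weight_sum[1] - weighted_events[0] / weight_sum[0]


if __name__ == "__main__":
    print("crude risk difference:", round(crude_risk_difference(ROWS), 3))
    print("IPW risk difference:  ", round(ipw_risk_difference(ROWS), 3))
```

Patients with x=1 are both more likely to be treated and more likely to have the outcome, which produces the spurious crude difference; weighting by the inverse of the propensity score balances x across the two groups and removes it.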


Prediction Model Literature

Figure 1. The total number of prediction papers in PubMed. This demonstrates the very strong increase in publications in this research domain.

Figure 2. Wordle presenting the disease areas in prediction literature. We created this by parsing the diseases in the abstract titles using natural language processing.