Our mission is to apply Data Science to support clinical decision making and enable data-driven healthcare. Improved interoperability of data is necessary for this exciting goal.



We perform methodological research in clinical prediction modelling and contribute to the development of open-source analytical tools for this research domain.



Young health data scientists, medical students, and healthcare professionals need more education on the opportunities and limitations of big data in healthcare.


Welcome to the Health Data Science group

The Health Data Science (HDS) group aims to develop analytical methods and tools to enable data-driven healthcare. We apply advanced machine learning and statistical methods to develop clinical prediction models at scale in distributed data networks.


Clinical decision making is a complicated task in which the clinician has to infer a diagnosis or treatment pathway from the patient's available medical history and the current clinical guidelines. Clinical prediction models have been developed to support this decision-making process and are used in clinical practice across a wide spectrum of specialties. These models predict a diagnostic or prognostic outcome from a combination of patient characteristics, such as demographic information, disease history, and treatment history. The number of publications describing clinical prediction models has increased strongly over the last 10 years, as shown in the figures below.

Surprisingly, most currently used models are estimated using small datasets and contain a limited set of patient characteristics. This low sample size, and thus low statistical power, forces the data analyst to make stronger modeling assumptions. The selection of the often limited set of patient characteristics is strongly guided by the expert knowledge at hand. This contrasts sharply with the reality of modern medicine wherein patients generate a rich digital trail, which is well beyond the power of any medical practitioner to fully assimilate.

Presently, healthcare is generating a huge amount of patient-specific information contained in Electronic Health Records (EHRs). This includes structured data, such as diagnoses, medications, and laboratory test results, as well as unstructured data contained in clinical narratives. This opens unprecedented possibilities for research and, ultimately, patient care. Effective exploitation of these massive datasets demands novel methodology and an interdisciplinary approach. This is where our group wants to play an important role. We aim to assess how much predictive performance can be gained by leveraging the large amount of data in a patient's complete EHR.

However, the actual use of these databases in multi-center studies is severely hampered by a variety of challenges. For example, each database has its own structure and uses different terminology systems. In an ideal world, a harmonized approach would be available by which data and results from different databases could be combined to answer a specific research question. Standardized data models and common analytical tools should become a de facto standard. Our group therefore collaborates closely with the Observational Health Data Sciences and Informatics (OHDSI) initiative (www.ohdsi.org), which is responsible for the development of the OMOP Common Data Model (OMOP-CDM). Our group also leads the OHDSI European Chapter (www.ohdsi-europe.org) to support its adoption in Europe.


Prediction Model Literature

Figure 1. The total number of prediction papers in PubMed. This demonstrates the very strong increase in publications in this research domain.

Figure 2. Word cloud (Wordle) of the disease areas in the prediction literature. We created this by parsing the diseases in the abstract titles using natural language processing.


Join us!
We are always looking for talented people.