


Curriculum: Starter Track

Asking (the right) research questions in data science


“An approximate answer to the right question is worth a great deal more than a precise answer to the wrong question”, said the renowned statistician John Tukey as early as 1969.

In my experience of statistical consulting, much confusion arises from a mismatch between the research question and the data or methods. More fundamentally, the research question is often not clearly articulated at the outset, perhaps because researchers anticipate that the right question can only be answered approximately. But how can we discuss which data and methods are suitable if we are unclear or vague about the question to be answered? In the era of big data, characterised by an abundance of data and a similar abundance of methods for analysing them, asking the right question takes on a new urgency.

Learning contents

In this course we will discuss the different types of research questions that arise in applied fields within data science, such as psychology, epidemiology, genetics, or the political and social sciences. The key distinction is between questions that are (i) descriptive, (ii) predictive, or (iii) causal (i.e. about counterfactual prediction). We will consider how the type of research question relates to the choice of, and requirements on, data and methods of analysis, and to the need for more or less specific subject-matter background knowledge. We will see how starting with a clear and explicit research question helps with assessing, and possibly avoiding, potential sources of (structural) bias in answering it.

Key topics that will be covered:

- Types of research questions (descriptive, predictive, causal/counterfactual)

- Issues of validity and structural bias (e.g. selection, confounding, ascertainment)

- The target trial principle
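
As a rough illustration of why the distinction between descriptive and causal questions matters, here is a minimal sketch in Python on simulated data. Everything in it is an assumption made for illustration and is not taken from the course material: the binary confounder `z`, the treatment `a`, the outcome `y`, and the effect sizes (a true causal effect of 1.0 and a confounder effect of 2.0). It contrasts a descriptive difference in means with a confounder-adjusted (standardised) difference.

```python
import random
from statistics import mean


def simulate(n=20000, seed=0):
    """Simulate (z, a, y) rows in which z confounds the effect of a on y.

    All parameters are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0                    # confounder
        a = 1 if rng.random() < (0.8 if z else 0.2) else 0    # treatment depends on z
        y = 1.0 * a + 2.0 * z + rng.gauss(0.0, 1.0)           # true effect of a is 1.0
        rows.append((z, a, y))
    return rows


def naive_difference(rows):
    """Descriptive contrast: mean outcome of treated minus untreated."""
    treated = [y for _, a, y in rows if a == 1]
    untreated = [y for _, a, y in rows if a == 0]
    return mean(treated) - mean(untreated)


def standardised_difference(rows):
    """Causal contrast under the assumption that z is the only confounder:
    stratum-specific differences, weighted by the marginal distribution of z."""
    n = len(rows)
    total = 0.0
    for level in (0, 1):
        stratum = [(a, y) for z, a, y in rows if z == level]
        treated = [y for a, y in stratum if a == 1]
        untreated = [y for a, y in stratum if a == 0]
        total += (len(stratum) / n) * (mean(treated) - mean(untreated))
    return total


rows = simulate()
naive = naive_difference(rows)
adjusted = standardised_difference(rows)
print(f"descriptive difference in means: {naive:.2f}")
print(f"standardised (adjusted) difference: {adjusted:.2f}")
```

Under these assumed parameters, the unadjusted difference is about 2.2 in expectation, while the standardised difference is close to the true effect of 1.0. Both numbers are correct answers, but to different questions: the first describes how the groups differ, the second estimates what would change under intervention, and only if the assumption of no further confounding holds.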

Learning outcomes

Upon completion, participants of the course will be able to:

- categorise research questions as descriptive, predictive or causal

- elicit a research question by formulating a target trial

- determine implications for the required data and choice of appropriate methods

- identify possible threats to validity / sources of structural bias.

Prior knowledge


Further reading

- Hernán M.A., Hsu J. & Healy B. (2019). A Second Chance to Get Causal Inference Right: A Classification of Data Science Tasks. CHANCE, 32(1), 42–49.

- Hernán M.A. & Robins J.M. (2016). Using Big Data to Emulate a Target Trial When a Randomized Trial Is Not Available. American Journal of Epidemiology, 183(8), 758–764.

- Huitfeldt A. (2016). Is caviar a risk factor for being a millionaire? BMJ, 355, i6536. doi:10.1136/bmj.i6536




February 16, 2023, 9:30 AM - 12:00 PM


Haus Oxford, 2nd floor
2.2090, Large Seminar Room

Mary-Somerville-Straße 2
28359 Bremen

Online via Zoom

Didelez, Prof. Dr. Vanessa

Leibniz Institute for Prevention Research and Epidemiology - BIPS
Deputy Head of the Department of Biometry and Data Management

Professor of Statistics with focus on Causal Inference at the University of Bremen

