
ST-LE-2023-08
Curriculum: Starter Track
Asking (the right) research questions in data science
Motivation
“An approximate answer to the right question is worth a
great deal more than a precise answer to the wrong question”, said the renowned
statistician John Tukey as early as 1969.
In my experience of statistical consulting, much confusion arises from a
mismatch between the research question and the data or methods. Even more
fundamentally, the research question is often not clearly articulated at the
outset, perhaps because researchers anticipate that the right question can
only be answered approximately. But how can we discuss which data and methods
are suitable if we are unclear or vague about the question to be answered? In
the era of big data, characterised by an abundance of data and a similar
abundance of methods for analysing them, asking the right question takes on
new urgency.
Learning contents
In this course we will discuss the different types of
research questions one might face in a variety of applied fields within data
science, such as psychology, epidemiology, genetics, or political & social
sciences. Key distinctions concern questions that are (i) descriptive, (ii)
predictive, or (iii) causal (i.e. about counterfactual prediction). We will
consider how these types of research question relate to the choice of and
requirements for data, the methods of analysis, and the need for more or less
specific subject-matter knowledge. We will also see how starting with a clear
and explicit research question helps in assessing, and possibly avoiding,
potential sources of (structural) bias when answering that question.
Key topics that will be covered:
- Types of research questions (descriptive, predictive, causal/counterfactual)
- Issues of validity and structural bias (e.g. selection, confounding, ascertainment)
- The target trial principle
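To make the distinction between a descriptive and a causal question concrete, here is a minimal sketch in Python. It is not taken from the course material: the toy population, the variable names (Z as a common cause, X as an exposure, Y as an outcome) and all probabilities are hypothetical, chosen only for illustration. By construction X has no effect on Y, yet the two groups differ on average because Z influences both.

```python
# Hypothetical toy population: every number below is made up for illustration.
P_Z = {0: 0.5, 1: 0.5}            # P(Z = z); Z is a common cause of X and Y
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}   # P(X = 1 | Z = z)


def mean_y(x, z):
    """E[Y | X = x, Z = z]; depends on z only, so X has no causal effect."""
    return 0.1 + 0.6 * z


def p_x_given_z(x, z):
    """P(X = x | Z = z) for binary x."""
    p1 = P_X1_GIVEN_Z[z]
    return p1 if x == 1 else 1.0 - p1


def observed_mean(x):
    """E[Y | X = x]: what a descriptive comparison of the two groups reports."""
    weights = {z: P_Z[z] * p_x_given_z(x, z) for z in P_Z}
    total = sum(weights.values())
    return sum(weights[z] / total * mean_y(x, z) for z in P_Z)


def standardised_mean(x):
    """sum_z E[Y | X = x, Z = z] * P(Z = z).

    Under the assumption (true here by construction) that Z is the only
    confounder, this is the mean of Y if everyone had X = x.
    """
    return sum(P_Z[z] * mean_y(x, z) for z in P_Z)


# Descriptive question: how do the groups X = 1 and X = 0 differ on average?
associational_difference = observed_mean(1) - observed_mean(0)

# Causal question: how would the mean of Y change if X were set to 1 vs 0?
causal_difference = standardised_mean(1) - standardised_mean(0)

print(round(associational_difference, 3), round(causal_difference, 3))
```

In this toy setting the descriptive contrast is clearly non-zero while the standardised (causal) contrast is zero. Which of the two numbers is "right" depends entirely on the question being asked, and the adjustment step is only justified by background knowledge about Z.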
Learning outcomes
Upon completion, participants of the course will be able to:
- categorise research questions as descriptive, predictive or causal
- elicit a research question by formulating a target trial
- determine implications for the required data and choice of appropriate methods
- identify possible threats to validity / sources of structural bias.
Prior knowledge
---
Further reading
- Hernán M.A., Hsu J., Healy B. (2019). A Second Chance to Get Causal Inference Right: A Classification of Data Science Tasks. CHANCE 32(1): 42-49.
- Hernán M.A., Robins J.M. (2016). Using Big Data to Emulate a Target Trial When a Randomized Trial Is Not Available. American Journal of Epidemiology 183(8): 758–764.
- Huitfeldt A. (2016). Is caviar a risk factor for being a millionaire? BMJ 355:i6536. doi:10.1136/bmj.i6536; https://www.bmj.com/content/355/bmj.i6536
When?
February 16, 2023, 9:30 AM - 12:00 PM
Where?
UNICOM 2
Haus 2 (Oxford)
Large Seminar Room 2.2090, 2nd floor
Mary-Somerville-Str. 2
28359 Bremen
and
Online via Zoom
Registration closed
Didelez, Prof. Dr. Vanessa
Leibniz-Institute for Prevention Research and Epidemiology - BIPS
Deputy Head of the Department of Biometry and Data Management
Professor of Statistics with focus on Causal Inference at the University of Bremen
Email: didelez@leibniz-bips.de