Counterfactual Designs: Understanding What Didn’t Happen to Make Sense of What Did

We explore how counterfactual reasoning helps evaluate scientific interventions by comparing real outcomes with hypothetical alternatives: a powerful and underused tool in the social sciences.
Categories: research, evaluation, methodology

Author

Antonio Matas-Terron

Publication date

May 16, 2025

What would have happened if…?
This simple question holds great power in scientific research. It’s not science fiction or alternate history, but a rigorous way to think about hypothetical scenarios to assess the effects of our decisions, policies, or interventions. This way of thinking is called counterfactual design.

In this post, I’ll explain what it is, why it’s essential in social sciences, and how it can be applied—in education, health, climate research, disaster management, and more.

What is counterfactual reasoning?

Suppose you want to know whether a scholarship improves academic performance. To know for sure, you’d need to compare the same student with and without the scholarship. But of course, that’s impossible—you can’t split a student in two.

That’s where counterfactual design comes in: we try to compare what actually happened with what would have happened if the intervention had not occurred. Since that alternative world doesn’t exist, we have to build it using experimental or quasi-experimental techniques.
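
In the notation of the potential-outcomes framework (added here for clarity; the formalism is standard, not something the post spells out), each student i has two potential outcomes, only one of which is ever observed:

$$
\tau_i = Y_i(1) - Y_i(0), \qquad \text{ATE} = \mathbb{E}\big[\,Y(1) - Y(0)\,\big]
$$

Here Y_i(1) is the outcome with the scholarship and Y_i(0) the outcome without it. The individual effect τ_i can never be observed directly, which is why counterfactual designs aim instead at the average treatment effect (ATE) across a group.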

How do we do it?

The best-known tool is the randomized controlled trial (RCT): participants are randomly assigned to a treatment group (e.g., a new teaching method) or a control group. The difference in outcomes between the two gives us an estimate of the “pure” effect of the intervention.
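
To make this concrete, here is a minimal, purely illustrative sketch in Python. The data, the effect size, and names such as `new_method` are all invented for the example; the only point is that random assignment makes the groups comparable, so a simple difference in group means stands in for the missing counterfactual.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# 200 students, randomly assigned to the new teaching method (1) or control (0)
n = 200
new_method = rng.integers(0, 2, size=n)
ability = rng.normal(6.0, 1.5, size=n)      # unobserved baseline differences
true_effect = 0.5                           # hypothetical effect on final grade
grades = ability + true_effect * new_method + rng.normal(0, 1.0, size=n)

# Because assignment was random, the difference in mean outcomes between the
# two groups estimates the effect of the intervention
estimate = grades[new_method == 1].mean() - grades[new_method == 0].mean()
t_stat, p_value = stats.ttest_ind(grades[new_method == 1], grades[new_method == 0])

print(f"Estimated effect: {estimate:.2f} (true simulated effect: {true_effect})")
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```

Re-running with a different seed shifts the estimate a little, which is exactly the sampling uncertainty that the significance test quantifies.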

But when randomization isn’t feasible—due to ethical, logistical, or political reasons—we use quasi-experimental designs, such as:

  • Difference-in-Differences
  • Regression Discontinuity Designs (sketched below)
  • Qualitative Comparative Analysis (QCA)

All of these strategies share a common goal: construct a plausible counterfactual scenario as rigorously as possible.
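
As an illustration of one of these designs, the sketch below simulates a regression discontinuity: a hypothetical scholarship is awarded only to students at or above an entrance-score cutoff, and its effect is estimated from students just on either side of that threshold. The cutoff, the window width, and the effect size are assumptions invented for the example.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)

# Synthetic running variable: entrance-exam score; scholarship awarded at or above 70
n = 1000
score = rng.uniform(40, 100, size=n)
scholarship = (score >= 70).astype(int)

# Outcome: first-year GPA, smooth in the score plus a jump of 0.30 at the cutoff
gpa = 1.0 + 0.02 * score + 0.30 * scholarship + rng.normal(0, 0.3, size=n)

# Local comparison: keep only students within 5 points of the cutoff
window = np.abs(score - 70) <= 5
X = sm.add_constant(np.column_stack([scholarship[window], score[window] - 70]))
rdd = sm.OLS(gpa[window], X).fit()

# The coefficient on the scholarship dummy estimates the jump at the cutoff
print(dict(zip(["intercept", "scholarship", "score_centered"], rdd.params.round(3))))
```

In real applications the choice of window (bandwidth) matters a great deal, and dedicated estimators select it from the data rather than fixing it by hand as done here.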

Why does it matter?

Because it helps us avoid jumping to false conclusions. It’s not enough to observe an improvement after an intervention—it might be due to other factors like time, motivation, or external changes. Counterfactual design helps us say with greater confidence: “this improved because of the intervention, not something else.”

And it applies to fields as diverse as:

  • Public health (Did closing schools during a pandemic actually help?)
  • Disaster preparedness (Would different measures have reduced the damage?)
  • Climate change (What if emissions had been reduced earlier?)

But it’s not that simple

Counterfactual approaches have important limitations:

  • They require high-quality data, which is often missing or flawed.
  • They rely on strong causal assumptions that must be well justified.
  • They raise ethical issues, especially when human participants are involved and control groups must be formed.

Also, building credible hypothetical scenarios is as complex as it is tempting.

What about in education?

In our field, counterfactual thinking is still rare—but it holds huge potential. Evaluating educational policies, teaching methods, or inclusion programs could benefit from more rigorous frameworks that help us answer the key question: Did this work because we did it this way… or would it have happened anyway?

A real-world example: reducing dropout in first-year students

Imagine a university introduces a peer mentoring program to reduce dropout among first-year students. The program is applied in the Faculty of Education but not in others.

The counterfactual approach would involve constructing a comparable scenario: What would have happened to those students if they had not participated in the mentoring program? For instance, we could compare dropout rates with those from the Faculty of Psychology (where the program was not applied) using a difference-in-differences design.

Did dropout decrease more in Education than in Psychology that same year? Was the difference statistically significant? Can we attribute it to the mentoring program?
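
A minimal sketch of what that difference-in-differences comparison could look like is shown below, with entirely invented student-level data for the two faculties across a pre-program year and a post-program year; the dropout rates, sample sizes, and effect are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)

# Hypothetical student records: two faculties, two academic years
rows = []
for faculty, treated in [("Education", 1), ("Psychology", 0)]:
    for year, post in [("2023/24", 0), ("2024/25", 1)]:
        n = 300
        base = 0.20 if faculty == "Education" else 0.18  # invented baseline dropout risk
        trend = -0.02 * post                             # time trend common to both faculties
        effect = -0.06 * treated * post                  # hypothetical mentoring effect
        dropout = rng.binomial(1, base + trend + effect, size=n)
        rows.append(pd.DataFrame({
            "faculty": faculty, "year": year,
            "treated": treated, "post": post, "dropout": dropout,
        }))

df = pd.concat(rows, ignore_index=True)

# The treated:post coefficient is the difference-in-differences estimate
did = smf.ols("dropout ~ treated + post + treated:post", data=df).fit()
print(did.summary().tables[1])
```

The coefficient on the treated × post interaction is the difference-in-differences estimate of the program's effect. Its credibility rests on the parallel-trends assumption: without the mentoring program, dropout in Education and Psychology would have evolved in the same way.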

This kind of approach not only strengthens the findings—it can also convince policymakers that the intervention is worth scaling up.