- Generate testable hypotheses from counterfactual reasoning
- Use counterfactuals to identify knowledge gaps
- Connect “what would have happened” questions to “what should we study next”
- Apply counterfactual reasoning to guide experimental design
31.2 Introduction
This chapter shows how counterfactual reasoning drives hypothesis generation: asking “what would have happened” questions identifies what we need to study next. This connects the strongest form of causal reasoning (counterfactual) with the most forward-looking activity (designing new studies), and is part of Imagining in the Observable world.
31.3 Counterfactuals → Hypotheses
31.3.1 The Connection
Counterfactual questions naturally lead to hypotheses:
Counterfactual: “What would have happened if we had treated earlier?”
Hypothesis: “Early treatment improves outcomes”
Study design: “Randomised trial comparing early vs late treatment”
31.3.2 Generating Hypotheses from Counterfactuals
Process:

1. Ask a counterfactual question: “What would have happened if X had been different?”
2. Identify the knowledge gap: what do we not know that prevents answering the counterfactual?
3. Formulate a hypothesis: state what we expect to find.
4. Design a study: create an experiment to test the hypothesis.
31.3.3 Example: Treatment Timing
Counterfactual question: “What would have happened if we had treated this patient earlier?”
Knowledge gap: We don’t know the effect of early vs late treatment.
Hypothesis: “Early treatment (within 24 hours) improves recovery compared to late treatment (after 48 hours).”
Study design: Randomised trial comparing early vs late treatment, controlling for baseline severity.
Here’s an example workflow from counterfactual question to hypothesis:
```julia
# Find project root and include ensure_packages.jl
project_root = let current = pwd()
    while !isfile(joinpath(current, "Project.toml")) && !isfile(joinpath(current, "_quarto.yml"))
        parent = dirname(current)
        parent == current && break
        current = parent
    end
    current
end
include(joinpath(project_root, "scripts", "ensure_packages.jl"))
@auto_using CausalDynamics Graphs

# Step 1: Counterfactual question
counterfactual_question = "What would have happened if we had treated this patient earlier?"
println("Step 1: Counterfactual Question")
println("  ", counterfactual_question)

# Step 2: Check identifiability
# Example graph: L → A → Y, L → Y (confounding)
g = SimpleDiGraph(3)
add_edge!(g, 1, 2)  # L → A
add_edge!(g, 2, 3)  # A → Y
add_edge!(g, 1, 3)  # L → Y

# Check if treatment effect is identifiable
adj_set = backdoor_adjustment_set(g, 2, 3)  # A → Y
is_identifiable = !isempty(adj_set)

println("\nStep 2: Check Identifiability")
if is_identifiable
    println("  ✓ Treatment effect is identifiable (adjust for L)")
    println("  → Can answer counterfactual from available data")
else
    println("  ✗ Treatment effect is NOT identifiable")
    println("  → Knowledge gap: cannot answer counterfactual")
end

# Step 3: If not identifiable, identify knowledge gap
if !is_identifiable
    println("\nStep 3: Identify Knowledge Gap")
    println("  Missing: Unmeasured confounder or insufficient data")
    println("  Need: Additional measurements or experimental design")
end

# Step 4: Formulate hypothesis
hypothesis = "Early treatment (within 24 hours) improves recovery compared to late treatment (after 48 hours)"
println("\nStep 4: Formulate Hypothesis")
println("  ", hypothesis)

# Step 5: Design study
study_design = "Randomised trial: Early treatment (t < 24h) vs Late treatment (t > 48h), controlling for baseline severity"
println("\nStep 5: Study Design")
println("  ", study_design)
```
```
Step 1: Counterfactual Question
  What would have happened if we had treated this patient earlier?

Step 2: Check Identifiability
  ✓ Treatment effect is identifiable (adjust for L)
  → Can answer counterfactual from available data

Step 4: Formulate Hypothesis
  Early treatment (within 24 hours) improves recovery compared to late treatment (after 48 hours)

Step 5: Study Design
  Randomised trial: Early treatment (t < 24h) vs Late treatment (t > 48h), controlling for baseline severity
```

(Step 3 is skipped here because the effect is identifiable; it would only print when a knowledge gap exists.)
31.4 Using Counterfactuals to Identify Knowledge Gaps
31.4.1 What We Know vs What We Don’t Know
Counterfactual reasoning reveals:

- What we can answer: counterfactuals that are identified from available data
- What we cannot answer: counterfactuals that are not identified (knowledge gaps)
31.4.2 Knowledge Gaps → Research Questions
When counterfactuals are not identified, this reveals knowledge gaps:
- Unmeasured confounders: need to measure additional variables
- Unidentified mechanisms: need to design experiments that identify mechanisms
- Missing data: need to collect data on specific variables
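The mapping from an unidentified counterfactual to a research question can be sketched in a few lines. This is a minimal illustration, not an API from the book's CausalDynamics package; the variable names are hypothetical:

```julia
# Minimal sketch: turn an identifiability check into either an adjustment
# strategy or a research question. `confounders` are the common causes of
# treatment and outcome; `measured` are the variables we actually observe.
function knowledge_gap(confounders, measured)
    unmeasured = setdiff(confounders, measured)
    if isempty(unmeasured)
        return "Identified: adjust for " * join(confounders, ", ")
    else
        return "Gap: measure " * join(unmeasured, ", ") * " in a new study"
    end
end

println(knowledge_gap(["severity", "age"], ["severity", "age"]))
println(knowledge_gap(["severity", "frailty"], ["severity"]))
```

When every confounder is measured, the counterfactual is answerable from existing data; otherwise the unmeasured variables themselves become the next study's measurement targets.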
31.4.3 Example: Unmeasured Confounder
Counterfactual question: “What would have happened if treatment had been different?”
Knowledge gap: We don’t know the distribution of the unmeasured confounder \(U\), or how it relates to treatment and outcome.
Hypothesis: “If we measure \(U\), we can identify the treatment effect.”
Study design: Collect data on \(U\) in new study.
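To see why measuring \(U\) closes the gap, here is a minimal simulation with a hypothetical data-generating process (not data from the chapter): ignoring a binary confounder biases the naive treated-vs-untreated contrast, while standardising over the measured confounder recovers the true effect.

```julia
using Random, Statistics

Random.seed!(42)
n = 200_000
U  = rand(n) .< 0.5                    # binary confounder, P(U = 1) = 0.5
pA = ifelse.(U, 0.8, 0.2)              # treatment more likely when U = 1
A  = rand(n) .< pA                     # treatment assignment
Y  = 1.0 .* A .+ 2.0 .* U .+ randn(n)  # true treatment effect = 1.0

# Naive contrast is confounded by U (≈ 1.0 + 2.0 × (0.8 − 0.2) = 2.2)
naive = mean(Y[A]) - mean(Y[.!A])

# Once U is measured: stratum-specific effects, standardised over P(U)
effect(u) = mean(Y[A .& (U .== u)]) - mean(Y[.!A .& (U .== u)])
adjusted = mean(U) * effect(true) + (1 - mean(U)) * effect(false)  # ≈ 1.0

println("Naive:    ", round(naive, digits = 2))
println("Adjusted: ", round(adjusted, digits = 2))
```

The hypothesis “if we measure \(U\), we can identify the treatment effect” corresponds exactly to the gap between `naive` and `adjusted` here.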
31.5 Bounds and Partial Identification → Hypotheses
When counterfactuals are only partially identified, bounds can guide hypothesis generation:
- Wide bounds: large uncertainty → hypothesis about how to reduce the uncertainty
- Narrow bounds: small uncertainty → hypothesis about the effect direction
- Sensitivity analysis: how the bounds change → hypothesis about which variables matter
31.5.1 Example: Sensitivity Analysis
Counterfactual question: “What would have happened if treatment had been different?”
Result: Partial identification with bounds \([0.1, 0.9]\) for treatment effect.
Hypothesis: “Treatment effect is positive (lower bound > 0), but magnitude is uncertain.”
Study design: Design study to narrow the bounds (measure suspected confounders, increase sample size).
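For a concrete sense of where such bounds come from, here is a sketch of no-assumption (Manski-style) bounds for a binary outcome, computed directly from observed data. The arrays are illustrative; note that assumption-free bounds for a \(\{0, 1\}\) outcome always have width 1, so the chapter's \([0.1, 0.9]\) bounds would reflect additional assumptions:

```julia
using Statistics

# Manski-style bounds on the average treatment effect for Y ∈ {0, 1}:
# unobserved potential outcomes are filled in with their worst/best case.
function manski_bounds(Y, A)
    p  = mean(A)                # P(A = 1)
    y1 = mean(Y[A .== 1])       # E[Y | A = 1]
    y0 = mean(Y[A .== 0])       # E[Y | A = 0]
    lo = (p * y1 + (1 - p) * 0) - ((1 - p) * y0 + p * 1)
    hi = (p * y1 + (1 - p) * 1) - ((1 - p) * y0 + p * 0)
    return lo, hi
end

A = [1, 1, 1, 0, 0, 0, 1, 0]    # illustrative treatment indicators
Y = [1, 1, 0, 0, 1, 0, 1, 0]    # illustrative binary outcomes
lo, hi = manski_bounds(Y, A)
println("ATE bounds: [", lo, ", ", hi, "]")  # width = 1 without assumptions
```

Narrowing these bounds requires exactly the kinds of study-design moves listed above: measuring suspected confounders or imposing defensible assumptions.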
31.5.2 Implementation: Bounds → Hypothesis
We can generate hypotheses from partial identification bounds:
```julia
# Find project root and include ensure_packages.jl
project_root = let current = pwd()
    while !isfile(joinpath(current, "Project.toml")) && !isfile(joinpath(current, "_quarto.yml"))
        parent = dirname(current)
        parent == current && break
        current = parent
    end
    current
end
include(joinpath(project_root, "scripts", "ensure_packages.jl"))
@auto_using CairoMakie

# Example: Partial identification with bounds
# Treatment effect bounds: [0.1, 0.9]
lower_bound = 0.1
upper_bound = 0.9
point_estimate = 0.5  # Midpoint (uncertain)

# Generate hypotheses from bounds
if lower_bound > 0
    hypothesis_direction = "Treatment effect is positive (lower bound > 0)"
elseif upper_bound < 0
    hypothesis_direction = "Treatment effect is negative (upper bound < 0)"
else
    hypothesis_direction = "Treatment effect direction is uncertain (bounds span zero)"
end

bound_width = upper_bound - lower_bound
if bound_width > 0.5
    hypothesis_uncertainty = "Large uncertainty → hypothesis: reduce uncertainty by measuring confounders"
else
    hypothesis_uncertainty = "Small uncertainty → hypothesis: effect direction is clear"
end

# Visualise bounds
let
    fig = Figure(size = (800, 400))
    ax = Axis(fig[1, 1], title = "Partial Identification Bounds",
              xlabel = "Treatment Effect", ylabel = "Density")

    # Shade the region between the bounds
    x_band = [lower_bound, upper_bound, upper_bound, lower_bound]
    y_band = [0, 0, 1, 1]
    poly!(ax, Point2f.(x_band, y_band), color = (:blue, 0.3))

    vlines!(ax, [lower_bound, upper_bound], color = :blue, linewidth = 2, linestyle = :dash)
    vlines!(ax, [point_estimate], color = :red, linewidth = 2, label = "Point estimate")
    vlines!(ax, [0], color = :black, linewidth = 1, linestyle = :dot, label = "No effect")
    axislegend(ax)
    fig  # Only this gets displayed
end

println("Bounds → Hypothesis:")
println("  Bounds: [", lower_bound, ", ", upper_bound, "]")
println("  Width: ", round(bound_width, digits = 2))
println("  Hypothesis (direction): ", hypothesis_direction)
println("  Hypothesis (uncertainty): ", hypothesis_uncertainty)
```
```
Bounds → Hypothesis:
  Bounds: [0.1, 0.9]
  Width: 0.8
  Hypothesis (direction): Treatment effect is positive (lower bound > 0)
  Hypothesis (uncertainty): Large uncertainty → hypothesis: reduce uncertainty by measuring confounders
```
31.6 From Hypotheses to Study Design
31.6.1 Hypothesis → Research Question
Once we have a hypothesis, we can formulate a research question:
Hypothesis: “Early treatment improves outcomes”
Research question: “Does early treatment (within 24 hours) improve recovery compared to late treatment (after 48 hours)?”
31.6.2 Research Question → Study Design
The research question guides study design:
- What to measure: outcome, treatment, confounders
- When to measure: timing of measurements
- What interventions: treatment strategies to compare
- Design type: randomised trial, observational study, etc.
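The checklist above can be captured as a simple record. The type and field names below are hypothetical, purely to show a research question translating into design components:

```julia
# Hypothetical sketch: study-design components derived from the research
# question (field names are illustrative, not a standard API).
struct StudyDesign
    measures::Vector{String}   # what to measure
    timing::String             # when to measure
    arms::Vector{String}       # interventions to compare
    design_type::String        # randomised trial, observational study, ...
end

design = StudyDesign(
    ["recovery", "treatment time", "baseline severity"],
    "baseline and 30-day follow-up",
    ["early treatment (< 24 h)", "late treatment (> 48 h)"],
    "randomised trial",
)
println(design.design_type, " with ", length(design.arms), " arms")
```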
31.7 World Context
This chapter addresses Imagining in the Observable world—what alternative observable outcomes are possible, and what should we study next? Hypothesis generation from counterfactuals connects the strongest form of causal reasoning (counterfactual) with the most forward-looking activity (designing new studies). This completes the Observable “Imagining” phase by showing how counterfactuals lead to new research.
31.8 Key Takeaways
- Counterfactuals → hypotheses: “what would have happened” questions lead to testable hypotheses
- Knowledge gaps: counterfactuals reveal what we don’t know
- Bounds guide hypotheses: partial identification suggests what to study next
- Hypotheses → study design: counterfactual reasoning guides experimental design
31.9 Further Reading
Pearl (2009): Causality — Counterfactual framework