30  Counterfactual Reasoning: Unit-Level Alternatives

Status: Draft

v0.4

30.1 Learning Objectives

After reading this chapter, you will be able to:

  • Define counterfactual trajectories using shared exogenous noise in observable settings
  • Compute individual-level counterfactuals from observational data
  • Understand partial identification and bounds when data cannot resolve counterfactual structure
  • Apply counterfactual reasoning to observable outcomes

30.2 Introduction

Counterfactuals ask: “What would have happened for this specific unit under an alternative?” (Pearl 2009; Imbens and Rubin 2015). This is the strongest form of causal reasoning (Level 3): it requires an explicit structural model plus a notion of unit identity (shared exogenous variables). This chapter focuses on Imagining at the Observable layer: how to define and compute unit-level alternative outcomes from data and modelling assumptions.

30.3 Counterfactuals as Unit-Level Replays

Technically, counterfactuals compare trajectories for the same unit under different interventions. When we simulate a counterfactual trajectory \(Y^{\iota}_{1:T}(\mathbf{u})\), we replay the same mechanisms with the same \(\mathbf{u}\) but with a different intervention.

30.3.1 Counterfactuals Across Layers

In our three-layer framework (Structural, Dynamical, Observable), counterfactuals at the Observable layer involve:

  • For a specific unit: Fixed \(\mathbf{u}\) (unit identity via shared exogenous variables)
  • Under alternative conditions: Different interventions applied to the same mechanisms

The counterfactual trajectory \(Y^{\iota}_{1:T}(\mathbf{u})\) represents the unit’s alternative outcome under intervention \(\iota\), holding \(\mathbf{u}\) fixed.

30.3.2 The Process of Counterfactual Simulation

When we compute a counterfactual:

  1. Infer latent state and exogenous variables: From observations, infer the latent states and (when needed) the exogenous noise \(\mathbf{u}\) consistent with the data under the model.

  2. Simulate under an alternative intervention: With the same \(\mathbf{u}\) (same unit identity) but a different intervention, simulate the alternative trajectory.

  3. Compare: Compare the counterfactual trajectory to the observed trajectory (or to another counterfactual), depending on the query.
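These three steps mirror the classic abduction, action, prediction recipe. A standalone Python sketch with a hypothetical, invertible structural model (the chapter's own examples use Julia; every mechanism and number here is an assumption for illustration):

```python
# Hypothetical SCM, assumed for illustration (each mechanism is invertible in
# its own noise term, so abduction recovers the noise exactly):
#   X := u_x
#   Y := 2*X + u_y

def abduct(x_obs, y_obs):
    """Step 1: infer the exogenous noise (u_x, u_y) consistent with the data."""
    u_x = x_obs               # invert X := u_x
    u_y = y_obs - 2 * x_obs   # invert Y := 2*X + u_y
    return u_x, u_y

def simulate(u_x, u_y, do_x=None):
    """Steps 2-3: replay the same mechanisms with the same noise,
    optionally under the intervention do(X = do_x)."""
    x = u_x if do_x is None else do_x  # intervention overrides X's mechanism
    y = 2 * x + u_y                    # Y's mechanism is left unchanged
    return x, y

# Observed world for this unit: X = 1.0, Y = 2.5
u_x, u_y = abduct(1.0, 2.5)            # u_x = 1.0, u_y = 0.5
# Counterfactual world: same (u_x, u_y), but do(X = 0)
_, y_cf = simulate(u_x, u_y, do_x=0.0)
print(y_cf)  # 0.5: same unit, different world
```

The inferred noise carries this unit's individual-specific information from the factual world into the counterfactual one; drawing fresh noise instead would answer a population-level (interventional) query.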

30.3.3 Why Shared Exogenous Noise Matters

The requirement for shared exogenous noise \(\mathbf{u}\) is what makes a counterfactual a unit-level object. The fixed \(\mathbf{u}\) represents unit identity under the model, allowing us to ask: “For this specific unit (same \(\mathbf{u}\)), what would have happened under alternative conditions?”

This is fundamentally different from population-level interventions, which average over many different units. Counterfactuals require unit-level reasoning because they compare alternative outcomes for a specific unit, not averages over many.

30.4 Counterfactual Trajectories

30.4.1 Definition

For a fixed exogenous realisation \(\mathbf{u}\), the counterfactual trajectory under intervention \(\iota\) is:

\[ Y^{\iota}_{1:T}(\mathbf{u}) \]

Interpretation: “Same unit (same \(\mathbf{u}\)), different world (different intervention).”
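The trajectory version of this definition can be sketched by fixing the entire noise sequence and replaying the dynamics under a different intervention sequence. A standalone Python sketch (the linear dynamics, coefficient, and intervention sequences are assumptions for illustration):

```python
import random

# Hypothetical scalar dynamics, assumed for illustration:
#   y_t = 0.8 * y_{t-1} + a_t + u_t
# where a_{1:T} is the intervenable input and u_{1:T} is the unit's
# exogenous noise realisation (its identity, in this chapter's sense).

def trajectory(a_seq, u_seq, y0=0.0):
    """Simulate y_{1:T} under intervention sequence a_{1:T} with fixed noise."""
    ys, y = [], y0
    for a, u in zip(a_seq, u_seq):
        y = 0.8 * y + a + u
        ys.append(y)
    return ys

T = 5
rng = random.Random(0)
u = [rng.gauss(0, 0.1) for _ in range(T)]   # one unit = one noise sequence

y_treated = trajectory([1.0] * T, u)  # world with a_t = 1 throughout
y_control = trajectory([0.0] * T, u)  # same u, world with a_t = 0 throughout

# Because u_{1:T} is shared, the per-step differences are purely the
# effect of the intervention (the noise cancels in this linear model):
effects = [yt - yc for yt, yc in zip(y_treated, y_control)]
print([round(e, 3) for e in effects])  # [1.0, 1.8, 2.44, 2.952, 3.362]
```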

30.4.2 Key Requirement

Counterfactuals require shared exogenous noise:

  • Same unit \(\Leftrightarrow\) same \(\mathbf{u}\)
  • Different worlds \(\Leftrightarrow\) different interventions
  • Counterfactual: Same \(\mathbf{u}\), different \(do(\cdot)\)

30.5 Computing Counterfactuals

There are two main approaches to computing counterfactuals:

  1. Unit-level approach: Infer exogenous noise \(\mathbf{u}\) for each unit, then simulate counterfactual for that unit
  2. Population-level approach: Estimate population counterfactual means directly from data (e.g., TMLE, g-methods)
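The difference between the two approaches is easiest to see when the treatment effect varies with \(\mathbf{u}\). A standalone Python sketch with a made-up structural model (mechanism, numbers, and seed are all illustrative assumptions):

```python
import random

# Hypothetical SCM, assumed for illustration: U ~ N(0, 1) and
#   Y := A * (1 + U) + U
# so the effect of do(A=1) vs do(A=0) for a unit with noise u is (1 + u).

def y_do(a, u):
    return a * (1 + u) + u  # Y's mechanism with A fixed by intervention

# Unit-level: for one specific unit (u known or inferred), the contrast is 1 + u
u_unit = 0.7
unit_effect = y_do(1, u_unit) - y_do(0, u_unit)
print(unit_effect)  # 1.7 for this particular unit

# Population-level: average the same contrast over the distribution of U
rng = random.Random(1)
us = [rng.gauss(0, 1) for _ in range(10_000)]
ate = sum(y_do(1, u) - y_do(0, u) for u in us) / len(us)
print(round(ate, 1))  # close to 1.0 = E[1 + U]: the unit detail averages away
```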

30.5.1 Unit-Level Approach

Steps:

  1. Infer \(\mathbf{u}\): From observed data, infer the exogenous noise realisation for this unit
  2. Simulate counterfactual: With the same \(\mathbf{u}\), simulate under the alternative intervention
  3. Compare: Compare the counterfactual to the observed outcome

Challenge: Inferring \(\mathbf{u}\) from observations requires strong assumptions (full structural model).
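A standalone Python sketch of this challenge: with a non-invertible (thresholded) mechanism, several noise values are consistent with the same observation, and they disagree about the counterfactual, so the counterfactual is only set-identified (the mechanism and the finite noise support are assumptions for illustration):

```python
# Hypothetical non-invertible mechanism, assumed for illustration:
#   Y := 1 if X + U > 0 else 0, with U taking values in {-2, -0.5, 0.5, 2}.
# Observing (X=1, Y=1) only tells us U > -1: three of the four U values remain
# consistent, and they disagree about the counterfactual Y^{do(X=0)}(U).

U_SUPPORT = [-2.0, -0.5, 0.5, 2.0]

def y(x, u):
    return 1 if x + u > 0 else 0

x_obs, y_obs = 1, 1
consistent_u = [u for u in U_SUPPORT if y(x_obs, u) == y_obs]
print(consistent_u)  # [-0.5, 0.5, 2.0]: abduction gives a set, not a point

cf_values = sorted({y(0, u) for u in consistent_u})
print(cf_values)     # [0, 1]: the counterfactual is only set-identified
```

This is the single-unit version of the partial-identification problem treated later in the chapter: without an invertible structural model, step 1 of the recipe returns a set of candidate \(\mathbf{u}\) values rather than one.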

30.5.2 Implementation: Graph Structure and Exogenous Noise

The causal graph structure determines which exogenous variables must be shared for counterfactual reasoning:

# Find project root and include ensure_packages.jl
project_root = let
    current = pwd()
    while !isfile(joinpath(current, "Project.toml")) && !isfile(joinpath(current, "_quarto.yml"))
        parent = dirname(current)
        parent == current && break
        current = parent
    end
    current
end
include(joinpath(project_root, "scripts", "ensure_packages.jl"))
@auto_using DAGMakie CairoMakie CausalDynamics Graphs

# Example: Counterfactual requires shared U
# Graph: U → X, U → Y, X → Y
# Nodes: 1=U, 2=X, 3=Y
g = DiGraph(3)
add_edge!(g, 1, 2)  # U → X
add_edge!(g, 1, 3)  # U → Y
add_edge!(g, 2, 3)  # X → Y

# To compute Y^{do(X=0)}(u) given Y^{do(X=1)}(u) = y_obs:
# We need the same u for both worlds

# The graph shows U affects both X and Y
# This means U must be shared across counterfactual worlds

# Check what variables affect Y (these determine which U's must be shared)
parents_Y = get_parents(g, 3)
println("Parents of Y: ", parents_Y)  # Set([1, 2]) = {U, X}

# For counterfactual Y^{do(X=x)}(u), we need:
# - Same U (exogenous noise affecting Y)
# - Different X (intervention changes X)

# The ancestors of Y tell us all variables that could affect Y:
ancestors_Y = get_ancestors(g, 3)
println("Ancestors of Y: ", ancestors_Y)  # Set([1, 2]) = {U, X}

# This shows: for counterfactual reasoning about Y,
# we need to fix U (shared exogenous noise) and vary X (intervention)

# Visualise graph
let
    # Highlight U (shared exogenous) in yellow, treatment and outcome in lightblue
    node_colors = [:yellow, :lightblue, :lightblue]

    fig, ax, p = dagplot(g;
        figure_size = (600, 400),
        layout_mode = :acyclic,
        node_color = node_colors,
        nlabels = ["U (exogenous)", "X (treatment)", "Y (outcome)"]
    )
    fig  # Only this gets displayed
end
Parents of Y: Set([2, 1])
Ancestors of Y: Set([2, 1])

Counterfactual at Observable level: shared exogenous noise U enables alternative observable outcomes

30.5.3 Implementation: Graph Structure for Counterfactual Reasoning

The causal graph structure determines what information is needed for counterfactual reasoning. We can use CausalDynamics.jl to identify necessary variables:

# Find project root and include ensure_packages.jl
project_root = let
    current = pwd()
    while !isfile(joinpath(current, "Project.toml")) && !isfile(joinpath(current, "_quarto.yml"))
        parent = dirname(current)
        parent == current && break
        current = parent
    end
    current
end
include(joinpath(project_root, "scripts", "ensure_packages.jl"))
@auto_using CausalDynamics Graphs

# Example: Treatment counterfactual
# Graph: U → A, U → Y, A → Y
# Nodes: 1=U, 2=A, 3=Y
g = DiGraph(3)
add_edge!(g, 1, 2)  # U → A
add_edge!(g, 1, 3)  # U → Y
add_edge!(g, 2, 3)  # A → Y

# To compute counterfactual Y^{do(A=0)}(u) for unit with observed Y^{do(A=1)}(u) = y_obs:
# 1. We need to infer u from observations
# 2. The Markov boundary of Y tells us what variables are needed

mb_Y = markov_boundary(g, 3)  # Outcome Y
println("Markov boundary of Y: ", mb_Y)  # Set([1, 2]) = {U, A}

# This tells us: to reason about Y counterfactually, we need U and A
# If U is unobserved, counterfactuals are not fully identified

# Check if A → Y is identifiable (necessary for counterfactuals)
adj_set = backdoor_adjustment_set(g, 2, 3)
println("Adjustment set for A → Y: ", adj_set)  # Set([1]) = {U}

# If U is unobserved, we cannot identify the causal effect,
# and therefore cannot compute counterfactuals
Markov boundary of Y: Set([2, 1])
Adjustment set for A β†’ Y: Set([1])

30.5.4 Implementation: Checking Counterfactual Identifiability

We can use graph structure to check whether counterfactuals are identifiable:

# Find project root and include ensure_packages.jl
project_root = let
    current = pwd()
    while !isfile(joinpath(current, "Project.toml")) && !isfile(joinpath(current, "_quarto.yml"))
        parent = dirname(current)
        parent == current && break
        current = parent
    end
    current
end
include(joinpath(project_root, "scripts", "ensure_packages.jl"))
@auto_using CausalDynamics Graphs

# Example: Can we compute counterfactual Y^{do(A=0)}(u) given Y^{do(A=1)}(u) = y_obs?
# Graph: U → A, U → Y, A → Y, L → A, L → Y (L is observed confounder)
# Nodes: 1=U, 2=A, 3=Y, 4=L
g = DiGraph(4)
add_edge!(g, 1, 2)  # U → A
add_edge!(g, 1, 3)  # U → Y
add_edge!(g, 2, 3)  # A → Y
add_edge!(g, 4, 2)  # L → A
add_edge!(g, 4, 3)  # L → Y

# Check if A → Y is identifiable (necessary condition for counterfactuals)
is_identifiable = is_backdoor_adjustable(g, 2, 3)
println("A → Y is identifiable: ", is_identifiable)  # true

adj_set = backdoor_adjustment_set(g, 2, 3)
println("Adjustment set: ", adj_set)  # Set([1, 4]) = {U, L}

# Problem: If U is unobserved, we cannot adjust for it
# This means counterfactuals are not fully identified

# However, if we can infer U from observations (e.g., via state-space inference),
# then counterfactuals become possible

# The Markov boundary tells us what we need to observe:
mb_Y = markov_boundary(g, 3)
println("Variables needed for Y: ", mb_Y)  # Set([1, 2, 4]) = {U, A, L}

# If U is unobserved, we need to infer it from other variables
# This requires additional assumptions about the noise structure
A → Y is identifiable: true
Adjustment set: Set([4, 1])
Variables needed for Y: Set([4, 2, 1])

30.5.5 Implementation: Patient Counterfactual Example

# Find project root and include ensure_packages.jl
project_root = let
    current = pwd()
    while !isfile(joinpath(current, "Project.toml")) && !isfile(joinpath(current, "_quarto.yml"))
        parent = dirname(current)
        parent == current && break
        current = parent
    end
    current
end
include(joinpath(project_root, "scripts", "ensure_packages.jl"))
@auto_using DAGMakie CairoMakie CausalDynamics Graphs

# Patient counterfactual graph
# Nodes: 1=U (unobserved confounder), 2=A (treatment), 3=Y (outcome), 4=L (observed confounder)
g = DiGraph(4)
add_edge!(g, 1, 2)  # U → A
add_edge!(g, 1, 3)  # U → Y
add_edge!(g, 2, 3)  # A → Y
add_edge!(g, 4, 2)  # L → A
add_edge!(g, 4, 3)  # L → Y

# To compute Y^{do(A=0)}(u) given Y^{do(A=1)}(u) = y_obs:
# 1. Check if A → Y is identifiable
is_identifiable = is_backdoor_adjustable(g, 2, 3)
println("Treatment effect identifiable: ", is_identifiable)  # true

# 2. Find what variables are needed
adj_set = backdoor_adjustment_set(g, 2, 3)
println("Adjustment set: ", adj_set)  # Set([1, 4]) = {U, L}

# 3. Problem: U is unobserved
# Solution: Infer U from observations using state-space methods
# The graph structure shows that U affects both A and Y,
# so we can potentially infer U from (A, Y, L) if we have:
# - Structural assignments f_A(U, L) and f_Y(A, U, L)
# - Sufficient data to estimate these mechanisms

# 4. Once U is inferred, we can compute counterfactual:
#    - Given: Y^{do(A=1)}(u) = y_obs
#    - Infer: u from y_obs, a_obs=1, l_obs, and known mechanisms
#    - Compute: Y^{do(A=0)}(u) using same u but A=0

# The Markov boundary shows what we need:
mb_Y = markov_boundary(g, 3)
println("Variables needed for Y: ", mb_Y)  # Set([1, 2, 4]) = {U, A, L}

# Visualise graph
let
    # Highlight adjustment set (U and L) in yellow, treatment and outcome in lightblue
    node_colors = [:yellow, :lightblue, :lightblue, :yellow]

    fig, ax, p = dagplot(g;
        figure_size = (600, 400),
        layout_mode = :acyclic,
        node_color = node_colors,
        nlabels = ["U (unobserved)", "A (treatment)", "Y (outcome)", "L (observed)"]
    )
    fig  # Only this gets displayed
end
Treatment effect identifiable: true
Adjustment set: Set([4, 1])
Variables needed for Y: Set([4, 2, 1])

Patient counterfactual: determining what’s needed from graph structure

30.5.6 Implementation: Partial Identification

Graph structure can help identify when counterfactuals are only partially identified:

# Find project root and include ensure_packages.jl
project_root = let
    current = pwd()
    while !isfile(joinpath(current, "Project.toml")) && !isfile(joinpath(current, "_quarto.yml"))
        parent = dirname(current)
        parent == current && break
        current = parent
    end
    current
end
include(joinpath(project_root, "scripts", "ensure_packages.jl"))
@auto_using CausalDynamics Graphs

# Example: Counterfactual with unobserved confounder
# Graph: U → A, U → Y, A → Y (U unobserved)
# Nodes: 1=U, 2=A, 3=Y
g = DiGraph(3)
add_edge!(g, 1, 2)  # U → A
add_edge!(g, 1, 3)  # U → Y
add_edge!(g, 2, 3)  # A → Y

# Check identifiability
is_identifiable = is_backdoor_adjustable(g, 2, 3)
println("A → Y is identifiable: ", is_identifiable)  # true

adj_set = backdoor_adjustment_set(g, 2, 3)
println("Required adjustment: ", adj_set)  # Set([1]) = {U}

# Problem: U is unobserved, so we cannot adjust for it
# Result: Counterfactuals are not fully identified

# However, we can still reason about bounds:
# - The graph structure shows U is a confounder
# - We can use sensitivity analysis to bound counterfactual values
# - The d-separation structure shows what independence assumptions hold

# Check d-separation: A and Y are not d-separated (direct edge plus confounding)
println("A ⫫ Y (no adjustment): ", CausalDynamics.d_separated(g, 2, 3, []))  # false

# Conditioning on U blocks the backdoor path A ← U → Y, but the direct edge
# A → Y remains, so A and Y are still not d-separated
println("A ⫫ Y | U: ", CausalDynamics.d_separated(g, 2, 3, [1]))  # false

# This structure tells us:
# - Full counterfactuals require U (unobserved)
# - Partial identification is possible via sensitivity analysis
# - Bounds depend on assumptions about U's distribution
A → Y is identifiable: true
Required adjustment: Set([1])
A ⫫ Y (no adjustment): false
A ⫫ Y | U: false

30.6 Partial Identification and Bounds

30.6.1 When Full Identification Isn’t Possible

Sometimes counterfactuals are not fully identified from data. In these cases, we can compute bounds: ranges of possible values.

30.6.2 Bounds Methods

  • Manski bounds: Worst-case bounds under minimal assumptions
  • Sensitivity analysis: How do bounds change with assumptions?
  • Instrumental variables: Tighter bounds with additional structure
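As a concrete instance of the first item, the worst-case bound for \(E[Y^{do(A=1)}]\) with an outcome bounded in \([0, 1]\) needs only two observational quantities. A standalone Python sketch with made-up numbers:

```python
# Manski-style worst-case bounds for E[Y^{do(A=1)}] when Y is bounded in [0, 1].
# Only observational quantities appear; the numbers are hypothetical.
p_a1 = 0.4    # P(A = 1), e.g. the observed treatment rate
ey_a1 = 0.75  # E[Y | A = 1], the observed mean outcome among the treated

# Decompose: E[Y(1)] = E[Y | A=1] P(A=1) + E[Y(1) | A=0] P(A=0).
# The second term is never observed; bound it by the extremes 0 and 1:
lower = ey_a1 * p_a1 + 0.0 * (1 - p_a1)
upper = ey_a1 * p_a1 + 1.0 * (1 - p_a1)
print(round(lower, 3), round(upper, 3))  # 0.3 0.9

# The width of the bound is exactly P(A = 0): it shrinks only through
# additional assumptions (sensitivity models, instruments), not more
# observational data of the same kind.
```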

30.6.3 Interpretation

Bounds tell us what we can know (the range) even when we can’t identify the exact value.

30.7 World Context

This chapter addresses Imagining at the Observable layer: what unit-level alternative outcomes are implied by a model, given the data we observed? Counterfactual reasoning uses data plus structural assumptions to define and compute outcomes under alternative interventions for the same unit.

30.8 Key Takeaways

  1. Counterfactuals are unit-level replays: Compare outcomes for the same unit under alternative interventions
  2. Shared exogenous noise: Maintains unit identity across interventions
  3. Unit-level reasoning: Counterfactuals are for specific units, not populations
  4. Partial identification: Bounds when full identification isn’t possible
  5. Counterfactuals bridge data and model: Use data plus assumptions to reason about alternatives

30.9 Further Reading