11  Transportability: Generalising Structural Claims

Status: Draft

v0.4

11.1 Learning Objectives

After reading this chapter, you will be able to:

  • Treat generalisation as a causal problem
  • Identify what changes across domains and what remains invariant
  • Encode domain shift in graphs and model structure
  • Defend cross-cohort claims and handle “dataset shift” scientifically

11.2 Introduction

Scientific claims often need to generalise across domains: different cohorts, sites, time periods, or experimental protocols (Bareinboim and Pearl 2013; Pearl and Bareinboim 2014). This chapter treats generalisation as a causal problem. From an edge-first perspective, transportability asks which prehensive relations (edge structures and edge mechanisms) are invariant across domains, and which change.

11.3 The Transportability Problem

Question: Can a causal claim established in a source domain \(\mathcal{D}_1\) be transported to a target domain \(\mathcal{D}_2\) (Bareinboim and Pearl 2013; Pearl and Bareinboim 2014)?

Example domains:

  • Age: Different age groups
  • Site: Different locations/institutions
  • Year: Different time periods
  • Protocol: Different experimental conditions

11.4 What Changes vs What Stays the Same

11.4.1 Invariant Mechanisms

Some mechanisms may be invariant across domains:

  • Physical laws
  • Biological processes (in some cases)
  • Structural relationships (some prehensive relations)

11.4.2 Domain-Specific Mechanisms

Other mechanisms may vary across domains:

  • Treatment assignment policies
  • Measurement protocols
  • Population characteristics
  • Environmental conditions

11.5 Encoding Domain Shift in Graphs

Domain shift can be encoded using context variables \(\mathbf{C}\):

  • \(\mathbf{C}\) represents domain characteristics (age, site, year, protocol)
  • Mechanisms may depend on \(\mathbf{C}\): \(f_i(\text{Pa}(X_i), \mathbf{C}, U_i)\)
  • Some mechanisms are invariant: \(f_i(\text{Pa}(X_i), U_i)\) (no \(\mathbf{C}\) dependence)
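The distinction between the two kinds of mechanism can be made concrete with a small simulation. The sketch below is illustrative only: the functions `f_invariant` and `f_context`, their coefficients, and the helper `slope` are hypothetical and not part of the chapter's codebase. It uses only the Julia standard library.

```julia
using Random, Statistics

# Hypothetical structural assignments, chosen for illustration only.
# Invariant mechanism: depends on the parent and noise, never on the context C.
f_invariant(pa, u) = 2.0 * pa + u

# Context-dependent mechanism: the coefficient on the parent changes with C.
f_context(pa, c, u) = (c == 0 ? 0.5 : 0.3) * pa + u

rng = MersenneTwister(1)
n = 10_000
pa = randn(rng, n)          # parent values
u  = 0.1 .* randn(rng, n)   # exogenous noise

# OLS slope of x on pa
slope(pa, x) = cov(pa, x) / var(pa)

# Recover the slope of each mechanism in each context
slopes = Dict(c => (invariant = slope(pa, f_invariant.(pa, u)),
                    context   = slope(pa, f_context.(pa, c, u))) for c in (0, 1))

for c in (0, 1)
    println("C = $c: invariant slope ≈ ", round(slopes[c].invariant, digits=2),
            ", context-dependent slope ≈ ", round(slopes[c].context, digits=2))
end
```

The invariant mechanism yields the same slope in both contexts, while the context-dependent one does not. Only the former can be carried to a new domain without knowing its value of \(\mathbf{C}\).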

11.5.1 Implementation: Encoding Domain Shift

We can encode domain shift by adding context variables to the graph:

# Find project root and include ensure_packages.jl
project_root = let
    current = pwd()
    while !isfile(joinpath(current, "Project.toml")) && !isfile(joinpath(current, "_quarto.yml"))
        parent = dirname(current)
        parent == current && break
        current = parent
    end
    current
end
include(joinpath(project_root, "scripts", "ensure_packages.jl"))

@auto_using DAGMakie CairoMakie Graphs

# Example: Treatment effect that may vary by age (context C)
# Original graph: A → Y
# With domain shift: C → A, C → Y (treatment assignment and effect depend on age)

# Graph with an explicit context node (node indices: 1 = A, 2 = Y, 3 = C)
g_domain1 = SimpleDiGraph(3)
add_edge!(g_domain1, 3, 1)  # C → A (age affects treatment assignment)
add_edge!(g_domain1, 1, 2)  # A → Y (treatment affects outcome)
add_edge!(g_domain1, 3, 2)  # C → Y (age affects outcome directly)

# Visualise
let
    fig, ax, p = dagplot(g_domain1;
        figure_size = (800, 400),
        layout_mode = :acyclic,
        nlabels = ["A", "Y", "C"]
    )
    fig  # Only this gets displayed
end

println("Domain shift encoding:")
println("  C (context/age) affects: A (treatment assignment) and Y (outcome)")
println("  Mechanism: Y := f(A, C, U)")
println("  If C dependence exists, transportability requires adjustment for C")
Domain shift encoding:
  C (context/age) affects: A (treatment assignment) and Y (outcome)
  Mechanism: Y := f(A, C, U)
  If C dependence exists, transportability requires adjustment for C

11.6 Transportability Criteria

Given a causal graph with domain variables, we can determine transportability:

  • Selection diagrams: Extend causal graphs to include selection variables
  • Transportability theorems: Conditions under which transport is possible
  • Sensitivity: How results change with domain assumptions
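As one concrete instance of such a theorem, consider the simplest case discussed by Pearl and Bareinboim (2014). Assume the domains differ only in the distribution of \(\mathbf{C}\): in the selection diagram, the selection node points into \(\mathbf{C}\) alone, and it is separated from \(Y\) given \(\mathbf{C}\) and \(A\) once the arrows into \(A\) are removed. Writing \(P\) for the source domain and \(P^{*}\) for the target domain (notation introduced here for illustration), the causal effect then transports by re-weighting:

\[ P^{*}(y \mid do(a)) = \sum_{\mathbf{c}} P(y \mid do(a), \mathbf{c}) \, P^{*}(\mathbf{c}) \]

The \(\mathbf{C}\)-specific effects are learned in the source domain, and only the distribution of \(\mathbf{C}\) has to be measured in the target domain. If the selection node also points into \(Y\), this formula no longer applies, because the mechanism for \(Y\) itself differs between domains.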

11.7 Practical Guidance

To defend cross-domain claims:

  1. Explicit domain variables: Include \(\mathbf{C}\) in your model
  2. Test invariance: Check if mechanisms are constant across domains
  3. Sensitivity analysis: How do results change with domain assumptions?
  4. External validation: Test predictions in new domains

11.8 Worked Example: Cross-Cohort Generalisation

11.9 Worked Example: Age-Dependent Mechanisms

Consider a treatment effect that may vary by age:

  • Domain 1: Young cohort (\(C = \text{young}\))
  • Domain 2: Old cohort (\(C = \text{old}\))

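If the outcome mechanism depends on age: \[ Y \coloneqq f(A, C, U) \]

Then transporting the effect of \(A\) on \(Y\) between cohorts requires one of the following:

  • Either: The mechanism turns out to be invariant (\(f\) does not actually vary with \(C\))
  • Or: We can adjust for \(C\), that is, estimate \(C\)-specific effects and transport those

Simply assuming invariance without testing it is dangerous: an effect estimated in one cohort can be systematically wrong for the other.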
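The second option can be sketched numerically. The example below assumes, hypothetically, that age-specific effects have already been estimated and that the target population is a mix of both age groups. The effect sizes match those simulated later in this chapter (0.5 and 0.3), while the target proportions and the helper `transported_effect` are invented for illustration.

```julia
# Age-specific treatment effects, assumed to have been estimated already.
effect_by_age = Dict("young" => 0.5, "old" => 0.3)

# Hypothetical age distribution in the target population (an assumption).
p_target = Dict("young" => 0.2, "old" => 0.8)

# Average the C-specific effects over a given distribution of C.
transported_effect(effects, p_c) = sum(effects[c] * p_c[c] for c in keys(effects))

ate_target = transported_effect(effect_by_age, p_target)
println("Transported average effect: ", round(ate_target, digits=3))

# Sensitivity: how the transported effect moves with the assumed share of old individuals
for p_old in 0.0:0.25:1.0
    p = Dict("young" => 1 - p_old, "old" => p_old)
    println("  P*(old) = ", p_old, " → effect ≈ ",
            round(transported_effect(effect_by_age, p), digits=3))
end
```

The sweep over `p_old` doubles as a simple sensitivity analysis: it shows how strongly the transported claim depends on what is assumed about the target population.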

11.9.1 Implementation: Testing Invariance Across Domains

We can test whether mechanisms are invariant across domains:

# Find project root and include ensure_packages.jl
project_root = let
    current = pwd()
    while !isfile(joinpath(current, "Project.toml")) && !isfile(joinpath(current, "_quarto.yml"))
        parent = dirname(current)
        parent == current && break
        current = parent
    end
    current
end
include(joinpath(project_root, "scripts", "ensure_packages.jl"))

@auto_using Random Distributions GLM DataFrames CairoMakie

Random.seed!(42)

# Simulate data from two domains
# Domain 1: Young cohort (C = 0)
# Domain 2: Old cohort (C = 1)

n_per_domain = 200

# Domain 1: Young cohort
A_d1 = rand(Bernoulli(0.3), n_per_domain)  # Less likely to treat
Y_d1 = 0.5 .* A_d1 .+ rand(Normal(0, 0.2), n_per_domain)  # Treatment effect = 0.5

# Domain 2: Old cohort
A_d2 = rand(Bernoulli(0.7), n_per_domain)  # More likely to treat
# Treatment effect may differ: Y = 0.3*A + ... (smaller effect)
Y_d2 = 0.3 .* A_d2 .+ rand(Normal(0, 0.2), n_per_domain)

# Combine data
df = DataFrame(
    C = [zeros(n_per_domain); ones(n_per_domain)],  # Domain indicator
    A = [A_d1; A_d2],
    Y = [Y_d1; Y_d2]
)

# Test invariance: Does treatment effect depend on C?
# Model: Y = α + β*A + γ*C + δ*A*C
model_interaction = lm(@formula(Y ~ A + C + A*C), df)
coef_interaction = coef(model_interaction)

# If the interaction coefficient δ (A*C) is far from zero, mechanisms are NOT invariant.
# The 0.1 cut-off below is a magnitude heuristic, not a significance test;
# inspect coeftable(model_interaction) for standard errors and p-values.
# Coefficient order: intercept, A, C, A*C
interaction_effect = length(coef_interaction) >= 4 ? coef_interaction[4] : 0.0
println("Testing mechanism invariance:")
println("  Treatment effect in domain 1 (C=0): ", round(coef_interaction[2], digits=3))
println("  Treatment effect in domain 2 (C=1): ", round(coef_interaction[2] + interaction_effect, digits=3))
println("  Interaction term (A*C): ", round(interaction_effect, digits=3))

if abs(interaction_effect) > 0.1
    println("  ⚠️  Mechanisms are NOT invariant (interaction exceeds threshold)")
    println("  → Transportability requires adjustment for C")
else
    println("  ✓ Mechanisms appear invariant (interaction below threshold)")
    println("  → Transportability may be possible without adjustment")
end
Testing mechanism invariance:
  Treatment effect in domain 1 (C=0): 0.497
  Treatment effect in domain 2 (C=1): 0.26
  Interaction term (A*C): -0.237
  ⚠️  Mechanisms are NOT invariant (interaction exceeds threshold)
  → Transportability requires adjustment for C
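A check based on standard errors can complement the fixed 0.1 cut-off in the code above. The sketch below is one possible approach, not the chapter's method: it re-draws data from the same data-generating process using only the standard library (so the numbers will differ from the output above), estimates the treatment effect per domain as a difference in means, and compares the two estimates with a z statistic. The helper `effect_and_se` is hypothetical.

```julia
using Random, Statistics

# Difference in mean outcome between treated and untreated units, with its standard error.
function effect_and_se(a::AbstractVector{Bool}, y::AbstractVector{<:Real})
    y_treated, y_control = y[a], y[.!a]
    est = mean(y_treated) - mean(y_control)
    se  = sqrt(var(y_treated) / length(y_treated) + var(y_control) / length(y_control))
    return est, se
end

rng = MersenneTwister(42)
n = 200

# Same data-generating process as the simulation above (independent re-draw)
a1 = rand(rng, n) .< 0.3                  # domain 1: 30% treated
y1 = 0.5 .* a1 .+ 0.2 .* randn(rng, n)    # true effect 0.5
a2 = rand(rng, n) .< 0.7                  # domain 2: 70% treated
y2 = 0.3 .* a2 .+ 0.2 .* randn(rng, n)    # true effect 0.3

est1, se1 = effect_and_se(a1, y1)
est2, se2 = effect_and_se(a2, y2)

# z statistic for the null hypothesis of equal treatment effects in both domains
z = (est2 - est1) / sqrt(se1^2 + se2^2)

println("Effect in domain 1: ", round(est1, digits=3), " (SE ", round(se1, digits=3), ")")
println("Effect in domain 2: ", round(est2, digits=3), " (SE ", round(se2, digits=3), ")")
println("z = ", round(z, digits=2), "; reject invariance at the 5% level: ", abs(z) > 1.96)
```

Because treatment is randomised within each domain in this simulation, the difference in means is an unbiased effect estimate. With confounded data, the per-domain estimates would need adjustment before being compared.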

11.10 Dataset Shift as Causal Problem

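“Dataset shift” is often a causal problem. Its common variants differ in what changes and what stays fixed:

  • Covariate shift: \(P(X)\) changes, but \(P(Y \mid X)\) is invariant
  • Label shift: \(P(Y)\) changes, but \(P(X \mid Y)\) is invariant
  • Mechanism shift: The structural assignments themselves change

Causal framing clarifies what can and cannot be transported: invariant components can be reused in the target domain, while shifted components must be re-estimated there.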
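Covariate shift is the easiest of these to illustrate. The following sketch assumes a hypothetical linear mechanism `Y := 1 + 2X + U` shared by both domains, with only the distribution of `X` differing. All names and parameter values are invented for illustration.

```julia
using Random, Statistics

rng = MersenneTwister(7)
n = 5_000

# Invariant mechanism shared by both domains (hypothetical): Y := 1 + 2X + U
mechanism(x, u) = 1.0 + 2.0 * x + u

# Covariate shift: only the distribution of X differs between domains.
x_src = randn(rng, n)            # source: X ~ Normal(0, 1)
x_tgt = randn(rng, n) .+ 1.5     # target: X ~ Normal(1.5, 1)
y_src = mechanism.(x_src, 0.5 .* randn(rng, n))
y_tgt = mechanism.(x_tgt, 0.5 .* randn(rng, n))

# OLS slope of y on x
ols_slope(x, y) = cov(x, y) / var(x)

println("Mean of Y:  source ≈ ", round(mean(y_src), digits=2),
        ", target ≈ ", round(mean(y_tgt), digits=2))
println("Slope of Y on X:  source ≈ ", round(ols_slope(x_src, y_src), digits=2),
        ", target ≈ ", round(ols_slope(x_tgt, y_tgt), digits=2))
```

The marginal mean of `Y` differs between the domains, but the slope, which reflects the invariant mechanism, is essentially the same. The conditional relationship transports; the marginal summary does not.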

11.11 World Context

This chapter addresses Imagining in the Structural world—what alternative structural possibilities exist across domains? Transportability is a centrifugal bridge concept (Structural → Observable): it applies structural principles (invariant mechanisms, graph structure) to determine whether causal claims can be generalised across observable domains. It bridges perfect structural forms (what mechanisms are invariant) with observable reality (what can be learned and transported across different observable contexts).

11.12 Key Takeaways

  1. Generalisation is a causal problem: what mechanisms are invariant?
  2. Domain variables \(\mathbf{C}\) encode what changes across domains
  3. Transportability criteria provide systematic methods for cross-domain claims
  4. External validation and sensitivity analysis are essential
  5. Transportability bridges Structural (invariant mechanisms) and Observable (domain-specific contexts)

11.13 Further Reading

  • Bareinboim and Pearl (2013): “A general algorithm for deciding transportability”
  • Pearl and Bareinboim (2014): “External validity”
  • Dahabreh et al. (2019): “Extending inferences from a randomized trial”
  • From Structure to Time: FEP and Attractors: Transition to Dynamical world