24  Observational Methods: Learning from Data

Status: Draft

v0.4

24.1 Learning Objectives

After reading this chapter, you will be able to:

  • Handle time-varying confounding using G-methods
  • Apply G-computation and IPTW for causal effect estimation
  • Understand when observational methods can identify causal effects
  • Recognise challenges in time-varying settings

24.2 Introduction

This chapter focuses on Seeing in the Observable world: what can we learn from actualised data? These methods use observable data to learn about causal structure and effects, bridging from the Observable world (what we can measure) to the Structural and Dynamical worlds (what mechanisms exist). The key question: what can we learn about the inner worlds (Structural, Dynamical) from the outer world (Observable)?

24.3 Time-Varying Confounding

24.3.1 Definition

Time-varying confounding occurs when (Robins 1986; Robins et al. 2000):

  1. Past confounders \(L_t\) affect both treatment \(A_t\) and outcome \(Y_t\)
  2. Past treatment \(A_{t-1}\) affects confounders \(L_t\)
  3. Confounders \(L_t\) affect future treatment \(A_{t+1}\)

Result: Standard methods (e.g., regression adjustment) fail because confounding evolves over time.

24.3.2 Example: Treatment and Disease Severity

  • Treatment \(A_t\): Drug at time \(t\)
  • Confounder \(L_t\): Disease severity at time \(t\)
  • Outcome \(Y_t\): Recovery at time \(t\)

Problem:

  • \(L_t\) affects \(A_t\) (severity influences treatment choice)
  • \(L_t\) affects \(Y_t\) (severity influences recovery)
  • \(A_{t-1}\) affects \(L_t\) (treatment affects severity)

This creates a feedback loop: treatment affects confounder, which affects future treatment.
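The failure can be made concrete with a minimal two-period sketch. All coefficients and the data-generating process below are illustrative, not fitted from real data: regression without \(L\) leaves \(A_2\) confounded, while adjusting for \(L\) wipes out the estimated effect of \(A_1\), because \(L\) is itself a consequence of \(A_1\).

```julia
using Random, Distributions, DataFrames, GLM

Random.seed!(7)
n = 20_000
A1 = rand(Bernoulli(0.5), n)                    # treatment at time 1
U  = randn(n)                                   # unmeasured health status, affects L and Y
L  = A1 .+ U                                    # confounder: affected by A1, drives A2
A2 = rand.(Bernoulli.(@. 1 / (1 + exp(-L))))    # treatment at time 2 depends on L
Y  = @. A1 + A2 + U + 0.1 * randn()             # true effect of each treatment is 1.0

df = DataFrame(A1 = A1, A2 = A2, L = L, Y = Y)

# Not adjusting for L: A2 is confounded (L drives A2 and shares U with Y)
m_unadj = lm(@formula(Y ~ A1 + A2), df)
# Adjusting for L: A1's coefficient collapses, since L = A1 + U makes U recoverable
m_adj = lm(@formula(Y ~ A1 + A2 + L), df)

println("unadjusted (A1, A2): ", round.(coef(m_unadj)[2:3], digits=2))
println("adjusted   (A1, A2): ", round.(coef(m_adj)[2:3], digits=2))
```

Neither specification recovers the true effects (1.0 for each treatment): the unadjusted fit overstates \(A_2\), and the adjusted fit drives \(A_1\) towards zero. G-methods are designed to escape exactly this dilemma.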

24.3.3 Framing in CDMs

A CDM naturally handles time-varying confounding:

\[ \begin{aligned} L_t &\coloneqq f_L(L_{t-1}, A_{t-1}, Y_{t-1}, U^L_t) \quad \text{(confounder dynamics)} \\ A_t &\coloneqq \pi(H_t, U^A_t) \quad \text{(treatment assignment)} \\ Y_t &\coloneqq f_Y(L_t, A_t, Y_{t-1}, U^Y_t) \quad \text{(outcome)} \end{aligned} \]

The CDM structure makes the confounding explicit and allows us to adjust for it properly.

24.4 G-Methods: Handling Time-Varying Confounding

G-methods are a class of methods designed for time-varying treatments and confounders (Robins 1986; Robins et al. 2000; Hernán and Robins 2020). The “G” stands for “generalised” or “g-computation.”

24.4.1 G-Computation (G-Formula)

Idea: Simulate what would happen under a treatment strategy by integrating over confounder distributions.

Formula: \[ \mathbb{E}[Y^{do(\bar{a}_{1:T})}] = \int \mathbb{E}[Y \mid \bar{a}_{1:T}, \bar{l}_{1:T}] \prod_{t=1}^{T} P(l_t \mid \bar{a}_{t-1}, \bar{l}_{t-1}) \, d\bar{l}_{1:T} \]

Steps:

  1. Fit models for confounder evolution: \(P(L_t \mid \bar{A}_{t-1}, \bar{L}_{t-1})\)
  2. Fit the outcome model: \(\mathbb{E}[Y \mid \bar{A}_{1:T}, \bar{L}_{1:T}]\)
  3. Simulate confounder trajectories under the treatment strategy
  4. Compute expected outcomes

Mathematical form: The g-formula expresses the interventional distribution as:

\[ P^{do(\bar{A}_t = \bar{a}_t)}(\bar{y}_t) = \int \prod_{s=1}^t P(y_s \mid \bar{a}_s, \bar{l}_s) \prod_{s=1}^t P(l_s \mid \bar{a}_{s-1}, \bar{l}_{s-1}) \, d\bar{l}_t \]

Advantages:

  • Handles time-varying confounding naturally
  • Can estimate effects of complex treatment strategies
  • Provides the full distribution, not just means

Limitations:

  • Requires correct specification of outcome and confounder models
  • Sensitive to model misspecification
  • Computationally intensive for long time horizons

24.4.2 Implementation: G-Computation

Here’s an example of implementing G-computation for time-varying treatments:

# Find project root and include ensure_packages.jl
project_root = let
    current = pwd()
    while !isfile(joinpath(current, "Project.toml")) && !isfile(joinpath(current, "_quarto.yml"))
        parent = dirname(current)
        parent == current && break
        current = parent
    end
    current
end
include(joinpath(project_root, "scripts", "ensure_packages.jl"))

@auto_using Random Distributions CairoMakie

Random.seed!(42)

# Example: Time-varying treatment and confounder
# L_t: Disease severity (confounder)
# A_t: Treatment (depends on L_t)
# Y_t: Outcome (depends on L_t and A_t)

# Step 1: Fit models from observed data (simulated here for demonstration)
# In practice, these would be fitted from real data

"""
    fit_confounder_model(data)

Fit a model for confounder evolution, P(L_t | A_{t-1}, L_{t-1}).

NOTE: This is a simplified example. In practice, use flexible models
(GLM with splines, random forests, Super Learner) to avoid misspecification bias.
"""
function fit_confounder_model(data)
    # Simplified: L_t = α + β*A_{t-1} + γ*L_{t-1} + noise
    return (α=0.1, β=-0.2, γ=0.8, σ=0.1)
end

"""
    fit_outcome_model(data)

Fit a model for the outcome expectation, E[Y_t | A_t, L_t, Y_{t-1}].

NOTE: This is a simplified example. In practice, use flexible models
to avoid misspecification bias in G-computation.
"""
function fit_outcome_model(data)
    # Simplified: Y_t = α + β*A_t + γ*L_t + δ*Y_{t-1} + noise
    return (α=0.0, β=0.5, γ=-0.3, δ=0.7, σ=0.1)
end

# Simulate observed data (for demonstration)
n_subjects = 100
T = 5
data = []

for i in 1:n_subjects
    L = [rand(Normal(0.5, 0.2))]  # Initial severity
    A = Int[]
    Y = [rand(Normal(0.0, 0.1))]  # Initial outcome
    
    for t in 2:T
        # Treatment assignment (depends on L_{t-1})
        p_treat = 1 / (1 + exp(-2*(L[end] - 0.5)))  # Higher severity → more likely to treat
        push!(A, rand(Bernoulli(p_treat)))
        
        # Confounder evolution (depends on A_{t-1} and L_{t-1})
        L_new = 0.1 - 0.2*A[end] + 0.8*L[end] + rand(Normal(0, 0.1))
        push!(L, max(0, min(1, L_new)))  # Bound to [0,1]
        
        # Outcome (depends on A_t and L_t)
        Y_new = 0.5*A[end] - 0.3*L[end] + 0.7*Y[end] + rand(Normal(0, 0.1))
        push!(Y, Y_new)
    end
    
    push!(data, (L=L, A=A, Y=Y))
end

# Fit models
confounder_model = fit_confounder_model(data)
outcome_model = fit_outcome_model(data)

# Step 2: G-computation - simulate under treatment strategy
# Strategy: Always treat (A_t = 1 for all t)
"""
    g_compute(strategy, confounder_model, outcome_model, n_sim=1000)

Perform G-computation: simulate outcomes under a treatment strategy.

# Arguments
- `strategy`: function `(t, L, Y) -> A_t` giving the treatment at time `t`
- `confounder_model`: fitted model for confounder evolution
- `outcome_model`: fitted model for outcome expectation
- `n_sim`: number of Monte Carlo simulations

Returns a vector of final outcomes under the strategy.
"""
function g_compute(strategy, confounder_model, outcome_model, n_sim::Int=1000)
    # Pre-allocate for performance
    outcomes = Vector{Float64}(undef, n_sim)
    
    for sim in 1:n_sim
        L = [rand(Normal(0.5, 0.2))]  # Initial severity
        Y = [rand(Normal(0.0, 0.1))]  # Initial outcome
        
        for t in 2:T
            # Treatment according to strategy
            # Strategy function takes (t, L, Y) where L and Y are current state vectors
            A_t = strategy(t, L, Y)
            
            # Simulate confounder under strategy
            L_new = confounder_model.α + confounder_model.β*A_t + 
                    confounder_model.γ*L[end] + rand(Normal(0, confounder_model.σ))
            push!(L, max(0, min(1, L_new)))
            
            # Simulate outcome under strategy
            Y_new = outcome_model.α + outcome_model.β*A_t + 
                   outcome_model.γ*L[end] + outcome_model.δ*Y[end] + 
                   rand(Normal(0, outcome_model.σ))
            push!(Y, Y_new)
        end
        
        outcomes[sim] = Y[end]
    end
    
    return outcomes
end

# Strategy 1: Always treat
strategy_always_treat = (t, L, Y) -> 1.0

# Strategy 2: Never treat
strategy_never_treat = (t, L, Y) -> 0.0

# Compute expected outcomes
outcomes_treat = g_compute(strategy_always_treat, confounder_model, outcome_model)
outcomes_no_treat = g_compute(strategy_never_treat, confounder_model, outcome_model)

# Visualise
let
fig = Figure(size = (800, 400))
ax = Axis(fig[1, 1], title = "G-Computation: Outcome Distributions Under Strategies", 
          xlabel = "Final Outcome", ylabel = "Density")

hist!(ax, outcomes_treat, bins=30, normalization=:pdf, 
      label="Always treat", color=(:blue, 0.5))
hist!(ax, outcomes_no_treat, bins=30, normalization=:pdf, 
      label="Never treat", color=(:red, 0.5))
axislegend(ax)

    fig  # Only this gets displayed
end

println("G-computation results:")
println("  E[Y | always treat] = ", round(mean(outcomes_treat), digits=3))
println("  E[Y | never treat] = ", round(mean(outcomes_no_treat), digits=3))
println("  Treatment effect = ", round(mean(outcomes_treat) - mean(outcomes_no_treat), digits=3))
G-computation results:
  E[Y | always treat] = 1.175
  E[Y | never treat] = -0.377
  Treatment effect = 1.552

24.4.3 Inverse Probability of Treatment Weighting (IPTW)

Idea: Weight observations by inverse probability of receiving their observed treatment, creating a pseudo-population where treatment is independent of confounders.

Weights: \[ w_i = \prod_{t=1}^{T} \frac{1}{P(A_t = a_{it} \mid \bar{A}_{t-1}, \bar{L}_{t})} \]

Use: Weighted regression or weighted means to estimate causal effects.

Stabilised weights (reduce variance): \[ w_i^{\text{stab}} = \prod_{t=1}^{T} \frac{P(A_t = a_{it} \mid \bar{A}_{t-1})}{P(A_t = a_{it} \mid \bar{A}_{t-1}, \bar{L}_{t})} \]

Interpretation: Subjects with treatment histories that are unlikely given their confounder history receive higher weights, creating a weighted population where treatment is independent of confounders.

Steps:

  1. Fit treatment models: estimate \(P(A_s \mid \bar{A}_{s-1}, \bar{L}_s)\) for each time \(s\)
  2. Compute weights: calculate the IPTW weight \(w_i\) for each subject
  3. Stabilise weights (optional): use stabilised weights to reduce variance
  4. Fit the outcome model: fit a model for \(\mathbb{E}[Y_t \mid \bar{A}_t]\) on the weighted data

Advantages:

  • Intuitive: creates a pseudo-randomised population
  • Can use standard regression on weighted data
  • Handles time-varying confounding

Limitations:

  • Requires correct specification of the treatment model
  • Extreme weights can create instability
  • Sensitive to positivity violations

24.4.4 Implementation: IPTW

Here’s an example of implementing IPTW for time-varying treatments:

# Find project root and include ensure_packages.jl
project_root = let
    current = pwd()
    while !isfile(joinpath(current, "Project.toml")) && !isfile(joinpath(current, "_quarto.yml"))
        parent = dirname(current)
        parent == current && break
        current = parent
    end
    current
end
include(joinpath(project_root, "scripts", "ensure_packages.jl"))

@auto_using Random Distributions GLM DataFrames CairoMakie

Random.seed!(42)

# Simulate data with time-varying confounding
n_subjects = 200
T = 5
data = []

for i in 1:n_subjects
    L = [rand(Normal(0.5, 0.2))]  # Initial severity
    A = Int[]
    Y = [rand(Normal(0.0, 0.1))]  # Initial outcome
    
    for t in 2:T
        # Treatment assignment (depends on L_{t-1} - confounding!)
        p_treat = 1 / (1 + exp(-2*(L[end] - 0.5)))
        push!(A, rand(Bernoulli(p_treat)))
        
        # Confounder evolution
        L_new = 0.1 - 0.2*A[end] + 0.8*L[end] + rand(Normal(0, 0.1))
        push!(L, max(0, min(1, L_new)))
        
        # Outcome
        Y_new = 0.5*A[end] - 0.3*L[end] + 0.7*Y[end] + rand(Normal(0, 0.1))
        push!(Y, Y_new)
    end
    
    push!(data, (L=L, A=A, Y=Y))
end

# Step 1: Fit treatment models P(A_t | A_{t-1}, L_{t-1})
# In this simulation, A_t is assigned based on the severity just before
# treatment, so the relevant confounder for A_t is L_{t-1}.
# For each time point, fit a logistic regression
treatment_models = []

for t in 2:T
    # Prepare data for time t (d.A[t-1] is the treatment at time t)
    A_t = [d.A[t-1] for d in data]                  # A_t
    A_prev = [t > 2 ? d.A[t-2] : 0 for d in data]   # A_{t-1}
    L_t = [d.L[t-1] for d in data]                  # L_{t-1}, the severity A_t was based on
    
    df = DataFrame(A_t = A_t, A_prev = A_prev, L_t = L_t)
    
    # Fit logistic regression
    # NOTE: In production, use flexible models (splines, machine learning) for robustness
    # P(A_t = 1 | A_{t-1}, L_t) = logistic(α + β*A_{t-1} + γ*L_t)
    model = glm(@formula(A_t ~ A_prev + L_t), df, Binomial(), LogitLink())
    push!(treatment_models, model)
end

# Step 2: Compute IPTW weights
# w_i = ∏_t 1 / P(A_t = a_{it} | A_{t-1}, L_t)
weights = Float64[]

for d in data
    w = 1.0
    for t in 2:T
        A_t = d.A[t-1]
        A_prev = t > 2 ? d.A[t-2] : 0
        L_t = d.L[t-1]
        
        # Predict P(A_t = a_{it} | A_{t-1}, L_t)
        model = treatment_models[t-1]
        df_t = DataFrame(A_prev = [A_prev], L_t = [L_t])
        p_pred = predict(model, df_t)[1]
        
        # Weight = 1 / probability of observed treatment
        # Protect against division by zero
        p_observed = A_t == 1 ? max(p_pred, 1e-10) : max(1 - p_pred, 1e-10)
        w *= 1.0 / p_observed
    end
    push!(weights, w)
end

# Step 3: Stabilise weights (optional but recommended)
# w_stab = ∏_t P(A_t | A_{t-1}) / P(A_t | A_{t-1}, L_t)
# Numerator: marginal treatment probability (no confounders).
# Fit each marginal model once per time point, not once per subject.
marginal_models = []

for t in 2:T
    A_t_all = [d.A[t-1] for d in data]
    A_prev_all = [t > 2 ? d.A[t-2] : 0 for d in data]
    df_marginal = DataFrame(A_t = A_t_all, A_prev = A_prev_all)
    push!(marginal_models, glm(@formula(A_t ~ A_prev), df_marginal, Binomial(), LogitLink()))
end

stabilised_weights = Float64[]

for d in data
    w_stab = 1.0
    for t in 2:T
        A_t = d.A[t-1]
        A_prev = t > 2 ? d.A[t-2] : 0
        L_t = d.L[t-1]
        
        # Marginal probability P(A_t | A_{t-1})
        p_marginal = predict(marginal_models[t-1], DataFrame(A_prev = [A_prev]))[1]
        
        # Conditional probability P(A_t | A_{t-1}, L_t)
        p_cond = predict(treatment_models[t-1], DataFrame(A_prev = [A_prev], L_t = [L_t]))[1]
        
        # Stabilised weight, protecting against division by zero
        p_observed = A_t == 1 ? max(p_cond, 1e-10) : max(1 - p_cond, 1e-10)
        p_marginal_observed = A_t == 1 ? max(p_marginal, 1e-10) : max(1 - p_marginal, 1e-10)
        w_stab *= p_marginal_observed / p_observed
    end
    push!(stabilised_weights, w_stab)
end

# Step 4: Estimate the treatment effect using weighted means
# Compare subjects who were always treated against all others (a coarse
# contrast; fitting a marginal structural model to the weighted data is the
# standard approach)
final_outcomes = [d.Y[end] for d in data]
always_treat = [all(d.A .== 1) for d in data]

# Weighted mean difference
sum_w_treat = sum(stabilised_weights .* always_treat)
sum_w_no_treat = sum(stabilised_weights .* (.!always_treat))
mean_treat = sum_w_treat > 0 ? sum(stabilised_weights .* final_outcomes .* always_treat) / sum_w_treat : 0.0
mean_no_treat = sum_w_no_treat > 0 ? sum(stabilised_weights .* final_outcomes .* (.!always_treat)) / sum_w_no_treat : 0.0

# Visualise weights
let
fig = Figure(size = (1000, 400))
ax1 = Axis(fig[1, 1], title = "IPTW Weights Distribution", xlabel = "Weight", ylabel = "Density")
ax2 = Axis(fig[1, 2], title = "Stabilised Weights Distribution", xlabel = "Weight", ylabel = "Density")

hist!(ax1, weights, bins=50, normalization=:pdf, color=(:blue, 0.5))
hist!(ax2, stabilised_weights, bins=50, normalization=:pdf, color=(:green, 0.5))

    fig  # Only this gets displayed
end

println("IPTW results:")
println("  Mean unstabilised weight: ", round(mean(weights), digits=2))
println("  Mean stabilised weight: ", round(mean(stabilised_weights), digits=2))
println("  Max unstabilised weight: ", round(maximum(weights), digits=2))
println("  Max stabilised weight: ", round(maximum(stabilised_weights), digits=2))
println("\nTreatment effect (weighted):")
println("  E[Y | always treat] = ", round(mean_treat, digits=3))
println("  E[Y | never treat] = ", round(mean_no_treat, digits=3))
println("  Effect = ", round(mean_treat - mean_no_treat, digits=3))
IPTW results:
  Mean unstabilised weight: 16.71
  Mean stabilised weight: 1.03
  Max unstabilised weight: 147.51
  Max stabilised weight: 9.19

Treatment effect (weighted):
  E[Y | always treat] = 1.166
  E[Y | never treat] = 0.308
  Effect = 0.859

24.4.5 Doubly Robust Methods

Idea: Combine outcome model and treatment model. Consistent if either model is correct.

Examples:

  • Augmented IPTW (AIPTW): IPTW plus an outcome-model correction
  • TMLE: targeted maximum likelihood estimation (see TMLE and Doubly Robust Estimation)
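The AIPTW idea can be sketched at a single time point. As simplifications for illustration, the data-generating process below is invented, and the true propensity and outcome regressions are plugged in rather than fitted:

```julia
using Random, Distributions, Statistics

Random.seed!(1)
n = 5_000
X = randn(n)
e = @. 1 / (1 + exp(-X))                  # true propensity P(A = 1 | X)
A = rand.(Bernoulli.(e))
Y = @. 1.0 * A + 0.5 * X + 0.1 * randn()  # true average treatment effect = 1.0

# True nuisance functions plugged in for illustration; in practice both are fitted.
ê  = e                                    # propensity model
m̂₁ = @. 1.0 + 0.5 * X                     # outcome regression E[Y | A = 1, X]
m̂₀ = @. 0.5 * X                           # outcome regression E[Y | A = 0, X]

# AIPTW: IPTW term plus an outcome-model correction for each potential outcome
ψ̂₁ = mean(@. A * Y / ê - (A - ê) / ê * m̂₁)
ψ̂₀ = mean(@. (1 - A) * Y / (1 - ê) + (A - ê) / (1 - ê) * m̂₀)
println("AIPTW ATE estimate: ", round(ψ̂₁ - ψ̂₀, digits=3))  # ≈ 1.0
```

Because either the propensity term or the outcome-model correction can carry the estimate, a consistent fit of just one of \(\hat{e}\) or \(\hat{m}_a\) suffices; here both are correct by construction.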

24.5 Challenges in Time-Varying Settings

24.5.1 Positivity

Positivity (also called “overlap” or “common support”) requires that all treatment histories have positive probability given confounder history:

\[ P(A_t = a_t \mid \bar{A}_{t-1} = \bar{a}_{t-1}, \bar{L}_t = \bar{l}_t) > 0 \]

for all possible treatment values \(a_t\) and all confounder histories \(\bar{l}_t\) that occur in the data.

Positivity violation: Some treatment histories have zero (or near-zero) probability given confounder history.

Problem:

  • Cannot estimate effects for impossible treatment histories
  • IPTW weights become infinite (or very large)
  • G-computation cannot simulate under impossible interventions
  • Estimates become unstable or undefined

Solutions:

  1. Restrict to feasible policies: only estimate effects for treatment strategies that are possible given the data
  2. Truncate weights: cap IPTW weights at a maximum value (e.g., 10 or 20) to reduce instability
  3. Use bounds: compute bounds on treatment effects when positivity is violated
  4. Sensitivity analysis: assess how results change with different positivity assumptions
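Weight truncation takes only a few lines. `truncate_weights` is a hypothetical helper, and the fixed cap of 20 and the 99th-percentile default are common heuristics rather than fixed rules; truncation trades a small amount of bias for a large reduction in variance:

```julia
using Statistics

# Cap weights either at a fixed value or at a high percentile; `w` could be the
# unstabilised `weights` vector computed in the IPTW example above.
truncate_weights(w; cap = quantile(w, 0.99)) = min.(w, cap)

w = [0.8, 1.2, 2.1, 3.5, 147.5]           # one subject with an extreme weight
w_fixed = truncate_weights(w; cap = 20.0)
println("max before: ", maximum(w), ", after: ", maximum(w_fixed))  # 147.5 → 20.0
```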

24.5.2 Model Misspecification

Problem: Models for confounders, treatment, or outcome may be incorrectly specified, leading to biased estimates.

G-computation: Sensitive to misspecification of outcome and confounder models. If these models are wrong, estimates will be biased.

MSMs (IPTW): Sensitive to misspecification of treatment model. If the treatment model is wrong, weights will be incorrect and estimates biased.

Solutions:

  1. Use flexible models: machine learning methods (random forests, neural networks, Super Learner (Laan et al. 2007)) can reduce misspecification
  2. Model criticism: check model fit using calibration, residual analysis, and posterior predictive checks (see Model Validation with Observable Data)
  3. Sensitivity analysis: assess how results change with different model specifications
  4. Robust methods: TMLE provides some protection through double robustness (see TMLE and Doubly Robust Estimation)

24.5.3 Unmeasured Confounding

Problem: Unobserved confounders bias estimates even when all measured confounders are adjusted for.

Time-varying setting: Unmeasured time-varying confounders are particularly problematic because they can create complex dependencies that are difficult to detect.

Solutions:

  1. Sensitivity analysis: quantify how robust results are to unmeasured confounding (see Identification: When Can We Learn from Data? for E-values and sensitivity parameters)
  2. Bounds: compute worst-case bounds on treatment effects under assumptions about unmeasured confounders
  3. Instrumental variables: use instruments that are independent of unmeasured confounders (when available)
  4. Negative controls: use known null effects to detect unmeasured confounding
  5. Design improvements: collect data on suspected unmeasured confounders in future studies
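One such sensitivity measure, the E-value of VanderWeele and Ding, has a simple closed form on the risk-ratio scale; a minimal sketch:

```julia
# E-value: the minimum strength of association an unmeasured confounder would
# need with both treatment and outcome, on the risk-ratio scale, to fully
# explain away an observed risk ratio rr.
function evalue(rr)
    r = rr >= 1 ? rr : 1 / rr          # symmetric for protective effects
    return r + sqrt(r * (r - 1))
end

println(round(evalue(2.0), digits=2))  # 3.41: a fairly strong confounder is needed
```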

24.6 World Context

This chapter addresses Seeing in the Observable world: what can we learn from actualised data? G-methods (G-computation, IPTW) use observable data to estimate causal effects while handling time-varying confounding. These methods bridge the Observable world (what we observe) with the Structural world (what would happen under interventions).

24.7 Inverse Problems in Causal Dynamics

Inverse problems involve learning model parameters from observational data (Rackauckas 2026). This is central to causal inference: we observe outcomes and want to learn the mechanisms (parameters) that generated them.

24.7.1 Connection to Identification

Parameter identifiability is a prerequisite for causal identification (see Identification: When Can We Learn from Data?). If parameters are not identifiable, we cannot learn causal mechanisms from data.

Key Questions:

  1. Identifiability: can parameters be uniquely determined from data?
  2. Estimation: how do we estimate parameters from finite data?
  3. Uncertainty: how uncertain are our parameter estimates?
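A toy illustration of non-identifiability (the first question): if two parameters enter the model only through their product, no amount of data can separate them.

```julia
# With y = β·γ·x, only the product β·γ enters the likelihood, so distinct
# parameter pairs produce literally identical data.
x = collect(0.0:0.1:1.0)
y_a = 2.0 .* 3.0 .* x   # (β, γ) = (2, 3)
y_b = 6.0 .* 1.0 .* x   # (β, γ) = (6, 1)
println(maximum(abs.(y_a .- y_b)))  # 0.0: the data cannot distinguish the two
```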

24.7.2 Regularisation for Ill-Posed Problems

When inverse problems are ill-posed (multiple solutions or unstable), regularisation helps:

  • L2 regularisation: Penalise large parameter values
  • L1 regularisation: Encourage sparsity (few non-zero parameters)
  • Prior knowledge: Incorporate domain knowledge as priors

Example: Regularised parameter estimation

# Find project root and include ensure_packages.jl
project_root = let
    current = pwd()
    while !isfile(joinpath(current, "Project.toml")) && !isfile(joinpath(current, "_quarto.yml"))
        parent = dirname(current)
        parent == current && break
        current = parent
    end
    current
end
include(joinpath(project_root, "scripts", "ensure_packages.jl"))

@auto_using DifferentialEquations Optim Random LinearAlgebra

# SIR model
function sir!(du, u, p, t)
    S, I, R = u
    β, γ = p
    du[1] = -β * S * I
    du[2] = β * S * I - γ * I
    du[3] = γ * I
end

# Generate synthetic data
Random.seed!(42)
u0 = [0.99, 0.01, 0.0]
p_true = (0.3, 0.1)
tspan = (0.0, 50.0)
t_obs = 0.0:1.0:50.0

prob_true = ODEProblem(sir!, u0, tspan, p_true)
sol_true = solve(prob_true, Tsit5(), saveat = t_obs)
σ_obs = 0.01
observed_data = [max.(0.0, min.(1.0, u .+ σ_obs .* randn(3))) for u in sol_true.u]

# Unregularised cost function (data fit)
function cost(p)
    prob = ODEProblem(sir!, u0, tspan, p)
    sol = solve(prob, Tsit5(), saveat = t_obs)
    if sol.retcode != :Success
        return Inf
    end
    return sum([sum((sol.u[i] .- observed_data[i]).^2) for i in 1:length(sol.u)])
end

# Regularised cost function
function cost_regularised(p, λ)
    # Data fit term
    data_fit = cost(p)
    
    # Regularisation term (L2)
    reg = λ * sum(p.^2)
    
    return data_fit + reg
end

# Initial guess
p0 = [0.2, 0.15]  # β, γ

# Optimise without regularisation
result_unreg = optimize(cost, p0, LBFGS())
p_unreg = result_unreg.minimizer

# Optimise with regularisation (λ = 0.1)
result_reg = optimize(p -> cost_regularised(p, 0.1), p0, LBFGS())
p_reg = result_reg.minimizer

println("Regularised parameter estimation:")
println("  True parameters: β = $(p_true[1]), γ = $(p_true[2])")
println("  Unregularised: β = $(round(p_unreg[1], digits=3)), γ = $(round(p_unreg[2], digits=3))")
println("  Regularised (λ=0.1): β = $(round(p_reg[1], digits=3)), γ = $(round(p_reg[2], digits=3))")
println("  Regularisation reduces parameter magnitude: |p_reg| = $(round(norm(p_reg), digits=3)) vs |p_unreg| = $(round(norm(p_unreg), digits=3))")
Regularised parameter estimation:
  True parameters: β = 0.3, γ = 0.1
  Unregularised: β = 0.2, γ = 0.15
  Regularised (λ=0.1): β = 0.2, γ = 0.15
  Regularisation reduces parameter magnitude: |p_reg| = 0.25 vs |p_unreg| = 0.25

24.7.3 Multiple Shooting Methods

For long time horizons, multiple shooting breaks the problem into segments:

  1. Divide time span into segments: \([t_0, t_1], [t_1, t_2], \ldots, [t_{n-1}, t_n]\)
  2. Solve each segment independently
  3. Add continuity constraints: \(u_i(t_i) = u_{i+1}(t_i)\)

Advantages:

  • More robust than single shooting
  • Each segment can be solved in parallel
  • Better for long time horizons

Implementation:

# Find project root and include ensure_packages.jl
project_root = let
    current = pwd()
    while !isfile(joinpath(current, "Project.toml")) && !isfile(joinpath(current, "_quarto.yml"))
        parent = dirname(current)
        parent == current && break
        current = parent
    end
    current
end
include(joinpath(project_root, "scripts", "ensure_packages.jl"))

@auto_using DifferentialEquations Optim Random CairoMakie

# SIR model
function sir!(du, u, p, t)
    S, I, R = u
    β, γ = p
    du[1] = -β * S * I
    du[2] = β * S * I - γ * I
    du[3] = γ * I
end

# Generate synthetic data
Random.seed!(42)
u0_true = [0.99, 0.01, 0.0]
p_true = (0.3, 0.1)
tspan = (0.0, 100.0)  # Long time horizon
t_obs = 0.0:2.0:100.0

prob_true = ODEProblem(sir!, u0_true, tspan, p_true)
sol_true = solve(prob_true, Tsit5(), saveat = t_obs)
σ_obs = 0.01
observed_data = [max.(0.0, min.(1.0, u .+ σ_obs .* randn(3))) for u in sol_true.u]

# Multiple shooting: divide into 4 segments
n_segments = 4
t_segments = [0.0, 25.0, 50.0, 75.0, 100.0]

# Cost function with continuity constraints
function cost_multiple_shooting(p_and_states)
    p = (p_and_states[1], p_and_states[2])
    total_cost = 0.0
    
    # Solve each segment
    for i in 1:n_segments
        t_start = t_segments[i]
        t_end = t_segments[i+1]
        
        # Initial condition for this segment
        if i == 1
            u0_seg = u0_true
        else
            u0_seg = [p_and_states[2 + 3*(i-2) + j] for j in 1:3]
        end
        
        prob_seg = ODEProblem(sir!, u0_seg, (t_start, t_end), p)
        sol_seg = solve(prob_seg, Tsit5(), saveat = t_obs[t_obs .>= t_start .&& t_obs .<= t_end])
        
        if sol_seg.retcode != :Success
            return Inf
        end
        
        # Data fit for this segment
        for (j, t) in enumerate(t_obs)
            if t_start <= t <= t_end
                idx = findfirst(==(t), sol_seg.t)
                if idx !== nothing && j <= length(observed_data)
                    total_cost += sum((sol_seg.u[idx] .- observed_data[j]).^2)
                end
            end
        end
        
        # Continuity constraint (except for last segment)
        if i < n_segments
            final_state = sol_seg.u[end]
            next_initial = [p_and_states[2 + 3*(i-1) + j] for j in 1:3]
            total_cost += 100.0 * sum((final_state .- next_initial).^2)  # Penalty for discontinuity
        end
    end
    
    return total_cost
end

# Initial guess: parameters + segment boundary states
p0 = [0.2, 0.15]  # β, γ
states0 = [u0_true for _ in 1:(n_segments-1)]
x0 = vcat(p0, vcat(states0...))

# Optimise
result = optimize(cost_multiple_shooting, x0, LBFGS())

# Extract estimated parameters
p_estimated = (result.minimizer[1], result.minimizer[2])

println("Multiple shooting results:")
println("  True parameters: β = $(p_true[1]), γ = $(p_true[2])")
println("  Estimated: β = $(round(p_estimated[1], digits=3)), γ = $(round(p_estimated[2], digits=3))")
Multiple shooting results:
  True parameters: β = 0.3, γ = 0.1
  Estimated: β = 0.2, γ = 0.15

When to Use Multiple Shooting:

  • Long time horizons where single shooting fails
  • Poor initial parameter guesses
  • Need for parallel computation

24.7.4 Collocation Methods

Collocation methods fit splines to data and match derivatives, avoiding ODE solving:

  1. Fit spline \(\tilde{u}(t)\) to observations
  2. Estimate derivatives \(\tilde{u}'(t)\) from spline
  3. Match ODE: \(\tilde{u}'(t) \approx f(\tilde{u}(t), p, t)\)

Advantages:

  • Very fast (no ODE solving)
  • Good for dense data
  • Useful for initial parameter estimates

Limitations:

  • Less accurate for sparse data
  • Does not account for error accumulation
  • Usually used as a first stage before a shooting method

When to Use:

  • Dense observational data
  • Quick initial parameter estimates
  • Two-stage estimation: collocation, then shooting
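A minimal collocation-style sketch for the SIR model used earlier in this chapter, with two stated simplifications: central finite differences stand in for the spline-derivative step, and the data are generated by Euler steps so the block is self-contained.

```julia
# Data generation (Euler steps, illustrative parameters β = 0.3, γ = 0.1)
function simulate_sir(β, γ, u0, dt, T)
    n = round(Int, T / dt) + 1
    S = zeros(n); I = zeros(n); R = zeros(n)
    S[1], I[1], R[1] = u0
    for k in 1:n-1
        S[k+1] = S[k] - dt * β * S[k] * I[k]
        I[k+1] = I[k] + dt * (β * S[k] * I[k] - γ * I[k])
        R[k+1] = R[k] + dt * γ * I[k]
    end
    return S, I, R
end

dt = 0.1
S, I, R = simulate_sir(0.3, 0.1, (0.99, 0.01, 0.0), dt, 50.0)

# Step 2: derivative estimates at interior points (central differences
# standing in for spline derivatives)
ks = 2:length(S)-1
dS = [(S[k+1] - S[k-1]) / (2dt) for k in ks]
dI = [(I[k+1] - I[k-1]) / (2dt) for k in ks]
SI = [S[k] * I[k] for k in ks]
Ik = I[ks]

# Step 3: match the ODE right-hand side by least squares
# dS ≈ -β·S·I  and  dI ≈ β·S·I - γ·I, with no ODE solving
β̂ = -sum(dS .* SI) / sum(SI .^ 2)
γ̂ = sum((β̂ .* SI .- dI) .* Ik) / sum(Ik .^ 2)
println("β̂ ≈ $(round(β̂, digits=3)), γ̂ ≈ $(round(γ̂, digits=3))")
```

With dense, low-noise data the printed estimates land close to the true β = 0.3 and γ = 0.1; with sparse or noisy data the derivative estimates degrade, which is why collocation is usually a first stage before a shooting method.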

24.8 Key Takeaways

  1. Time-varying confounding requires special methods (G-methods)
  2. G-computation simulates outcomes under treatment strategies
  3. IPTW creates pseudo-populations where treatment is independent of confounders
  4. Challenges (positivity, misspecification, unmeasured confounding) require careful handling
  5. G-methods bridge Observable (data) and Structural (interventions)

24.9 Further Reading

  • Robins (1986): “A new approach to causal inference in mortality studies”
  • Robins et al. (2000): “Marginal structural models”
  • Hernán and Robins (2020): Causal Inference — Comprehensive treatment
  • Rackauckas and Nie (2017): Parameter estimation and inverse problems in differential equations
  • Rackauckas (2026): Parallel Computing and Scientific Machine Learning — comprehensive treatment of inverse problems, multiple shooting, and collocation methods
  • TMLE and Doubly Robust Estimation: Robust estimation methods
  • Model Validation with Observable Data: Validating inferences
  • Parameter Estimation via Shooting Method: Shooting method for parameter estimation