25  TMLE and Doubly Robust Estimation

Status: Draft

v0.4

25.1 Learning Objectives

After reading this chapter, you will be able to:

  • Understand TMLE as a semiparametric efficient, doubly robust estimator
  • Apply the TMLE algorithm step-by-step
  • Recognise when TMLE is preferred over other methods
  • Use TMLE for robust causal effect estimation

25.2 Introduction

Targeted Maximum Likelihood Estimation (TMLE) is a semiparametric efficient, doubly robust estimator (van der Laan and Rubin 2006; van der Laan and Rose 2011; Schuler and Rose 2017). This chapter focuses on TMLE as a method for Seeing in the Observable world—learning from actualised data with robustness to model misspecification.

25.3 What Is TMLE?

TMLE is a semiparametric efficient, doubly robust estimator that:

  • Targets the causal parameter of interest
  • Uses machine learning for flexible models
  • Provides valid inference (confidence intervals, hypothesis tests)
  • Handles complex data (high-dimensional confounders, missing data)

25.3.1 Advantages

  • Doubly robust: Consistent if either the outcome model or the treatment model is correctly specified
  • Semiparametric efficient: Optimal variance among regular asymptotically linear estimators
  • Machine learning compatible: Can use Super Learner, neural networks, etc.
  • Valid inference: Confidence intervals and hypothesis tests
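The doubly robust property can be seen concretely with a hand-rolled AIPW estimator, a simpler doubly robust cousin of TMLE. The sketch below is illustrative only (plain standard-library Julia, no TMLE.jl; all names are ours): it deliberately misspecifies the outcome model as a constant, yet still recovers the true effect because the propensity model is correct.

```julia
using Random, Statistics

Random.seed!(1)
expit(x) = 1 / (1 + exp(-x))

n = 100_000
L = randn(n)                       # confounder
g = expit.(L)                      # true propensity P(A=1 | L)
A = Float64.(rand(n) .< g)
Y = 0.5 .* A .- 0.3 .* L .+ 0.1 .* randn(n)   # true ATE = 0.5

# Deliberately misspecified outcome model: a constant, ignoring A and L
Q1 = fill(mean(Y), n)              # plays the role of E[Y | A=1, L]
Q0 = fill(mean(Y), n)              # plays the role of E[Y | A=0, L]

# AIPW: the inverse-probability correction term rescues the bad outcome
# model whenever the propensity g is correct (and vice versa)
aipw = mean(Q1 .- Q0 .+ A ./ g .* (Y .- Q1) .- (1 .- A) ./ (1 .- g) .* (Y .- Q0))
naive = mean(Y[A .== 1]) - mean(Y[A .== 0])

println("AIPW (bad outcome model, good propensity): ", round(aipw, digits = 3))
println("Naive difference in means:                 ", round(naive, digits = 3))
```

Swapping the roles (a correct outcome model with a wrong propensity model) also recovers the truth; only when both models fail does the estimator break down.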

25.4 TMLE Algorithm

Steps:

  1. Fit initial outcome model: Estimate \(Q_0(A, L) = E[Y \mid A, L]\) using flexible methods (e.g., Super Learner (van der Laan et al. 2007))
  2. Fit treatment model: Estimate \(g_0(A \mid L) = P(A \mid L)\), used to form the clever covariate
  3. Targeted update: Update the outcome model to target the treatment effect:
    • Compute the “clever covariate”; for the ATE, \(H(A, L) = \frac{\mathbb{1}(A=1)}{g_0(1 \mid L)} - \frac{\mathbb{1}(A=0)}{g_0(0 \mid L)}\)
    • Fit a one-parameter logistic fluctuation with \(\text{logit}(Q_0)\) as an offset: \(\text{logit}(Q_1(A, L)) = \text{logit}(Q_0(A, L)) + \epsilon H(A, L)\)
    • Update on the probability scale: \(Q_1(A, L) = \text{expit}\big(\text{logit}(Q_0(A, L)) + \hat{\epsilon} H(A, L)\big)\)
  4. Compute effect: Estimate treatment effect from updated model
  5. Inference: Compute standard errors and confidence intervals
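To make step 3 concrete, here is the targeting update in isolation for a binary outcome. This is a sketch, not TMLE.jl: to keep it short it cheats by using the simulation truth as the initial fits \(Q_0\) and \(g_0\) (in practice these come from steps 1 and 2), and it fits the fluctuation parameter \(\epsilon\) with an explicit Newton solve rather than an offset logistic regression.

```julia
using Random, Statistics

expit(x) = 1 / (1 + exp(-x))
logit(p) = log(p / (1 - p))

# One-parameter fluctuation: maximize the binomial log-likelihood in ϵ
# for logit(Qϵ) = logit(Q) + ϵ * H, via Newton's method (concave in ϵ)
function fit_epsilon(Y, Q, H; iters = 25)
    ϵ = 0.0
    for _ in 1:iters
        Qϵ = expit.(logit.(Q) .+ ϵ .* H)
        ϵ += sum(H .* (Y .- Qϵ)) / sum(H .^ 2 .* Qϵ .* (1 .- Qϵ))
    end
    return ϵ
end

Random.seed!(2)
n = 5_000
L = randn(n)
g0 = expit.(L)                         # oracle propensity P(A=1 | L)
A = Float64.(rand(n) .< g0)
Q_truth(a, l) = expit(0.8a - 0.5l)     # oracle outcome regression
Y = Float64.(rand(n) .< Q_truth.(A, L))

H = A ./ g0 .- (1 .- A) ./ (1 .- g0)   # clever covariate for the ATE
ϵ_hat = fit_epsilon(Y, Q_truth.(A, L), H)

# Updated counterfactual predictions and the targeted plug-in ATE
Q1_1 = expit.(logit.(Q_truth.(1.0, L)) .+ ϵ_hat .* (1 ./ g0))
Q1_0 = expit.(logit.(Q_truth.(0.0, L)) .- ϵ_hat .* (1 ./ (1 .- g0)))
ate = mean(Q1_1 .- Q1_0)
println("Targeted ATE estimate: ", round(ate, digits = 3))
```

Because the initial fits are already correct here, \(\hat{\epsilon}\) lands near zero; with misspecified initial fits the fluctuation does real work, pulling the plug-in estimate toward the target parameter.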

25.4.1 Implementation: TMLE with TMLE.jl

Here’s how to use the TMLE.jl package for robust causal effect estimation:

# Find project root and include ensure_packages.jl
project_root = let
    current = pwd()
    while !isfile(joinpath(current, "Project.toml")) && !isfile(joinpath(current, "_quarto.yml"))
        parent = dirname(current)
        parent == current && break
        current = parent
    end
    current
end
include(joinpath(project_root, "scripts", "ensure_packages.jl"))

@auto_using TMLE Random Distributions MLJLinearModels MLJModels DataFrames CategoricalArrays CairoMakie

Random.seed!(42)

# Simulate data with confounders
n = 500
L = rand(Normal(0, 1), n)  # Confounder
# Treatment probability depends on L: P(A=1 | L) = 1 / (1 + exp(-L))
p_A = 1 ./ (1 .+ exp.(-L))
A = Float64[rand(Bernoulli(p)) for p in p_A]  # Treatment (depends on L)
Y = 0.5 .* A .- 0.3 .* L .+ rand(Normal(0, 0.1), n)  # Outcome

# Convert treatment to categorical (required by TMLE.jl)
df = DataFrame(L = L, A = categorical(A), Y = Y)

# Step 1: Define the estimand (what we want to estimate)
# ATE: Average Treatment Effect E[Y | do(A=1)] - E[Y | do(A=0)]
Ψ = ATE(
    outcome = :Y,
    treatment_values = (A = (case = 1.0, control = 0.0),),
    treatment_confounders = (A = [:L],)
)

# Step 2: Define models for outcome and treatment
# In production, use flexible ML methods (Super Learner, random forests, etc.)
# Here we use linear regression for outcome and binary classifier for treatment
models = Dict(
    :Y => with_encoder(LinearRegressor()),  # Outcome model: E[Y | A, L]
    :A => with_encoder(LogisticClassifier(lambda = 0))  # Treatment model: P(A | L)
)

# Step 3: Create TMLE estimator
tmle = Tmle(models = models)

# Step 4: Estimate the ATE
tmle_result, cache = tmle(Ψ, df; verbosity = 0)

# Step 5: Extract ATE results
ATE_estimate = estimate(tmle_result)
ATE_CI = confint(OneSampleZTest(tmle_result))
ATE_pvalue = pvalue(OneSampleZTest(tmle_result))

# Step 6: Compute counterfactual means separately for visualization
# E[Y | do(A=1)] and E[Y | do(A=0)]
Ψ_A1 = CM(
    outcome = :Y,
    treatment_values = (A = 1.0,),
    treatment_confounders = (A = [:L],)
)
Ψ_A0 = CM(
    outcome = :Y,
    treatment_values = (A = 0.0,),
    treatment_confounders = (A = [:L],)
)

cm_A1_result, cache = tmle(Ψ_A1, df; cache = cache, verbosity = 0)
cm_A0_result, cache = tmle(Ψ_A0, df; cache = cache, verbosity = 0)

E_Y_A1 = estimate(cm_A1_result)  # E[Y | do(A=1)]
E_Y_A0 = estimate(cm_A0_result)  # E[Y | do(A=0)]

# Standard error from confidence interval
# SE = (upper - lower) / (2 * z_alpha/2) where z_0.025 ≈ 1.96
se_ATE = (ATE_CI[2] - ATE_CI[1]) / (2 * 1.96)

# Compute naive estimate for comparison
# Convert categorical back to numeric for comparison
A_numeric = [x == 1.0 ? 1 : 0 for x in df.A]
naive_ATE = mean(Y[A_numeric .== 1]) - mean(Y[A_numeric .== 0])

# Visualise
let
fig = Figure(size = (1000, 400))
ax1 = Axis(fig[1, 1], title = "Counterfactual Means", 
           xlabel = "Treatment", ylabel = "E[Y | do(A)]")
ax2 = Axis(fig[1, 2], title = "Treatment Effect with Confidence Interval", 
           xlabel = "Method", ylabel = "ATE")

# Plot counterfactual means
barplot!(ax1, [1, 2], [E_Y_A0, E_Y_A1], 
         label = ["E[Y | do(A=0)]", "E[Y | do(A=1)]"], 
         color = [:red, :blue])
ax1.xticks = ([1, 2], ["A=0", "A=1"])
axislegend(ax1)

# Compare with naive estimate
barplot!(ax2, [1, 2], [naive_ATE, ATE_estimate], 
         label = ["Naive", "TMLE"], color = [:red, :blue])
errorbars!(ax2, [1, 2], [naive_ATE, ATE_estimate], 
           [0, se_ATE], [0, se_ATE], color = :black)
ax2.xticks = ([1, 2], ["Naive", "TMLE"])
axislegend(ax2)

    fig  # Only this gets displayed
end

println("TMLE Results:")
println("  E[Y | do(A=1)] = ", round(E_Y_A1, digits=3))
println("  E[Y | do(A=0)] = ", round(E_Y_A0, digits=3))
println("  ATE = ", round(ATE_estimate, digits=3))
println("  95% CI: ", ATE_CI)
println("  p-value: ", round(ATE_pvalue, digits=4))
println("\nComparison:")
println("  Naive estimate (ignoring confounding): ", round(naive_ATE, digits=3))
println("  TMLE estimate (adjusted): ", round(ATE_estimate, digits=3))
println("  True ATE (from simulation): 0.5")
TMLE Results:
  E[Y | do(A=1)] = 0.525
  E[Y | do(A=0)] = 0.021
  ATE = 0.504
  95% CI: (0.48244758808884114, 0.5259279549376686)
  p-value: 0.0

Comparison:
  Naive estimate (ignoring confounding): 0.243
  TMLE estimate (adjusted): 0.504
  True ATE (from simulation): 0.5
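The gap between the naive and TMLE estimates is exactly the confounding term: with \(Y = 0.5A - 0.3L + \varepsilon\), the naive difference in means picks up \(-0.3\,(E[L \mid A=1] - E[L \mid A=0])\). A quick standard-library-only re-simulation of the chapter's setup (the draws will not match the TMLE.jl run digit-for-digit, since it avoids Distributions.jl) confirms the decomposition:

```julia
using Random, Statistics

Random.seed!(42)
n = 500
L = randn(n)                                   # confounder, as in the chapter
A = Float64.(rand(n) .< 1 ./ (1 .+ exp.(-L)))  # treatment depends on L
Y = 0.5 .* A .- 0.3 .* L .+ 0.1 .* randn(n)

naive = mean(Y[A .== 1]) - mean(Y[A .== 0])
confounding = -0.3 * (mean(L[A .== 1]) - mean(L[A .== 0]))
println("naive difference in means = ", round(naive, digits = 3))
println("0.5 + confounding term    = ", round(0.5 + confounding, digits = 3))
```

Treated units have systematically higher \(L\), and \(L\) lowers \(Y\), so the naive estimate is biased toward zero; TMLE's adjustment removes exactly this term.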

25.4.2 Choosing Between Methods

G-computation is preferred when:

  • Outcome models are well understood
  • The full outcome distribution is needed
  • Treatment strategies are complex

MSMs (IPTW) are preferred when:

  • The treatment model is well understood
  • Simple marginal effects are sufficient
  • Implementation simplicity is important

TMLE is preferred when:

  • Robustness to model misspecification is critical
  • Semiparametric efficiency is desired
  • Valid inference (confidence intervals) is needed

25.5 World Context

This chapter addresses Seeing in the Observable world—what can we observe/learn from actualised data? TMLE provides robust estimation from observable data, bridging the Observable world (what we observe) with the Structural world (what would happen under interventions).

25.6 Key Takeaways

  1. TMLE is doubly robust and semiparametric efficient
  2. TMLE algorithm combines outcome and treatment models with targeted update
  3. TMLE is preferred when robustness and valid inference are critical
  4. TMLE bridges Observable (data) and Structural (interventions)

25.7 Further Reading