Choose what to measure and what to perturb to answer causal questions efficiently
Link identifiability, information gain, and “what experiment next?” to CDM structure
Design experiments that maximise information about causal mechanisms
Apply optimal experimental design to causal-dynamical systems
32.2 Introduction
Experiments are expensive (Chaloner and Verdinelli 1995; Ryan et al. 2016; Rothman et al. 2021). This chapter shows how to choose what to measure and what to perturb to answer causal questions efficiently, linking experimental design to CDM structure. This completes Imagining in the Observable world—designing optimal studies based on counterfactual reasoning and hypothesis generation.
32.3 The Experimental Design Problem
32.3.1 Question
“What should we measure and what should we perturb to learn about causal mechanisms?”
32.3.2 Constraints
Budget: Limited resources (time, money, subjects)
Ethics: Some interventions may be unethical
Feasibility: Some measurements may be impossible
Safety: Some perturbations may be dangerous
32.3.3 Goal
Maximise information about causal mechanisms subject to constraints.
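Stated formally (this notation is a standard optimal-design convention, not defined elsewhere in this chapter), the goal is a constrained optimisation over candidate designs:

\[
\xi^* = \arg\max_{\xi \in \Xi} \; U(\xi) \quad \text{subject to} \quad c(\xi) \le B,
\]

where \(\xi\) is a candidate design (what to measure and what to perturb), \(\Xi\) is the set of feasible, ethical, and safe designs, \(U(\xi)\) is an information criterion such as the mutual information introduced in the next section, \(c(\xi)\) is the design's cost, and \(B\) is the budget.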
32.4 Information-Theoretic Design
32.4.1 Mutual Information
Mutual information measures how much we learn about \(X\) from observing \(Y\):
\[
I(X; Y) = H(X) - H(X \mid Y)
\]
where \(H(\cdot)\) is entropy.
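As a concrete check of this definition, the sketch below computes \(I(X; Y)\) directly from a small discrete joint distribution (the joint probability table is made up for illustration):

```julia
# Mutual information I(X;Y) = H(X) - H(X|Y) for a discrete joint distribution.
# The 2×2 joint probability table below is an illustrative example.
p = [0.4 0.1;   # p(X=1, Y=1), p(X=1, Y=2)
     0.1 0.4]   # p(X=2, Y=1), p(X=2, Y=2)

entropy_bits(q) = -sum(x -> x > 0 ? x * log2(x) : 0.0, q)

px = vec(sum(p, dims = 2))   # marginal p(X)
py = vec(sum(p, dims = 1))   # marginal p(Y)

H_X = entropy_bits(px)
# H(X|Y) = Σ_y p(y) · H(X | Y = y), using the conditional column p(X | Y=y)
H_X_given_Y = sum(py[j] * entropy_bits(p[:, j] ./ py[j]) for j in eachindex(py))

I_XY = H_X - H_X_given_Y
println("H(X) = ", round(H_X, digits = 3), " bits")
println("I(X;Y) = ", round(I_XY, digits = 3), " bits")
```

Observing \(Y\) reduces uncertainty about \(X\) by about 0.28 bits here; a diagonal-heavy joint table (stronger dependence) would push \(I(X;Y)\) closer to \(H(X)\).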
32.4.2 Optimal Design
Choose the measurements and perturbations that maximise mutual information about the parameters of interest. We can demonstrate this by comparing two candidate measurement designs:
```julia
# Find project root and include ensure_packages.jl
project_root = let current = pwd()
    while !isfile(joinpath(current, "Project.toml")) && !isfile(joinpath(current, "_quarto.yml"))
        parent = dirname(current)
        parent == current && break
        current = parent
    end
    current
end
include(joinpath(project_root, "scripts", "ensure_packages.jl"))
@auto_using Random Distributions CairoMakie

Random.seed!(42)

# Example: Choose between two measurement designs
# Design 1: Measure X (high information about θ)
# Design 2: Measure Y (lower information about θ)

# True parameter
# (With θ_true = 0.5 the two designs happen to be equally informative,
#  since (2θ)² = 1; θ_true = 0.3 shows the difference.)
θ_true = 0.3

# Design 1: X = θ + noise (direct measurement)
σ1 = 0.1
X = θ_true .+ rand(Normal(0, σ1), 100)
info_design1 = 1 / σ1^2                     # Information ∝ 1/σ²

# Design 2: Y = θ² + noise (indirect, nonlinear)
σ2 = 0.1
Y = θ_true^2 .+ rand(Normal(0, σ2), 100)
info_design2 = 1 / σ2^2 * (2 * θ_true)^2    # Information depends on θ (lower for small θ)

# Compare information
println("Information-theoretic design comparison:")
println("  Design 1 (direct): Information ≈ ", round(info_design1, digits=2))
println("  Design 2 (indirect): Information ≈ ", round(info_design2, digits=2))
println("  → Design 1 provides more information about θ")
println("  → Choose design that maximises I(θ; Y)")
```

Information-theoretic design comparison:
  Design 1 (direct): Information ≈ 100.0
  Design 2 (indirect): Information ≈ 36.0
  → Design 1 provides more information about θ
  → Choose design that maximises I(θ; Y)
Design determines what can be learned in practice.
Connection: Good designs make unidentifiable parameters identifiable, or improve the precision of parameters that are already identifiable.
32.5.2 Example: Unidentifiable Without Inputs
If parameters are unidentifiable without inputs, add inputs (perturbations) to make them identifiable.
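A minimal illustration of this point (a hypothetical linear model, not from the CausalDynamics package): with a constant input, the intercept and slope of \(y = \theta_1 + \theta_2 u\) cannot be separated — only the combination \(\theta_1 + \theta_2 u_0\) is identifiable. Varying the input makes the design matrix full rank:

```julia
using LinearAlgebra

# Model: y = θ1 + θ2·u. Identifiability hinges on the rank of the design matrix.
design_matrix(u) = hcat(ones(length(u)), u)

u_constant = fill(0.7, 20)                        # no perturbation: input held fixed
u_varied   = collect(range(0, 1; length = 20))    # perturbation: input varies

r_const  = rank(design_matrix(u_constant))  # rank 1 → θ1, θ2 not separately identifiable
r_varied = rank(design_matrix(u_varied))    # rank 2 → both parameters identifiable

println("Rank with constant input: ", r_const)
println("Rank with varied input:   ", r_varied)
```

With a constant input the two columns are proportional, so infinitely many \((\theta_1, \theta_2)\) pairs fit the data equally well; adding input variation is the design choice that removes this degeneracy.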
32.5.3 Implementation: Design for Identifiability
We can demonstrate how to design experiments to make unidentifiable parameters identifiable:
```julia
# Find project root and include ensure_packages.jl
project_root = let current = pwd()
    while !isfile(joinpath(current, "Project.toml")) && !isfile(joinpath(current, "_quarto.yml"))
        parent = dirname(current)
        parent == current && break
        current = parent
    end
    current
end
include(joinpath(project_root, "scripts", "ensure_packages.jl"))
@auto_using CausalDynamics Graphs

# Example: Treatment effect may be unidentifiable without intervention
# Graph: L → A → Y, L → Y (confounding)
g = SimpleDiGraph(3)
add_edge!(g, 1, 2)  # L → A
add_edge!(g, 2, 3)  # A → Y
add_edge!(g, 1, 3)  # L → Y

# Check identifiability from observational data
adj_set_obs = backdoor_adjustment_set(g, 2, 3)
is_identifiable_obs = !isempty(adj_set_obs)

println("Identifiability check:")
println("  From observational data: ", is_identifiable_obs ? "Identifiable" : "NOT identifiable")
if is_identifiable_obs
    println("  Adjustment set: ", adj_set_obs)
else
    println("  Problem: Unmeasured confounder or insufficient variation")
end

# Design solution: Randomise treatment (break L → A link)
println("\nDesign solution:")
println("  Randomised trial: Randomise A (breaks L → A link)")
println("  → Treatment effect becomes identifiable")
println("  → Can estimate E[Y | do(A=1)] - E[Y | do(A=0)] directly")
```
Identifiability check:
From observational data: Identifiable
Adjustment set: Set([1])
Design solution:
Randomised trial: Randomise A (breaks L → A link)
→ Treatment effect becomes identifiable
→ Can estimate E[Y | do(A=1)] - E[Y | do(A=0)] directly
32.6 Design for Identification
To identify causal effects, designs should:

Provide variation: Different treatment levels

Control confounding: Randomisation or adjustment

Capture dynamics: Appropriate timing and frequency

Include perturbations: Inputs that help identify mechanisms
32.7 Adaptive Designs
Adaptive designs use information from previous observations to optimise future measurements:
Sequential design: Update design based on current knowledge
Multi-armed bandits: Balance exploration and exploitation
Active learning: Select most informative observations
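The bandit idea can be sketched with an ε-greedy rule (the arm probabilities, ε, and horizon below are illustrative, not drawn from this chapter's experiments): mostly pull the arm with the best estimated mean, but explore a random arm with probability ε:

```julia
using Random
Random.seed!(1)

# Two-armed bandit with Bernoulli rewards; the true success rates are made up.
true_rates = [0.3, 0.6]
ε = 0.1
counts  = zeros(Int, 2)   # pulls per arm
rewards = zeros(2)        # accumulated reward per arm

for t in 1:2000
    # Explore with probability ε; otherwise exploit the best estimated arm.
    # Unpulled arms get estimate Inf so each arm is tried at least once.
    arm = rand() < ε ? rand(1:2) :
          argmax([counts[i] == 0 ? Inf : rewards[i] / counts[i] for i in 1:2])
    r = rand() < true_rates[arm] ? 1.0 : 0.0
    counts[arm] += 1
    rewards[arm] += r
end

println("Pulls per arm: ", counts)   # most pulls should go to the better arm 2
println("Estimated rates: ", round.(rewards ./ counts, digits = 2))
```

Exploration (the ε branch) keeps refining the estimate of the apparently worse arm; exploitation (the argmax branch) concentrates measurement where the expected payoff is highest — the same trade-off the sequential design below makes between uniform and focused sampling.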
32.7.1 Implementation: Adaptive Design
Here’s a simplified example of adaptive design:
```julia
# Find project root and include ensure_packages.jl
project_root = let current = pwd()
    while !isfile(joinpath(current, "Project.toml")) && !isfile(joinpath(current, "_quarto.yml"))
        parent = dirname(current)
        parent == current && break
        current = parent
    end
    current
end
include(joinpath(project_root, "scripts", "ensure_packages.jl"))
@auto_using Random Distributions CairoMakie

Random.seed!(42)

# Example: Sequential design for parameter estimation
# Start with an initial design, then update it based on observations
θ_true = 0.5
n_steps = 5
designs = []
estimates = []

# Initial design: uniform sampling
current_design = "uniform"

for step in 1:n_steps
    # Collect data under the current design
    if current_design == "uniform"
        # Sample uniformly
        X = rand(Uniform(0, 1), 10)
    else
        # Focus on the informative region (near the current estimate)
        if step > 1
            est = estimates[end]
            X = rand(Normal(est, 0.1), 10)  # Sample near current estimate
        else
            X = rand(Uniform(0, 1), 10)
        end
    end

    # Observe outcomes
    Y = θ_true .* X .+ rand(Normal(0, 0.1), 10)

    # Estimate parameter by least squares (avoids division by zero):
    # Y = θ·X, so θ̂ = Σ(Y·X) / Σ(X²)
    θ_est = sum(Y .* X) / sum(X .^ 2)
    push!(estimates, θ_est)
    push!(designs, current_design)

    # Update design: switch to focused sampling after initial exploration
    # (`global` so the assignment updates the top-level binding inside the loop)
    if step == 2
        global current_design = "focused"
    end
end

# Visualise
let
    fig = Figure(size = (800, 400))
    ax = Axis(fig[1, 1], title = "Adaptive Design: Parameter Estimation",
              xlabel = "Step", ylabel = "Parameter Estimate")
    lines!(ax, 1:n_steps, estimates, linewidth = 2, color = :blue, label = "Estimate")
    hlines!(ax, [θ_true], color = :red, linestyle = :dash, linewidth = 2, label = "True value")
    scatter!(ax, 1:n_steps, estimates, color = :blue, markersize = 8)
    axislegend(ax)
    fig  # Only this gets displayed
end

println("Adaptive design:")
println("  Steps 1-2: Exploration (uniform sampling)")
println("  Steps 3-5: Exploitation (focused sampling near estimate)")
println("  Final estimate: ", round(estimates[end], digits=3), " (true = ", θ_true, ")")
```
Adaptive design:
Steps 1-2: Exploration (uniform sampling)
Steps 3-5: Exploitation (focused sampling near estimate)
Final estimate: 0.502 (true = 0.5)
32.8 Study Design Types
Epidemiological research uses different study designs depending on the research question, available resources, and ethical constraints (Rothman et al. 2021). Understanding these designs helps in choosing the appropriate one for a given causal question and in recognising its strengths and limitations.
32.8.1 Experimental Designs
Randomised controlled trials (RCTs): Gold standard for causal inference
Cluster randomised trials: Randomisation at group level
Crossover trials: Each subject receives multiple treatments
32.8.2 Observational Designs
Cohort studies: Follow subjects over time
Case-control studies: Compare cases to controls
Cross-sectional studies: Single time point
32.9 World Context
This chapter addresses Imagining in the Observable world—what should we study next? Experimental design completes the Observable “Imagining” phase by showing how to design optimal studies based on counterfactual reasoning and hypothesis generation. This connects the strongest form of causal reasoning (counterfactual) with the most forward-looking activity (designing new studies).
32.10 Key Takeaways
Information-theoretic design: Maximise mutual information about parameters
Design for identification: Make unidentifiable parameters identifiable
Adaptive designs: Use previous observations to optimise future measurements
Study design types: Choose appropriate design for research question
Experimental design completes the Observable “Imagining” phase: Counterfactuals → hypotheses → study design
32.11 Further Reading
Chaloner and Verdinelli (1995): “Bayesian experimental design”
Ryan et al. (2016): “A review of modern computational approaches”
Rothman et al. (2021): Modern Epidemiology (4th ed.) — comprehensive coverage of study designs
Chaloner, Kathryn, and Isabella Verdinelli. 1995. “Bayesian Experimental Design: A Review.” Statistical Science 10 (3): 273–304.
Rothman, Kenneth J., Sander Greenland, and Timothy L. Lash. 2021. Modern Epidemiology. 4th ed. Lippincott Williams & Wilkins.
Ryan, Elizabeth G., Christopher C. Drovandi, James M. McGree, and Anthony N. Pettitt. 2016. “A Review of Modern Computational Approaches for Optimal Experimental Design in Regression Models.” Journal of the Royal Statistical Society: Series C 65 (5): 779–816.