34 Biological Case Studies

34.1 Overview

This chapter demonstrates the full Causal Dynamical Model (CDM) pipeline through two extended biological case studies. Each case study spans multiple organisational scales—from molecular to cellular, or from individual to population—and illustrates how the concepts developed throughout this book integrate into a coherent workflow.

After reading this chapter, you will be able to:

Apply the full CDM pipeline (Structure → Dynamics → Observation → Inference → Intervention → Counterfactual) to biological systems
Construct causal graphs for gene regulatory networks and multi-scale epidemiological systems
Formulate Hill function kinetics for GRN dynamics and coupled ODE/ABM models for epidemics
Recognise common patterns: latent structure, count data, multi-scale coupling, intervention design
Use the CDM framework for cross-scale transportability (e.g., lab to population)

The pipeline we follow is:

Structure → Specify the causal graph encoding which variables influence which
Dynamics → Formulate the temporal evolution (ODEs, SDEs, or agent-based models)
Observation → Define how latent states manifest as noisy measurements
Inference → Learn parameters and infer latent trajectories from data
Intervention → Simulate the effect of external manipulations
Counterfactual → Reason about what would have happened under alternative scenarios

By working through concrete biological applications, we show how the CDM framework provides a unified methodology for causal reasoning across scales, from gene regulation to epidemic dynamics.

34.2 Case Study 1: Gene Regulatory Network to Cell Fate

34.2.1 Biological Context

Gene regulatory networks (GRNs) control cell differentiation and fate decisions. Transcription factors bind to enhancers and promoters, activating or repressing target genes in a coordinated cascade. Signalling pathways from the extracellular environment modulate these networks, creating a multi-scale system: molecular (genes, proteins) → cellular (differentiation state) → tissue (cell type composition).

Waddington’s epigenetic landscape provides a powerful dynamical systems metaphor: cells roll down a rugged landscape of valleys (attractors) representing stable cell fates, with ridges (separatrices) between them. Perturbations—such as transcription factor overexpression—can push cells from one valley to another, enabling reprogramming. The CDM framework formalises this intuition: the causal graph encodes regulatory structure, the dynamics define the landscape, and interventions correspond to pushing cells across ridges.

The key insight for causal modelling is that gene regulation is mechanistic: transcription factors bind to DNA, recruit polymerase, and modulate transcription rates. This mechanistic structure implies a causal ordering—upstream genes must be expressed before they can influence downstream targets—and supports the use of structural causal models for intervention and counterfactual reasoning.

34.2.2 Structural World: The Causal Graph

At the molecular time scale, a GRN can be represented as a directed acyclic graph (DAG) when we consider discrete time steps and assume no instantaneous feedback within a single step. In practice, gene regulation involves feedback loops, but at coarse time resolution we can often approximate the structure as a DAG or use cyclic structural causal models (see Chapter 9b).

Consider a minimal 4-gene network: genes \(G_1\) and \(G_2\) are upstream regulators; \(G_3\) is activated by both; \(G_4\) is activated by \(G_3\) and repressed by \(G_2\). The causal graph encodes: \(G_1 \to G_3\), \(G_2 \to G_3\), \(G_2 \to G_4\), \(G_3 \to G_4\). Here \(G_2\) is a confounder of the \(G_3 \to G_4\) relationship (it causes both \(G_3\) and \(G_4\)), and \(G_3\) is a mediator of the effects of \(G_1\) and \(G_2\) on \(G_4\).

By the backdoor criterion, conditioning on \(G_2\) blocks the backdoor path \(G_3 \leftarrow G_2 \to G_4\), which identifies the effect of \(G_3\) on \(G_4\). The frontdoor criterion becomes relevant when a confounder is unmeasured: if some unobserved factor influenced both \(G_1\) and \(G_4\), and \(G_1\) affects \(G_4\) only through \(G_3\), the effect of \(G_1\) on \(G_4\) could still be identified through the mediator along \(G_1 \to G_3 \to G_4\), provided \(G_2\) is also adjusted for, since it confounds \(G_3 \to G_4\).
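
The effect of adjusting for \(G_2\) can be illustrated numerically. The sketch below is not part of the chapter's pipeline: it assumes a linear structural model on the same four-gene graph with made-up coefficients, and compares a naive regression of \(G_4\) on \(G_3\) with one that also conditions on \(G_2\).

```julia
using Random

# Hypothetical linear SCM on the 4-gene graph (all coefficients are illustrative)
rng = MersenneTwister(1)
n = 50_000
g1 = randn(rng, n)
g2 = randn(rng, n)
g3 = 0.8 .* g1 .+ 0.6 .* g2 .+ 0.5 .* randn(rng, n)
g4 = 1.0 .* g3 .- 0.7 .* g2 .+ 0.5 .* randn(rng, n)   # true effect of G3 on G4 is 1.0

# Ordinary least squares with an intercept; returns the coefficient vector
ols(y, X) = hcat(ones(length(y)), X) \ y

naive    = ols(g4, g3)[2]              # ignores G2: the backdoor path stays open
adjusted = ols(g4, hcat(g3, g2))[2]    # conditions on G2: the backdoor path is blocked

println("naive estimate:    ", round(naive, digits = 3))
println("adjusted estimate: ", round(adjusted, digits = 3))
```

With these coefficients the naive slope converges to \(0.83 / 1.25 \approx 0.66\) rather than the true value of 1, because \(G_2\) activates \(G_3\) while repressing \(G_4\); the adjusted slope recovers the structural coefficient up to sampling error.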

Figure 34.1 illustrates the minimal 4-gene network; the code chunk labelled chunk-grn-causal-graph builds the same graph in code. In state-space model diagrams, we use dashed outlines for latent variables (gene expression levels) and solid outlines for observed quantities. The graph structure is the foundation for all subsequent steps: it determines which interventions are well-defined, which causal effects are identifiable, and how to interpret the parameters of the dynamical model.

Figure 34.1: Minimal 4-gene regulatory network. \(G_1\) and \(G_2\) regulate \(G_3\); \(G_2\) and \(G_3\) regulate \(G_4\).

# Build the GRN causal graph; the backdoor set and the mediator are recorded in comments below
# First run may take 1–2 min while CausalDynamics (and deps) load.
project_root = let
    current = pwd()
    while !isfile(joinpath(current, "Project.toml")) && !isfile(joinpath(current, "_quarto.yml"))
        parent = dirname(current)
        parent == current && break
        current = parent
    end
    current
end
include(joinpath(project_root, "scripts", "ensure_packages.jl"))
@auto_using CausalDynamics Graphs

# Minimal 4-gene network: G1, G2 -> G3 -> G4, G2 -> G4
g = SimpleDiGraph(4)
add_edge!(g, 1, 3)  # G1 -> G3
add_edge!(g, 2, 3)  # G2 -> G3
add_edge!(g, 2, 4)  # G2 -> G4
add_edge!(g, 3, 4)  # G3 -> G4

gene_names = ["G1", "G2", "G3", "G4"]
# Backdoor set for G3 -> G4: {G2} blocks G4 <- G2 -> G3
# Frontdoor: G3 mediates G1 -> G4

4-element Vector{String}:
 "G1"
 "G2"
 "G3"
 "G4"

34.2.3 Dynamical World: ODE Model

Gene regulation is typically modelled with Hill function kinetics, capturing the cooperative binding of transcription factors. When multiple transcription factors bind cooperatively to a promoter, the response is sigmoidal: low activator concentration yields minimal expression; high concentration yields maximal expression; the transition is centred on the half-saturation constant \(K\), and its steepness is set by the cooperativity \(n\). For activation, the Hill function is \(H(x) = x^n / (K^n + x^n)\), where \(n\) is the Hill coefficient and \(K\) is the concentration at which \(H = 1/2\). For repression, \(H_{\text{rep}}(x) = 1 - H(x) = K^n / (K^n + x^n)\).

The state-space model has latent gene expression levels \(\mathbf{x}_t = (x_1, \ldots, x_4)\) evolving according to:

\[ \dot{x}_i = \alpha_i \cdot H(\mathbf{x}_{\text{Pa}(i)}) - \delta_i x_i + w_i \]

where \(\alpha_i\) is the maximal production rate, \(\delta_i\) is the degradation rate, \(\text{Pa}(i)\) denotes the parents of gene \(i\) in the causal graph, \(H\) is the appropriate Hill function of the parent expression levels, and \(w_i\) is process noise. For \(G_3\) with parents \(\{G_1, G_2\}\), we might use \(H(x_1, x_2) = H(x_1) \cdot H(x_2)\) (AND logic) or \(H(x_1) + H(x_2) - H(x_1)H(x_2)\) (OR logic).
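
To make the two combination rules concrete, the following sketch evaluates AND and OR logic at a few input levels. The helper names (`hill`, `and_gate`, `or_gate`) and the values of \(K\) and \(n\) are chosen here for illustration only.

```julia
# Activating and repressing Hill functions, as defined in the text
hill(x, K, n) = x^n / (K^n + x^n)
hill_rep(x, K, n) = 1 - hill(x, K, n)

# Two-input combinations for a gene with two activating parents
and_gate(x1, x2, K, n) = hill(x1, K, n) * hill(x2, K, n)
or_gate(x1, x2, K, n)  = hill(x1, K, n) + hill(x2, K, n) - hill(x1, K, n) * hill(x2, K, n)

K, n = 0.5, 2.0   # illustrative values
for (x1, x2) in [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
    println((x1, x2), "  AND = ", round(and_gate(x1, x2, K, n), digits = 3),
                      "  OR = ", round(or_gate(x1, x2, K, n), digits = 3))
end
```

Under AND logic a single absent parent silences the target, whereas under OR logic either parent alone drives expression close to its maximum.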

The steady-state behaviour of such systems can exhibit bistability when the network contains positive feedback (for example, two mutually repressing genes): two stable fixed points then correspond to distinct cell fates (e.g., differentiated vs. pluripotent). The purely feed-forward 4-gene example has no feedback and therefore settles to a single steady state. The separatrix between basins of attraction defines the “ridge” in Waddington’s landscape. Small perturbations near the ridge can tip the cell from one fate to another, which is the basis of reprogramming protocols. The ODE is implemented in the code chunk labelled chunk-grn-ode.

# Hill function kinetics for gene regulation ODE
project_root = let
    current = pwd()
    while !isfile(joinpath(current, "Project.toml")) && !isfile(joinpath(current, "_quarto.yml"))
        parent = dirname(current)
        parent == current && break
        current = parent
    end
    current
end
include(joinpath(project_root, "scripts", "ensure_packages.jl"))
@auto_using OrdinaryDiffEq

function hill_activate(x, K, n)
    x^n / (K^n + x^n)
end

function grn_ode!(dx, x, p, t)
    α, δ, K, n = p
    # G1: constitutive; G2: constitutive; G3: activated by G1,G2; G4: activated by G3, repressed by G2
    dx[1] = α[1] - δ[1] * x[1]
    dx[2] = α[2] - δ[2] * x[2]
    h₁₂ = hill_activate(x[1], K, n) * hill_activate(x[2], K, n)
    dx[3] = α[3] * h₁₂ - δ[3] * x[3]
    h_rep = 1 - hill_activate(x[2], K, n)
    dx[4] = α[4] * hill_activate(x[3], K, n) * h_rep - δ[4] * x[4]
end

# Parameters: α (production), δ (degradation), K (half-saturation), n (Hill coeff)
α = [1.0, 1.0, 2.0, 2.0]
δ = [0.5, 0.5, 0.5, 0.5]
K, n = 0.5, 2.0
p = (α, δ, K, n)
x0 = [0.1, 0.1, 0.1, 0.1]
tspan = (0.0, 20.0)
prob = ODEProblem(grn_ode!, x0, tspan, p)
sol = solve(prob, Tsit5())

retcode: Success
Interpolation: specialized 4th order "free" interpolation
t: 20-element Vector{Float64}:
  0.0
  0.06844717736975105
  0.18575032467585562
  0.32440604105410964
  0.5225903679925564
  0.7738685334408734
  1.0574768122977394
  1.3960589045199954
  1.9051890332306463
  2.627365815858647
  3.4022141790836358
  4.3800741978060715
  5.5020633486356205
  6.847017426035597
  8.417498192564562
 10.300026762675573
 12.575682418111072
 15.364974797072094
 18.639000434351196
 20.0
u: 20-element Vector{Vector{Float64}}:
 [0.1, 0.1, 0.1, 0.1]
 [0.16392471274526094, 0.16392471274526094, 0.09726658114595128, 0.10133389873200045]
 [0.26851620853618835, 0.26851620853618835, 0.09784991929421966, 0.1024728530841456]
 [0.38448972640161766, 0.38448972640161766, 0.1159364712816581, 0.10350570030556348]
 [0.5368982006068846, 0.5368982006068846, 0.18568715210518075, 0.10979296682182861]
 [0.7096421024835914, 0.7096421024835914, 0.33961617708162906, 0.13506398517560497]
 [0.8802387632964167, 0.8802387632964167, 0.5670226169466351, 0.18304345304894534]
 [1.0546268511504289, 1.0546268511504289, 0.8675985312630389, 0.24157263246078217]
 [1.2670960297496092, 1.2670960297496092, 1.3141994985376872, 0.3020275551901736]
 [1.4892262930633853, 1.4892262930633853, 1.8657398441809674, 0.337129935178787]
 [1.6532850238897707, 1.6532850238897707, 2.3280190813237294, 0.33954825106618863]
 [1.787365221463711, 1.787365221463711, 2.7453187268558144, 0.3230835524519911]
 [1.8786608509864473, 1.8786608509864473, 3.056171811110973, 0.2989881264684044]
 [1.9380614599186476, 1.9380614599186476, 3.2761921251739134, 0.2746192273089915]
 [1.971752714665441, 1.971752714665441, 3.4119523031898153, 0.2554025202760825]
 [1.9889756361524553, 1.9889756361524553, 3.487643666330296, 0.24248125625941225]
 [1.9964607756028294, 1.9964607756028294, 3.52373307036383, 0.23529611869111702]
 [1.9991153038104499, 1.9991153038104499, 3.5378828542153484, 0.23208688107814168]
 [1.999822560606339, 1.999822560606339, 3.5420729662013404, 0.23102762414212918]
 [1.9999101481497887, 1.9999101481497887, 3.542629772802891, 0.2308782153497529]
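
The trajectory above converges to one steady state regardless of the starting point. Bistability needs feedback, which this example lacks. As a hedged illustration, the sketch below simulates a two-gene mutual-repression toggle switch, a hypothetical system that is not part of the chapter's 4-gene model, using a plain forward-Euler loop with illustrative parameters.

```julia
# Mutual-repression toggle switch (hypothetical two-gene example):
#   dx/dt = α / (1 + y^n) - x,   dy/dt = α / (1 + x^n) - y
function simulate_toggle(x0, y0; α = 4.0, n = 2, dt = 0.01, T = 50.0)
    x, y = x0, y0
    for _ in 1:round(Int, T / dt)
        dx = α / (1 + y^n) - x
        dy = α / (1 + x^n) - y
        x += dt * dx          # forward Euler step
        y += dt * dy
    end
    return (x, y)
end

fate_a = simulate_toggle(2.0, 0.1)   # starts on the x-dominant side of the separatrix
fate_b = simulate_toggle(0.1, 2.0)   # starts on the y-dominant side
println("fate A: ", round.(fate_a, digits = 3))
println("fate B: ", round.(fate_b, digits = 3))
```

For \(\alpha = 4\) and \(n = 2\) the two stable fixed points are \((2 + \sqrt{3},\, 2 - \sqrt{3})\) and its mirror image, separated by the diagonal \(x = y\); which one is reached depends only on the side of the diagonal where the trajectory starts.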

The state-space formulation treats \(\mathbf{x}_t\) as latent and the ODE as the transition model. With additive process noise \(w_t \sim \mathcal{N}(0, \sigma_w^2 \mathbf{I})\), we obtain a stochastic dynamical system suitable for filtering and smoothing.

34.2.4 Observable World: Inference from Single-Cell RNA-Seq

Single-cell RNA-seq provides noisy, sparse snapshots of gene expression. Each cell is measured once (snapshot), and the data are counts—typically modelled as Poisson or negative binomial. The observation model is:

\[ Y_{ic} \sim \text{Poisson}(\lambda_c \cdot x_i(t_c) + \epsilon) \]

where \(Y_{ic}\) is the count for gene \(i\) in cell \(c\), \(t_c\) is the (pseudo)time at which cell \(c\) was sampled, \(\lambda_c\) is a cell-specific scaling factor (capturing sequencing depth) and \(\epsilon\) is a small constant for stability. The latent trajectory \(\mathbf{x}_t\) is inferred via filtering (forward pass) and smoothing (backward pass) in the state-space framework. Parameter learning proceeds via expectation-maximisation (EM): the E-step infers latent states given parameters; the M-step updates parameters given the inferred states. For nonlinear dynamics, the E-step may use extended Kalman filtering, unscented Kalman filtering, or particle filtering. The M-step updates \(\alpha\), \(\delta\), \(K\), \(n\) by maximising the expected complete-data log-likelihood.
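
A minimal sketch of the Poisson observation model follows. It makes several simplifying assumptions that are not in the text: a single gene, a known latent trajectory, one scaling factor \(\lambda\) shared by all observations, and a hand-written Poisson sampler (Knuth's method) so that only the standard library is needed. It illustrates the likelihood only, not the EM or filtering machinery.

```julia
using Random

# Knuth's multiplication method for Poisson sampling; adequate for small means
function rand_poisson(rng, μ)
    L, k, p = exp(-μ), 0, 1.0
    while true
        p *= rand(rng)
        p <= L && return k
        k += 1
    end
end

# Poisson log-likelihood of counts y given means μ
log_factorial(k) = sum(log, 1:k; init = 0.0)
poisson_loglik(y, μ) = sum(yi * log(μi) - μi - log_factorial(yi) for (yi, μi) in zip(y, μ))

rng = MersenneTwister(7)
x = [0.2 + 0.3 * t for t in 0:0.5:10]      # hypothetical latent expression along pseudotime
λ_true, ϵ = 5.0, 1e-3
y = [rand_poisson(rng, λ_true * xi + ϵ) for xi in x]

# Profile the log-likelihood over the scaling factor on a grid
grid = 1.0:0.05:10.0
λ_hat = grid[argmax([poisson_loglik(y, λ .* x .+ ϵ) for λ in grid])]
println("grid MLE of λ: ", λ_hat, "   closed form Σy/Σx: ", round(sum(y) / sum(x), digits = 3))
```

When \(\epsilon\) is negligible the maximum-likelihood estimate has the closed form \(\hat{\lambda} = \sum y / \sum x\), which the grid search should reproduce to within one grid step.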

A key challenge is pseudotime inference: single-cell data are cross-sectional (each cell measured once), so we do not observe true temporal trajectories. Instead, we infer a pseudotime ordering—a one-dimensional projection of cells along a developmental trajectory—from the expression data. Trajectory inference methods (e.g., Slingshot, Monocle) provide an approximate temporal ordering, which can then be used to fit the dynamical model. The CDM framework treats pseudotime as an estimated latent variable, with uncertainty propagated through the inference pipeline.

34.2.5 Interventions and Counterfactuals

A gene knockout corresponds to the intervention \(\mathrm{do}(x_i = 0)\): we set the expression of gene \(i\) to zero and propagate the effect through the dynamical system. In the ODE model, this means replacing the equation for \(\dot{x}_i\) with the constraint \(x_i \equiv 0\), so that every child of gene \(i\) evaluates its Hill function at \(x_i = 0\) (an activating input then contributes nothing, while a repressing input is fully relieved). The modified dynamical system is then simulated forward from the current state (or from equilibrium) to predict the post-intervention trajectory. This is the doing level of the causal hierarchy: we actively modify the system and observe the consequence.
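
As a rough sketch of how such a knockout might be simulated, the code below re-implements the 4-gene right-hand side with the same parameter values as the chapter's ODE chunk and integrates it with a simple forward-Euler loop instead of OrdinaryDiffEq. The `fixed` keyword, which maps a gene index to a clamped value, is a hypothetical convenience introduced here to represent \(\mathrm{do}(x_i = \text{value})\).

```julia
hill(x, K, n) = x^n / (K^n + x^n)

# Right-hand side of the 4-gene model; parameter values mirror the chapter's ODE chunk
function grn_rhs(x; α = [1.0, 1.0, 2.0, 2.0], δ = fill(0.5, 4), K = 0.5, n = 2.0)
    return [
        α[1] - δ[1] * x[1],
        α[2] - δ[2] * x[2],
        α[3] * hill(x[1], K, n) * hill(x[2], K, n) - δ[3] * x[3],
        α[4] * hill(x[3], K, n) * (1 - hill(x[2], K, n)) - δ[4] * x[4],
    ]
end

# Forward-Euler simulation; `fixed` maps gene index => clamped value, i.e. do(x_i = value)
function simulate(x0; fixed = Dict{Int,Float64}(), dt = 0.01, T = 40.0)
    x = copy(x0)
    for (i, v) in fixed
        x[i] = v
    end
    for _ in 1:round(Int, T / dt)
        x .+= dt .* grn_rhs(x)
        for (i, v) in fixed
            x[i] = v          # the intervened gene ignores its own equation
        end
    end
    return x
end

x0 = fill(0.1, 4)
baseline = simulate(x0)
knockout = simulate(x0; fixed = Dict(2 => 0.0))   # do(x_2 = 0)
println("baseline steady state: ", round.(baseline, digits = 3))
println("G2 knockout:           ", round.(knockout, digits = 3))
```

Because \(G_3\) requires both activators under AND logic, knocking out \(G_2\) drives \(G_3\) to zero; \(G_4\) then loses its activator and also decays, even though its repressor has been removed.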

Counterfactual reasoning answers: “What would have happened if gene \(X\) had been expressed at level \(x^*\)?” We use the three-step process: (1) Abduction—infer the exogenous noise \(U\) from the observed data; (2) Action—replace the structural equation for \(X\) with \(X = x^*\); (3) Prediction—simulate forward with the modified model. This enables cell reprogramming design: we ask what intervention would push a cell from fate A to fate B, and the counterfactual computation identifies the required transcription factor perturbations.
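
The three steps can be sketched on a toy example. The model below is a hypothetical discrete-time linear system with additive noise, chosen because abduction is then exact; in the nonlinear, partially observed GRN setting the noise would instead be inferred by smoothing. All names and coefficients are illustrative.

```julia
# Hypothetical structural model: y[t+1] = a * y[t] + b * x[t] + u[t]
a, b = 0.8, 0.5

# Prediction: roll the model forward given the initial state, inputs x and noise u
function rollout(y1, x, u, a, b)
    y = [y1]
    for t in eachindex(x)
        push!(y, a * y[t] + b * x[t] + u[t])
    end
    return y
end

# "Observed" factual data, generated here only so the example is self-contained
x_obs  = [1.0, 1.0, 0.0, 0.0, 0.0]
u_true = [0.10, -0.05, 0.02, 0.00, -0.03]
y_obs  = rollout(0.2, x_obs, u_true, a, b)

# (1) Abduction: recover the noise that explains the observed trajectory
u_hat = [y_obs[t + 1] - a * y_obs[t] - b * x_obs[t] for t in eachindex(x_obs)]

# (2) Action: replace the input from step 3 onward with x* = 1
x_cf = copy(x_obs)
x_cf[3:end] .= 1.0

# (3) Prediction: replay the abducted noise under the modified input
y_cf = rollout(y_obs[1], x_cf, u_hat, a, b)

println("factual:        ", round.(y_obs, digits = 3))
println("counterfactual: ", round.(y_cf, digits = 3))
```

A useful sanity check is consistency: replaying the abducted noise with the factual input reproduces the observed trajectory, so the two trajectories differ only after the intervention takes effect.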

In the Yamanaka factors paradigm for induced pluripotency, four transcription factors (Oct4, Sox2, Klf4, c-Myc) are overexpressed to reprogram somatic cells to pluripotent stem cells. A counterfactual CDM could ask: “Given this cell’s observed trajectory toward differentiation, what would have happened if we had overexpressed Oct4 at day 2?” The answer informs optimal timing and dosing of reprogramming factors.

34.3 Case Study 2: Epidemiological Dynamics Across Scales

34.3.1 Biological Context

Infectious disease dynamics span multiple scales. Within-host: the virus replicates, the immune system responds, and viral load determines symptom severity and infectiousness. Between-host: contact patterns, transmission probability, and population structure determine the epidemic curve. The two scales are coupled: within-host viral load influences between-host transmission probability; population-level interventions (lockdowns, vaccination) alter contact rates and susceptibility.

This multi-scale structure is a hallmark of complex biological systems. The CDM framework allows us to represent the causal structure at each scale, specify the dynamics, and reason about cross-scale transportability—e.g., how lab-measured vaccine efficacy translates to population-level impact.

The within-host and between-host scales operate at different time resolutions: viral load peaks within days; epidemics unfold over months. Coupling these scales requires modelling the mapping from within-host state (e.g., viral load) to between-host parameters (e.g., per-contact transmission probability). This mapping is often nonlinear and context-dependent.

34.3.2 Structural World: Multi-Scale DAG

The within-host DAG encodes: viral load \(V\) → immune activation \(I\) → symptom onset \(S\). Viral load also directly influences symptom severity. The between-host DAG encodes: contact rate \(C\) → transmission events \(T\) → prevalence \(P\). A cross-scale edge connects viral load to infectiousness: \(V \to \beta\) (the transmission rate), since higher viral load typically increases transmission probability. Figure 34.2 and the accompanying code represent this edge as \(V \to T\), folding \(\beta\) into the transmission node.

This creates a hierarchical causal structure (Figure 34.2). Confounding can arise from unmeasured factors: e.g., host genetics affects both immune response and susceptibility to infection, creating a backdoor path. When host factors are measured, conditioning on them satisfies the backdoor criterion and identifies the causal effect; when they are not, randomisation of vaccination (which can also serve as an instrumental variable) offers an alternative route to identification.

Figure 34.2: Multi-scale epidemiological DAG. Within-host: viral load \(V\) → immune \(I\) → symptoms \(S\). Between-host: contact \(C\) → transmission \(T\) → prevalence \(P\). Cross-scale: \(V\) influences \(T\).

# Multi-scale DAG: within-host and between-host with cross-scale edges
project_root = let
    current = pwd()
    while !isfile(joinpath(current, "Project.toml")) && !isfile(joinpath(current, "_quarto.yml"))
        parent = dirname(current)
        parent == current && break
        current = parent
    end
    current
end
include(joinpath(project_root, "scripts", "ensure_packages.jl"))
@auto_using CausalDynamics Graphs

# Within-host: V -> I -> S, V -> S
# Between-host: C -> T -> P
# Cross-scale: V influences transmission (represented as V -> T)
# Simplified combined graph with 6 nodes: V, I, S, C, T, P
g = SimpleDiGraph(6)
add_edge!(g, 1, 2)  # V -> I
add_edge!(g, 1, 3)  # V -> S
add_edge!(g, 2, 3)  # I -> S
add_edge!(g, 4, 5)  # C -> T
add_edge!(g, 5, 6)  # T -> P
add_edge!(g, 1, 5)  # V -> T (cross-scale)

true

The multi-scale DAG is built in the code chunk labelled chunk-epidemic-causal-graph. Identification of the effect of vaccination on epidemic outcomes requires careful consideration of the graph. Vaccination affects susceptibility (weakening \(C \to T\) for vaccinated individuals) and possibly infectiousness (reducing \(V\), which acts through the cross-scale edge \(V \to T\)). The causal effect is identified when vaccination is randomised, so that vaccination status has no unmeasured confounders.

34.3.3 Dynamical World: Coupled ODE/Agent-Based Models

Within-host dynamics can be modelled with a simple ODE system in which virus and immune response interact much like prey and predator. The viral load \(V\) grows exponentially until the immune response \(I\) ramps up; then \(V\) declines as immune clearance dominates. Symptom severity \(S\) typically tracks viral load and inflammatory response:

\[ \dot{V} = r V - \gamma I V, \quad \dot{I} = \alpha V - \delta I, \quad \dot{S} = f(I, V) \]

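where \(V\) is viral load, \(I\) is immune activation, \(S\) is symptom severity, \(r\) is the viral replication rate, \(\gamma\) is the immune clearance rate, \(\alpha\) is the rate at which viral load stimulates immune activation, \(\delta\) is the decay rate of immune activation, and \(f\) encodes the symptom response. (Here \(\alpha\) and \(\delta\) are unrelated to the production and degradation rates of the GRN model.)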
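
The rise-and-fall shape of the viral load can be checked with a few lines of code. The sketch below integrates the \(V\) and \(I\) equations with forward Euler; every parameter value is hypothetical and chosen only to show the qualitative behaviour, and the symptom equation is omitted because \(f\) is left unspecified in the text.

```julia
# Forward-Euler integration of dV/dt = rV - γIV, dI/dt = αV - δI.
# All parameter values are hypothetical.
function within_host(; r = 1.0, γ = 1.0, α = 0.5, δ = 0.1, V0 = 0.01, I0 = 0.0,
                       dt = 0.001, T = 30.0)
    nsteps = round(Int, T / dt)
    V = Vector{Float64}(undef, nsteps + 1); V[1] = V0
    Im = Vector{Float64}(undef, nsteps + 1); Im[1] = I0   # immune activation
    for k in 1:nsteps
        V[k + 1] = V[k] + dt * (r * V[k] - γ * Im[k] * V[k])
        Im[k + 1] = Im[k] + dt * (α * V[k] - δ * Im[k])
    end
    return (t = collect(0:nsteps) .* dt, V = V, Im = Im)
end

sol = within_host()
k_peak = argmax(sol.V)
println("peak viral load ", round(sol.V[k_peak], digits = 2),
        " at t = ", round(sol.t[k_peak], digits = 2))
println("immune activation at the peak: ", round(sol.Im[k_peak], digits = 2))
```

Viral load peaks exactly when immune activation crosses \(r / \gamma\), since that is where \(\dot{V}\) changes sign; after the peak the system spirals towards the equilibrium \(V^* = \delta r / (\alpha \gamma)\), \(I^* = r / \gamma\).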

Between-host dynamics are naturally represented as an agent-based model (ABM) on a contact network. Each agent has a state (susceptible, infected, recovered) and position on the network. Transmission occurs along edges with probability depending on the infected agent’s viral load. Agents.jl provides the infrastructure for such models; a minimal ABM is shown in the code chunk labelled chunk-epidemic-abm.

The state-space representation of the epidemic curve treats the daily case count as a Poisson observation of the latent incidence, which is a function of the ABM state. The coupling is: within-host \(V\) determines per-contact transmission probability; the ABM simulates who gets infected; the epidemic curve aggregates over the population.

# Agent-based epidemic model on a contact network (Agents.jl)
project_root = let
    current = pwd()
    while !isfile(joinpath(current, "Project.toml")) && !isfile(joinpath(current, "_quarto.yml"))
        parent = dirname(current)
        parent == current && break
        current = parent
    end
    current
end
include(joinpath(project_root, "scripts", "ensure_packages.jl"))
@auto_using Agents Agents.Schedulers Graphs CairoMakie StableRNGs DataFrames

mutable struct Person <: AbstractAgent
    id::Int
    pos::Int
    state::Symbol  # :S, :I, :R
    viral_load::Float64  # Within-host state (simplified)
end

# Network: 100 nodes, edge probability 0.06 (mean degree ≈ 6)
g = erdos_renyi(100, 0.06, seed = 42)
space = Agents.GraphSpace(g)

# One step for one agent. Written against the public Agents.jl v6 API
# (abmrng, nearby_agents, StandardABM); older releases use different names.
function agent_step!(agent, model)
    agent.state == :I || return
    rng = abmrng(model)
    # Per-contact transmission probability ∝ viral load (cross-scale: V -> T)
    β = 0.1 * agent.viral_load
    # On a GraphSpace, nearby_agents yields the agents on adjacent nodes
    for neighbor in nearby_agents(agent, model)
        neighbor.state == :S || continue
        rand(rng) < β || continue
        neighbor.state = :I
        neighbor.viral_load = 1.0
    end
    # Viral load decays by 10% per step; the agent recovers once it is negligible
    agent.viral_load *= 0.9
    if agent.viral_load < 0.01
        agent.state = :R
    end
end

# Initialise the model and place one susceptible agent on each node
model = StandardABM(Person, space; agent_step! = agent_step!,
                    scheduler = Agents.Schedulers.Randomly(),
                    rng = StableRNG(1234))
for i in 1:Graphs.nv(g)
    add_agent!(i, model, :S, 0.0)  # agent ids are assigned automatically
end
# Seed 5 of the 100 agents (5%) as infected
for id in collect(allids(model))[1:5]
    model[id].state = :I
    model[id].viral_load = 1.0
end

34.3.4 Observable World: Surveillance Data

Surveillance data are noisy and incomplete. Case counts are Poisson (or negative binomial) observations of true incidence, with under-reporting. The reporting fraction \(\rho\) may vary over time (e.g., declining as healthcare systems become overwhelmed) and across regions, introducing additional complexity. Serological surveys provide imperfect estimates of seroprevalence (sensitivity and specificity \(< 1\)). Hospitalisation data are a delayed, filtered signal of severe cases.

State-space inference estimates the latent epidemic state (prevalence, incidence, \(R_t\)) from these observations. Parameter learning for the transmission rate \(\beta\) proceeds via particle filtering or MCMC. Time-varying \(R_t\) (the effective reproduction number) can be estimated by treating \(\beta_t\) as a latent process with smoothness priors, enabling detection of changes in transmission due to interventions or behaviour change.

The observation model is:

\[ Y_t \sim \text{Poisson}(\rho \cdot \lambda_t) \]

where \(\lambda_t\) is the true incidence and \(\rho\) is the reporting fraction.

34.3.5 Interventions and Counterfactuals

Vaccination intervention: \(\mathrm{do}(\text{vaccination\_rate} = v)\) reduces susceptibility and/or infectiousness. In the ABM, we set a fraction of agents to “vaccinated” at \(t = 0\), modifying their transition probabilities.

Counterfactual: “What would the epidemic trajectory have been without the intervention?” We (1) infer the exogenous factors (contact patterns, initial conditions) from observed data; (2) remove the vaccination intervention; (3) simulate forward. This quantifies the impact of the vaccination campaign. Such counterfactual comparisons are central to policy evaluation: they answer “how many cases were prevented?” and “how many deaths were averted?” by comparing the observed trajectory (with intervention) to the counterfactual trajectory (without intervention). The difference is the attributable effect of the intervention.
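
A stripped-down version of this comparison can be sketched with a deterministic SIR model in place of the ABM. The assumptions are strong and are made only for illustration: homogeneous mixing, all-or-nothing vaccine protection, and hypothetical parameter values. Because the model is deterministic there is no noise to abduct, so "abduction" reduces to holding the parameters and initial conditions fixed across the two runs.

```julia
# Deterministic SIR with a fraction v vaccinated at t = 0 (all-or-nothing protection
# with efficacy e). Every parameter value here is hypothetical.
function sir_final_size(; β = 0.3, γ = 0.1, v = 0.0, e = 0.9, I0 = 1e-3, dt = 0.05, T = 500.0)
    S = (1 - I0) * (1 - v * e)    # protected individuals are removed from S
    I = I0
    C = I0                        # cumulative infections
    for _ in 1:round(Int, T / dt)
        new_inf = β * S * I * dt
        S -= new_inf
        I += new_inf - γ * I * dt
        C += new_inf
    end
    return C
end

factual        = sir_final_size(v = 0.5)   # observed world: 50% coverage
counterfactual = sir_final_size(v = 0.0)   # same parameters, vaccination removed
println("attack rate with vaccination:    ", round(factual, digits = 3))
println("attack rate without vaccination: ", round(counterfactual, digits = 3))
println("infections averted (fraction of population): ",
        round(counterfactual - factual, digits = 3))
```

The difference between the two attack rates is the attributable effect in this toy setting; with a stochastic ABM the same comparison would reuse the inferred contact network and random draws in both runs.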

Cross-scale transportability: Lab studies measure vaccine efficacy (reduction in viral load or infection probability in controlled settings). Transporting this to the population requires reasoning about the causal structure: lab conditions may differ from the field (different strains, age structure, contact patterns). The CDM framework provides the language for specifying transportability assumptions and identifying when population-level effects can be predicted from lab data.

Transportability analysis (Chapter 8) asks: under what conditions can we transfer the lab-measured effect to the population? If the only difference between lab and field is the distribution of contact patterns \(P(C)\), and the causal effect of vaccination on transmission is invariant across contexts, then the population effect is identified. If, however, vaccine efficacy depends on viral strain, and strain distribution differs between lab and field, transportability fails without additional assumptions or data.
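
When efficacy varies by strain but the strain-specific effects are invariant, the population effect can be obtained by re-weighting: \(P^*(y \mid \mathrm{do}(x)) = \sum_z P(y \mid \mathrm{do}(x), z)\, P^*(z)\), where \(z\) indexes strains and \(P^*(z)\) is the field distribution. The numbers below are hypothetical and serve only to illustrate the arithmetic.

```julia
# Strain-specific vaccine efficacy (hypothetical), assumed invariant between lab and field
efficacy_by_strain = Dict("strain_A" => 0.9, "strain_B" => 0.6)

# Strain distributions in the two environments (hypothetical)
p_lab   = Dict("strain_A" => 0.8, "strain_B" => 0.2)
p_field = Dict("strain_A" => 0.3, "strain_B" => 0.7)

# Transport formula: average the invariant stratum-specific effect over the target distribution
transport(effect, p) = sum(effect[s] * p[s] for s in keys(p))

println("lab-average efficacy:   ", round(transport(efficacy_by_strain, p_lab), digits = 2))    # 0.84
println("field-average efficacy: ", round(transport(efficacy_by_strain, p_field), digits = 2))  # 0.69
```

The lab average (0.84) overstates the field effect (0.69) because the less susceptible strain is more common in the field; this re-weighting requires that the field strain distribution be measured.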

34.4 Synthesis

Both case studies illustrate common patterns. Structure constrains what can be learned and intervened upon; the causal graph is the first object we specify. Dynamics encode the temporal evolution; the choice of ODE vs. ABM depends on scale and granularity. Observation links latent states to data; Poisson and negative binomial models are natural for count data. Inference recovers parameters and latent trajectories; EM, particle filtering, and MCMC are the workhorses. Intervention and counterfactual reasoning answer policy-relevant questions: gene knockouts, vaccination campaigns, and what-if scenarios.

The CDM pipeline is a general methodology for biological systems. It unifies structural causal modelling (Chapters 1–9), dynamical systems (Chapters 10–18), and observable inference (Chapters 19–27) into a single workflow.

Common patterns across case studies:

Latent structure: Both GRNs and epidemics have latent states (gene expression, viral load, true incidence) observed only through noisy measurements. State-space models provide the unifying framework.
Count data: RNA-seq counts and case counts both follow discrete distributions (Poisson, negative binomial). The observation model is a key component of the CDM.
Multi-scale coupling: GRNs couple molecular regulation to cellular fate; epidemics couple within-host dynamics to between-host transmission. The causal graph must explicitly represent cross-scale edges.
Intervention design: Both domains require answering “what if” questions—gene knockouts for reprogramming, vaccination for epidemic control. The do-calculus and counterfactual machinery are essential.

Open challenges include: scaling to high-dimensional GRNs (thousands of genes), handling unmeasured confounding in surveillance data (e.g., time-varying contact behaviour), and formalising cross-scale transportability when lab and field differ in unmeasured ways. As data quality and computational tools improve, the CDM framework provides a principled foundation for causal reasoning across the biological hierarchy.

The two case studies in this chapter—gene regulation and epidemiology—represent distinct scales and modelling traditions, yet both fit naturally into the CDM framework. This universality is a strength: the same conceptual machinery (causal graphs, state-space models, do-calculus, counterfactuals) applies across domains, enabling transfer of methods and insights. The biologist or epidemiologist equipped with the CDM pipeline has a systematic workflow for moving from observational data to actionable causal conclusions.

We have emphasised the pipeline rather than any single technique. No single method—whether causal discovery, parameter estimation, or counterfactual simulation—suffices alone. The power of the CDM framework lies in its integration: structure informs dynamics, dynamics inform observation, observation informs inference, and inference enables intervention and counterfactual reasoning. Each step builds on the previous, and the whole is greater than the sum of its parts.

The biological case studies in this chapter are illustrative rather than exhaustive. Similar pipelines apply to neural circuits (single neurons → networks → behaviour), ecological systems (species → communities → ecosystems), and metabolic networks (enzymes → pathways → whole-organism physiology). The CDM framework provides a general-purpose methodology for causal reasoning across the life sciences.