20  Multi-Scale Causal Dynamics

Status: Draft

v0.1

20.1 Learning Objectives

After reading this chapter, you will be able to:

  • Understand why multi-scale modelling is essential for complex biological and dynamical systems
  • Construct hierarchical state-space models with nested macro- and micro-level dynamics
  • Apply coarse-graining operators and understand conditions for causal consistency across scales
  • Extend transportability theory to cross-scale settings
  • Implement two-scale SSMs and check cross-scale transportability conditions in Julia

20.2 Why Multi-Scale Modelling?

Complex biological systems operate across organisational scales: molecular → cellular → tissue → organ → organism → population. A gene regulatory network influences protein expression; protein concentrations determine cell behaviour; cellular dynamics aggregate into tissue-level properties; tissue function manifests as organ physiology; organ systems integrate into organism-level phenotypes; and organism behaviour shapes population dynamics. Causal mechanisms at one scale produce emergent behaviours at another.

A complete causal dynamical model (CDM) framework must bridge these scales. Consider a drug development pipeline: we measure drug effects in cell culture (micro-scale), but we need to predict organism-level response (macro-scale). Gene expression changes (molecular) propagate to cellular phenotypes, which aggregate into tissue-level biomarkers. Without a principled framework for cross-scale reasoning, we cannot determine when causal conclusions at one scale transport to another.

Multi-scale modelling asks: when do causal structures at one level of description imply or constrain causal structures at another? The micro-scale captures finer-grained mechanisms; the macro-scale captures coarser patterns (often defined by aggregation or coarse-graining).

20.3 Hierarchical State-Space Models

We extend the state-space models of Chapter 12 to multiple nested scales. A hierarchical SSM couples latent states at different timescales and resolutions.

20.3.1 Slow/Fast Decomposition

A natural decomposition separates macro-level (slow) states \(X^{(M)}_t\) from micro-level (fast) states \(X^{(m)}_t\). The macro state evolves on a slower timescale and may depend on aggregated micro-level information; the micro state evolves faster and is modulated by the macro context.
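To make the timescale separation concrete: a scalar AR(1) state \(x_{t+1} = \alpha x_t + w_t\) has autocorrelation \(\alpha^k = e^{-k/\tau}\) at lag \(k\), with time constant \(\tau = -1/\ln\alpha\). The sketch below evaluates this for the decay rates used in the example of Section 20.3.3 (0.95 and 0.7). It assumes the two scales are decoupled, so it is only a rough guide; `time_constant` is an illustrative helper name, not part of the chapter's code.

```julia
# Time constant of a decoupled scalar AR(1) state x[t+1] = α * x[t] + w[t].
# Assumption: cross-scale coupling is ignored, so this is only a rough guide.
time_constant(α) = -1 / log(α)

α_M = 0.95  # macro decay, as in the example of Section 20.3.3
α_m = 0.7   # micro decay, as in the example of Section 20.3.3

τ_M = time_constant(α_M)
τ_m = time_constant(α_m)

println("Macro time constant: ", round(τ_M, digits=1), " steps")
println("Micro time constant: ", round(τ_m, digits=1), " steps")
println("Separation ratio τ_M / τ_m: ", round(τ_M / τ_m, digits=1))
```

Under these assumptions the macro state decorrelates over roughly 20 steps and the micro state over roughly 3, a separation of about a factor of seven.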

20.3.2 Mathematical Formulation

Macro-level dynamics: \[ X^{(M)}_{t+1} = f^{(M)}\left(X^{(M)}_t, \bar{X}^{(m)}_t, \theta^{(M)}\right) + w^{(M)}_t \]

where \(\bar{X}^{(m)}_t\) denotes an aggregation of micro-level states (e.g., mean, sufficient statistic) that enters the macro dynamics.

Micro-level dynamics: \[ X^{(m)}_{t+1} = f^{(m)}\left(X^{(m)}_t, X^{(M)}_t, \theta^{(m)}\right) + w^{(m)}_t \]

The micro dynamics depend on the current macro state \(X^{(M)}_t\), encoding top-down modulation.

Observation model: \[ Y_t = h\left(X^{(M)}_t, X^{(m)}_t\right) + v_t \]

Observations may arise from either or both scales, depending on the measurement protocol.

20.3.3 Two-Scale Linear-Gaussian Example

The following Julia code illustrates the structure of a two-scale Gaussian Linear Dynamical System (GLDS). The macro level has slow dynamics; the micro level has fast dynamics modulated by the macro state.

# Two-scale GLDS: macro (slow) and micro (fast) dynamics
# Macro: x_M(t+1) = A_M * x_M(t) + B_M * x̄_m(t) + w_M
# Micro: x_m(t+1) = A_m * x_m(t) + B_m * x_M(t) + w_m
# Observation: y_t = H_M * x_M + H_m * x_m + v

project_root = let
    current = pwd()
    while !isfile(joinpath(current, "Project.toml")) && !isfile(joinpath(current, "_quarto.yml"))
        parent = dirname(current)
        parent == current && break
        current = parent
    end
    current
end
include(joinpath(project_root, "scripts", "ensure_packages.jl"))
@auto_using LinearAlgebra Distributions Random Statistics CairoMakie

# Dimensions
n_M = 2   # macro state dimension
n_m = 3   # micro state dimension
n_y = 2   # observation dimension

# Macro dynamics (slow): decay + coupling from micro mean
α_M = 0.95  # slow decay
A_M = α_M * I(n_M)
B_M = 0.1 * ones(n_M)  # coupling coefficients from scalar micro mean to macro

# Micro dynamics (fast): decay + modulation by macro
α_m = 0.7   # faster decay
A_m = α_m * I(n_m)
B_m = 0.2 * ones(n_m, n_M)  # top-down modulation

# Observation: both scales contribute
H_M = [1.0 0.0; 0.0 1.0]
H_m = [0.5 0.5 0.0; 0.0 0.5 0.5]

# Noise covariances
σ_w_M = 0.1
σ_w_m = 0.2
σ_v = 0.15
Q_M = (σ_w_M^2) * I(n_M)
Q_m = (σ_w_m^2) * I(n_m)
R = (σ_v^2) * I(n_y)

# Simulate T steps
T = 100
rng = Random.MersenneTwister(42)
x_M = zeros(n_M, T+1)
x_m = zeros(n_m, T+1)
y = zeros(n_y, T)

x_M[:, 1] = rand(rng, MvNormal(zeros(n_M), I(n_M)))
x_m[:, 1] = rand(rng, MvNormal(zeros(n_m), I(n_m)))

for t in 1:T
    x̄_m = mean(x_m[:, t])
    x_M[:, t+1] = A_M * x_M[:, t] + B_M .* x̄_m + rand(rng, MvNormal(zeros(n_M), Q_M))
    x_m[:, t+1] = A_m * x_m[:, t] + B_m * x_M[:, t] + rand(rng, MvNormal(zeros(n_m), Q_m))
    y[:, t] = H_M * x_M[:, t] + H_m * x_m[:, t] + rand(rng, MvNormal(zeros(n_y), R))
end

# Structure summary
println("Two-scale GLDS: macro dim=$n_M, micro dim=$n_m, obs dim=$n_y")
println("Macro decay α_M=$α_M (slow), micro decay α_m=$α_m (fast)")
Two-scale GLDS: macro dim=2, micro dim=3, obs dim=2
Macro decay α_M=0.95 (slow), micro decay α_m=0.7 (fast)

Two-scale GLDS: macro (slow) and micro (fast) latent states over time
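Each block in the example decays on its own (0.95 and 0.7), but feedback between scales can still change the stability of the coupled system. A minimal check, assuming the stacked state \([X^{(M)}; X^{(m)}]\) and averaging written as a row vector `φ`, is to form the joint transition matrix and inspect its spectral radius. The helper names `joint_transition` and `spectral_radius` are illustrative and not part of the chapter's code.

```julia
using LinearAlgebra

n_M, n_m = 2, 3

# Parameter values copied from the two-scale example above
A_M = 0.95 * Matrix(I, n_M, n_M)
A_m = 0.7 * Matrix(I, n_m, n_m)
B_M = 0.1 * ones(n_M, 1)      # macro coupling to the scalar micro mean
B_m = 0.2 * ones(n_m, n_M)    # top-down modulation

# Averaging written as a 1×n_m row, so that x̄_m = φ * x_m
φ = fill(1 / n_m, 1, n_m)

# Joint transition matrix for the stacked state [x_M; x_m]
function joint_transition(A_M, B_M, A_m, B_m, φ)
    bottom_up = B_M * φ       # n_M×n_m block: micro mean feeding the macro state
    return [A_M bottom_up; B_m A_m]
end

spectral_radius(F) = maximum(abs, eigvals(F))

F = joint_transition(A_M, B_M, A_m, B_m, φ)
ρ = spectral_radius(F)
println("Spectral radius of joint dynamics: ", round(ρ, digits=3))
println(ρ < 1 ? "Coupled system is stable" : "Coupled system is not stable")
```

With these values the sketch gives a spectral radius of roughly 1.06, so the noise-free coupled system grows slowly rather than settling into a stationary regime. Weaker coupling, such as entries of 0.05 in `B_m`, brings the spectral radius below one.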

20.4 Coarse-Graining and Emergence

When does a micro-level causal structure imply a macro-level causal structure? This question connects to renormalisation group ideas in physics: as we coarse-grain, what information is lost and what effective theory remains?

20.4.1 Coarse-Graining Operators

A coarse-graining map \(\phi: \mathcal{X}^{(m)} \to \mathcal{X}^{(M)}\) projects micro-level states to macro-level states. Common choices include:

  • Aggregation: Sum or count over micro units (e.g., total cell count)
  • Averaging: \(\bar{X}^{(m)} = \frac{1}{N}\sum_{i=1}^N X^{(m)}_i\), the mean over \(N\) micro units
  • Projection: Linear or nonlinear projection onto a lower-dimensional subspace
  • Sufficient statistics: Moments, order parameters, or other statistics that capture macro-relevant information
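The four choices above can be sketched as plain functions acting on a matrix of micro states. The data, the projection matrix `P`, and the function names are hypothetical, chosen only to show the shape of each operator.

```julia
using Statistics

# Hypothetical micro configuration: 3 features (rows) for N = 4 micro units (columns)
X_micro = [1.0 2.0 3.0 4.0;
           0.0 1.0 0.0 1.0;
           2.0 2.0 2.0 2.0]

# Aggregation: sum over micro units, one total per feature
φ_sum(X) = vec(sum(X, dims=2))

# Averaging: mean over micro units
φ_mean(X) = vec(mean(X, dims=2))

# Projection: linear map onto a lower-dimensional subspace (P is an assumed 2×3 matrix)
P = [1.0 0.0 0.0;
     0.0 0.5 0.5]
φ_proj(X) = P * φ_mean(X)

# Sufficient statistics: first two moments of each feature across units
φ_moments(X) = (μ = vec(mean(X, dims=2)), s2 = vec(var(X, dims=2)))

println("sum:        ", φ_sum(X_micro))
println("mean:       ", φ_mean(X_micro))
println("projection: ", φ_proj(X_micro))
println("moments:    ", φ_moments(X_micro))
```

Each operator maps a 3×4 micro configuration to a smaller macro description; which one is appropriate depends on what the macro dynamics actually respond to.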

20.4.2 Causal Consistency

For the coarse-grained model to preserve causal semantics, the map \(\phi\) must commute with the do-operator in an appropriate sense. Formally, if we intervene on macro variable \(X^{(M)}_j\) in the macro model, the result should be consistent with intervening on the corresponding micro variables and then coarse-graining:

\[ \phi\left(\text{do}(X^{(m)} = x^{(m)})\right) \approx \text{do}\left(X^{(M)} = \phi(x^{(m)})\right) \]

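Here \(\phi(\text{do}(\cdot))\) is shorthand for the distribution over macro states obtained by applying \(\phi\) to the outcome of the micro-level intervention. The relation holds when \(\phi\) is a sufficient statistic for the macro dynamics, so that every micro state with the same image \(\phi(x^{(m)})\) produces the same macro-level behaviour. In practice it often fails: coarse-graining typically discards information, and the macro causal structure may differ from a simple projection of the micro structure.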
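One practical consequence can be tested numerically: two micro interventions with the same macro image should lead to the same macro outcome. The sketch below checks this for averaging and noise-free linear micro dynamics. The matrices and the helper names (`micro_step`, `agrees`) are hypothetical, and agreement on a single pair is a necessary check, not a proof of consistency.

```julia
using LinearAlgebra, Statistics

φ(x) = mean(x)              # coarse-graining: average over micro units
micro_step(A, x) = A * x    # noise-free micro dynamics after do(X_m = x)

# Two micro interventions with the same macro image φ(x) = 1.0
x1 = [3.0, 0.0, 0.0]
x2 = [0.0, 0.0, 3.0]

A_homogeneous   = 0.7 * I(3)                   # every unit decays at the same rate
A_heterogeneous = Diagonal([0.5, 0.7, 0.9])    # unit-specific decay rates (hypothetical)

# Consistency on this pair: equal macro images must yield equal macro outcomes
agrees(A) = isapprox(φ(micro_step(A, x1)), φ(micro_step(A, x2)); atol=1e-12)

println("Homogeneous micro dynamics consistent:   ", agrees(A_homogeneous))
println("Heterogeneous micro dynamics consistent: ", agrees(A_heterogeneous))
```

With identical decay rates the mean is all the macro level needs to know; with unit-specific rates the outcome depends on which unit was perturbed, so the average alone no longer supports a well-defined macro intervention.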

20.4.3 Information Loss and Effective Theories

Coarse-graining entails information loss. The macro model is an effective theory: valid for certain questions at the macro scale, but not a complete description. The renormalisation group offers an analogy: an effective theory can remain causally consistent over a range of scales and questions, even when the full micro structure is unknown.

20.5 Cross-Scale Transportability

Transportability asks: when can causal conclusions from one domain be transferred to another? (See Chapter 8 on transportability.) Cross-scale transportability extends this to scales: when can causal conclusions at the micro scale be transported to the macro scale (or vice versa)?

20.5.1 Cross-Scale Selection Diagrams

We extend the selection diagrams of Pearl and Bareinboim to include scale variables \(S\) that indicate which scale we observe or intervene at. An edge from \(S\) to a variable indicates that the mechanism for that variable differs across scales. For example, \(S \to Y\) means the observation process differs (we might observe \(Y^{(m)}\) in cells versus \(Y^{(M)}\) in organisms).

20.5.2 Formal Conditions for Cross-Scale Identifiability

A causal effect \(P(Y^{(M)} \mid \text{do}(A^{(M)}))\) is cross-scale identifiable from micro-level data and assumptions when:

  1. The effect can be expressed in terms of micro-level mechanisms and the coarse-graining map \(\phi\)
  2. There is no unobserved confounding between scale-specific variables
  3. The observation/intervention protocols at each scale are encoded in the graph

20.5.3 Example: Drug Effect from Cells to Organisms

A drug effect measured in cell culture (\(A^{(m)} \to Y^{(m)}\)) may or may not transport to organism-level outcome (\(A^{(M)} \to Y^{(M)}\)). Transportability requires:

  • The cellular mechanism is invariant (or its variation is modelled)
  • The coarse-graining from cellular response to organism response is known or identifiable
  • There are no scale-specific confounders (e.g., organism-level factors that affect outcome but not cell-level measurements)

20.6 Practical Implementation

20.6.1 Building a Two-Scale SSM Structure

The following code defines a struct representing the two-scale SSM and shows how to organise parameters for simulation.

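# Struct for two-scale SSM (conceptual structure)
using LinearAlgebra  # provides the identity operator I

struct TwoScaleSSM
    n_M::Int  # macro state dimension
    n_m::Int  # micro state dimension
    n_y::Int  # observation dimension
    f_M  # macro dynamics function
    f_m  # micro dynamics function
    h    # observation function
    θ_M  # macro parameters (A_M, B_M, Q_M)
    θ_m  # micro parameters (A_m, B_m, Q_m)
    θ_h  # observation parameters (H_M, H_m, R)
end

# Example: linear two-scale with aggregation
# Each function returns the noise-free mean; Q_M, Q_m and R are carried for simulation.
# B_M is n_M × n_m here, so x̄_m is the full micro state and B_M performs the aggregation.
function macro_dynamics(x_M, x̄_m, θ)
    A_M, B_M, Q_M = θ
    A_M * x_M + B_M * x̄_m
end

function micro_dynamics(x_m, x_M, θ)
    A_m, B_m, Q_m = θ
    A_m * x_m + B_m * x_M
end

function observation(x_M, x_m, θ)
    H_M, H_m, R = θ
    H_M * x_M + H_m * x_m
end

# Parameter setup
n_M, n_m, n_y = 2, 3, 2
θ_M = (0.95 * I(2), 0.1 * ones(2, 3), 0.01 * I(2))
θ_m = (0.7 * I(3), 0.2 * ones(3, 2), 0.04 * I(3))
θ_h = (I(2), [0.5 0.5 0.0; 0.0 0.5 0.5], 0.02 * I(2))

# θ_h is stored alongside θ_M and θ_m so that the observation function can be evaluated from the model
model = TwoScaleSSM(n_M, n_m, n_y, macro_dynamics, micro_dynamics, observation, θ_M, θ_m, θ_h)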
println("Two-scale SSM: macro=$n_M, micro=$n_m, obs=$n_y")
Two-scale SSM: macro=2, micro=3, obs=2
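To show how such parameter tuples might drive a simulation, the sketch below advances a linear two-scale model by a few noisy steps. `step_two_scale` and `run_steps` are hypothetical helpers, not part of the chapter's code; they take the `(A, B, Q)` tuples directly rather than the struct, and draw noise from the Cholesky factor of each covariance.

```julia
using LinearAlgebra, Random

# Hypothetical helper, not part of the chapter's code: one noisy step of a
# linear two-scale model with θ_M = (A_M, B_M, Q_M) and θ_m = (A_m, B_m, Q_m).
function step_two_scale(rng, x_M, x_m, θ_M, θ_m)
    A_M, B_M, Q_M = θ_M
    A_m, B_m, Q_m = θ_m
    # Gaussian noise with covariance Q, drawn via the Cholesky factor of Q
    w_M = cholesky(Symmetric(Matrix(Q_M))).L * randn(rng, length(x_M))
    w_m = cholesky(Symmetric(Matrix(Q_m))).L * randn(rng, length(x_m))
    x_M_next = A_M * x_M + B_M * x_m + w_M   # B_M (n_M×n_m) aggregates the micro state
    x_m_next = A_m * x_m + B_m * x_M + w_m   # top-down modulation by the macro state
    return x_M_next, x_m_next
end

# Same parameter values as the setup above
θ_M = (0.95 * I(2), 0.1 * ones(2, 3), 0.01 * I(2))
θ_m = (0.7 * I(3), 0.2 * ones(3, 2), 0.04 * I(3))

# Iterate the step function from the zero state
function run_steps(rng, θ_M, θ_m, n_steps)
    x_M, x_m = zeros(2), zeros(3)
    for _ in 1:n_steps
        x_M, x_m = step_two_scale(rng, x_M, x_m, θ_M, θ_m)
    end
    return x_M, x_m
end

x_M_end, x_m_end = run_steps(Random.MersenneTwister(1), θ_M, θ_m, 5)
println("Macro state after 5 steps: ", round.(x_M_end, digits=3))
println("Micro state after 5 steps: ", round.(x_m_end, digits=3))
```

Wrapping the loop in a function keeps the state variables local, and passing the random number generator explicitly makes a run reproducible for a given seed.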

20.6.2 Coarse-Graining a Micro-Level DAG to Macro-Level

We can represent coarse-graining of a causal graph: micro nodes aggregate into macro nodes, and edges are preserved when the corresponding macro-level influence exists.

# Coarse-graining: map micro DAG to macro DAG
# Micro: A1→B1, A2→B2, A1→A2 (two micro nodes in each macro group)
# Macro: A→B with A = {A1,A2}, B = {B1,B2}

project_root = let
    current = pwd()
    while !isfile(joinpath(current, "Project.toml")) && !isfile(joinpath(current, "_quarto.yml"))
        parent = dirname(current)
        parent == current && break
        current = parent
    end
    current
end
include(joinpath(project_root, "scripts", "ensure_packages.jl"))
@auto_using Graphs

# Micro-level DAG (four nodes: two per macro group)
g_micro = SimpleDiGraph(4)
# Node indices: A1(1), B1(2), A2(3), B2(4); A1(1)→A2(3) is an edge within macro group A
add_edge!(g_micro, 1, 2)  # A1 → B1
add_edge!(g_micro, 3, 4)  # A2 → B2
add_edge!(g_micro, 1, 3)  # A1 → A2

# Partition: {1,3} → macro A, {2,4} → macro B
partition = [[1, 3], [2, 4]]

# Macro graph: edge A→B exists if any micro edge from group 1 to group 2
g_macro = SimpleDiGraph(2)
has_edge_1_to_2 = any(has_edge(g_micro, i, j) for i in partition[1], j in partition[2])
if has_edge_1_to_2
    add_edge!(g_macro, 1, 2)
end

println("Micro graph: ", ne(g_micro), " edges")
println("Macro graph: ", ne(g_macro), " edges (A→B)")
Micro graph: 3 edges
Macro graph: 1 edges (A→B)
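The same rule extends to any partition, with one caveat: a macro graph built this way need not be acyclic even when the micro graph is. The sketch below works on Boolean adjacency matrices rather than a graph library. `coarse_grain`, `is_acyclic`, and `adjacency` are hypothetical helpers, and the second micro graph is an invented counterexample.

```julia
# Hypothetical helper: coarse-grain a Boolean adjacency matrix under a node partition.
# macro_adj[p, q] is true when some micro edge runs from block p to block q (p ≠ q).
function coarse_grain(adj::AbstractMatrix{Bool}, partition)
    K = length(partition)
    macro_adj = falses(K, K)
    for p in 1:K, q in 1:K
        p == q && continue   # edges inside a block are absorbed into the macro node
        macro_adj[p, q] = any(adj[i, j] for i in partition[p], j in partition[q])
    end
    return macro_adj
end

# A directed graph on n nodes is acyclic iff its adjacency matrix is nilpotent (A^n == 0)
is_acyclic(adj) = all(iszero, Int.(adj)^size(adj, 1))

# Build a Boolean adjacency matrix from an edge list
function adjacency(n, edges)
    adj = falses(n, n)
    for (i, j) in edges
        adj[i, j] = true
    end
    return adj
end

partition = [[1, 3], [2, 4]]                       # A = {A1, A2}, B = {B1, B2}
micro_a = adjacency(4, [(1, 2), (3, 4), (1, 3)])   # the example above
micro_b = adjacency(4, [(1, 2), (4, 3)])           # A1 → B1 but B2 → A2

for (name, adj) in [("example", micro_a), ("counterexample", micro_b)]
    macro_adj = coarse_grain(adj, partition)
    println(name, ": micro acyclic = ", is_acyclic(adj),
            ", macro acyclic = ", is_acyclic(macro_adj))
end
```

In the counterexample both micro edges are harmless on their own, but after grouping they become A→B and B→A. A partition that produces such a cycle cannot be read as a macro-level DAG without further modelling choices.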

20.6.3 Checking Cross-Scale Transportability Conditions

We sketch a first step towards checking whether a causal effect is identifiable across scales, given a graph encoding scale-specific mechanisms. The function below only tests for a directed path; a complete check would also require d-separation tests on the selection diagram.

# Check cross-scale transportability: can we express P(Y_M | do(A_M)) from micro data?
# Simplified: only checks that a directed path from A to Y exists and reports the S → Y edge; it does not test d-separation

using Graphs

function has_causal_path(g, src, dst)
    # BFS for directed path from src to dst
    visited = falses(nv(g))
    queue = [src]
    visited[src] = true
    while !isempty(queue)
        v = popfirst!(queue)
        for w in outneighbors(g, v)
            w == dst && return true
            if !visited[w]
                visited[w] = true
                push!(queue, w)
            end
        end
    end
    false
end

# Example: macro graph A → M → Y, with S (scale) → Y (observation differs by scale)
g = SimpleDiGraph(4)  # A=1, M=2, Y=3, S=4
add_edge!(g, 1, 2)
add_edge!(g, 2, 3)
add_edge!(g, 4, 3)  # S affects Y mechanism

# Transportability: need to block S when estimating A→Y
# If S is observed, we can adjust; otherwise not transportable
println("A→Y path exists: ", has_causal_path(g, 1, 3))
println("S→Y edge (scale affects outcome): present - adjustment for S required for transportability")
A→Y path exists: true
S→Y edge (scale affects outcome): present - adjustment for S required for transportability
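For this particular graph the effect at the target scale can be written down by hand. Assuming no unobserved confounding, that \(S\) points only into \(Y\), and that \(M\) and \(Y\) are observed at the target scale, the do-calculus gives \(P^*(y \mid \text{do}(a)) = \sum_m P^*(y \mid m)\, P(m \mid \text{do}(a))\), where starred quantities belong to the target (macro) scale. The sketch below evaluates this for binary variables. Every probability in it is hypothetical, and `p_y1_do_a` is an illustrative helper name.

```julia
# All probabilities below are hypothetical, chosen only to illustrate the arithmetic.
# Binary A, M, Y; graph A → M → Y with S → Y and no unobserved confounding.

# P(M = 1 | do(A = a)): there is no S → M edge, so this is assumed shared across scales
p_m1_given_do_a = Dict(0 => 0.2, 1 => 0.7)

# P(Y = 1 | M = m): differs by scale because of the S → Y edge
p_y1_given_m_micro = Dict(0 => 0.1, 1 => 0.5)   # source scale (cell culture)
p_y1_given_m_macro = Dict(0 => 0.3, 1 => 0.9)   # target scale, observational data

# P(Y = 1 | do(A = a)) = Σ_m P(Y = 1 | m) * P(m | do(a))
function p_y1_do_a(a, p_y1_given_m)
    p_m1 = p_m1_given_do_a[a]
    return (1 - p_m1) * p_y1_given_m[0] + p_m1 * p_y1_given_m[1]
end

naive       = p_y1_do_a(1, p_y1_given_m_micro)   # reuses the micro outcome mechanism
transported = p_y1_do_a(1, p_y1_given_m_macro)   # swaps in the macro outcome mechanism

println("Naive reuse of micro result: ", round(naive, digits=2))
println("Transported to macro scale:  ", round(transported, digits=2))
```

The gap between the two numbers is the cost of ignoring the \(S \to Y\) edge: the part of the mechanism that does not depend on scale is carried over from micro-level experiments, while the scale-dependent part is re-estimated at the target scale.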

20.7 Connection to Structural, Dynamical, and Observable Layers

Multi-scale modelling touches all three layers at each scale:

20.7.1 Structural Layer

Multi-scale DAGs with cross-scale edges encode which macro variables influence which micro variables (top-down) and how micro variables aggregate to influence macro variables (bottom-up). Coarse-graining corresponds to moving from a finer structural description to a coarser one, while preserving causal semantics when the coarse-graining map commutes with intervention.

20.7.2 Dynamical Layer

Hierarchical SSMs provide the dynamical counterpart: scale-specific transition functions \(f^{(M)}\) and \(f^{(m)}\) with cross-scale coupling. The macro dynamics \(f^{(M)}\) depend on an aggregation of micro states; the micro dynamics \(f^{(m)}\) depend on the macro state. Timescale separation (slow/fast) reflects different rates of change at different scales.

20.7.3 Observable Layer

Measurements at different scales correspond to different observation functions \(h\). Cell-level assays observe \(Y^{(m)}\); organism-level outcomes observe \(Y^{(M)}\). The observation model \(Y_t = h(X^{(M)}_t, X^{(m)}_t) + v_t\) may mix both scales when measurements aggregate (e.g., tissue biopsy averaging over cells).
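As a small illustration of how one latent state can feed different measurement protocols, the sketch below defines three noise-free observation functions. The state values and the function names (`h_cell`, `h_organism`, `h_biopsy`) are hypothetical, and the biopsy-style readout is just one possible way of mixing scales.

```julia
using Statistics

x_M = [1.0, -0.5]        # hypothetical macro state
x_m = [0.2, 0.4, 0.9]    # hypothetical micro state (three cells)

h_cell(x_M, x_m)     = x_m                     # cell-level assay: micro states only
h_organism(x_M, x_m) = x_M                     # organism-level outcome: macro states only
h_biopsy(x_M, x_m)   = [x_M[1] + mean(x_m)]    # mixed: one macro component plus the cell average

println("Cell-level assay:       ", h_cell(x_M, x_m))
println("Organism-level outcome: ", h_organism(x_M, x_m))
println("Biopsy-style readout:   ", h_biopsy(x_M, x_m))
```

All three share the signature of \(h\) in the observation model; they differ only in which scale they expose.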

Multi-scale modelling then adds a further question: when we coarse-grain or transport causal claims, how do these layers align across scales?