2  Introduction

Status: In Progress
v0.3

2.1 Why Causal Dynamics?

This book is about modelling and reasoning in time-evolving systems using a single, explicit data-generating story: a (possibly latent) state that evolves through time, a measurement process that turns state into data (with noise and partial observability), and causal semantics that let us distinguish association, intervention, and counterfactual “what if?” analysis. The focus is on practical methods, illustrated through applications such as treatment policies in longitudinal health data, ecosystem recovery and intervention effort, thresholds in disease spread, and biological networks under perturbation.

Indeed, scientific models should answer not only “what will happen?” (prediction) but also “what will happen if we change X?” (intervention) and “what would have happened for this individual under an alternative set of conditions?” (counterfactual) (Pearl 2009; Imbens and Rubin 2015). This book brings together dynamical models, state-space inference, and graph theory under the explicit causal semantics developed by Judea Pearl (Pearl 2009) and his collaborators.

2.2 Why Causality Matters

Correlation is not causation. This maxim is familiar, but its implications run deeper than often appreciated. Understanding causality is essential because different questions require different levels of causal reasoning, and failing to distinguish between them leads to fundamental errors in scientific inference and decision-making.

2.2.1 The Three Levels of Reasoning

Pearl’s ladder of causation distinguishes three fundamentally different types of questions (Pearl 2009; Pearl and Mackenzie 2018; Bareinboim 2026):

Level 1: Association (Perceiving): “What will happen if I observe X?” This level deals with correlations, conditional probabilities, and predictive models. While essential for forecasting, association cannot answer interventional questions. Knowing that ice cream sales correlate with drowning deaths doesn’t tell us what happens if we ban ice cream.

Level 2: Intervention (Doing): “What will happen if I do X?” This requires causal structure to reason about the effects of actions. Scientific models must answer interventional questions to guide policy, treatment decisions, and system management. Without causal semantics, we cannot predict how interventions will affect system behaviour.

Level 3: Counterfactual (Imagining): “What would have happened for this specific unit if X had been different?” This strongest form of causal reasoning enables individualised treatment effects, attribution, and learning from mistakes. Counterfactual reasoning requires full structural causal models with explicit mechanisms.

2.2.2 Why Prediction Alone Is Insufficient

Statistical models that excel at prediction often fail catastrophically when used for intervention. A model trained to predict patient outcomes from observational data may learn spurious correlations (e.g., patients who receive treatment X have better outcomes) that reverse under intervention (e.g., giving treatment X to everyone may worsen outcomes due to confounding) (Pearl 2009; Pearl and Mackenzie 2018). Similarly, a machine learning model that predicts immunity to infection from vaccine-elicited gene expression data may learn correlations that fail when we experimentally manipulate those genes or the vaccine formulation, because the model hasn’t learned the causal mechanisms—it has only learned patterns that hold under the specific observational distribution (Pearl 2009; Pearl and Mackenzie 2018).

The fundamental problem: Observational associations reflect a mixture of causal effects, selection bias, and confounding. Without causal structure, we cannot separate these components or predict how interventions will change the system (Pearl 2009). This is particularly critical in biology, where understanding mechanisms—not just patterns—is essential for designing interventions, predicting responses to environmental change, and generalising knowledge across contexts.

2.2.3 Transportability and Data Fusion

Even when we have causal knowledge from one context, we face the transportability problem: can we generalise causal claims to new populations, settings, or time periods? Real-world causal inference often requires combining data from multiple sources: observational studies, experiments, different populations, or different time periods. Bareinboim and Pearl’s work shows that causal structure is essential for both transportability and data fusion (Pearl and Bareinboim 2014; Bareinboim and Pearl 2012, 2013, 2016).

Without causal semantics, we cannot determine which experimental results transport to observational settings, combine evidence from multiple studies with different designs, reason about when causal effects generalise across populations, or identify what additional data or experiments are needed to answer causal questions. This is particularly critical for complex systems where randomised controlled trials may be infeasible or unethical, requiring us to combine evidence from multiple sources (Bareinboim and Pearl 2016).

2.2.4 Why Causal Reasoning Matters for Complex Systems

Causal models provide the mathematical language to distinguish observation from intervention (via the \(do(\cdot)\) operator (Pearl 2009)), reason about mechanisms, handle confounding, enable counterfactual reasoning, and support transportability. For complex dynamical systems—ecosystems, economies, biological networks, social systems—causal reasoning is essential because:

  • Interventions are the goal: We want to know “What happens if we vaccinate earlier?” or “How does removing a keystone species affect ecosystem stability?” These are interventional questions requiring causal semantics (Pearl 2009).
  • Experiments are often impossible: Randomised controlled trials are infeasible for many complex systems, requiring us to reason causally from observational data, natural experiments, and multiple data sources (Bareinboim and Pearl 2016).
  • Systems vary across contexts: Understanding which causal mechanisms are invariant enables us to generalise knowledge across populations, settings, and time periods (Pearl and Bareinboim 2014; Bareinboim and Pearl 2012).
  • Feedback and dynamics create complexity: Time-varying confounding, feedback loops, and dynamic treatment strategies require causal methods that handle temporal structure (Robins 1986; Robins et al. 2000).

This book develops causal dynamical models, implemented in CausalDynamics.jl, that combine causal semantics with dynamical systems theory, enabling rigorous causal reasoning about complex systems that evolve through time.

2.2.5 Why Biology Demands Causal Reasoning

There is renewed interest in causality specifically in biology because biology is fundamentally the science of mechanisms—understanding how biological systems work, not merely predicting what they will do (McElreath 2020). This mechanistic focus makes causal reasoning essential: mere correlation or prediction (of the kind that machine learning provides) is insufficient for biological understanding (Pearl 2009; Pearl and Mackenzie 2018; McElreath 2020).

Biology asks “how” and “why” questions: How does a gene regulate protein expression? Why does a mutation cause disease? How do signalling pathways interact? These are questions about mechanisms—the causal processes that produce biological phenomena. Correlation tells us that two variables co-occur, but not how one causes the other. Prediction tells us what will happen, but not why it happens or what would happen if we intervened (Pearl 2009).

Mechanisms require causal structure: Understanding biological mechanisms means understanding causal relationships—which molecules activate which pathways, how feedback loops regulate homeostasis, how interventions propagate through networks. Structural causal models encode these mechanisms explicitly, enabling us to reason about how the system works, not just what patterns we observe (Pearl 2009).

Machine learning prediction is insufficient: Modern machine learning excels at finding patterns in high-dimensional biological data (gene expression, protein interactions, cellular states). But these predictive models often fail when used for intervention: a model that predicts disease from gene expression may learn spurious correlations that reverse under intervention. More fundamentally, prediction without mechanism is scientifically incomplete—it doesn’t tell us how the system works or enable us to design interventions (Pearl 2009; Pearl and Mackenzie 2018; McElreath 2020).

Biological systems are intervention targets: The goal in biology is often intervention—developing drugs, designing gene therapies, engineering metabolic pathways, manipulating ecosystems. These require answering “what happens if we do X?” questions that demand causal semantics. Correlation cannot tell us whether a drug will work; prediction cannot tell us how to design a therapeutic intervention (Pearl 2009).

Biological mechanisms generalise across contexts: Understanding which causal mechanisms are invariant enables us to generalise knowledge across species, cell types, and experimental conditions. This transportability is essential in biology, where we often study model organisms to understand human biology, or study cells in culture to understand tissue behaviour (Pearl and Bareinboim 2014; Bareinboim and Pearl 2012, 2013).

Biological systems exhibit feedback and dynamics: Biological systems are replete with feedback loops, time-varying processes, and dynamic interactions. Understanding these requires causal methods that handle temporal structure, feedback, and complex dependencies—precisely what causal dynamical models provide (Robins 1986; Robins et al. 2000).

Predicting effects of changing environments: Biology increasingly requires in silico experiments—counterfactual reasoning about how systems would behave under different environmental conditions, nutrient availability, temperature regimes, or ecological contexts. These questions demand counterfactual reasoning: “What would happen to this cell line if we changed the growth medium?” or “How would this ecosystem respond to climate change?” Causal models enable us to simulate alternative environments and predict system responses without conducting expensive or impossible real-world experiments. This capability is essential for understanding biological responses to environmental change, designing synthetic biological systems, and predicting ecosystem responses to perturbations (Pearl 2009).

The renewed interest in causality in biology reflects a recognition that understanding mechanisms requires causal reasoning, and that the predictive power of machine learning, while valuable, is insufficient for the mechanistic understanding that biology demands. Causal inference provides the mathematical language to move from “what patterns exist?” to “how does this system work?” and “what happens if we intervene?”—questions that are central to biological science (Pearl 2009; Pearl and Mackenzie 2018).

2.3 Why Time Matters

Most real-world systems evolve over time, and time fundamentally changes what causal questions we can ask and how we answer them:

Time creates feedback: Past outcomes affect future treatments, creating complex dependencies that static causal models cannot capture (Robins 1986; Robins et al. 2000).

Time enables interventions: Many causal questions only make sense in a temporal context—“What happens if we vaccinate earlier?” requires understanding the system’s evolution.

Time reveals mechanisms: Observing a system over time provides information about causal mechanisms that cross-sectional data cannot, showing how interventions propagate and how feedback loops operate. In biology, temporal data reveals how signalling pathways activate, how gene regulatory networks respond to perturbations, and how feedback loops maintain homeostasis—mechanistic understanding that static correlations cannot provide.

Time creates opportunities: Longitudinal data allows us to use temporal structure (lagged variables, natural experiments (Angrist 1990; Angrist and Pischke 2009), instrumental variables (Angrist et al. 1996; Angrist and Pischke 2009)) to identify causal effects that would be confounded in cross-sectional settings.

Time requires new methods: Standard causal inference methods fail with time-varying confounding, feedback, and dynamic treatment strategies (Robins 1986; Robins et al. 2000; Hernán and Robins 2020).

Time reveals attractor states: Complex dynamical systems tend toward attractor states, stable configurations toward which the system evolves (Strogatz 2014; Scheffer et al. 2009). Understanding how interventions affect these attractors (shifting equilibria, changing stability, or triggering regime transitions) is essential for predicting long-term behaviour, and for predicting the success of interventions (“How much effort would be required to bring about meaningful change?”).

2.3.1 Example: Vaccination and Attractor States

Achieving herd immunity requires vaccinating a critical threshold of the population (typically 70-90% depending on disease transmissibility) (Anderson and May 1992; Keeling and Rohani 2011). Below this threshold, the system remains in the “disease-endemic” attractor state. Above it, the system transitions to a “disease-free” attractor. This illustrates how interventions can shift systems between different attractor states, stable configurations toward which the system evolves. The transition depends not just on the intervention itself, but on the system’s current state and the intervention threshold needed to trigger the regime shift.
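The 70-90% range quoted above follows from the standard SIR approximation, in which the critical vaccination coverage is \(p_c = 1 - 1/R_0\), with \(R_0\) the basic reproduction number (Anderson and May 1992; Keeling and Rohani 2011). A minimal sketch in Python (illustrative only, not part of any package):

```python
def herd_immunity_threshold(r0: float) -> float:
    """Critical vaccination coverage p_c = 1 - 1/R0 (standard SIR approximation)."""
    if r0 <= 1.0:
        return 0.0  # an outbreak cannot sustain itself, so no threshold is needed
    return 1.0 - 1.0 / r0

# More transmissible diseases demand higher coverage:
for r0 in (1.5, 3.0, 10.0):
    print(f"R0 = {r0:>4}: vaccinate at least {herd_immunity_threshold(r0):.0%}")
```

With \(R_0 = 1.5\) the threshold is about 33%; with \(R_0 = 10\) it is 90%, matching the range cited above. Below the threshold, the endemic attractor persists; above it, the disease-free state becomes the attractor.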

Time enables resilience and robustness analysis: Interventions affect system resilience (ability to recover from perturbations) and robustness (ability to maintain function under varying conditions) (Ives and Carpenter 2007; Donohue et al. 2016). Causal questions about resilience require temporal analysis: “How does removing a keystone species affect ecosystem resilience?” (Paine 1969; Power et al. 1996), “How does early intervention affect recovery from later shocks?”, or “How do biological networks recover from perturbations?” Understanding these dynamics requires causal reasoning about mechanisms and temporal structure.

Time reveals intervention effort and system resistance: Complex systems exhibit varying degrees of resistance to change. Causal dynamics helps us understand intervention thresholds, controllability (can we reach desired states?) (Kalman 1960; Sontag 1998), and reachability (which states are accessible?) (Kalman 1960; Sontag 1998). This is crucial for practical intervention design.

2.3.2 Example: Conservation, Resistance, and Intervention Effort

Restoring a degraded ecosystem requires different intervention efforts depending on how far the system has deviated from its healthy attractor. A system near a tipping point may require minimal intervention to prevent regime shift, while a system that has already crossed a threshold may require substantial, sustained effort. Early intervention (when systems are still controllable and reachable) requires less effort than reversing regime shifts after they’ve occurred.

These temporal considerations—feedback, attractor states, resilience, and intervention effort—raise practical modelling questions. What do the discrete time steps in our models represent? How do latent states relate to observations? What extra assumptions are required to interpret model changes as interventions, and to define counterfactuals?

2.4 Conceptual Framework: Structural, Dynamical, Observable

To keep the book organised, we will use a simple, technical three-layer view of modelling time-evolving systems. The layers are not a claim about “what reality is”, but a way to separate (i) assumptions about causal structure, (ii) assumptions about temporal evolution, and (iii) assumptions about measurement and data.

2.4.1 Discrete Time Steps and State-Space Notation

In our causal dynamical model (CDM) framework, each discrete time step \(t\) indexes the system state and observations in a standard state-space style:

\[ \begin{aligned} X_1 &\coloneqq f_1(C, U^x_1) \quad \text{(initial state)} \\ X_{t+1} &\coloneqq f(X_t, A_t, C, U^x_{t+1}) \quad \text{for } t \geq 1 \text{ (state transition)} \\ Y_t &\coloneqq h(X_t, C, U^y_t) \quad \text{(measurement model)} \end{aligned} \]

Here \(X_t\) is a (possibly latent) system state, \(Y_t\) is what we observe, \(A_t\) is an action/intervention variable when relevant, \(C\) is context, and \(U^x_t, U^y_t\) are exogenous variables capturing unmodelled variability and measurement noise.
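These three equations translate into a generative simulator almost line by line. The sketch below is plain Python rather than the CausalDynamics.jl package the book develops; the linear forms chosen for \(f\) and \(h\), the action schedule, and all coefficients are assumptions made purely for demonstration:

```python
import random

def simulate_ssm(T, a=0.9, b=0.5, c=0.2, seed=0):
    """Simulate the state-space model above with illustrative linear mechanisms:
         f(x, a_t, c, u) = a*x + b*a_t + u    (state transition)
         h(x, c, u)      = x + c + u          (measurement model)
    Here c plays the role of a fixed context C; a, b are assumed coefficients.
    """
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0)                      # X_1 := f_1(C, U^x_1)
    xs, ys = [], []
    for t in range(1, T + 1):
        ys.append(x + c + rng.gauss(0.0, 0.1))   # Y_t := h(X_t, C, U^y_t)
        xs.append(x)
        a_t = 1.0 if t > T // 2 else 0.0         # a simple action schedule A_t
        x = a * x + b * a_t + rng.gauss(0.0, 0.1)  # X_{t+1} := f(X_t, A_t, C, U^x_{t+1})
    return xs, ys

xs, ys = simulate_ssm(20)
```

Because the exogenous draws are seeded, the trajectory is reproducible, which is exactly the property counterfactual replay will exploit later in the chapter.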

2.4.2 Three Worlds: Structural, Dynamical, Observable

We organise the technical content into three layers:

  • Structural: the causal structure and invariances you assume (graphs, mechanisms, constraints, identification conditions).
  • Dynamical: the time-evolution model (ODEs/SDEs/state transitions, feedback, stability, attractors).
  • Observable: the data/measurement layer (what you observe, how it is generated from the latent state, and how you estimate/validate models).

2.4.3 The Concentric Structure of the Three Worlds

In practice, the layers are nested in the sense that observation models depend on dynamical states, and dynamical models depend on structural assumptions. The separation is mainly bookkeeping: it helps us keep track of which assumptions are doing which work.

This progression is captured mathematically by the observation model:

\[ Y_t = h(X_t, C, U^y_t) \]

The latent state \(X_t\) sits “inside” the observation model, and \(h\) specifies how \(X_t\) generates \(Y_t\) (up to measurement noise).

2.4.4 Why the Three Worlds Framework Matters for Reasoning and Modelling

This three-layer view helps with reasoning and modelling:

2.4.4.1 1. Clarifies the Nature of Latent States

Without the three-layer view: Latent states are “unobserved variables” (a purely negative definition). With the three-layer view: latent states are the internal variables the dynamical model evolves and the observation model maps into data. This clarifies why we infer them (to connect dynamics to data) and why they are only partially identifiable (different latent models can imply similar observables without additional structure).

2.4.4.2 2. Makes Sense of Counterfactuals

Without the three-layer view: Counterfactuals are “alternative outcomes” that require shared exogenous variables, but the role of that sharing can feel ad hoc. With the three-layer view: counterfactuals are defined by holding fixed a unit’s exogenous variables \(\mathbf{u}\) (what is not modelled) while changing an action/condition and propagating that change through the same structural and dynamical mechanisms.

2.4.4.3 3. Explains Why Interventions Work

Without the three-layer view: Interventions are “parameter changes” or “graph surgeries”, which can seem abstract. With the three-layer view: interventions are changes to the structural/dynamical mechanisms (or the inputs \(A_t\)) whose consequences are then pushed forward through the state transition and measurement model. This keeps clear what is changed (mechanism/input) and what is held fixed (the rest of the model).

2.4.4.4 4. Unifies Different Types of Reasoning and Mathematical Approaches

The three-layer structure unifies the three levels of reasoning in Pearl’s causal hierarchy. These levels complement the three modelling layers: while the layers organise modelling assumptions, the levels describe how we reason with those assumptions:

  • Level 1 (Association): Observing the Observable world—what actually manifested
  • Level 2 (Intervention): Modifying the model’s generating process via \(do(\cdot)\) and predicting consequences
  • Level 3 (Counterfactual): Replaying the model for a specific unit (fixed \(\mathbf{u}\)) under a different action/condition

All three levels are unified by the same ingredients: structural assumptions (what causes what), a dynamical model (how the system evolves), and an observation model (how data are generated).

Mathematical approaches across worlds: Just as the three levels of reasoning correspond to different modes of inference, each of the three worlds entails different (though overlapping) mathematical approaches:

  • Structural: Graph theory, identification, do-calculus, transportability—tools for causal structure and invariances
  • Dynamical: ODEs, SDEs, regime switching, network dynamics—tools for time-dependent processes
  • Observable: G-methods, marginal structural models (MSMs), targeted learning (TMLE), longitudinal causal inference—tools for actualised observations

These mathematical approaches overlap and build upon each other because practical models typically involve all three layers at once.

2.4.4.5 5. Enables Multi-Scale Reasoning

The three-layer structure applies at many scales (molecular to ecological) because, at any scale, we can separate causal assumptions, temporal evolution, and measurement.

2.4.4.6 6. Provides a Framework for Model Criticism

The three-layer perspective also helps with model criticism: are the structural assumptions plausible, is the dynamical model adequate, and does the measurement model match the data collection process?

2.4.5 Latent States Across Worlds

Latent states \(X_t\) are internal variables in a state-space model that we infer from data via filtering/smoothing (Durbin and Koopman 2012; Särkkä 2013). The key idea is simply that we often infer a dynamical state that is not directly observed, because doing so yields better prediction, clearer mechanism, and a basis for intervention/counterfactual simulation.

2.4.6 Causal Dependencies and Graph Structure

Causal dependencies are captured by structural equations:

\[ X_{t+1} \coloneqq f(X_t, A_t, C, U^x_{t+1}) \]

This says the next state depends on the current state, the action/input, context, and exogenous noise. In the baseline formulation we take a Markov view: the next state depends only on the current state (plus action and noise). In principle, dependence can extend to multiple past time steps; we return to that generalisation—and its connection to sequence models and delay dynamics—in Beyond the Markovian baseline: history and attention.

In causal-dynamical models, this selectivity (each variable depending directly on only a few others) appears in variable selection, Markov boundaries (Pearl 1988, 2014; Tsamardinos et al. 2003), and network communities (Newman 2018; Barabási 2016). It highlights why sparsity and modularity are often modelling assumptions, not just computational conveniences.

2.4.7 Graph Structure

The mathematical representation of causal dependencies is graph structure: specifically, directed acyclic graphs (DAGs) in structural causal models (SCMs). In an SCM, we have \(\mathcal{M} = (G, U, F, P(U))\) where \(G = (V, E)\) is a DAG representing causal structure: vertices are variables and directed edges encode parent–child relationships in the structural equations.

DAGs are natural for many causal problems because causal influence is directional, and parent sets \(\text{Pa}(X_i)\) represent which variables directly enter the mechanism for \(X_i\). Different graph structures correspond to different patterns: chains, forks (common causes), colliders (convergence), and modular clusters.
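One of these patterns, the fork, can be checked numerically: a common cause induces association between its children that vanishes once we condition on that cause. The toy mechanisms below (a binary cause with Gaussian child noise) are assumptions for illustration only:

```python
import random

rng = random.Random(0)

def corr(xs, ys):
    """Pearson correlation of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Fork C -> X, C -> Y: C is a common cause of X and Y.
data = []
for _ in range(4000):
    c = rng.choice([0.0, 1.0])
    x = c + rng.gauss(0.0, 0.3)
    y = c + rng.gauss(0.0, 0.3)
    data.append((c, x, y))

xs, ys = [d[1] for d in data], [d[2] for d in data]
print("marginal corr(X, Y):", round(corr(xs, ys), 2))             # strongly positive

stratum = [(x, y) for c, x, y in data if c == 1.0]
xs1, ys1 = zip(*stratum)
print("corr(X, Y | C = 1):", round(corr(list(xs1), list(ys1)), 2))  # near zero
```

This is d-separation in miniature: conditioning on the fork’s vertex blocks the path between its children, whereas (as noted above) conditioning on a collider would do the opposite.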

Graph structure applies across scales because causal dependencies exist across scales. Graph interventions (node/edge surgeries) modify causal assumptions and change the implied interventional distributions. Graph theory provides the mathematical foundation for reasoning (d-separation, backdoor criteria) and connects to dynamics because the graph constrains which variables can influence which others in \(F\) (Pearl 2009; Spirtes et al. 2000).

2.4.8 Sparse Matrices: The Computational Bridge

The connection between graph theory, linear algebra, and structural equations becomes concrete through sparse matrices. Every directed graph \(G\) can be represented as an adjacency matrix \(A\), where \(A_{ij} = 1\) indicates an edge \(j \to i\).

In most real systems, each variable depends on only a small subset of others. This sparsity means the adjacency matrix is sparse—most entries are zero. Sparse matrix representations store only non-zero entries, making computation feasible for large systems.

When structural equations are linear, they can be written in matrix form:

\[ \mathbf{X}_{t+1} = F \mathbf{X}_t + G \mathbf{A}_t + \mathbf{U}_{t+1} \]

where \(F\) is the state-transition matrix and \(G\) maps actions into the state. Sparsity in dependencies often means \(F\) is sparse.

The Three-Way Connection: Graph theory provides structure, linear algebra provides computation, and sparse matrices provide the bridge (efficient representation and computation).
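This three-way connection can be made concrete in a few lines using scipy.sparse; the three-variable chain and the coefficients below are illustrative assumptions, not a model from the text:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Adjacency matrix for the chain X1 -> X2 -> X3, with A[i, j] = 1 iff j -> i
# (the edge convention used above). Only non-zero entries are stored.
A = csr_matrix(np.array([[0, 0, 0],
                         [1, 0, 0],
                         [0, 1, 0]]))
print(A.nnz, "of", A.shape[0] * A.shape[1], "entries are non-zero")

# A sparse linear transition X_{t+1} = F X_t: self-dependence plus coupling
# along the graph's edges, so the cost of each step scales with the number
# of edges rather than n^2.
F = 0.9 * csr_matrix(np.eye(3)) + 0.5 * A
x = np.array([1.0, 0.0, 0.0])
for _ in range(3):
    x = F @ x
print(np.round(x, 3))
```

The graph fixes which entries of \(F\) may be non-zero; linear algebra does the propagation; the sparse format makes both cheap at scale.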

2.4.9 State Transitions, Interventions, and Counterfactuals

State transitions are formalised through:

\[ X_{t+1} = f(X_t, A_t, C, U^x_{t+1}) \]

This transition captures how the next state depends on the current state, the action/input, context, and exogenous noise \(U^x_{t+1}\).

Interventions \(do(A_t=a)\) and structural modifications (changing mechanisms \(f\) or parameters \(\theta\)) make the causal semantics explicit: they define a modified data-generating process whose consequences we can predict.

Counterfactuals explore unit-level alternatives under different conditions:

\[ Y^{do(A = a)}(\mathbf{u}) \quad \text{vs} \quad Y^{do(A = 0)}(\mathbf{u}) \]

The counterfactual \(Y^{do(A = a)}(\mathbf{u})\) represents what would have happened for the same unit (same \(\mathbf{u}\)) under different conditions.
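In code, this definition amounts to sampling a unit’s exogenous variables once and replaying the same mechanism under both actions. The linear mechanism and the effect size below are hypothetical, chosen only to make the contrast visible:

```python
import random

def outcome(a, u):
    """Structural mechanism Y := f(A, U); the linear form and effect size 2.0 are assumed."""
    return 2.0 * a + u

rng = random.Random(1)
u = rng.gauss(0.0, 1.0)        # the unit's exogenous background, held fixed

y_do_a = outcome(a=1.0, u=u)   # Y^{do(A = 1)}(u)
y_do_0 = outcome(a=0.0, u=u)   # Y^{do(A = 0)}(u)
print(y_do_a - y_do_0)         # the unit-level causal effect
```

Because both evaluations share the same \(u\), the difference isolates the unit-level effect (here 2.0); averaging over fresh draws of \(u\) would instead give the population-level interventional contrast of Level 2.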

2.4.9.1 Why Counterfactuals Require Inner Worlds

Counterfactuals require reasoning about what could have been for a specific unit (fixed \(\mathbf{u}\)) under alternative conditions. When we infer \(P(X_t \mid Y_{1:t})\) using state-space inference (Durbin and Koopman 2012; Särkkä 2013), we infer latent states from data; counterfactual reasoning then replays the same mechanisms with a different action/condition while holding fixed the unit’s exogenous variables \(\mathbf{u}\) (Pearl 2009).

2.4.10 Terminology Note

The book sticks to standard language from causal inference, dynamical systems, and state-space modelling. When ideas can be stated plainly (state transitions, causal dependencies, exogenous noise), we do so.

2.5 The Gap We Address and Our Approach

Two communities have developed powerful but largely separate toolkits for understanding complex systems. Causal inference researchers excel at answering interventional and counterfactual questions (Pearl 2009; Imbens and Rubin 2015) but often work in static or cross-sectional settings, missing the temporal structure, feedback, and attractor dynamics that characterise complex systems. Dynamical systems modellers excel at prediction and understanding mechanisms, equilibria, and long-term behaviour (Strogatz 2014; Hirsch et al. 2012) but often lack explicit causal semantics for interventions and counterfactuals.

Real-world systems are both dynamical (evolving over time with feedback, equilibria, and attractor states) and require causal reasoning (answering interventional and counterfactual questions). We need frameworks that preserve mechanistic structure while adding causal semantics (Pearl 2009), handle time-varying confounding and feedback (Robins 1986; Robins et al. 2000; Laan and Rubin 2006), enable reasoning about attractor states and resilience (Strogatz 2014; Ives and Carpenter 2007), and support forecasting, interventional simulation, and counterfactual reasoning from a single model.

2.5.1 Example Questions That Bridge Both Worlds

  • “What happens if we vaccinate earlier?” — requires both causal semantics (intervention) and temporal structure (when matters)
  • “How would ecosystem recovery change if we remove a predator?” — requires understanding both intervention effects and system dynamics toward attractor states
  • “What would patient outcomes have been under a different treatment protocol?” — requires both counterfactual reasoning and temporal structure
  • “How much vaccination coverage is needed to achieve herd immunity?” — requires understanding intervention thresholds and attractor transitions
  • “Can we restore a degraded ecosystem, and how much effort would it take?” — requires reasoning about system resistance, controllability, and reachability

Causal dynamical models (CDMs), introduced below, provide a unified framework that addresses this gap, enabling causal reasoning about complex dynamical systems.

2.5.2 Our Approach: Causal Dynamical Models

We introduce Causal Dynamical Models (CDMs) as a unified framework that bridges causal inference and dynamical systems modelling. A CDM combines:

  1. Mechanistic process models: ODEs, SDEs, and network dynamics define the latent state evolution (Strogatz 2014; Øksendal 2013; Newman 2018)
  2. Explicit causal semantics: Structural intervention operators (\(do(\cdot)\)) and counterfactual reasoning with shared exogenous noise (Pearl 2009)
  3. State-space inference: Methods for learning from partial, noisy observations (filtering, smoothing, parameter learning) (Durbin and Koopman 2012; Särkkä 2013)
  4. Graph structure: When applicable, directed graphs encode causal dependencies and determine identifiability (Pearl 2009; Spirtes et al. 2000)
  5. Multiple query types: One CDM supports forecasting, interventional simulation, and counterfactual reasoning

2.5.3 A First Example: Protein Concentration

To illustrate the framework, consider a simple example: modelling changes in protein concentration over time under drug treatment. This biological process involves a true underlying state \(X_t\) (the actual protein concentration) that we cannot directly observe, and noisy measurements \(Y_t\) that we can observe. In the Causal Dynamical Model (CDM) framework, we distinguish between endogenous variables (determined by the model’s structural equations) and exogenous variables (external inputs, including noise). We also include a time-varying confounder \(C_t\) (e.g., disease severity): \(C_{t-1}\) affects both the intervention \(A_{t-1}\) and the outcome \(X_t\), creating a spurious association that must be adjusted for to identify the true causal effect. The structure of this CDM can be visualised as follows:

Figure 2.1: Causal Dynamical Model (CDM) structure showing endogenous and exogenous variables

Diagram notation:

  • \(f\): State transition or evolution function (e.g., \(f: A \cdot X_{t-1}\) means the state transition function uses \(A \cdot X_{t-1}\))
  • \(\pi\): Policy or intervention assignment function (e.g., \(\pi: \gamma \cdot C_{t-1}\) means the treatment policy assigns treatment based on \(\gamma \cdot C_{t-1}\))
  • \(h\): Observation or measurement function (e.g., \(h: X_t\) means the observation is generated from \(X_t\))

Key components:

  • Endogenous variables (blue/orange/green/yellow): \(X_t\) (latent state), \(Y_t\) (observation), \(A_t\) (action/intervention), \(C_t\) (confounder) — determined by structural equations
  • Exogenous variables (pink): \(U^x_t\) (process noise), \(U^y_t\) (measurement noise), \(U^c_t\) (confounder noise) — external inputs representing uncertainty and variability
  • Confounding: \(C_{t-1}\) creates a backdoor path \(A_{t-1} \leftarrow C_{t-1} \rightarrow X_t\), requiring adjustment for \(C_{t-1}\) to identify the causal effect of \(A_{t-1}\) on \(X_t\). Note that \(A_{t-1}\) also affects \(C_t\) (treatment reduces disease severity), but this does not create confounding for the effect of \(A_{t-1}\) on \(X_t\) since \(C_t\) does not directly affect \(X_t\) (only \(C_{t-1}\) affects \(X_t\))

Structural equations:

  • Confounder (endogenous): \(C_t \coloneqq \rho \cdot C_{t-1} + \alpha \cdot A_{t-1} + U^c_t\) where \(U^c_t \sim \text{Gamma}(\alpha_c, \theta_c)\) (strictly positive, so the noise alone never drives severity negative) — e.g., disease severity that evolves over time and is affected by treatment
  • Intervention (endogenous, confounded): \(A_{t-1} \coloneqq \gamma \cdot C_{t-1} + U^a_{t-1}\) and \(A_t \coloneqq \gamma \cdot C_t + U^a_t\) — treatment decisions depend on disease severity at the same time point (confounding)
  • State transition (endogenous): \(X_t \coloneqq A \cdot X_{t-1} + B \cdot A_{t-1} + D \cdot C_{t-1} + U^x_t\) where \(U^x_t \sim \mathcal{N}(0, \sigma_w^2)\) — state depends on previous state, previous treatment, and previous confounder
  • Observation (endogenous): \(Y_t \coloneqq X_t + U^y_t\) where \(U^y_t \sim \mathcal{N}(0, \sigma_v^2)\)
  • Intervention via do-operator: \(do(A_{t-1} = a_{t-1})\) sets \(A_{t-1}\) to \(a_{t-1}\), breaking the confounder link

This CDM explicitly separates endogenous variables (determined by structural equations: \(X_t\), \(Y_t\), \(A_t\), \(C_t\)) from exogenous variables (external inputs: \(U^x_t\), \(U^y_t\), \(U^c_t\), \(U^a_t\)). The structural equations show how endogenous variables are determined: the confounder \(C_t\) (disease severity) evolves over time, the intervention \(A_{t-1}\) depends on \(C_{t-1}\) (creating confounding: treatment decisions are based on observed disease severity), the state \(X_t\) depends on the previous state \(X_{t-1}\), previous treatment \(A_{t-1}\), and previous confounder \(C_{t-1}\), and the observation \(Y_t\) is generated from the state plus measurement noise. The exogenous variables represent uncertainty and variability that are external to the system’s causal structure.

The confounder creates confounding through \(C_{t-1} \rightarrow A_{t-1}\) and \(C_{t-1} \rightarrow X_t\), meaning that naive comparisons of outcomes by treatment level will be biased. To identify the true causal effect of \(A_{t-1}\) on \(X_t\), we must adjust for \(C_{t-1}\) (e.g., by stratification or regression). The intervention \(A_{t-1}\) can be set exogenously via \(do(A_{t-1} = a_{t-1})\), which breaks the confounder link and allows us to estimate the true causal effect.

The simulation results are shown in Figure 2.2. This example demonstrates the core idea: we model systems that evolve through time using structural causal equations, where we can intervene (drug treatment), observe noisy measurements, adjust for confounders, and reason about what would happen under different interventions or what might have been different for a specific case (counterfactuals).

include("scripts/ensure_packages.jl")

@auto_using Distributions Statistics StableRNGs CairoMakie

# Set random seed for reproducibility
rng = StableRNG(34)

# Causal Dynamical Model (CDM) of protein concentration over time
# Endogenous: X_t (latent state), Y_t (observation), A_t (intervention), C_t (confounder)
# Exogenous: U^x_t (process noise), U^y_t (measurement noise), U^c_t (confounder noise), U^a_t (intervention noise)

"""
    simulate_cdm(x₀, c₀, A, B, D, ρ, α, γ, σ_w, σ_v, σ_c, σ_a, T; do_intervention=nothing, μ_c=0.1, rng=StableRNG(34))

Simulate a Causal Dynamical Model (CDM) for protein concentration over time with confounder.

# Arguments

- `x₀`, `c₀`: Initial state and confounder
- `A`, `B`, `D`: State transition parameters (decay, treatment effect, confounder effect)
- `ρ`, `α`, `γ`: Confounder dynamics (persistence, treatment effect on confounder, confounding strength)
- `σ_w`, `σ_v`, `σ_c`, `σ_a`: Noise standard deviations (`σ_c`, together with `μ_c`, parameterises the Gamma confounder noise)
- `T`: Number of time steps
- `do_intervention`: Optional intervention sequence. If provided, breaks confounder link.
- `μ_c`: Mean of confounder noise (default 0.1). Used with `σ_c` to parameterise the Gamma distribution.
- `rng`: Random number generator (default `StableRNG(34)`) for reproducibility.

# Returns

- `x`, `y`, `a`, `c`: Endogenous variable sequences (state, observation, intervention, confounder)
- `u_x`, `u_y`, `u_c`, `u_a`: Exogenous noise sequences

# Structural Equations

    C_t := ρ·C_{t-1} + α·A_{t-1} + U^c_t   (U^c_t ~ Gamma, strictly positive)
    A_t := γ·C_t + U^a_t                   (or do(A_t = a_t) if intervention provided)
    X_t := A·X_{t-1} + B·A_{t-1} + D·C_{t-1} + U^x_t
    Y_t := X_t + U^y_t
"""
function simulate_cdm(x₀, c₀, A, B, D, ρ, α, γ, σ_w, σ_v, σ_c, σ_a, T; do_intervention=nothing, μ_c=0.1, rng=StableRNG(34))
    x = zeros(T)  # X_t: latent state
    y = zeros(T)  # Y_t: observation
    a = zeros(T)  # A_t: intervention
    c = zeros(T)  # C_t: confounder
    u_x = zeros(T)  # U^x_t: process noise
    u_y = zeros(T)  # U^y_t: measurement noise
    u_c = zeros(T)  # U^c_t: confounder noise
    u_a = zeros(T)  # U^a_t: intervention noise

    x[1] = max(0.0, x₀)  # Ensure initial state is non-negative
    c[1] = max(0.0, c₀)  # Ensure initial confounder is non-negative

    # Parameterise Gamma distribution for confounder noise: U^c_t ~ Gamma(α_c, θ_c)
    # Mean = α_c * θ_c = μ_c, Variance = α_c * θ_c² = σ_c²
    # Therefore: θ_c = σ_c²/μ_c, α_c = μ_c²/σ_c²
    α_c = μ_c^2 / (σ_c^2)
    θ_c = σ_c^2 / μ_c

    # First time step: A_1 := γ·C_1 + U^a_1
    if do_intervention !== nothing
        a[1] = do_intervention[1]
    else
        u_a[1] = rand(rng, Distributions.Normal(0, σ_a))
        a[1] = γ * c[1] + u_a[1]
    end
    u_y[1] = rand(rng, Distributions.Normal(0, σ_v))
    y[1] = max(0.0, x[1] + u_y[1])  # Ensure observation is non-negative

    for t in 2:T
        # C_t := ρ·C_{t-1} + α·A_{t-1} + U^c_t
        # U^c_t ~ Gamma(α_c, θ_c): strictly positive noise for disease severity
        u_c[t] = rand(rng, Distributions.Gamma(α_c, θ_c))
        # Ensure non-negativity: disease severity cannot be negative
        # (even with positive noise, α·A_{t-1} can be negative if treatment reduces severity)
        c[t] = max(0.0, ρ * c[t-1] + α * a[t-1] + u_c[t])

        # A_t := γ·C_t + U^a_t  (or do(A_t = a_t) if intervention)
        if do_intervention !== nothing
            a[t] = do_intervention[t]
        else
            u_a[t] = rand(rng, Distributions.Normal(0, σ_a))
            a[t] = γ * c[t] + u_a[t]
        end

        # X_t := A·X_{t-1} + B·A_{t-1} + D·C_{t-1} + U^x_t
        # Ensure non-negativity: protein concentrations cannot be negative
        u_x[t] = rand(rng, Distributions.Normal(0, σ_w))
        x[t] = max(0.0, A * x[t-1] + B * a[t-1] + D * c[t-1] + u_x[t])

        # Y_t := X_t + U^y_t
        # Ensure non-negativity: observed concentrations cannot be negative
        u_y[t] = rand(rng, Distributions.Normal(0, σ_v))
        y[t] = max(0.0, x[t] + u_y[t])
    end

    return x, y, a, c, u_x, u_y, u_c, u_a
end

# Example: Modelling protein expression under drug treatment with confounder
# This demonstrates the CDM structure with explicit endogenous and exogenous variables
# and shows how to adjust for confounding
#
# Structural parameters:
#   A = 0.9: protein decays at 10% per time step (90% remains)
#   B = 0.5: drug treatment increases protein concentration by 0.5 units per unit dose (causal effect)
#   D = 0.3: disease severity directly affects protein concentration (confounder effect)
#   ρ = 0.8: disease severity persistence over time
#   γ = 0.6: disease severity affects treatment decisions (confounding)
#
# Exogenous noise distributions:
#   U^x_t ~ N(0, σ_w²) where σ_w = 0.316 (σ_w² = 0.1, biological variability)
#   U^y_t ~ N(0, σ_v²) where σ_v = 0.447 (σ_v² = 0.2, experimental error)
#   U^c_t ~ Gamma(α_c, θ_c) where mean = μ_c = 0.1, variance = σ_c² = 0.05 (strictly positive, disease severity variability)
#   U^a_t ~ N(0, σ_a²) where σ_a = 0.316 (σ_a² = 0.1, treatment decision variability)

A = 0.9  # State transition parameter (decay rate)
B = 0.5  # Intervention effect strength (true causal effect of A_{t-1} on X_t)
D = 0.3  # Confounder effect strength (direct effect of C_{t-1} on X_t)
ρ = 0.8  # Confounder persistence
α = -0.2  # Treatment effect on confounder (effect of A_{t-1} on C_t, negative means treatment reduces disease severity)
γ = 0.6  # Confounding strength (how C_{t-1} affects A_{t-1})
T = 100  # Number of time steps
σ_w = sqrt(0.1)  # Exogenous process noise standard deviation
σ_v = sqrt(0.2)  # Exogenous measurement noise standard deviation
σ_c = sqrt(0.05)  # Exogenous confounder noise standard deviation
σ_a = sqrt(0.1)  # Exogenous intervention noise standard deviation

# Simulate with confounding (natural assignment: A_{t-1} depends on C_{t-1})
x_conf, y_conf, a_conf, c_conf, u_x, u_y, u_c, u_a = simulate_cdm(
    1.0, 0.5, A, B, D, ρ, α, γ, σ_w, σ_v, σ_c, σ_a, T
)

# Simulate with intervention (do-operator breaks confounder link)
a_intervention = ones(T)  # Set via do(A_{t-1} = 1) - breaks confounder link
x_interv, y_interv, a_interv, c_interv, _, _, _, _ = simulate_cdm(
    1.0, 0.5, A, B, D, ρ, α, γ, σ_w, σ_v, σ_c, σ_a, T; do_intervention=a_intervention
)

# Simulate control intervention (do(A_{t-1} = 0)) for proper comparison
a_control = zeros(T)  # Set via do(A_{t-1} = 0) - breaks confounder link
x_control, y_control, a_ctrl, c_ctrl, _, _, _, _ = simulate_cdm(
    1.0, 0.5, A, B, D, ρ, α, γ, σ_w, σ_v, σ_c, σ_a, T; do_intervention=a_control
)

# Naive estimation (without adjusting for confounder) - BIASED
# This estimates association, not causation, because of confounding
a_high_mask = a_conf .> 0.5
a_low_mask = a_conf .<= 0.5
if sum(a_high_mask) > 0 && sum(a_low_mask) > 0
    naive_effect = mean(y_conf[a_high_mask]) - mean(y_conf[a_low_mask])
else
    naive_effect = NaN
end

# Adjusted estimation (conditioning on confounder) - UNBIASED
# Adjust by stratifying on confounder levels
c_high = c_conf .> median(c_conf)
c_low = c_conf .<= median(c_conf)
c_high_a_high = c_high .& a_high_mask
c_high_a_low = c_high .& a_low_mask
c_low_a_high = c_low .& a_high_mask
c_low_a_low = c_low .& a_low_mask

if sum(c_high_a_high) > 0 && sum(c_high_a_low) > 0
    adjusted_effect_high = mean(y_conf[c_high_a_high]) - mean(y_conf[c_high_a_low])
else
    adjusted_effect_high = NaN
end

if sum(c_low_a_high) > 0 && sum(c_low_a_low) > 0
    adjusted_effect_low = mean(y_conf[c_low_a_high]) - mean(y_conf[c_low_a_low])
else
    adjusted_effect_low = NaN
end

if !isnan(adjusted_effect_high) && !isnan(adjusted_effect_low)
    adjusted_effect = (adjusted_effect_high + adjusted_effect_low) / 2
else
    adjusted_effect = NaN
end

# True effect of sustained treatment (from intervention simulations)
# Compare the intervention group (do(A_t = 1) for all t) to the control group
# (do(A_t = 0) for all t). Note: this contrast is the cumulative (steady-state)
# effect of sustained treatment on the outcome, not the one-step coefficient B.
true_effect = mean(y_interv) - mean(y_control)

# Display results
println("CDM simulation with confounder complete!")
println("\nEndogenous variables (confounded simulation):")
println("  Final true protein concentration X[$(T)]: ", round(x_conf[end], digits=3))
println("  Final measured concentration Y[$(T)]: ", round(y_conf[end], digits=3))
println("  Final disease severity C[$(T)]: ", round(c_conf[end], digits=3))
println("  Final treatment A[$(T)]: ", round(a_conf[end], digits=3))
println("\nCausal effect estimation:")
println("  True causal effect (B parameter): ", round(B, digits=3))
println("  Naive estimate (biased by confounding): ", round(naive_effect, digits=3))
println("  Adjusted estimate (conditioning on C_t): ", round(adjusted_effect, digits=3))
println("  True effect from intervention: ", round(true_effect, digits=3))
println("\nNote: The naive estimate is biased because the confounder affects both treatment and outcome.")
println("Adjusting for the confounder by stratification moves the estimate toward the true causal effect.")
CDM simulation with confounder complete!

Endogenous variables (confounded simulation):
  Final true protein concentration X[100]: 1.732
  Final measured concentration Y[100]: 1.787
  Final disease severity C[100]: 0.689
  Final treatment A[100]: 0.351

Causal effect estimation:
  True causal effect (B parameter): 0.5
  Naive estimate (biased by confounding): 0.1
  Adjusted estimate (conditioning on C_t): 0.303
  True effect from intervention: 3.352

Note: The naive estimate is biased because the confounder affects both treatment and outcome.
Adjusting for the confounder by stratification moves the estimate toward the true causal effect.
Figure 2.2: Protein concentration over time: confounded vs interventional simulations showing the effect of adjusting for confounders

2.5.4 Three Modes of Reasoning

CDMs enable three distinct modes of causal reasoning, corresponding to Pearl’s causal hierarchy (Pearl 2009; Bareinboim and Pearl 2016). These three levels of reasoning complement the three modelling layers (Structural, Dynamical, Observable): the layers organise assumptions in the model, while the levels distinguish the questions we can answer.

  • Level 1 (Forecasting): “What will happen?” — Conditional predictions from observational data
  • Level 2 (Intervention): “What will happen if we do X?” — Interventional simulations under \(do(\cdot)\)
  • Level 3 (Counterfactual): “What would have happened for this unit if X had been different?” — Unit-level counterfactual reasoning with shared exogenous noise
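
The three modes can be illustrated with a deliberately minimal one-step model (a toy sketch, not the full CDM of this chapter; all coefficients are arbitrary illustrations):

```julia
using Random, Statistics

rng = MersenneTwister(1)
n = 100_000

# Toy SCM: C ~ N(0,1);  A := 0.8·C + U_a;  X := 0.5·A + 0.3·C + U_x
C  = randn(rng, n)
Ua = randn(rng, n)
Ux = randn(rng, n)
A  = 0.8 .* C .+ Ua
X  = 0.5 .* A .+ 0.3 .* C .+ Ux

# L1 (association): E[X | A ≈ 1] mixes the causal effect with confounding by C
l1 = mean(X[abs.(A .- 1) .< 0.1])

# L2 (intervention): do(A = 1) severs C -> A; regenerate X with A held fixed
l2 = mean(0.5 .* 1.0 .+ 0.3 .* C .+ Ux)

# L3 (counterfactual): for unit i, reuse its realised noise (C[i], Ux[i]) and
# ask what X would have been had A been 0 instead of its factual value
i = 1
x_cf = 0.5 * 0.0 + 0.3 * C[i] + Ux[i]

println((L1 = round(l1, digits=2), L2 = round(l2, digits=2), L3_unit1 = round(x_cf, digits=2)))
```

Here the L1 estimate exceeds the L2 estimate (which is close to the true coefficient 0.5) precisely because conditioning on a high value of \(A\) also selects units with high \(C\).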

2.5.5 What CDMs Unify

CDMs bring together concepts from across the book; each of the three layers entails different (though overlapping) mathematical approaches.

Just as the three levels distinguish modes of reasoning, the three layers distinguish kinds of modelling assumptions. In practice, the methods overlap and are often used together.

CDMs enable us to answer causal questions about complex dynamical systems—intervention effects, system resilience, intervention effort, and long-term behaviour—while maintaining the mechanistic structure that makes models interpretable and actionable. They provide a common language for causal reasoning across domains (epidemiology (Anderson and May 1992; Keeling and Rohani 2011), ecology (Ives and Carpenter 2007; Donohue et al. 2016), systems biology) while respecting the temporal structure that makes these systems complex.

The book is structured around three modelling layers (Structural, Dynamical, Observable) and three levels of reasoning (association/forecasting, intervention, counterfactual) (Pearl 2009; Bareinboim and Pearl 2016).

2.6 A Unified View: Structure, Dynamics, and Learning/Decision-Making

So far we have introduced two main ingredients: Pearl’s causal hierarchy and structural models (Pearl 2009; Pearl and Mackenzie 2018), and dynamical systems and attractors (Strogatz 2014; Hirsch et al. 2012). This section makes their relationship explicit and sets out how the rest of the book uses them together, including a brief, optional connection to the Free Energy Principle (FEP) and active inference as one family of tools for modelling goal-directed behaviour under uncertainty (Friston et al. 2006; Friston 2010, 2013).

2.6.1 From Abstract Structure to Executable Models

The mathematical structures we use in this book—graphs, matrices, probability models, and state-space constructions—do not describe any particular system on their own; they specify a space of possible model classes.

When we instantiate these as structural causal models, dynamical systems, or statistical estimators, we obtain executable mechanisms: models that can be simulated, fitted, and interrogated.

Note

Mathematical objects (graphs, matrices, probability models) provide reusable structure. A Causal Dynamical Model instantiates these structures as a concrete model that can be simulated, fitted, and interrogated.

2.6.2 Structural Causality: What Depends on What

Pearl’s structural causal models, graphs, d-separation, and Markov properties provide our primary language for which parts of the past matter for which aspects of the present (Pearl 2009, 1988; Spirtes et al. 2000). A Markov boundary or blanket is the minimal set of variables needed to render a target conditionally independent of the rest (Pearl 1988, 2014; Tsamardinos et al. 2003).
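
A small sketch of this idea for DAGs (a hypothetical helper, not code from the book’s library): the Markov blanket of a node is its parents, its children, and the co-parents of its children, which can be read directly off an adjacency matrix.

```julia
# Markov blanket of node v in a DAG: parents ∪ children ∪ co-parents (spouses).
# adj[i, j] == 1 encodes a directed edge i -> j.
function markov_blanket(adj::AbstractMatrix{<:Integer}, v::Int)
    n = size(adj, 1)
    parents  = [i for i in 1:n if adj[i, v] == 1]
    children = [j for j in 1:n if adj[v, j] == 1]
    spouses  = [i for j in children for i in 1:n if adj[i, j] == 1 && i != v]
    return sort(unique(vcat(parents, children, spouses)))
end

# Example DAG: 1 -> 3, 2 -> 3, 3 -> 4, 5 -> 4
adj = zeros(Int, 5, 5)
adj[1, 3] = adj[2, 3] = adj[3, 4] = adj[5, 4] = 1

markov_blanket(adj, 3)   # parents {1,2}, child {4}, spouse {5}
```

Conditioning on this set renders node 3 independent of everything outside it, which is exactly the minimality property described above.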

Backdoor, frontdoor, and transportability criteria tell us which paths must be blocked or opened in order to answer a given causal question from data. Part I of this book develops these structural tools in detail. They describe what is sufficient for identification and intervention reasoning, independently of the precise dynamical laws or measurement processes.
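
For the drug-treatment example above, the backdoor criterion licenses the standard adjustment formula (written here with a discretised confounder):

\[
P\big(X_t \mid do(A_{t-1} = a)\big) = \sum_{c} P\big(X_t \mid A_{t-1} = a,\; C_{t-1} = c\big)\, P\big(C_{t-1} = c\big)
\]

Averaging within-stratum contrasts over the distribution of \(C_{t-1}\) is what the stratified estimate in the simulation approximates.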

2.6.3 Dynamics, Attractors, and Robustness

Structural models alone do not tell us how systems unfold through time, nor how they behave under feedback, noise, and nonlinearities. For this we need dynamical systems: ODEs, SDEs, regime-switching processes, and state-space models (Strogatz 2014; Hirsch et al. 2012; Øksendal 2013; Durbin and Koopman 2012; Särkkä 2013).

Homeostasis is then not a static state but a non-equilibrium steady pattern: trajectories remain near (or return to) a stable regime under typical perturbations. Part II develops this dynamical view, showing how causal semantics, attractors, and interventions interact in deterministic and stochastic systems.

The Observable world is where systems are measured, controlled, and learned about. Here Markov blankets and the Free Energy Principle (FEP) provide one way to connect causal dynamics with estimation and control. A Markov blanket partitions variables into internal, sensory, active, and external states such that internal states are conditionally independent of external states given the blanket (Pearl 1988; Friston 2013).

Attractors and their basins give a precise language for robustness, resistance, and resilience:

  • Robustness: trajectories return to (or stay near) an attractor under moderate perturbations
  • Resistance: perturbations are strongly rejected; patterns change little even under sustained forcing
  • Resilience: the system can incorporate substantial perturbations, possibly moving between nearby attractors, while preserving its qualitative behaviour
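
A minimal numerical sketch of robustness in this sense (forward Euler on an assumed scalar system; all parameters are illustrative only): trajectories of \(dx/dt = -k\,(x - x^\ast)\) return to the fixed point \(x^\ast\) after a pulse perturbation.

```julia
# Relaxation back to an attractor after a pulse (forward Euler; illustrative params)
k, x_star, dt = 0.5, 1.0, 0.01
nsteps = 2000                        # 20 time units
x = fill(x_star, nsteps)             # start on the attractor
for t in 2:nsteps
    pulse = (t == 500) ? 2.0 : 0.0   # one-off perturbation at t = 5
    x[t] = x[t-1] + dt * (-k * (x[t-1] - x_star)) + pulse
end

# The state is displaced at the pulse and has relaxed back by the end
println((just_after_pulse = round(x[500], digits=2), final = round(x[end], digits=3)))
```

In this vocabulary, resistance would correspond to the displacement itself being small under forcing, while resilience would allow the trajectory to settle on a nearby attractor provided the qualitative behaviour is preserved.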

2.6.4 Markov Blankets, Free Energy, and Control

Under the FEP and active inference, systems are modelled as possessing generative models of their environment and acting to minimise expected surprise (variational free energy) under those models (Friston et al. 2006; Friston 2010). We will treat this primarily as a modelling toolbox and a useful set of connections to estimation and control.

We will not turn this book into a textbook on FEP, but we will borrow elements of this framing when we discuss forecasting under intervention, policy evaluation, and decision-making in Part III.

2.6.5 Putting the Parts Together

The three Parts of the book offer successive views on the same causal-dynamical modelling problem:

  • Structural (Part I) foregrounds identifiability and causal structure: which variables and assumptions are needed to answer which causal questions
  • Dynamical (Part II) foregrounds temporal unfolding: attractors, feedback, robustness, and trajectories under intervention
  • Observable (Part III) foregrounds estimation and decision-making: learning models from data and using them to guide action under uncertainty

CDMs sit at the intersection of these views. They use structural assumptions to constrain dynamical models, and they use observable-layer methods (from G-methods and TMLE to active inference) to link those models to data and decision-making.

2.7 A Note on This Book

This is very much a work in progress. The motivation is first and foremost my own self-directed learning project, experimenting with AI agents to help work through the material. A secondary aim is to explore what writing a technical book in the age of LLMs could be like. The third objective is to make this experiment available to the wider community. The reader is invited to take a more active role by reaching out to AI assistants to expand or explain material—think of it as a “choose your own adventure” book for this age.

2.8 Who This Book Is For

  • Applied scientists working with time-series, ecological, or biological systems who need to make causal claims
  • Statisticians and machine learning researchers interested in combining causal inference with dynamical systems
  • Graduate students in statistics, applied mathematics, ecology, epidemiology, or computational biology

2.9 Structure

The book is organised into four parts with 28 chapters, applying the Pearl Causal Hierarchy (Seeing, Doing, Imagining) within each of the three worlds (Structural, Dynamical, Observable). This creates a 3×3 grid of nine phases (three chapters each, giving 27 chapters), plus one synthesis chapter, for 28 in total.

Part I — Structural (Chapters 1-9)

Phase 1: Seeing in Structural (Chapters 1-3)

  • Chapter 1: The Causal Hierarchy and Three Worlds
  • Chapter 2: The Primary Unit: The Dyad
  • Chapter 3: Graph Theory and Causal Patterns

Phase 2: Doing in Structural (Chapters 4-6)

  • Chapter 4: Structural Causal Models as Executable Mechanisms
  • Chapter 5: Identification: When Can We Learn from Data?
  • Chapter 6: Do-Calculus: Rules for Interventions

Phase 3: Imagining in Structural (Chapters 7-9)

  • Chapter 7: Counterfactuals: Alternative Structural Scenarios
  • Chapter 8: Transportability: Generalising Structural Claims
  • Chapter 9: From Structure to Time: FEP and Attractors

Part II — Dynamical (Chapters 10-18)

Phase 4: Seeing in Dynamical (Chapters 10-12)

  • Chapter 10: Deterministic Dynamics: ODEs as Causal Processes
  • Chapter 11: Stochastic Dynamics: SDEs and Random Processes
  • Chapter 12: State-Space Models: Inferring Structure from Observations

Phase 5: Doing in Dynamical (Chapters 13-15)

  • Chapter 13: Intervening in Deterministic Systems
  • Chapter 14: Intervening in Stochastic Systems
  • Chapter 15: Advanced Dynamics: Regimes, Networks, and Resilience

Phase 6: Imagining in Dynamical (Chapters 16-18)

  • Chapter 16: Counterfactual Dynamics: Alternative Trajectories
  • Chapter 17: Sensitivity Analysis and Robustness in Dynamics
  • Chapter 18: From Dynamical to Observable: Measurement and Actualisation

Part III — Observable (Chapters 19-27)

Phase 7: Seeing in Observable (Chapters 19-21)

  • Chapter 19: Observational Methods: Learning from Data
  • Chapter 20: TMLE and Doubly Robust Estimation
  • Chapter 21: Model Validation with Observable Data

Phase 8: Doing in Observable (Chapters 22-24)

  • Chapter 22: Interventional Reasoning: Forecasting Under Interventions
  • Chapter 23: Policy Evaluation and Dynamic Treatment Strategies
  • Chapter 24: Causal Decision-Making

Phase 9: Imagining in Observable (Chapters 25-27)

  • Chapter 25: Counterfactual Reasoning: Unit-Level Alternatives
  • Chapter 26: Hypothesis Generation from Counterfactuals
  • Chapter 27: Experimental Design: Optimal Measurements

Part IV — Synthesis (Chapter 28)

  • Chapter 28: CDMs: The Unified Framework

Implementation sections with Julia code are integrated throughout the chapters.

2.10 A Note on Implementation

Examples and code use Julia, chosen for its excellent scientific computing ecosystem (DifferentialEquations.jl, StateSpaceModels.jl, etc.) and performance. The principles translate to other languages, but Julia’s composability makes it ideal for building complex causal-dynamical models (Rackauckas et al. 2020).

2.10.1 Code Standards and Package Management

All code examples use the project’s package management system (see scripts/ensure_packages.jl) to automatically ensure required packages are installed. Code is executable and integrated with the text. The project uses Quarto’s native Julia engine (no Jupyter required), and all code works in both Quarto preview and standalone Julia.

Mathematical notation in code: Code examples use standard mathematical notation (Greek letters, subscripts, superscripts) to match equations as closely as possible. Variable names follow mathematical notation from the text (e.g., x_t, σ_w, α, β).

2.11 How to Read This Book

This book can be read in multiple ways depending on your goals and background. We provide three guided tracks.

2.12 Practitioner Track: Build Working Causal-SSMs Quickly

Goal: Get hands-on with causal dynamical models for your application domain.

Path:

  1. Read Notation and Modelling Conventions for the CDM formalism
  2. Skim Part I — Structural (Chapters 1-3) for causal foundations—focus on understanding \(do(\cdot)\) and interventions
  3. Study Part II — Dynamical (Chapter 12) for state-space inference and model criticism, with implementation sections
  4. Deep dive into Part III — Observable (Chapters 19-27) for observational methods, interventional reasoning, counterfactual reasoning, and study design, with implementation sections
  5. Reference Part II — Dynamical (Chapters 10-11, 13-15) for your specific dynamical system type (ODE/SDE/advanced), with implementation sections
  6. Read Part IV (Chapter 28) to see how all pieces integrate in CDMs

Time estimate: 2-3 weeks for a working model

Skip or skim: Advanced chapters (23-27) unless relevant

2.13 Theory Track: Identification and Transportability First

Goal: Understand when causal questions are answerable from data and how to generalise claims.

Path:

  1. Read Part I — Structural (Chapters 1-9) thoroughly—this is the foundation
    • Phase 1 (Chapters 1-3): Seeing in Structural—causal hierarchy, dyad, graph theory
    • Phase 2 (Chapters 4-6): Doing in Structural—SCMs, identification, do-calculus
    • Phase 3 (Chapters 7-9): Imagining in Structural—counterfactuals, transportability, transition
  2. Read Part III — Observable (Chapters 19-27) to see how identification applies to observable data
  3. Study Part II — Dynamical (Chapters 10-12) for state-space inference and identifiability
  4. Reference Part II — Dynamical (Chapters 13-18) and Part III — Observable (Chapters 22-27) for interventional and counterfactual reasoning

Time estimate: 4-6 weeks

Skip or skim: Implementation sections unless needed

2.14 Research Track: New Model Classes and Guarantees

Goal: Extend the framework, develop new methods, or establish theoretical guarantees.

Path:

  1. Read the entire book sequentially (Chapters 1-28)
  2. Pay special attention to:
    • Part II — Dynamical (Chapters 10-18) for dynamics, networks, and state-space inference
    • Part III — Observable (Chapters 19-27) for observational methods, interventional/counterfactual reasoning, and study design
    • Part IV (Chapter 28) for synthesis of all concepts
    • Appendices for reporting standards and software patterns
  3. Study all implementation sections throughout
  4. Work through exercises and develop your own extensions

Time estimate: 3-4 months for full mastery

Additional resources: Original papers cited throughout, especially on identification, transportability, and bounds

2.15 Cross-Cutting Themes

The framework developed in this book encompasses several interconnected themes:

  • Causal Dynamical Models (CDMs): A unified framework combining structural causal models with state-space inference and dynamical systems
  • Interventional vs Conditional Forecasting: Understanding when conditioning on treatment differs from setting treatment
  • Counterfactual Reasoning: Formal methods for “what would have been” queries in dynamical systems
  • Transportability: Generalising causal claims across domains, cohorts, and experimental protocols
  • Networked Dynamics: Causal reasoning about structure, interventions, and robustness in complex networks
  • Conceptual framing: The book uses three modelling layers (Structural, Dynamical, Observable) to organise assumptions and methods

Regardless of track, these themes appear throughout:

  • Worked Example boxes: Brief “Sheep system” call-outs illustrating L1/L2/L3 queries
  • Notation consistency: CDM formalism used throughout
  • Model criticism: Emphasis on diagnostics and validation
  • Transportability: How to defend cross-domain claims

2.16 Concept Reference: Three Layers and Three Levels

This book’s framework can be understood through two complementary dimensions: the three modelling layers (how we organise assumptions) and the three levels of reasoning (how we organise causal queries). The following tables provide a quick reference for navigating concepts across these dimensions. For comprehensive reference tables, see the Concept Reference Appendix.

2.16.1 Quick Reference: Concepts by Layer

Note: The layer listed is where each concept primarily lives (structural assumptions, dynamical evolution, or observation/measurement). Many methods combine layers in practice.

| Layer | Key Concepts | Mathematical Approaches |
| --- | --- | --- |
| Structural | Graph theory, d-separation, Markov boundary, identification, do-calculus, counterfactuals, transportability | Graph algorithms, identification theory, do-calculus |
| Structural | State-space models, identifiability, FEP, Markov blanket (in addition to graph theory, identification, etc.) | Filtering, smoothing, PPCs |
| Dynamical | ODEs, SDEs, regime switching, resilience, robustness | ODE/SDE integration (network structure covered under Graph Theory) |
| Observable | CDMs, correlation analysis, conditional forecasting | Methods that operate on observed data |
| Observable → Structural | Interventional forecasting, counterfactual simulation, G-methods, TMLE, policy evaluation, causal representation learning | Methods using observable data to reason about structural interventions/counterfactuals |
| Structural → Observable | Experimental design | Structural principles applied to design observable data collection |

2.16.2 Quick Reference: Concepts by Level of Reasoning

| Level | Question | Key Methods | World(s) |
| --- | --- | --- | --- |
| L1: Association | “What will happen?” | Conditional forecasting, filtering, smoothing, d-separation | All |
| L2: Intervention | “What will happen if we do X?” | Do-calculus, interventional forecasting, G-methods, TMLE, policy evaluation, experimental design | Structural, Observable, Structural → Observable |
| L3: Counterfactual | “What would have happened for this unit?” | Counterfactual simulation, shared exogenous noise, bounds | Structural, Observable → Structural |

2.16.3 Quick Reference: Methods by Layer and Level

| Method | Layer | L1 | L2 | L3 |
| --- | --- | --- | --- | --- |
| d-separation | Structural | ✓ | — | — |
| Do-calculus | Structural | — | ✓ | — |
| Kalman filter | Observable | ✓ | — | — |
| G-computation | Observable → Structural | — | ✓ | — |
| TMLE | Observable → Structural | — | ✓ | — |
| ODE integration | Observable | ✓ | ✓ | ✓ |
| Correlation analysis | Observable | ✓ | — | — |

Legend: ✓ = Applies at this level; — = Does not apply

These quick reference tables help orient you to where concepts fit in the framework. For detailed tables including all concepts, methods, philosophical foundations, and practical workflows, see the Concept Reference Appendix.

2.17 Prerequisites by Track

Prerequisites vary by track (see How to Read This Book for details):

  • Practitioner track: Basic statistics, some programming, exposure to time-series or ODEs
  • Theory track: Strong probability/statistics background, familiarity with graphical models helpful
  • Research track: Graduate-level statistics/mathematics, some causal inference background recommended

Common prerequisites across all tracks:

  • Basic probability and statistics
  • Linear algebra and calculus (including matrix operations; familiarity with sparse matrices helpful but not required)
  • Some exposure to dynamical systems (ODEs) or state-space models
  • Basic programming (examples use Julia)

Note on Mathematical Connections: This book connects graph theory, linear algebra (especially sparse matrices), and structural equations. Sparse dependency graphs naturally give rise to sparse matrices, which enable efficient computation. You don’t need deep expertise in sparse matrix algorithms, but understanding that graphs can be represented as (sparse) adjacency matrices will be helpful.
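
For instance (a small illustrative sketch using Julia’s standard library; the node indices are hypothetical), a four-node dependency graph stored as a sparse adjacency matrix:

```julia
using SparseArrays

# DAG C -> A, C -> X, A -> X, X -> Y with indices C=1, A=2, X=3, Y=4
src = [1, 1, 2, 3]
dst = [2, 3, 3, 4]
G = sparse(src, dst, ones(Int, 4), 4, 4)

nnz(G)                                    # only the 4 edges are stored, not 16 cells
parents_of_X = findall(!iszero, G[:, 3])  # nodes with an edge into node 3
```

The same structure scales to thousands of nodes because storage and many graph operations grow with the number of edges, not the square of the number of variables.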

If you don’t have these prerequisites, don’t worry—you can and should reach out to your AI robots for help.

No prior causal inference background is required—we build from first principles.

2.18 World-Specific Paths

  • Structural Focus: Chapters 1-9, then applications in later chapters
  • Dynamical Focus: Chapters 1-3, 9, then 10-18, then applications
  • Observable Focus: Chapters 1-3, 9, 12, 18, then 19-27

2.19 Notation and Modelling Conventions

This section introduces the book’s unified notation for Causal Dynamical Models (CDMs), including interventions \(do(\cdot)\), policies \(do(\pi)\), and unit-level counterfactuals with shared exogenous noise.

2.19.1 Causal Dynamical Model (CDM)

A CDM is an SCM (Pearl 2009) whose endogenous variables are time-indexed and partitioned into latent process variables and observations, with explicit intervention operators and counterfactual semantics.

Alternative name: Causal State-Space Model (CSSM) when emphasising the state-space inference structure; DSCM (Dynamic SCM) when emphasising structural equations.

2.19.2 Core Object

A CDM is a tuple: \[ \mathcal{M} = \big(G,\; U,\; F,\; P(U)\big) \]

where:

  • \(G\): A directed acyclic graph (DAG) encoding direct causal dependencies. Formally, \(G = (V, E)\) where \(V\) is the set of variables (vertices) and \(E\) is the set of directed edges. The graph structure determines the topology of causal dependencies: an edge \(X_i \rightarrow X_j\) means \(X_i\) is a parent of \(X_j\) in the structural assignment for \(X_j\). For time-indexed systems, \(G\) typically encodes temporal dependencies (past variables influence future ones) and may include spatial/network structure (which nodes influence which others). The graph \(G\) can be sparse, enabling scalable computation.

  • \(U\): The set of exogenous variables (noise terms) representing unmodelled variation. These include:

    • \(U^x_t\): Process noise for state dynamics
    • \(U^y_t\): Observation noise for measurements (measurement error)
    • \(U^a_t\): Action/treatment noise (only present if \(\mathbf{A}_t\) is generated by a stochastic behavioural policy \(\pi\); if \(\mathbf{A}_t\) is purely set via intervention \(do(\mathbf{A}_t = \mathbf{a}_t)\), there is no noise)
    • \(U^1\): Initial state noise (for the first occasion)

    The exogenous variables make stochasticity explicit and structural, enabling Pearl’s causal semantics for interventions and counterfactuals.

  • \(F\): The set of structural assignments (functions) encoding the mechanisms that generate endogenous variables. Each assignment has the form: \[ X_i \coloneqq f_i(\text{Pa}(X_i), U_i) \] where \(\text{Pa}(X_i)\) are the parents of \(X_i\) in \(G\), and \(f_i\) specifies how parents and noise combine to generate \(X_i\). For time-indexed systems, \(F\) includes:

    • \(f_1\): Initial state assignment
    • \(f\): State transition function
    • \(h\): Observation function (how occasions manifest as observations)
    • \(\pi\): Optional policy function (if the action variable \(\mathbf{A}_t\) is generated by a behavioural policy rather than set via intervention)
  • \(P(U)\): The joint distribution over exogenous variables. This distribution is typically factorised as: \[ P(U) = P(U^1) \prod_{t=1}^{T} P(U^x_t) P(U^y_t) P(U^a_t) \] (where some factors may be absent if certain variables are deterministic). The distribution \(P(U)\) enables probabilistic reasoning while maintaining the structural semantics: interventions modify \(F\) (the assignments), not \(P(U)\) (the noise distribution), preserving the distinction between causal structure and stochasticity.

The tuple \(\mathcal{M} = (G, U, F, P(U))\) provides a complete specification of a CDM: the graph \(G\) tells us which variables directly affect which others, the assignments \(F\) specify the mechanisms, \(U\) specifies the exogenous variation, and \(P(U)\) gives the probabilistic structure. Together, they support association (conditioning), intervention (modifying assignments), and counterfactual reasoning (unit-level alternatives under shared \(\mathbf{u}\)) (Pearl 2009).

2.19.3 Variable Sets (Time-Indexed)

For \(t = 1,2,\dots,T\):

  • Latent/process state: \(\mathbf{X}_t \in \mathbb{R}^d\)
  • Action/treatment variable (possibly empty): \(\mathbf{A}_t \in \mathcal{A}\) — a variable that can be set via intervention \(do(\mathbf{A}_t = \mathbf{a}_t)\) or generated by a policy \(\pi(\mathbf{H}_t, \mathbf{C}, \mathbf{U}^a_t)\)
  • Observation: \(\mathbf{Y}_t \in \mathcal{Y}\)
  • Optional context/domain variables (time-invariant): \(\mathbf{C}\)

Exogenous noises:

  • \(\mathbf{U}^x_t\) for state dynamics
  • \(\mathbf{U}^y_t\) for observation/measurement
  • \(\mathbf{U}^a_t\) for action/treatment assignment (if \(\mathbf{A}_t\) is generated by a stochastic policy rather than purely set via intervention)
Tip: Action vs Intervention (Terminology Note)

\(\mathbf{A}_t\) is the “action/treatment variable”—a variable that can be either:

  1. Set via intervention: \(do(\mathbf{A}_t = \mathbf{a}_t)\) (an experimenter sets it to a specific value)
  2. Generated by a policy: \(\mathbf{A}_t \coloneqq \pi(\mathbf{H}_t, \mathbf{C}, \mathbf{U}^a_t)\) (a behavioural policy generates it, possibly stochastically)

“Intervention” is the broader concept that includes:

  • Action/treatment interventions: \(do(\mathbf{A}_t = \mathbf{a}_t)\) — setting the action variable
  • Mechanism interventions: \(do(f \leftarrow f^\star)\) — modifying the state transition function
  • Parameter interventions: \(do(\theta \leftarrow \theta^\star)\) — modifying model parameters

Why “action/treatment”? We use both terms to bridge communities: “action” is standard in reinforcement learning and control theory, while “treatment” is standard in causal inference and epidemiology. Both refer to the same variable \(\mathbf{A}_t\) that can be intervened upon.

Action noise \(U^a_t\): This is only present when \(\mathbf{A}_t\) is generated by a stochastic policy. If \(\mathbf{A}_t\) is purely set via intervention (no policy), there is no \(U^a_t\) noise.

2.19.4 Structural Assignments (Markovian Baseline)

A compact default form: \[ \begin{aligned} \mathbf{X}_1 &\coloneqq f_1(\mathbf{C}, \mathbf{U}^1) \quad \text{(initial state)} \\ \mathbf{A}_t &\coloneqq \pi(\mathbf{H}_t, \mathbf{C}, \mathbf{U}^a_t) \quad \text{(optional; behavioural policy generating actions)}\\ \mathbf{X}_{t+1} &\coloneqq f(\mathbf{X}_t, \mathbf{A}_t, \mathbf{C}, \mathbf{U}^x_{t+1}) \quad \text{for } t \geq 1 \\ \mathbf{Y}_t &\coloneqq h(\mathbf{X}_t, \mathbf{C}, \mathbf{U}^y_t) \end{aligned} \] where \(\mathbf{H}_t \coloneqq (\mathbf{Y}_{1:t}, \mathbf{A}_{1:t-1})\) is the observed history and \(\mathbf{U}^1\) is the initial-state noise.

This covers:

  • ODE/SDE discretisations (via \(f\))
  • Switching models (include a discrete latent \(\mathbf{S}_t\) inside \(\mathbf{X}_t\))
  • Networked systems (define components per node; see below)
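A minimal Julia simulation of these assignments; the linear forms for \(f_1\), \(f\), \(h\) and the threshold policy are illustrative stand-ins, not models used later in the book (and the policy here reads only the latest observation rather than the full history \(\mathbf{H}_t\), for brevity):

```julia
using Random

f1(c, u)      = c + u                        # initial state
f(x, a, c, u) = 0.9 * x + a + 0.1 * c + u    # state transition
h(x, c, u)    = x + u                        # observation
pol(y, c)     = y < 0 ? 1.0 : 0.0            # behavioural policy (deterministic: no Uᵃ_t)

function simulate(T; c = 0.5, rng = Xoshiro(1))
    x  = f1(c, 0.1 * randn(rng))             # draw for U¹
    ys = Float64[]
    as = Float64[]
    for t in 1:T
        y = h(x, c, 0.2 * randn(rng))        # draw for Uʸ_t
        a = pol(y, c)
        push!(ys, y); push!(as, a)
        x = f(x, a, c, 0.1 * randn(rng))     # draw for Uˣ_{t+1}
    end
    return ys, as
end

ys, as = simulate(5)
```

Each `randn` call realises one exogenous variable, making the SCM reading of the noise terms explicit in code.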

2.19.5 Beyond the Markovian baseline: history and attention

The baseline above is Markovian in state: \(\mathbf{X}_{t+1}\) depends only on \(\mathbf{X}_t\) (plus \(\mathbf{A}_t\), \(\mathbf{C}\), noise). That is often justified when the current state is a sufficient statistic for the future—it summarises everything from the past that matters. In many systems, however, the past can matter in ways that are not fully compressed into \(\mathbf{X}_t\): long delays, distributed memory, or dependencies that do not have a compact state representation.

Sequence models and attention. Large language models (LLMs) and related architectures (e.g. transformers) make the next output depend on all previous positions in the sequence, via attention: each position can “attend to” earlier positions with learned weights (Vaswani et al. 2017). So the effective dependence is not “only the previous state” but a weighted combination of many past states. In the minimal formulation (e.g. microgpt by Andrej Karpathy—a dependency-free Python GPT; a Julia version with a dynamics demo is in scripts/microgpt.jl), the next-token distribution is a function of the full preceding context, with attention determining which parts of that context matter.

One way to interpret this in dynamical-systems terms is learned dependency selection over history: instead of committing to a fixed lag structure, the model learns which past states are predictive in the current context. The Markovian baseline corresponds to dependence on the immediate predecessor; fixed-delay dynamics (e.g. delay differential equations; see Stochastic dynamics) correspond to dependence on one or a few fixed lags; attention corresponds to learned weights over many lags.

Implications for causal dynamics.

  • When Markov is enough: If the process has a compact sufficient statistic (e.g. many dynamical systems, well-specified state-space models), the Markovian form \(\mathbf{X}_{t+1} = f(\mathbf{X}_t, \ldots)\) is appropriate and keeps inference and intervention tractable.
  • When history matters explicitly: If the system has long-range dependencies, delayed feedback, or no compact state, we may need either (i) richer state (e.g. including lagged values or a memory buffer) so that the expanded state is again Markovian, or (ii) explicit history dependence in the transition, e.g. \(\mathbf{X}_{t+1} = f(\mathbf{X}_t, \mathbf{X}_{t-1}, \ldots, \mathbf{X}_{t-\ell}, \mathbf{A}_t, \ldots)\) or delay differential equations with fixed lags, or (iii) attention-like mechanisms where the next state or output is a learned function of a weighted combination of past states—useful when the relevant past is not a fixed window but varies with context.
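Option (i) in concrete form: an AR(2) process \(x_{t+1} = a_1 x_t + a_2 x_{t-1} + u_{t+1}\) is not Markov in \(x_t\) alone, but stacking two lags into an expanded state restores the one-step property. A sketch with illustrative coefficients:

```julia
using Random

# Expanded state z_t = (x_t, x_{t-1}); then z_{t+1} = M * z_t + (u, 0)
# is first-order Markov even though x_t alone is not.
a1, a2 = 0.5, 0.3
M = [a1  a2;
     1.0 0.0]

ar2_step(z, u) = M * z + [u, 0.0]

function simulate_ar2(T; z0 = [0.0, 0.0], rng = Xoshiro(0))
    z, xs = z0, Float64[]
    for t in 1:T
        z = ar2_step(z, 0.1 * randn(rng))
        push!(xs, z[1])            # the observed component x_t
    end
    return xs
end

xs = simulate_ar2(10)
```

The cost of the expansion is the larger state vector; inference machinery for first-order models then applies unchanged.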

Why go beyond Markovian assumptions? The literature on state-space models, POMDPs, and causal time series gives several well-established reasons (attention is one way to accommodate them; others include belief-state filters, higher-order Markov models, or expanding the state):

  • Partial observability (the most direct): we typically observe \(\mathbf{Y}_t\), not the latent state \(\mathbf{X}_t\). The current observation is then rarely a sufficient statistic for the state; optimal prediction and control are history-dependent (e.g. in POMDPs, policies map observation histories or belief states to actions) (e.g. Koller and Friedman 2009).
  • Incomplete or aggregated state: if causally relevant state variables are missing or collapsed (e.g. through dimension reduction), the one-step transition property can fail and multi-step dependencies appear (e.g. Araujo et al. 2025).
  • Violations of independence of exogenous variables: causal and state-space models usually assume exogenous noise \(\mathbf{U}^x_t\), \(\mathbf{U}^y_t\) is independent across time (or has known structure). When unmeasured confounders persist or process noise is serially correlated, the Markov state is no longer sufficient; conditioning on a longer history can absorb that residual dependence.
  • Observation model misspecification or temporally dependent observation noise: if the observation process is wrong or observation noise is non-Markov, the single-step summary of the past is inadequate; history-dependent prediction can partly compensate.
  • Nonstationarity: when causal strengths or noise variances change over time, the invariant one-step Markov representation may break down (e.g. Shen et al. 2019).

In all these cases, letting the next-step distribution depend on a learned weighting of the past (e.g. via attention) relaxes the demand that one step compresses everything that matters—without assuming the underlying violations away.

Computationally efficient alternatives. Given this book’s focus on causal dynamics and state-space inference, several options are often more efficient than full attention over raw history.

When the latent process is Markov but we have partial observability, the standard and efficient solution is belief-state filtering: maintain \(P(\mathbf{X}_t \mid \mathbf{Y}_{1:t})\) as the sufficient statistic for the observation history. The Kalman filter updates this in \(O(d^2)\) per step (state dimension \(d\)); particle filters in \(O(N)\) per step for \(N\) particles. No storage or attention over the full history is needed—the belief state is the compressed history. This is the core of the state-space inference material in the book (State-space models).

When the process itself is not first-order Markov (e.g. higher-order dynamics, serially correlated noise), alternatives to \(O(n^2)\) attention include:

  • expanding the state to \((\mathbf{X}_t, \mathbf{X}_{t-1}, \ldots, \mathbf{X}_{t-\ell})\) so the expanded process is Markov—cost is \(O(\ell)\) per step;
  • recurrent state (e.g. RNN/LSTM), which compresses history into a fixed-size hidden state in \(O(1)\) per step but is less interpretable for causal reasoning;
  • structured sequence models (e.g. linear attention, S4, Mamba), which achieve \(O(n)\) or subquadratic cost for long sequences while retaining a recurrent or convolutional structure.

For causal modelling and intervention, belief-state filtering and expanded-state formulations also keep the graph and \(do(\cdot)\) semantics explicit, whereas black-box attention over observations does not.
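A scalar Kalman filter makes the point concrete: the belief \((\mu_t, \sigma^2_t)\) summarising \(P(x_t \mid y_{1:t})\) is carried forward in \(O(1)\) per step, and the raw history is never stored. The model parameters below are illustrative:

```julia
# Scalar linear-Gaussian SSM:  x_{t+1} = a x_t + w_t,   y_t = x_t + v_t.
a, q, r = 0.9, 0.1, 0.2        # transition coeff, process var, observation var

function kalman_step(mu, s2, y)
    mup = a * mu                 # predict mean
    s2p = a^2 * s2 + q           # predict variance
    k   = s2p / (s2p + r)        # Kalman gain
    return mup + k * (y - mup), (1 - k) * s2p
end

function filter_series(ys; mu = 0.0, s2 = 1.0)
    for y in ys
        mu, s2 = kalman_step(mu, s2, y)   # only the belief is carried, never the history
    end
    return mu, s2
end

mu_T, s2_T = filter_series([0.5, 0.7, 0.4])
```

The loop touches each observation once and keeps two numbers of state: the belief is the compressed history.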

The policy \(\mathbf{A}_t = \pi(\mathbf{H}_t, \ldots)\) is already history-dependent (it can use the full observed history \(\mathbf{H}_t\)); the question is whether the process dynamics should also depend on more than \(\mathbf{X}_t\). For the bulk of this book we keep the Markovian process model; the ideas above suggest where and how to relax it when the problem demands it. Runnable examples are in the relevant chapters and the repo: belief-state filtering and an expanded-state AR(2) model in State-space models, a Markov-vs-attention comparison on discretised dynamics in Stochastic dynamics, and the script scripts/microgpt.jl (inspired by Andrej Karpathy’s microgpt).

2.19.6 Distributions Induced by the CDM

Even though the model is written as assignments, it induces the usual probabilistic factorisation: \[ P(\mathbf{y}_{1:T}) = \int P(\mathbf{u}) \prod_{t=1}^{T} \delta\!\big(\mathbf{x}_t - F_t(\cdot)\big)\; \delta\!\big(\mathbf{y}_t - H_t(\cdot)\big)\; d\mathbf{u}\, d\mathbf{x} \] where \(F_t(\cdot)\) and \(H_t(\cdot)\) are shorthand for the structural assignments (\(f_1\), \(f\), \(h\)) evaluated at the parents of \(\mathbf{x}_t\) and \(\mathbf{y}_t\).

In practice, one usually works with the implied conditional densities (SSM form): \[ P(\mathbf{x}_1)\prod_{t=1}^{T-1} P(\mathbf{x}_{t+1}\mid \mathbf{x}_t,\mathbf{a}_t)\prod_{t=1}^{T} P(\mathbf{y}_t\mid \mathbf{x}_t) \] with the understanding that these are shorthand for structural equations + noise.
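For a scalar linear-Gaussian special case, the factorised form can be evaluated term by term; a sketch with illustrative parameters, scoring a state trajectory and observation sequence jointly:

```julia
# Complete-data log density under the SSM factorisation
#   P(x₁) ∏ₜ P(x_{t+1} | x_t, a_t) ∏ₜ P(y_t | x_t),
# for a scalar linear-Gaussian model; all parameters are illustrative.
gauss_logpdf(x, m, s2) = -0.5 * (log(2π * s2) + (x - m)^2 / s2)

a_coef, b, q, r = 0.9, 1.0, 0.1, 0.2

function complete_data_logpdf(xs, as, ys)
    T = length(ys)
    lp = gauss_logpdf(xs[1], 0.0, 1.0)                             # P(x₁)
    for t in 1:T-1
        lp += gauss_logpdf(xs[t+1], a_coef * xs[t] + b * as[t], q) # P(x_{t+1} | x_t, a_t)
    end
    for t in 1:T
        lp += gauss_logpdf(ys[t], xs[t], r)                        # P(y_t | x_t)
    end
    return lp
end

lp = complete_data_logpdf([0.1, 0.3, 0.2], [1.0, 0.0], [0.0, 0.4, 0.1])
```

Marginalising the latent trajectory out of this complete-data density is exactly what filtering algorithms do efficiently.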

2.19.7 Interventions (Pearl Level 2) in CDMs

Use structural interventions explicitly:

  • Action/treatment intervention: \(do(\mathbf{A}_t = \mathbf{a}_t)\) replaces the assignment for \(\mathbf{A}_t\) by a constant, setting the action variable to a specific value (Pearl 2009).
  • Mechanism/parameter intervention: \(do(f \leftarrow f^\star)\) or \(do(\theta \leftarrow \theta^\star)\) replaces the state transition mechanism or its parameters (this is often the right abstraction for “vaccination changes immune dynamics”) (Anderson and May 1992).

Notation:

  • Interventional distribution: \[ P^{do(\mathbf{a}_{1:T-1})}_{\mathcal{M}}(\mathbf{y}_{1:T}) \]

  • Policy intervention (dynamic strategy): \[ do(\pi):\;\mathbf{A}_t \coloneqq \pi(\mathbf{H}_t) \]
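In simulation the distinction is mechanical: under the behavioural regime \(\mathbf{A}_t\) is generated by the policy, while under \(do(\mathbf{A}_t = \mathbf{a}_t)\) its assignment is replaced by a constant. A sketch with illustrative scalar dynamics (the policy here reads only the latest observation):

```julia
using Random

f(x, a, u) = 0.8 * x + a + u          # illustrative transition
h(x, u)    = x + u                    # illustrative observation
pol(y)     = y < 0 ? 1.0 : 0.0        # behavioural policy

# `action_for` is the assignment for A_t: the policy, or a constant under do(·).
function simulate(T, action_for; rng = Xoshiro(42))
    x, ys = 0.1 * randn(rng), Float64[]
    for t in 1:T
        y = h(x, 0.2 * randn(rng))
        push!(ys, y)
        x = f(x, action_for(y), 0.1 * randn(rng))
    end
    return ys
end

ys_obs = simulate(20, pol)            # behavioural (observational) regime
ys_do  = simulate(20, y -> 1.0)       # interventional regime: do(A_t = 1.0)
```

Only the assignment for the action changed; the noise distributions and the other mechanisms are untouched, exactly as the \(do(\cdot)\) semantics prescribe.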

2.19.8 Counterfactuals (Pearl Level 3) in CDMs

For a fixed exogenous realisation \(\mathbf{u}\), define the counterfactual trajectory under intervention \(\iota\) (e.g., \(do(\mathbf{A}=\mathbf{a})\) or \(do(\pi)\)): \[ \mathbf{Y}^{\iota}_{1:T}(\mathbf{u}) \]

Population counterfactual quantities average over \(P(\mathbf{U})\), and unit-level counterfactuals condition on evidence \(E=e\) (implemented by inferring a posterior over exogenous/noise or initial state consistent with \(e\)).
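The “same noise, different world” semantics can be made literal in code: fix one exogenous realisation \(\mathbf{u}\), then evaluate the trajectory under two interventions. The dynamics below are illustrative:

```julia
using Random

f(x, a, u) = 0.8 * x + a + u
h(x, u)    = x + u

# One counterfactual world: constant intervention do(A_t = a_const),
# evaluated under a fixed exogenous realisation (ux, uy).
function trajectory(a_const, ux, uy)
    T = length(uy)
    x, ys = ux[1], Float64[]         # ux[1] plays the role of U¹
    for t in 1:T
        push!(ys, h(x, uy[t]))
        x = f(x, a_const, ux[t + 1])
    end
    return ys
end

rng = Xoshiro(7)
T  = 10
ux = 0.1 * randn(rng, T + 1)         # shared process noise (incl. initial state)
uy = 0.2 * randn(rng, T)             # shared observation noise

y_world0 = trajectory(0.0, ux, uy)   # under do(A_t = 0)
y_world1 = trajectory(1.0, ux, uy)   # under do(A_t = 1): same noise, different world
```

Because \((\mathbf{u}^x, \mathbf{u}^y)\) is shared, the difference between the two trajectories is the deterministic effect of the intervention; averaging such differences over fresh draws of the noise yields the population counterfactual quantities above.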

2.19.9 Networked CDMs

Let nodes be \(i \in \mathcal{V}\), each with state \(X^i_t\). The parent set \(Pa(i)\) is given by a directed interaction graph \(G = (\mathcal{V}, E)\), where \(\mathcal{V}\) is the set of vertices (nodes) and \(E\) is the set of directed edges.

Structural dynamics: \[ X^{i}_{t+1} \coloneqq f_i\!\Big(X^{i}_t,\; \{X^{j}_t: j\in Pa(i)\},\; A_t,\; C,\; U^{i}_{t+1}\Big) \]

The graph \(G\) encodes direct dependencies: which nodes directly affect which others. The parent set \(Pa(i)\) lists the direct inputs to node \(i\)’s update rule. The function \(f_i\) specifies how those inputs (plus noise) generate the next state \(X^i_{t+1}\).
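A Julia sketch of these networked dynamics, with a hypothetical 3-node cyclic graph and one shared linear update rule playing the role of \(f_i\):

```julia
using Random, SparseArrays

# Hypothetical interaction graph: 1 → 2, 2 → 3, 3 → 1.
n = 3
A = sparse([1, 2, 3], [2, 3, 1], true, n, n)   # A[i, j] == true iff i → j
parents(j) = findall(A[:, j])

# Node update: own state, mean of parent states, node-specific noise.
f_node(xi, xpa, u) = 0.7 * xi + 0.2 * (isempty(xpa) ? 0.0 : sum(xpa) / length(xpa)) + u

step_net(x, u) = [f_node(x[i], x[parents(i)], u[i]) for i in 1:n]

function simulate_net(x0, T; rng = Xoshiro(3))
    x = x0
    for t in 1:T
        x = step_net(x, 0.05 * randn(rng, n))
    end
    return x
end

xT = simulate_net(randn(Xoshiro(3), n), 5)
```

Each node's update reads only its own state and its parents', so the sparsity of \(G\) translates directly into per-step computational cost.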

This is the clean bridge between graph theory (the interaction graph \(G\) and its parent sets) and dynamical systems (the structural update rules \(f_i\)).

2.19.10 Quick Notation Reference

| Symbol | Meaning | Notes |
|---|---|---|
| \(\mathbf{X}_t\) | latent/process state | may include continuous + discrete components |
| \(\mathbf{Y}_t\) | observation | measurement mechanism is explicit via \(h\) |
| \(\mathbf{A}_t\) | action/treatment variable | can be set by intervention \(do(\cdot)\) or generated by a behavioural policy \(\pi\) |
| \(\mathbf{C}\) | context/domain variables | age, site, year; used for transportability |
| \(\mathbf{U}^x_t, \mathbf{U}^y_t\) | exogenous noise | makes stochasticity explicit (SCM semantics) |
| \(do(\cdot)\) | intervention operator | replaces an assignment/mechanism |
| \(\pi\) | policy (dynamic intervention rule) | \(\mathbf{A}_t \coloneqq \pi(\mathbf{H}_t)\) |
| \(\iota\) | generic intervention descriptor | mechanism edit, action set, policy, edge cut |
| \(\mathbf{Y}^{\iota}(\mathbf{u})\) | counterfactual outcome | “same noise, different world” |

2.19.11 Why This Notation Is Useful

This unified notation supports all three levels of causal reasoning in a single model \(\mathcal{M}\): association (conditioning on observations), intervention (replacing assignments via \(do(\cdot)\)), and counterfactuals (re-running the model under shared exogenous noise \(\mathbf{u}\)).

2.20 References

Aitchison, John. 1986. The Statistical Analysis of Compositional Data. Chapman & Hall.
Allen, Linda J. S. 2007. An Introduction to Stochastic Processes with Applications to Biology. 2nd ed. Pearson Prentice Hall.
Anderson, Roy M., and Robert M. May. 1992. Infectious Diseases of Humans: Dynamics and Control. Oxford University Press.
Angrist, Joshua D. 1990. “Lifetime Earnings and the Vietnam Era Draft Lottery: Evidence from Social Security Administrative Records.” American Economic Review 80 (3): 313–36.
Angrist, Joshua D., Guido W. Imbens, and Donald B. Rubin. 1996. “Identification of Causal Effects Using Instrumental Variables.” Journal of the American Statistical Association 91 (434): 444–55.
Angrist, Joshua D., and Jörn-Steffen Pischke. 2009. Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton University Press.
Araujo, Wellington, Luiz Oliveira, Marcio Machado, Anderson Nascimento, et al. 2025. “Quantifying First-Order Markov Violations in Noisy Reinforcement Learning: A Causal Discovery Approach.” arXiv Preprint.
Arnold, Vladimir I. 2012. Ordinary Differential Equations. 3rd ed. Springer.
Avin, Chen, Ilya Shpitser, and Judea Pearl. 2005. “Identifiability of Path-Specific Effects.” Proceedings of the 19th International Joint Conference on Artificial Intelligence, 357–63.
Barabási, Albert-László. 2016. Network Science. Cambridge University Press.
Bareinboim, Elias. 2026. Causal AI.
Bareinboim, Elias, and Judea Pearl. 2012. “Causal Transportability with Limited Experiments.” Proceedings of the 27th AAAI Conference on Artificial Intelligence, 95–101.
Bareinboim, Elias, and Judea Pearl. 2013. “A General Algorithm for Deciding Transportability of Experimental Results.” Journal of Causal Inference 1 (1): 107–34.
Bareinboim, Elias, and Judea Pearl. 2016. “Causal Inference and the Data-Fusion Problem.” Proceedings of the National Academy of Sciences 113 (27): 7345–52.
Barrat, Alain, Marc Barthélemy, and Alessandro Vespignani. 2008. Dynamical Processes on Complex Networks. Cambridge University Press.
Battiston, Federico, Giulia Cencetti, Iacopo Iacopini, et al. 2020. “Networks Beyond Pairwise Interactions: Structure and Dynamics.” Physics Reports 874: 1–92. https://doi.org/10.1016/j.physrep.2020.05.004.
Bellman, Richard. 1957. Dynamic Programming. Princeton University Press.
Benson, Austin R., Rediet Abebe, Michael T. Schaub, Ali Jadbabaie, and Jon Kleinberg. 2018. “Simplicial Closure and Higher-Order Link Prediction.” Proceedings of the National Academy of Sciences 115 (48): E11221–30. https://doi.org/10.1073/pnas.1800683115.
Bertsekas, Dimitri P. 2019. Reinforcement Learning and Optimal Control. Athena Scientific.
Blei, David M., Alp Kucukelbir, and Jon D. McAuliffe. 2017. “Variational Inference: A Review for Statisticians.” Journal of the American Statistical Association 112 (518): 859–77.
Bonabeau, Eric. 2002. “Agent-Based Modeling: Methods and Techniques for Simulating Human Systems.” Proceedings of the National Academy of Sciences 99 (suppl 3): 7280–87.
Bongers, Stephan, Patrick Forré, Jonas Peters, and Joris M. Mooij. 2021. “Foundations of Structural Causal Models with Cycles and Latent Variables.” Annals of Statistics 49 (5): 2885–915.
Brauer, Fred, Carlos Castillo-Chavez, and Zhilan Feng. 2019. Mathematical Models in Epidemiology. Springer.
Chaloner, Kathryn, and Isabella Verdinelli. 1995. “Bayesian Experimental Design: A Review.” Statistical Science 10 (3): 273–304.
Chen, Ricky T. Q., Yulia Rubanova, Jesse Bettencourt, and David K. Duvenaud. 2018. “Neural Ordinary Differential Equations.” Advances in Neural Information Processing Systems 31.
Chickering, David Maxwell. 2002. “Optimal Structure Identification with Greedy Search.” Journal of Machine Learning Research 3: 507–54.
Chung, Junyoung, Kyle Kastner, Laurent Dinh, Kratarth Goel, Aaron C. Courville, and Yoshua Bengio. 2015. “A Recurrent Latent Variable Model for Sequential Data.” Advances in Neural Information Processing Systems 28.
Dahabreh, Issa J., Sarah E. Robertson, Jon A. Steingrimsson, Elizabeth A. Stuart, and Miguel A. Hernán. 2019. “Extending Inferences from a Randomized Trial to a New Target Population.” Statistics in Medicine 39 (14): 1999–2014.
Datseris, George, Ali Vahdati, and Timothy C. DuBois. 2022. “Agents.jl: A Performant and Feature-Full Agent-Based Modeling Software of Minimal Code Complexity.” SIMULATION 98 (3): 191–208. https://doi.org/10.1177/00375497211068820.
Donohue, Ian, Helmut Hillebrand, José M. Montoya, et al. 2016. “Navigating the Complexity of Ecological Stability.” Ecology Letters 19 (9): 1172–85.
Doucet, Arnaud, Nando de Freitas, and Neil Gordon. 2001. “Sequential Monte Carlo Methods in Practice.” Statistics for Engineering and Information Science (New York).
Durbin, James, and Siem Jan Koopman. 2012. Time Series Analysis by State Space Methods. 2nd ed. Oxford University Press.
Eberhardt, Frederick, Clark Glymour, and Richard Scheines. 2007. “On the Number of Experiments Required to Identify the Causal Structure of a System.” Journal of Machine Learning Research 8: 2651–98.
Fairbanks, James. 2025. “Going Beyond Graphs: Simplicial, Hyper, and Relational Structure.” JuliaCon Global 2025.
Friston, Karl. 2010. “The Free-Energy Principle: A Unified Brain Theory?” Nature Reviews Neuroscience 11 (2): 127–38.
Friston, Karl. 2013. “Life as We Know It.” Journal of The Royal Society Interface 10 (86): 20130475.
Friston, Karl, James Kilner, and Lee Harrison. 2006. “A Free Energy Principle for the Brain.” Journal of Physiology-Paris 100 (1–3): 70–87.
Frühwirth-Schnatter, Sylvia. 2006. Finite Mixture and Markov Switching Models. Springer.
Fudenberg, Drew, and Jean Tirole. 1991. Game Theory. MIT Press.
Gabry, Jonah, Daniel Simpson, Aki Vehtari, Michael Betancourt, and Andrew Gelman. 2019. “Visualization in Bayesian Workflow.” Journal of the Royal Statistical Society: Series A 182 (2): 389–402.
Gardiner, Crispin. 2009. Stochastic Methods: A Handbook for the Natural and Social Sciences. 4th ed. Springer.
Ge, Hong, Kai Xu, Will Tebbutt, Mohamed Tarek, Martin Trapp, et al. 2024. Turing.jl: Bayesian Inference with Probabilistic Programming. V. 0.30. Released. https://turing.ml/.
Gelman, Andrew, John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, and Donald B. Rubin. 2013. Bayesian Data Analysis. 3rd ed. Chapman & Hall/CRC.
Gilbert, Nigel. 2008. Agent-Based Models. Quantitative Applications in the Social Sciences. SAGE Publications.
Granger, Clive W. J. 1969. “Investigating Causal Relations by Econometric Models and Cross-Spectral Methods.” Econometrica 37 (3): 424–38.
Guckenheimer, John, and Philip Holmes. 1983. Nonlinear Oscillations, Dynamical Systems, and Bifurcations of Vector Fields. Springer.
Hamilton, James D. 1994. Time Series Analysis. Princeton University Press.
Hart, David Bentley. 2024. All Things Are Full of Gods: The Mysteries of Mind and Life. Yale University Press.
Hernán, Miguel A., and James M. Robins. 2020. Causal Inference: What If. Chapman & Hall/CRC.
Hethcote, Herbert W. 2000. “The Mathematics of Infectious Diseases.” SIAM Review 42 (4): 599–653.
Hirsch, Morris W., Stephen Smale, and Robert L. Devaney. 2012. Differential Equations, Dynamical Systems, and an Introduction to Chaos. 3rd ed. Academic Press.
Holling, Crawford S. 1973. “Resilience and Stability of Ecological Systems.” Annual Review of Ecology and Systematics 4: 1–23.
Imai, Kosuke, Luke Keele, and Teppei Yamamoto. 2010. “Identification, Inference, and Sensitivity Analysis for Causal Mediation Effects.” Statistical Science 25 (1): 51–71.
Imbens, Guido W., and Donald B. Rubin. 2015. Causal Inference in Statistics, Social, and Biomedical Sciences. Cambridge University Press.
Ives, Anthony R., and Stephen R. Carpenter. 2007. “Stability and Diversity of Ecosystems.” Science 317 (5834): 58–62.
Kalman, Rudolf E. 1960. “On the General Theory of Control Systems.” Proceedings First International Congress on Automatic Control 1: 481–93.
Keeling, Matt J., and Pejman Rohani. 2011. Modeling Infectious Diseases in Humans and Animals. Princeton University Press.
Kermack, William Ogilvy, and Anderson G. McKendrick. 1927. “A Contribution to the Mathematical Theory of Epidemics.” Proceedings of the Royal Society of London. Series A 115 (772): 700–721.
Khalil, Hassan K. 2002. Nonlinear Systems. 3rd ed. Prentice Hall.
King, Ruth, Perry de Valpine, and Rachel S. McCrea. 2016. “Statistical Ecology.” Annual Review of Statistics and Its Application 3: 401–26.
Kingma, Diederik P., and Max Welling. 2014. “Auto-Encoding Variational Bayes.” arXiv Preprint arXiv:1312.6114.
Kirk, Geoffrey S., John E. Raven, and Malcolm Schofield. 1983. The Presocratic Philosophers: A Critical History with a Selection of Texts. 2nd ed. Cambridge University Press.
Kitano, Hiroaki. 2004. “Biological Robustness.” Nature Reviews Genetics 5 (11): 826–37.
Koller, Daphne, and Nir Friedman. 2009. Probabilistic Graphical Models: Principles and Techniques. MIT Press.
Krishnan, Rahul G., Uri Shalit, and David Sontag. 2017. “Structured Inference Networks for Nonlinear State Space Models.” AAAI Conference on Artificial Intelligence.
Laan, Mark J. van der, Eric C. Polley, and Alan E. Hubbard. 2007. “Super Learner.” Statistical Applications in Genetics and Molecular Biology 6 (1).
Laan, Mark J. van der, and Sherri Rose. 2011. Targeted Learning: Causal Inference for Observational and Experimental Data. Springer Series in Statistics. Springer.
Laan, Mark J. van der, and Daniel Rubin. 2006. “Targeted Maximum Likelihood Learning.” The International Journal of Biostatistics 2 (1).
Liu, Xuanqing, Si Si, Wei Cao, Sanjiv Kumar, and Cho-Jui Hsieh. 2019. “Neural SDE: Stabilizing Neural ODE Networks with Stochastic Noise.” arXiv Preprint arXiv:1906.02355.
Manski, Charles F. 2003. Partial Identification of Probability Distributions. Springer.
Massey, James L. 1990. “Causality, Feedback and Directed Information.” Proceedings of the 1990 International Symposium on Information Theory and Its Applications (Hawaii, USA), 303–5.
McElreath, Richard. 2020. Statistical Rethinking: A Bayesian Course with Examples in r and Stan. 3rd ed. Routledge.
Miao, Hongyu, Xiaoyue Xia, Alan S. Perelson, and Hulin Wu. 2011. “On Identifiability of Nonlinear ODE Models and Applications in Viral Dynamics.” SIAM Review 53 (1): 3–39.
Morris, Luke, Andrew Baas, Evan Arias, Micah Gatlin, Evan Patterson, and James Fairbanks. 2024. “Decapodes: A Diagrammatic Tool for Representing, Composing, and Computing Spatialized Partial Differential Equations.” Journal of Computational Science 81: 102345. https://doi.org/10.1016/j.jocs.2024.102345.
Murphy, Susan A. 2003. “Optimal Dynamic Treatment Regimes.” Journal of the Royal Statistical Society: Series B 65 (2): 331–55.
Newman, Mark E. J. 2018. Networks. 2nd ed. Oxford University Press.
Nowak, Martin A. 2006. Evolutionary Dynamics: Exploring the Equations of Life. Harvard University Press.
Øksendal, Bernt. 2013. Stochastic Differential Equations: An Introduction with Applications. 6th ed. Springer.
Paine, Robert T. 1969. “A Note on Trophic Complexity and Community Stability.” The American Naturalist 103 (929): 91–93.
Pearl, Judea. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann.
Pearl, Judea. 2009. Causality: Models, Reasoning, and Inference. 2nd ed. Cambridge University Press.
Pearl, Judea. 2014. “Probabilistic and Causal Inference: The Works of Judea Pearl.” ACM Turing Award Lecture.
Pearl, Judea, and Elias Bareinboim. 2014. “External Validity: From Do-Calculus to Transportability Across Populations.” Statistical Science 29 (4): 579–95.
Pearl, Judea, and Dana Mackenzie. 2018. The Book of Why: The New Science of Cause and Effect. Basic Books.
Peters, Jonas, Dominik Janzing, and Bernhard Schölkopf. 2017. Elements of Causal Inference: Foundations and Learning Algorithms. MIT Press.
Porter, Mason A., and James P. Gleeson. 2016. Dynamical Systems on Networks: A Tutorial. Springer.
Power, Mary E., David Tilman, James A. Estes, et al. 1996. “Challenges in the Quest for Keystones.” BioScience 46 (8): 609–20.
Puterman, Martin L. 2014. Markov Decision Processes: Discrete Stochastic Dynamic Programming. 2nd ed. Wiley.
Rackauckas, Christopher. 2026. Parallel Computing and Scientific Machine Learning (SciML): Methods and Applications. https://doi.org/10.5281/zenodo.6917234.
Rackauckas, Christopher, Yingbo Ma, Julius Martensen, et al. 2020. “Universal Differential Equations for Scientific Machine Learning.” arXiv Preprint arXiv:2001.04385. https://arxiv.org/abs/2001.04385.
Rackauckas, Christopher, and Qing Nie. 2017. “DifferentialEquations.jl – a Performant and Feature-Rich Ecosystem for Solving Differential Equations in Julia.” Journal of Open Research Software 5 (1): 15. https://doi.org/10.5334/jors.151.
Raue, Andreas, Clemens Kreutz, Thomas Maiwald, et al. 2009. “Structural and Practical Identifiability Analysis of Partially Observed Dynamical Models by Exploiting the Profile Likelihood.” Bioinformatics 25 (15): 1923–29.
Rezende, Danilo Jimenez, Shakir Mohamed, and Daan Wierstra. 2014. “Stochastic Backpropagation and Approximate Inference in Deep Generative Models.” International Conference on Machine Learning, 1278–86.
Richardson, Thomas S., and James M. Robins. 2013. “Single World Intervention Graphs (SWIGs): A Unification of the Counterfactual and Graphical Approaches to Causality.” Center for the Statistics and the Social Sciences, University of Washington Series, no. 128.
Robins, James M. 1986. “A New Approach to Causal Inference in Mortality Studies with a Sustained Exposure Period—Application to Control of the Healthy Worker Survivor Effect.” Mathematical Modelling 7 (9–12): 1393–512.
Robins, James M., Miguel A. Hernán, and Babette Brumback. 2000. “Marginal Structural Models and Causal Inference in Epidemiology.” Epidemiology 11 (5): 550–60.
Rosenbaum, Paul R. 2002. Observational Studies. 2nd ed. Springer.
Rothman, Kenneth J., Sander Greenland, and Timothy L. Lash. 2021. Modern Epidemiology. 4th ed. Lippincott Williams & Wilkins.
Runge, Jakob, Sebastian Bathiany, Erik Bollt, et al. 2019. “Detecting and Quantifying Causal Associations in Large Nonlinear Time Series Datasets.” Science Advances 5 (11): eaau4996. https://doi.org/10.1126/sciadv.aau4996.
Ryan, Elizabeth G., Christopher C. Drovandi, James M. McGree, and Anthony N. Pettitt. 2016. “A Review of Modern Computational Approaches for Optimal Experimental Design in Regression Models.” Journal of the Royal Statistical Society: Series C 65 (5): 779–816.
Saltelli, Andrea, Marco Ratto, Terry Andres, et al. 2008. Global Sensitivity Analysis: The Primer. John Wiley & Sons.
Särkkä, Simo. 2013. Bayesian Filtering and Smoothing. Cambridge University Press.
Scheffer, Marten, Jordi Bascompte, William A. Brock, et al. 2009. “Early-Warning Signals for Critical Transitions.” Nature 461 (7260): 53–59.
Schreiber, Thomas. 2000. “Measuring Information Transfer.” Physical Review Letters 85 (2): 461–64. https://doi.org/10.1103/PhysRevLett.85.461.
Schulam, Peter, and Suchi Saria. 2017. “Reliable Decision Support Using Counterfactual Models.” Advances in Neural Information Processing Systems 30.
Schuler, Maya S., and Sherri Rose. 2017. “Targeted Maximum Likelihood Estimation for Causal Inference in Observational Studies.” American Journal of Epidemiology 185 (1): 65–73.
Shen, Zhichao, Jing Liu, Yong Jiang, Zhaoran Chen, et al. 2019. “Causal Discovery and Forecasting in Nonstationary Environments with State-Space Models.” PMC / Proceedings of Machine Learning Research.
Shpitser, Ilya, and Judea Pearl. 2006. “Identification of Joint Interventional Distributions in Recursive Semi-Markovian Causal Models.” Proceedings of the 21st National Conference on Artificial Intelligence 2: 1219–26.
Sontag, Eduardo D. 1998. Mathematical Control Theory: Deterministic Finite Dimensional Systems. 2nd ed. Springer.
Spirtes, Peter, Clark Glymour, and Richard Scheines. 2000. Causation, Prediction, and Search. 2nd ed. MIT Press.
Strogatz, Steven H. 2014. Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering. 2nd ed. Westview Press.
Sugihara, George, Robert May, Hao Ye, et al. 2012. “Detecting Causality in Complex Ecosystems.” Science 338 (6106): 496–500. https://doi.org/10.1126/science.1227079.
Sutton, Richard S., and Andrew G. Barto. 2018. Reinforcement Learning: An Introduction. 2nd ed. MIT Press.
Tennenholtz, Guy, Assaf Hallak, Shie Mannor, Uri Shalit, Lior Shani, and Aviv Tamar. 2020. “Off-Policy Evaluation in Partially Observable Environments.” Proceedings of the AAAI Conference on Artificial Intelligence 34 (04): 6148–56.
Tsamardinos, Ioannis, Constantin F. Aliferis, and Alexander R. Statnikov. 2003. “Algorithms for Large Scale Markov Blanket Discovery.” Proceedings of the 16th International FLAIRS Conference, 376–81.
VanderWeele, Tyler J. 2015. Explanation in Causal Inference: Methods for Mediation and Interaction. Oxford University Press.
VanderWeele, Tyler J., and Peng Ding. 2017. “Sensitivity Analysis in Observational Research: Introducing the E-Value.” Annals of Internal Medicine 167 (4): 268–74.
Vaswani, Ashish, Noam Shazeer, Niki Parmar, et al. 2017. “Attention Is All You Need.” Advances in Neural Information Processing Systems 30. https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html.
Vehtari, Aki, Andrew Gelman, and Jonah Gabry. 2017. “Practical Bayesian Model Evaluation Using Leave-One-Out Cross-Validation and WAIC.” Statistics and Computing 27 (5): 1413–32.
Wainwright, Martin J., and Michael I. Jordan. 2008. Graphical Models, Exponential Families, and Variational Inference. Foundations and Trends in Machine Learning. Now Publishers.
Whitehead, Alfred North. 1929. The Function of Reason. Princeton University Press.
Whitehead, Alfred North. 1978. Process and Reality: An Essay in Cosmology. Edited by David Ray Griffin and Donald W. Sherburne. Free Press.
Wooldridge, Michael. 2009. An Introduction to MultiAgent Systems. 2nd ed. John Wiley & Sons.
Zhang, Jiji. 2008. “On the Completeness of Orientation Rules for Causal Discovery in the Presence of Latent Confounders and Selection Bias.” Artificial Intelligence 172 (16): 1873–96. https://doi.org/10.1016/j.artint.2008.08.001.