Bellman, Richard. 1957. Dynamic Programming. Princeton University Press.
Bertsekas, Dimitri P. 2019. Reinforcement Learning and Optimal Control. Athena Scientific.
Laan, Mark J. van der, and Sherri Rose. 2011. Targeted Learning: Causal Inference for Observational and Experimental Data. Springer Series in Statistics. Springer.
Laan, Mark J. van der, and Daniel Rubin. 2006. “Targeted Maximum Likelihood Learning.” The International Journal of Biostatistics 2 (1).
Murphy, Susan A. 2003. “Optimal Dynamic Treatment Regimes.” Journal of the Royal Statistical Society: Series B 65 (2): 331–55.
Puterman, Martin L. 2014. Markov Decision Processes: Discrete Stochastic Dynamic Programming. 2nd ed. Wiley.
Robins, James M. 1986. “A New Approach to Causal Inference in Mortality Studies with a Sustained Exposure Period—Application to Control of the Healthy Worker Survivor Effect.” Mathematical Modelling 7 (9–12): 1393–512.
Robins, James M., Miguel A. Hernán, and Babette Brumback. 2000. “Marginal Structural Models and Causal Inference in Epidemiology.” Epidemiology 11 (5): 550–60.
Schulam, Peter, and Suchi Saria. 2017. “Reliable Decision Support Using Counterfactual Models.” Advances in Neural Information Processing Systems 30.
Sutton, Richard S., and Andrew G. Barto. 2018. Reinforcement Learning: An Introduction. 2nd ed. MIT Press.
Tennenholtz, Guy, Assaf Hallak, Shie Mannor, Uri Shalit, Lior Shani, and Aviv Tamar. 2020. “Off-Policy Evaluation in Partially Observable Environments.” Proceedings of the AAAI Conference on Artificial Intelligence 34 (04): 6148–56.