## CRL Task 1: Generalised Policy Learning

In the previous blog post we developed some of the ideas and theory needed to discuss a causal approach to reinforcement learning. We formalised multi-armed bandits (MABs), Markov Decision Processes (MDPs), and several causal notions. In this post we'll finally start developing causal reinforcement learning (CRL) ideas, beginning with *Task 1* of the tasks that CRL can help solve: *Generalised Policy Learning*. Let's begin.

## This Series

- Causal Reinforcement Learning
- Preliminaries for CRL
- CRL Task 1: Generalised Policy Learning
- CRL Task 2: Interventions – When and Where?
- CRL Task 3: Counterfactual Decision Making
- CRL Task 4: Generalisability and Robustness
- Task 5: Learning Causal Models *(Coming soon)*
- Task 6: Causal Imitation Learning *(Coming soon)*
- Wrapping Up: Where To From Here?

## Generalised Policy Learning

Reinforcement learning typically involves learning and optimising a policy for how to act in an environment so as to maximise a reward signal. Typical reinforcement learning agents are trained in isolation, consuming copious amounts of computing power and energy.…
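To make "learning a policy to maximise a reward signal" concrete, here is a minimal sketch of the standard, non-causal setting on a multi-armed bandit, the simplest case formalised in the previous post. It assumes Bernoulli rewards and uses an ε-greedy rule; the function name `epsilon_greedy_bandit` and the arm probabilities are hypothetical and chosen only for illustration. This is not the CRL approach the series develops, just the conventional baseline.

```python
import random


def epsilon_greedy_bandit(arm_means, steps=5000, epsilon=0.1, seed=0):
    """Learn a greedy policy over the arms of a Bernoulli multi-armed bandit.

    arm_means: hypothetical success probability of each arm (hidden from the agent).
    Returns the agent's estimated value of each arm and the total reward collected.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms    # number of times each arm has been pulled
    values = [0.0] * n_arms  # running sample-mean reward for each arm
    total_reward = 0

    for _ in range(steps):
        # Explore with probability epsilon, otherwise exploit the best current estimate.
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)
        else:
            arm = max(range(n_arms), key=lambda a: values[a])

        # The environment returns a Bernoulli reward for the chosen arm.
        reward = 1 if rng.random() < arm_means[arm] else 0
        total_reward += reward

        # Incremental update of the sample-mean estimate for the pulled arm.
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]

    return values, total_reward


values, total = epsilon_greedy_bandit([0.2, 0.5, 0.8])
best_arm = max(range(len(values)), key=lambda a: values[a])
print(best_arm, total)
```

The agent starts with no prior knowledge and must spend pulls exploring every arm before it can exploit the best one. This isolated, from-scratch style of learning is exactly the setting described above.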