## CRL Task 2: Interventions – When and Where?

In the previous blog post we discussed the gory details of generalised policy learning – the first task of CRL. We gave a detailed mathematical description of dynamic treatment regimes and generalised modes of learning for data processing agents. The next task is a bit more conceptual and focuses on the question of how to identify optimal areas of intervention in a system. This is clearly very important for an RL agent, whose entire learning mechanism is based on these very interventions in some system with a feedback mechanism. Let’s begin!

## This Series

1. Causal Reinforcement Learning
2. Preliminaries for CRL
3. CRL Task 1: Generalised Policy Learning
4. CRL Task 2: Interventions – When and Where?
5. CRL Task 3: Counterfactual Decision Making
6. CRL Task 4: Generalisability and Robustness
7. (Coming soon) Task 5: Learning Causal Models
8. (Coming soon) Task 6: Causal Imitation Learning
9. (Coming soon) Wrapping Up: Where To From Here?

## CRL Task 1: Generalised Policy Learning

In the previous blog post we developed some ideas and theory needed to discuss a causal approach to reinforcement learning. We formalised notions of multi-armed bandits (MABs), Markov Decision Processes (MDPs), and some causal notions. In this blog post we’ll finally get to developing some causal reinforcement learning ideas. The first of these, dubbed Task 1, is a problem that CRL can help solve: Generalised Policy Learning. Let’s begin.

## Generalised Policy Learning

Reinforcement learning typically involves learning and optimising some policy about how to interact in an environment to maximise some reward signal.…
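
To make "learning a policy that maximises a reward signal" concrete, here is a minimal sketch on the simplest setting from the preliminaries, a multi-armed bandit. Everything in it is my own illustrative assumption rather than something from the thesis: the epsilon-greedy rule, the three Bernoulli arms with success probabilities 0.2, 0.5 and 0.8, and the function name `epsilon_greedy`.

```python
import random


def epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on a Bernoulli bandit; return (estimates, counts)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms  # running mean reward per arm
    counts = [0] * n_arms       # number of times each arm was pulled
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick an arm at random
        else:
            # exploit: pick the arm with the best estimate so far
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # incremental update of the sample mean for the chosen arm
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts


# Hypothetical arm probabilities, chosen only for illustration.
estimates, counts = epsilon_greedy([0.2, 0.5, 0.8])
print("estimated means:", [round(e, 2) for e in estimates])
print("pull counts:", counts)
```

The agent only ever sees rewards from the arms it pulls, which is exactly the interventional flavour of learning that this series keeps returning to.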

## Preliminaries for CRL

In the previous blog post we discussed and motivated the need for a causal approach to reinforcement learning. We argued that reinforcement learning naturally falls on the interventional rung of the ladder of causation. In this blog post we’ll develop some ideas necessary for understanding the material covered in this series. This might get quite technical, but don’t worry. There is still always something to take away. Let’s begin.

## Preliminaries

As you probably recall from high school, probability and statistics are almost entirely formulated on the idea of drawing random samples from an experiment. One imagines observing realisations of outcomes from some set of possibilities when drawing from an assortment of independent and identically distributed (i.i.d.) events.…

## Causal Reinforcement Learning: A Primer

As part of any honours degree at the University of Cape Town, one is obliged to write a thesis ‘droning’ on about some topic. Luckily for me, applied mathematics can pertain to pretty much anything of interest. Lo and behold, my thesis on merging causality and reinforcement learning. This was entitled Climbing the Ladder: A Survey of Counterfactual Methods in Decision Making Processes and was supervised by Dr Jonathan Shock.

In this series of posts I will break down my thesis into digestible blog chunks and go into quite some detail of the emerging field of Causal Reinforcement Learning (CRL) – which is being spearheaded by Elias Bareinboim and Judea Pearl, among others. I will try to present this in such a way as to satisfy those craving some mathematical detail whilst also trying to paint a broader picture as to why this is generally useful and important. Each of these blog posts will be self-contained in some way.…

## A simple introduction to causal inference

Introduction

Causal inference is a branch of Statistics that is increasing in popularity. This is because it allows us to answer questions in a more direct way than do other methods. Usually, we can make inference about association or correlation between a variable and an outcome of interest, but these are often subject to outside influences and may not help us answer the questions in which we are most interested.

Causal inference seeks to remedy this by measuring the effect on the outcome (or response variable) that we see when we change another variable (the ‘treatment’). In a sense, we are looking to reproduce the situation that we have when we do a designed experiment (with a ‘treated’ and a ‘control’ group). The goal here is to have groups that are otherwise the same (with regard to factors that might influence the outcome) but where one is ‘treated’ and the other is not.…

## Correlation vs Mutual Information

This post is based on a (very small) part of the (dense and technical) paper Fooled by Correlation by N.N. Taleb, found at (1)

Notes on the main ideas in this post are available from Universidad de Cantabria, found at (2)

The aims of this post are 1) to introduce mutual information as a measure of similarity and 2) to show the nonlinear relationship between correlation and information by means of a relatively simple example.

Introduction

A significant part of statistical analysis is understanding how random variables are related: how much knowledge about the value of one variable tells us about the value of another. This post will consider this issue in the context of Gaussian random variables. More specifically, we will compare correlation and mutual information, and discuss the relationship between them.

Mutual Information

The Mutual Information between 2 random variables is the amount of information that one gains about a random variable by observing the value of the other.…
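
For a feel of the nonlinear relationship mentioned in the aims, here is a small sketch. It assumes the standard closed form for a bivariate Gaussian with correlation $\rho$, namely $I(X;Y) = -\tfrac{1}{2}\ln(1-\rho^2)$ nats; the function name is my own and the values of $\rho$ are arbitrary.

```python
import math


def gaussian_mutual_information(rho):
    """Mutual information (in nats) of a bivariate Gaussian with correlation rho."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    # Equivalent to -0.5 * ln(1 - rho^2)
    return 0.5 * math.log(1.0 / (1.0 - rho ** 2))


for rho in (0.0, 0.5, 0.9, 0.99):
    print(f"rho = {rho:4.2f}  ->  I = {gaussian_mutual_information(rho):.3f} nats")
```

The information grows slowly for small correlations and very quickly as $\rho$ approaches 1, so equal steps in correlation do not correspond to equal steps in information.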

Introduction

A key consideration when analysing stratified data is how the behaviour of each category differs and how these differences might influence the overall observations about the data. For example, a data set might be split into one large category that dictates the overall behaviour, or there may be a category whose statistics differ significantly from those of the other categories and so skew the overall numbers. It is important to be aware of these features of the data, and to look for them, to prevent drawing erroneous conclusions from your analysis. Context, the source of the data and a careful analysis of the data can prevent this. Simpson’s paradox is an interesting result of some of these effects.

Simpson’s paradox is observed in statistics when a trend that appears in each of a number of different groups either disappears or reverses in the overall data.

Observing the overall data might therefore lead us to draw a conclusion, but when the data is grouped we might conclude something different.…
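
A tiny numerical sketch shows how this can happen. The counts below are made up purely for illustration (they are not from any real data set): option A has the higher success rate inside each group, yet option B has the higher success rate overall, because the two options are spread very unevenly across the groups.

```python
# (successes, trials) per group; hypothetical counts for illustration only
data = {
    "A": {"group 1": (8, 10), "group 2": (30, 100)},
    "B": {"group 1": (70, 100), "group 2": (2, 10)},
}


def rate(successes, trials):
    """Success rate within a single group."""
    return successes / trials


def overall_rate(groups):
    """Success rate after pooling all groups together."""
    total_successes = sum(s for s, _ in groups.values())
    total_trials = sum(t for _, t in groups.values())
    return total_successes / total_trials


for option, groups in data.items():
    per_group = {g: round(rate(*counts), 3) for g, counts in groups.items()}
    print(option, per_group, "overall:", round(overall_rate(groups), 3))
```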

## Automatic Differentiation

Much of this content is based on lecture slides from Professor David Barber at University College London. Resources relating to this can be found at: www.cs.ucl.ac.uk/staff/D.Barber/brml

What is Autodiff?

Autodiff, or Automatic Differentiation, is a method of determining the exact derivative of a function with respect to its inputs. It is widely used in machine learning. In this post I will give an overview of what autodiff is and why it is a useful tool.

The above is not a very helpful definition, so we can compare autodiff first to symbolic differentiation and numerical approximations before going into how it works.

Symbolic differentiation is what we do when we calculate derivatives by hand, i.e. given a function $f$, we find a new function $f'$. This is really good when we want to know how functions behave across all inputs. For example, if we had $f(x) = x^2 + 3x + 1$ we can find the derivative as $f'(x) = 2x + 3$, and then we can evaluate the derivative of the function at any value of $x$.…
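
To put the three approaches side by side on this same $f$, here is a minimal sketch. The `Dual` class is my own illustrative take on forward-mode autodiff using dual numbers, which is one common way to implement the idea and not necessarily the one in the slides; the step size `h` and the evaluation point are arbitrary choices.

```python
class Dual:
    """A number value + deriv*eps with eps**2 = 0; deriv carries the derivative."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule: (uv)' = u v' + u' v
        return Dual(self.value * other.value,
                    self.value * other.deriv + self.deriv * other.value)

    __rmul__ = __mul__


def f(x):
    return x * x + 3 * x + 1


def f_prime_by_hand(x):
    """The symbolic derivative worked out above."""
    return 2 * x + 3


def central_difference(func, x, h=1e-5):
    """Numerical approximation of the derivative."""
    return (func(x + h) - func(x - h)) / (2 * h)


x0 = 2.0
autodiff = f(Dual(x0, 1.0)).deriv  # seed dx/dx = 1 and read off df/dx
print("by hand:  ", f_prime_by_hand(x0))
print("numerical:", central_difference(f, x0))
print("autodiff: ", autodiff)
```

The autodiff value matches the hand-derived one without ever writing down $f'$, while the numerical value is only an approximation that depends on `h`.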

## Captain Raymond Holt vs Claude Shannon

Overview

In this post I am going to introduce a pretty famous riddle, made popular recently by the police sitcom Brooklyn Nine-Nine as well as the idea of the entropy of a probability distribution, made popular by Claude Shannon. Then I am going to go through a solution that is presented in Information Theory, Inference, and Learning Algorithms (2), a brilliant book on the topic by the late David MacKay, as well as some intuitions from his lecture series on the topic. Hopefully, by the end of it, you will be familiar with another property of a probability distribution and be able to impress your friends with your riddle-solving abilities.

The Riddle

The riddle is presented by Captain Holt (pictured above) to his team of detectives as follows (1):

‘There are 12 men on an island, 11 weigh exactly the same amount, but 1 of them is slightly lighter or heavier: you must figure which.* The island has no scales, but there is a see-saw.
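
Before any clever strategy, entropy already gives a rough counting argument. The sketch below assumes every "which man, lighter or heavier" possibility is equally likely (my assumption, not part of the riddle as quoted) and that a see-saw has three possible outcomes: left side down, right side down, or balanced.

```python
import math


def entropy_bits(probabilities):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


n_hypotheses = 12 * 2  # which man is the odd one, and is he lighter or heavier
prior = [1 / n_hypotheses] * n_hypotheses  # assumed uniform over all hypotheses

uncertainty = entropy_bits(prior)      # equals log2(24)
max_bits_per_use = math.log2(3)        # a three-outcome see-saw reading
min_uses = math.ceil(uncertainty / max_bits_per_use)

print(f"initial uncertainty: {uncertainty:.3f} bits")
print(f"at most {max_bits_per_use:.3f} bits per use of the see-saw")
print("lower bound on the number of uses:", min_uses)
```

This is only a lower bound: it says fewer uses cannot possibly be enough, not that a strategy achieving the bound exists.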

October 23rd, 2019 | English, Fun

The two types of averages that we usually focus on are the sample mean from a set of data and the expectation that comes from a probability distribution. For example, if three men weigh 70kg, 80kg, and 90kg respectively, then the sample mean of their weight is $\bar x = \frac{70+80+90}{3} = 80$kg. Alternatively, if we say that the arrival times of trains are exponentially distributed with parameter $\lambda = 3$, we can use the properties of the exponential distribution to find the mean (or expectation). In this case the mean is $\mu = \frac{1}{\lambda} = \frac{1}{3}$.
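
Both kinds of average are easy to check numerically. This is a small sketch using only the Python standard library; the number of simulated draws and the seed are arbitrary choices of mine.

```python
import random
import statistics

# Sample mean of the three weights from the example above
weights_kg = [70, 80, 90]
sample_mean = statistics.mean(weights_kg)

# Expectation of an exponential distribution with rate lambda = 3,
# compared against the average of many simulated draws
rate = 3.0
rng = random.Random(0)
draws = [rng.expovariate(rate) for _ in range(200_000)]
simulated_mean = statistics.fmean(draws)
theoretical_mean = 1 / rate

print("sample mean of weights:", sample_mean, "kg")
print("theoretical exponential mean:", round(theoretical_mean, 4))
print("simulated exponential mean:  ", round(simulated_mean, 4))
```

The simulated mean is itself a sample mean, so it only approaches $\frac{1}{\lambda}$ as the number of draws grows.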