## CRL Task 3: Counterfactual Decision Making

In the previous blog post we discussed some theory of how to select optimal and possibly optimal interventions in a causal framework. For those interested in the decision science, this blog post may be more inspiring. This next task involves applying counterfactual quantities to boost learning performance. This is clearly very important for an RL agent where its entire learning mechanism is based on interventions in a system. What if intervention isn’t possible? Let’s begin!

## This Series

1. Causal Reinforcement Learning
2. Preliminaries for CRL
3. CRL Task 1: Generalised Policy Learning
4. CRL Task 2: Interventions – When and Where?
5. CRL Task 3: Counterfactual Decision Making
6. CRL Task 4: Generalisability and Robustness
7. Task 5: Learning Causal Models
8. (Coming soon) Task 6: Causal Imitation Learning
9. (Coming soon) Wrapping Up: Where To From Here?

## Counterfactual Decision Making

A key feature of causal inference is its ability to deal with counterfactual queries. Reinforcement learning, by its nature, deals with interventional quantities in a trial-and-error style of learning.…

## CRL Task 2: Interventions – When and Where?

In the previous blog post we discussed the gorey details of generalised policy learning – the first task of CRL. We went into some very detailed mathematical description of dynamic treatment regimes and generalised modes of learning for data processing agents. The next task is a bit more conceptual and focuses on the question on how to identfy optimal areas of intervention in a system. This is clearly very important for an RL agent where its entire learning mechanism is based on these very interventions in some system with a feedback mechanism. Let’s begin!

## This Series

1. Causal Reinforcement Learning
2. Preliminaries for CRL
3. CRL Task 1: Generalised Policy Learning
4. CRL Task 2: Interventions – When and Where?
5. CRL Task 3: Counterfactual Decision Making
6. CRL Task 4: Generalisability and Robustness
7. Task 5: Learning Causal Models
8. (Coming soon) Task 6: Causal Imitation Learning
9. (Coming soon) Wrapping Up: Where To From Here?

## CRL Task 1: Generalised Policy Learning

In the previous blog post we developed some ideas and theory needed to discuss a causal approach to reinforcement learning. We formalised notions of multi-armed bandits (MABs), Markov Decision Processes (MDPs), and some causal notions. In this blog post we’ll finally get to developing some causal reinforcement learning ideas. The first of which is dubbed Task 1, for CRL can help solve. This is Generalised Policy Learning. Let’s begin.

## This Series

1. Causal Reinforcement Learning
2. Preliminaries for CRL
3. CRL Task 1: Generalised Policy Learning
4. CRL Task 2: Interventions – When and Where?
5. CRL Task 3: Counterfactual Decision Making
6. CRL Task 4: Generalisability and Robustness
7. Task 5: Learning Causal Models
8. (Coming soon) Task 6: Causal Imitation Learning
9. (Coming soon) Wrapping Up: Where To From Here?

## Generalised Policy Learning

Reinforcement learning typically involves learning and optimising some policy about how to interact in an environment to maximise some reward signal. Typical reinforcement learning agents are trained in isolation, exploiting copious amounts of computing power and energy resources.…

## Preliminaries for CRL

In the previous blog post we discussed and motivated the need for a causal approach to reinforcement learning. We argued that reinforcement learning naturally falls on the interventional rung of the ladder of causation. In this blog post we’ll develop some ideas necessary for understanding the material covered in this series. This might get quite technical, but don’t worry. There is still always something to take away. Let’s begin.

## This Series

1. Causal Reinforcement Learning
2. Preliminaries for CRL
3. CRL Task 1: Generalised Policy Learning
4. CRL Task 2: Interventions – When and Where?
5. CRL Task 3: Counterfactual Decision Making
6. CRL Task 4: Generalisability and Robustness
7. Task 5: Learning Causal Models
8. (Coming soon) Task 6: Causal Imitation Learning
9. (Coming soon) Wrapping Up: Where To From Here?

## Preliminaries

As you probably recall from high school, probability and statistics are almost entirely formulated on the idea of drawing random samples from an experiment. One imagines observing realisations of outcomes from some set of possibilities when drawing from an assortment of independent and identically distributed (i.i.d.) events.…

## Causal Reinforcement Learning: A Primer

As part of any honours degree at the University of Cape Town, one is obliged to write a thesis ‘droning’ on about some topic. Luckily for me, applied mathematics can pertain to pretty much anything of interest. Lo and behold, my thesis on merging causality and reinforcement learning. This was entitled Climbing the Ladder: A Survey of Counterfactual Methods in Decision Making Processes and was supervised by Dr Jonathan Shock.

In this series of posts I will break down my thesis into digestible blog chucks and go into quite some detail of the emerging field of Causal Reinforcement Learning (CRL) – which is being spearheaded by Elias Bareinboim and Judea Pearl, among others. I will try to present this in such a way as to satisfy those craving some mathematical detail whilst also trying to paint a broader picture as to why this is generally useful and important. Each of these blog posts will be self contained in some way.…

## Basic Reverse Image Search Using an Autoencoder

Introduction

In this post we are going to create a simple reverse image search on the MNIST handwritten image dataset. That is to say, given any image, we want to return images that look most similar to it. To do this, we will use an autoencoder, trained using Tensorflow 2.

The dataset

The MNIST dataset is a commonly-used dataset in machine learning comprised of 28-by-28 images of handwritten digits between 0 and 9. For our purposes we would be interested in our image searcher returning images of the same number as the query images, i.e. if we input a 3 we want the images returned to all be 3s. However, if we had, say, four 3s and one 2 that mightn’t be too bad, considering how 2 and 3 look a bit similar. However, if we had three 3s, one 1 and a 7 we might say that the performance is not up to standard.…

## The Objective Function

In both Supervised and Unsupervised machine learning, most algorithms are centered around minimising (or, equivalently) maximising some objective function. This function is supposed to somehow represent what the model knows/can get right. Normally, as one would expect, the objective function does not always reflect exactly what we want.

The objective function presents 2 main problems: 1. how do we minimise it (the answer to this is up for debate and there is lots of interesting research about efficient optimisation of non-convex functions and 2) assuming we can minimise it perfectly, is it the correct thing to be minimising?

It is point 2 which is the focus of this post.

Let’s take the example of square-loss-linear-regression. To do so we train a linear regression model with a square loss $\mathcal{L}(\mathbf{w})=\sum_i (y_i - \mathbf{w}^Tx_i)^2$. (Where we are taking the inner product of learned weights with a vector of features for each observation to predict the outcome).…

## The Wisdom of the Crowds

This content comes primarily from the notes of Mark Herbster (contributed to by Massi Pontil and John Shawe-Taylor) of University College London.

Introduction

The Wisdom of the Crowds, or majority rule and related ideas tend to come up pretty often. Democracy is based (partly) on the majority of people being able to make the correct decision, often you might make decisions in a group of friends based on what the most people want, and it is logical to take into account popular opinion when reasoning on issues where you have imperfect information. On the other hand, of course, there is the Argumentum ad Populum fallacy which states that a popular belief isn’t necessarily true.

This is idea appears also in Applied Machine Learning – ensemble methods such as Random Forests, Gradient Boosted Models (especially XGBoost) and stacking of Neural Networks have resulted in overall more powerful models. This is especially notable in Kaggle competitions, where it is almost always an ensemble model (combination of models) that achieves the best score.…

## Automatic Differentiation

Much of this content is based on lecture slides from slides from Professor David Barber at University College London: resources relating to this can be found at: www.cs.ucl.ac.uk/staff/D.Barber/brml

What is Autodiff?

Autodiff, or Automatic Differentiation, is a method of determining the exact derivative of a function with respect to its inputs. It is widely used in machine learning- in this post I will give an overview of what autodiff is and why it is a useful tool.

The above is not a very helpful definition, so we can compare autodiff first to symbolic differentiation and numerical approximations before going into how it works.

Symbolic differentiation is what we do when we calculate derivatives when we do it by hand, i.e. given a function $f$, we find a new function $f'$. This is really good when we want to know how functions behave across all inputs. For example if we had $f(x) = x^2 + 3x + 1$ we can find the derivative as $f'(x) = 2x + 3$ and then we can find the derivative of the function for all values of $x$.…

## K-means: Intuitions, Maths and Percy Tau

Much of this content is based on lecture slides from slides from Professor David Barber at University College London: resources relating to this can be found at: www.cs.ucl.ac.uk/staff/D.Barber/brml

The K-means algorithm

The K-means algorithm is one of the simplest unsupervised learning* algorithms. The aim of the K-means algorithm is, given a set of observations $\mathbf{x}_1, \mathbf{x}_2, \dots \mathbf{x}_n$, to group these observations into K different groups in the best way possible (‘best way’ here refers to minimising a loss/cost/objective function).

This is a clustering algorithm, where we want to assign each observation to a group that has other similar observations in it. This could be useful, for example, to split Facebook users into groups that will each be shown a different advertisement.

* unsupervised learning is performed on data without labels, i.e. we have a group of data points $x_1, \dots, x_n$ (scalar or vector) and we want to find something out about how this data is structured.…

By | September 7th, 2019|English|0 Comments