When Animals Dream: The Hidden World of Animal Consciousness, by David M. Peña-Guzmán – A review

NB. I was sent this book as a review copy.

From Princeton University Press

The links between dreams, consciousness and memory are absolutely fascinating. I had imagined that the insight that we could get into animal dreaming was limited to dogs running in their sleep and waking themselves up by banging, violently into a wall. However, there is so much more research on this topic than I was aware of. The signals that octopuses can provide to us as to their thoughts through their colouration, shape and texture is incredibly rich, and an entire narrative can seemingly be read off from these visual clues while they sleep. David clearly has some serious frustration with researchers who don’t want to make the leap to the conclusion that this is really dreaming in the way that we know it, but could simply be the animal running through stereotyped behaviour simulations, in an unconscious way.…

By | December 13th, 2023|Book reviews, Reviews, Uncategorized|2 Comments

CRL Task 5: Learning Causal Models

We’ve now come to one of the most vital aspects of this theory – how can we learn causal models? Learning models is often an exceptionally computationally intensive process, so getting this right is crucial. We now develop some mathematical results which guarantee bounds on our learning. We’ll start by discussing the current state of this field in relation to causal inference and reinforcement learning.

This Series

  1. Causal Reinforcement Learning
  2. Preliminaries for CRL
  3. CRL Task 1: Generalised Policy Learning
  4. CRL Task 2: Interventions – When and Where?
  5. CRL Task 3: Counterfactual Decision Making
  6. CRL Task 4: Generalisability and Robustness
  7. Task 5: Learning Causal Models
  8. (Coming soon) Task 6: Causal Imitation Learning
  9. (Coming soon) Wrapping Up: Where To From Here?

Learning Causal Models

Perhaps one of the most computationally difficult processes in the field of causal inference is that of learning underlying causal structure by algorithmically identifying cause-effect relationships. In recent years there has been a surge of interest in learning such relationships in the fields of machine learning and artificial intelligence, though it has been relatively prevalent in the social sciences for many years now (e.g.…

By | September 19th, 2021|Uncategorized|0 Comments

CRL Task 3: Counterfactual Decision Making

In the previous blog post we discussed some theory of how to select optimal and possibly optimal interventions in a causal framework. For those interested in the decision science, this blog post may be more inspiring. This next task involves applying counterfactual quantities to boost learning performance. This is clearly very important for an RL agent where its entire learning mechanism is based on interventions in a system. What if intervention isn’t possible? Let’s begin!

This Series

  1. Causal Reinforcement Learning
  2. Preliminaries for CRL
  3. CRL Task 1: Generalised Policy Learning
  4. CRL Task 2: Interventions – When and Where?
  5. CRL Task 3: Counterfactual Decision Making
  6. CRL Task 4: Generalisability and Robustness
  7. Task 5: Learning Causal Models
  8. (Coming soon) Task 6: Causal Imitation Learning
  9. (Coming soon) Wrapping Up: Where To From Here?

Counterfactual Decision Making

A key feature of causal inference is its ability to deal with counterfactual queries. Reinforcement learning, by its nature, deals with interventional quantities in a trial-and-error style of learning.…

By | July 10th, 2021|English, Level: intermediate, Uncategorized|6 Comments

A challenging limit

This post comes mostly from the youtube video by BlackPenRedPen found here: https://www.youtube.com/watch?v=89d5f8WUf1Y&t=3s

This in turn comes from Brilliant.com – details and links can be found in the original video

In this post we will have a look at a complicated-looking limit that has an interesting solution. Here it is:

\lim_{n \rightarrow \infty} ( \frac{n!}{n^n})^{\frac{1}{n}}

This looks pretty daunting – but we will break the solution down into sections:

  • taking the logarithms and rearranging
  • recognising something familiar
  • finding the numerical value


Step 1: Taking the Logarithm

The first step here is to take the logarithm, a generally useful trick when applying limits. First we assign the variable L to the limit (so that we can solve for it in the end). Now lets do some algebra:

L = \lim_{n \rightarrow \infty} ( \frac{n!}{n^n})^{\frac{1}{n}}

\ln(L) = \ln(\lim_{n \rightarrow \infty} ( \frac{n!}{n^n})^{\frac{1}{n}})

Noting that the natural logarithm \ln is a continuous function and therefore we can take the limit outside of the function:

\ln(L) =  \lim_{n \rightarrow \infty} \ln( (\frac{n!}{n^n})^{\frac{1}{n}})

Next we can use the logarithm laws to bring down the exponent:

\ln(L) =  \lim_{n \rightarrow \infty}  \frac{1}{n} \ln(\frac{n!}{n^n})

Alright, now we have taken the logarithm, step 1 is complete.…

By | November 29th, 2020|MAM1000, Uncategorized|0 Comments

Parrondos Paradox


In this post we will have a look at Parrondos paradox. In a paper* entitled “Information Entropy and Parrondo’s Discrete-Time Ratchet”** the authors demonstrate a situation where, by switching between 2 losing strategies, we can create a winning strategy.


The setup to this paradox is as follows:

We have 2 games that we can play – if we win we get 1 unit of wealth, if we lose, it costs 1 unit of wealth. Game A gives us a payout of 1 with a probability of slightly less than 0.5. Clearly if we play this game for long enough we will end up losing.

Game B is a little more complicated in that it is defined with reference to our existing winnings. If our current level of wealth is a multiple of M we play a game where the probability of winning is slightly less than 0.1. If it is not a multiple of M, the probability of winning is slightly less than 0.75.…

By | November 11th, 2020|Uncategorized|0 Comments

Basic Reverse Image Search Using an Autoencoder


In this post we are going to create a simple reverse image search on the MNIST handwritten image dataset. That is to say, given any image, we want to return images that look most similar to it. To do this, we will use an autoencoder, trained using Tensorflow 2.

The dataset

The MNIST dataset is a commonly-used dataset in machine learning comprised of 28-by-28 images of handwritten digits between 0 and 9. For our purposes we would be interested in our image searcher returning images of the same number as the query images, i.e. if we input a 3 we want the images returned to all be 3s. However, if we had, say, four 3s and one 2 that mightn’t be too bad, considering how 2 and 3 look a bit similar. However, if we had three 3s, one 1 and a 7 we might say that the performance is not up to standard.…

By | October 21st, 2020|Uncategorized|0 Comments

A simple introduction to causal inference



Causal inference is a branch of Statistics that is increasing in popularity. This is because it allows us to answer questions in a more direct way than do other methods. Usually, we can make inference about association or correlation between a variable and an outcome of interest, but these are often subject to outside influences and may not help us answer the questions in which we are most interested.

Causal inference seeks to remedy this by measuring the effect on the outcome (or response variable) that we see when we change another variable (the ‘treatment’). In a sense, we are looking to reproduce the situation that we have when we do an designed experiment (with a ‘treated’ and a ‘control’ group). The goal here is to have groups that are otherwise the same (with regard to factors that might influence the outcome) but where one is ‘treated’ and the other is not.…

By | August 20th, 2020|English, Uncategorized|0 Comments

Inverse Reinforcement Learning: Guided Cost Learning and Links to Generative Adversarial Networks


In the first post we introduced inverse reinforcement learning, then we stated some result on the characterisation of admissible reward functions (i.e reward functions that solve the inverse reinforcement learning problem), then on the second post we saw a way in which we proceed with solving problems, more or less, using a maximum entropy framework, and we encountered two problems:
1. It would be hard to use the method introduced if we did not know the dynamics of the system already, and
2. We have to solve the MDP in the inner loop, which may be an expensive process.

Here, we shall attempt to mitigate the challenges that we have encountered, as before, and we shall give a rather beautiful closing which shall link concepts in this space of inverse reinforcement learning to ‘general’ machine learning structures, in particular generative adversarial networks.

Inverse Reinforcement Learning with Unknown Dynamics and Possibly Higher Dimensional Spaces

As we saw previously, the maximum entropy inverse reinforcement learning approach proceeds by defining the probability of a certain trajectory under the expert as being,

p(\tau)=\dfrac{1}{Z}e^{R_\psi (\tau)},


Z=\int e^{R_\psi(\tau)}d \tau.

We mentioned that this is hard to compute in higher dimensional spaces.…

By | May 28th, 2020|Uncategorized|0 Comments

Maximum Entropy Inverse Reinforcement Learning: Algorithms and Computation

In the previous post we introduced inverse reinforcement learning. We defined the problem that is associated with this field, which is that of reconstructing a reward function given a set of demonstrations, and we saw what the ability to do this implies. In addition to this, we also saw came across some classification results as well as convergence guarantees from selected methods that were simply referred to in the post. There were some challenges with the classification results that we discussed, and although there were attempts to deal with these, there is still quite a lot that we did not talk about.

Maximum Entropy Inverse Reinforcement Learning

We shall now introduce a probabilistic approach based on what is known as the principle of maximum entropy, and this provides a well defined globally normalised distribution over decision sequences, while providing the same performance assurances as previously mentioned methods. This probabilistic approach allows moderate reasoning about uncertainty in the setting inverse reinforcement learning, and the assumptions further limits the space in which we search for solutions which we saw, last time, was quite massive.…

By | May 22nd, 2020|Uncategorized|0 Comments

Inverse Reinforcement Learning: The general basics

Standard Reinforcement Learning

The very basic ideas in Reinforcement Learning are usually defined in the context of Markov Decision Processes. For everything that follows, unless stated otherwise, assume that the structures are finite.

A Markov Decision Process (MDP) is a tuple (S,A, P, \gamma, R) where the following is true:
1. S is the set of states s_k with k\in \mathbb{N} .
2. A is the set of actions a_k with k\in \mathbb{N} .
3. P is the matrix of transition probabilities for taking action a_k given state s_j.
4. \gamma is the discount factor in the unit interval.
5. R is defined as the reward function, and is taken as a function from A\times S\to \mathbb{R}.

In this context, we have policies as maps

\pi:S\to A ,

state value functions for a policy, \pi , evaluated at s_1 as

V^\pi(s_1)=\mathbb{E}[\sum_{i=0}\gamma ^i R(s_i)|\pi],

and state action values defined as

Q^\pi (s,a)=R(s)+\gamma \mathbb{E}_{s'\sim P_{sa}}[V^\pi (s')].

The optimal functions are defined as

V^*(s)=\sup_\pi V^{\pi}(s),


Q^*(s,a)=\sup_\pi Q^\pi (s,a).

Here we assume that we have a reward function, and this reward function is used to determine an optimal policy.

By | May 17th, 2020|Uncategorized|0 Comments