## Inverse Reinforcement Learning: Guided Cost Learning and Links to Generative Adversarial Networks

Recap

In the first post we introduced inverse reinforcement learning, then we stated some result on the characterisation of admissible reward functions (i.e reward functions that solve the inverse reinforcement learning problem), then on the second post we saw a way in which we proceed with solving problems, more or less, using a maximum entropy framework, and we encountered two problems:
1. It would be hard to use the method introduced if we did not know the dynamics of the system already, and
2. We have to solve the MDP in the inner loop, which may be an expensive process.

Here, we shall attempt to mitigate the challenges that we have encountered, as before, and we shall give a rather beautiful closing which shall link concepts in this space of inverse reinforcement learning to ‘general’ machine learning structures, in particular generative adversarial networks.

Inverse Reinforcement Learning with Unknown Dynamics and Possibly Higher Dimensional Spaces

As we saw previously, the maximum entropy inverse reinforcement learning approach proceeds by defining the probability of a certain trajectory under the expert as being,

$p(\tau)=\dfrac{1}{Z}e^{R_\psi (\tau)},$

where

$Z=\int e^{R_\psi(\tau)}d \tau.$

We mentioned that this is hard to compute in higher dimensional spaces.…

## Maximum Entropy Inverse Reinforcement Learning: Algorithms and Computation

In the previous post we introduced inverse reinforcement learning. We defined the problem that is associated with this field, which is that of reconstructing a reward function given a set of demonstrations, and we saw what the ability to do this implies. In addition to this, we also saw came across some classification results as well as convergence guarantees from selected methods that were simply referred to in the post. There were some challenges with the classification results that we discussed, and although there were attempts to deal with these, there is still quite a lot that we did not talk about.

Maximum Entropy Inverse Reinforcement Learning

We shall now introduce a probabilistic approach based on what is known as the principle of maximum entropy, and this provides a well defined globally normalised distribution over decision sequences, while providing the same performance assurances as previously mentioned methods. This probabilistic approach allows moderate reasoning about uncertainty in the setting inverse reinforcement learning, and the assumptions further limits the space in which we search for solutions which we saw, last time, was quite massive.…

## Inverse Reinforcement Learning: The general basics

Standard Reinforcement Learning

The very basic ideas in Reinforcement Learning are usually defined in the context of Markov Decision Processes. For everything that follows, unless stated otherwise, assume that the structures are finite.

A Markov Decision Process (MDP) is a tuple $(S,A, P, \gamma, R)$ where the following is true:
1. $S$ is the set of states $s_k$ with $k\in \mathbb{N}$.
2. $A$ is the set of actions $a_k$ with $k\in \mathbb{N}$.
3. $P$ is the matrix of transition probabilities for taking action $a_k$ given state $s_j$.
4. $\gamma$ is the discount factor in the unit interval.
5. $R$ is defined as the reward function, and is taken as a function from $A\times S\to \mathbb{R}$.

In this context, we have policies as maps

$\pi:S\to A$,

state value functions for a policy, $\pi$, evaluated at $s_1$ as

$V^\pi(s_1)=\mathbb{E}[\sum_{i=0}\gamma ^i R(s_i)|\pi]$,

and state action values defined as

$Q^\pi (s,a)=R(s)+\gamma \mathbb{E}_{s'\sim P_{sa}}[V^\pi (s')]$.

The optimal functions are defined as

$V^*(s)=\sup_\pi V^{\pi}(s),$

and

$Q^*(s,a)=\sup_\pi Q^\pi (s,a).$

Here we assume that we have a reward function, and this reward function is used to determine an optimal policy.

## Correlation vs Mutual Information

This post is based on a (very small) part of the (dense and technical) paper Fooled by Correlation by N.N. Taleb, found at (1)

Notes on the main ideas in this post are available from Universidad de Cantabria, found at (2)

The aims of this post are to 1) introduce mutual information as a measure of similarity and 2) to show the nonlinear relationship between correlation and information my means of a relatively simple example

Introduction

A significant part of Statistical analysis is understanding how random variables are related – how much knowledge about the value of one variable tells us about the value of another. This post will consider this issue in the context of Gaussian random variables. More specifically, we will compare- and discuss the relationship between- correlation and mutual information.

Mutual Information

The Mutual Information between 2 random variables is the amount of information that one gains about a random variable by observing the value of the other.…

## The Res-Net-NODE Narrative

Humble Beginnings: Ordinary Differential Equations

The story begins with differential equations. Consider $f$ such that $f:[0,T]\times \mathbb{R}^n\to \mathbb{R}^n$ is a continuous function. We can construct a rather simple differential equation given this in the following way. We let

$\begin{cases} {y'(t)}=f(t,y(t))\\ y(0)=y_0\in \mathbb{R}^n \end{cases}$

A solution to this system is a continuous map that is defined in the neighbourhood of $t=0$ such that this map satisfies the differential equation.

Ordinary differential equations are well-studied, and we know that, for example, a solution to the given differential equation will exist whenever the function $f$ satisfies the following:

$(\forall x,y\in \mathbb{R}^n)(\exists C>0 (\in \mathbb{R}))(||f(t,y)-f(t,x)||\leq C||y-x||)$

This property is known as Lipschitz continuity. A function that satisfies this condition is said to be Lipschitz. We shall see that whenever we require this condition, on up coming situations, wonderful things happen!

A remarkable field that almost always couples with differential equations is numerical analysis, where we learn to solve differential equations numerically and we study these numerical schemes. We shall explore numerical integration briefly.…

By | March 18th, 2020|Uncategorized|Comments Off on The Res-Net-NODE Narrative

## Scaled Reinforcement Learning: A Brief Introduction to Deep Q-Learning

This blog post is a direct translation of a talk that was given by the author on the 17th of February 2020. The ideas was to very briefly introduce Deep Q-Learning to an audience that was familiar with the fundamental concepts of reinforcement learning. If the person reading this is not familiar with these basics, then a very great introduction can be found here: An Introduction to Reinforcement Learning. Without the additional details from the talk, one will note that this post is rather brief, and should really be used as a tool to gain an overview for the method or a gateway to relevant resources. This will not the case for posts later in the series, because the intention is to deal more with the mathematical aspect of reinforcement learning.

Basic Reinforcement Learning Notions

The idea behind reinforcement learning is that there is an agent that interacts with the environment in order to achieve a certain task.…

## Curves for the Mathematically Curious – an anthology of the unpredictable, historical, beautiful and romantic, by Julian Havil – a review

NB I was sent this book as a review copy.

What a beautiful idea. What a beautiful book! In studying mathematics, one comes across various different curves while studying calculus, or number theory, or geometry in various forms and they are asides of the particular subject. The idea however of flipping the script and looking at curves themselves and from them gaining insight into: statistics, combinatorics, number theory, analysis, cryptography, fractals, Fourier series, axiomatic set theory and so much more is just wonderful.

This book looks at ten carefully chosen curves and from them shows how much insight one can get into vast swathes of mathematics and mathematical history. The curves chosen are:

1. The Euler Spiral – an elegant spiral which leads to many other interesting parametrically defined curves
2. The Weierstrass Curve – an everywhere continuous but nowhere differentiable function
3. Bezier Curves – which show up in computer graphics and beyond
4. The Rectangular Hyperbola – which leads to the investigation of logarithms and exponentials
5. The Quadratrix of Hippies – which are tightly linked to the impossible problems of antiquity
6. Peano’s Function and Hilbert’s Curve – space filling curves which lead to a completely flipped understanding of the possibilities of infinitely thin lines
7. Curves of Constant Width – curves which can perfectly fit down a hallway as they rotate.

## The Objective Function

In both Supervised and Unsupervised machine learning, most algorithms are centered around minimising (or, equivalently) maximising some objective function. This function is supposed to somehow represent what the model knows/can get right. Normally, as one would expect, the objective function does not always reflect exactly what we want.

The objective function presents 2 main problems: 1. how do we minimise it (the answer to this is up for debate and there is lots of interesting research about efficient optimisation of non-convex functions and 2) assuming we can minimise it perfectly, is it the correct thing to be minimising?

It is point 2 which is the focus of this post.

Let’s take the example of square-loss-linear-regression. To do so we train a linear regression model with a square loss $\mathcal{L}(\mathbf{w})=\sum_i (y_i - \mathbf{w}^Tx_i)^2$. (Where we are taking the inner product of learned weights with a vector of features for each observation to predict the outcome).…

## Tales of Impossibility – The 2000 year quest to solve the mathematical problems of antiquity, by David S. Richeson – a review

NB I was sent this book as a review copy.

Four impossible puzzles, all described in detail during the height of classical Greek Mathematics. All simple to define and yet so tempting that it has taken not only the brain power of many, many thousands of mathematicians (amateur and professional alike), but also two millennia to show that however hard you may try, these puzzles are just not possible. The puzzles are:

• Squaring the circle: With only a compass and a straight edge, draw a square with the same area as that of a given circle.
• Doubling the cube: With only a compass and a straight edge, draw the edge of a cube with volume twice that of a cube whose edge is given.
• Constructing regular polygons: Given a compass and a straight edge, construct a regular n-gon in a given circle for $n\ge 3$.
• Trisecting an angle: Given a compass and a straight edge, and a given angle, construct an angle that is one third of the original.

## What is mathematics?

Below you find some thoughts on this wide question, I encourage you to think about. What is your vision of mathematics? It will be most probably the result of your own experience with the subject, traumas that happened along the way and realizing that, could make you more conscious about your relationship with the subject and the walls you might have built against the subject or part of the subject. In a sense, by understanding the bias and blockage, by objectively thinking about its value, you could allow yourself to be able to equip yourself with the full set of skills mathematics gives you to build your own greatest life.

Many of thoughts in the following around this topic are taken from the paper I recommend to read: Teaching and Learning “What is Mathematics and why we should ask, where one should experience and learn that and how to teach it”, by Gunter M.…