I am a Mathematics student at the University of Cape Town.

## Inverse Reinforcement Learning: Guided Cost Learning and Links to Generative Adversarial Networks

Recap

In the first post we introduced inverse reinforcement learning, then we stated some result on the characterisation of admissible reward functions (i.e reward functions that solve the inverse reinforcement learning problem), then on the second post we saw a way in which we proceed with solving problems, more or less, using a maximum entropy framework, and we encountered two problems:
1. It would be hard to use the method introduced if we did not know the dynamics of the system already, and
2. We have to solve the MDP in the inner loop, which may be an expensive process.

Here, we shall attempt to mitigate the challenges that we have encountered, as before, and we shall give a rather beautiful closing which shall link concepts in this space of inverse reinforcement learning to ‘general’ machine learning structures, in particular generative adversarial networks.

Inverse Reinforcement Learning with Unknown Dynamics and Possibly Higher Dimensional Spaces

As we saw previously, the maximum entropy inverse reinforcement learning approach proceeds by defining the probability of a certain trajectory under the expert as being,

$p(\tau)=\dfrac{1}{Z}e^{R_\psi (\tau)},$

where

$Z=\int e^{R_\psi(\tau)}d \tau.$

We mentioned that this is hard to compute in higher dimensional spaces.…

## Maximum Entropy Inverse Reinforcement Learning: Algorithms and Computation

In the previous post we introduced inverse reinforcement learning. We defined the problem that is associated with this field, which is that of reconstructing a reward function given a set of demonstrations, and we saw what the ability to do this implies. In addition to this, we also saw came across some classification results as well as convergence guarantees from selected methods that were simply referred to in the post. There were some challenges with the classification results that we discussed, and although there were attempts to deal with these, there is still quite a lot that we did not talk about.

Maximum Entropy Inverse Reinforcement Learning

We shall now introduce a probabilistic approach based on what is known as the principle of maximum entropy, and this provides a well defined globally normalised distribution over decision sequences, while providing the same performance assurances as previously mentioned methods. This probabilistic approach allows moderate reasoning about uncertainty in the setting inverse reinforcement learning, and the assumptions further limits the space in which we search for solutions which we saw, last time, was quite massive.…

## Inverse Reinforcement Learning: The general basics

Standard Reinforcement Learning

The very basic ideas in Reinforcement Learning are usually defined in the context of Markov Decision Processes. For everything that follows, unless stated otherwise, assume that the structures are finite.

A Markov Decision Process (MDP) is a tuple $(S,A, P, \gamma, R)$ where the following is true:
1. $S$ is the set of states $s_k$ with $k\in \mathbb{N}$.
2. $A$ is the set of actions $a_k$ with $k\in \mathbb{N}$.
3. $P$ is the matrix of transition probabilities for taking action $a_k$ given state $s_j$.
4. $\gamma$ is the discount factor in the unit interval.
5. $R$ is defined as the reward function, and is taken as a function from $A\times S\to \mathbb{R}$.

In this context, we have policies as maps

$\pi:S\to A$,

state value functions for a policy, $\pi$, evaluated at $s_1$ as

$V^\pi(s_1)=\mathbb{E}[\sum_{i=0}\gamma ^i R(s_i)|\pi]$,

and state action values defined as

$Q^\pi (s,a)=R(s)+\gamma \mathbb{E}_{s'\sim P_{sa}}[V^\pi (s')]$.

The optimal functions are defined as

$V^*(s)=\sup_\pi V^{\pi}(s),$

and

$Q^*(s,a)=\sup_\pi Q^\pi (s,a).$

Here we assume that we have a reward function, and this reward function is used to determine an optimal policy.

## The Res-Net-NODE Narrative

Humble Beginnings: Ordinary Differential Equations

The story begins with differential equations. Consider $f$ such that $f:[0,T]\times \mathbb{R}^n\to \mathbb{R}^n$ is a continuous function. We can construct a rather simple differential equation given this in the following way. We let

$\begin{cases} {y'(t)}=f(t,y(t))\\ y(0)=y_0\in \mathbb{R}^n \end{cases}$

A solution to this system is a continuous map that is defined in the neighbourhood of $t=0$ such that this map satisfies the differential equation.

Ordinary differential equations are well-studied, and we know that, for example, a solution to the given differential equation will exist whenever the function $f$ satisfies the following:

$(\forall x,y\in \mathbb{R}^n)(\exists C>0 (\in \mathbb{R}))(||f(t,y)-f(t,x)||\leq C||y-x||)$

This property is known as Lipschitz continuity. A function that satisfies this condition is said to be Lipschitz. We shall see that whenever we require this condition, on up coming situations, wonderful things happen!

A remarkable field that almost always couples with differential equations is numerical analysis, where we learn to solve differential equations numerically and we study these numerical schemes. We shall explore numerical integration briefly.…

By | March 18th, 2020|Uncategorized|Comments Off on The Res-Net-NODE Narrative

## Scaled Reinforcement Learning: A Brief Introduction to Deep Q-Learning

This blog post is a direct translation of a talk that was given by the author on the 17th of February 2020. The ideas was to very briefly introduce Deep Q-Learning to an audience that was familiar with the fundamental concepts of reinforcement learning. If the person reading this is not familiar with these basics, then a very great introduction can be found here: An Introduction to Reinforcement Learning. Without the additional details from the talk, one will note that this post is rather brief, and should really be used as a tool to gain an overview for the method or a gateway to relevant resources. This will not the case for posts later in the series, because the intention is to deal more with the mathematical aspect of reinforcement learning.

Basic Reinforcement Learning Notions

The idea behind reinforcement learning is that there is an agent that interacts with the environment in order to achieve a certain task.…

## Review: Calculus Reordered

Book title: Calculus Reordered: A History of the Big Ideas
Author : David M. Bressoud

Princeton University Press
Link to the book: Calculus Reordered: A History of the Big Ideas

Discussions on the history of different fields are usually dry, wordy and generally, when you are studying the field, hard to read. This is because they are usually geared towards the general audience, and in doing so most authors tend to strip away the very exciting technical details. I expected the same treatment from the author, but I was pleasantly surprised.

The book contains $5$ chapters, which are the following:

1) Accumulations
2) Ratios of Change
3) Sequences of Partial Sums
4) The Algebra of Inequalities
5) Analysis

Each of these chapters has a central theme that is being covered, but they are not at all disjoint. For instance, the last three contain the history of concepts that would normally be found in a first course for Real Analysis, while the first two are essentially the more applied spectrum to serve as some form of motivation for going through all this trouble, although they can certainly stand on their own.…

## Linear Algebra for the Memes

I recently saw a post on Quora asking what people generally find exciting about Linear Algebra, and it really took me back, since Linear Algebra was the first thing in the more modern part of mathematics that I fell in love with, thanks to Dr Erwin. I decided to write a Mathemafrica post on concepts that I believe are foundational in Linear Algebra, or at least concepts whose beauty almost gets me in tears (of course this is only a really small part of what you would expect to see in a proper first Linear Algebra course). I did my best to keep it as fluffy as I saw necessary. I hope you will find some beauty as well in the content. If not, then maybe it will be useful for the memes. The post is incomplete as it stands. It has been suggested that this can be made more accessible to a wider audience than as it stands by possibly building up on it, so I shall work on that, but for now, enjoy this!

## Basics of Vector Representation

Not so long ago, I started reading some linear algebra, just out of interest. I was uncertain about whether or not I would understand the concepts, or if it would be worth it to go through all the trouble. I can now say that it was worth it. Honestly, it was the most frustrating, but at the same time rewarding, experience. I have come to realise that there are things that we often have to accept without knowing the beauty of the logic behind their existence, and the idea presented here is one of them. This post answers a simple question about vector notation.

You might have asked yourself at some point in your life (… or maybe you haven’t, but you should): Why is it “legal” to write a vector,$A$, as ${ A=(a_{1},a_{2},\ldots,a_{n}) }$, and why can we switch between different notations without finding trouble (for example, we can represent the vector in the form: ${A = \sum\ a_ia^*_i}$ ) ?…

## Counting to Infinity

I have decided to share something which I found interesting while reading up some Mathematics this holiday.

The idea I am going to talk about is that of the cardinality of a set. In simple terms,

Definition 1: Cardinality is a “measure  of the size” of a set.

Example:
Suppose that we have a set, $A$, such that $A=\{a,b,c,d,e\}$. The cardinality of the set, denoted by $|A|$, is $5$, because there are $5$ elements.
It is indeed worth noting that unlike ‘lists’, in Mathematics, order and number of elements doesn’t determine much concerning the identity  of a set. This means that

$\{a,b,c\}=\{a,a,a,b,c\}=\{a,b,b,b,c,c\}=\{a,...,b,...,c, ...\}$

as long as you use the same elements, they are all equal. Of course, cardinalities would be the same, because all of them are a representation of the same mathematical entity.

Let’s  talk about something slightly more interesting. Suppose that you have another set with infinitely many elements, like the set of natural numbers, $N$, or real numbers $\mathbb{R}$.…