In the first post we introduced inverse reinforcement learning, then we stated some result on the characterisation of admissible reward functions (i.e reward functions that solve the inverse reinforcement learning problem), then on the second post we saw a way in which we proceed with solving problems, more or less, using a maximum entropy framework, and we encountered two problems:
1. It would be hard to use the method introduced if we did not know the dynamics of the system already, and
2. We have to solve the MDP in the inner loop, which may be an expensive process.
Here, we shall attempt to mitigate the challenges that we have encountered, as before, and we shall give a rather beautiful closing which shall link concepts in this space of inverse reinforcement learning to ‘general’ machine learning structures, in particular generative adversarial networks.
Inverse Reinforcement Learning with Unknown Dynamics and Possibly Higher Dimensional Spaces
As we saw previously, the maximum entropy inverse reinforcement learning approach proceeds by defining the probability of a certain trajectory under the expert as being,
We mentioned that this is hard to compute in higher dimensional spaces.…