While reinforcement learning has recently achieved success in many challenging domains, these methods generally require millions of samples from the environment to learn optimal behaviours, limiting their real-world applicability. A major challenge, then, is designing sample-efficient agents that can transfer their existing knowledge to solve new tasks quickly. This is particularly important for agents in a multitask or lifelong setting, since learning to solve complex tasks from scratch is typically impractical. This talk discusses how an agent can autonomously learn a particular family of goal-oriented value functions, which can then be used to solve new tasks through composition. First, for goal-reaching tasks expressible as Boolean sentences, we show how these value functions can be combined to solve such tasks optimally. We further demonstrate how an agent can combine these value functions to produce near-optimal behaviours given complex temporal task specifications, such as regular fragments of linear temporal logic, without further learning.
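As a rough illustration of the composition idea (this is a minimal sketch, not the speaker's implementation): given pre-learned action-value tables for two goal-reaching tasks, conjunction and disjunction of the tasks can be approximated by elementwise minimum and maximum over the tables, and the agent then acts greedily with respect to the composed table. The table shapes and values below are invented for the example.

```python
import numpy as np

# Hypothetical pre-learned Q-tables for tasks A and B,
# each of shape (num_states, num_actions).
rng = np.random.default_rng(0)
Q_a = rng.random((5, 4))
Q_b = rng.random((5, 4))

def compose_and(q1, q2):
    """Conjunction (A AND B): elementwise minimum of the value tables."""
    return np.minimum(q1, q2)

def compose_or(q1, q2):
    """Disjunction (A OR B): elementwise maximum of the value tables."""
    return np.maximum(q1, q2)

Q_and = compose_and(Q_a, Q_b)
Q_or = compose_or(Q_a, Q_b)

# Acting greedily on the composed table gives behaviour for the new
# task with no further learning.
policy_and = Q_and.argmax(axis=1)
```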
Steven James is a senior lecturer at the University of the Witwatersrand, South Africa. He received his PhD from the same institution in 2021, where he was also the first African recipient of a Google PhD Fellowship in machine learning. As co-PI of the RAIL lab, his interests revolve around reinforcement learning and planning, with a specific focus on creating agents capable of learning and reusing knowledge over their lifetimes to solve as many tasks as possible.
25 October 2023