By David Gaddy
When learning to follow natural language instructions, neural networks tend to be very data hungry – they require a huge number of examples pairing language with actions in order to learn effectively. This post is about reducing those heavy data requirements by first watching actions in the environment before moving on to learning from language data. Inspired by the idea that it is easier to map language to meanings that have already been formed, we introduce a semi-supervised approach that aims to separate the formation of abstractions from the learning of language.
Empirically, we find that pre-learning of patterns in the environment can help us learn grounded language with much less data.
Before we dive into the details, let’s look at an example to see why neural networks struggle to learn from smaller amounts of data. For now, we’ll use examples from the SHRDLURN block stacking task, but later we’ll look at results on another environment.
Let’s put ourselves in the shoes of a model that is learning to follow instructions. Suppose we are given the single training example below, which pairs a language command with an action in the environment:
This example tells
Read the rest of this post here