Multimodal Agents Grounded via Interactive Communication (MAGIC)

One of the main goals of artificial intelligence is to build artificial agents that can interact with humans using natural language. To fully master language, an agent needs to know how to use it to accomplish a goal, to interact with another speaker, and to refer to objects in external reality. My research project aims to equip an artificial agent with all of these skills within a single learning framework.
Communication helps humans accomplish things in the world and cooperate with each other, continuously and incrementally updating the speakers' knowledge states. However, the machine learning methods traditionally used to model language rely on static, passive training regimes and are typically not grounded in external reality. I propose a radically different research programme, based on recent advances in training neural networks with reinforcement learning, that will enable the move from static, fully supervised learning to dynamic, interactive learning, in which agents must use language to accomplish tasks in the visual world.
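To make the proposed shift concrete, the sketch below shows one common instantiation of interactive, reward-driven language learning: a toy referential game in which a "speaker" agent emits a discrete symbol for a target object and a "listener" agent must identify that object, with both policies trained from task reward via REINFORCE. The setup (object and vocabulary sizes, tabular policies, the baseline) is purely illustrative and not taken from the project itself; it is a minimal sketch of the training regime, not the project's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative toy world: N_OBJECTS distinct referents, VOCAB message symbols.
N_OBJECTS = 4
VOCAB = 4
LR = 0.5
EPISODES = 3000

# Tabular policy logits: speaker maps object -> symbol,
# listener maps symbol -> object guess.
speaker = np.zeros((N_OBJECTS, VOCAB))
listener = np.zeros((VOCAB, N_OBJECTS))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

baseline = 0.0  # running-mean reward, reduces gradient variance

for _ in range(EPISODES):
    target = rng.integers(N_OBJECTS)

    # Speaker samples a symbol for the target object.
    p_s = softmax(speaker[target])
    msg = rng.choice(VOCAB, p=p_s)

    # Listener sees only the message and guesses an object.
    p_l = softmax(listener[msg])
    guess = rng.choice(N_OBJECTS, p=p_l)

    # Task reward grounds the communication: 1 iff the guess is correct.
    reward = 1.0 if guess == target else 0.0
    advantage = reward - baseline
    baseline += 0.01 * (reward - baseline)

    # REINFORCE update: advantage * grad of log-probability of the action.
    grad_s = -p_s
    grad_s[msg] += 1.0
    speaker[target] += LR * advantage * grad_s

    grad_l = -p_l
    grad_l[guess] += 1.0
    listener[msg] += LR * advantage * grad_l

# Greedy round-trip accuracy: how often the listener recovers the target
# from the speaker's most likely symbol.
acc = np.mean([
    np.argmax(listener[np.argmax(speaker[t])]) == t
    for t in range(N_OBJECTS)
])
print(acc)
```

On this toy setup the two agents typically converge to a shared protocol, illustrating how a communicative convention can emerge purely from interaction and task success rather than from supervised labels.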

Principal researcher

Elia Bruni