Events
We have weekly seminars, usually on Wednesdays at noon, in which we regularly host invited speakers. We will keep this calendar updated (and send reminders on Twitter and the relevant mailing lists). Feel free to join, or ask for a Zoom passcode if you're interested in an announced topic!
COLT Seminar
Where: Room 52.737
We introduce an approach that can assess the relative information retained when using two different distance measures, and determine whether they are equivalent, independent, or whether one is more informative than the other. This test can be used to identify the most informative distance measure out of a pool of candidates, to compare the representations in deep neural networks, and to infer causality.
COLT Seminar
Where: Room 52.737
COLT Seminar
When: September 13th, 2023, 12.00 to 13.00
Where: Room 52.737
Speaker: Luca Moschella, Sapienza University of Rome
Title: Leveraging Emerging Similarities for Latent Space Communication
Description: Neural networks encode the complex structure of data manifolds in high-dimensional spaces through latent representations. The distribution of data points in the latent space should ideally rely solely on the task, data, loss, and specific architecture constraints. However, factors like random weight initialization, training hyperparameters, and other sources of randomness during training can lead to incoherent latent spaces that hinder reuse. Notably, a consistent phenomenon emerges when data semantics remain unchanged: the angles between encodings within distinct latent spaces exhibit similarity. During this talk, we will delve into two empirical strategies that harness this phenomenon, facilitating latent communication across diverse architectures and data modalities:
Relative Projection: We will demonstrate the creation of a new, relative representation that inherently remains invariant against such transformations.
Direct Transformation: We will showcase how prior knowledge about relationships/transformations between different spaces can directly guide the translation from one space to another.
In both cases, we facilitate efficient communication between latent spaces, bridging gaps between distinct domains, models, and modalities, and enabling zero-shot model stitching, reuse, and latent evaluation. This holds true for both generation and classification tasks, showcasing the versatility and applicability of these strategies.
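As a rough illustration of the relative-projection idea above (a minimal sketch, not the speaker's code: the anchor choice and the use of cosine similarity are assumptions made for the example), the snippet below shows how angle-based relative representations survive an orthogonal transformation of the latent space:

```python
import numpy as np

def relative_projection(embeddings, anchors):
    """Map absolute embeddings into a relative space defined by
    cosine similarities to a fixed set of anchor samples."""
    # L2-normalise both sets so the dot product equals the cosine similarity
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    anc = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    # Row i holds cos(x_i, a_1), ..., cos(x_i, a_k)
    return emb @ anc.T

# Toy usage: two "encoders" whose latent spaces differ by a random rotation
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 64))            # embeddings from encoder A
rotation, _ = np.linalg.qr(rng.normal(size=(64, 64)))
y = x @ rotation                          # the same data as seen by encoder B

anchors_a, anchors_b = x[:10], y[:10]     # the same anchor samples in each space
rel_a = relative_projection(x, anchors_a)
rel_b = relative_projection(y, anchors_b)
print(np.allclose(rel_a, rel_b))          # True: the angles survive the rotation
```

Because both spaces are expressed relative to the same anchor samples, downstream modules trained on one relative space can, in principle, be reused on the other.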
COLT Seminar
When: July 5th, 2023, 12.00 to 13.00
Where: Room 52.737
Speaker: Roberto Dessì
Title: Toolformer: Language Models Can Teach Themselves to Use Tools
Abstract: Language models (LMs) exhibit remarkable abilities to solve new tasks from just a few examples or textual instructions, especially at scale. They also, paradoxically, struggle with basic functionality, such as arithmetic or factual lookup, where much simpler and smaller models excel. In this talk I’ll present the Toolformer and show that LMs can teach themselves to use external tools via simple APIs and achieve the best of both worlds. Toolformer is a model trained to decide which APIs to call, when to call them, what arguments to pass, and how to best incorporate the results into future token prediction. This is done in a self-supervised way, requiring nothing more than a handful of demonstrations for each API. Toolformer incorporates a range of tools, including a calculator, a Q&A system, two different search engines, a translation system, and a calendar. It achieves substantially improved zero-shot performance across a variety of downstream tasks, often competitive with much larger models, without sacrificing its core language modeling abilities.
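To make the inline API-call mechanism concrete, here is a hedged toy sketch of how calls embedded in generated text could be executed and their results spliced back into the context. The tool names, call syntax, and the execute_api_calls helper are illustrative only, not the Toolformer implementation:

```python
import re
from datetime import date

# Toy "APIs" standing in for Toolformer's external tools (illustrative only)
def calculator(expr: str) -> str:
    # Extremely restricted arithmetic: digits and + - * / ( ) . only
    if not re.fullmatch(r"[\d+\-*/(). ]+", expr):
        return "ERR"
    return str(eval(expr))  # acceptable here because the expression is whitelisted

def calendar(_: str) -> str:
    return date.today().isoformat()

TOOLS = {"Calculator": calculator, "Calendar": calendar}

# Inline calls written by the model, e.g. "[Calculator(400/1400)]"
CALL = re.compile(r"\[(\w+)\((.*?)\)\]")

def execute_api_calls(text: str) -> str:
    """Replace each inline API call with '[Tool(args) -> result]' so the
    result becomes part of the context for future token prediction."""
    def run(match):
        tool, args = match.group(1), match.group(2)
        result = TOOLS.get(tool, lambda _: "UNKNOWN_TOOL")(args)
        return f"[{tool}({args}) -> {result}]"
    return CALL.sub(run, text)

print(execute_api_calls("Out of 1400 participants, 400 passed: [Calculator(400/1400)] of the total."))
```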
COLT Seminar
When: June 29th, 2023, 12.00 to 13.00
Where: Room 52.737
Speaker: Mario Giulianelli, University of Amsterdam
Title: Using neural language generators as computational models of human language use.
Description: While natural language generation (NLG) systems are widely deployed in real-world applications, evidence that they faithfully reproduce aspects of human linguistic behaviour is scarce. In a first study, we analyse variability in human production, a characterising aspect of language production that is often overlooked in NLG research. We propose a statistical framework to quantify variability and to assess language generators' alignment to the production variability observed in humans. In a second study, we use the previously introduced statistical framework to define a novel measure of utterance surprisal that quantifies (un)predictability as distance from plausible alternatives (here, we generate alternatives using neural NLG systems). We test the psychometric predictive power of this measure, showing that it predicts human acceptability judgements better than (standard) probabilistic surprisal and that it is complementary to probabilistic surprisal as a predictor of utterance-level reading times. Overall, these two studies contribute new empirical evidence that neural language generators can be used for the computational modelling of human language use.
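A minimal sketch of the contrast between the two surprisal notions discussed in the second study. The embedding-space distance and the helper names below are assumptions made for illustration, not the study's actual measure:

```python
import numpy as np

def probabilistic_surprisal(logprob: float) -> float:
    """Standard surprisal: -log p(utterance | context)."""
    return -logprob

def alternative_based_surprisal(utterance_vec, alternative_vecs) -> float:
    """Surprisal as (un)predictability: the average distance between an
    utterance and plausible alternatives sampled from a language generator."""
    dists = np.linalg.norm(np.asarray(alternative_vecs) - utterance_vec, axis=1)
    return float(dists.mean())

# Toy usage with made-up sentence embeddings (stand-ins for encoder outputs)
rng = np.random.default_rng(1)
utterance = rng.normal(size=32)
alternatives = rng.normal(size=(20, 32))   # e.g. 20 generated alternatives
print(alternative_based_surprisal(utterance, alternatives))
```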
COLT Seminar
When: June 12th, 2023, 14.30 to 15.30
Where: Room 52.737
Speaker: Milica Denić, Tel Aviv University
Title: Recursive numeral systems optimize the trade-off between lexicon size and average morphosyntactic complexity
Description: Human languages vary in terms of which meanings they lexicalize, but there are important constraints on this variation. It has been argued that languages are under two competing pressures: the pressure to be simple (e.g., to have a small lexicon size) and to allow for informative (i.e., precise) communication with their lexical items, and that which meanings get lexicalized may be explained by languages finding a good way to trade off between these two pressures (Kemp and Regier, 2012, and much subsequent work). However, in certain semantic domains, it is possible to reach very high levels of informativeness even if very few meanings from that domain are lexicalized. This is due to productive morphosyntax, which may allow for the construction of meanings that are not lexicalized. Consider the semantic domain of natural numbers: many languages lexicalize few natural number meanings as monomorphemic expressions, but can precisely convey any natural number meaning using morphosyntactically complex numerals. In such semantic domains, lexicon size is not in direct competition with informativeness. What explains which meanings are lexicalized in such semantic domains? We will argue that in such cases, languages are (near-)optimal solutions to a different kind of trade-off problem: the trade-off between the pressure to lexicalize as few meanings as possible (i.e., to minimize lexicon size) and the pressure to produce as morphosyntactically simple utterances as possible (i.e., to minimize the average morphosyntactic complexity of utterances). This study, in conjunction with previous work on communicative efficiency, suggests that, in order to explain which meanings get lexicalized across languages and across semantic domains, a more general approach may be that languages are finding a good way to trade off between not two but three pressures: be simple, be informative, and minimize the average morphosyntactic complexity of utterances.
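As a toy illustration of the kind of trade-off at stake (the composition rules and the complexity measure below are simplifications invented for the sketch, not the paper's actual grammar), one can compare a small numeral lexicon against a fully lexicalised one:

```python
from functools import lru_cache

def avg_complexity(lexicon, max_n=99):
    """Average number of morphemes needed to express 1..max_n, where a numeral
    is built by adding or multiplying lexicalised number words (toy grammar)."""
    lex = frozenset(lexicon)

    @lru_cache(maxsize=None)
    def cost(n):
        if n in lex:
            return 1
        best = float("inf")
        for a in lex:
            if 0 < a < n:                      # additive composition, e.g. twenty + three
                best = min(best, 1 + cost(n - a))
            if a > 1 and n % a == 0:           # multiplicative composition, e.g. three x ten
                best = min(best, 1 + cost(n // a))
        return best

    return sum(cost(n) for n in range(1, max_n + 1)) / max_n

small_lex = list(range(1, 11))    # 'one' .. 'ten': 10 lexical items
large_lex = list(range(1, 100))   # every number below 100 lexicalised
for lex in (small_lex, large_lex):
    print(len(lex), "items -> avg complexity", round(avg_complexity(lex), 2))
# The small lexicon pays in longer numerals; the large one pays in lexicon size.
```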
COLT Seminar
Where: Room 52.737
Speaker: Mathieu Rita, ENS/CoML - INRIA/Microsoft Research
Bio:
I am a Ph.D. student under the supervision of Emmanuel Dupoux (ENS/FAIR), Olivier Pietquin (Google Brain) and Florian Strub (DeepMind). I work between the Inria-Microsoft Research Joint Lab and the CoML team in Paris, which is located at ENS Paris. Prior to that, I received an engineering degree from Ecole Polytechnique and an MSc degree in Mathematics, Computer Vision and Machine Learning from Ecole Normale Supérieure Paris-Saclay. My research explores the theoretical and experimental aspects of training RL objectives with language models, with a specific focus on constructing self-play multi-agent systems. I particularly investigate how scaling populations and generations of agents can help address language learning challenges, such as overfitting, exploration, or drift. As an application, I simulate language evolution and study the prerequisites necessary for the emergence of language universals, such as compositionality.
Title: Neural Communication Games
Description:
In this talk, I will present machine learning and information theoretical views of neural communication games. These perspectives provide insights into the dynamics of those games and explain alignments/misalignments between neural emergent communication results and empirical findings from cognitive science and socio-linguistics (population effects, iterated learning, etc.).
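For readers unfamiliar with such games, here is a minimal tabular Lewis signalling game trained with REINFORCE. It is a toy stand-in for the neural, population-based settings discussed in the talk, with all names and hyperparameters chosen for the sketch:

```python
import numpy as np

# A sender observes one of n_objects and emits one of n_messages; a receiver
# maps the message back to an object. Both are tabular softmax policies
# trained with REINFORCE on a shared communication reward.
rng = np.random.default_rng(0)
n_objects, n_messages, lr = 5, 5, 0.1
sender = np.zeros((n_objects, n_messages))    # logits: object -> message
receiver = np.zeros((n_messages, n_objects))  # logits: message -> object

def sample(logits):
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return rng.choice(len(p), p=p), p

for step in range(5000):
    obj = rng.integers(n_objects)
    msg, p_s = sample(sender[obj])
    guess, p_r = sample(receiver[msg])
    reward = 1.0 if guess == obj else 0.0
    # REINFORCE: grad of log pi = one_hot(action) - policy probabilities
    sender[obj] += lr * reward * (np.eye(n_messages)[msg] - p_s)
    receiver[msg] += lr * reward * (np.eye(n_objects)[guess] - p_r)

accuracy = np.mean([
    receiver[np.argmax(sender[o])].argmax() == o for o in range(n_objects)
])
print("communication accuracy:", accuracy)
```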
COLT Seminar
Where: Room 52.737
Speaker: Clément Romac, Hugging Face - INRIA
Bio: A first-year PhD student jointly supervised by Pierre-Yves Oudeyer (FLOWERS, Inria) and Thomas Wolf (Hugging Face), studying how autonomous deep RL agents can leverage large language models.
Title: Grounding LLMs in Interactive Environments with Online RL.
Description: Recent works have successfully leveraged Large Language Models' (LLM) abilities to capture abstract knowledge about the world's physics to solve decision-making problems. Yet, the alignment between LLMs' knowledge and the environment can be wrong and limit functional competence due to a lack of grounding. We study an approach to achieve this alignment through functional grounding: we consider an agent using an LLM as a policy that is progressively updated as the agent interacts with the environment, leveraging online Reinforcement Learning to improve its performance in solving goals. Using an interactive textual environment designed to study higher-level forms of functional grounding, and a set of spatial and navigation tasks, we study several scientific questions: 1) Can LLMs boost sample efficiency for online learning of various RL tasks? 2) How can it boost different forms of generalization? 3) What is the impact of online learning? We study these questions by functionally grounding several variants (size, architecture) of FLAN-T5.
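A schematic sketch of the interaction loop described above. Here a bag-of-words linear scorer stands in for the FLAN-T5 policy and a REINFORCE update stands in for the online RL algorithm actually used; the environment, vocabulary, and action names are invented for the example:

```python
import numpy as np

# Toy textual environment: the observation says where the goal is, and the
# agent must pick the matching textual action.
rng = np.random.default_rng(0)
ACTIONS = ["go left", "go right"]
VOCAB = {w: i for i, w in enumerate("goal is on the left right go".split())}

def featurize(text):
    v = np.zeros(len(VOCAB))
    for w in text.split():
        if w in VOCAB:
            v[VOCAB[w]] += 1
    return v

theta = np.zeros((len(ACTIONS), len(VOCAB)))   # stand-in "policy" parameters
lr = 0.5

for episode in range(2000):
    side = rng.choice(["left", "right"])
    observation = f"the goal is on the {side}"
    x = featurize(observation)
    logits = theta @ x
    probs = np.exp(logits - logits.max()); probs /= probs.sum()
    a = rng.choice(len(ACTIONS), p=probs)
    reward = 1.0 if ACTIONS[a].endswith(side) else 0.0
    # REINFORCE update on the log-probability of the chosen textual action
    grad = -probs[:, None] * x[None, :]
    grad[a] += x
    theta += lr * reward * grad

for side in ["left", "right"]:
    x = featurize(f"the goal is on the {side}")
    print(side, "->", ACTIONS[int(np.argmax(theta @ x))])
```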
COLT Seminar
Where: Room 52.737 or online on Zoom
Speaker: Emanuele La Malfa
Abstract: In this presentation, I will be discussing the concept of robustness in the field of natural language processing (NLP) and the various research questions that can be explored through this lens. Specifically, I will delve into semantic robustness, a treatment-response notion that encompasses a broad range of linguistic phenomena and is distinct from the traditional idea of robustness inherited from computer vision, which only takes into account word substitutions and deletions. Additionally, I will explore syntax robustness, which refers to a model's ability to accurately represent language structures even when faced with manipulations. Furthermore, I will highlight how robustness can be used not only as a desirable property of a model, but also as a means of formally explaining a model's decisions. Finally, I will share some of the projects I am currently working on, including one on the robustness of language models for code, and another on how to accurately measure robustness in the presence of input data noise.
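As a concrete, hedged example of the word-substitution view of robustness mentioned above (the helper and the toy classifier below are illustrative, not the speaker's verification framework):

```python
from itertools import product

def substitution_robust(predict, tokens, synonyms):
    """Empirically check whether a text classifier keeps the same label under
    all word substitutions drawn from a synonym table (a crude stand-in for
    the formal robustness notions discussed in the talk)."""
    original = predict(" ".join(tokens))
    options = [synonyms.get(t, []) + [t] for t in tokens]
    for variant in product(*options):
        if predict(" ".join(variant)) != original:
            return False, " ".join(variant)   # counterexample found
    return True, None

# Toy usage with a hypothetical keyword classifier and synonym table
def toy_predict(text):
    return "positive" if "great" in text or "good" in text else "negative"

synonyms = {"great": ["good", "excellent"], "movie": ["film"]}
print(substitution_robust(toy_predict, "a great movie".split(), synonyms))
```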