We develop a large number of software tools and hosting infrastructures to support the research carried out at the Department. This section will describe the available tools in detail. In the meantime, you can consult the offer available in the UPF Knowledge Portal, the innovations created in the context of EU projects in the Innovation Radar, and the software sections of some of our research groups:


Artificial Intelligence

Nonlinear Time Series Analysis

Web Research

Music Technology

Interactive Technologies

Barcelona MedTech

Natural Language Processing

Wireless Networking

Educational Technologies

[MSc thesis] Cross-Entropy method for Kullback-Leibler control in multi-agent systems

Author: Beatriz Cabrero Daniel

Supervisors: Mario Ceresa, Vicenç Gómez

MSc program: Master in Intelligent Interactive Systems

We consider the problem of computing optimal control policies in large-scale multi-agent systems, for which the standard approach via the Bellman equation is intractable. Our formulation is based on the Kullback-Leibler control framework, also known as Linearly-Solvable Markov Decision Problems. In this setting, adaptive importance sampling methods have been derived that, when combined with function approximation, can be effective for high-dimensional systems. Our approach iteratively learns an importance sampler from which the optimal control can be extracted; this requires simulating and reweighting the agents' trajectories in the world multiple times. We illustrate our approach on a modified version of the popular stag-hunt game, a scenario in which there are multiple optimal policies depending on the "temperature" parameter of the environment. The system is built on Pandora, a framework and toolbox for multi-agent-based modeling and parallelization, which frees us from dealing with memory management when running multiple simulations. By using function approximation and assuming a particular factorization of the system dynamics, we are able to scale our method up to problems with M = 12 agents moving in two-dimensional grids of size N = 21×21, improving on existing methods that perform approximate inference on a temporal probabilistic graphical model.
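The abstract leaves the update rule implicit. As a hedged illustration only (this is not the thesis code), the sketch below runs a generic cross-entropy / adaptive importance sampling loop for KL control on a toy problem: a single agent doing a random walk on a line of 9 cells, a terminal cost proportional to the distance to a goal cell, and a tabular, time-dependent sampler u_t(y|x) in place of function approximation. Paths τ are drawn from u and weighted by w(τ) ∝ p(τ)/u(τ) · exp(-S(τ)), where p is the passive dynamics and S the path cost; u is then refit to the weighted transition frequencies, which maximizes Σ_i w_i log u(τ_i). The grid, the cost, the `smoothing` mixture with p and all function names are assumptions made for this example. Because the state space is tiny, the exact optimal controls can also be computed with the backward recursion z_t = P z_{t+1} and used as a reference.

```python
import numpy as np

# Hypothetical toy setup: one agent on a 1-D line, tabular sampler, terminal cost only.
# Not the thesis implementation (which uses function approximation and many agents).


def passive_dynamics(n):
    """Uncontrolled random walk on n cells: left, stay or right, uniformly over valid moves."""
    p = np.zeros((n, n))
    for x in range(n):
        moves = [y for y in (x - 1, x, x + 1) if 0 <= y < n]
        p[x, moves] = 1.0 / len(moves)
    return p


def terminal_cost(x, goal, scale):
    # Hypothetical path cost S(tau): scaled distance to the goal at the final step.
    return scale * np.abs(x - goal)


def exact_controls(p, T, goal, scale):
    """Exact KL-control solution: u*_t(y|x) proportional to p(y|x) z_{t+1}(y)."""
    z = np.exp(-terminal_cost(np.arange(len(p)), goal, scale))  # z_T
    controls = [None] * T
    for t in reversed(range(T)):
        u = p * z[None, :]
        controls[t] = u / u.sum(axis=1, keepdims=True)
        z = p @ z  # z_t = P z_{t+1} (no running cost in this toy problem)
    return controls


def sample_paths(u, x0, n_samples, rng):
    """Draw trajectories x_0..x_T from the time-dependent sampler u[t][x, y]."""
    paths = np.empty((n_samples, len(u) + 1), dtype=int)
    paths[:, 0] = x0
    for t, u_t in enumerate(u):
        cdf = np.cumsum(u_t[paths[:, t]], axis=1)
        cdf = cdf / cdf[:, -1:]                      # last entry is exactly 1.0
        r = 1.0 - rng.random((n_samples, 1))         # uniform on (0, 1]
        paths[:, t + 1] = (r > cdf).sum(axis=1)      # inverse-CDF sampling
    return paths


def cross_entropy_kl_control(n=9, x0=0, goal=8, T=12, scale=0.5,
                             n_samples=2000, n_iters=20, smoothing=0.05, seed=0):
    rng = np.random.default_rng(seed)
    p = passive_dynamics(n)
    u = [p.copy() for _ in range(T)]  # start from the passive dynamics
    ess_history = []
    for _ in range(n_iters):
        paths = sample_paths(u, x0, n_samples, rng)
        x, y = paths[:, :-1], paths[:, 1:]
        # log w = log p(tau) - log u(tau) - S(tau), computed step by step
        log_w = np.zeros(n_samples)
        for t in range(T):
            log_w += np.log(p[x[:, t], y[:, t]]) - np.log(u[t][x[:, t], y[:, t]])
        log_w -= terminal_cost(paths[:, -1], goal, scale)
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        ess_history.append(1.0 / np.sum(w ** 2))  # effective sample size
        # Cross-entropy step: refit u_t to the weighted transition frequencies.
        new_u = []
        for t in range(T):
            counts = np.zeros((n, n))
            np.add.at(counts, (x[:, t], y[:, t]), w)
            totals = counts.sum(axis=1, keepdims=True)
            u_t = np.where(totals > 0, counts / np.maximum(totals, 1e-300), u[t])
            # Hypothetical regularizer: mix with p so no transition gets zero mass.
            new_u.append((1 - smoothing) * u_t + smoothing * p)
        u = new_u
    return u, exact_controls(p, T, goal, scale), ess_history
```

Calling `cross_entropy_kl_control()` returns the learned sampler, the exact reference controls and the effective sample size per iteration. Under these assumptions the learned first-step row `u_ce[0][0]` should approach `u_exact[0][0]` while the effective sample size grows as the sampler moves away from the passive dynamics. The tabular table is only workable for tiny state spaces; for M agents on an N-cell grid it would have to be replaced by a parameterized sampler, which is where function approximation and a factorization of the dynamics come in.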

Additional material: