List of published results directly linked to the projects co-funded by the Spanish Ministry of Economy and Competitiveness under the María de Maeztu Units of Excellence Program (MDM-2015-0502).

List of publications acknowledging the funding in Scopus.

The record for each publication will include access to postprints (following the Open Access policy of the program), as well as datasets and software used. Ongoing work with UPF Library and Informatics will improve the interface and automation of the retrieval of this information soon.

The MdM Strategic Research Program has its own community in Zenodo for material available in this repository, as well as at the UPF e-repository.

[MSc thesis] Cross-Entropy method for Kullback-Leibler control in multi-agent systems

Author: Beatriz Cabrero Daniel

Supervisors: Mario Ceresa, Vicenç Gómez

MSc program: Master in Intelligent Interactive Systems

We consider the problem of computing optimal control policies in large-scale multi-agent systems, for which the standard approach via the Bellman equation is intractable. Our formulation is based on the Kullback-Leibler control framework, also known as Linearly-Solvable Markov Decision Problems. In this setting, adaptive importance sampling methods have been derived that, when combined with function approximation, can be effective for high-dimensional systems. Our approach iteratively learns an importance sampler from which the optimal control can be extracted; it requires simulating and reweighting the agents’ trajectories in the world multiple times. We illustrate our approach through a modified version of the popular stag-hunt game; in this scenario, there is a multiplicity of optimal policies depending on the “temperature” parameter of the environment. The system is built inside Pandora, a multi-agent-based modeling framework and toolbox for parallelization, which frees us from dealing with memory management when running multiple simulations. By using function approximation and assuming a particular factorization of the system dynamics, we are able to scale up our method to problems with M = 12 agents moving in two-dimensional grids of size N = 21×21, improving on existing methods that perform approximate inference on a temporal probabilistic graphical model.
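The simulate, reweight, and refit loop described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: it assumes a single agent on a small one-dimensional line instead of the multi-agent stag-hunt game, uses a lookup table instead of function approximation, and does not use Pandora. The function name `cross_entropy_kl_control`, its parameters, and the toy cost vector are hypothetical, chosen only for illustration. Trajectories are drawn from the current sampler, weighted by the likelihood ratio against the uncontrolled random walk times exp(-cost / temperature), and the sampler is then refit by weighted maximum likelihood, which is the cross-entropy update.

```python
import numpy as np


def cross_entropy_kl_control(state_cost, lam=0.5, horizon=6, n_samples=4000,
                             n_iters=10, start=0, seed=0):
    """Toy cross-entropy adaptive importance sampler for KL control.

    Returns q[t, x, m]: probability of move m (0: left, 1: stay, 2: right)
    at time t in state x. The world is a 1-D line with walls at both ends;
    a move into a wall leaves the state unchanged. All names are illustrative.
    """
    rng = np.random.default_rng(seed)
    state_cost = np.asarray(state_cost, dtype=float)
    n_states = len(state_cost)
    moves = np.array([-1, 0, 1])
    p = np.full(3, 1.0 / 3.0)                       # uncontrolled random walk
    q = np.full((horizon, n_states, 3), 1.0 / 3.0)  # sampler starts at p

    for _ in range(n_iters):
        x = np.full(n_samples, start)
        log_w = np.zeros(n_samples)
        visited = np.empty((horizon, n_samples), dtype=int)
        chosen = np.empty((horizon, n_samples), dtype=int)

        # 1) Simulate trajectories under the current sampler q.
        for t in range(horizon):
            cdf = np.cumsum(q[t, x], axis=1)
            u = rng.random(n_samples)
            a = np.minimum((u[:, None] > cdf).sum(axis=1), 2)
            visited[t], chosen[t] = x, a
            # 2) Reweight: likelihood ratio p/q, then the path cost term.
            log_w += np.log(p[a]) - np.log(q[t, x, a])
            x = np.clip(x + moves[a], 0, n_states - 1)
            log_w -= state_cost[x] / lam  # cost charged on the state reached

        w = np.exp(log_w - log_w.max())
        w /= w.sum()

        # 3) Cross-entropy update: weighted maximum likelihood, which for a
        #    table is just normalized weighted counts (lightly smoothed).
        counts = np.full((horizon, n_states, 3), 1e-3)
        for t in range(horizon):
            np.add.at(counts[t], (visited[t], chosen[t]), w)
        q = counts / counts.sum(axis=2, keepdims=True)

    return q


if __name__ == "__main__":
    # Five states; only the right-most state is free of cost.
    q = cross_entropy_kl_control([1.0, 1.0, 1.0, 1.0, 0.0])
    print("move probabilities at t=0, x=0 (left, stay, right):", q[0, 0])
```

In this toy setting the learned table should put most of the probability at the start state on moving right, towards the cost-free state. Lowering `lam` (the temperature) sharpens the target distribution and makes the importance weights more degenerate, which is the situation in which several adaptive iterations matter most.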

Additional material: