List of published results directly linked to the projects co-funded by the Spanish Ministry of Economy and Competitiveness under the María de Maeztu Units of Excellence Program (MDM-2015-0502).

List of publications acknowledging the funding in Scopus.

The record for each publication will include access to its postprint (following the program's Open Access policy), as well as to the datasets and software used. Ongoing work with the UPF Library and Informatics will soon improve the interface and automate the retrieval of this information.

The MdM Strategic Research Program has its own community in Zenodo for material available in this repository, as well as in the UPF e-repository.



Wilhelmi F, Cano C, Neu G, Bellalta B, Jonsson A, Barrachina-Muñoz S. Collaborative Spatial Reuse in Wireless Networks via Selfish Multi-Armed Bandits. Ad Hoc Networks

Next-generation wireless deployments are characterized by being dense and uncoordinated, which often leads to inefficient use of resources and poor performance. To address this, we envision the use of completely decentralized mechanisms that enhance Spatial Reuse (SR). In particular, we concentrate on Reinforcement Learning (RL), and more specifically on Multi-Armed Bandits (MABs), to allow wireless networks (WNs) to modify both their transmission power and channel based on their experienced throughput. In this work, we study the exploration-exploitation trade-off by means of the ε-greedy, EXP3, UCB and Thompson sampling action-selection strategies. Our results show that optimal proportional fairness can be achieved, even if no information about neighboring networks is available to the learners and WNs operate selfishly. However, there is high temporal variability in the throughput experienced by the individual networks, especially for ε-greedy and EXP3. We identify the cause of this variability as the adversarial nature of our setup, in which the set of most-played actions provides intermittently good or poor performance depending on the neighbors' decisions. We also show that this variability is reduced using UCB and Thompson sampling, which are parameter-free policies that perform exploration according to the reward distribution of each action.
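To make the four action-selection strategies named in the abstract concrete, here is a minimal Python sketch of their textbook forms: ε-greedy, EXP3, UCB1 and Beta-Bernoulli Thompson sampling. It is not the authors' implementation. The class names, the parameter values (eps=0.1, gamma=0.1), the assumption that each reward is a throughput normalized to [0, 1], and the stationary three-action toy environment in the hypothetical run() helper are all illustrative assumptions. In the paper's setting each arm would correspond to a (channel, transmit power) combination, and each network would learn only from its own throughput.

```python
import math
import random

class Bandit:
    # Shared bookkeeping: play counts and running mean reward per action.
    def __init__(self, n_actions, rng=None):
        self.n = n_actions
        self.rng = rng if rng is not None else random.Random()
        self.counts = [0] * n_actions
        self.means = [0.0] * n_actions

    def untried(self):
        # An action never played yet, or None once every action has been tried.
        return next((a for a in range(self.n) if self.counts[a] == 0), None)

    def update(self, a, reward):
        self.counts[a] += 1
        self.means[a] += (reward - self.means[a]) / self.counts[a]

class EpsilonGreedy(Bandit):
    def __init__(self, n_actions, eps=0.1, rng=None):
        super().__init__(n_actions, rng)
        self.eps = eps  # hypothetical exploration rate

    def select(self):
        a = self.untried()
        if a is not None:
            return a
        if self.rng.random() < self.eps:  # explore: uniform random action
            return self.rng.randrange(self.n)
        return max(range(self.n), key=self.means.__getitem__)  # exploit best mean

class UCB1(Bandit):
    def select(self):
        a = self.untried()
        if a is not None:
            return a
        t = sum(self.counts)
        # Optimism bonus shrinks as an action accumulates plays.
        scores = [self.means[i] + math.sqrt(2 * math.log(t) / self.counts[i])
                  for i in range(self.n)]
        return max(range(self.n), key=scores.__getitem__)

class ThompsonSampling(Bandit):
    def __init__(self, n_actions, rng=None):
        super().__init__(n_actions, rng)
        self.alpha = [1] * n_actions  # Beta(1, 1) uniform prior per action
        self.beta = [1] * n_actions

    def select(self):
        draws = [self.rng.betavariate(self.alpha[i], self.beta[i]) for i in range(self.n)]
        return max(range(self.n), key=draws.__getitem__)

    def update(self, a, reward):
        super().update(a, reward)
        success = self.rng.random() < reward  # Bernoulli trial turns a [0, 1] reward into 0/1
        self.alpha[a] += success
        self.beta[a] += 1 - success

class EXP3(Bandit):
    def __init__(self, n_actions, gamma=0.1, rng=None):
        super().__init__(n_actions, rng)
        self.gamma = gamma  # hypothetical mixing/exploration parameter
        self.w = [1.0] * n_actions
        self.p = [1.0 / n_actions] * n_actions

    def select(self):
        total = sum(self.w)
        self.p = [(1 - self.gamma) * w / total + self.gamma / self.n for w in self.w]
        return self.rng.choices(range(self.n), weights=self.p)[0]

    def update(self, a, reward):
        super().update(a, reward)
        x_hat = reward / self.p[a]  # importance-weighted reward estimate
        self.w[a] *= math.exp(self.gamma * x_hat / self.n)
        top = max(self.w)
        self.w = [w / top for w in self.w]  # rescale; probabilities are unchanged

def run(policy, means, steps, rng):
    # Hypothetical toy stationary environment with Bernoulli rewards; returns plays per action.
    plays = [0] * len(means)
    for _ in range(steps):
        a = policy.select()
        reward = 1.0 if rng.random() < means[a] else 0.0
        policy.update(a, reward)
        plays[a] += 1
    return plays

if __name__ == "__main__":
    means = [0.2, 0.5, 0.8]  # hypothetical normalized throughput of each action
    for cls in (EpsilonGreedy, EXP3, UCB1, ThompsonSampling):
        plays = run(cls(len(means), rng=random.Random(1)), means, 2000, random.Random(2))
        print(cls.__name__, plays)
```

Each policy object learns only from the rewards it is given, so in a dense deployment every network would run its own instance and feed it only its own measured throughput, which corresponds to the selfish setting of the abstract. Note that this toy environment is stationary: the temporal variability reported in the abstract arises because neighboring learners keep changing their actions, which this sketch does not model.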

Additional material: