DTIC MdM Strategic Program: Artificial and Natural Intelligence for ICT and beyond (UPF)

The second María de Maeztu Strategic Research Program (CEX2021-001195-M) of the Department of Information and Communication Technologies (DTIC) takes place between 2023 and 2026. The website for this program is under construction; some details are available in the related news item.

The first María de Maeztu Strategic Research Program (MDM-2015-0502) took place between January 2016 and June 2020. It was focused on data-driven knowledge extraction, boosting synergistic research initiatives across our different research areas.

Wilhelmi F, Cano C, Neu G, Bellalta B, Jonsson A, Barrachina-Muñoz S. Collaborative Spatial Reuse in Wireless Networks via Selfish Multi-Armed Bandits. Ad Hoc Networks

Next-generation wireless deployments are characterized by being dense and uncoordinated, which often leads to inefficient use of resources and poor performance. To solve this, we envision the utilization of completely decentralized mechanisms that enhance Spatial Reuse (SR). In particular, we concentrate on Reinforcement Learning (RL), and more specifically on Multi-Armed Bandits (MABs), to allow networks to modify both their transmission power and channel based on their experienced throughput. In this work, we study the exploration-exploitation trade-off by means of the ε-greedy, EXP3, UCB and Thompson sampling action-selection strategies. Our results show that optimal proportional fairness can be achieved, even if no information about neighboring networks is available to the learners and the wireless networks (WNs) operate selfishly. However, there is high temporal variability in the throughput experienced by the individual networks, especially for ε-greedy and EXP3. We identify the cause of this variability to be the adversarial setting of our setup, in which the set of most played actions provides intermittently good or poor performance depending on the neighbors' decisions. We also show that this variability is reduced using UCB and Thompson sampling, which are parameter-free policies that perform exploration according to the reward distribution of each action.
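To make the action-selection idea in the abstract concrete, the following is a minimal sketch of two of the named strategies, ε-greedy and UCB (in its common UCB1 form), choosing among (channel, transmission power) pairs from observed throughput. It is not the authors' implementation (their software is in the GitHub repository listed under the additional material). The arm set, the toy throughput values and every function name are assumptions made for illustration, and the environment is a single learner with stationary rewards, which is much simpler than the multi-network adversarial setting studied in the paper.

```python
import itertools
import math
import random

# Hypothetical action space: each (channel, transmit power in dBm) pair is one arm.
CHANNELS = [1, 2]
TX_POWERS_DBM = [5, 20]
ARMS = list(itertools.product(CHANNELS, TX_POWERS_DBM))

# Assumed toy environment: a fixed mean normalized throughput per arm.
TOY_MEAN_THROUGHPUT = {(1, 5): 0.2, (1, 20): 0.5, (2, 5): 0.4, (2, 20): 0.9}


def observe_throughput(arm, rng):
    """Return a noisy normalized throughput, clipped to [0, 1]."""
    noisy = TOY_MEAN_THROUGHPUT[arm] + rng.uniform(-0.1, 0.1)
    return min(1.0, max(0.0, noisy))


def select_epsilon_greedy(counts, means, t, rng, epsilon=0.1):
    """Explore a uniformly random arm with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(counts))
    return max(range(len(means)), key=lambda a: means[a])


def select_ucb1(counts, means, t, rng):
    """Play every arm once, then maximize empirical mean plus confidence bonus."""
    for a, n in enumerate(counts):
        if n == 0:
            return a
    return max(
        range(len(counts)),
        key=lambda a: means[a] + math.sqrt(2.0 * math.log(t) / counts[a]),
    )


def run_bandit(select, steps=2000, seed=0):
    """Run one learner and return per-arm play counts and mean rewards."""
    rng = random.Random(seed)
    counts = [0] * len(ARMS)
    means = [0.0] * len(ARMS)
    for t in range(1, steps + 1):
        a = select(counts, means, t, rng)
        reward = observe_throughput(ARMS[a], rng)
        counts[a] += 1
        means[a] += (reward - means[a]) / counts[a]  # incremental average
    return counts, means


if __name__ == "__main__":
    policies = [("epsilon-greedy", select_epsilon_greedy), ("UCB1", select_ucb1)]
    for name, select in policies:
        counts, _ = run_bandit(select)
        print(name, "most played arm:", ARMS[counts.index(max(counts))])
```

Note that ε-greedy needs a hand-picked exploration rate, while UCB1 derives its exploration from the play counts alone, which matches the abstract's description of UCB as a parameter-free policy. EXP3 and Thompson sampling are omitted here to keep the sketch short.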

https://doi.org/10.1016/j.adhoc.2019.01.006

Additional material:

Article in arXiv: https://arxiv.org/abs/1710.11403
Software in GitHub: https://github.com/wn-upf/Collaborative_SR_in_WNs_via_Selfish_MABs
Dataset with results in Zenodo: https://doi.org/10.5281/zenodo.1036737

Link: https://www.sciencedirect.com/science/article/pii/S1570870518302646

Department of Information and Communication Technologies, UPF

Grant CEX2021-001195-M funded by MCIN/AEI/10.13039/501100011033

Àngel Lozano - Scientific director
Aurelio Ruiz - Program management