Wilhelmi F, Cano C, Neu G, Bellalta B, Jonsson A, Barrachina-Muñoz S. Collaborative Spatial Reuse in Wireless Networks via Selfish Multi-Armed Bandits. Ad Hoc Networks
We develop many software tools and hosting infrastructures to support the research carried out at the Department. This section will describe the available tools in detail. In the meantime, see the offer listed in the UPF Knowledge Portal, the innovations created within EU projects and listed in the Innovation Radar, and the software sections of some of our research groups:
- Artificial Intelligence
- Nonlinear Time Series Analysis
- Web Research
- Music Technology
- Interactive Technologies
- Barcelona MedTech
- Natural Language Processing
- UbicaLab
- Wireless Networking
- Educational Technologies
Next-generation wireless deployments are dense and uncoordinated, which often leads to inefficient use of resources and poor performance. To address this, we envision completely decentralized mechanisms that enhance Spatial Reuse (SR). In particular, we concentrate on Reinforcement Learning (RL), and more specifically on Multi-Armed Bandits (MABs), to allow wireless networks (WNs) to adjust both their transmission power and their channel based on the throughput they experience. In this work, we study the exploration-exploitation trade-off by means of the ε-greedy, EXP3, UCB and Thompson sampling action-selection strategies. Our results show that optimal proportional fairness can be achieved even if no information about neighboring networks is available to the learners and WNs operate selfishly. However, the throughput experienced by individual networks shows high temporal variability, especially for ε-greedy and EXP3. We attribute this variability to the adversarial nature of our setup, in which the set of most played actions provides intermittently good or poor performance depending on the neighbors' decisions. We also show that this variability is reduced with UCB and Thompson sampling, which are parameter-free policies that explore according to the reward distribution of each action.
https://doi.org/10.1016/j.adhoc.2019.01.006
Additional material:
- Article in arXiv: https://arxiv.org/abs/1710.11403
- Software in GitHub: https://github.com/wn-upf/Collaborative_SR_in_WNs_via_Selfish_MABs
- Dataset with results in Zenodo: https://doi.org/10.5281/zenodo.1036737
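The exploration-exploitation trade-off described in the abstract can be illustrated with a minimal Python sketch of two of the named strategies, ε-greedy and UCB. This is not the code from the GitHub repository above. The names `epsilon_greedy`, `ucb1`, `run` and `TRUE_MEANS` are hypothetical, the decaying exploration rate `eps0 / sqrt(t)` and the UCB1 index are assumptions chosen for illustration, and the toy environment uses fixed reward means, whereas the setting studied in the article is adversarial because neighboring networks learn at the same time. Each action stands for one (channel, transmission power) configuration and the reward for a throughput normalized to [0, 1].

```python
import math
import random


def epsilon_greedy(means, counts, t, rng, eps0=1.0):
    """Explore uniformly with probability eps0 / sqrt(t), else exploit."""
    eps = eps0 / math.sqrt(t)  # assumed decay schedule, for illustration only
    if rng.random() < eps:
        return rng.randrange(len(means))
    return max(range(len(means)), key=lambda k: means[k])


def ucb1(means, counts, t, rng):
    """Try every action once, then maximise mean + sqrt(2 ln t / n_k)."""
    for k, n in enumerate(counts):
        if n == 0:
            return k
    return max(
        range(len(means)),
        key=lambda k: means[k] + math.sqrt(2.0 * math.log(t) / counts[k]),
    )


def run(policy, true_means, rounds=5000, seed=0):
    """Play `rounds` iterations and return how often each action was chosen."""
    rng = random.Random(seed)
    means = [0.0] * len(true_means)   # running reward estimate per action
    counts = [0] * len(true_means)    # number of plays per action
    for t in range(1, rounds + 1):
        a = policy(means, counts, t, rng)
        # Toy normalised throughput: noisy, clipped to [0, 1] (hypothetical).
        reward = min(1.0, max(0.0, rng.gauss(true_means[a], 0.1)))
        counts[a] += 1
        means[a] += (reward - means[a]) / counts[a]  # incremental average
    return counts


# Hypothetical mean throughput of four (channel, power) configurations.
TRUE_MEANS = [0.2, 0.5, 0.8, 0.4]

if __name__ == "__main__":
    for name, policy in [("eps-greedy", epsilon_greedy), ("UCB1", ucb1)]:
        print(name, run(policy, TRUE_MEANS))
```

In this stationary toy both policies end up playing the best configuration most of the time. The temporal variability reported in the abstract arises only when several learners interact, which this sketch does not model.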