Paper ID: 2408.17101

Strategic Arms with Side Communication Prevail Over Low-Regret MAB Algorithms

Ahmed Ben Yahmed (CREST, ENSAE Paris), Clément Calauzènes, Vianney Perchet (CREST, ENSAE Paris)

In the strategic multi-armed bandit setting, when arms possess perfect information about the player's behavior, they can establish an equilibrium in which (1) they retain almost all of their value and (2) the player is left with substantial (linear) regret. This study shows that a similar equilibrium can be achieved even when complete information is not publicly available to all arms but is instead shared among them. The primary challenge lies in designing a communication protocol that incentivizes the arms to communicate truthfully.

Submitted: Aug 30, 2024