Neural Bandit

Neural bandits are a class of algorithms that use neural networks to model rewards inside bandit algorithms, targeting sequential decision-making problems where the reward function is unknown and potentially complex. Current research focuses on improving the efficiency and theoretical guarantees of these algorithms, pairing exploration strategies such as Thompson Sampling and Upper Confidence Bound (UCB) with various neural network types, including Graph Convolutional Networks and Bayesian neural networks, to handle diverse data structures and feedback mechanisms. Applications include personalized recommendations, influence maximization, and optimizing large language model instructions, all of which involve high-dimensional data and complex reward structures.
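As a rough illustration of how a UCB rule can be paired with a neural reward model, the following is a minimal sketch in the spirit of NeuralUCB, not a reference implementation of any particular paper. The class name `TinyNeuralUCB`, its hyperparameters (`hidden`, `lam`, `beta`, `lr`), the diagonal approximation of the gradient covariance matrix, and the single gradient step per round are all assumptions made for brevity.

```python
import numpy as np


class TinyNeuralUCB:
    """Hypothetical one-hidden-layer ReLU reward model with a UCB-style bonus.

    Simplifications (assumptions, not taken from any specific paper):
    - only the diagonal of the gradient covariance matrix is tracked;
    - the network takes one SGD step per observed reward instead of
      being retrained on the full history.
    """

    def __init__(self, dim, hidden=16, lam=1.0, beta=1.0, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(dim), (hidden, dim))
        self.w2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), hidden)
        self.beta = beta
        self.lr = lr
        # Diagonal of the regularised gradient covariance, initialised to lam * I.
        self.z_diag = np.full(self.W1.size + self.w2.size, lam)

    def _forward(self, x):
        h = np.maximum(self.W1 @ x, 0.0)  # hidden ReLU activations
        return h, float(self.w2 @ h)      # (activations, predicted reward)

    def _grad(self, x):
        # Gradient of the predicted reward with respect to all parameters.
        h, _ = self._forward(x)
        mask = (h > 0).astype(float)
        g_W1 = np.outer(self.w2 * mask, x)
        return np.concatenate([g_W1.ravel(), h])

    def select(self, contexts):
        # Score each arm by predicted reward plus an exploration bonus.
        scores = []
        for x in contexts:
            _, mu = self._forward(x)
            g = self._grad(x)
            bonus = self.beta * np.sqrt(np.sum(g * g / self.z_diag))
            scores.append(mu + bonus)
        return int(np.argmax(scores))

    def update(self, x, reward):
        g = self._grad(x)
        self.z_diag += g * g  # shrink future bonuses in this direction
        _, mu = self._forward(x)
        # One SGD step on the squared error 0.5 * (mu - reward) ** 2.
        step = self.lr * (mu - reward) * g
        self.W1 -= step[: self.W1.size].reshape(self.W1.shape)
        self.w2 -= step[self.W1.size:]
```

In each round, a caller would pass one context vector per arm to `select`, observe the reward of the chosen arm, and feed that pair to `update`. A Thompson Sampling variant would replace the deterministic bonus with a random perturbation of the predicted reward.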

Papers