Bandit Convex Optimization
Bandit convex optimization (BCO) focuses on efficiently minimizing convex functions when only function values (not gradients) are observable, a challenge arising in many real-world applications. Current research emphasizes developing algorithms with improved regret bounds—the cumulative gap between the losses of the chosen actions and that of the best fixed action in hindsight—across various settings, including adversarial, stochastic, and delayed-feedback scenarios, often employing techniques such as online Newton methods, gradient-descent variants with randomized gradient estimators, and FTRL (Follow-the-Regularized-Leader). These advances matter for online learning and control in situations with limited feedback, with applications in online advertising, robotics, and resource allocation.
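The value-only feedback model can be made concrete with the classic one-point (spherical) gradient estimator in the style of Flaxman, Kalai, and McMahan, plugged into projected online gradient descent. The sketch below is illustrative, not from the source: the quadratic loss, step size, perturbation radius, and feasible ball are all assumed choices.

```python
import numpy as np

def bco_one_point(f, d, T, delta=0.1, eta=0.01, radius=1.0, rng=None):
    """Bandit convex optimization with a one-point gradient estimator.

    At each round the learner queries f at a single randomly perturbed
    point and uses that scalar value to build a gradient estimate of a
    smoothed version of f -- no gradients are ever observed.
    """
    rng = np.random.default_rng(rng)
    x = np.zeros(d)
    losses = []
    for _ in range(T):
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)       # uniform direction on the unit sphere
        y = x + delta * u            # single (bandit) query point
        val = f(y)                   # only the function VALUE is revealed
        losses.append(val)
        g = (d / delta) * val * u    # one-point gradient estimate
        x = x - eta * g              # gradient-descent step
        nrm = np.linalg.norm(x)
        if nrm > radius:             # project back onto the feasible ball
            x *= radius / nrm
    return x, losses

# Demo (assumed setup): minimize f(x) = ||x - c||^2 over the unit ball,
# observing only function values.
c = np.array([0.5, -0.3])
f = lambda z: float(np.sum((z - c) ** 2))
x_final, losses = bco_one_point(f, d=2, T=5000, rng=0)
```

The estimator is unbiased for the gradient of a ball-smoothed surrogate of `f`, which is why a small perturbation radius `delta` trades estimation bias against variance; the sublinear-regret analyses mentioned above balance exactly this trade-off.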