Paper ID: 2301.03679

Transformers as Policies for Variable Action Environments

Niklas Zwingenberger

In this project we demonstrate the effectiveness of the transformer encoder as a viable architecture for policies in variable action environments. Using it, we train an agent using Proximal Policy Optimisation (PPO) on multiple maps against scripted opponents in the Gym-$\mu$RTS environment. The final agent is able to achieve a higher return using half the computational resources of the next-best RL agent, which used the GridNet architecture. The source code and pre-trained models are available here: https://github.com/NiklasZ/transformers-for-variable-action-envs

Submitted: Jan 9, 2023