Paper ID: 2111.10227

Policy Gradient Approach to Compilation of Variational Quantum Circuits

David A. Herrera-Martí

We propose a method for finding approximate compilations of quantum unitary transformations, based on techniques from policy gradient reinforcement learning. The choice of a stochastic policy allows us to rephrase the optimization problem in terms of probability distributions, rather than variational gates. In this framework, the optimal configuration is found by optimizing over distribution parameters, rather than over free angles. We show numerically that this approach can be more competitive than gradient-free methods for a comparable amount of resources, for both noiseless and noisy circuits. Another interesting feature of this approach to variational compilation is that it does not need a separate register or long-range interactions to estimate the end-point fidelity, an improvement over methods that rely on the Hilbert-Schmidt test. We expect these techniques to be relevant for training variational circuits in other contexts.
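As a rough illustration of optimizing over distribution parameters rather than free angles, the sketch below trains the mean of a Gaussian policy over two rotation angles with a REINFORCE-style update. It is a toy under stated assumptions, not the paper's implementation: the single-qubit ansatz Rz·Ry, the fixed-width Gaussian policy, the batch-mean baseline, the use of an exactly computed state fidelity as the reward, and all names (`fidelity`, `train`, `sigma`, `lr`) are hypothetical choices made for this example.

```python
import numpy as np

def ry(t):
    # Single-qubit rotation about the Y axis.
    c, s = np.cos(t / 2), np.sin(t / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def rz(t):
    # Single-qubit rotation about the Z axis.
    return np.array([[np.exp(-1j * t / 2), 0], [0, np.exp(1j * t / 2)]], dtype=complex)

def fidelity(theta, target_state):
    # Reward (assumption): exact overlap between the ansatz output and the target state.
    state = rz(theta[1]) @ ry(theta[0]) @ np.array([1, 0], dtype=complex)
    return abs(np.vdot(target_state, state)) ** 2

def train(target_state, steps=500, batch=64, sigma=0.2, lr=0.5, seed=0):
    rng = np.random.default_rng(seed)
    mu = np.zeros(2)  # policy parameters: mean of the Gaussian over the two angles
    for _ in range(steps):
        # Sample candidate angle settings from the policy N(mu, sigma^2 I).
        thetas = mu + sigma * rng.standard_normal((batch, 2))
        rewards = np.array([fidelity(t, target_state) for t in thetas])
        # Subtract the batch-mean reward as a baseline to reduce variance.
        advantages = rewards - rewards.mean()
        # Score-function estimate: grad_mu log N(theta; mu, sigma^2 I) = (theta - mu) / sigma^2.
        grad_mu = (advantages[:, None] * (thetas - mu)).mean(axis=0) / sigma**2
        mu = mu + lr * grad_mu  # gradient ascent on the expected reward
    return mu

# Hypothetical target: the state prepared by the same ansatz at angles (0.7, 1.1).
target = rz(1.1) @ ry(0.7) @ np.array([1, 0], dtype=complex)
mu = train(target)
final_fidelity = fidelity(mu, target)
print(f"learned mean angles: {mu}, fidelity at the mean: {final_fidelity:.4f}")
```

Only reward evaluations at sampled angles enter the update, so no derivative of the circuit with respect to its angles is required. In this toy the reward is computed from the simulated state; how the end-point fidelity is estimated on hardware is not modeled here.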

Submitted: Nov 19, 2021