Joint Differentiable Optimization and Verification for Certified Reinforcement Learning [2201.12243]