Transformer tricks: Precomputing the first layer [2402.13388]