Paper ID: 2410.08971
Extra Global Attention Designation Using Keyword Detection in Sparse Transformer Architectures
Evan Lucas, Dylan Kangas, Timothy C Havens
In this paper, we propose an extension to the Longformer Encoder-Decoder, a popular sparse transformer architecture. One common challenge with sparse transformers is that they can struggle to encode long-range context, such as connections between topics discussed at the beginning and end of a document. A method to selectively increase global attention is proposed and demonstrated for abstractive summarization tasks on several benchmark data sets. By prefixing the transcript with additional keywords and assigning global attention to these keywords, improvements in the zero-shot, few-shot, and fine-tuned cases are demonstrated for some benchmark data sets.
Submitted: Oct 11, 2024
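
The abstract describes prefixing the input with detected keywords and placing global attention on those prefix tokens. The sketch below illustrates one way this could look with the Hugging Face LED implementation; the checkpoint name, the keyword list being supplied externally, and the exact prefix handling are assumptions for illustration, not the authors' code.

```python
# Sketch: prepend keywords to a document and mark them for global attention
# in Longformer Encoder-Decoder (LED). Checkpoint and helper are assumed.
import torch
from transformers import AutoTokenizer, LEDForConditionalGeneration

model_name = "allenai/led-base-16384"  # assumed base checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = LEDForConditionalGeneration.from_pretrained(model_name)

def summarize_with_keyword_prefix(document: str, keywords: list[str]) -> str:
    # Prepend the detected keywords so they sit at the start of the input.
    prefix = " ".join(keywords)
    text = prefix + " " + document

    inputs = tokenizer(text, return_tensors="pt",
                       truncation=True, max_length=16384)
    input_ids = inputs["input_ids"]

    # Global attention mask: 1 = global attention, 0 = local windowed attention.
    global_attention_mask = torch.zeros_like(input_ids)
    global_attention_mask[:, 0] = 1  # LED convention: global attention on <s>

    # Extend global attention over the keyword-prefix tokens.
    n_prefix = len(tokenizer(prefix, add_special_tokens=False)["input_ids"])
    global_attention_mask[:, 1 : 1 + n_prefix] = 1

    summary_ids = model.generate(
        input_ids,
        attention_mask=inputs["attention_mask"],
        global_attention_mask=global_attention_mask,
        max_length=256,
        num_beams=4,
    )
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)
```

The key point of the sketch is the `global_attention_mask`: beyond the usual global attention on the start token, the keyword-prefix positions also attend to, and are attended by, every token in the document, giving the encoder a path for long-range connections tied to those keywords.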