Scanning Only Once: An End-to-end Framework for Fast Temporal Grounding in Long Videos [2303.08345]