YFACC: A Yorùbá speech-image dataset for cross-lingual keyword localisation through visual grounding [2210.04600]