From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities [2410.02155]