SHYI: Action Support for Contrastive Learning in High-Fidelity Text-to-Image Generation [2501.09055]