Transformer-Based Variational Autoencoders

Transformer-based variational autoencoders (VAEs) combine the representational power of transformers with the structured latent space of VAEs, aiming for improved generative modeling with stronger semantic control and greater diversity in generated outputs. Current research emphasizes injecting structural information (e.g., syntactic structure, sentiment) into the latent space through techniques such as vector quantization, manifold learning, and recurrent mechanisms within the transformer architecture. These approaches promise more nuanced and controllable generation across applications including natural language processing, computer vision (e.g., motion synthesis), and interactive systems (e.g., live video commenting).
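To make the basic architecture concrete, below is a minimal sketch of a transformer-based VAE in PyTorch: a transformer encoder is pooled into a Gaussian latent, which then conditions a transformer decoder as its memory. All names, dimensions, and the mean-pooling choice are illustrative assumptions, not taken from any particular paper; the structural-latent variants mentioned above (e.g., vector quantization) would replace the Gaussian sampling step.

```python
import torch
import torch.nn as nn

class TransformerVAE(nn.Module):
    """Minimal illustrative sketch (not a specific published model):
    transformer encoder -> pooled Gaussian latent -> transformer decoder."""

    def __init__(self, vocab_size=1000, d_model=128, latent_dim=32,
                 nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers)
        self.to_mu = nn.Linear(d_model, latent_dim)
        self.to_logvar = nn.Linear(d_model, latent_dim)
        self.from_z = nn.Linear(latent_dim, d_model)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        h = self.encoder(self.embed(tokens))      # (batch, seq, d_model)
        pooled = h.mean(dim=1)                    # mean-pool token states (assumption)
        mu, logvar = self.to_mu(pooled), self.to_logvar(pooled)
        # Reparameterization trick: z = mu + sigma * eps
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        # Latent vector serves as a one-token "memory" for the decoder
        memory = self.from_z(z).unsqueeze(1)
        dec = self.decoder(self.embed(tokens), memory)
        logits = self.out(dec)
        # KL divergence of q(z|x) against a standard normal prior
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
        return logits, kl

model = TransformerVAE()
tokens = torch.randint(0, 1000, (2, 16))
logits, kl = model(tokens)  # reconstruction logits plus KL term for the ELBO
```

The design choice worth noting is how the latent reaches the decoder: feeding it as cross-attention memory (as above) is one common option; alternatives in the literature include adding it to token embeddings or injecting it into attention keys and values.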

Papers