Paper ID: 2410.14719
A Transformer Based Generative Chemical Language AI Model for Structural Elucidation of Organic Compounds
Xiaofeng Tan
For over half a century, computer-aided structural elucidation systems (CASE) for organic compounds have relied on complex expert systems with explicitly programmed algorithms. These systems are often computationally inefficient for complex compounds due to the vast chemical structural space that must be explored and filtered. In this study, we present a transformer based generative chemical language artificial intelligence (AI) model, an innovative end-to-end architecture designed to replace the logic and workflow of the classic CASE framework for ultra-fast and accurate spectroscopic-based structural elucidation. Our model employs an encoder-decoder architecture and self-attention mechanisms, similar to those in large language models, to directly generate the most probable chemical structures that match the input spectroscopic data. This approach demonstrates the potential of transformer based generative AI to accelerate traditional scientific problem-solving processes. The model's ability to iterate quickly based on new data highlights its potential for rapid advancements in structural elucidation.
Submitted: Oct 13, 2024