Paper ID: 2304.10494

Infinite Physical Monkey: Do Deep Learning Methods Really Perform Better in Conformation Generation?

Haotian Zhang, Jintu Zhang, Huifeng Zhao, Dejun Jiang, Yafeng Deng

Conformation Generation is a fundamental problem in drug discovery and cheminformatics. And organic molecule conformation generation, particularly in vacuum and protein pocket environments, is most relevant to drug design. Recently, with the development of geometric neural networks, the data-driven schemes have been successfully applied in this field, both for molecular conformation generation (in vacuum) and binding pose generation (in protein pocket). The former beats the traditional ETKDG method, while the latter achieves similar accuracy compared with the widely used molecular docking software. Although these methods have shown promising results, some researchers have recently questioned whether deep learning (DL) methods perform better in molecular conformation generation via a parameter-free method. To our surprise, what they have designed is some kind analogous to the famous infinite monkey theorem, the monkeys that are even equipped with physics education. To discuss the feasibility of their proving, we constructed a real infinite stochastic monkey for molecular conformation generation, showing that even with a more stochastic sampler for geometry generation, the coverage of the benchmark QM-computed conformations are higher than those of most DL-based methods. By extending their physical monkey algorithm for binding pose prediction, we also discover that the successful docking rate also achieves near-best performance among existing DL-based docking models. Thus, though their conclusions are right, their proof process needs more concern.

Submitted: Mar 8, 2023