Paper ID: 2410.11391

Benchmarking Data Efficiency in $\Delta$-ML and Multifidelity Models for Quantum Chemistry

Vivin Vinod, Peter Zaspel

The development of machine learning (ML) methods has made quantum chemistry (QC) calculations more accessible by reducing the compute cost incurred in conventional QC methods. This cost has since shifted to the overhead of generating training data. Efforts to reduce the cost of generating training data have led to the development of $\Delta$-ML and multifidelity machine learning methods, which use data at more than one QC level of accuracy, or fidelity. This work compares the data costs associated with $\Delta$-ML, multifidelity machine learning (MFML), and optimized MFML (o-MFML) with those of a newly introduced Multifidelity $\Delta$-Machine Learning (MF$\Delta$ML) method for the prediction of ground state energies over the multifidelity benchmark dataset QeMFi. The assessment is based on the training data generation cost associated with each model and is compared with the single fidelity kernel ridge regression (KRR) case. The results indicate that multifidelity methods surpass the standard $\Delta$-ML approach when a large number of predictions is required. In cases where the $\Delta$-ML method might be favored, such as small test set regimes, the MF$\Delta$ML method is shown to be more efficient than conventional $\Delta$-ML.
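
To make the $\Delta$-ML idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a Gaussian-kernel KRR model written directly in NumPy and two hypothetical one-dimensional toy functions, `e_low` and `e_high`, standing in for a cheap and an expensive QC fidelity. None of the names, hyperparameters, or data come from the paper or from QeMFi. The model learns only the difference between the two fidelities and adds it to the cheap baseline at prediction time.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    # Pairwise squared Euclidean distances between rows of A and rows of B
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def krr_fit(X, y, sigma=1.0, lam=1e-6):
    # Solve (K + lam * I) alpha = y for the KRR coefficients
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_new, sigma=1.0):
    return gaussian_kernel(X_new, X_train, sigma) @ alpha

# Hypothetical toy "fidelities" (assumptions for illustration only)
def e_low(X):
    return np.sin(X[:, 0])

def e_high(X):
    return np.sin(X[:, 0]) + 0.1 * X[:, 0] ** 2

X_train = np.linspace(-2.0, 2.0, 20)[:, None]
X_test = np.linspace(-1.5, 1.5, 50)[:, None]

# Delta-ML: train on the difference between high and low fidelity
alpha_delta = krr_fit(X_train, e_high(X_train) - e_low(X_train))

# Prediction needs a low-fidelity evaluation at every test point
pred_delta = e_low(X_test) + krr_predict(X_train, alpha_delta, X_test)

mae_delta = np.mean(np.abs(pred_delta - e_high(X_test)))
print(f"Delta-ML MAE on toy data: {mae_delta:.2e}")
```

Note that in this sketch the baseline `e_low` must be evaluated at every test point, so the total cost of $\Delta$-ML grows with the number of predictions. This is consistent with the abstract's observation that $\Delta$-ML is most attractive in small test set regimes.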

Submitted: Oct 15, 2024