Paper ID: 2406.13827

Fine-Tuning BERTs for Definition Extraction from Mathematical Text

Lucy Horowitz, Ryan Hathaway

In this paper, we fine-tuned three pre-trained BERT models on the task of "definition extraction" from mathematical English written in LaTeX. We framed the task as binary classification: a sentence either contains a definition of a mathematical term or it does not. We used two original datasets, "Chicago" and "TAC," to fine-tune and test these models. We also tested on WFMALL, a dataset presented by Vanetik and Litvak in 2021, and compared the performance of our models to theirs. We found that a high-performance Sentence-BERT transformer model performed best on overall accuracy, recall, and precision, achieving results comparable to those of the earlier models with less computational effort.
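The abstract reports accuracy, precision, and recall for a sentence-level binary task. The following is a minimal sketch of how these metrics relate, assuming gold and predicted labels are encoded as 1 for "sentence contains a definition" and 0 otherwise. The function name `definition_metrics` and the toy labels are hypothetical illustrations, not the authors' evaluation code or data.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float


def definition_metrics(gold: Sequence[int], pred: Sequence[int]) -> BinaryMetrics:
    """Score sentence-level definition extraction (hypothetical helper).

    Assumed label encoding: 1 = sentence contains a definition, 0 = it does not.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)  # definitions found
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)  # non-definitions flagged
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)  # definitions missed
    correct = sum(1 for g, p in pairs if g == p)
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return BinaryMetrics(accuracy, precision, recall)


# Toy example with made-up labels for five sentences.
gold = [1, 0, 1, 0, 1]
pred = [1, 0, 0, 1, 1]
m = definition_metrics(gold, pred)
print(m)  # accuracy 0.6, precision and recall both 2/3
```

Precision penalizes sentences wrongly flagged as definitions, while recall penalizes definitions the classifier misses; reporting both alongside accuracy matters when definitional sentences are a minority of the corpus.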

Submitted: Jun 19, 2024

Topics

Binary Classification
Pre-Trained BERT
Fine-Tuned BERT
Sentence BERT
Mathematical Text
Extraction Method
