Tree Edit Distance

Tree edit distance measures how dissimilar two trees are: it is the minimum number of node edits (insertions, deletions, or renamings) needed to transform one tree into the other. Current research applies it across diverse domains, including code similarity analysis (comparing Abstract Syntax Trees), scientific discovery (via symbolic regression), and document understanding (extracting tables of contents from images). The methods involved range from optimized algorithms that compute the distance efficiently to transformer-based architectures and multimodal approaches that combine visual and textual information. The resulting gains in accuracy and efficiency matter for fields ranging from software engineering to scientific data analysis.
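
The definition can be made concrete with a small sketch. It assumes ordered, labeled trees and a unit cost for every insertion, deletion, and renaming; the helper names (tree, forest_distance, tree_edit_distance) are illustrative and not taken from any particular paper or library. It uses the plain memoized forest recursion rather than an optimized algorithm, so it is only practical for small trees.

```python
from functools import lru_cache


def tree(label, *children):
    """Build an immutable (hashable) tree node: (label, (child, child, ...))."""
    return (label, tuple(children))


@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two ordered forests (tuples of trees)."""
    if not f and not g:
        return 0
    if not g:
        # Delete the rightmost root of f; its children are promoted into the forest.
        _, kids = f[-1]
        return forest_distance(f[:-1] + kids, g) + 1
    if not f:
        # Insert the rightmost root of g (mirror image of the deletion case).
        _, kids = g[-1]
        return forest_distance(f, g[:-1] + kids) + 1

    (f_label, f_kids), (g_label, g_kids) = f[-1], g[-1]

    # Option 1: delete the rightmost root of f.
    delete = forest_distance(f[:-1] + f_kids, g) + 1
    # Option 2: insert the rightmost root of g.
    insert = forest_distance(f, g[:-1] + g_kids) + 1
    # Option 3: map the two rightmost roots onto each other (renaming if the
    # labels differ), then match their children and the remaining forests.
    rename = (
        forest_distance(f_kids, g_kids)
        + forest_distance(f[:-1], g[:-1])
        + (0 if f_label == g_label else 1)
    )
    return min(delete, insert, rename)


def tree_edit_distance(a, b):
    """Edit distance between two single trees."""
    return forest_distance((a,), (b,))


# Example: the trees differ only in the label of one leaf.
t1 = tree("a", tree("b"), tree("c"))
t2 = tree("a", tree("b"), tree("d"))
print(tree_edit_distance(t1, t2))
```

Here a single renaming (c to d) turns t1 into t2, so the distance is 1. Deleting an inner node promotes its children to the deleted node's parent, which is why removing b from a(b(c)) to obtain a(c) also costs one edit.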

Papers