Paper ID: 2409.14179
On Lexical Invariance on Multisets and Graphs
Muhan Zhang
In this draft, we study a novel problem, called lexical invariance, using the medium of multisets and graphs. Traditionally in the NLP domain, lexical invariance indicates that the semantic meaning of a sentence should remain unchanged regardless of the specific lexical or word-based representation of the input. For example, ``The movie was extremely entertaining'' would have the same meaning as ``The film was very enjoyable''. In this paper, we study a more challenging setting, where the output of a function is invariant to any injective transformation applied to the input lexical space. For example, multiset {1,2,3,2} is equivalent to multiset {a,b,c,b} if we specify an injective transformation that maps 1 to a, 2 to b and 3 to c. We study the sufficient and necessary conditions for a most expressive lexical invariant (and permutation invariant) function on multisets and graphs, and proves that for multisets, the function must have a form that only takes the multiset of counts of the unique elements in the original multiset as input. For example, a most expressive lexical invariant function on {a,b,c,b} must have a form that only operates on {1,1,2} (meaning that there are 1, 1, 2 unique elements corresponding to a,c,b). For graphs, we prove that a most expressive lexical invariant and permutation invariant function must have a form that only takes the adjacency matrix and a difference matrix as input, where the (i,j)th element of the difference matrix is 1 if node i and node j have the same feature and 0 otherwise. We perform synthetic experiments on TU datasets to verify our theorems.
Submitted: Sep 21, 2024