Automatic Evaluation of Language Generation Technology Based on Structure Alignment

Katsuki Chousa, Tsutomu Hirao

January 2025

Abstract

Language generation techniques require automatic evaluation to carry out efficient and reproducible experiments. While n-gram matching is standard, it fails to capture semantic equivalence when the wording differs. Recent methods have addressed this issue by using contextual embeddings from pre-trained language models to compute the similarity between a reference and a hypothesis. However, these methods frequently disregard the syntax of sentences, despite its crucial role in determining meaning, and thus assign unjustifiably high scores. This paper proposes an automatic evaluation metric that considers both the words in sentences and their syntactic structures by integrating syntactic information into the recent embedding-based approach. Experimental results obtained from two NLP tasks show that our method is at least comparable to standard baselines.

Type

Conference paper

Publication

Proceedings of the 31st International Conference on Computational Linguistics
