Span Modeling for Idiomaticity and Figurative Language Detection with Span Contrastive Loss
1️⃣ One-Sentence Summary
This paper proposes BERT- and RoBERTa-based models that combine slot loss with a novel span contrastive loss to improve detection of non-literal expressions such as idioms, achieving state-of-the-art performance on existing datasets.
The category of figurative language contains many varieties, some of which are non-compositional in nature. This type of phrase or multi-word expression (MWE) includes idioms, which carry a single meaning that is not the sum of their words. For language models, this presents a unique problem due to tokenization and adjacent contextual embeddings. Many large language models have mitigated this issue with large phrase vocabularies, though immediate recognition frequently fails without one- or few-shot prompting or instruction finetuning. The best results to date have been achieved with BERT-based or LSTM finetuning approaches, and the models in this paper follow that line of work. We propose BERT- and RoBERTa-based models finetuned with a combination of slot loss and span contrastive loss (SCL) with hard negative reweighting to improve idiomaticity detection, attaining state-of-the-art sequence accuracy on existing datasets. Comparative ablation studies show the effectiveness of SCL and its generalizability. The geometric mean of F1 and sequence accuracy (SA) is also proposed to assess a model's span awareness and general performance together.
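The combined metric mentioned in the last sentence is simple to reproduce. A minimal sketch follows; the function name and the example F1/SA values are illustrative, not taken from the paper's results.

```python
import math

def gm_f1_sa(f1: float, seq_acc: float) -> float:
    """Geometric mean of span-level F1 and sequence accuracy (SA).

    Rewards models that are strong on both span awareness (F1)
    and whole-sequence correctness (SA); a low score on either
    axis pulls the combined score down more than an arithmetic
    mean would.
    """
    if not (0.0 <= f1 <= 1.0 and 0.0 <= seq_acc <= 1.0):
        raise ValueError("f1 and seq_acc must lie in [0, 1]")
    return math.sqrt(f1 * seq_acc)

# Illustrative values only: F1 = 0.81, SA = 0.64
print(round(gm_f1_sa(0.81, 0.64), 2))  # → 0.72
```

The geometric mean is a natural choice here: unlike the arithmetic mean, it cannot be inflated by excelling on only one of the two axes.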
Source: arXiv: 2603.22799