arXiv submission date: 2026-03-16
📄 Abstract - MER-Bench: A Comprehensive Benchmark for Multimodal Meme Reappraisal

Memes represent a tightly coupled, multimodal form of social expression, in which visual context and overlaid text jointly convey nuanced affect and commentary. Inspired by cognitive reappraisal in psychology, we introduce Meme Reappraisal, a novel multimodal generation task that aims to transform negatively framed memes into constructive ones while preserving their underlying scenario, entities, and structural layout. Unlike prior work on meme understanding or generation, Meme Reappraisal requires emotion-controllable, structure-preserving multimodal transformation under multiple semantic and stylistic constraints. To support this task, we construct MER-Bench, a benchmark of real-world memes with fine-grained multimodal annotations, including source and target emotions, positively rewritten meme text, visual editing specifications, and taxonomy labels covering visual type, sentiment polarity, and layout structure. We further propose a structured evaluation framework based on a multimodal large language model (MLLM)-as-a-Judge paradigm, decomposing performance into modality-level generation quality, affect controllability, structural fidelity, and global affective alignment. Extensive experiments across representative image-editing and multimodal-generation systems reveal substantial gaps in satisfying the constraints of structural preservation, semantic consistency, and affective transformation. We believe MER-Bench establishes a foundation for research on controllable meme editing and emotion-aware multimodal generation. Our code is available at this https URL.
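
The abstract names four evaluation axes but not how they are implemented. As a rough illustration, the sketch below shows one plausible shape an MLLM-as-a-Judge scoring pass could take; every identifier here (`AXES`, `JudgeScores`, `judge`, `ask_mllm`) is hypothetical and not taken from the paper or its released code.

```python
from dataclasses import dataclass

# The four axes mirror the decomposition named in the abstract; the English
# keys and the [0, 1] scale are assumptions for illustration only.
AXES = (
    "generation_quality",   # per-modality quality of the edited image and text
    "affect_control",       # did the target emotion replace the source emotion?
    "structural_fidelity",  # are scenario, entities, and layout preserved?
    "affective_alignment",  # do image and text jointly read as positive overall?
)

@dataclass
class JudgeScores:
    """Per-axis scores in [0, 1] plus their unweighted mean."""
    scores: dict[str, float]

    @property
    def overall(self) -> float:
        return sum(self.scores.values()) / len(self.scores)

def judge(source_meme: bytes, edited_meme: bytes, target_emotion: str,
          ask_mllm) -> JudgeScores:
    """Query a judge MLLM once per axis and collect scalar scores.

    `ask_mllm` is a placeholder callable (prompt, images) -> float; any real
    MLLM client that can score an image pair against a text rubric could be
    substituted here.
    """
    scores = {}
    for axis in AXES:
        prompt = (f"Rate the edited meme on '{axis}' for target emotion "
                  f"'{target_emotion}'. Answer with a single number in [0, 1].")
        scores[axis] = ask_mllm(prompt, [source_meme, edited_meme])
    return JudgeScores(scores)
```

Averaging the axes equally in `overall` is purely a placeholder choice; the paper's framework may weight the axes differently or report them separately.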

Top-level tags: multi-modal, natural language processing, benchmark
Detailed tags: meme generation, emotion control, multimodal evaluation, affective computing, image-text editing

MER-Bench: A Comprehensive Benchmark for Multimodal Meme Reappraisal


1️⃣ One-sentence summary

This paper introduces a new task called "Meme Reappraisal," which aims to automatically transform negatively framed memes into constructive, positive versions, and supports it with a finely annotated benchmark dataset and an MLLM-based evaluation framework, advancing research on controllable meme editing and emotion-aware multimodal content generation.

Source: arXiv 2603.15020