ShotCrop$^3$: Cropping Human-Centric Images into Cinematic Triple-Shot Compositions

📄 Abstract - ShotCrop$^3$: Cropping Human-Centric Images into Cinematic Triple-Shot Compositions

Prior work on aesthetic composition typically produces a single aesthetically pleasing crop, overlooking the narrative value of composing multiple shots from one scene. In practice, multi-shot composition is critical for downstream creative workflows: commercial posters often require multiple crops with different emphases (e.g., context, subject, and emotion/product details) to present key story beats. Therefore, we propose Triple-Shot Compositions (TSC), a composition task that generates a three-shot set (establishing, medium, and close-up) from a single human-centric image, with each shot paired with a brief description to support visual narration. To learn TSC with limited expert annotations, we introduce ShotCrop, which undergoes a three-stage training process: it first applies Chain-of-Thought supervised fine-tuning to establish basic reasoning and aesthetic shot-cropping skills, then performs semi-supervised fine-tuning with high-confidence pseudo labels to further enhance aesthetic capability, and is finally optimized with Group Relative Policy Optimization for ShotCrop (GRPO-S) using a composite reward tailored to the task. Specifically, our pseudo-labeling strategy combines MLLM-based scoring, aesthetic assessment, and CLIP similarity to retain high-confidence training signals. In addition, we present TSC-Bench, a benchmark of 1.2k expert-annotated test cases. Notably, ShotCrop achieves an average improvement of 2.82 times over GPT-5 in shot localization accuracy.
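The abstract names three signals used to keep only high-confidence pseudo labels but does not say how they are combined. The following is a minimal sketch of one plausible reading, assuming each signal is a score that must clear its own threshold. The `PseudoLabel` type, the field names, the score ranges, and all threshold values are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PseudoLabel:
    """One candidate three-shot crop set with its three confidence signals."""
    image_id: str
    mllm_score: float       # hypothetical: MLLM judge score, scaled to [0, 1]
    aesthetic_score: float  # hypothetical: aesthetic model score, scaled to [0, 1]
    clip_similarity: float  # hypothetical: CLIP similarity of crop and description


def is_high_confidence(label, min_mllm=0.8, min_aesthetic=0.7, min_clip=0.25):
    """Keep a label only if every signal clears its (assumed) threshold."""
    return (
        label.mllm_score >= min_mllm
        and label.aesthetic_score >= min_aesthetic
        and label.clip_similarity >= min_clip
    )


def filter_pseudo_labels(labels):
    """Return the subset of pseudo labels that pass all three checks."""
    return [label for label in labels if is_high_confidence(label)]


candidates = [
    PseudoLabel("img_a", 0.90, 0.80, 0.30),  # passes all three checks
    PseudoLabel("img_b", 0.90, 0.50, 0.30),  # fails the aesthetic check
    PseudoLabel("img_c", 0.85, 0.75, 0.20),  # fails the CLIP check
]
kept = filter_pseudo_labels(candidates)
```

Requiring all three signals to agree is a conservative choice made for this illustration; a weighted sum or a learned combiner would be equally consistent with the abstract's wording.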


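1️⃣ One-Sentence Summary

This paper proposes a new task: automatically generating, from a single photo of a person, three cropped versions with different viewpoints and narrative functions (an establishing shot that sets the scene, a medium shot that focuses on the subject, and a close-up that emphasizes details). It also develops a model named ShotCrop, which achieves high-quality multi-shot composition through staged training (first learning basic reasoning, then improving aesthetics with pseudo labels, and finally optimizing with reinforcement learning), improving shot localization accuracy over GPT-5 by a factor of nearly 3.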
