arXiv submission date: 2026-04-26
📄 Abstract - Talking Slide Avatars: Open-Source Multimodal Communication Approach for Teaching

Slide-based teaching is widely used in higher education, yet in online, hybrid, and asynchronous contexts, slides often lose the instructor presence, narrative continuity, and expressive framing that help learners connect with content. Full lecture video can partly restore these qualities, but it is time-consuming to record, revise, and reuse. This study addresses that pedagogical and production challenge by presenting a practice-based analysis of an open-source workflow for creating talking slide avatars for slide-based teaching. The workflow integrates OpenVoice for text-to-speech generation and voice cloning with Ditto-TalkingHead for audio-driven talking-image synthesis, enabling instructors to transform a script and a static portrait into a short narrated video that can be embedded in slide decks or HTML-based lecture materials. Rather than treating this workflow merely as a technical solution, the study frames talking slide avatars as multimodal communication artifacts at the intersection of digital pedagogy, aesthetic education, and art-technology practice. Using a practice-based implementation and analytic reflection approach, the study documents the production pipeline, examines its communicative and aesthetic affordances, and proposes practical guidelines for script length, image selection, pacing, disclosure, accessibility, and ethical use. The study makes three primary contributions: it presents an educator-oriented open-source production model, reframes talking avatars as an educational communication design problem, and proposes a responsible pathway for incorporating generative synthetic media into teaching. It concludes that short, transparent, and carefully designed avatars can humanize slide-based instruction while providing a reusable communicative layer for introductions, transitions, reminders, and recaps across online, hybrid, and asynchronous learning environments.
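The abstract mentions embedding the generated clip in HTML-based lecture materials and lists disclosure and accessibility among its guidelines. As a rough illustration only, the sketch below builds an HTML snippet that wraps a short avatar video with a captions track and a visible disclosure note. The function name `avatar_video_html`, the file names, and the disclosure wording are hypothetical and do not come from the paper; the OpenVoice and Ditto-TalkingHead steps are assumed to have already produced the MP4 file.

```python
from html import escape


def avatar_video_html(video_src: str, captions_src: str, disclosure: str, width: int = 320) -> str:
    """Return an HTML <figure> embedding an avatar clip with captions and a disclosure note.

    All parameters are illustrative assumptions: video_src is the rendered clip,
    captions_src is a WebVTT captions file, disclosure is the text shown to learners.
    """
    return (
        "<figure>\n"
        f'  <video controls preload="metadata" width="{width}">\n'
        f'    <source src="{escape(video_src, quote=True)}" type="video/mp4">\n'
        f'    <track kind="captions" src="{escape(captions_src, quote=True)}" '
        'srclang="en" label="English" default>\n'
        "  </video>\n"
        # The disclosure is escaped so arbitrary text cannot break the markup.
        f"  <figcaption>{escape(disclosure)}</figcaption>\n"
        "</figure>"
    )


if __name__ == "__main__":
    print(
        avatar_video_html(
            "intro_avatar.mp4",
            "intro_avatar.vtt",
            "This narration uses an AI-generated voice and avatar.",
        )
    )
```

The snippet can be pasted into an HTML slide or course page. Keeping the captions track and the disclosure text as required arguments is one way to make the accessibility and transparency guidelines hard to skip.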

Top-level tags: multi-modal, aigc, education
Detailed tags: talking avatar, text-to-speech, slide teaching, open-source workflow, synthetic media

Talking Slide Avatars: Open-Source Multimodal Communication Approach for Teaching


1️⃣ One-sentence summary

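This paper presents a free, reusable method that lets instructors generate a talking avatar video from just a single photo and a piece of text, then embed it in PowerPoint slides or web-based courseware. This restores instructor presence in online and hybrid teaching and improves students' sense of engagement, while avoiding the high time cost of recording full-length lecture video.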

Source: arXiv: 2604.23703