arXiv submission date: 2026-03-10
📄 Abstract - PromptDLA: A Domain-aware Prompt Document Layout Analysis Framework with Descriptive Knowledge as a Cue

Document Layout Analysis (DLA) is crucial for document artificial intelligence and has recently received increasing attention, resulting in an influx of large-scale public DLA datasets. Existing work often combines data from various domains in recent public DLA datasets to improve the generalization of DLA. However, directly merging these datasets for training often results in suboptimal model performance, as it overlooks the different layout structures inherent to various domains. These variations include different labeling styles, document types, and languages. This paper introduces PromptDLA, a domain-aware Prompter for Document Layout Analysis that effectively leverages descriptive knowledge as cues to integrate domain priors into DLA. The innovative PromptDLA features a unique domain-aware prompter that customizes prompts based on the specific attributes of the data domain. These prompts then serve as cues that direct the DLA toward critical features and structures within the data, enhancing the model's ability to generalize across varied domains. Extensive experiments show that our proposal achieves state-of-the-art performance on DocLayNet, PubLayNet, M6Doc, and D$^4$LA. Our code is available at this https URL.
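
The abstract says the prompter customizes prompts from the attributes of each data domain (labeling style, document type, language). A minimal sketch of that idea follows; it is not the authors' implementation. The names `DomainInfo`, `build_domain_prompt`, `prompts_for_batch`, the `DOMAINS` registry, and the prompt wording are all hypothetical, and the attribute values are illustrative placeholders rather than facts about any dataset. How the resulting text is encoded and fed to the layout model is not described in the abstract and is left out here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainInfo:
    """Hypothetical descriptive attributes of one data domain."""
    document_type: str
    language: str
    labeling_style: str


def build_domain_prompt(info: DomainInfo) -> str:
    """Turn the descriptive attributes into a plain-text prompt (assumed format)."""
    return (
        f"document type: {info.document_type}; "
        f"language: {info.language}; "
        f"labeling style: {info.labeling_style}"
    )


# Illustrative registry; keys and values are placeholders, not real dataset facts.
DOMAINS = {
    "domain_a": DomainInfo("scientific article", "English", "coarse-grained"),
    "domain_b": DomainInfo("magazine page", "Chinese", "fine-grained"),
}


def prompts_for_batch(domain_keys):
    """Return one prompt per sample, chosen by the sample's source domain."""
    return [build_domain_prompt(DOMAINS[key]) for key in domain_keys]


if __name__ == "__main__":
    # A mixed-domain batch: each sample carries the key of the domain it came from.
    for prompt in prompts_for_batch(["domain_a", "domain_b", "domain_a"]):
        print(prompt)
```

Keying the prompt on the source domain means that samples from merged datasets keep a marker of where they came from, which is the property the abstract argues is lost when datasets are merged directly.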

Top-level tags: computer vision, natural language processing, multi-modal
Detailed tags: document layout analysis, domain adaptation, prompt engineering, document understanding, visual document processing

PromptDLA: A Domain-aware Prompt Document Layout Analysis Framework with Descriptive Knowledge as a Cue


1️⃣ One-sentence summary

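This paper proposes PromptDLA, a new framework built around a "domain-aware prompter" that automatically generates prompts from the characteristics of each data domain. The prompts supply domain prior knowledge as cues that guide the model, improving the generalization and performance of document layout analysis models trained on data mixed from different domains.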

Source: arXiv: 2603.09414