arXiv submission date: 2026-02-17
📄 Abstract - Perspectives - Interactive Document Clustering in the Discourse Analysis Tool Suite

This paper introduces Perspectives, an interactive extension of the Discourse Analysis Tool Suite designed to empower Digital Humanities (DH) scholars to explore and organize large, unstructured document collections. Perspectives implements a flexible, aspect-focused document clustering pipeline with human-in-the-loop refinement capabilities. We showcase how this process can be initially steered by defining analytical lenses through document rewriting prompts and instruction-based embeddings, and further aligned with user intent through tools for refining clusters and mechanisms for fine-tuning the embedding model. The demonstration highlights a typical workflow, illustrating how DH researchers can leverage Perspectives's interactive document map to uncover topics, sentiments, or other relevant categories, thereby gaining insights and preparing their data for subsequent in-depth analysis.
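The pipeline described in the abstract (pick an analytical lens, embed documents with respect to it, cluster, then let the user refine) can be illustrated with a small toy sketch. This is not the Perspectives implementation. The LLM-based rewriting prompts and instruction-based embeddings are replaced here by a plain keyword-count vector, the clustering step is assumed to be k-means, and all names (`embed`, `kmeans`, `merge_clusters`, `sentiment_terms`) are hypothetical.

```python
import math
from collections import Counter


def embed(text, aspect_terms):
    """Toy stand-in (assumption) for an instruction-based embedding:
    count only the terms relevant to the chosen aspect, then L2-normalize."""
    counts = Counter(text.lower().split())
    vec = [float(counts[t]) for t in aspect_terms]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [vectors[0]]
    while len(centers) < k:
        # Next center: the vector farthest from all centers chosen so far.
        centers.append(
            max(vectors, key=lambda v: min(sq_dist(v, c) for c in centers))
        )
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest center for every vector.
        labels = [
            min(range(k), key=lambda j: sq_dist(v, centers[j])) for v in vectors
        ]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def merge_clusters(labels, keep, drop):
    """Human-in-the-loop refinement: the user merges cluster `drop` into `keep`."""
    return [keep if lab == drop else lab for lab in labels]


docs = [
    "the film was wonderful and moving",
    "a wonderful cast and a great script",
    "the plot was terrible and boring",
    "boring pacing and terrible dialogue",
]
# The "lens": a sentiment aspect, expressed here as a hand-picked vocabulary.
sentiment_terms = ["wonderful", "great", "moving", "terrible", "boring"]

vectors = [embed(d, sentiment_terms) for d in docs]
labels = kmeans(vectors, k=2)
merged = merge_clusters(labels, keep=0, drop=1)
```

Switching `sentiment_terms` to a topic vocabulary would regroup the same documents along a different aspect, which is the idea behind steering the clustering through a lens. `merge_clusters` stands in for one possible refinement action; the abstract does not list the specific refinement tools.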

Top-level tags: natural language processing systems data
Detailed tags: document clustering interactive analysis digital humanities human-in-the-loop embedding fine-tuning

Perspectives - Interactive Document Clustering in the Discourse Analysis Tool Suite


1️⃣ One-sentence summary

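This paper introduces Perspectives, an interactive tool that brings Digital Humanities scholars into the clustering process, helping them flexibly explore and organize large, unstructured document collections, uncover patterns such as topics and sentiments, and prepare their data for subsequent in-depth analysis.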

Source: arXiv: 2602.15540