arXiv submission date: 2026-05-20
📄 Abstract - Automated ICD Classification of Psychiatric Diagnoses: From Classical NLP to Large Language Models

Mental health has become a global priority, leading to a massive administrative burden in the coding of clinical diagnoses. This study proposes the automation of psychiatric diagnostic analysis by mapping free-text descriptions to the International Classification of Diseases (ICD) using Natural Language Processing (NLP) and Machine Learning (ML) techniques. Utilizing a specialized dataset of 145,513 Spanish psychiatric descriptions, various text representation paradigms were evaluated, ranging from classical frequency-based models (BoW, TF-IDF) to state-of-the-art Large Language Models (LLMs) such as e5_large, BioLORD, and Llama-3-8B. Results indicate that transformer-based embeddings consistently outperform traditional methods by capturing implicit semantic cues and nuanced medical terminology. The e5_large model, through end-to-end fine-tuning, achieved the highest performance with an $F1_{micro}$ score of 0.866. This research demonstrates that adapting LLMs to specific clinical nomenclature is essential for overcoming the challenges of "long-tail" label distributions and the inherent ambiguity of psychiatric discourse.
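
To make the "classical frequency-based" end of that comparison concrete, here is a minimal sketch of a TF-IDF nearest-neighbour classifier in plain Python. It is not the authors' pipeline: the four Spanish descriptions are invented for illustration (they are not taken from the paper's dataset), the helper names are hypothetical, and the smoothed IDF formula is simply the common variant that scikit-learn's `TfidfVectorizer` uses by default.

```python
import math
from collections import Counter

# Invented toy examples, NOT from the paper's dataset.
# Labels are ICD-10 categories (F32: depressive episode, F41: other anxiety disorders).
TRAIN = [
    ("episodio depresivo moderado", "F32"),
    ("trastorno depresivo con insomnio", "F32"),
    ("trastorno de ansiedad generalizada", "F41"),
    ("crisis de ansiedad con palpitaciones", "F41"),
]

def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real pipelines do more)."""
    return text.lower().split()

def fit_idf(docs):
    """Smoothed inverse document frequency per token."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    return {tok: math.log((1 + n) / (1 + count)) + 1.0 for tok, count in df.items()}

def tfidf(text, idf):
    """L2-normalised TF-IDF vector as a sparse dict; unknown tokens are dropped."""
    tf = Counter(tok for tok in tokenize(text) if tok in idf)
    vec = {tok: count * idf[tok] for tok, count in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {tok: v / norm for tok, v in vec.items()} if norm else {}

def cosine(a, b):
    """Dot product of two already-normalised sparse vectors."""
    return sum(weight * b.get(tok, 0.0) for tok, weight in a.items())

def predict(text, train, idf):
    """Return the label of the most similar training text, or None if nothing overlaps."""
    query = tfidf(text, idf)
    score, label = max((cosine(query, tfidf(doc, idf)), lab) for doc, lab in train)
    return label if score > 0 else None

IDF = fit_idf([doc for doc, _ in TRAIN])
print(predict("ansiedad generalizada", TRAIN, IDF))  # F41
print(predict("tristeza profunda", TRAIN, IDF))      # None
```

The second query shares no vocabulary with the training texts, so a purely lexical representation has nothing to match on. That is the kind of gap the abstract attributes to frequency-based models and credits transformer embeddings with closing.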

Top-level tags: medical natural language processing llm
Detailed tags: icd classification psychiatric diagnoses clinical nlp text embeddings spanish dataset

Automated ICD Classification of Psychiatric Diagnoses: From Classical NLP to Large Language Models


1️⃣ One-sentence summary

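This study applies a range of natural language processing techniques, from traditional word-frequency models to recent large language models, to automatically map psychiatrists' free-text diagnostic descriptions to International Classification of Diseases (ICD) codes, and finds that Transformer-based deep learning models (especially the fine-tuned e5_large) significantly outperform traditional methods at capturing semantic detail and handling rare labels, with a best micro-averaged F1 score of 0.866.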
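
Because the headline number is a micro-averaged F1, it helps to see how that metric behaves when labels are rare. The sketch below is a hedged illustration, assuming a single-label setting (one code per description, which the abstract does not confirm) and using invented predictions rather than any result from the paper. The function name `f1_scores` is hypothetical.

```python
from collections import Counter

def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label gains a false positive
            fn[truth] += 1  # true label gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    labels = set(y_true) | set(y_pred)
    # Micro: pool counts over all labels, so frequent labels dominate.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: average per-label F1, so every label counts equally.
    macro = sum(f1(tp[lab], fp[lab], fn[lab]) for lab in labels) / len(labels)
    return micro, macro

# Invented long-tail example: one frequent code and two rare ones,
# scored against a degenerate model that always predicts the frequent code.
y_true = ["F32"] * 8 + ["F41"] + ["F20"]
y_pred = ["F32"] * 10
micro, macro = f1_scores(y_true, y_pred)
print(round(micro, 3), round(macro, 3))  # 0.8 0.296
```

In this toy case the micro average stays at 0.8 even though both rare codes are missed entirely, while the macro average drops to about 0.296. A micro-averaged score alone therefore does not show how well rare labels are handled.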

Source: arXiv: 2605.21154