arXiv submission date: 2026-03-19
📄 Abstract - CAPSUL: A Comprehensive Human Protein Benchmark for Subcellular Localization

Subcellular localization is a crucial biological task for drug target identification and function annotation. Although subcellular localization is known to be closely associated with protein structure, no existing dataset offers comprehensive 3D structural information together with detailed subcellular localization annotations, which severely hinders the application of promising structure-based models to this task. To address this gap, we introduce a new benchmark called $\mathbf{CAPSUL}$, a $\mathbf{C}$omprehensive hum$\mathbf{A}$n $\mathbf{P}$rotein benchmark for $\mathbf{SU}$bcellular $\mathbf{L}$ocalization. It features a dataset that integrates diverse 3D structural representations with fine-grained subcellular localization annotations carefully curated by domain experts. We evaluate this benchmark using a variety of state-of-the-art sequence-based and structure-based models, showcasing the importance of involving structural features in this task. Furthermore, we explore reweighting and single-label classification strategies to facilitate future investigation of structure-based methods for this task. Lastly, we showcase the interpretability of structure-based methods through a case study on the Golgi apparatus, where attention mechanisms reveal the $\alpha$-helix as a decisive localization pattern, demonstrating the potential for bridging the gap with intuitive biological interpretability and paving the way for data-driven discoveries in cell biology.
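The abstract mentions reweighting for this task but does not say which scheme is used. As a rough illustration only, the sketch below shows one common form of reweighting for multi-label classification: each compartment's positive term in the binary cross-entropy is scaled by the ratio of negative to positive examples, so rare compartments contribute more to the loss. The function names, the toy labels, and the choice of weighting rule are all assumptions made for illustration and are not taken from the paper.

```python
import math


def positive_class_weights(labels):
    """Per-class weight n_neg / n_pos from multi-hot label rows (hypothetical helper).

    labels: list of rows, each a list of 0/1 flags, one flag per compartment.
    A class with no positive example falls back to weight 1.0.
    """
    n_samples = len(labels)
    n_classes = len(labels[0])
    weights = []
    for c in range(n_classes):
        n_pos = sum(row[c] for row in labels)
        n_neg = n_samples - n_pos
        weights.append(n_neg / n_pos if n_pos > 0 else 1.0)
    return weights


def weighted_bce(probs, targets, pos_weights, eps=1e-12):
    """Mean binary cross-entropy with the positive term scaled per class."""
    total = 0.0
    count = 0
    for p_row, t_row in zip(probs, targets):
        for p, t, w in zip(p_row, t_row, pos_weights):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += -(w * t * math.log(p) + (1 - t) * math.log(1.0 - p))
            count += 1
    return total / count


# Toy example: 4 proteins, 2 compartments; the second compartment is rare.
toy_labels = [[1, 0], [1, 0], [1, 1], [0, 0]]
toy_weights = positive_class_weights(toy_labels)
toy_loss = weighted_bce([[0.5, 0.5]], [[1, 0]], [3.0, 1.0])
```

In this toy data the rare second compartment receives a larger positive weight than the common first one, which is the intended effect of this kind of reweighting. Other schemes (for example, weights based on inverse class frequency or on the effective number of samples) would fit the same interface.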

Top-level tags: biology, medical, benchmark
Detailed tags: protein localization, structural biology, bioinformatics, dataset, interpretability

CAPSUL: A Comprehensive Human Protein Benchmark for Subcellular Localization


1️⃣ One-sentence summary

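This paper introduces CAPSUL, a new benchmark dataset that integrates 3D structural information for human proteins with fine-grained subcellular localization annotations, shows experimentally that incorporating structural features significantly improves localization prediction, and demonstrates that structure-based methods offer good biological interpretability.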

Source: arXiv: 2603.18571