ArkTS-CodeSearch: A Open-Source ArkTS Dataset for Code Retrieval

📄 Abstract

ArkTS is a core programming language in the OpenHarmony ecosystem, yet research on ArkTS code intelligence is hindered by the lack of public datasets and evaluation benchmarks. This paper presents a large-scale ArkTS dataset constructed from open-source repositories, targeting code retrieval and code evaluation tasks. We design a single-search task in which natural language comments are used to retrieve the corresponding ArkTS functions. ArkTS repositories are crawled from GitHub and Gitee, and comment-function pairs are extracted using tree-sitter-arkts, followed by cross-platform deduplication and a statistical analysis of ArkTS function types. We further evaluate all existing open-source code embedding models on the single-search task and fine-tune on both ArkTS and TypeScript training datasets, resulting in a high-performing model for ArkTS code understanding. This work establishes the first systematic benchmark for ArkTS code retrieval. Both the dataset and our fine-tuned model are publicly available at this https URL and this https URL.
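The abstract does not say how the cross-platform deduplication step is implemented. As a minimal sketch only, assuming exact-match deduplication on whitespace-normalized text, the idea of dropping a comment-function pair that appears on both GitHub and Gitee might look like the following. The record fields (`source`, `comment`, `function`) and the helper names `normalize` and `dedupe_pairs` are hypothetical and are not taken from the paper.

```python
import hashlib
import re
from typing import Iterable


def normalize(text: str) -> str:
    """Collapse whitespace runs so formatting differences do not change the hash."""
    return re.sub(r"\s+", " ", text).strip()


def dedupe_pairs(pairs: Iterable[dict]) -> list[dict]:
    """Keep the first occurrence of each (comment, function) pair.

    Assumption: two pairs are duplicates when their normalized comment and
    normalized function text are identical. The paper may use another rule.
    """
    seen: set[str] = set()
    unique: list[dict] = []
    for pair in pairs:
        key_text = normalize(pair["comment"]) + "\x00" + normalize(pair["function"])
        digest = hashlib.sha256(key_text.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(pair)
    return unique


# Hypothetical records: the same function mirrored on two platforms,
# differing only in line breaks and indentation.
pairs = [
    {
        "source": "github",
        "comment": "Adds two numbers.",
        "function": "function add(a: number, b: number): number {\n  return a + b\n}",
    },
    {
        "source": "gitee",
        "comment": "Adds two numbers.",
        "function": "function add(a: number, b: number): number { return a + b }",
    },
    {
        "source": "gitee",
        "comment": "Returns the page title.",
        "function": "function title(): string { return 'Home' }",
    },
]

unique_pairs = dedupe_pairs(pairs)
```

Hashing normalized text catches only mirrored copies that differ in formatting. Near-duplicates with renamed identifiers or edited comments would need a fuzzier comparison, which this sketch does not attempt.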

1️⃣ One-Sentence Summary

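This paper builds the first large-scale public dataset and evaluation benchmark for ArkTS, the core language of the OpenHarmony ecosystem, and trains a model that improves the matching of natural language queries to ArkTS code.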
