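JobArabi: An Arabic Corpus and Analysis of Job Announcements from Social Media
1️⃣ One-sentence summary
This paper builds and analyzes JobArabi, a corpus of more than 20,000 Arabic job announcements collected from the social platform X. It reveals sociolinguistic patterns in recruitment language, including gender bias, regional variation, and emotional framing, and offers a resource for Arabic natural language processing and labor market research.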
This paper introduces JobArabi, a large-scale corpus of Arabic job announcements collected from social media between January 2024 and October 2025. The dataset contains 20,528 public posts from X and captures nearly two years of employment-related discourse across Arabic-speaking online communities. The corpus was compiled using a linguistically informed query framework covering 21 Arabic keyword families that reflect gendered, plural, formal, and dialectal expressions of recruitment language. The resulting dataset includes posts from institutional, commercial, and individual accounts and provides metadata such as timestamps, engagement indicators, and geolocation when available, enabling temporal and regional analysis of employment discourse. Quantitative analysis reveals several sociolinguistic patterns in online recruitment, including the persistence of gendered hiring language, regional variation in occupational demand, and the emotional framing of recruitment messages. These findings highlight the potential of Arabic social media as a resource for studying labor market communication and linguistic change. The JobArabi corpus, together with documentation and collection scripts, will be released to support research in Arabic NLP, computational social science, and digital labor studies.
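As a rough illustration of what a keyword-family query framework might look like, the sketch below groups surface variants of a recruitment term (masculine, feminine, plural, multi-word) into one family and joins them into a boolean OR query. The `KeywordFamily` class, the `build_query` and `matches` helpers, and the two example families are all hypothetical; the abstract does not list the paper's 21 families or describe its collection scripts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeywordFamily:
    """Hypothetical container: one recruitment term and its surface variants."""
    name: str
    variants: tuple[str, ...]


def build_query(family: KeywordFamily) -> str:
    """Join the variants into a single OR query, quoting multi-word phrases."""
    terms = []
    for variant in family.variants:
        variant = variant.strip()
        if not variant:
            continue  # skip empty entries
        terms.append(f'"{variant}"' if " " in variant else variant)
    return "(" + " OR ".join(terms) + ")"


def matches(family: KeywordFamily, text: str) -> bool:
    """Crude substring check: does any variant of the family occur in the text?"""
    return any(variant in text for variant in family.variants)


# Illustrative families only (assumed, not taken from the paper).
WANTED = KeywordFamily("wanted", ("مطلوب", "مطلوبة", "مطلوبين"))  # masc., fem., plural
JOB = KeywordFamily("job", ("وظيفة", "وظائف", "فرصة عمل"))  # singular, plural, phrase

FAMILIES = [WANTED, JOB]
queries = {family.name: build_query(family) for family in FAMILIES}
```

Keeping gendered and plural forms as separate variants, rather than collapsing them to a stem, would let a later analysis count which form each post used, which is the kind of signal a study of gendered hiring language needs.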
Source: arXiv: 2605.20960