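Query as Anchor: Scenario-Adaptive User Representation via Large Language Model
1️⃣ One-sentence summary
This paper proposes Query-as-Anchor, a framework that uses a large language model to generate user representations dynamically from different business queries. It reconciles universality with task sensitivity in industrial applications, and its efficiency and superior performance are validated in real Alipay scenarios.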
Industrial-scale user representation learning requires balancing robust universality with acute task-sensitivity. However, existing paradigms primarily yield static, task-agnostic embeddings that struggle to reconcile the divergent requirements of downstream scenarios within unified vector spaces. Furthermore, heterogeneous multi-source data introduces inherent noise and modality conflicts, degrading representation. We propose Query-as-Anchor, a framework shifting user modeling from static encoding to dynamic, query-aware synthesis. To empower Large Language Models (LLMs) with deep user understanding, we first construct UserU, an industrial-scale pre-training dataset that aligns multi-modal behavioral sequences with user understanding semantics, and our Q-Anchor Embedding architecture integrates hierarchical coarse-to-fine encoders into dual-tower LLMs via joint contrastive-autoregressive optimization for query-aware user representation. To bridge the gap between general pre-training and specialized business logic, we further introduce Cluster-based Soft Prompt Tuning to enforce discriminative latent structures, effectively aligning model attention with scenario-specific modalities. For deployment, anchoring queries at sequence termini enables KV-cache-accelerated inference with negligible incremental latency. Evaluations on 10 Alipay industrial benchmarks show consistent SOTA performance, strong scalability, and efficient deployment. Large-scale online A/B testing in Alipay's production system across two real-world scenarios further validates its practical effectiveness. Our code is prepared for public release and will be available at: this https URL.
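The deployment claim in the abstract, that placing the query at the end of the sequence allows KV-cache reuse, can be illustrated with a toy sketch. The code below is not the paper's implementation: it assumes a single causal self-attention layer with identity projections, and the names `causal_layer`, `embed_with_cache`, and `embed_from_scratch`, as well as the token vectors, are hypothetical. It only shows that under causal masking the keys and values of the user-behavior prefix do not depend on the trailing query, so they can be computed once and shared across queries.

```python
import math


def attend(q, keys, values):
    """Softmax attention of one query vector over lists of key/value vectors."""
    scale = math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


def causal_layer(tokens, cached_kv=None):
    """Toy causal self-attention layer (identity projections are an assumption).

    Each token attends only to itself and earlier tokens. Returns the outputs
    for `tokens` plus the extended key/value cache.
    """
    if cached_kv is None:
        keys, values = [], []
    else:
        # Copy so the shared prefix cache is never mutated.
        keys, values = list(cached_kv[0]), list(cached_kv[1])
    outputs = []
    for tok in tokens:
        keys.append(tok)
        values.append(tok)
        outputs.append(attend(tok, keys, values))
    return outputs, (keys, values)


# Hypothetical user-behavior prefix and two scenario queries (made-up vectors).
user_tokens = [[0.2, 0.1], [0.5, -0.3], [0.9, 0.4]]
queries = {"risk": [[1.0, 0.0]], "marketing": [[0.0, 1.0]]}

# Encode the user prefix once; its keys/values are independent of any query.
_, user_cache = causal_layer(user_tokens)


def embed_with_cache(query_tokens):
    """Process only the query suffix, attending over the cached prefix."""
    outputs, _ = causal_layer(query_tokens, cached_kv=user_cache)
    return outputs[-1]  # last position serves as the query-aware embedding


def embed_from_scratch(query_tokens):
    """Reference path: re-encode the full sequence (prefix + query)."""
    outputs, _ = causal_layer(user_tokens + query_tokens)
    return outputs[-1]


for name, q in queries.items():
    print(name, embed_with_cache(q))
```

In this sketch the incremental cost per additional query scales with the query length rather than with the full behavior sequence, which is the intuition behind the "negligible incremental latency" claim. In a multi-layer model the same reasoning would apply per layer, since causal masking keeps every prefix position independent of the suffix.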
Source: arXiv: 2602.14492