arXiv submission date: 2026-02-03
📄 Abstract - Distribution-Aware End-to-End Embedding for Streaming Numerical Features in Click-Through Rate Prediction

This paper explores effective numerical feature embedding for Click-Through Rate prediction in streaming environments. Conventional static binning methods rely on offline statistics of numerical distributions; however, this inherently two-stage process often triggers semantic drift during bin boundary updates. While neural embedding methods enable end-to-end learning, they often discard explicit distributional information. Integrating such information end-to-end is challenging because streaming features often violate the i.i.d. assumption, precluding unbiased estimation of the population distribution via the expectation of order statistics. Furthermore, the critical context dependency of numerical distributions is often neglected. To this end, we propose DAES, an end-to-end framework designed to tackle numerical feature embedding in streaming training scenarios by integrating distributional information with an adaptive modulation mechanism. Specifically, we introduce an efficient reservoir-sampling-based distribution estimation method and two field-aware distribution modulation strategies to capture streaming distributions and field-dependent semantics. DAES significantly outperforms existing approaches as demonstrated by extensive offline and online experiments and has been fully deployed on a leading short-video platform with hundreds of millions of daily active users.
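The abstract names reservoir sampling as the basis of the distribution estimator but gives no details. As a rough illustration only, the sketch below uses the classic Algorithm R to keep a fixed-size uniform sample of a value stream and reads an empirical CDF from it. The class name `ReservoirCDF`, its parameters, and the CDF lookup are assumptions made for this example and are not taken from the paper. Plain Algorithm R weights the whole stream history uniformly, so it does not by itself handle the non-i.i.d. drift the abstract describes.

```python
import bisect
import random


class ReservoirCDF:
    """Hypothetical helper (not from the paper): Algorithm R reservoir
    sampling over a numeric stream, with an empirical-CDF lookup."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity        # maximum number of stored samples
        self.samples = []               # the reservoir itself
        self.seen = 0                   # number of stream values observed
        self.rng = random.Random(seed)  # local RNG for reproducibility

    def update(self, value):
        """Observe one stream value, keeping a uniform sample of the stream."""
        self.seen += 1
        if len(self.samples) < self.capacity:
            # Fill phase: keep every value until the reservoir is full.
            self.samples.append(value)
        else:
            # Replace a random slot with probability capacity / seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.samples[j] = value

    def cdf(self, value):
        """Estimate P(X <= value) from the current reservoir."""
        if not self.samples:
            return 0.0
        ordered = sorted(self.samples)
        return bisect.bisect_right(ordered, value) / len(ordered)


if __name__ == "__main__":
    est = ReservoirCDF(capacity=1000)
    for x in range(10_000):
        est.update(float(x))
    # The estimated CDF at the stream midpoint should be close to 0.5.
    print(est.cdf(4999.5))
```

In a setup like the one the abstract describes, the estimated CDF value (or a quantile index derived from it) could serve as the distributional signal fed to the embedding layer. How DAES actually consumes and modulates that signal per field is not specified here.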

Top-level tags: machine learning, model training, data
Detailed tags: CTR prediction, numerical embedding, streaming features, distribution estimation, online learning

Distribution-Aware End-to-End Embedding for Streaming Numerical Features in Click-Through Rate Prediction


1️⃣ One-Sentence Summary

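This paper proposes DAES, a new method that effectively incorporates the distributional information of numerical features into click-through rate prediction models in online learning (streaming training) scenarios. It significantly improves prediction accuracy and has been successfully deployed on a short-video platform with hundreds of millions of daily active users.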

Source: arXiv: 2602.03223