Two technical threads dominate recommendation system research in Week 11 of 2026 (March 8–14). First, generative recommendation (GR) is undergoing full-stack optimization, moving from "making it work" to "making it work well, fast, and fairly": Netflix/Meta's exponential reward-weighted SFT addresses post-training alignment, LinkedIn's causal attention reformulation halves sequence length, Kuaishou's FP8 quantization reduces OneRec-V2 inference latency by 49%, and Alibaba's differentiable geometric indexing eliminates long-tail bias at its root. Five papers advance GR's industrial maturity across five dimensions. Second, LLM-based recommendation is shifting from "single-pass inference" toward an agentic paradigm: Meta's VRec inserts verification steps into reasoning chains, Meituan's RecPilot replaces traditional recommendation lists with a multi-agent framework, USTC's TriRec introduces tri-party coordination for the first time, and RUC/JD's RecThinker enables autonomous tool invocation.
Industrial recommendation ranking is shifting to systematic scaling engineering: Alibaba's SORT lifts orders by 6.35%, Kuaishou's FlashEvaluator and SOLAR improve evaluator and attention efficiency, and ByteDance's HAP enables adaptive compute budget allocation. Generative recommendation is entering an objective-alignment phase. 36 papers analyzed.
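This week's digest covers 23 recommender-system papers: 5 rated 5 points, 10 rated 4 points, and 8 rated 3 points, so overall quality is excellent. Generative Recommendation is the week's most prominent technical thread, with 6 papers focused directly on it, covering Semantic ID encoding, constrained decoding optimization, deployment in advertising scenarios, and unified multi-task frameworks. The other main thread is the paradigm for integrating LLMs with recommender systems: both paths, "LLM-as-Rec" (the LLM as the recommendation backbone) and "LLM-for-Rec" (the LLM assisting recommendation), saw important progress this week. Industrial deployment papers make up a very high share (6 papers include online A/B tests), coming from front-line platforms such as AliExpress, Kuaishou, and the Apple App Store.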
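Ever since fine-ranking switched to deep learning, industry has split research on ranking model architecture into two basic parts, sequence processing and feature crossing; at some companies the ranking group is even divided into two teams, one handling behavior sequences and the other feature crossing. In the earliest days, for example when sequences were handled with DIN, the sequence was compressed into one or more vector representations, which then took part in crossing with the other features. We can think of this as MLP(concat(DIN, Features)). Most model research today still upgrades the two parts separately: on one side, replacing the MLP with DCN, adding LHUC, or making it more complex as RankMixer or a Transformer; on the other, stacking MHA on DIN or replacing it outright with a Transformer, which can be written as RankMixer(concat(Transformer, Features)). From MLP(concat(DIN, Features)) to RankMixer(concat(Transformer, Features)), the essence has not changed: sequence processing and feature crossing form an implicit two-stage pipeline, and the sequence is compressed into vector space before it crosses with the features. What is interesting about LLMs is that the crossing exploited by next-token prediction takes place in the token space of the word sequence. The lesson for recommendation ranking models is that every feature's crossing should take place in the token space of the user sequence.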
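The contrast between the two-stage pattern and token-space crossing can be made concrete with a toy sketch. It is only an illustration under stated assumptions: the function names (`attention_pool`, `two_stage_input`, `token_space_cross`) are hypothetical, there are no learned parameters, and a softmax dot-product attention stands in for DIN's activation unit and for a full Transformer layer.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_pool(query, keys):
    """Softmax-weighted sum of behavior vectors (assumed stand-in for DIN / MHA)."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(keys[0])
    return [sum(w * k[i] for w, k in zip(weights, keys)) for i in range(dim)]

def two_stage_input(behaviors, target, features):
    """Two-stage pattern: compress the sequence first, then concat with features.

    The downstream crossing network (MLP, DCN, RankMixer, ...) only ever sees
    the single pooled vector, never the individual behavior tokens.
    """
    pooled = attention_pool(target, behaviors)
    flat_features = [x for f in features for x in f]
    return pooled + flat_features

def token_space_cross(behaviors, features):
    """Token-space pattern: each feature token attends over the raw behavior
    tokens, so every feature gets its own view of the sequence."""
    return [attention_pool(f, behaviors) for f in features]

# Toy data: two behavior tokens and two feature tokens, all of dimension 2.
behaviors = [[1.0, 0.0], [0.0, 1.0]]
features = [[10.0, 0.0], [0.0, 10.0]]
target = [1.0, 1.0]

concat_vector = two_stage_input(behaviors, target, features)
per_feature_views = token_space_cross(behaviors, features)
```

In the two-stage variant both features are crossed with the same pooled summary of the sequence, whereas in the token-space variant each feature token picks out the behaviors most relevant to it before any compression happens.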