RecSys Weekly 2026-W25

This week's recommendation systems research clusters around three themes: full-lifecycle co-design for large-scale graph retrieval, Transformer-based sequence modeling deployed across platforms, and a shift from DNN to Transformer-native architectures for multi-task ranking. Meta, Airbnb, Alibaba, Shopee, and NetEase Cloud Music all published online deployment work with specific A/B metrics.

Thread 1 (End-to-end design of large-scale graph systems): Meta's RankGraph-2 couples graph construction, representation learning, and online serving into a joint optimization. On a billion-node graph, it reduces compute cost by 83%, achieves 3.8x the recall of GAT+Deep Graph Infomax, and lifts online CTR by 0.96% and CVR by 2.75%. Along the same line, HighLevel's ScoreGate uses a statistical fusion of two scores to adaptively control the number of retrieved chunks in RAG. In production, it cuts tokens by 34.8% while maintaining recall between 97.77% and 99.34%.

Thread 2 (Generative recommendation moves from theory to production): Airbnb's JourneyFormer deploys a Transformer-based sequence model in search ranking to handle long, sparse user behavior. Alibaba's OneBar uses an end-to-end generative framework for video e-commerce query recommendation, achieving a 21.67% GMV lift. Both point in the same direction: generative recommendation needs engineering trade-offs under real constraints (cold start, latency, sparse labels) rather than chasing offline metrics alone.

Thread 3 (Transformer-native paradigm for multi-task ranking): Shopee's OneRank eliminates the encoder-predictor separation, embedding task-private channels and gradient isolation inside the Transformer. Online CTR is up 1.2% and CVR 0.8%. NetEase Cloud Music's PIANO uses a learnable [CLS] token for list-level multi-objective re-ranking, lifting CTR by 0.62% and CVR by 4.45%. Both demonstrate that internalizing multi-objective reasoning into the Tr

RecSys Weekly 2026-W23

This week's research in recommendation systems falls along three technical threads.

Thread 1: Generative recommendation moves from functioning to stability, with semantic IDs and reasoning becoming the industrial focus. Pinterest's UniPinRec unifies retrieval and ranking end-to-end (online engagement +1%, latency -11.1%), pushing generative recommendation beyond just retrieval. Kuaishou's OneReason (online deployment) reveals why reasoning mode fails in generative recommendation (both perception and cognition factors are missing) and proposes a three-level CoT format plus specialized-unified training. Both point to the same conclusion: the core bottleneck in generative recommendation has shifted from model architecture to data format (semantic IDs) and system coordination.

Thread 2: Cross-domain cold start moves from feature transfer to learning transfer, as LLMs acting as cross-domain bridges begin large-scale deployment. Kuaishou's RGCD-Rep (serving 400M+ users) uses MLLM reasoning distillation to transfer short-video user interest to live streaming, with significant cold-start engagement gains. Meta's Quantizing Intent paper (online AUC +1.522% for cold start) quantizes organic feed behavior into semantic IDs for ad ranking, proving that behavioral richness determines cross-domain transfer quality. Both reveal that the key to cross-domain transfer is not aligning features but building transferable semantic representations.

Thread 3: LLM/Agent-enhanced recommendation moves toward industry differentiation, from general retrieval to deep adaptation in vertical scenarios. Li Auto's HPRO (132-day A/B, sales +9.5%) introduces preference optimization for lead scoring, solving sparse supervision and funnel hierarchy. Kuaishou's Taiji (CTR +12.4%, revenue +15.2%) proposes Pareto-optimal policy optimization, finding the optimal trade-off between semantics and IDs. Syft's DynTree (survival rate improved 1.5x) uses offline agent tree-building plus online lightweight subtree selection for

RecSys Weekly 2026-W22

This week's recommendation systems research clusters around three technical threads.

Industrial knowledge distillation enters the transfer rate quantification era: ByteDance, Meta, Microsoft, and Alibaba each demonstrated large-scale distillation frameworks. ByteDance's Rec-Distill (24B teacher, 20K sequence) achieves a distillation transfer rate above 60%, Alibaba's GPlan compresses LLM reasoning into implicit tokens, Meta's LoopFM doubles distillation transfer rate via structured intermediate representations, and Microsoft's HARNESS-LM recovers 98% of teacher accuracy with 190M parameters. The common direction across all four: distillation is no longer just a model compression technique; it is a way to "monetize" large model capabilities into quantifiable business metrics.

Generative recommendation moves from item generation to intent-conditioned generation: Alibaba's QGS deploys conditional next-item prediction in Quark search, Netflix reveals task-specific scaling ceilings in a 1B-parameter generative recommender, and Tsinghua's SID collision analysis finds Hit@10 overestimated by 103%. The three papers together indicate that generative recommendation is entering a phase of refined evaluation and conditional control.

Recommendation system scaling shifts from "stacking parameters" to multidimensional synergy and test-time compute: Coupang's system study shows additive scaling effects across backbone, embedding, and data dimensions for CVR models. Alibaba's UTTSI introduces test-time compute to CTR for the first time, lifting CTR by 5.3% without model changes. Meta's rank-aware decomposition boosts DLRM throughput by 87.5%. The core tension in scaling has moved from "can we go bigger" to "how do we use it efficiently."
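The digest does not describe how the SID collision analysis measures the Hit@10 gap, so the following is only a minimal sketch of one plausible mechanism: when several items share a semantic ID, scoring a hit at the SID level can count a success that disappears once the SIDs are expanded into concrete items and cut at 10. All function names, the toy catalog, and the alphabetical tie-break inside a collided SID are assumptions for illustration, not the paper's protocol.

```python
def build_sid_index(item_to_sid):
    """Invert the item -> SID map so collisions (many items per SID) are visible."""
    sid_to_items = {}
    for item, sid in item_to_sid.items():
        sid_to_items.setdefault(sid, []).append(item)
    return sid_to_items


def hit_at_k_sid_level(ranked_sids, target_item, item_to_sid, k=10):
    """Count a hit if the target's SID appears among the top-k generated SIDs."""
    return item_to_sid[target_item] in ranked_sids[:k]


def hit_at_k_item_level(ranked_sids, target_item, sid_to_items, k=10):
    """Expand SIDs into items in rank order, then cut the item list at k.

    Assumption: items sharing a SID are ordered alphabetically, a stand-in
    for whatever tie-break a real system would apply.
    """
    ranked_items = []
    for sid in ranked_sids:
        ranked_items.extend(sorted(sid_to_items.get(sid, [])))
        if len(ranked_items) >= k:
            break
    return target_item in ranked_items[:k]


# Toy catalog (hypothetical): twelve items collide on SID (3, 7).
item_to_sid = {f"item_{i:02d}": (3, 7) for i in range(12)}
item_to_sid["item_99"] = (1, 4)
sid_to_items = build_sid_index(item_to_sid)
ranked_sids = [(3, 7), (1, 4)]

sid_hit = hit_at_k_sid_level(ranked_sids, "item_11", item_to_sid)
item_hit = hit_at_k_item_level(ranked_sids, "item_11", sid_to_items)
print(sid_hit, item_hit)
```

In this toy case the SID-level metric reports a hit for `item_11` while the item-level metric does not, because ten other items with the same SID fill the list first.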

RecSys Weekly 2026-W21

This week's recommendation systems research clusters around three technical fronts: generative recommendation moves from "proving feasibility" to "industrial deployment and optimization," debiasing and calibration shift from single methods to fusion frameworks, and search/retrieval systems make concrete advances in cold start and heterogeneous acceleration.

Generative recommendation enters the industrial deep end: Four deployment papers from Kuaishou, Tencent, and Meituan cover core pain points, including reasoning enhancement (RPORec), long-term interest modeling (GenLI), and world knowledge integration (LWGR). The common thread: the core question for generative recommendation has shifted from "can it work?" to "how do we stably and controllably replace or augment the traditional pipeline?"

Debiasing and calibration move from "correcting the mean" to "governing the distribution": ByteDance's PEARL, Kuaishou's DADF, and Pinterest's PRL-PUTS each deliver production-grade solutions from contrasting perspectives: percentile comparison, residual correction, and utility weight tuning. PEARL's Watch Duration +2.10% and DADF's time spent +0.347% show that distribution-level bias correction still has substantial headroom.

Search retrieval systems focus on cold start and system efficiency: Taobao's GrowthGR (new item GMV +5.3%) and Airbnb's synthetic data framework (query length KL divergence down to 0.66) demonstrate the engineering potential of LLMs + counterfactual inference for cold start. HUAWEI and JD.com's Ascend-RaBitQ pushes NPU acceleration for billion-scale vector search to 4.6x, setting a new hardware-algorithm co-optimization baseline for large-scale retrieval.
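For readers unfamiliar with the query-length KL divergence figure, here is a minimal sketch of how such a number can be computed between real and synthetic query logs. The digest does not say how Airbnb tokenizes queries, buckets lengths, smooths empty buckets, or which distribution is the reference, so the whitespace tokenization, the `max_len` cap, the `eps` smoothing, the direction KL(real || synthetic), and the sample queries are all assumptions.

```python
import math
from collections import Counter


def length_histogram(queries, max_len=8):
    """Count queries by whitespace-token length; longer queries share the last bucket.

    Empty queries are skipped.
    """
    counts = Counter(min(len(q.split()), max_len) for q in queries if q.split())
    return [counts.get(k, 0) for k in range(1, max_len + 1)]


def kl_divergence(p_counts, q_counts, eps=1e-6):
    """KL(P || Q) in nats between two histograms over the same buckets.

    Additive smoothing (eps per bucket) keeps the log finite when a bucket is empty.
    """
    if len(p_counts) != len(q_counts):
        raise ValueError("histograms must use the same buckets")
    n = len(p_counts)
    p_total = sum(p_counts) + eps * n
    q_total = sum(q_counts) + eps * n
    kl = 0.0
    for p_c, q_c in zip(p_counts, q_counts):
        p = (p_c + eps) / p_total
        q = (q_c + eps) / q_total
        kl += p * math.log(p / q)
    return kl


# Hypothetical sample queries, for illustration only.
real = ["beach house", "cabin near lake tahoe", "paris", "pet friendly loft downtown"]
synthetic = ["villa", "paris flat", "cabin", "loft with pool"]
print(kl_divergence(length_histogram(real), length_histogram(synthetic)))
```

A value of 0 means the two length distributions match exactly; larger values mean the synthetic queries drift further from the real length profile.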

RecSys Weekly 2026-W20

This week's recommendation systems research breaks down along three technical fronts: generative recommendation architectures moving from tokenizer optimization to inference efficiency; LLM-enhanced recommendation evolving from isolated auxiliary modules to agents with memory and reasoning; and system-level quantization and thread orchestration emerging as the real bottleneck for production deployment.

Theme 1 "Decoupling and Acceleration in Generative Recommendation": Alibaba deployed CQ-SID / EG-GRPO on TmallAPP, using category-aware semantic IDs and expert-guided reinforcement learning to achieve +1.15% GMV, with generative retrieval contributing 72.63% of purchases. Tencent and Tsinghua's AsymRec proposed an asymmetric continuous-discrete framework that replaces symmetric quantization with multi-expert projections, averaging 15.8% improvement. Meituan's DIG embeds the tokenizer into a discriminative ranking model for end-to-end training, improving both retrieval and ranking. Snap's SID-MLP distills the Transformer decoder into an MLP, achieving 8.74x speedup with no loss in accuracy. The common thread: generative recommendation is transitioning from "can run" to "runs stably and fast," with the core tactic being decoupling input/output representations and replacing overly dense structures.

Theme 2 "LLM Recommendation Toward Reasoning and Memory": Microsoft Research's PGR introduced look-ahead guided retrieval, using Tree-of-Thought to expand query steps, achieving nearly 3x recall improvement on MemoryQuest. Meituan's RecRM-Bench provides 1 million structured entries covering four reward dimensions (instruction following, fact consistency, etc.) for agent-based recommendation systems. Meituan's SDAR uses gated auxiliary objectives to stabilize On-Policy Self-Distillation (OPSD), outperforming GRPO by 7–10% on ALFWorld, Search-QA, and WebShop. The difference: PGR focuses on look-ahead reasoning before retrieval; SDAR focuses on training stability. But the shared

RecSys Weekly 2026-W13

Three storylines defined this week's recommendation systems research. First, Semantic ID-based generative recommendation moved from paradigm validation into hard engineering. The specific problems: cold-start signal balancing, ad monetization, out-of-distribution robustness, and reasoning over item tokens. Alibaba's OneSearch-V2 delivered CTR +3.98% and conversion rate +3.05% in production. Second, LLM Agents in recommendation and search shifted from "end-to-end replacement" toward "layered collaboration" — reasoning stays with the LLM, execution goes to deterministic modules, and reinforcement learning aligns intermediate steps with final objectives. Third, industrial search ranking hit an efficiency wall — Taobao's KARMA uses semantic regularization to prevent LLM fine-tuning from destroying knowledge, UniScale argues that data and model scaling must be co-designed, and DIET compresses training data to 1–2% while preserving performance trends.

RecSys Weekly 2026-W12

This week's recommendation systems research runs along three technical threads. First, Semantic ID-driven generative retrieval keeps gaining momentum. Spotify released two papers simultaneously — one deploys a SID system in production with A/B test results (new show discovery rate +14.3%), the other treats SID as a standalone modality unifying search, recommendation, and reasoning. Industrial SID systems have moved past "can this work?" into "how do we make it work better." Second, multimodal retrieval and representation compression: Apple delivered a production-grade unified retrieval architecture for text, images, and video; Aalto University distilled a 2B-parameter VLM into a 69M text encoder (50x latency reduction); POSTECH identified and fixed a modality collapse problem in VLM embedders for recommendation.

RecSys Weekly 2026-W16

Across 17 recommendation-system papers this week, industry teams used live deployments as the argument. Three technical storylines stand out.

RecSys Weekly 2026-W15

The central narrative this week: generative recommendation is moving from single-scenario proof-of-concept to full-pipeline production deployment. Papers from Meituan, Snapchat, and Meta no longer debate whether Semantic IDs work — they tackle the real operational pain points: multi-business expansion, codebook fairness, incremental training, and reranking integration. MBGR (2604.02684) delivers CTR +1.24% online across Meituan's multi-business food delivery platform, the top-rated paper this week.

RecSys Weekly 2026-W14

This week's recommendation systems research centers on three technical threads: engineering generative recommendation for production, agent-driven system self-evolution, and efficient scaling of ranking models.