Recsys Weekly 2026-W10
2026-03-09

Weekly Overview

Week 10 of 2026 (March 2–8) has one dominant story across recommendation system research: engineering at scale. This goes beyond converting algorithms into production systems. Engineering thinking is now reaching every stage of the recommendation pipeline — from architecture design and training paradigms to evaluation methodology. Of the 36 papers covered this week, industrial contributions exceed 40% (15/36). Teams from Alibaba, Tencent, Kuaishou, ByteDance, Bilibili, JD.com, and Xiaohongshu all published system-level work backed by online validation.
Three threads tie this week together.
First, scaling engineering in the ranking pipeline. SORT (Alibaba, orders +6.35%), FlashEvaluator (Kuaishou, sustained revenue growth), and SOLAR (Kuaishou, Video Views +0.68%) advance industrial Transformer deployment along three dimensions: ranking architecture, evaluator models, and attention mechanisms. HAP (ByteDance, deployed on Toutiao for 9 months) shows that compute budget allocation should be adaptive rather than uniform, starting from the pre-ranking stage.
Second, generative recommendation is moving from proof-of-concept to objective alignment. OneRanker (Tencent WeChat Channels Ads, GMV-Normal +1.34%) achieves architecture-level deep integration of generation and ranking. CGR (Bilibili) embeds constrained optimization into the decoding process. Teams are using GRPO with increasing precision as an alignment tool in query rewriting (JD.com) and multimodal reasoning (MLLMRec-R1).
Third, the granularity of multimodal representation and debiasing continues to refine. CLEAR identifies and eliminates cross-modal redundancy through null-space projection. CAMMSR demonstrates that modality weights must be dynamically adjusted per category. TIPS extends causal debiasing from static to time-aware formulations. k-hop fairness generalizes fairness evaluation from first-order neighborhoods to multi-hop structures.

Ranking Model Architecture and System Efficiency

Since HSTU and Wukong established the scaling trajectory for recommendation ranking, the core challenge for industry has been deploying Transformer architectures under high feature sparsity, low label density, and stringent latency constraints. This week's five papers advance this direction simultaneously across ranking, pre-ranking, and re-ranking stages.
SORT: A Systematically Optimized Ranking Transformer (2603.03988)
This Alibaba paper directly addresses two core obstacles to Transformers in industrial ranking: high feature sparsity and low label density. SORT's approach is systematic rather than a single-point breakthrough. Request-centric sample organization groups candidates under the same request into training sequences, providing natural context boundaries for local attention. Query pruning removes low-value tokens at inference time, directly compressing computation. Generative pre-training alleviates label sparsity through self-supervised signals. On the engineering front, SORT pushes Model FLOPs Utilization (MFU) to 22% — for reference, LLM training typically targets 40–60% MFU. Recommendation models inherently achieve lower MFU due to embedding table lookups and sparse features, making 22% high for this domain.
Online results: orders +6.35%, buyers +5.97%, GMV +5.47%, with latency reduced by 44.67% and throughput doubled (+121.33%). Compared to the DIN, DeepFM, and DCN baselines it extends, SORT's advantage lies not only in absolute metrics but in demonstrating scaling behavior across data scale, model scale, and sequence length — a concrete validation in e-commerce ranking of the trajectory pioneered by Wukong and HSTU.
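The request-centric sample organization above can be sketched as a block attention mask: candidates answering the same request are packed into one training sequence and attend only to each other, giving local attention its natural context boundary. A minimal illustration, not SORT's implementation (the function name and packing layout are invented):

```python
import numpy as np

def request_local_mask(request_ids):
    """Boolean attention mask: candidate i may attend to candidate j only
    if both came from the same request (illustrative sketch)."""
    ids = np.asarray(request_ids)
    return ids[:, None] == ids[None, :]

# Six candidates packed from two requests: [A, A, A, B, B, B].
mask = request_local_mask([0, 0, 0, 1, 1, 1])
# Attention is confined to each request's 3x3 block, so compute never
# crosses request boundaries.
```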
FlashEvaluator: Expanding Search Space with Parallel Evaluation (2603.02565)
Kuaishou's FlashEvaluator targets the efficiency bottleneck of evaluators in the Generator-Evaluator (G-E) framework. Conventional approaches score K candidate sequences individually, yielding O(K) complexity with no cross-sequence comparison. FlashEvaluator's core innovation is cross-sequence token information sharing — all K sequences are evaluated in a single forward pass, achieving sub-linear complexity. This is not merely an engineering speedup: the explicit cross-sequence comparison mechanism lets the evaluator see the "full picture," producing more accurate ranking decisions.
This complements SORT-Gen — Taobao's generative re-ranking model for the Billions Subsidy program (CLICK +4.13%, GMV +8.10%). SORT-Gen optimizes the generation side, while FlashEvaluator optimizes the evaluation side. FlashEvaluator has been deployed online at Kuaishou with sustained revenue growth — the paper does not disclose specific online lift percentages. The paper also provides theoretical proofs and generalization experiments on NLP tasks, indicating that this architectural concept extends beyond recommendation scenarios.
HAP: Heterogeneity-Aware Adaptive Pre-ranking (2603.03770)
This ByteDance paper reveals a long-overlooked problem in pre-ranking: training sample heterogeneity. Pre-ranking training data mixes coarse-grained recall results, fine-grained ranking signals, and exposure feedback — three sample types with vastly different difficulty levels. HAP's analysis demonstrates that naive mixed training causes gradient conflicts. Hard samples dominate gradient directions while easy samples are wasted; uniformly increasing model complexity yields poor cost-effectiveness on easy samples.
HAP's solution decouples easy and hard samples into separate optimization paths: a lightweight model processes all candidates for efficient coverage, while a stronger model handles only hard candidates for precise improvement. This adaptive compute budget allocation aligns with GRACE (ranking consistency via multiple binary classification tasks, offline AUC +0.75%, online CVR +1.28%) and IntTower (efficient pre-ranking via Light-SE and contrastive regularization). But HAP goes further — it not only distinguishes sample difficulty but actively mitigates gradient conflicts through conflict-sensitive sampling. Deployed on Toutiao for 9 months, it achieved user usage duration +0.4% and active days +0.05% with no additional compute cost. The paper also open-sources an industrial-grade mixed-sample dataset, a valuable resource for the pre-ranking research community.
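The adaptive allocation idea can be sketched as score-everything-light, re-score-hard-heavy routing. A toy illustration, using the light model's score ambiguity as a stand-in for HAP's learned difficulty signal (all names and the threshold are invented):

```python
import numpy as np

def route_and_score(items, light_model, heavy_model, tau=0.1):
    """Score all candidates with a lightweight model, then re-score only
    the 'hard' ones whose light-model score is ambiguous (near 0.5).
    Illustrative sketch; HAP's real routing criterion is learned."""
    light = np.array([light_model(x) for x in items])
    hard = np.abs(light - 0.5) < tau           # ambiguity as difficulty proxy
    scores = light.copy()
    for i in np.flatnonzero(hard):             # heavy model only on hard items
        scores[i] = heavy_model(items[i])
    return scores, hard

# Toy models: the light model is a biased estimate of the true score,
# the heavy model is exact.
light_model = lambda x: x * 0.9 + 0.05
heavy_model = lambda x: x
scores, hard = route_and_score([0.1, 0.5, 0.9], light_model, heavy_model)
# Only the middle (ambiguous) candidate pays for the heavy model.
```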
SOLAR: SVD-Optimized Lifelong Attention for Recommendation (2603.02561)
Also from Kuaishou, SOLAR approaches attention mechanism efficiency from a mathematical foundation. The core observation is that attention matrices in recommendation systems inherently exhibit low-rank structure — this is not coincidental but a default inductive bias of representation learning. Building on this, SVD-Attention achieves theoretically lossless complexity reduction on low-rank matrices: from O(N²d) to O(Ndr), while preserving the softmax mechanism. This stands in contrast to linear attention (O(Nd²) but discarding softmax) — SOLAR does not sacrifice expressiveness but instead exploits structural properties of the data itself.
SOLAR supports direct modeling of user behavior sequences at the scale of tens of thousands and candidate sets of thousands, without any filtering or truncation. This aligns with Kuaishou's earlier TWIN V2 — scaling sequences to the million level via hierarchical clustering. SOLAR's advantage lies in end-to-end attention computation rather than retrieval-based approximation. Online Video Views +0.68% plus business metric improvements validate the low-rank assumption in production deployment. Worth noting: both SOLAR and FlashEvaluator come from Kuaishou, optimizing Transformer efficiency along two dimensions — attention mechanisms and evaluator models — demonstrating Kuaishou's systematic investment in ranking system engineering.
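The low-rank observation can be checked in a few lines: when queries and keys are driven by a small number of latent user factors, the attention-logit matrix has rank at most that factor count, so a truncated SVD loses essentially nothing. A toy demonstration of the structural claim (illustrative only, not SOLAR's SVD-Attention algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, r = 256, 64, 8

# Queries and keys driven by an r-dimensional latent factor space,
# mimicking the low-rank structure behavior sequences tend to exhibit.
Z = rng.normal(size=(N, r))
Wq, Wk = rng.normal(size=(r, d)), rng.normal(size=(r, d))
Q, K = Z @ Wq, Z @ Wk

logits = Q @ K.T                               # N x N attention logits
s = np.linalg.svd(logits, compute_uv=False)

# Energy captured by the top-r singular values: essentially 1.0 here,
# because logits = Z (Wq Wk^T) Z^T has rank at most r.
energy = (s[:r] ** 2).sum() / (s ** 2).sum()
```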
Scaling Laws for Reranking in Information Retrieval (2603.04816)
This academic paper fills a theoretical gap that has been open for years: scaling laws for the re-ranking stage. While Wukong validated scaling laws for recommendation models and SparseCTR demonstrated scaling phenomena across three orders of magnitude of FLOPs in CTR prediction, the re-ranking stage has lacked systematic study. Covering pointwise, pairwise, and listwise re-ranking paradigms, the paper finds that NDCG and MAP follow predictable power laws. Training and evaluating a series of small models (up to 400M parameters) can accurately predict the performance of 1B models, in both in-domain and out-of-domain settings.
The paper also identifies boundaries of scaling: MRR and Contrastive Entropy do not follow power laws in certain scenarios. Scaling therefore does not explain every metric, and choosing which metrics should guide model scaling decisions is itself an engineering judgment. Although it lacks online validation, this work provides a theoretical tool for resource planning in industrial systems. When training costs routinely reach millions of dollars, the value of predicting large-model performance from small-model experiments is clear.
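The small-to-large extrapolation the paper relies on reduces to a linear fit in log-log space. A sketch on synthetic, noiseless power-law data (every number below is invented for illustration):

```python
import numpy as np

# Synthetic scaling data: error follows a power law err = a * params^(-b).
params = np.array([1e6, 4e6, 1.6e7, 6.4e7, 4e8])   # "small" model sizes
a_true, b_true = 2.0, 0.12
err = a_true * params ** (-b_true)

# Fit log(err) = log(a) - b * log(params) by least squares on the small
# models only, then extrapolate.
X = np.stack([np.ones_like(params), np.log(params)], axis=1)
coef, *_ = np.linalg.lstsq(X, np.log(err), rcond=None)
a_hat, b_hat = np.exp(coef[0]), -coef[1]

# Predict the 1B-parameter model before training it.
pred_1b = a_hat * 1e9 ** (-b_hat)
true_1b = a_true * 1e9 ** (-b_true)
```

On real measurements the fit is noisy and the choice of metric matters, which is exactly the paper's caveat about MRR and Contrastive Entropy.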
These five papers trace a clear trajectory: industrial recommendation ranking is shifting from "point architectural innovations" to "systematic scaling engineering." SORT and SOLAR demonstrate scaling behavior across data, model, and sequence dimensions. FlashEvaluator pushes efficiency optimization from model internals to the cross-sequence system architecture level. Scaling Laws for Reranking attempts to establish predictable performance-resource mappings for the re-ranking stage. HAP provides a complementary perspective — compute budgets should be adaptively adjusted based on sample difficulty rather than uniformly allocated. The next directions likely include unified scaling prediction across the full pipeline, deep fusion of low-rank structure and sparse attention, and extending dynamic compute budget allocation from pre-ranking to the entire ranking chain.

Generative Recommendation and Full-Pipeline Unified Modeling

From HSTU establishing the generative recommendation paradigm, to OneMall/UniSearch validating unified architectures across scenarios, to Rank-GRPO/SCoTER introducing RL alignment — generative recommendation is moving from proof-of-concept to objective alignment. This week's four papers advance along generation-ranking integration, constraint-aware decoding, train-inference consistency, and set-valued retrieval.
OneRanker: Unified Generation and Ranking with One Model (2603.02999)
The core tension in generative ad recommendation is that interest coverage and commercial value optimization are inherently conflicting objectives. Single-stage fusion creates optimization tension, while stage decoupling causes irreversible information loss. OneRanker proposes a three-layer mechanism for architecture-level deep integration of generation and ranking. First, a value-aware multi-task decoupling architecture — leveraging task token sequences and causal masks to separate interest coverage and value optimization spaces within shared representations, avoiding objective conflicts in traditional multi-task learning. Second, a coarse-to-fine synergistic objective-aware mechanism — the generation stage achieves implicit awareness through Fake Item Tokens, while the ranking decoder performs explicit value alignment at the candidate level. Third, input-output bilateral consistency guarantees — through Key/Value passthrough mechanisms and Distribution Consistency (DC) Constraint Loss for end-to-end co-optimization.
After full deployment in the WeChat Channels advertising system, OneRanker delivers GMV-Normal +1.34%. Compared to HSTU's focus on unified sequence transduction modeling, OneRanker concentrates on the generation-ranking objective alignment problem. Compared to Kuaishou's HoME — which addresses Expert Collapse, Expert Degradation, and Expert Underfitting in MoE through hierarchical masks and Feature-gate/Self-gate mechanisms (online watch time +0.954%) — OneRanker's innovation elevates multi-task decoupling from the expert routing level to the sequence generation level. Causal masks achieve a more natural task space separation.
CGR: Constraint-Aware Generative Re-ranking (2603.04227)
Ad feed re-ranking is a constrained combinatorial optimization problem that must simultaneously maximize platform revenue and maintain user experience. Existing generative ranking methods achieve list-level optimization through autoregressive decoding but suffer from high inference latency and limited constraint handling capability. CGR introduces two key innovations: unifying the generator and evaluator into a single network rather than a two-stage Generator-Evaluator, and constraint-aware reward pruning that integrates constraint satisfaction directly into the decoding process. Bilibili's online A/B test demonstrates improvements in both revenue and user engagement — specific lift percentages not disclosed — while meeting strict latency requirements. Compared to CAVE's approach of modeling list value as the expectation of sub-list values, CGR converts constrained optimization into bounded neural decoding, embedding business constraints more directly into the generation process.
APAO: Adaptive Prefix-Aware Optimization (2603.02730)
Generative recommendation suffers from a fundamental train-inference inconsistency: training assumes ground-truth history is always available, but beam search at inference time prunes low-probability branches. The result is that correct items may be prematurely discarded simply because their initial tokens (prefixes) score low. APAO introduces prefix-level optimization loss to align training objectives with inference conditions. It designs an adaptive worst-prefix optimization strategy that dynamically focuses training on the most vulnerable prefixes. Across the Beauty, Sports, and Toys datasets, APAO achieves average Recall@20 improvements of 2.1–4.8% and can serve as a general-purpose plugin for multiple generative recommendation backbones. This direction complements RelayGR's approach to long-sequence inference efficiency — APAO improves the quality of each beam search pass, while RelayGR addresses system bottlenecks under long sequences.
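The prefix failure mode can be made concrete: score every prefix of the ground-truth item's token sequence and locate the weakest one, since that is the prefix beam search is most likely to prune. A minimal sketch; APAO's actual criterion and loss weighting are defined in the paper:

```python
import numpy as np

def worst_prefix(logprobs):
    """Given per-token log-probabilities of the ground-truth item's token
    sequence, return the length of the weakest prefix (length-normalized
    cumulative log-probability), i.e. the prefix most at risk of being
    pruned by beam search. Illustrative sketch only."""
    cum = np.cumsum(logprobs)
    per_token = cum / np.arange(1, len(logprobs) + 1)  # comparable across lengths
    return int(np.argmin(per_token)) + 1               # prefix length, 1-indexed

# The ground-truth item scores poorly at position 2: that prefix is the
# one a beam would drop, so training should focus there.
lp = np.log(np.array([0.6, 0.05, 0.7, 0.8]))
k = worst_prefix(lp)
```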
Efficient, Property-Aligned Fan-Out Retrieval via RL-Compiled Diffusion (R4T, 2603.06397)
Set-valued retrieval is a class of problems that has received relatively little systematic study: given a broad intent, the system must return a result set optimizing high-order properties (diversity, coverage, complementarity). R4T's core insight is using RL as an "objective converter" rather than an inference engine. Step one: train a fan-out LLM with composite set-level rewards. Step two: use this LLM to synthesize objective-aligned training pairs. Step three: train a lightweight diffusion retriever to model the conditional distribution of set-valued outputs. This three-step pipeline reduces fan-out latency by an order of magnitude while improving retrieval quality on fashion and music benchmarks. The "heavy model at training time, lightweight model at inference time" paradigm suits the latency constraints of industrial deployment.
These four papers trace three convergence paths. On full-pipeline unification, OneRanker and CGR advance deep generation-ranking integration in advertising and feed scenarios — causal mask-based task space separation and constraint-aware decoding are the key techniques. On RL's evolving role, R4T uses RL to generate training data rather than directly performing inference — the field is learning to use RL with greater precision. On train-inference consistency, APAO addresses prefix bias in beam search — the kind of seemingly minor inconsistency that gets amplified in production systems. These three paths are converging toward unified backbones that simultaneously address cross-stage information transfer, RL objective alignment, and train-inference consistency.

LLM Reasoning-Enhanced Recommendation

LLM-recommendation integration has evolved from NoteLLM-style static encoding to reasoning injection. Meta's Foundation-Expert Paradigm and Kuaishou's Next Interest Flow validated that LLM reasoning chains and world knowledge can directly serve recommendation decisions. This week's five papers share a common theme: how to efficiently embed reasoning capability into online pipelines while controlling latency and deployment costs.
MLLMRec-R1: Incentivizing Reasoning Capability in MLLMs for Multimodal Sequential Recommendation (2603.06243)
Extending the GRPO reasoning pipeline to multimodal sequential recommendation faces two fundamental obstacles. Visual tokens grow explosively with history length and candidate set size, making group-based rollout costs prohibitive. Existing CoT supervision suffers from reward inflation — training reward improvements do not reliably translate to ranking performance gains. MLLMRec-R1's solution is straightforward: convert visual signals to text offline to eliminate visual tokens, construct high-quality multimodal CoT supervision through refinement and confidence-aware evaluation, then selectively inject reliable CoT samples via a mixed-granularity data augmentation strategy. It outperforms multiple state-of-the-art methods including LLaVA, BLIP-2, UniSRec, and SASRec across three benchmark datasets. Complementing Rank-GRPO's approach of redefining rank-level rewards, MLLMRec-R1 focuses more on quality control of the reward signal itself.
Relevance Matters: Multi-Task LLM Query Rewriting (2603.02555)
This query rewriting framework from JD.com and Tsinghua University injects relevance tasks into the generation process. The specific path: pre-train on JD.com user-product data, then perform multi-task SFT (query generation + relevance annotation), followed by GRPO alignment of relevance and conversion objectives. Since deployment on JD.com in August 2025, UCVR has improved — the paper does not disclose specific percentages. The key takeaway is that query rewriting should not pursue semantic equivalence alone; it must explicitly model downstream relevance.
IDProxy: Cold-Start CTR Prediction with Multimodal LLMs (2603.01590)
Xiaohongshu's IDProxy leverages multimodal LLMs to generate proxy embeddings from text and image content. It fuses them with existing ID embedding spaces through alignment losses while optimizing CTR objectives end-to-end. This approach bypasses limitations of traditional cold-start methods — rather than waiting for behavioral data to accumulate, it uses content signals to directly "proxy" behavioral signals. Online A/B tests demonstrate CTR +2.1% and CVR +2.5%, deployed in Xiaohongshu's Explore Feed for content recommendation and display advertising, serving hundreds of millions of users daily.
LaSER: Internalizing Explicit Reasoning into Latent Space for Dense Retrieval (2603.01425)
Dense retrieval faces a structural contradiction: LLMs possess strong Chain-of-Thought reasoning capabilities, but existing retrievers only use them as static encoders. The rewrite-then-retrieve pipeline can leverage CoT, but autoregressive generation introduces unacceptable latency. LaSER, from Alibaba and Renmin University, resolves this with a self-distillation framework. It constructs dual views on a shared LLM backbone — an Explicit view encoding actual reasoning paths, and a Latent view performing implicit latent thinking. The key innovation is a trajectory alignment mechanism that synchronizes intermediate hidden states of the implicit path with the semantic progression of explicit reasoning segments. This means the retriever can "think silently" — completing reasoning without generating text.
Experiments cover over ten baselines including DPR, ANCE, Contriever, E5, BGE, and GritLM. LaSER outperforms state-of-the-art on both in-domain and out-of-domain reasoning-intensive benchmarks — specific metrics in the original paper. Compared to the concurrent ReFeed, which still follows the rewrite-then-retrieve route and improves recall by approximately 5–10% on NQ and HotpotQA, LaSER completely eliminates the inference-time latency of explicit reasoning — a path better suited for industrial deployment.
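The trajectory alignment idea can be sketched as a per-step loss that pulls the i-th latent "thinking" state toward the embedding of the i-th explicit reasoning segment. A minimal illustration; LaSER's actual objective, projections, and state selection are defined in the paper:

```python
import numpy as np

def trajectory_alignment_loss(latent_states, segment_embs):
    """Mean-squared error between L2-normalized latent states and the
    corresponding explicit reasoning-segment embeddings, step by step.
    Illustrative sketch of the alignment idea only."""
    L = latent_states / np.linalg.norm(latent_states, axis=1, keepdims=True)
    S = segment_embs / np.linalg.norm(segment_embs, axis=1, keepdims=True)
    return float(((L - S) ** 2).mean())

rng = np.random.default_rng(0)
steps, d = 4, 32
segments = rng.normal(size=(steps, d))
perfect = trajectory_alignment_loss(segments, segments)           # already aligned
random_ = trajectory_alignment_loss(rng.normal(size=(steps, d)), segments)
# Training drives the latent trajectory from 'random_' toward 'perfect',
# after which the explicit reasoning pass can be dropped at inference.
```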
SafeCRS: Personalized Safety Alignment for LLM-Based Conversational Recommender Systems (2603.03536)
This work from UIC, UIUC, and Amazon addresses a severely overlooked corner: personalized safety in LLM-based conversational recommendation. The core problem is that safety-sensitive information implicitly revealed by users during conversations — trauma triggers, self-harm history, phobias — may be violated by recommendation results. Safe-SFT establishes foundational safety awareness, while Safe-GDPO jointly optimizes recommendation quality and safety alignment during the reinforcement learning stage. Experiments demonstrate up to 96.5% reduction in safety violations compared to the strongest baselines (including ChatGPT and GPT-4), while maintaining competitive recommendation quality.
This week's five papers point in the same direction: LLM reasoning capability is moving from "external augmentation" to "internalization." LaSER distills explicit CoT into latent space to eliminate reasoning latency. SafeCRS internalizes safety reasoning into policy optimization for real-time constraints. On the industrial side, LaSER's self-distillation paradigm is likely to see rapid adoption — it requires no additional reasoning modules at inference time, keeping deployment overhead identical to standard dense retrievers.

Multimodal Recommendation and Cross-Modal Representation Learning

Multimodal recommendation is shifting from "how to fuse more modalities" to "how to manage modality contributions with precision" — from heuristic fusion in MMGCN and LATTICE to frequency-adaptive fusion in SSR, to DiffRec bringing diffusion models into recommendation. This week's five papers advance along redundancy elimination, category-adaptive weighting, hierarchical denoising, embedding quality, and cross-modal alignment.
Beyond Text: Aligning Vision and Language for Multimodal E-Commerce Retrieval (2603.04836)
This industrial paper from Target directly addresses the "text-heavy, vision-light" problem in e-commerce retrieval systems. The core innovation is a two-stage alignment strategy: first align query with product text, then align query with product images, combined with a modality fusion network to capture cross-modal complementary information. This staged alignment design is more stable than direct end-to-end training — it first establishes textual semantic anchors, then integrates visual signals on that foundation. However, the paper does not report online A/B experiment data, a notable omission for an industrial paper.
CAMMSR: Category-Guided Attentive Mixture of Experts for Multimodal Sequential Recommendation (2603.04320)
CAMMSR targets an overlooked but real problem: a user's modality preference varies dynamically across product categories. When shopping for clothing, images matter more; when shopping for electronics, text descriptions carry greater weight. CAMMSR's solution is a Category-guided Attentive MoE module (CAMoE): an auxiliary category prediction task provides supervisory signals for the gating network, allowing different experts to learn item representations from different modal perspectives with adaptive weight assignment. Another highlight is modality-swapped contrastive learning — swapping modality information at the sequence level for data augmentation, strengthening cross-modal alignment. It outperforms baselines including SASRec, BERT4Rec, and MMRec on four public datasets. Compared to M3oE's decoupling via three MoE module types, CAMMSR's MoE focuses more specifically on modality-dimension adaptivity at a finer granularity.
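The CAMoE gating idea can be sketched as a softmax gate over per-modality experts, with an auxiliary category head supervising the same gate input so that expert weights become category-aware. All weight names below are invented for illustration; this is not CAMMSR's implementation:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def camoe_step(seq_emb, category_onehot, W_gate, experts, W_cat):
    """One gating step: the gate mixes per-modality expert outputs, and an
    auxiliary category prediction loss supervises the same input so the
    gate learns category-dependent modality weights. Illustrative sketch."""
    gate = softmax(W_gate @ seq_emb)                     # expert weights
    fused = sum(w * f(seq_emb) for w, f in zip(gate, experts))
    cat_logits = W_cat @ seq_emb                         # auxiliary task head
    cat_loss = -np.log(softmax(cat_logits) @ category_onehot)
    return fused, gate, cat_loss

rng = np.random.default_rng(1)
d, n_experts, n_cats = 8, 2, 3
experts = [lambda x: x * 2.0, lambda x: x + 1.0]         # stand-ins for image/text experts
fused, gate, cat_loss = camoe_step(
    rng.normal(size=d), np.eye(n_cats)[0],
    rng.normal(size=(n_experts, d)), experts, rng.normal(size=(n_cats, d)))
```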
CLEAR: Null-Space Projection for Cross-Modal De-Redundancy (2603.01536)
CLEAR makes a counterintuitive point: existing methods over-pursue cross-modal consistency, which paradoxically causes cross-modal redundancy. Representations from different modalities overlap heavily, drowning out complementary information. This explains why adding more modalities sometimes fails to improve performance. The technical solution is elegant: decompose the cross-modal covariance matrix of visual and textual representations via SVD, identify dominant shared directions, then project multimodal features into the complementary null space to suppress redundant components while preserving modality-specific information. As a plug-in module, CLEAR integrates seamlessly into existing models including MMGCN, LATTICE, BM3, and DualGNN, delivering consistent improvements of 1–3% across three public datasets.
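The null-space projection is a few lines of linear algebra: SVD the cross-modal covariance, take the dominant shared directions, and project them out of one modality's features. An illustrative sketch of the idea, not CLEAR's exact formulation:

```python
import numpy as np

def remove_shared_directions(Xv, Xt, r):
    """Project visual features off the top-r shared cross-modal directions,
    found as the leading left singular vectors of the cross-modal
    covariance C = Xv^T Xt. Illustrative sketch of the idea."""
    C = Xv.T @ Xt                                  # d_v x d_t covariance
    U, S, Vt = np.linalg.svd(C, full_matrices=False)
    Ur = U[:, :r]                                  # dominant shared directions (visual side)
    return Xv - Xv @ Ur @ Ur.T                     # null-space projection

rng = np.random.default_rng(0)
n, dv, dt = 100, 16, 12
shared = rng.normal(size=(n, 4))                   # redundant factor seen by both modalities
Xv = shared @ rng.normal(size=(4, dv)) + 0.1 * rng.normal(size=(n, dv))
Xt = shared @ rng.normal(size=(4, dt)) + 0.1 * rng.normal(size=(n, dt))

Xv_clean = remove_shared_directions(Xv, Xt, r=4)
# Cross-modal correlation collapses after projection, while the
# modality-specific (noise) component of Xv survives.
before = np.linalg.norm(Xv.T @ Xt)
after = np.linalg.norm(Xv_clean.T @ Xt)
```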
MealRec: Multi-granularity Sequential Modeling via Hierarchical Diffusion Models for Micro-Video Recommendation (2603.01926)
Micro-video recommendation faces dual noise sources: noise in multimodal content itself and unreliability of implicit feedback. MealRec addresses both simultaneously using hierarchical diffusion models at two granularities. Temporal-guided Content Diffusion (TCD) operates at the intra-video granularity, leveraging temporal guidance and personalized collaborative signals to refine video representations. Noise-unconditional Preference Denoising (NPD) operates at the inter-video granularity, recovering user preferences from corrupted states through blind denoising. Recall@20 improves by 3.2%–8.7% across four micro-video datasets. Unlike DiffRec, which applies diffusion directly on the interaction matrix, MealRec embeds the diffusion process hierarchically across content representation and preference modeling.
Reconstructing Content via Collaborative Attention to Improve Multimodal Embedding Quality (CoCoA, 2603.01471)
CoCoA, from Baidu and the Chinese Academy of Sciences, addresses the architectural bias of MLLMs when used for embedding. Causal attention and the next-token prediction paradigm inherently discourage forming globally compact representations. CoCoA restructures the attention flow and introduces an EOS token-based content reconstruction task, forcing the model to compress input semantics into the EOS embedding. It improves embedding quality on the MMEB-V1 benchmark using Qwen2-VL and Qwen2.5-VL — specific improvement magnitudes not reported. This aligns with NoteLLM's direction of compressing notes into a single token, but CoCoA addresses the problem at the pre-training paradigm level.
CLEAR exposes the cross-modal redundancy problem. CAMMSR demonstrates that modality weights must be dynamically adjusted per category. MealRec performs hierarchical denoising at both the content and preference levels. Together, these works drive a shift from blindly stacking modalities to interpretable quantification of modality contributions. Deeper integration of diffusion models with causal inference methods may emerge as the next breakthrough.

Recommendation Fairness and Causal Debiasing

Debiasing and fairness research continues to refine in granularity. From ESMM's entire-space modeling to circumvent selection bias, to Multi-IPW/Multi-DR bringing inverse propensity weighting into multi-task estimation — these methods are fundamentally static. This week's five papers span time-aware IPS, sample-level model merging routing, k-hop fairness, diffusion-based state purification, and proactive preference guidance — covering the full spectrum from feature-level debiasing to system-level fairness.
TIPS: Time-aware Inverse Propensity Scoring (2603.04986)
Sequential recommendation involves two intertwined biases: selection bias (exposed but unclicked items are mistakenly treated as uninteresting) and exposure bias (unexposed items are assumed irrelevant). Traditional IPS methods are static and cannot capture the temporal dynamics of user behavior. TIPS extends IPS to a time-aware version. As a plug-in module, it delivers consistent improvements on SASRec, GRU4Rec, Caser, and BERT4Rec, with average NDCG@10 improvement reaching 5.2% per the paper's experimental results. This is technically an incremental improvement, but it fills a practical gap.
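A time-aware inverse-propensity weight can be sketched as the static IPS weight 1/p modulated by a recency decay, so stale exposure estimates count for less. This illustrates the general idea only; TIPS's actual temporal propensity model is defined in the paper, and the half-life form below is an assumption:

```python
import numpy as np

def time_aware_ips_weights(propensities, timestamps, now, half_life):
    """Static IPS weight 1/p times an exponential recency decay
    (illustrative sketch, not TIPS's estimator)."""
    decay = 0.5 ** ((now - np.asarray(timestamps)) / half_life)
    return decay / np.asarray(propensities)

w = time_aware_ips_weights(
    propensities=[0.8, 0.2, 0.2],       # exposure probabilities
    timestamps=[10.0, 10.0, 0.0],       # interaction times
    now=10.0, half_life=5.0)
# The rare, recent exposure gets the largest weight; the equally rare
# but stale exposure is discounted by two half-lives.
```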
BD-Merging: Bias-Aware Dynamic Model Merging (2603.03920)
Model merging is increasingly popular in multi-task recommendation, but reliability under distribution shift has remained a blind spot. BD-Merging's core technical pipeline: a joint evidence head learns uncertainty on a unified label space, Adjacency Discrepancy Scoring (ADS) quantifies evidence alignment between adjacent samples, and ADS-guided contrastive learning trains a debiasing router that adaptively assigns weights at the sample level. Unlike Gradient Surgery and similar methods that handle task conflicts at the gradient level, BD-Merging achieves adaptive routing at the sample level — a finer granularity.
k-hop Fairness: Addressing Disparities Beyond First-Order Neighborhoods (2603.03867)
This work extends fairness concepts in link prediction from first-order neighborhoods to k-hop neighborhoods. Experiments yield three key findings: models tend to replicate structural bias at different k-hop levels; rewiring the graph creates interdependencies in structural bias across different hops; and post-processing methods outperform existing baselines in k-hop performance-fairness trade-offs. Although validated only on academic datasets, this conceptual framework carries practical implications for fairness auditing in social recommendation systems.
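The k-hop neighborhoods this framework evaluates over are cheap to compute via boolean powers of the adjacency matrix, which is part of what makes the audit practical. A minimal sketch:

```python
import numpy as np

def k_hop_neighbors(adj, k):
    """Nodes reachable within k hops (excluding self), via boolean
    powers of the adjacency matrix."""
    A = np.asarray(adj, dtype=bool)
    reach = A.copy()
    power = A.copy()
    for _ in range(k - 1):
        power = (power.astype(int) @ A.astype(int)) > 0
        reach |= power
    np.fill_diagonal(reach, False)
    return reach

# Path graph 0-1-2-3: node 0's 1-hop set is {1}; its 2-hop set adds {2}.
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
h1 = k_hop_neighbors(A, 1)
h2 = k_hop_neighbors(A, 2)
```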
DSRM-HRL: Fairness Begins with State (2603.03820)
This framework redefines the root cause of fairness-aware recommendation: the problem is not reward shaping but state estimation failure. Implicit feedback is contaminated by popularity noise and exposure bias, creating distorted states that mislead RL agents. DSRM recovers a low-entropy latent preference manifold using a diffusion model, while HRL's high-level policy regulates long-term fairness trajectories and the low-level policy optimizes short-term engagement under dynamic constraints. On the KuaiRec and KuaiRand simulators, it outperforms baselines on both recommendation utility and exposure fairness, achieving a superior Pareto frontier — specific numbers in the original paper.
HRL4PFG: Proactive Guiding Strategy for Item-side Fairness (2603.03094)
Existing fairness methods promote exposure by directly inserting long-tail items into recommendation results, but this creates a mismatch between user preferences and recommended content, undermining long-term engagement. HRL4PFG takes a "proactive guidance" rather than "passive injection" approach — the macro-level process of hierarchical RL generates fairness guidance targets, while the micro-level process fine-tunes recommendations in real time. In simulation experiments, it improves cumulative interaction rewards and maximum user interaction length compared to baselines — specific numbers in the original paper.
From static IPS to time-aware IPS, from dyadic fairness to k-hop fairness, from global model merging to sample-level adaptive routing — each step pursues finer-grained bias modeling. Simultaneously, debiasing is moving from "post-processing patches" to "architecture-level embedding" — DSRM-HRL purifies state representations at the source via diffusion models, and HRL4PFG proactively guides preferences rather than passively injecting items. Industrial deployment remains the core bottleneck: most of this week's work stays at the academic validation stage. How to achieve fine-grained debiasing in billion-scale systems without introducing excessive computational overhead is the critical leap from paper to production.

Directions to Watch

Scaling Engineering for Recommendation Systems

This week validates the predictability and engineerability of recommendation system scaling from multiple angles. SORT demonstrates scaling behavior of ranking Transformers across data, model, and sequence length dimensions. Scaling Laws for Reranking establishes power-law predictions for the re-ranking stage for the first time. Together, these two papers advance the construction of scaling theory for recommendation systems — parallel to but distinct from LLM scaling laws. Recommendation system scaling must find the optimal frontier under stringent latency and cost constraints. HAP provides a complementary perspective from engineering practice: compute budget allocation in pre-ranking should adapt to sample difficulty rather than distribute uniformly. This is not a theoretical contribution at the scaling law level but an engineering optimization in resource allocation strategy. Industrial teams from Alibaba and ByteDance, along with academic institutions such as UMass Amherst, are actively advancing this direction. The next step will likely see unified full-pipeline scaling prediction tools that help engineering teams make resource allocation decisions before training begins.

The "Internalization" Paradigm for LLM Reasoning Capabilities

LaSER's self-distillation framework represents a trend worth watching: compressing LLM explicit reasoning capabilities into implicit representations that can be served online. This goes beyond simple knowledge distillation — the trajectory alignment mechanism synchronizes intermediate hidden states of the implicit path with the semantic progression of explicit reasoning. It essentially teaches the model to "think silently." MLLMRec-R1 validates the same idea from a different angle: converting visual signals to text, pushing CoT reasoning quality control to its limits, then internalizing reasoning capability through GRPO. This direction directly addresses the core contradiction of LLM-based recommendation — the conflict between reasoning depth and inference latency. LaSER's zero additional inference overhead and MLLMRec-R1's strategy of eliminating visual tokens both point to the same solution: pay the cost at training time, harvest efficiency at inference time.
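
To make "trajectory alignment" concrete: the implicit path's intermediate hidden states are matched, step by step, against the explicit reasoning trajectory, and the distance between matched states becomes a distillation loss. The numpy sketch below uses nearest-relative-position matching and a mean-squared loss as one hypothetical instantiation; LaSER's exact matching scheme and loss are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: the explicit CoT path emits 12 intermediate
# reasoning states; the implicit path has only 4 hidden states.
teacher_states = rng.normal(size=(12, 64))   # explicit reasoning trajectory
student_states = rng.normal(size=(4, 64))    # latent "silent" trajectory

def trajectory_alignment_loss(student, teacher):
    """Match each student state to the teacher state at the same
    relative position along the trajectory, then average the squared
    distances. Illustrative only; the paper's alignment may differ."""
    t_s, t_t = len(student), len(teacher)
    # Map student step i onto teacher step round(i * (t_t-1)/(t_s-1)).
    idx = np.round(np.linspace(0, t_t - 1, t_s)).astype(int)
    diffs = student - teacher[idx]
    return float(np.mean(diffs ** 2))

loss = trajectory_alignment_loss(student_states, teacher_states)
```

Minimizing a loss of this shape at training time is what lets the short implicit path mimic the semantic progression of the long explicit one, so inference pays for 4 steps while behaving like 12.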

Fine-Grained Management of Multimodal Representations

The cross-modal redundancy problem revealed by CLEAR may reshape the fundamental approach to multimodal recommendation. Looking at the body of work from MMGCN, LATTICE, and BM3 onward, the dominant theme in recent multimodal recommendation has been "add more modalities, build stronger alignment." Yet CLEAR's experiments demonstrate that when representations from different modalities overlap heavily, adding more modalities may paradoxically fail to improve performance. CAMMSR further demonstrates that modality weights must be dynamically adjusted per category — a shift is needed from "one-size-fits-all fusion" to "selective utilization." A separate trend worth watching is the language-representation-replacing-ID approach: AlphaFree's user-free/ID-free/GNN-free scheme (up to approximately 40% improvement over non-language representation methods, GPU memory reduction up to 69%) indicates that the foundational representation paradigm of recommendation systems is also undergoing transformation.
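
The null-space projection idea behind CLEAR can be sketched in a few lines of numpy: identify the directions shared across modalities (here via SVD of the cross-modal correlation, a hypothetical recipe in the spirit of the paper), then project one modality's embeddings onto the orthogonal complement so only the complementary information survives.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 32, 4                       # embedding dim, redundancy rank

# Toy item embeddings: text and image share a rank-r common component.
shared = rng.normal(size=(100, r)) @ rng.normal(size=(r, d))
text  = shared + 0.1 * rng.normal(size=(100, d))
image = shared + 0.1 * rng.normal(size=(100, d))

# SVD of the cross-modal correlation surfaces the shared directions.
corr = text.T @ image              # d x d cross-correlation
U, S, Vt = np.linalg.svd(corr)
redundant = U[:, :r]               # top-r redundant directions

# Null-space projector removes those directions from the embeddings.
P = np.eye(d) - redundant @ redundant.T
text_clean = text @ P              # component along `redundant` is zeroed
```

Because `P` is a fixed linear map, a cleanup step of this form can be bolted onto an existing multimodal model without retraining it, which is consistent with CLEAR's plug-in-style 1–3% gains.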

Paper Roundup

Ranking Model Architecture and System Efficiency

SORT: A Systematically Optimized Ranking Transformer — Systematic optimization of industrial ranking Transformers through request-centric sample organization, local attention, query pruning, and generative pre-training; Alibaba e-commerce online orders +6.35%, latency -44.67%.
FlashEvaluator: Expanding Search Space with Parallel Evaluation — Cross-sequence token information sharing achieves sub-linear evaluation complexity; deployed at Kuaishou with sustained revenue growth.
Not All Candidates are Created Equal: HAP — Conflict-sensitive sampling and adaptive compute budget allocation address pre-ranking sample heterogeneity; deployed on Toutiao for 9 months, usage duration +0.4%.
SOLAR: SVD-Optimized Lifelong Attention for Recommendation — SVD-Attention achieves lossless attention complexity reduction to O(Ndr) on low-rank matrices; Kuaishou online Video Views +0.68%.
Scaling Laws for Reranking in Information Retrieval — First systematic study of scaling laws for the re-ranking stage; training a series of small models (up to 400M parameters) accurately predicts 1B model NDCG performance.
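
SOLAR's O(Ndr) figure rests on a standard linear-algebra move: if the key matrix is (near-)low-rank, factoring it once via SVD and reordering the matrix products avoids ever materializing the N×N attention score matrix. The numpy sketch below shows that reordering for the un-normalized case, with the softmax omitted; SOLAR's exact lossless mechanism is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, r = 1000, 64, 8              # sequence length, dim, key rank

Q = rng.normal(size=(N, d))
K = rng.normal(size=(N, r)) @ rng.normal(size=(r, d))  # rank-r keys
V = rng.normal(size=(N, d))

# Naive path (softmax omitted): materializes N x N scores, O(N^2 d).
out_naive = (Q @ K.T) @ V

# Factor K once with truncated SVD, then exploit associativity:
# Q K^T V = ((Q Vr^T) diag(Sr)) (Ur^T V) -- every matmul is O(N d r).
U, S, Vt = np.linalg.svd(K, full_matrices=False)
Ur, Sr, Vtr = U[:, :r], S[:r], Vt[:r]    # K = Ur @ diag(Sr) @ Vtr
out_lowrank = ((Q @ Vtr.T) * Sr) @ (Ur.T @ V)
```

For the exactly rank-r keys constructed here, the two paths agree to floating-point precision, which is the sense in which a reordering like this is "lossless" while replacing the N² term with Ndr.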

Generative Recommendation and Full-Pipeline Unified Modeling

OneRanker: Unified Generation and Ranking with One Model — Value-aware multi-task decoupling and Fake Item Tokens enable deep generation-ranking integration; Tencent WeChat Channels Ads GMV-Normal +1.34%.
Constraint-Aware Generative Re-ranking (CGR) — Converts constrained optimization to bounded neural decoding, unifying sequence generation and reward estimation; Bilibili online improvements in revenue and engagement — specific magnitudes not disclosed.
APAO: Adaptive Prefix-Aware Optimization — Prefix-level optimization loss addresses train-inference inconsistency in generative recommendation; Recall@20 improvement of 2.1–4.8%.
Efficient, Property-Aligned Fan-Out Retrieval via RL-Compiled Diffusion (R4T) — Three-step pipeline (RL training → synthetic data → lightweight diffusion retriever) reduces fan-out latency by an order of magnitude.
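
"Converting constrained optimization to bounded decoding," as CGR does, can be illustrated with a generic pattern: at each decoding step, mask out any candidate whose selection would violate a business constraint, then decode greedily over what remains. The sketch below uses a toy at-most-k-per-category constraint as a stand-in; CGR itself unifies this with learned reward estimation rather than raw scores.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items = 20
scores = rng.normal(size=n_items)            # model scores per candidate
category = np.arange(n_items) % 4            # 4 categories, 5 items each
MAX_PER_CATEGORY, SLATE = 2, 6               # toy business constraints

# Constraint-aware greedy decoding: mask infeasible candidates with
# -inf before each argmax, so every emitted slate is valid by design.
slate, used = [], np.zeros(4, dtype=int)
feasible = np.ones(n_items, dtype=bool)
for _ in range(SLATE):
    masked = np.where(feasible, scores, -np.inf)
    pick = int(np.argmax(masked))
    slate.append(pick)
    feasible[pick] = False
    used[category[pick]] += 1
    if used[category[pick]] >= MAX_PER_CATEGORY:
        feasible &= category != category[pick]   # close the category
```

The appeal of decoding-time enforcement is that constraints hold by construction on every request, instead of being encouraged by a penalty term that the model may trade away.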

LLM Reasoning-Enhanced Recommendation

MLLMRec-R1: Incentivizing Reasoning Capability in MLLMs — Visual signal textualization eliminates expensive visual tokens, establishing a practical GRPO multimodal reasoning pipeline; outperforms state-of-the-art across three benchmarks.
Relevance Matters: Multi-Task LLM Query Rewriting — Injects relevance tasks into LLM query rewriting with GRPO objective alignment; deployed at JD.com with UCVR improvement.
IDProxy: Cold-Start CTR Prediction with Multimodal LLMs — Multimodal LLM generates proxy embeddings aligned with ID space; Xiaohongshu online CTR +2.1%, CVR +2.5%.
LaSER: Internalizing Reasoning into Latent Space for Dense Retrieval — Dual-view self-distillation internalizes explicit CoT reasoning into retriever latent space; balances reasoning depth with efficiency.
SafeCRS: Personalized Safety Alignment for LLM-Based CRS — Safe-SFT + Safe-GDPO achieves conversational recommendation safety alignment; safety violations reduced by 96.5%.

Multimodal Recommendation and Cross-Modal Representation Learning

Beyond Text: Aligning Vision and Language for Multimodal E-Commerce Retrieval — Two-stage alignment and modality fusion network enable unified text-image e-commerce retrieval.
CAMMSR: Category-Guided Attentive MoE for Multimodal Sequential Recommendation — Category-guided attentive MoE dynamically allocates modality weights; modality-swapped contrastive learning enhances cross-modal alignment.
MealRec: Multi-granularity Sequential Modeling via Hierarchical Diffusion Models for Micro-Video Recommendation — Temporal-guided content diffusion + unconditional preference denoising; Recall@20 improvement of 3.2–8.7%.
CLEAR: Null-Space Projection for Cross-Modal De-Redundancy — SVD identifies cross-modal redundancy subspace, null-space projection preserves complementary information; plug-in improvement of 1–3%.
Reconstructing Content via Collaborative Attention to Improve Multimodal Embedding Quality (CoCoA) — Collaborative attention + EOS reconstruction task optimizes MLLM embedding quality.

Recommendation Fairness and Causal Debiasing

TIPS: Time-aware Inverse Propensity Scoring — Extends static IPS to a time-aware version; as a plugin, average NDCG@10 improvement reaches 5.2%.
BD-Merging: Bias-Aware Dynamic Model Merging — Joint evidence head + Adjacency Discrepancy Scoring trains a debiasing router; improves model merging robustness under distribution shift.
k-hop Fairness: Addressing Disparities in Graph Link Prediction — Extends fairness evaluation from first-order neighborhoods to k-hop; post-processing methods achieve superior performance-fairness trade-offs.
DSRM-HRL: Fairness Begins with State — Diffusion model purifies user state + hierarchical RL decouples utility and fairness; achieves superior Pareto frontier on KuaiRec.
HRL4PFG: Proactive Guiding Strategy for Item-side Fairness — Hierarchical RL proactively guides user preferences toward long-tail items; improves cumulative interaction rewards.
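
The step from static to time-aware IPS can be made concrete: static IPS reweights each logged sample by the inverse of its exposure propensity, and a time-aware variant additionally discounts propensity estimates that have gone stale. The exponential-decay shrinkage below is a hypothetical instantiation for illustration; TIPS's actual temporal model may differ.

```python
import numpy as np

# Toy logged interactions: relevance label, exposure propensity at log
# time, and days elapsed since the interaction. Values are illustrative.
labels     = np.array([1.0, 0.0, 1.0, 1.0])
propensity = np.array([0.8, 0.5, 0.1, 0.05])
age_days   = np.array([1.0, 3.0, 30.0, 90.0])

def time_aware_ips(labels, propensity, age_days, half_life=30.0):
    """Static IPS weight is labels / propensity; here stale propensity
    estimates are first shrunk toward the global mean, so old,
    unreliable estimates do not produce extreme inverse weights."""
    decay = 0.5 ** (age_days / half_life)        # freshness of estimate
    smoothed = decay * propensity + (1 - decay) * propensity.mean()
    return labels / smoothed

weights = time_aware_ips(labels, propensity, age_days)
```

Note the effect on the last sample: its static weight would be 1/0.05 = 20, but because the estimate is 90 days old, the time-aware weight is pulled sharply toward the mean — exactly the kind of variance control that makes a plugin like this safe to bolt onto an existing debiasing pipeline.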

Other

MAC: Multi-Attribution CVR Benchmark — First public multi-attribution CVR benchmark dataset and PyMAL open-source library; MoAE model outperforms state-of-the-art.
Scaling RAG with RAG Fusion — Production evaluation finds RAG Fusion's recall gains are offset by re-ranking; Hit@10 drops from 0.51 to 0.48.
DenoiseBid: Uncertainty Quantification for Autobidding — Bayesian methods calibrate noisy CTR/CVR estimates; improves autobidding efficiency.
Design Experiments to Compare MAB Algorithms — Artificial Replay experiment design reduces MAB algorithm comparison costs by nearly half.
Dual-Calibration and LLM-Generated Nudges for News — Topic-geography dual calibration + LLM display interventions; 120-participant 5-week experiment improves news diversity.
DisenReason: Behavior Disentanglement for Shared-Account Recommendation — Frequency-domain behavior disentanglement + latent user reasoning; MRR@5 improvement of 12.56%.
AgentSelect: Benchmark for Agent Recommendation — First LLM agent recommendation benchmark; 110K queries, 100K agents, 250K interaction records.
Reproducing Distillation for Cross-Encoders — Systematic comparison of distillation strategies finds pairwise MarginMSE and listwise InfoNCE consistently optimal.
S2CDR: Smoothing-Sharpening for Cross-Domain Recommendation — Heat equation smoothing + sharpening recovery enables training-free cross-domain recommendation; average NDCG@20 improvement of 12.7%.
AlphaFree: Recommendation Free from Users, IDs, and GNNs — Language representations replace IDs, contrastive learning replaces GNNs; up to approximately 40% improvement over non-language representation methods, memory reduction up to 69%.
NextAds: Next-generation Personalized Video Advertising — Proposes a generative video ad personalization paradigm; defines creative generation and integration as two core tasks.
ReFeed: Retrieval Feedback-Guided Query Rewriting — Retrieval feedback-driven dataset construction; style-aware query rewriting improves recall by 5–10%.
  • Recommendation Systems
  • Papers
  • Technology Trends