RecSys Weekly 2026-W25

This week's recommendation systems research clusters around three themes: full-lifecycle co-design for large-scale graph retrieval, Transformer-based sequence modeling deployed across platforms, and a shift from DNN to Transformer-native architectures for multi-task ranking. Meta, Airbnb, Alibaba, Shopee, and NetEase Cloud Music all published online deployment work with specific A/B metrics.
Thread 1 (End-to-end design of large-scale graph systems): Meta's RankGraph-2 couples graph construction, representation learning, and online serving into a joint optimization. On a billion-node graph, it reduces compute cost by 83%, achieves 3.8x the recall of GAT+Deep Graph Infomax, and lifts online CTR by 0.96% and CVR by 2.75%. Along the same line, HighLevel's ScoreGate uses a statistical fusion of two scores to adaptively control the number of retrieved chunks in RAG. In production, it cuts tokens by 34.8% while maintaining recall between 97.77% and 99.34%.
Thread 2 (Generative recommendation moves from theory to production): Airbnb's JourneyFormer deploys a Transformer-based sequence model in search ranking to handle long, sparse user behavior. Alibaba's OneBar uses an end-to-end generative framework for video e-commerce query recommendation, achieving a 21.67% GMV lift. Both point in the same direction: generative recommendation needs engineering trade-offs under real constraints (cold start, latency, sparse labels) rather than chasing offline metrics alone.
Thread 3 (Transformer-native paradigm for multi-task ranking): Shopee's OneRank eliminates the encoder-predictor separation, embedding task-private channels and gradient isolation inside the Transformer. Online CTR is up 1.2% and CVR 0.8%. NetEase Cloud Music's PIANO uses a learnable [CLS] token for list-level multi-objective re-ranking, lifting CTR by 0.62% and CVR by 4.45%. Both demonstrate that internalizing multi-objective reasoning into the Tr
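To make the ScoreGate idea from Thread 1 concrete, here is a minimal sketch of adaptive chunk selection driven by a fusion of two scores. The summary above does not say which two scores are fused or how, so everything here is an assumption for illustration: the inputs (a dense-retrieval score and a reranker score), the z-score averaging, the `margin` threshold, and the names `zscores` and `adaptive_top_k` are all hypothetical and should not be read as the published method.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize a list of scores; all zeros when there is no spread."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def adaptive_top_k(dense_scores, rerank_scores, margin=0.0, min_k=1, max_k=8):
    """Return indices of chunks to keep, best first.

    Hypothetical fusion: average the z-scores of the two signals, then keep
    every chunk whose fused score exceeds `margin`, clamped to [min_k, max_k].
    """
    if len(dense_scores) != len(rerank_scores):
        raise ValueError("score lists must have the same length")
    if not dense_scores:
        return []
    fused = [
        (d + r) / 2
        for d, r in zip(zscores(dense_scores), zscores(rerank_scores))
    ]
    # Rank chunk indices by fused score, highest first.
    ranked = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
    above = sum(1 for score in fused if score > margin)
    k = max(min_k, min(max_k, above))
    return ranked[:k]


if __name__ == "__main__":
    # Two chunks clearly stand out on both signals, so only those are kept.
    print(adaptive_top_k([0.9, 0.85, 0.2, 0.1], [0.8, 0.9, 0.1, 0.2]))
```

The point of the sketch is that the number of chunks passed to the generator varies per query: when few chunks stand out on both signals, fewer are kept, which is the kind of mechanism that could trade a small amount of recall for a token reduction.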

AI Tech Daily - 2026-06-20

AI hit a major inflection point today. DeepSeek dropped DeepSeek-V4, a 1.6T MoE model that slashes long-context costs by 3.7x and beats GPT-5.4 — all open-source. Meanwhile, Subquadratic claims to have cracked the O(n²) attention bottleneck, and GLM-5.2 is now the first open model that independent d

AI Tech Daily - 2026-06-19

AI hit multiple inflection points today. Anthropic's Claude Opus 4.7 autonomously controlled a robot 20x faster than humans, while Qualcomm is reportedly acquiring Tenstorrent for $8-10B to challenge NVIDIA's inference dominance with RISC-V. Noam Shazeer — one of the "Attention is All You Need" auth

AI Tech Daily - 2026-06-18

AI hit multiple inflection points today. Noam Shazeer, co-author of the original Transformer paper, left Google for OpenAI — a decade-long pursuit finally realized. Vercel launched its eve agent framework with a full stack of components, while AWS and Hugging Face both unveiled critical agent infras

AI Tech Daily - 2026-06-17

Today marks a seismic shift in AI infrastructure and industry structure. SpaceX acquired Cursor for $60B in the largest startup M&A of 2026, signaling AI coding tools have become critical infrastructure. On the model front, Zhipu AI open-sourced GLM-5.2 (744B params, MIT license) topping the Artific

AI Tech Daily - 2026-06-17

AI hit a seismic inflection point today: SpaceX acquired Cursor for $60 billion in the largest startup M&A deal ever, signaling AI coding tools have become critical infrastructure. Microsoft is reportedly exploring replacing OpenAI with DeepSeek for Copilot Cowork to cut costs, while GLM-5.2 (744B

AI Tech Daily - 2026-06-17

AI reshaped the tech landscape today: SpaceX acquired Cursor for $60B — the largest startup M&A deal of 2026, signaling AI coding tools are now critical infrastructure. Meanwhile, AI CEOs sat alongside world leaders at the G7 summit lunch, marking the industry's formal entry into geopolitics. NVIDIA

AI Tech Daily - 2026-06-17

AI's infrastructure layer is maturing fast today. AWS MCP Server hit general availability, giving developers a standardized way to connect AI agents to cloud services. The auth.md protocol launched to solve the growing authentication headache for MCP-based tools, while Microsoft's SkillOpt formalize

AI Tech Daily - 2026-06-16

AI infrastructure hit multiple milestones today: vLLM v0.23.0 ships with full DeepSeek-V4 support, while LMSYS's DFlash speculative decoding engine becomes SGLang's default, delivering 4.3x throughput on 397B models. Sakana AI launched its first commercial product Marlin — an 8-hour autonomous deep

AI Tech Daily - 2026-06-15

AI safety and efficiency dominated today's headlines. US authorities suspended Anthropic's most advanced Claude models — Fable 5 and Mythos 5 — with co-founder Andrej Karpathy reportedly barred from accessing them due to citizenship status. Meanwhile, AMD's Ryzen AI Max+ 395 launched with 128GB shar

AI Tech Daily - 2026-06-14

A geopolitical shockwave hit AI today: the US government ordered Anthropic to cut off foreign users from Fable 5 and Mythos 5, marking a shift from geographic to identity-based export controls. MiniMax fired back by open-sourcing M3 weights, promising "M3 will never do this." On the infra side, NVID

AI Weekly 2026-W24

Last week's core narrative boils down to two words: "good enough." Claude Fable 5 pushed general-purpose model capabilities to a new high while halving its price. But more importantly, the industry's deliverables for Agent evaluation, safety, memory, and reasoning optimization are shifting from "paper concepts" to "runnable code and frameworks." Anthropic's prefill walkback, Kimi Work's 300 local parallel agents, MiniMax's sparse attention kernel — these events all point to a single signal: AI engineering in the first half of 2026 is moving from "can it run?" to "can it run reliably?"