Why Did LayerNorm + AdamW Become the Standard Configuration for Deep Networks? From Scale Invariance to Gradient Dynamics

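Deep networks rely on LayerNorm (or RMSNorm), which creates local scale invariance and, with it, distinctive gradient dynamics. In this regime, familiar machine-learning intuitions are overturned: the weight norm stops representing feature strength and becomes a knob for learning progress. In theory the norm grows steadily, so SGD comes with built-in learning-rate decay, but it brakes so hard that learning stops early, and Weight Decay evolves from a regularization term into a dynamic regulator of the effective learning rate. How AdamW became the default: Adam keeps the update step size roughly constant and brakes the effective learning rate gently; Warmup handles the problems of overly small weights (exploding gradients) and inaccurate second-moment estimates early in training; AdamW fixes the problem with L2 regularization by introducing Weight Decay, splitting "direction update" and "progress control" into two clean knobs.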
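The scale-invariance argument in this summary can be checked numerically. The sketch below is an illustration only, not code from the post: it uses a toy loss that depends on the weights solely through their direction `w / ||w||` (a stand-in for a layer followed by normalization), and the names `loss`, `grad`, `x`, `y`, and `lr` are all hypothetical. Under that assumption it shows that the gradient is orthogonal to the weights, shrinks as the weights grow, and that a plain SGD step can only increase the weight norm.

```python
import numpy as np

def loss(w, x, y):
    # Toy scale-invariant loss: depends on w only through its direction u = w / ||w||.
    u = w / np.linalg.norm(w)
    return (u @ x - y) ** 2

def grad(w, x, y):
    # Analytic gradient of the loss above with respect to w.
    norm = np.linalg.norm(w)
    u = w / norm
    g_u = 2.0 * (u @ x - y) * x           # gradient with respect to the direction u
    return (g_u - (g_u @ u) * u) / norm   # project out the radial part, divide by ||w||

rng = np.random.default_rng(0)
x = rng.normal(size=8)
w = rng.normal(size=8)
y = 0.5

g = grad(w, x, y)
g_scaled = grad(3.0 * w, x, y)

# 1. Rescaling the weights leaves the loss unchanged.
scale_invariant = np.isclose(loss(w, x, y), loss(3.0 * w, x, y))
# 2. The gradient has no component along w.
orthogonal = np.isclose(g @ w, 0.0, atol=1e-10)
# 3. Tripling the weights shrinks the gradient by a factor of three.
inverse_scaling = np.allclose(g_scaled, g / 3.0)

# 4. Because g is orthogonal to w, one SGD step grows the squared norm by lr^2 * ||g||^2.
lr = 0.1
w_next = w - lr * g
norm_sq_growth = np.linalg.norm(w_next) ** 2 - np.linalg.norm(w) ** 2
pythagoras = np.isclose(norm_sq_growth, lr ** 2 * (g @ g))

print(scale_invariant, orthogonal, inverse_scaling, pythagoras)
```

Points 3 and 4 together are the "built-in learning-rate decay" of SGD: the norm keeps growing while the gradient keeps shrinking, so the change in direction per step falls off unless something such as weight decay pulls the norm back down.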

Recommendation Algorithms Are Icing on the Cake, Not a Lifeline

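When working with product and operations teams, I often have to play the wet blanket, especially when everyone is pinning high hopes on the recommendation algorithm. I hear strategic plans like this: "Our goal next year is 80% growth, and the recommender system is key to it." My view is blunt: if your growth strategy depends so heavily on the recommendation algorithm that the target collapses the moment the algorithm underperforms, it is fundamentally a bad strategy. For growth in scale, a recommendation algorithm cannot be a lifeline; it can only add icing on top of scale you already have.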

From RL Forgetting Less Than SFT to Rethinking the Flaws of Recommender Systems

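A number of recent studies examine why, in LLMs, RL is less prone than SFT to catastrophic forgetting. They point out clearly that RL's on-policy nature keeps the parameters stable, whereas SFT pushes the model parameters in directions that differ greatly from the pretraining distribution, which causes forgetting (as shown in the figure, forgetting is measured as the drop in average performance on old tasks as new tasks are learned). This clear conclusion illuminated many things for me: problems in recommender systems that once looked isolated may connect into one picture with deeper underpinnings. This post covers: • a reading of the work showing that RL is less prone than SFT to catastrophic forgetting in LLMs • recommender systems are standard off-policy supervised learning, and (a conjecture) many of their flaws should stem from that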

AI Tech Daily - 2026-05-12

Today's report covers a wide range of sources: 21 articles (5 featured), 26 KOL tweets, 5 GitHub trending projects, and 1 podcast episode. The most notable trend is the shift from training-centric to inference-centric AI infrastructure, highlighted by Stratechery's deep dive and OpenAI's new securit

AI Tech Daily - 2026-05-11

Today's report covers a wide range of AI activity: 3 featured articles, 5 GitHub trending projects, and 12 KOL tweets. The biggest story is the explosion of Agent infrastructure — from Anthropic's official skills repo to Nous Research's self-improving agent framework, the ecosystem is maturing fast.

AI Tech Daily - 2026-05-10

Today's AI landscape is dominated by Agent infrastructure — from GitHub's Spec-Kit for spec-driven coding to Anthropic's official Claude Agent SDK and ByteDance's UI-TARS Desktop. Meanwhile, China released its first AI Agent policy framework, and Apple open-sourced LiTo for 3D generation. The big pi

AI Tech Daily - 2026-05-09

Today's AI landscape is dominated by a single, powerful trend: the race to build and deploy autonomous agents is accelerating fast. From OpenAI's safety playbook for Codex to Anthropic's Claude Mythos Preview achieving 80% success on long-horizon tasks, the industry is moving beyond chat into real-w

AI Tech Daily - 2026-05-08

Today's report covers 18 articles, 27 tweets, 5 GitHub projects, and 2 podcast episodes. The big story: AI agents are everywhere — from GitHub's token optimization playbook to Mozilla's security breakthrough with Claude Mythos. The Jevons paradox is playing out in real time: inference costs dropped

AI Tech Daily - 2026-05-07

Today's AI landscape is dominated by a single theme: agents are getting serious. From Anthropic's massive infrastructure deal with xAI to GitHub's new validation framework for non-deterministic agent behavior, the industry is moving beyond toy demos into production-grade systems. We're covering 13 a

AI Tech Daily - 2026-05-06

Today's report covers 16 articles (5 featured), 29 KOL tweets, 5 GitHub trending projects, and 1 podcast episode. The big trend: AI infrastructure is heating up fast — xAI's Grok 4.3 API, OpenAI's GPT-5.5 Instant, and major funding rounds for DeepInfra and RadixArk all point to a platform race. On t

AI Tech Daily - 2026-05-05

Today's AI landscape is dominated by a single, massive theme: AI systems are starting to build themselves. From Import AI's data-driven prediction of automated AI R&D by 2028, to a flurry of new Agent frameworks and tools on GitHub, the shift from "AI as a tool" to "AI as an autonomous worker" is ac

Claude Code Tips and Agentic Engineering

The evolution from Vibe Coding to Agentic Engineering: a systematic overview of Claude Code's command system, Skills system, Hooks, Subagents, MCP servers, supporting tool ecosystem, and core workflows.