周报

The narrative for 2026-W17 can be summed up in one sentence: model performance gaps are narrowing, but ecosystem moats are rising fast. GPT-5.5 and DeepSeek V4 both launched this week, but the competition is no longer about benchmark scores — OpenAI is weaving Codex into an integrated network spanning models, agent frameworks, and application layers, while DeepSeek keeps applying structural pressure with open weights, 1/10 pricing, and Huawei Ascend compatibility. Two other threads merit attention. First: the coding agent tooling layer is crystallizing — Claude Code's bug postmortem, OpenClaude as a multi-model replacement, Context Mode for context optimization — marking a shift from "it runs" to "it runs well and cheaply." Second: agent evaluation and safety are getting serious attention. Microsoft's DELEGATE-52 benchmark shows frontier models corrupt 25% of content in long-document editing on average; IBM's DIVERT framework explores more efficient user-simulated evaluation. These signals suggest agent deployment has moved from "can it work" to "can we trust it."

文章详情

周报

W16 is the first week where three structural storylines of the AI industry converge at once. The first is Agent delivery form — OpenAI pushed Codex onto the desktop on April 16 (Mac Computer Use, 90+ plugins, cross-task memory), landing almost in lockstep with Anthropic's Opus 4.7 plus /ultrareview, as "AI that writes code" and "AI that uses the computer" converge at the operating system layer. The second is the full eruption of Agent memory engineering. Microsoft MEMENTO compresses reasoning intermediates into addressable mementos; claude-mem (60,000 stars cumulative), cognee (16,000 cumulative), and omi (10,000 cumulative) surge in parallel; and Percy Liang writes "Act II = personalized assistant with memory" into an industry manifesto. The third is the productization of RL post-training infrastructure — Rednote AI, Morgan Stanley, Shanghai AI Lab, Sakana AI, and NVIDIA ship Relax, AlphaLab, TREX, MARS², AC/DC, and Lightning OPD in the same week, lifting "how to automatically make LLMs stronger" into a multi-agent collaborative research stack. Around these three lines, four tributaries surface: Agent governance, the software factory, local inference, and compute economics. Automation continues to settle into systems engineering, while compute scarcity and governance complexity rise alongside it.

文章详情

周报

技术趋势

If one word captures AI in 2026-W12, it is "infrastructure" — not the models themselves, but everything required to make them work in the real world. Simon Willison distilled a year's worth of scattered agent engineering lessons into a comprehensive pattern guide. Stratechery declared agents the third paradigm shift for large language models. OpenAI acquired both Promptfoo and Astral within ten days to close environment-management gaps in its coding agent stack. Stripe launched the Machine Payments Protocol (MPP) so agents can spend money autonomously. The entire industry is rapidly shifting from "what can agents do" to "how do agents run reliably, securely, and economically in production."

文章详情

周报

技术趋势

本周 AI 行业经历了一场罕见的多线程冲击。2 月 27 日，五角大楼在同一天内完成了两个截然相反的动作：与 OpenAI 签署机密网络部署协议，同时将 Anthropic 列为"国家安全供应链风险"——尽管两家公司在自主武器和大规模监控问题上持有几乎完全相同的限制条款。国防部副部长 Emil Michael 在社交媒体上公开称 Dario Amodei 是"说谎者"和拥有"上帝情结"的人，超过 300 名 Google 和 60 名 OpenAI 员工随即签署联名信支持 Anthropic 的立场。这场冲突的本质已超越技术评估，成为一面映照 AI 治理政治化的棱镜。与五角大楼事件同步发酵的，是 Anthropic 公开指控 DeepSeek、月之暗面和 MiniMax 通过"水螅集群"（hydra cluster）架构——单个代理网络管理超过 2 万个虚假账户——发起 1600 万次系统性蒸馏查询。Google 威胁情报团队也披露了 Gemini 遭受超过 10 万次模型提取攻击的数据。这些事件共同标志着中美 AI 竞争正从模型能力赛道滑入数据对抗与知识产权攻防的新阶段。技术侧同样密集。OpenAI 宣布退役 SWE-Bench Verified，承认 59.4% 的任务存在根本性缺陷；智谱 AI 的 GLM-5 展示了完全在华为昇腾 910B 上训练的 744B MoE 模型；GitHub Trending 被 Agent 框架占据的同时，OpenClaw 连续爆出删除 Meta AI 安全总监邮件、遭 Google 封号等安全事故。Andrej Karpathy 发推称"编程已变得面目全非"，而 Block 裁员 40% 后股价上涨 24%、IBM 因 COBOL 威胁单日蒸发 310 亿美元——资本市场正在以真金白银为 AI 替代效应定价。

文章详情

周报

技术趋势

本周 AI 领域最突出的特征是一种"同步加速"：资本、模型、基础设施和研究同时进入新的量级。OpenAI 宣布了史上最大规模的 1100 亿美元融资，NVIDIA 以 300 亿美元直接入股，Anthropic 刚刚完成 300 亿美元 G 轮——三天内流入 AI 头部公司的资本超过 1400 亿美元。与此同时，Qwen3.5-397B、Claude Sonnet 4.6、Gemini 3.1 Pro 三款旗舰模型在同一周内发布，形成了一场罕见的三方对决。但真正值得关注的变化发生在水面之下。微软、Cloudflare、GitHub、HuggingFace 在同一周内集中发布 Agent 基础设施框架，标志着行业重心正从"更强的模型"转向"更可靠的 Agent 系统"。与此形成尖锐对照的是，五篇安全研究论文从几何、结构、模态三个维度共同揭示了当前 LLM 安全对齐的根本性脆弱。在 Agent 即将大规模部署的节点上，这一矛盾格外刺眼。

文章详情

周报

2026-W15 (April 5-11) marked a cognitive shift in AI engineering: the orchestration infrastructure built around models — what the industry now calls the "harness" — moved from backstage to center stage. OpenAI disclosed a million-line zero-human-code experiment. Meta built a code pre-computation engine with 50+ agents. A Claude Code source leak exposed the sophistication of this architecture. All three point to the same conclusion: the 2026 AI engineering race is no longer about models — it is about everything around them.

文章详情

周报

If one word captures this week in AI, it's "engineering." Coding agents had a collective awakening. Internal architectures got laid bare, engineering methodology got codified, toolchains proliferated, and model-layer catch-up intensified. Coding agents have officially entered the era of systematic engineering discipline. Meanwhile, agent memory discourse — sparked by Karpathy's personal Wiki experiment — rippled through academia and the open-source community, making "how should agents persist knowledge" the week's most debated question.

文章详情

周报

技术趋势

Week 13 of 2026 (March 22–28) surfaced three parallel but interconnected narratives in AI. The first is a concentrated burst of multi-agent orchestration tooling. Cline Kanban, Scion, DeerFlow 2.0, and several others all shipped in the same week, marking an industry-wide pivot from "single-agent capability" to "engineering multi-agent collaboration."

Two technical threads dominate Week 11 of 2026 (March 8–14) in recommendation system research. First, generative recommendation (GR) is undergoing full-stack optimization — transitioning from "making it work" to "making it work well, fast, and fairly" — Netflix/Meta's exponential reward-weighted SFT addresses post-training alignment, LinkedIn's causal attention reformulation halves sequence length, Kuaishou's FP8 quantization reduces OneRec-V2 inference latency by 49%, and Alibaba's differentiable geometric indexing eliminates long-tail bias at its root. Five papers advance GR's industrial maturity across five dimensions. Second, LLM-based recommendation is shifting from "single-pass inference" toward an agentic paradigm — Meta's VRec inserts verification steps into reasoning chains, Meituan's RecPilot replaces traditional recommendation lists with a multi-agent framework, USTC's TriRec introduces tri-party coordination for the first time, and RUC/JD's RecThinker enables autonomous tool invocation.

All revisions applied. Here's a summary of changes:

本周共收录 23 篇推荐系统相关论文，其中 5 分论文 5 篇，4 分 10 篇，3 分 8 篇，整体质量出色。Generative Recommendation（生成式推荐）是本周最显著的技术主线，6 篇论文直接聚焦于此，涵盖 Semantic ID 编码、受限解码优化、广告场景部署和多任务统一框架。另一条主线是 LLM 与推荐系统的融合范式——"LLM-as-Rec"（LLM 作为推荐骨干）与"LLM-for-Rec"（LLM 辅助推荐）两条路径本周都有重要进展。工业部署论文占比极高（6 篇含 Online A/B 测试），来自 AliExpress、快手、Apple App Store 等一线平台。

文章详情