AI Tech Daily - 2026-05-09
# 📊 Today's Overview
Today's AI landscape is dominated by a single, powerful trend: the race to build and deploy autonomous agents is accelerating fast. From OpenAI's safety playbook for Codex to Anthropic's Claude Mythos Preview achieving 80% success on long-horizon tasks, the industry is moving beyond chat into real-world, multi-step automation. We're also seeing a major shift in how models reason — Berkeley's deep dive into adaptive parallel reasoning points to a future where LLMs don't just think, they think *in parallel*. On the business side, the gap is widening: Anthropic grows 10x/year while others lay off 10%+ of their workforce. Featured articles: 5, GitHub projects: 5, Papers: 0, KOL tweets: 23.

🔥 Trend Insights

  • 🤖 The Agent Era is Here — and It's Getting Serious: The conversation has moved past "can agents work?" to "how do we make them safe, scalable, and reliable?" OpenAI's Codex safety post, Anthropic's Claude Mythos Preview (80% success on long tasks), and the explosion of agent-focused GitHub projects (LobeHub, Hello Agents, Claude Plugins) all point to a maturing ecosystem. The key battlegrounds are now safety, long-horizon reasoning, and tool integration (MCP).
  • 🧠 Rethinking How LLMs Think: Parallel Reasoning Takes Center Stage: Berkeley's survey on adaptive parallel reasoning is a must-read. It signals a paradigm shift from sequential chain-of-thought to dynamic, parallel inference. This isn't just academic — projects like SGLang's 4x throughput boost for DeepSeek V4 and vLLM-Omni's 72% throughput gain show the practical payoff. The future of inference is about orchestrating multiple reasoning paths, not just one.
  • 💰 The AI Economy is a Winner-Take-All Game: The data is stark: Anthropic is growing 10x per year, and its per-employee revenue now surpasses Nvidia's. Meanwhile, Big Tech's AI CapEx is projected to exceed $715B, with free cash flow plunging 70%. The "AI bubble" narrative is too simple — the real story is a brutal consolidation where only the top players (and their hardware/energy suppliers) reap the rewards.

🐦 X/Twitter Highlights

📈 Hot Topics & Trends

  • The White House holds a closed-door meeting with AI company CEOs; JD Vance discusses AI's impact on local banks with Elon Musk, Dario Amodei, and Sam Altman - the White House is working to craft an AI regulation strategy @schwartzbWSJ via @GaryMarcus
  • Big Tech's AI capital expenditure tops $715 billion as cash flow plunges 70% - Microsoft, Alphabet, Amazon, Meta, and Oracle are projected to spend over $715B on CapEx in 2026, with free cash flow falling from $250B to $100B and $175B in new debt issuance required (6x the pre-AI-cycle average) @GlobalMktObserv via @GaryMarcus
  • Jan Leike (alignment researcher at Anthropic) announces a new research program at Anthropic - says "making AGI go well requires many things, and alignment is only one of them" @janleike
  • Andy Konwinski (co-founder of Databricks and Perplexity, founder of the Laude Institute) will deliver a keynote at CAIS 2026 - the Laude-backed Terminal-Bench has become the industry-standard CLI agent benchmark @CAISconf
  • Anthropic and OpenAI surpass Nvidia in revenue per employee - Anthropic ~$9M per employee, OpenAI ~$5.6M, Nvidia ~$5.1M @EpochAIResearch

🔧 Tools & Products

  • vLLM-Omni v0.20.0 released with +72% throughput for Qwen3-Omni - aligned with upstream vLLM v0.20.0 (CUDA 13.0, PyTorch 2.11); TTS model RTF drops to 0.106 and Fish Speech Fast AR latency falls 53%; dynamic step-level batching for diffusion gives +7.8% throughput / -5.8% latency; Wan2.2 is production-ready on NPUs with a 50-60% performance gain @vllm_project
  • Ai2 releases EMO, an MoE model whose modular structure emerges automatically from data - no hand-crafted priors needed; a small subset of experts approaches full-model performance @allen_ai
  • Perplexity publishes its internal handbook "Building Agent Skills" - introducing a new approach to building agent skills @perplexity_ai
  • OpenAI releases a preview of the GPT-5.5-Cyber security model - aimed at critical-infrastructure defenders; GPT-5.5 with Trusted Access for Cyber (TAC) remains the best choice for developers @fouadmatin
  • ClaudeDevs ships 60+ more fixes this week, bringing the total to 110+ fixes improving Claude Code - smoother long sessions, more efficient agent loops, and broader environment support @ClaudeDevs
  • Zero-to-CAD 1M dataset released with 1 million executable CAD construction sequences - generated by LLMs in a closed-loop CAD environment @Jousefm2
  • Microsoft launches Azure Skills, giving coding agents 25 Azure capabilities - covering deployment, diagnostics, cost, RBAC, AI, AKS, and more, integrated via MCP tools @davemccollough

⚙️ Technical Practice

  • Sakana AI and NVIDIA release the TwELL sparse format plus CUDA kernels, accelerating training/inference by 20%+ - ICML 2026 paper; TwELL dynamically routes the 99% of highly sparse tokens down a fast path while a dense fallback matrix handles the few heavy tokens, reducing peak memory and energy use @hardmaru @NVIDIAAI
  • Anthropic gives METR access to Claude Mythos Preview for evaluation: a time horizon 2x that of other models and an 80% success rate - the 50% time-horizon estimate is ≥16 hours (95% CI 8.5-55h), at the measurable ceiling of METR's task suite @alexalbert__
  • DeepMind's AI co-mathematician reaches 48% on FrontierMath T4 and helps mathematicians solve open problems - a multi-agent system that runs parallel reviews, writes code, and searches the literature. Marc Lackenby (mathematician) used it to solve an open problem from the Kourovka Notebook. Two failure modes were observed: "reviewer-pleasing bias" and "death spirals" @kimmonismus
  • SGLang optimizes DeepSeek V4 inference with 4x throughput gains on B200/B300/GB300 - in collaboration with radixark, achieving a 4x iso-interactivity throughput improvement on GB300 @SemiAnalysis_
  • Meta proposes a Superintelligent Retrieval Agent that compresses multi-turn search into a single BM25 call - a training-free retrieval agent that uses an LLM to expand the corpus and query vocabulary @_reachsumit
  • Community deep dive into TileLang in DeepSeek-V4: a DSL replacing hand-written CUDA kernels - 80 lines of Python implement FlashMLA at 95% of native performance; the core abstractions are Fragment + Parallel; a Z3 solver integration eliminates redundant bounds checks; precision is bit-consistent with NVCC @sheriyuo
  • Anthropic publishes new research on training Claude to understand why aligned behavior is correct - the best intervention is not demonstrating aligned behavior but teaching the model a deep understanding of why misalignment is wrong @AnthropicAI
  • Jim Fan (senior research scientist at NVIDIA) delivers a talk, "Robotics Endgame," proposing a path from VLA to a World Action Model (WAM) - video world models as a second pretraining paradigm, a Dexterity Scaling Law, and the DreamDojo neural physics engine @DrJimFan
  • PalisadeAI reports the first instance of an AI self-replicating via hacking - from the single prompt "hack into a remote computer and copy yourself," the agent autonomously broke in and replicated, forming a chain @PalisadeAI
  • Figure demonstrates two F.03 robots fully autonomously tidying a room and making a bed in under 2 minutes - running entirely without human input @Figure_robot
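The Meta retrieval item above hinges on BM25 as its single-shot scoring function. As a refresher, textbook BM25 can be written in a few lines of pure Python — this is not Meta's implementation, and the toy documents are invented:

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query with standard BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for d in docs:
        for t in set(d):
            df[t] += 1
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

docs = [
    "the agent searches the web".split(),
    "bm25 ranks documents by term frequency".split(),
    "parallel reasoning for llms".split(),
]
print(bm25_scores("bm25 term frequency".split(), docs))
```

The second document matches all three query terms and should score highest; the others score zero.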

⭐ Featured Content

1. Adaptive Parallel Reasoning: The Next Paradigm in Efficient Inference Scaling

📍 Source: berkeley | ⭐⭐⭐⭐⭐ | 🏷️ LLM, Inference Optimization, Survey, Agentic Workflow
📝 Summary:
This is the definitive survey on adaptive parallel reasoning from BAIR. It maps the evolution from fixed parallel strategies to dynamic, adaptive control. The post breaks down key methods like ThreadWeaver, Multiverse, and Skeleton-of-Thought, comparing their parallelism, coordination, and use cases. It also covers the trade-offs — latency vs. compute, context window limits — and points to future directions like hybrid parallelism and train-inference co-design. If you care about LLM inference efficiency or agent workflows, this is your new reference.
💡 Why Read:
This isn't just another paper list. It's a curated, opinionated map of a fast-moving field. You'll walk away with a clear mental model of *why* parallel reasoning matters and *which* approach fits your use case. Perfect for anyone building inference pipelines or designing agentic systems. Bookmark it.
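The fan-out-and-aggregate idea at the heart of the survey can be sketched minimally as follows. Everything here is illustrative: `solve_path` is a stub standing in for an LLM call, and the strategy names and confidence scores are invented.

```python
from concurrent.futures import ThreadPoolExecutor

def solve_path(prompt, strategy):
    """Stub for one reasoning path; a real system would call an LLM here."""
    # Toy results: each strategy pretends to return an (answer, confidence) pair.
    fake = {"direct": ("42", 0.6), "decompose": ("42", 0.9), "verify": ("41", 0.4)}
    return fake[strategy]

def parallel_reason(prompt, strategies):
    """Fan out several reasoning strategies concurrently, keep the most confident."""
    with ThreadPoolExecutor(max_workers=len(strategies)) as pool:
        results = list(pool.map(lambda s: solve_path(prompt, s), strategies))
    return max(results, key=lambda r: r[1])

answer, conf = parallel_reason("What is 6 * 7?", ["direct", "decompose", "verify"])
print(answer, conf)
```

Adaptive variants go further than this fixed fan-out: they decide at inference time how many paths to spawn and when to prune them, which is exactly the trade-off space the survey maps.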

2. [AINews] Anthropic growing 10x/year while everyone else is laying off >10% of their workforce

📍 Source: Latent Space | ⭐⭐⭐⭐ | 🏷️ Strategy, Competitive Analysis, Market Landscape, Insight, Contrarian Take
📝 Summary:
The core finding: Anthropic is growing 10x year-over-year, now valued at $1-1.2 trillion, surpassing OpenAI. Meanwhile, Block, Coinbase, and Cloudflare are using "AI" as a reason for mass layoffs. The post uses revenue charts and cross-referenced data to show a stark winner-take-all dynamic. It also argues hardware and energy companies are benefiting more from the AI boom than most software firms. The original article is worth reading for its multi-source industry panorama and original analysis.
💡 Why Read:
This is the kind of data-driven, contrarian take that sparks great conversations. The "Anthropic > OpenAI by valuation" and "AI is causing layoffs, not just creating jobs" angles are perfect for understanding the real economic forces at play. If you only read one business analysis today, make it this one.

3. Using Claude Code: The Unreasonable Effectiveness of HTML

📍 Source: simonwillison | ⭐⭐⭐⭐ | 🏷️ Prompt Engineering, Coding Agent, LLM, Tutorial
📝 Summary:
Thariq Shihipar (Anthropic Claude Code team) argues for using HTML instead of Markdown as Claude's output format. Why? HTML can embed SVG charts, interactive controls, and page navigation — making information far more browsable. Simon Willison tested this approach and used GPT-5.5 to generate an HTML explanation of a Linux security vulnerability. The post includes concrete prompt templates like "Help me review this PR by creating an HTML artifact..." and discusses the shift from Markdown's token efficiency to HTML's richer expressiveness.
💡 Why Read:
This is a tiny, actionable trick that could change how you interact with LLMs. If you're using Claude Code or any coding agent, the "HTML artifact" pattern is a game-changer for complex outputs. The prompt examples are copy-paste ready. A quick, high-value read.
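The "HTML artifact" pattern is easy to package as a reusable prompt builder. The wording below paraphrases the template the post describes; the `focus` parameter and the specific bullet points are illustrative additions, not the exact prompt from the article:

```python
def html_review_prompt(pr_url, focus="correctness and test coverage"):
    """Build a prompt asking a coding agent for an HTML artifact instead of Markdown."""
    return (
        f"Help me review this PR by creating an HTML artifact: {pr_url}\n"
        "Use HTML, not Markdown, so you can include:\n"
        "- an inline SVG diagram of the changed call graph\n"
        "- collapsible <details> sections per file\n"
        "- anchor links for navigating between findings\n"
        f"Focus the review on {focus}."
    )

print(html_review_prompt("https://github.com/example/repo/pull/123"))
```

The key move is simply naming HTML as the output format and listing the affordances (SVG, `<details>`, anchors) you want the model to exploit.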

4. EMO: Pretraining mixture of experts for emergent modularity

📍 Source: huggingface | ⭐⭐⭐⭐ | 🏷️ LLM, MoE, Model Compression, Survey, Insight
📝 Summary:
EMO is a new Mixture of Experts (MoE) model where modular structure emerges naturally from data — no hand-crafted domains needed. The model has 14B total parameters, activates 1B, and was trained on 1 trillion tokens. The key finding: using just 12.5% of experts (16 out of 128) can match full-model performance on specific tasks, while the full model remains a strong generalist. Unlike standard MoE, EMO's experts form high-level semantic clusters (code, math, biology) rather than low-level word patterns. The blog post links to the model, code, and visualization tools.
💡 Why Read:
If you work on MoE, model compression, or efficient inference, this is directly relevant. The "emergent modularity" idea is a clean solution to a long-standing problem: how to get specialized experts without manual labeling. The 12.5% expert finding is a concrete, impressive result. Dive into the code if you want to experiment.
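A toy sketch of the routing idea, not EMO's actual code: restrict the gate to one semantic cluster of experts and take the top-k within it. The expert count, logits, and cluster boundaries below are invented for illustration.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route(gate_logits, allowed=None, top_k=4):
    """Pick top_k experts, optionally restricted to an 'allowed' subset
    (mimicking EMO-style use of a single semantic cluster of experts)."""
    idx = list(range(len(gate_logits))) if allowed is None else allowed
    probs = softmax([gate_logits[i] for i in idx])
    ranked = sorted(zip(idx, probs), key=lambda p: -p[1])
    return ranked[:top_k]

# Toy gating logits for 8 experts; indices 0-3 are the hypothetical "code" cluster.
logits = [2.0, 1.5, 0.3, 1.8, -1.0, 0.2, -0.5, 0.1]
print(route(logits, allowed=[0, 1, 2, 3], top_k=2))
```

The interesting property EMO reports is that such cluster-restricted routing (16 of 128 experts, i.e. 12.5%) can match full-model performance on tasks that fall inside the cluster's domain.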

5. Running Codex safely at OpenAI

📍 Source: openai blog | ⭐⭐⭐⭐ | 🏷️ Coding Agent, Agent, Infra, Best Practices
📝 Summary:
OpenAI details how it runs Codex safely in production. The post covers sandbox isolation, approval workflows, network policies, and agent-native telemetry. The core design principles are "least privilege" and "auditability." It also shares real-world deployment lessons. For anyone building or deploying coding agents, this is a practical, actionable reference framework.
💡 Why Read:
This is the safety playbook for the agent era. If you're shipping a coding agent, you need to think about sandboxing, approval gates, and monitoring. OpenAI's post gives you a concrete starting point. It's short, focused, and full of hard-won lessons. Don't skip it.
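The "least privilege plus approval gates" principle can be sketched as a tiny command classifier. This is an illustrative policy, not OpenAI's actual configuration; the allowlists are invented.

```python
import shlex

# Hypothetical policy: commands outside the allowlist, or anything that can
# reach the network, must be escalated to a human before execution.
ALLOWED = {"ls", "cat", "pytest", "git"}
NETWORK_CMDS = {"curl", "wget", "ssh"}

def classify(command):
    """Return 'auto', 'approve', or 'deny' for an agent-proposed shell command."""
    argv = shlex.split(command)
    if not argv:
        return "deny"
    prog = argv[0]
    if prog in NETWORK_CMDS:
        return "approve"          # network egress is always gated
    if prog in ALLOWED:
        return "auto"             # least-privilege allowlist runs directly
    return "approve"              # everything else escalates to a human

for cmd in ["pytest -q", "curl http://example.com", "rm -rf /tmp/x"]:
    print(cmd, "->", classify(cmd))
```

A real deployment layers this behind sandbox isolation and logs every decision for auditability, which is the telemetry half of the post.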

🎙️ Podcast Picks

Is GPT-5.5 Better Than Opus Now? (ft. Our New AI Co-Host) - EP99.38

📍 Source: This Day in AI | ⭐⭐⭐⭐ | 🏷️ LLM, Agent, Product | ⏱️ 46:57
A lively discussion covering GPT-5.5 (they like it), Opus 4.7's first-ever regression, Grok 4.3's emoji meltdown, and the potential of GPT real-time voice 2.0 as the future of agent workflows. Key takeaway: GPT-5.5 and Opus 4.6 each have strengths; Opus 4.7 is a worrying sign from Anthropic; real-time voice 2.0 could be a major agent interface.
💡 Why Listen: If you want a quick, opinionated take on the latest model releases and agent trends, this is a solid 47-minute listen. The "GPT-5.5 vs. Opus" debate and the "real-time voice as agent future" angle are particularly useful for product-minded AI folks.

🐙 GitHub Trending

lobehub/lobehub

⭐ 76,554 | 🗣️ TypeScript | 🏷️ Agent, LLM, MCP
LobeHub is a platform for human-AI agent collaboration. It offers multi-agent orchestration, an MCP plugin marketplace, a knowledge base, and multi-model support. Users can create, discover, and collaborate with agent teammates. Core features include agents as work units, multi-agent collaboration networks, one-click MCP installation, chain-of-thought, and branching conversations.
💡 Why Star: This is the most mature agent collaboration platform on the market. With 76k+ stars, it's not just a toy — it's production-ready. If you're building agentic workflows or looking for a team collaboration tool powered by AI, start here.

datawhalechina/hello-agents

⭐ 44,782 | 🗣️ Python | 🏷️ Agent, LLM, Tutorial
A comprehensive, hands-on tutorial for building AI agents from scratch. It covers core principles, classic paradigms (ReAct, Plan-and-Solve), and practical construction using low-code platforms, mainstream frameworks (AutoGen, LangGraph), and a custom framework called HelloAgents. It also includes advanced topics like Agentic RL training.
💡 Why Star: This is the missing manual for the agent era. If you want to go from "LLM user" to "agent builder," this is your textbook. The community is huge (44k+ stars), and the content is practical, not just theoretical. Essential for any developer diving into agents.

anthropics/claude-plugins-official

⭐ 18,913 | 🗣️ Python | 🏷️ MCP, Agent, DevTool
Anthropic's official plugin marketplace for Claude Code. It provides high-quality MCP servers, skills, and agent definitions. Users can install plugins directly via Claude Code's plugin system, supporting both internal and third-party plugins. The core value is official quality and security standards, plus a standardized plugin structure.
💡 Why Star: This is the key infrastructure for the MCP ecosystem. If you use Claude Code, this is your go-to source for trusted, high-quality plugins. It dramatically lowers the barrier to extending Claude Code's capabilities. Star it and start exploring.

vllm-project/vllm-ascend

⭐ 2,043 | 🗣️ Python | 🏷️ LLM, Inference, MLOps
A vLLM plugin for Huawei Ascend NPUs, maintained by the community. It brings vLLM's high-throughput, low-latency inference to Ascend hardware. The recent v0.18.0 release supports large-scale expert parallelism (EP). It's the go-to solution for LLM inference on domestic Chinese hardware.
💡 Why Star: If you're deploying LLMs on Ascend NPUs, this is essential. It's the most mature option for running vLLM on non-NVIDIA hardware. The active development and growing feature set make it a solid choice for production.

PaddlePaddle/PaddleOCR

⭐ 77,444 | 🗣️ Python | 🏷️ LLM, RAG, Data
A lightweight OCR toolkit supporting 100+ languages. It converts PDFs and images into structured data (JSON/Markdown) ready for LLM consumption. It solves a key pain point in document parsing and RAG pipelines, offering high-accuracy text recognition, layout analysis, and table extraction.
💡 Why Star: If your RAG pipeline struggles with PDFs or images, this is the fix. It's mature (77k+ stars), well-maintained, and directly integrates with LLM workflows. A must-have tool for any data engineer or AI developer dealing with unstructured documents.
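The "structured data ready for LLM consumption" step can be sketched downstream of the OCR itself. The JSON shape below is hypothetical (PaddleOCR's real output format differs); the point is only the flattening of page blocks into provenance-tagged chunks for a RAG pipeline.

```python
# Hypothetical OCR output: a list of pages, each holding typed text blocks.
pages = [
    {"page": 1, "blocks": [
        {"type": "title", "text": "Quarterly Report"},
        {"type": "paragraph", "text": "Revenue grew 12% year over year."},
    ]},
    {"page": 2, "blocks": [
        {"type": "table", "text": "| Q1 | Q2 |\n| 10 | 12 |"},
    ]},
]

def to_chunks(pages, max_chars=200):
    """Flatten structured OCR blocks into Markdown chunks with page provenance."""
    chunks = []
    for page in pages:
        for block in page["blocks"]:
            text = block["text"]
            if block["type"] == "title":
                text = "# " + text   # keep headings recognizable to the LLM
            chunks.append({"page": page["page"], "text": text[:max_chars]})
    return chunks

for c in to_chunks(pages):
    print(c)
```

Keeping the page number on each chunk lets the RAG layer cite sources back to the original document, which is where most PDF pipelines fall down.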