AI Tech Daily - 2026-06-03 | Recsys Frontier

type

Post

status

Published

date

Jun 3, 2026 04:30

slug

ai-daily-en-2026-06-03

summary

📊 Today's Overview

AI hit a major inflection point today: Microsoft released MAI-Thinking-1, its first self-trained reasoning model, alongside 6 other models and the Agent Control Specification open standard, amounting to a full-stack AI strategy rollout. GitHub's COO revealed that AI agents have driven a 1,400% surge in code commits, as the company launched the Copilot app as an agent-native desktop control center. On the open-source front, MiniMax detailed how M3's sparse attention cuts attention to 5% of inference time at 1M context, and Step 3.7 Flash shrank its KV cache to just 22% of DeepSeek's. Anthropic expanded Project Glasswing to 15 countries, and the project has found 10,000+ high- and critical-severity vulnerabilities. AI is reshaping both code creation and security defense at industrial scale.

🔥 Trend Insights

Agent-native development platforms arrive: the GitHub Copilot app and OpenAI Codex (5M+ weekly active users) mark the shift from code completion to full agent orchestration platforms, with non-developer usage growing 3x faster than developer usage.

Reasoning model race intensifies: Microsoft's MAI-Thinking-1 (an MoE with 35B active parameters, 97% on AIME 2025) arrives alongside MiniMax M3 and Step 3.7 Flash, which push sparse attention and KV cache compression. The compute efficiency battle is the new frontier.

AI security goes industrial: Anthropic's expansion of Project Glasswing to 15 countries, with 10,000+ vulnerabilities found so far, alongside the Meta AI account takeover incident, shows AI's dual role as both defender and attack vector.

🐦 X/Twitter Highlights

📈 Hot Topics & Trends

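Google DeepMind releases Co-Scientist, a Gemini-based multi-agent system – the system automatically generates, debates, and iterates on scientific hypotheses to help researchers with complex scientific exploration @GoogleDeepMind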

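RPCS3 (a PS3 emulator) publicly accuses Tencent's crawler of DDoSing it and is blocking Tencent IPs – RPCS3 says it received over 3 million requests from Tencent in the past 24 hours; the crawler can solve Cloudflare challenges and ignores robots.txt, and RPCS3 says it is used to train Tencent's chatbot @rpcs3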

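Pompliano interviews investor Andrew Kang on humanoid robot investing – Kang (who invested $19M in Figure AI) explains his rationale for moving from crypto to robotics and is launching $BOT, a publicly traded fund focused on leading private humanoid robot companies @APompliano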

🔧 Tools & Products

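OpenAI launches Codex Sites and expands its plugin ecosystem – Sites turns an idea into a live, directly accessible website or app in one click and is available on Business/Enterprise plans; plugins now cover 62 apps and 110 skills across sales, data analysis, creative production, product design, and investing @OpenAI @OpenAI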

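Perplexity CEO Aravind Srinivas announces hybrid local + cloud inference for Computer – private data stays on the local device, while complex tasks switch seamlessly to frontier models on the server; coming soon to Windows laptops @AravSrinivas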
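Perplexity has not described how Computer decides where a task runs. As a minimal sketch, assuming a hypothetical per-task privacy flag and complexity score (neither comes from the announcement), a hybrid router that keeps private data local and sends hard tasks to the server might look like this:

```python
from dataclasses import dataclass

@dataclass
class Task:
    prompt: str
    has_private_data: bool  # hypothetical flag, e.g. set by an on-device classifier
    complexity: float       # hypothetical difficulty score in [0, 1]

def route(task: Task, complexity_threshold: float = 0.7) -> str:
    """Return "local" or "cloud" under an illustrative hybrid-inference policy."""
    if task.has_private_data:
        # Privacy wins: tasks touching private data never leave the device.
        return "local"
    if task.complexity >= complexity_threshold:
        # Hard, non-private tasks go to a larger server-side frontier model.
        return "cloud"
    # Everything else stays on the cheaper, lower-latency local model.
    return "local"

print(route(Task("summarize my medical notes", True, 0.9)))     # local
print(route(Task("plan a 10-city research trip", False, 0.9)))  # cloud
```

Ordering the privacy check first is the key design choice: a task that is both private and hard stays local, trading answer quality for the guarantee that private data never leaves the device.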

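Unsloth AI (quantization-based training optimization) teams up with NVIDIA and Microsoft to train 120B+ parameter models on a 128GB laptop – built on RTX Spark's unified memory architecture, bringing large-scale model training to personal hardware @UnslothAI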
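A back-of-the-envelope check shows why quantization matters at this scale. The sketch counts weight memory only (no LoRA adapters, optimizer state, or activations), and the 4-bit row is an assumption, since the item does not say which precision is used:

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

N_PARAMS = 120e9  # the 120B figure from the announcement
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_memory_gb(N_PARAMS, bits):.0f} GB")
# 16-bit weights: 240 GB
#  8-bit weights: 120 GB
#  4-bit weights: 60 GB
```

Only the 4-bit case leaves meaningful headroom in 128GB of unified memory for adapters, activations, and the OS; 8-bit weights alone would nearly fill it.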

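vLLM (UC Berkeley's open-source inference engine) adds native support for JetBrains Mellum2 and MiniCPM-o 4.5 – Mellum2 (12B MoE with 2.5B active parameters, 128K context) is designed for routing/RAG/sub-agents; MiniCPM-o 4.5 (9B omni-modal model with text/image/audio/video input and text/speech output) has been integrated into vLLM-Omni @vllm_project @vllm_project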

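Vercel Conductor's parallel coding agents can now run in remote Sandboxes – previously limited to local execution, they now run remotely on Vercel infrastructure, with very fast Sandbox startup @vercel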

⚙️ Technical Practice

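Microsoft releases 7 frontier models including MAI-Thinking-1, with SGLang powering its RL inference stack – Mustafa Suleyman (CEO of Microsoft AI) announced an MoE with 35B active parameters and 256K context, scoring 97% on AIME 2025 and 53% on SWE-Bench Pro; on Microsoft's in-house MAIA 200 chip it delivers 30% higher performance per dollar than GB200 and 1.4x the performance per watt. Also released: MAI-Image-2.5 and MAI-Code-1-Flash (5B parameters, 51% on SWE). elie (community analyst) broke down the technical report: the model uses no synthetic data or distillation, and its reasoning, agentic behavior, and tool use are all learned through post-training RL. LMSYS revealed that SGLang handles load balancing and fault recovery for RL inference workloads across thousands of chips. Microsoft also offers Frontier Tuning, which lets enterprises fine-tune the models on their own data @mustafasuleyman @eliebakouch @lmsysorg @satyanadella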
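Ratio claims like these are easy to misread. Taking the vendor-reported figures at face value, "30% higher performance per dollar" does not mean 30% lower cost; the cost of a fixed amount of work falls to 1/1.3 of the baseline. A quick check:

```python
def relative_cost(ratio_vs_baseline: float) -> float:
    """Cost (or energy) per unit of work relative to a baseline.

    If a chip delivers `ratio_vs_baseline` times the performance per dollar
    (or per watt), a fixed amount of work costs 1 / ratio as much.
    """
    return 1.0 / ratio_vs_baseline

cost = relative_cost(1.30)    # perf/$ claim vs GB200
energy = relative_cost(1.40)  # perf/W claim vs GB200
print(f"cost per unit of work:   {cost:.3f} (about {1 - cost:.0%} lower)")
print(f"energy per unit of work: {energy:.3f} (about {1 - energy:.0%} lower)")
```

So the claims correspond to roughly 23% lower cost and 29% lower energy for the same work, not 30% and 40%.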

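MiniMax M3 technical details: MSA sparse attention cuts attention to 5% of inference time and supports 1M context – MSA (MiniMax Sparse Attention) performs block-level top-K selection over real, uncompressed KV blocks instead of relying on conventional compression schemes; M3 is natively multimodal (image + video) and can self-evaluate its visual coding (after building a website, it browses the rendered output on its own and iterates). Together AI detailed what production inference requires: paged decode, index scoring, and multimodal preprocessing @MiniMax_AI @MiniMax_AI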
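The source gives only the outline of MSA: score KV blocks, keep the top-K, and attend exactly over the uncompressed keys and values in those blocks. A minimal single-query sketch of that idea in pure Python follows; the block-scoring rule (the query dotted with each block's mean key) is an assumption for illustration, not MiniMax's actual indexer:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def block_topk_attention(q, keys, values, block_size, top_k):
    """Single-query attention restricted to the top_k highest-scoring KV blocks.

    Block score = q . (mean key of the block), an illustrative assumption.
    Returns the attention output and the token positions actually attended.
    """
    n, dim = len(keys), len(q)
    blocks = [range(s, min(s + block_size, n)) for s in range(0, n, block_size)]

    # 1) Cheap index scoring: one score per block instead of one per token.
    def block_score(idx):
        mean_k = [sum(keys[i][d] for i in idx) / len(idx) for d in range(dim)]
        return dot(q, mean_k)

    ranked = sorted(range(len(blocks)), key=lambda b: block_score(blocks[b]), reverse=True)
    chosen = sorted(ranked[:top_k])  # keep chosen blocks in positional order

    # 2) Exact softmax attention over the real, uncompressed keys/values.
    positions = [i for b in chosen for i in blocks[b]]
    scale = 1.0 / math.sqrt(dim)
    logits = [dot(q, keys[i]) * scale for i in positions]
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(x - m) for x in logits]
    z = sum(weights)
    out = [sum(w * values[i][d] for w, i in zip(weights, positions)) / z
           for d in range(len(values[0]))]
    return out, positions

q = [1.0, 0.0]
keys = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0]]
values = [[0.0], [0.0], [1.0], [1.0]]
out, pos = block_topk_attention(q, keys, values, block_size=2, top_k=1)
print(pos, out)  # [2, 3] [1.0]
```

Per query, the exact-attention step touches only top_k × block_size tokens plus n / block_size block scores, which is how block-sparse schemes keep long-context attention cheap without compressing the cached keys and values themselves.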

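Step 3.7 Flash (198B MoE) uses an MFA + AFD architecture, with a KV cache only 22% the size of DeepSeek's – Multi-Matrix Factorization Attention (MFA) is what compresses the KV cache to 22%; Attention-FFN Disaggregation (AFD) decouples attention from the FFN to improve hardware utilization. FireworksAI offers one-click deployment; Apache 2.0 license @StepFun_ai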
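The 22% figure translates directly into context capacity: at a fixed memory budget, a cache 22% the size holds about 1/0.22 ≈ 4.5x as many tokens. The sketch below uses the standard per-token K+V cache formula with a made-up baseline configuration (not Step 3.7 Flash's or DeepSeek's real one) purely to show the arithmetic; MFA itself is not modeled:

```python
def kv_bytes_per_token(num_layers, num_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes of K+V cached per token for standard per-head attention (BF16 default)."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem  # 2 = K and V

def max_cached_tokens(budget_bytes, bytes_per_token):
    """How many tokens of KV cache fit in a fixed memory budget."""
    return budget_bytes // bytes_per_token

# Hypothetical baseline config, chosen only for illustration.
baseline = kv_bytes_per_token(num_layers=60, num_kv_heads=8, head_dim=128)
compressed = int(baseline * 0.22)  # a cache 22% the size of the baseline
budget = 40 * 2**30                # hypothetical 40 GiB reserved for the KV cache

print(baseline, max_cached_tokens(budget, baseline))  # 245760 174762
print(compressed, max_cached_tokens(budget, compressed))
```

The extra capacity can go to longer contexts, larger decode batches, or both, which is why KV cache size is such a direct lever on serving cost.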

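NVIDIA officially releases Cosmos 3, an open world model unifying multimodal understanding, generation, and robot policy – Cosmos 3 jointly understands and generates language, images, video, audio, and actions, and can predict future frames and generate robot policies. It ranks first among open models on multiple benchmarks; weights and code are available on Hugging Face @NVIDIARobotics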

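Intel AutoRound W4A16 quantization is integrated into vLLM-Omni, cutting Qwen3-Omni-30B memory from 66GB to 25GB – after a one-time offline 4-bit quantization, inference runs with the same command as BF16; FLUX.1-dev shrinks from 4 GPUs to 1; CFG Parallel delivers a 1.55–1.67x speedup on Intel XPU B60 @vllm_project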
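W4A16 means weights are stored as 4-bit integers while activations stay in 16-bit, so weights are dequantized on the fly before each matmul. AutoRound itself tunes the rounding decisions with a learned optimization step; the minimal sketch below swaps in plain round-to-nearest with per-group scales and zero points to show only the storage format. The group size of 128 is a common choice, assumed here rather than taken from the source:

```python
def quantize_group_w4(group):
    """Asymmetric 4-bit round-to-nearest quantization of one weight group."""
    # Include 0 in the range so the integer zero point lands in [0, 15].
    lo, hi = min(min(group), 0.0), max(max(group), 0.0)
    scale = (hi - lo) / 15 or 1.0  # 16 levels (0..15); guard an all-zero group
    zero = round(-lo / scale)
    q = [min(15, max(0, round(w / scale) + zero)) for w in group]
    return q, scale, zero

def dequantize_group(q, scale, zero):
    """Recover approximate weights; in W4A16 these feed a 16-bit matmul."""
    return [(v - zero) * scale for v in q]

def quantize_row_w4(row, group_size=128):
    """Quantize a weight row group by group; each group gets its own scale/zero."""
    return [quantize_group_w4(row[i:i + group_size]) for i in range(0, len(row), group_size)]

row = [-1.0, -0.5, 0.0, 0.5, 1.0, 0.25]
for q, scale, zero in quantize_row_w4(row, group_size=3):
    print(q, [round(w, 3) for w in dequantize_group(q, scale, zero)])
```

Each weight costs 4 bits plus a small per-group share of the scale and zero point, which is roughly why a 16-bit model's weight memory drops to a bit over a quarter of its original size.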

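Pinecone (vector database company) reports that its internal data agent AskData has answered 3,690 questions while cutting token consumption by 92% – built by staff data engineer Simon Lu, it uses 92% fewer tokens than feeding raw sources directly to Claude/Cursor, and a further 38% fewer than an earlier custom implementation @pinecone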
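The two percentages also pin down how efficient the earlier custom implementation already was. Assuming both reductions are measured on the same workload (the post does not say), AskData uses 8% of the raw-source token count and 62% of the earlier implementation's, so:

```python
ASKDATA_VS_RAW = 1 - 0.92       # 92% fewer tokens than feeding raw sources
ASKDATA_VS_PREVIOUS = 1 - 0.38  # 38% fewer tokens than the earlier custom agent

# Implied token usage of the earlier implementation, relative to raw sources.
previous_vs_raw = ASKDATA_VS_RAW / ASKDATA_VS_PREVIOUS
print(f"earlier implementation used about {previous_vs_raw:.1%} of raw-source tokens")
```

In other words, the earlier version had already cut tokens by roughly 87%, and AskData's gain on top of it is the remaining 38%.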

⭐ Featured Content

GitHub Copilot app launches: an agent-native desktop control center ｜ A new paradigm for multi-agent parallel development

GitHub released the Copilot app, an agent-native desktop control center that addresses context fragmentation and code review burden in multi-agent parallel development. Key features: a My Work unified view for managing multiple agent sessions, Canvas (a bidirectional work panel for visual editing), Agent Merge for automated PR review and merging, plus local and cloud sandboxes. This marks the evolution of agentic IDEs from code completion to full development platforms, which is directly valuable for developers using coding agents.

Sources: GitHub Blog

In-depth interview with GitHub's COO: AI Agents drove 1,400% code commit growth ｜ Challenges and responses for platform-level Agent ecosystems

GitHub COO Kyle Daigle revealed on the Latent Space podcast that AI Agents drove 1,400% code commit growth on GitHub, straining infrastructure and overwhelming open-source maintainers with floods of AI-generated code. He shared GitHub's internal AI workflows (micro-skills, WorkIQ, MCP), the evolution of Actions into a general compute layer, and how to preserve open source's social contract. It complements the Copilot app launch and helps practitioners understand the full landscape of platform-level Agent ecosystems.

Sources: Latent Space

OpenAI Codex expands into a productivity tool for knowledge workers ｜ 5M+ weekly active users, with the non-developer share surging

OpenAI reported Codex has 5M+ weekly active users (6x growth since February), with knowledge workers making up ~20% and growing 3x faster than developers. Knowledge workers primarily use Codex for creating reports, spreadsheets, presentations, plus data analysis, research, and workflow automation. This signals AI coding tools evolving from developer-only to general productivity platforms, potentially reshaping knowledge work efficiency — essential reading for those tracking LLM productization and market dynamics.

Sources: OpenAI

Microsoft releases MAI-Thinking-1 reasoning model and Agent Control Specification open standard ｜ Microsoft AI strategy accelerates across the board

Microsoft released its first self-trained reasoning model, MAI-Thinking-1 (claimed to be trained from scratch without distillation), as part of a batch of 7 AI models (including ultra-efficient code models), alongside the Agent Control Specification open standard (unified Agent governance) and Scout Agent (a 24/7 automated assistant in Teams). Also notable: the Majorana 2 quantum chip (AI-assisted design, targeting commercialization in 2029) and new Perplexity Computer features (splitting tasks between device and server models). This is a concentrated showcase of Microsoft's AI strategy that practitioners tracking shifts in the industry landscape should catch up on quickly.

Sources: llm-stats.com

Anthropic expands Project Glasswing: covers 15 countries' critical infrastructure, finds 10,000+ high-severity vulnerabilities ｜ Industrial-scale AI security defense

Anthropic expanded Project Glasswing beyond its 50 initial partners to roughly 150 new organizations, covering power, water, healthcare, communications, hardware, and other critical infrastructure across 15 countries. The project has discovered 10,000+ high- or critical-severity vulnerabilities. Anthropic also released the Claude Security product and plans to provide vulnerability scanning tools to security teams. The article also discusses long-term trends in how AI is transforming cybersecurity, which is directly valuable for practitioners focused on AI security defense.

Sources: Anthropic

NVIDIA Jetson brings Agentic AI to the physical world: JetPack 7.2 and NemoClaw released ｜ New infrastructure for edge Agent deployment

NVIDIA released JetPack 7.2 and NemoClaw support for Jetson at COMPUTEX, pushing Agentic AI from servers into the physical world. JetPack 7.2 brings Yocto support, CUDA 13, MIG, and a performance boost that takes AGX Orin to 241 TOPS; NemoClaw enables single-command deployment and pairs with Metropolis VSS skills for visual reasoning Agents. Real-world deployments from Solomon, Advantech, and others are already live. For practitioners focused on edge AI and physical-world Agent deployment, this is a key signal of how the infrastructure is evolving.

Sources: NVIDIA Blog

Holo3.1 released: Cross-environment Computer Use Agent, supports mobile and local inference ｜ Major upgrade for computer-use Agents

Hcompany released Holo3.1, an upgrade to its vision-based computer-use Agent model Holo3, focusing on cross-environment robustness (desktop, browser, mobile) and compatibility across Agent frameworks. Mobile support is new: on AndroidWorld, the 35B-A3B model's score improved from 67% to 79.3%. FP8, Q4 GGUF, and NVFP4 quantized versions are available for the first time, enabling local inference on consumer-grade hardware. Smaller models (0.8B, 4B, 9B) were also released to reduce deployment costs. For practitioners focused on local deployment of Computer Use Agents and mobile automation, this is an important model update.

Sources: Hugging Face

Hackers took over high-profile Instagram accounts through Meta AI support bot ｜ Classic cautionary tale of AI system security integration

Hackers successfully took over high-profile Instagram accounts simply by chatting with Meta's AI support bot. Attackers only needed to ask it to link a new email address, and the AI completed the account recovery process automatically. This incident exposes the severe security risk of connecting AI chatbots directly to sensitive operations like account recovery, a classic cautionary tale for LLM security integration. For practitioners building Agents and AI systems, this is an important warning about security boundary design.
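The design lesson generalizes: the model can propose an action, but a deterministic policy layer outside the model should decide whether a sensitive action runs, using signals the chat itself cannot forge. A minimal sketch, with hypothetical action names and a hypothetical `verified_out_of_band` session flag (nothing here reflects Meta's actual system):

```python
SENSITIVE_ACTIONS = {"link_email", "reset_password", "change_phone"}  # hypothetical

class VerificationRequired(Exception):
    """Raised when a sensitive action lacks out-of-band verification."""

def execute_tool_call(action: str, args: dict, session: dict) -> str:
    """Run an agent-requested action only if the policy layer allows it.

    The check reads server-side session state set by a separate verification
    flow (e.g. a code sent to the account's existing email); nothing the user
    types into the chat can set it.
    """
    if action in SENSITIVE_ACTIONS and not session.get("verified_out_of_band", False):
        raise VerificationRequired(f"{action} requires out-of-band verification")
    # A real system would dispatch to the tool here; this sketch just reports it.
    return f"executed {action} with {sorted(args)}"

try:
    execute_tool_call("link_email", {"email": "attacker@example.com"}, session={})
except VerificationRequired as err:
    print("blocked:", err)
```

Because the gate lives outside the model, no prompt, however persuasive, can talk it into skipping verification; the model only ever sees the refusal.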

Sources: Simon Willison

🎙️ Podcast Picks

GitHub's plan for Agents — Kyle Daigle, GitHub

📍 Source: Latent Space | ⭐⭐⭐⭐⭐ | 🏷️ LLM, Agent, Infra | ⏱️ 1:23:27

GitHub COO Kyle Daigle discusses the infrastructure challenges of the AI Agent era: Agent-driven code commits up 1,400%, pressure on CI/CD, open-source maintenance, and code review. Deep dive into GitHub's internal AI workflows (micro-skills, WorkIQ, MCP, Copilot desktop app, CLI, cloud Agents), and how to integrate AI through existing workflows (Slack, Teams, email). Explores how AI changes developer roles, open-source social contracts, and GitHub's strategic evolution from code hosting to an Agent operations layer.

💡 Why Listen: Heavyweight guest (GitHub COO) with exclusive inside perspective on how AI Agents are reshaping code infrastructure at platform scale. The 1,400% commit surge number alone is worth the listen — this is the ground truth from the company hosting the world's code.

📄 Paper Highlights

OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents

Microsoft ｜ 🏷️ Agent Framework, Multi-Agent, Agentic Workflow, Fine-tuning, RLHF/DPO, Multimodal

First open framework applying online multi-turn RL to visual web agents — trains a 4B model to match OpenAI CUA and Gemini CUA using just 0.4K initialization trajectories and 2.2K RL tasks.

Community-Aware Assessment of Social Textual Engagement and Resonance: A Human-Centric Perspective on User-Generated Content Evaluation

Bilibili ｜ 🏷️ Agent Framework, Reasoning, Fine-tuning, Multimodal, NLP Task

Introduces Social Chain-of-Thought (Social-CoT) for UGC quality assessment — simulates diverse viewer personas to judge community resonance rather than visual fidelity, trained via SFT + process-supervised RL.