🗓️ Weekly Notes
Personal Reflection
This week marked a clear evolution in AI agents: from isolated tools to coordinated systems that persist across sessions, spawn sub-agent swarms, and merge vision with code execution. As the industry matures toward IPOs and trillion-dollar infrastructure bets, we're seeing the practical convergence of multimodal capabilities with production-ready tooling.

🧠 Main
- AGENTS.md outperforms skills in our agent evals – Vercel's research found that embedding a compressed 8KB docs index directly in AGENTS.md achieved a 100% pass rate on Next.js 16 API evals, outperforming skills-based approaches, which topped out at 79%. Passive context proved more reliable than active retrieval for teaching AI coding agents framework-specific knowledge.
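The "compressed docs index" idea can be sketched as a small script that collects headings from a docs tree into a compact block you paste into AGENTS.md. This is an illustrative sketch of the general approach, not Vercel's actual tooling; the file layout and byte budget are assumptions.

```python
from pathlib import Path

def build_docs_index(docs_dir: str, max_bytes: int = 8 * 1024) -> str:
    """Collect markdown headings into a compact index suitable for AGENTS.md."""
    lines = []
    for path in sorted(Path(docs_dir).rglob("*.md")):
        rel = path.relative_to(docs_dir)
        # Keep only heading lines; they summarize each page cheaply.
        headings = [l.lstrip("# ").strip()
                    for l in path.read_text().splitlines()
                    if l.startswith("#")]
        if headings:
            lines.append(f"{rel}: " + "; ".join(headings))
    index = "\n".join(lines)
    return index[:max_bytes]  # stay within the ~8KB context budget
```

The point of the finding is that this index rides along passively in every prompt, so the agent never has to decide to retrieve it.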
- Claude Code's 'Tasks' update lets agents work longer and coordinate across sessions – Anthropic introduced persistent "Tasks" in Claude Code, enabling state-aware project management through dependency graphs, filesystem persistence, and cross-session coordination. This architectural shift allows Claude to execute sophisticated workflows and maintain project state across crashes or terminal closures.
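The combination of a dependency graph and filesystem persistence can be sketched in a few lines: tasks carry their dependencies, state is written to disk on every change, and "ready" tasks are those whose dependencies are complete. The class and JSON format here are illustrative, not Anthropic's actual implementation.

```python
import json
from pathlib import Path

class TaskStore:
    """Minimal sketch: tasks with dependencies, persisted to disk so state
    survives a crash or a closed terminal (names and format are assumed)."""

    def __init__(self, path: str):
        self.path = Path(path)
        self.tasks = json.loads(self.path.read_text()) if self.path.exists() else {}

    def add(self, name: str, deps=()):
        self.tasks[name] = {"deps": list(deps), "done": False}
        self._save()

    def complete(self, name: str):
        self.tasks[name]["done"] = True
        self._save()

    def ready(self):
        """Tasks whose dependencies have all finished."""
        return [n for n, t in self.tasks.items()
                if not t["done"] and all(self.tasks[d]["done"] for d in t["deps"])]

    def _save(self):
        # Persist after every mutation so a crash loses nothing.
        self.path.write_text(json.dumps(self.tasks, indent=2))
```

Because state lives on disk rather than in the process, a fresh session can reload the store and pick up exactly where the last one stopped.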
- Introducing Agentic Vision in Gemini 3 Flash – Google introduced Agentic Vision in Gemini 3 Flash, which treats vision as an active investigation: a Think-Act-Observe loop in which the model generates Python code to manipulate and analyze images step by step. Enabling code execution delivers a consistent 5-10% quality boost across most vision benchmarks.
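The shape of a Think-Act-Observe loop can be sketched abstractly: the model proposes a small action over the image, the runtime executes it, and the observation is fed back for the next step. Here `model_step` is a stand-in for the real model, and the "image" is a toy grid; the real system generates and runs actual Python over real images.

```python
def think_act_observe(image, model_step, max_steps=5):
    """Sketch of an agentic-vision loop. `model_step` returns a (thought,
    action) pair; the action is executed and its result observed."""
    observations = []
    for _ in range(max_steps):
        thought, action = model_step(image, observations)  # think
        result = action(image)                             # act
        observations.append((thought, result))             # observe
        if thought == "answer":
            return result
    return observations[-1][1]
```

The 5-10% gain cited above comes from letting the model ground each reasoning step in an executed operation rather than a single glance at the pixels.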
- Introducing Helix 02 Full-Body Autonomy – Figure AI released Helix 02, the first neural network controlling a humanoid robot's full body from pixels, enabling a continuous 4-minute autonomous dishwasher loading/unloading task, the longest-horizon and most complex autonomous humanoid task to date. The system combines whole-body control with a visuomotor policy and scene understanding.
- Kimi K2.5 Visual Agentic Intelligence – Moonshot AI released Kimi K2.5, featuring state-of-the-art coding and vision capabilities plus a self-directed agent-swarm paradigm that can create up to 100 sub-agents executing parallel workflows across 1,500 tool calls, reducing execution time by up to 4.5x. The model excels at coding with vision and real-world software engineering tasks.
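The fan-out pattern behind a sub-agent swarm is straightforward to sketch: a lead agent splits the job into shards and runs sub-agents concurrently, merging their results. Here `sub_agent` is a stand-in for a real agent call (model plus tools); the speedup comes from the parallelism, capped by the slowest shard.

```python
from concurrent.futures import ThreadPoolExecutor

def run_swarm(task_shards, sub_agent, max_agents=100):
    """Sketch of swarm fan-out: run up to `max_agents` sub-agents in
    parallel over the shards and return results in shard order."""
    workers = min(max_agents, len(task_shards))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map preserves input order, so merging is trivial.
        return list(pool.map(sub_agent, task_shards))
```

In a real system each `sub_agent` call would itself make many tool calls, which is how a single run fans out into the ~1,500 calls described above.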
- OpenAI Plans Fourth-Quarter IPO in Race to Beat Anthropic to Market – OpenAI is laying groundwork for a Q4 2026 IPO, accelerating plans as competition with Anthropic intensifies, with both companies racing to be the first major generative-AI startup to go public. OpenAI faces challenges including leadership changes and fierce competition from Google.
- SoftBank in Talks to Invest Up to $30 Billion More in OpenAI – SoftBank Group is in talks to invest up to $30 billion more in OpenAI as part of the startup's efforts to raise up to $100 billion in new capital. The potential deal could value OpenAI at as much as $830 billion.
- Sam Altman Says OpenAI Is Slashing Its Hiring Pace as Financial Crunch Tightens – OpenAI CEO Sam Altman announced the company will "dramatically slow down" hiring as it continues to lose billions quarterly. The move comes amid concerns about OpenAI's cash burn rate despite plans to spend over $1 trillion on data center infrastructure.
- Amazon laying off about 16,000 corporate workers in latest anti-bureaucracy push – Amazon is laying off about 16,000 corporate workers in its second round of mass job cuts since October, bringing total layoffs to 30,000. The company cites ongoing efforts to reduce bureaucracy while investing heavily in AI, with capital expenditures expected to reach $125 billion in 2026.
- Amazon to Shut Down All Amazon Go and Amazon Fresh Stores – Amazon is closing all 57 Amazon Fresh stores and 15 Amazon Go locations, citing the formats' failure to deliver an economically scalable model. Some locations will be converted to Whole Foods stores, and Amazon plans to focus on same-day delivery while expanding Whole Foods with over 100 new stores.
- Apple will reportedly unveil its Gemini-powered Siri assistant in February – Apple plans to announce a new Gemini-powered version of Siri in February that can complete tasks by accessing user data and on-screen content. An even bigger upgrade is expected at WWDC in June, with a more conversational Siri running on Google's cloud infrastructure.
- Tesla kills Autopilot, locks lane-keeping behind $99/month fee – Tesla is discontinuing Autopilot and eliminating the $8,000 one-time FSD purchase option starting February 14, requiring a $99/month subscription for any self-steering features. The move follows California regulatory action and reflects Tesla's push for recurring revenue streams.
- Yahoo is adding generative AI to its search engine – Yahoo announced Yahoo Scout, a new AI-powered answer engine in beta, powered by Anthropic's Claude. Scout synthesizes information from the web and Yahoo's data with interactive media, structured lists, and visible source links.
- Interactive tools in Claude (MCP Apps) – Claude now offers interactive tools that let users open and interact with connected apps directly within conversations. Built on MCP Apps, a new extension to the Model Context Protocol, the feature enables developers to deliver interactive UI within any supporting AI product.
- Qwen3-Max-Thinking debuts with focus on hard math, code – Alibaba Cloud released Qwen3-Max-Thinking, a flagship reasoning model for complex math, coding, and multi-step agent workflows. The model features a 262,144-token context window and can interleave tool calls within reasoning, equipped with built-in web search and a code interpreter.
- Terminally online Mistral Vibe – Mistral AI released Vibe 2.0, powered by Devstral 2, featuring custom subagents, multi-choice clarifications, slash-command skills, and unified agent modes for terminal-native coding. It is available on Le Chat Pro and Team plans, with Devstral 2 API access moving to paid tiers.
- Thread by @karpathy – Andrej Karpathy shares observations on the rapid shift from 80% manual coding to 80% agent coding in just weeks during November-December 2025. He notes that LLM coding feels more fun but requires careful oversight, as models make subtle conceptual errors and tend to overcomplicate solutions.
- OpenAI to add shopping cart and merchant tools to ChatGPT – OpenAI is developing commerce-focused features for ChatGPT, including a dedicated shopping cart section for tracking products and finalizing purchases. The company is also building a merchant submission page for sellers to upload product feeds.
🧪 Research
- AI model from Google DeepMind reads recipe for life in our DNA (AlphaGenome) – Google DeepMind released AlphaGenome, an AI model that can analyze one million letters of DNA code at a time to unravel the 'dark genome' (the 98% of DNA that doesn't code for proteins). The model can predict the impact of changing even a single letter in the genetic code and could accelerate understanding of genetic diseases, cancer, and drug target discovery.
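The "single letter" prediction rests on a general technique, in-silico mutagenesis: run the model on the reference sequence and on the mutated sequence, and score the variant by the difference. A minimal sketch, with `predict` standing in for a real genomics model:

```python
def variant_effect(predict, sequence, pos, alt_base):
    """Sketch of in-silico mutagenesis: score a single-base change as the
    difference between model predictions on mutant and reference sequences."""
    ref_score = predict(sequence)
    mutated = sequence[:pos] + alt_base + sequence[pos + 1:]
    return predict(mutated) - ref_score
```

A large positive or negative difference flags the variant as likely functional, which is how such models prioritize candidates among millions of non-coding changes.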
- How AI assistance impacts the formation of coding skills – Anthropic's randomized controlled trial with 52 software engineers found that using AI assistance led to a 17% lower quiz score than manual coding. The study also revealed that developers who used AI to build comprehension retained more knowledge than those who relied on it purely for code generation.
- On-Device LLMs State of the Union, 2026 – Meta AI researchers provide a comprehensive overview of on-device LLM progress, showing that billion-parameter models now run in real time on flagship devices through efficient architectures, advanced quantization techniques (4-bit as the default), and optimizations like speculative decoding and KV cache management.
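The 4-bit default can be illustrated with a minimal group-wise symmetric quantizer: each group of weights shares one scale, and values are rounded to integers in [-8, 7]. This is a sketch of the general technique, not any particular runtime's kernel.

```python
def quantize_int4(weights, group_size=8):
    """Group-wise symmetric 4-bit quantization sketch: one float scale per
    group, weights stored as integers in [-8, 7]."""
    groups = []
    for i in range(0, len(weights), group_size):
        group = weights[i:i + group_size]
        scale = max(abs(w) for w in group) / 7 or 1.0  # avoid zero scale
        q = [max(-8, min(7, round(w / scale))) for w in group]
        groups.append((scale, q))
    return groups

def dequantize_int4(groups):
    """Recover approximate float weights from (scale, ints) groups."""
    return [v * scale for scale, qs in groups for v in qs]
```

Storing 4-bit integers plus one scale per group cuts memory roughly 4x versus fp16, which is what makes billion-parameter models fit on phones; the reconstruction error per weight is bounded by half a quantization step.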
- Robbyant Open-Sources LingBot-World, a World Model for Millisecond-Level Real-Time Interaction – Robbyant released LingBot-World, an open-source world model achieving nearly 10 minutes of continuous video generation at 16 FPS with sub-one-second interaction latency. The model enables real-time control via keyboard/mouse and supports zero-shot generalization from single images.
- NVIDIA Launches Earth-2 Family of Open Models – NVIDIA unveiled the Earth-2 family of open models for AI weather and climate prediction, including 15-day forecasts, zero-to-six-hour storm predictions, and global data assimilation. The fully open, accelerated stack makes production-ready weather AI accessible for organizations to run and fine-tune.
- Microsoft introduces newest in-house AI chip – Maia 200 – Microsoft launched Maia 200, built on TSMC's 3nm process with 140 billion transistors, achieving 10 petaflops of FP4 compute and 30% more performance per dollar than Maia 100. The chip features 216GB of HBM3e memory, operates at 750W TDP, and is claimed to offer better efficiency than Nvidia's B300.
🛠️ Tools
- Introducing Prism (OpenAI) – OpenAI launched Prism, a free AI-native workspace for scientific writing and collaboration powered by GPT-5.2, offering unlimited projects and collaborators. Built on the acquired Crixet platform, Prism integrates LaTeX editing, literature search, equation handling, and real-time collaboration with AI assistance.
- ChatGPT Containers can now run bash, pip/npm install packages, and download files – ChatGPT's code execution environment has been significantly upgraded: it can run Bash commands and Node.js alongside Python, install packages via pip and npm, and download files from the web. The update enables ChatGPT to write and test code in 10+ languages while maintaining safety through URL verification.
- Grok Imagine API (xAI) – xAI unveiled the Grok Imagine API, a unified bundle for end-to-end creative workflows featuring world-class video generation and editing models. Grok Imagine ranks #1 on Artificial Analysis text-to-video benchmarks, delivering superior quality at lower latency and price.
- MoonshotAI kimi-agent-sdk – The Kimi Agent SDK is a multi-language library (Go, Node.js, Python) that exposes the Kimi Code agent runtime for programmatic use. It enables developers to build custom applications, automate tasks, and extend capabilities with custom tools while reusing Kimi CLI configuration.
- Open Coding Agents (Ai2) – Ai2 released SERA (Soft-verified Efficient Repository Agents), achieving 54.2% on SWE-Bench Verified while requiring only 40 GPU-days to train. The breakthrough uses soft-verified generation to enable repository-specific specialization, where a 32B model can surpass its 110B teacher at a cost of just $1,300.
- Unrolling the Codex agent loop – OpenAI's technical deep-dive explains how Codex CLI's agent loop orchestrates interaction between user, model, and tools through the Responses API. The post details prompt construction, tool execution, context window management, and prompt caching strategies in building a production software agent.
🌅 Closing Reflection
Week 05 showcased the AI industry at an inflection point: agent systems gaining genuine coordination capabilities while the infrastructure race intensifies toward trillion-dollar commitments. The convergence of vision, code execution, and persistent state management suggests we're moving past the demo phase into genuinely useful autonomous systems worth exploring in depth.
🙏 Thanks & Contact
Thanks for reading! If you have suggestions or feedback, I'd love to hear from you via my contact form. See you next week!