🔗 Learn more about me, my work, and how to connect: maeste.it – personal bio, projects, and social links.
This week brings significant insights from MIT NANDA's enterprise AI research, which has been making waves on social media, though often for the wrong reasons. While headlines scream about AI's enterprise failures, the actual research tells a much more nuanced story about transformation challenges and opportunities. I've spent considerable time diving deep into the State of AI in Business 2025 Report, and what I found confirms many of the trends we've been discussing here for weeks. The report doesn't show failure; it shows evolution. It highlights the gap between pilot projects and production deployments, the critical importance of agent memory systems, and the emerging infrastructure that will enable true enterprise transformation. Beyond this cornerstone research, we're seeing fascinating developments, from Perplexity's bold Chrome bid to OpenAI's aggressive pricing strategy with GPT-5 and the continued maturation of agentic architectures. For those interested in diving deeper into these topics, I discussed them at length in Saturday's podcast episode on YouTube and Spotify.
Business and Society
Key Takeaways for AI Engineers
Enterprise Reality Check: MIT NANDA reveals only 5% of organizations see transformative returns from $30-40B GenAI investments, but the "shadow AI economy" shows 90% individual adoption
Memory is Everything: Successful agent deployments depend on sophisticated memory systems and context engineering, not just raw model capabilities
Environmental Impact: Google and Mistral's new studies show AI's energy footprint is manageable with proper optimization
Action Items:
Study MIT NANDA's full report for enterprise implementation patterns
Evaluate memory-first architectures for your agent systems
What's been going on this week?
Let me start with the elephant in the room: the State of AI in Business 2025 Report from MIT NANDA. Social media's been having a field day with this one, cherry-picking the statistic that only 5% of organizations are seeing transformative returns from their AI investments. But that's a dangerously incomplete reading of what's actually a fascinating document that every enterprise AI practitioner should study in full.
I need to acknowledge that NANDA isn't a neutral observer here. They're deeply invested in agent development (check out their work), which certainly colors their perspective. Yet even with this lens, their analysis confirms trends I've been tracking for months. The report does indeed show that despite high adoption rates (mostly pilots), there's minimal transformation happening. But here's what the skeptics aren't telling you: the report thoroughly explains why and offers concrete solutions.
The research reveals that chatbots outperform custom or agent-based solutions, but attributes this to the immaturity of agent platforms rather than fundamental limitations. More critically, it emphasizes that enterprises desperately need expert guidance and frameworks. Companies attempting solo implementations without proper consulting or methodologies fail at significantly higher rates. The key success factor that emerges repeatedly is agent memory and learning capabilities. The report hammers this point home: successful pilots that graduate to production invariably feature sophisticated memory systems and use that memory for context engineering and behavioral conditioning.
Chapter 4 brilliantly outlines what enterprise clients actually need: "A vendor we trust; Deep understanding of our workflow; Minimal disruption to current tools; Clear data boundaries; The ability to improve over time; Flexibility when things change." In other words, they need agents and a reliable vendor that provides a solid framework for building them. Chapter 5 delivers NANDA's recipe for transforming pilots into successes, and while their bias toward agentic development is clear, their vision is compelling. My favorite passage captures the transformation ahead: "The infrastructure to support this transition is emerging through frameworks like Model Context Protocol (MCP), Agent-to-Agent (A2A), and NANDA, which enable agent interoperability and coordination. These protocols create market competition and cost efficiencies by allowing specialized agents to work together rather than requiring monolithic systems. And these frameworks form the foundation of the emerging Agentic Web, a mesh of interoperable agents and protocols that replaces monolithic applications with dynamic coordination layers."
The enterprise AI landscape is heating up in other fascinating ways. Perplexity's audacious $34.5 billion bid for Chrome represents a bold play to control distribution while Google faces antitrust proceedings. They're committing to keep Chromium open source, invest $3 billion in the project, and maintain Google as the default search engine. Meanwhile, Sam Altman's outlining OpenAI's trillion-dollar roadmap with massive infrastructure investments in data centers and specialized chips for AGI development.
On the geopolitical front, Altman's comments about China's AI capabilities suggest the US might be underestimating Chinese progress. His skepticism about export controls alone containing China's ambitions aligns with Nvidia's development of new China-specific chips that would outperform the current H20, taking advantage of Trump's recent openness to more advanced chip sales.
The environmental conversation is finally getting serious data. Google's analysis of Gemini's environmental footprint reveals each text query uses energy equivalent to watching TV for nine seconds, with the model becoming 33x more energy efficient and reducing carbon output by 44x over the past year. A Gemini query consumes 0.24 Wh, slightly below ChatGPT's 0.34 Wh average. Mistral's contribution to environmental standards emphasizes the collective responsibility across the AI value chain to address environmental impacts as AI integrates deeper into our economy.
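To put those per-query numbers in perspective, here is a minimal arithmetic sketch. It uses only the figures quoted above (0.24 Wh, 0.34 Wh, nine seconds of TV); the one-million-queries workload is a hypothetical value of mine, not something from either study.

```python
# Figures quoted in the paragraph above, in watt-hours per text query.
GEMINI_WH_PER_QUERY = 0.24
CHATGPT_WH_PER_QUERY = 0.34
TV_SECONDS_EQUIVALENT = 9  # "watching TV for nine seconds"


def implied_power_watts(energy_wh: float, seconds: float) -> float:
    """Average power of a device that uses `energy_wh` over `seconds`."""
    return energy_wh * 3600 / seconds


def relative_saving(lower: float, higher: float) -> float:
    """Fraction by which `lower` undercuts `higher`."""
    return 1 - lower / higher


def fleet_energy_kwh(wh_per_query: float, queries: int) -> float:
    """Total energy in kWh for a given number of queries."""
    return wh_per_query * queries / 1000


if __name__ == "__main__":
    tv_watts = implied_power_watts(GEMINI_WH_PER_QUERY, TV_SECONDS_EQUIVALENT)
    gap = relative_saving(GEMINI_WH_PER_QUERY, CHATGPT_WH_PER_QUERY)
    # Hypothetical workload: one million queries per day.
    daily_kwh = fleet_energy_kwh(GEMINI_WH_PER_QUERY, 1_000_000)
    print(f"TV power implied by the comparison: {tv_watts:.0f} W")
    print(f"Gemini vs ChatGPT per query: {gap:.0%} lower")
    print(f"One million Gemini queries: {daily_kwh:.0f} kWh")
```

The nine-second comparison implies a TV drawing roughly 100 W, which is a useful sanity check on the headline figure.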
An intriguing side note from the financial sector: research on trading with ChatGPT shows significant numbers of investors using AI for trading decisions, potentially impacting market volatility and price formation. The irony isn't lost on me: finance professionals claim not to use GPT, yet trading volumes drop significantly during ChatGPT outages, which suggests otherwise.
🤖 Agentic AI
Key Takeaways for AI Engineers
Declarative Agent Design: YAML-based agent definitions show that getting the best results from agents depends more on natural language than on Python or any other programming language
Context Engineering > RAG: Focus on smart context window management rather than traditional retrieval
Production Reality: Developers are comparing notes at conferences and online to find new recipes, because the old ones don't apply to this new world
Action Items:
Explore YAML-based agent frameworks like ADK 1.12.0
Implement two-tier architectures with stateless subagents
What's been going on this week?
Agents and multi-agent systems are emerging as the key to moving enterprise AI beyond simple chatbots toward genuine process transformation. The trend toward declarative agent composition is accelerating, as demonstrated by ADK 1.12.0's new YAML support. My team experimented with this approach over a year ago, and the reasoning is clear: the natural language components interacting with LLMs are the true core of an agent from the end developer's perspective. Everything else, such as communication protocols, registry, discovery, monitoring, and security, should be handled by the framework. The same goes for advanced context engineering, where the framework should do much of the heavy lifting.
The reAgent wrap-up from GitHub HQ brought together 150+ agent builders and highlighted the significant gap between media hype and production reality.
Tool design for agents requires rethinking traditional approaches. Reilly Wood's analysis of MCP tools explains why auto-converting existing APIs to MCP tools fails: agents struggle with large numbers of tools, APIs can explode context windows, and APIs don't leverage agents' unique capabilities. The solution involves designing tools specifically for agent consumption from the ground up.
The research frontier continues advancing with OpenCUA's comprehensive toolkit for computer-use agents, featuring 22K human demonstration trajectories across three operating systems and 200+ applications. Their "reflective long Chain-of-Thought" reasoning helps agents identify and recover from errors during multi-step tasks. Meanwhile, Manus Wide Research positions itself not just as an AI but as a unique personal cloud computing platform, defining the General AI Agents category.
Perhaps most provocatively, Chroma's founder declares "RAG is Dead, Context Engineering is King". The argument: RAG poorly bundles three concepts when the real work is context engineering, determining what belongs in the context window for each LLM generation. Top teams now use a two-stage approach: first-stage retrieval reduces 10,000 candidates to 300, then LLMs re-rank those to the final 20-30. This shift from retrieval to intelligent context management represents a fundamental rethinking of how we feed information to language models.
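The two-stage funnel described above can be sketched in a few lines. This is an illustration under my own assumptions, not Chroma's implementation: `token_overlap` stands in for a first-stage retriever (vector or keyword search), and `rerank_stub` stands in for the LLM re-ranking call. Both names and the 300/25 cut-offs are placeholders chosen to mirror the numbers in the text.

```python
from typing import Callable, Sequence

Scorer = Callable[[str, str], float]


def token_overlap(query: str, doc: str) -> float:
    """Cheap first-stage score: share of query tokens present in the doc."""
    q_tokens = set(query.lower().split())
    d_tokens = set(doc.lower().split())
    return len(q_tokens & d_tokens) / len(q_tokens) if q_tokens else 0.0


def rerank_stub(query: str, doc: str) -> float:
    """Placeholder for an LLM relevance judgement (hypothetical).

    A real system would ask a model to score the pair; here we reuse
    the overlap score and slightly prefer shorter documents.
    """
    return token_overlap(query, doc) - 0.001 * len(doc.split())


def two_stage_select(
    query: str,
    candidates: Sequence[str],
    first_stage: Scorer = token_overlap,
    second_stage: Scorer = rerank_stub,
    first_k: int = 300,
    final_k: int = 25,
) -> list[str]:
    """Narrow a large pool to first_k cheaply, then re-rank down to final_k."""
    shortlist = sorted(
        candidates, key=lambda d: first_stage(query, d), reverse=True
    )[:first_k]
    reranked = sorted(
        shortlist, key=lambda d: second_stage(query, d), reverse=True
    )
    return reranked[:final_k]


if __name__ == "__main__":
    # Synthetic corpus of 10,000 short notes, for illustration only.
    corpus = [f"note {i} about topic{i % 50}" for i in range(10_000)]
    context = two_stage_select("note topic7", corpus)
    print(len(context), context[:3])
```

The design point is cost: the expensive scorer only ever sees the shortlist, so the cheap first stage can afford to be generous about recall.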
Even established players are embracing agentic approaches. Grammarly's new AI agents go beyond grammar correction to assist with content structuring and tone personalization, representing an evolution toward more intelligent, contextual writing assistants.
🛠️ AI Assisted Coding
Key Takeaways for AI Engineers
GPT-5 Coding Performance: Matches Claude Sonnet with proper tag-based prompting
Context Expansion: Sonnet's 1M token support enables full codebase analysis
Vibe Coding Mainstream: Microsoft integrating natural language coding directly into Excel cells
Action Items:
Master GPT-5's verbosity API and tag system
Test Sonnet's 1M context for large codebase tasks
What's been going on this week?
I've been testing GPT-5 for code generation, and it's nearly at Claude's level: it at least matches Sonnet, while Opus remains slightly superior. The gap is minimal and likely bridgeable with more careful prompting, since each model has its own prompting style. OpenAI has invested heavily in tag-based behavior customization for GPT-5, and my initial attempts probably didn't make full use of it. The GPT-5 Prompting Guide reveals innovative features like the verbosity API parameter, which controls response length independently of reasoning depth, and the Responses API, which maintains reasoning context between tool calls. The coding tips cheat sheet provides practical advice on structuring requests for superior code quality and leveraging the model's reasoning capabilities for complex debugging.
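As a rough illustration of how verbosity and reasoning depth are set independently, the sketch below only builds a request body and makes no network call. The field layout (`text.verbosity`, `reasoning.effort`, `previous_response_id`) reflects my reading of OpenAI's Responses API documentation and should be checked against the current reference; `build_gpt5_request` is a hypothetical helper of mine, not part of any SDK.

```python
from typing import Optional

ALLOWED_VERBOSITY = {"low", "medium", "high"}


def build_gpt5_request(
    prompt: str,
    verbosity: str = "low",
    reasoning_effort: str = "medium",
    previous_response_id: Optional[str] = None,
) -> dict:
    """Build a Responses-API-style payload (assumed field names, see lead-in)."""
    if verbosity not in ALLOWED_VERBOSITY:
        raise ValueError(f"verbosity must be one of {sorted(ALLOWED_VERBOSITY)}")
    payload = {
        "model": "gpt-5",
        "input": prompt,
        # Controls how long the final answer is.
        "text": {"verbosity": verbosity},
        # Controls how much the model reasons before answering.
        "reasoning": {"effort": reasoning_effort},
    }
    if previous_response_id is not None:
        # Chains this call to an earlier response so context carries over.
        payload["previous_response_id"] = previous_response_id
    return payload


if __name__ == "__main__":
    # Terse answer, deep reasoning: the two knobs move separately.
    print(build_gpt5_request("Refactor this function", "low", "high"))
```

With the official Python SDK, a payload like this would typically be passed as keyword arguments to the Responses endpoint; I've left that call out so the example runs without credentials.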
Claude Sonnet 4 now supports 1M token context in public beta on Anthropic's API and Amazon Bedrock, with Vertex AI support planned. This massive context window expansion is excellent for lengthy programming contexts with extensive code or multiple problems to analyze. Apple's validation of Claude's coding capabilities comes through native Claude integration in Xcode 26 beta 7, allowing iOS and macOS developers direct access to Claude's capabilities from within the IDE.
I've repeatedly said that Vibe Coding will replace spreadsheets for many use cases. Well, Microsoft's racing to adapt by putting vibe coding directly into Excel cells. The new COPILOT function allows users to insert natural language prompts directly into cells for tasks like summarizing, categorizing, or generating data, transforming Excel into an AI-native tool accessible to non-technical users.
The gaming industry's embracing AI at massive scale according to recent reports, integrating artificial intelligence into every aspect of game development from content generation to creating intelligent NPCs. Game developers are discovering that AI can revolutionize not just asset creation but entire gameplay experiences, enabling more immersive and personalized player interactions.
📈 AI Models and Releases
Key Takeaways for AI Engineers
Price War Incoming: OpenAI's aggressive GPT-5 pricing undercuts competitors significantly
User Experience Matters: OpenAI bringing back GPT-4o after user complaints about GPT-5
Chinese Competition: DeepSeek V3.1 reaches parity with Western models at 685B parameters
Action Items:
Benchmark GPT-5 vs current models for your use cases
Evaluate DeepSeek V3.1 for open-source alternatives
What's been going on this week?
OpenAI's moves this week reveal both technical advancement and market strategy. GPT-5's pricing at $1.25 per million input tokens and $10 per million output tokens dramatically undercuts Anthropic's Claude Opus 4.1 and matches Google's Gemini 2.5 Pro baseline. This aggressive pricing could trigger an industry-wide price war. Interestingly, OpenAI's bringing back GPT-4o as an option after users complained about missing its perceived personality and flexibility. Sam Altman's commitment to addressing performance concerns shows OpenAI's responsiveness to user feedback.
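To see what those list prices mean per request, here is a small cost sketch. The two prices come from the paragraph above; the token counts in the example are hypothetical and only there to make the arithmetic concrete.

```python
# GPT-5 list prices quoted above, in USD per million tokens.
GPT5_INPUT_USD_PER_MTOK = 1.25
GPT5_OUTPUT_USD_PER_MTOK = 10.00


def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price: float = GPT5_INPUT_USD_PER_MTOK,
    output_price: float = GPT5_OUTPUT_USD_PER_MTOK,
) -> float:
    """Cost of one request given per-million-token prices."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical coding request: 20k tokens of context, 2k tokens of output.
    per_request = request_cost_usd(20_000, 2_000)
    print(f"Per request: ${per_request:.4f}")
    print(f"Per 10,000 requests: ${per_request * 10_000:,.2f}")
```

Output tokens cost eight times as much as input tokens at these prices, so for long-context workloads the bill is often dominated by how much the model writes rather than how much it reads.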
Part of the efficiency story comes from OpenAI's adoption of MXFP4, a 4-bit microscaling data type that promises massive computational savings compared to traditional data types. It allows fitting a 120 billion parameter model into a GPU with just 80GB of VRAM, potentially reducing inference costs by 75%.
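The 80GB claim is easy to sanity-check with back-of-the-envelope arithmetic. The sketch below assumes every weight is stored in MXFP4 (4-bit elements plus one shared 8-bit scale per block of 32, as I understand the microscaling format) and ignores KV cache, activations, and runtime overhead, so treat the results as a lower bound on real memory use.

```python
BF16_BITS_PER_PARAM = 16
# MXFP4: 4-bit element + an 8-bit scale shared across a block of 32 elements.
MXFP4_BITS_PER_PARAM = 4 + 8 / 32  # 4.25 bits on average


def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Weight storage in GB (10^9 bytes) for a given precision."""
    return n_params * bits_per_param / 8 / 1e9


def memory_saving(bits_new: float, bits_old: float) -> float:
    """Fractional reduction in weight memory when moving to bits_new."""
    return 1 - bits_new / bits_old


if __name__ == "__main__":
    n_params = 120e9  # the 120 billion parameter model mentioned above
    bf16_gb = weight_memory_gb(n_params, BF16_BITS_PER_PARAM)
    mxfp4_gb = weight_memory_gb(n_params, MXFP4_BITS_PER_PARAM)
    saving = memory_saving(MXFP4_BITS_PER_PARAM, BF16_BITS_PER_PARAM)
    print(f"BF16 weights:  {bf16_gb:.2f} GB")
    print(f"MXFP4 weights: {mxfp4_gb:.2f} GB")
    print(f"Reduction:     {saving:.1%}")
```

Under these assumptions the weights drop from about 240 GB to about 64 GB, which is how a 120B model fits on a single 80GB card, and the roughly 73% memory reduction lines up with the 75% cost figure.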
Google's advancing on multiple fronts. Gemini's new automatic memory feature remembers conversation details without prompts, personalizing output based on past interactions. The company's also expanding AI Mode for conversational search globally, adding agentic capabilities for restaurant bookings in the US for AI Ultra subscribers. On the efficiency side, Gemma 3 270M delivers strong instruction-following in a tiny 270 million parameter package, using just 0.75% battery for 25 conversations on a Pixel 9 Pro.
Chinese AI development continues accelerating. DeepSeek V3.1 quietly dropped with 685 billion parameters, achieving benchmark scores rivaling OpenAI's and Anthropic's proprietary systems. The model represents a significant challenge to Western AI dominance. DeepSeek-R2's imminent launch on Huawei's AI chips promises an advanced Mixture-of-Experts architecture potentially doubling R1's parameters. ByteDance isn't sitting idle either, releasing Seed-OSS-36B with 512K token context, designed for advanced reasoning and offered in variants trained with and without synthetic data.
🦾 Robotics and AI of things
Key Takeaways for AI Engineers
Reality Check: Chinese robot games show current limitations but massive investment commitment
Video as Policy: Using video generation as proxy for robot control policies
Home Integration: Gemini replacing Google Assistant across Nest devices this fall
Action Items:
Study video-to-policy transfer techniques
Prepare for consumer robotics deployment surge
What's been going on this week?
The Humanoid Robot Games in Beijing offered a fascinating window into robotics' current state. With 280+ teams from 16 countries competing in soccer, sprinting, and kickboxing, the robots fell down frequently, revealing the technology's immaturity. But that's precisely the point. These events serve multiple purposes for Chinese robotics development: public acceptance building, progress demonstration, and most critically, real-world data collection. Azeem Azhar's commentary notes that despite the frequent falls, the progress in locomotion and balance demonstrates China's rapid advancement in humanoid robotics.
China's massive investment in robotics isn't just about national pride. It's a strategic play for manufacturing dominance and addressing demographic challenges. These public demonstrations, while showing current limitations, also reveal the trajectory and commitment level. Every fall generates valuable data, every successful kick advances the field. The willingness to showcase imperfect technology publicly demonstrates confidence in the development path.
The research frontier's exploring innovative approaches to robot control. The paper "Video Generators are Robot Policies" proposes using video generation as a proxy for learning robot policies, addressing both generalization under distribution shifts and the limitation of human demonstration data. This approach could dramatically accelerate robot learning by leveraging the massive datasets used for video generation.
Consumer robotics is taking a different path through smart home integration. Google's replacing Assistant with Gemini across Nest speakers and displays this fall, bringing advanced conversational AI, Gemini Live, and multi-device awareness to smart home control. This represents a significant step toward integrating conversational AI into daily domestic life, potentially setting the stage for more sophisticated home robotics integration.