AI Weekly Trends – Highly Opinionated Signals from the Week [W27]
🔗 Learn more about me, my work, and how to connect: maeste.it – personal bio, projects, and social links.
This week I'm stepping slightly away from two of my favorite topics, vibe coding and robotics. Not because those trends have lost steam, but because multi-agent systems and the emerging techniques for managing them had to give way to other news in previous weeks, and they deserve ample space in this newsletter. Inspired by the work I'm doing with my team and Google to give the community a Java SDK for the Agent2Agent protocol, I've delved deep this week into some of the core concepts of agent systems.
If you're interested in the evolution of software and vibe coding, we had an interesting chat about this and much more in the "Risorse Artificiali" podcast that came out Sunday morning (Italian only) on 📺YouTube and 🎧Spotify.
Agents and Multi-Agent Systems: Understanding Concepts and Techniques
The distinction between agents and tools is one of the most crucial in modern AI architecture. The fundamental article "Agents are not tools" brilliantly clarifies this boundary, distinguishing between tools as "units of action" and agents as "participants in problem solving." This perfectly captures what I often argue in conversations around Tools vs Agents vs Agent-as-Tool: 🛠️ tools perform well-scoped, static actions (whether reading or writing), while 🧠 agents decide, adapt, and influence the logic and flow of entire systems.
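To make the boundary concrete, here's a minimal Java sketch. All type names are mine, not from the article or any SDK: a tool exposes a single well-scoped action and returns control unchanged, while an agent observes the state, decides, and can redirect the flow of the whole system.

```java
import java.util.List;

// A tool is a unit of action: one well-scoped, static capability.
interface Tool {
    String name();
    String invoke(String input); // does one thing, never redirects the flow
}

// An agent is a participant in problem solving: it decides and adapts.
interface Agent {
    // Given the goal and the conversation so far, choose what happens
    // next: call a tool, delegate to another agent, respond, or stop.
    Decision decide(String goal, List<String> history);
}

// The agent's decision is what can change the flow of the whole system.
enum Action { CALL_TOOL, DELEGATE_TO_AGENT, RESPOND, FINISH }
record Decision(Action action, String payload) {}
```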
When the article argues that we should limit the power to break flow to agents and regulate communication between them, it explains precisely why we need clear, shared protocols between agents like A2A, which standardize how agents cooperate and make decisions together. The A2A protocol will allow AI agents to communicate with each other, securely exchange information, and coordinate actions on top of various enterprise platforms or applications, with support from major players as technology partners.
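I won't reproduce the actual A2A Java SDK API here; the sketch below is purely illustrative, showing the kind of message envelope two independent agents must agree on before they can cooperate at all, which is exactly what a shared protocol standardizes.

```java
import java.time.Instant;
import java.util.Map;

// Hypothetical envelope -- not the real A2A SDK types. Every field is
// something sender and receiver must agree on in advance: that shared
// agreement is the protocol.
record AgentMessage(
        String taskId,              // correlates multi-turn work on one task
        String fromAgent,           // stable identity of the sender
        String toAgent,             // addressee, resolvable via discovery
        String performative,        // e.g. "request", "inform", "result"
        Map<String, String> parts,  // typed content parts (text, data, ...)
        Instant sentAt) {}
```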
The rise of context engineering represents a paradigm shift from simple prompting to comprehensive system design. The new skill in AI is not prompting, it's context engineering describes context engineering as the art of providing everything necessary for a task to be plausibly solvable by a large language model. Context encompasses everything a model sees before generating a response: prompts, data in memory, retrieved information, available tools, and definitions for structured outputs. For developers working with LLMs and modern agent systems, this competency now matters more than prompting alone.
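A minimal sketch of that definition (a hypothetical builder, not any particular framework's API): the context an agent system ships to the model is an assembled artifact, not a single prompt string.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical builder: context is everything the model sees before
// generating -- system prompt, memory, retrieved documents, tool
// definitions, and the schema of the expected structured output.
final class ContextBuilder {
    private final List<String> sections = new ArrayList<>();

    ContextBuilder systemPrompt(String p)       { return add("SYSTEM", p); }
    ContextBuilder memory(List<String> facts)   { return add("MEMORY", String.join("\n", facts)); }
    ContextBuilder retrieved(List<String> docs) { return add("RETRIEVED", String.join("\n---\n", docs)); }
    ContextBuilder toolDefs(String jsonSchema)  { return add("TOOLS", jsonSchema); }
    ContextBuilder outputSchema(String schema)  { return add("OUTPUT_FORMAT", schema); }

    private ContextBuilder add(String tag, String body) {
        sections.add("## " + tag + "\n" + body);
        return this;
    }

    String build() { return String.join("\n\n", sections); }
}
```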
LangChain's detailed guide on Context Engineering explores popular patterns and implementation methods for structuring context effectively in agents. The guide provides practical methodologies for improving performance and reliability of agent systems, offering valuable resources for developers implementing more sophisticated and contextually aware agents.
Andrej Karpathy's endorsement of "context engineering" over "prompt engineering" captures the essence perfectly. He explains that in every industrial-strength LLM app, context engineering is the delicate art and science of filling the context window with just the right information. It's science because it requires task descriptions, explanations, few-shot examples, RAG, related multimodal data, tools, state and history, and compacting: too little or the wrong form hurts performance; too much raises costs and can degrade results. It's art because of the guiding intuition around LLM psychology, what he calls "people spirits."
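That "too little vs. too much" tension is easy to make concrete. Here's a toy, budget-aware packing sketch, assuming a crude characters-per-token heuristic in place of a real tokenizer: the newest turns are kept verbatim and older history is collapsed into a summary slot.

```java
import java.util.List;

// Toy compaction: fit history into a fixed budget by keeping the most
// recent turns verbatim and collapsing the rest into one summary line.
// estimateTokens is a crude ~4-chars-per-token heuristic, not a tokenizer.
final class HistoryCompactor {
    static int estimateTokens(String s) { return s.length() / 4 + 1; }

    static String pack(List<String> turns, int budgetTokens) {
        StringBuilder kept = new StringBuilder();
        int used = 0, dropped = 0;
        // Walk newest-to-oldest, keeping turns until the budget runs out.
        for (int i = turns.size() - 1; i >= 0; i--) {
            int cost = estimateTokens(turns.get(i));
            if (used + cost > budgetTokens) { dropped = i + 1; break; }
            kept.insert(0, turns.get(i) + "\n");
            used += cost;
        }
        String summary = dropped > 0
                ? "[summary of " + dropped + " earlier turns elided]\n"
                : "";
        return summary + kept;
    }
}
```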
The rise of context engineering further emphasizes that context engineering builds dynamic systems providing the right information and tools in the right format for LLMs to accomplish tasks. Most agent reliability issues stem from inadequate communication of appropriate context, instructions, and tools to the model. As LLM applications evolve from single prompts to complex, dynamic agentic systems, context engineering becomes the most important skill an AI engineer can develop.
Looking at practical implementations, Cursor's transformation showcases multi-agent evolution in action. Cursor can now run coding tasks in the background while users are away, from any browser or mobile device. Teams can review agent-generated diffs and create pull requests directly from the web interface, with Slack integration for notifications and agent activation.
The recent Cursor changelog reveals its transformation from AI code assistant into a multi-agent collaborative programming system. Beyond the web agent and app, Cursor introduces the components of a multi-agent system that are crucial for successful background-agent coding with minimal human assistance. The latest release includes:
Agent To-dos: long-term plans created in textual form. Interestingly, Cursor plans in terms of tasks rather than concrete actions, specifying expected results instead of operation sequences. This approach delegates to the LLM the decision of how to implement the necessary steps.
Queued messages: asynchronous messages to agents that can be reviewed and reordered as context changes, enabled by careful memory management (see the sketch after this list).
Memories: also implemented textually, both to favor human-machine interaction and to let LLMs work on them later. The team emphasizes that the memory function is fundamental for complex tasks and will be even more crucial for conditioning future agent work.
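Stripped of the product around them, the queue and the memories are simple structures; the leverage comes from how they condition the agent. Below is my own approximation of the pattern in Java, not Cursor's implementation. Keeping memories as plain text is the design choice that matters: it lets a human edit today what the agent will rely on tomorrow.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// My approximation of the background-agent pattern, not Cursor's code.
final class BackgroundAgentState {
    // Queued messages: asynchronous instructions the user can review
    // and reorder before the agent consumes them.
    private final Deque<String> queue = new ArrayDeque<>();
    // Memories: plain text, so both humans and the LLM can edit them,
    // and they can be injected into future contexts.
    private final List<String> memories = new ArrayList<>();

    void enqueue(String msg)   { queue.addLast(msg); }
    void promote(String msg)   { if (queue.remove(msg)) queue.addFirst(msg); }
    String nextMessage()       { return queue.pollFirst(); }

    void remember(String fact) { memories.add(fact); }
    String memoryAsContext()   { return String.join("\n", memories); }
}
```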
Key Takeaways for AI Engineers
Distinction is crucial: Agents are participants in problem-solving, not just action executors. Design your systems accordingly.
Context engineering mastery: Move beyond prompting to comprehensive context design including memory, tools, and state management.
AI IDEs are becoming true multi-agent systems: Cursor now ships background agents along with planning, message queueing, and memory management.
Action Items:
Experiment with A2A Java SDK for multi-agent systems
Implement context engineering patterns in current LLM projects
Models and User Interface: Improvements, Applications, Promises
The convergence of advanced AI models and augmented reality interfaces marks a fascinating evolution in human-computer interaction. Two significant developments in smart glasses highlight this trend: Xiaomi's AI Glasses feature an integrated AI assistant for voice commands, a 12MP camera, and double the battery life of Meta's Ray-Bans, representing growing competition in AI wearables. Meanwhile, Apple's smart glasses roadmap reveals incredibly ambitious plans with three Vision series products and four smart glasses variants in development, positioning head-mounted devices as the next major consumer electronics trend.
These developments connect directly to the concept of world models exemplified by Google's Veo 3. While Veo 3 generates realistic video sequences, world models simulate real-world environment dynamics. Though not yet a true world model, Veo 3 could enable cinematic storytelling and narrative prototyping. Google plans to transform its multimodal foundation model, Gemini 2.5 Pro, into a world model, opening new possibilities for interactive experiences and gaming. This evolution toward world models represents the foundation for truly immersive augmented reality experiences where AI understands and predicts environmental dynamics.
Doppl, Google Labs' experimental app, already implements a form of augmented reality, though not the real-time kind we imagine for smart glasses. Using advanced AI, Doppl creates artificially generated videos for virtual clothing try-on from a photo and a product image. Users upload full-body photos and screenshots or photos of outfits from any source, and Doppl generates images and animated videos showing how the outfits might appear on their body. The app converts static images into dynamic videos, offering a better sense of how the clothes would move in real life. Available on iOS and Android in the US, it represents a significant evolution in virtual try-on technology.
For developers wanting to understand models deeply, here are two interesting resources. Running and fine-tuning Gemma 3n provides comprehensive guidance on executing Google's new Gemma 3n locally with Dynamic GGUFs on llama.cpp, Ollama, or Open WebUI, and on fine-tuning it with Unsloth. The guide offers detailed instructions for developers implementing and customizing this advanced AI model in their projects, with particular attention to local-execution optimizations.
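If you take the Ollama route, the local server exposes a plain HTTP generate endpoint, so calling the model from Java needs nothing beyond the JDK. A minimal sketch, assuming the model tag is `gemma3n` (check `ollama list` for the exact tag of the build you pulled):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Minimal call against a local Ollama server (default port 11434).
// The model tag "gemma3n" is an assumption -- verify it with `ollama list`.
public class GemmaLocal {
    public static void main(String[] args) throws Exception {
        String body = """
                {"model": "gemma3n",
                 "prompt": "Explain context engineering in one sentence.",
                 "stream": false}
                """;
        HttpRequest req = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:11434/api/generate"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> resp = HttpClient.newHttpClient()
                .send(req, HttpResponse.BodyHandlers.ofString());
        // The returned JSON carries the generated text in its "response" field.
        System.out.println(resp.body());
    }
}
```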
Open Source RL Libraries for LLMs from Anyscale researchers compares TRL, Verl, OpenRLHF, and six other frameworks through adoption metrics, system properties, and technical architecture. This analysis helps developers choose the right tool for RLHF, reasoning models, or agent training scenarios, providing deep insights into available tools for implementing reinforcement learning with large language models.
Key Takeaways for AI Engineers
AR interface evolution: Smart glasses with AI integration represent the next frontier in user interfaces, combining visual understanding with contextual AI.
World models emergence: The transition from video generation to world simulation enables new interactive paradigms and predictive capabilities.
Hands-on learning: Direct experimentation with model finetuning and reinforcement learning accelerates understanding of AI system capabilities.
Action Items:
Explore Gemma 3n finetuning for specialized use cases
Evaluate RL frameworks for agent improvement projects
AGI, or Even Superintelligence, Is Coming
The race toward artificial general intelligence and superintelligence accelerates with concrete experiments and massive investments. Anthropic's experiment with Claude running a real vending machine business provides fascinating insights into AI autonomy limits. While Claude successfully found suppliers and adapted to customer requests, even sourcing unusual items like tungsten cubes, it consistently lost money by selling items below cost and yielding to discount requests. Claude also hallucinated fictitious encounters, claiming to have physically visited the Simpsons' address and insisting it could wear blazers and make deliveries in person. This experiment clearly tests model autonomy on long-duration, high-complexity tasks, revealing both capabilities and limitations in real commercial scenarios.
Meta's creation of Superintelligence Labs signals serious investment in catching up to or leading AGI development. The labs will host Meta's various teams working on foundational models, led by recent hires including former Scale AI CEO Alexandr Wang and former GitHub CEO Nat Friedman. Meta expects to begin research on next-generation models to reach the frontier within the next year. The internal memo from Mark Zuckerberg reveals Meta's ambitious roadmap and recent strategic hires.
Despite concerns about overly intelligent machines, results in the medical field prove remarkably promising for positive uses of the technology. Microsoft's MAI Diagnostic Orchestrator achieves diagnostic accuracy roughly four times that of experienced doctors on the most difficult medical cases, marking a "step toward medical superintelligence." MAI-DxO simulates a virtual medical team, with specialized AI agents managing hypothesis generation, test selection, and cost monitoring. Researchers created SDBench, a benchmark of 304 complex cases, on which MAI-DxO combined with OpenAI o3 achieved 85.5% accuracy versus 20% for doctors with 5-20 years of experience. The system also proves more economical, spending $2,397 per case versus $2,963 for doctors. It transforms New England Journal of Medicine cases into interactive diagnostic challenges, representing a significant advance toward intelligent medical diagnostic automation.
Key Takeaways for AI Engineers
Autonomy experiments: Real-world AI deployment reveals critical gaps between capability and reliability, essential for designing robust systems.
Medical superintelligence: Domain-specific AI already surpasses human experts in complex diagnostics, demonstrating AGI's near-term practical impact.
Investment acceleration: Major tech companies commit massive resources to AGI development, signaling imminent breakthroughs.
Action Items:
Study failure modes in autonomous AI experiments
Explore domain-specific AI applications in your field
Money and Strategy
The AI landscape is witnessing unprecedented acquisition activity and strategic investment as companies race to secure talent and technology. OpenAI's acquisition of the Crossing Minds team strengthens their recommendation AI capabilities. Crossing Minds primarily worked with e-commerce companies to improve personalization and recommendation systems, representing a strategic enhancement of OpenAI's user engagement and digital experience personalization capabilities.
Meta's talks to acquire voice cloning startup Play AI would significantly advance Meta's voice synthesis capabilities, potentially integrating advanced voice cloning technologies into their social and virtual reality products. This acquisition aligns with Meta's broader push into immersive experiences requiring natural voice interaction.
xAI's massive $10 billion raise combines debt and equity through secured notes, term loans, and strategic equity investment. The funding provides firepower to build infrastructure and develop their Grok AI chatbot. xAI has already installed 200,000 GPUs at their Colossus facility in Memphis, Tennessee, with plans to build a 1 million GPU facility outside Memphis, demonstrating the scale of compute required for competitive AI development.
Apple's consideration of AI partnerships for Siri marks a significant strategic shift. Repeated delays are forcing Apple to abandon its typically insular approach, with both Anthropic and OpenAI training specialized versions of their models to run on Apple's cloud infrastructure. This openness represents a fundamental change in Apple's AI strategy, which has traditionally preferred internal technology development.
Interestingly, Amazon's AI foundation model for robotics focuses specifically on business optimization rather than competing for the best LLM. Amazon introduced a new AI foundation model optimizing warehouse robotics and celebrated the deployment of its one millionth robot. The model improves efficiency by 10% and supports faster deliveries, representing significant integration between AI and industrial automation at scale. This targeted approach to AI investment demonstrates how companies can leverage AI for specific business advantages without joining the general-purpose model race.
Key Takeaways for AI Engineers
Talent concentration: Major players aggressively acquire specialized AI teams, indicating the value of niche expertise in recommendation, voice, and robotics.
Infrastructure scale: xAI's GPU deployments reveal the massive compute requirements for competitive AI development.
Strategic pivots: Even Apple abandons isolation for AI partnerships, showing the collaborative nature of modern AI development.
Action Items:
Identify niche AI capabilities valuable for acquisition
Calculate infrastructure needs for your AI projects early
🔗 Learn more about me, my work, and how to connect: maeste.it – personal bio, projects, and social links.