AI Weekly Trends – Highly Opinionated Signals from the Week [W20] 🚀

May 19, 2025

Follow me: 🐦 X | 💼 LinkedIn | 📬 Substack | 📝 Medium (with voiceover)

Hey there, fellow AI engineers! What a wild week in AI! 🚀

I’m not sending the bibliography anymore in the newsletter to make it better fitting in email clients, but you can get it by reading the same article (but with bibliography) in my Medium account, which is great also if you want to listen to the article with their excellent text-to-speech service.

Let me walk you through what I've been seeing this week and why I think these changes matter for our work.

1. Code Whisperers: The Rise of Vibe Coding and AI-Powered Development

I'm watching a transformative shift in software development with the emergence of "vibe coding", a conversational approach to programming where developers describe their intent and AI models generate the code. This trend is radically changing how software is created, who can create it, and what gets built.

Specialized Coding Models Taking Center Stage

The coding model ecosystem is expanding rapidly. Together AI just released DeepCoder, a 14B parameter model that claims performance similar to OpenAI's o3-mini but with complete transparency - its dataset, code, training logs, and optimizations are all open. This marks a significant availability shift for high-quality coding models outside major AI labs.

After being acquired by OpenAI, Windsurf announced a family of specialized coding models: flagship SWE-1 (comparable to Claude Sonnet 3.5), unlimited-use SWE-1-lite, and SWE-1-mini. Their strategic approach involves training on incomplete code states across multiple work surfaces. Windsurf believes this specialization will eventually outperform general-purpose models for coding tasks, potentially signaling a future where domain-specific models dominate development workflows.

Major Players Expanding AI Coding Capabilities

Tech giants aren't standing still. Google will reportedly unveil an AI software development agent at its upcoming I/O conference. This agent, codenamed 'Codey', would assist with the complete development lifecycle, placing Google in direct competition with Anthropic's Claude Code and OpenAI's Windsurf. Google's entry is particularly noteworthy since Google is used to give great free and trial subscriptions that will permit many of us to give it a try - potentially accelerating industry-wide adoption.

OpenAI continues improving its capabilities with the GPT-4.1 family, offering significant advances in coding abilities and instruction following. Their official prompting guide gives developers practical insights for leveraging these improvements.

OpenAI's Deep Research agent connects directly to GitHub repositories for analysis, creating new workflows for understanding existing codebases. This capability is valuable to analyze an existing source code, both for having it documented and as a starting point for vibe coding. The integration of code analysis with generation represents a step toward more contextually aware AI coding assistants.

Evolving Tools and Methodologies

The ecosystem supporting vibe coding continues diversifying. Void, an open-source AI code editor built as a VS Code fork, offers direct connections to AI models without third-party servers. It features Agent Mode (allowing AI to search, create, and modify files) and specialized tracking for AI-suggested changes.

Testing is evolving alongside coding practices. Testsigma's Agentic Testing brings AI agents to QA teams. It's important to consider a new way to write tests in the era of a new way to write code. Testing is also different when the outputs are not deterministic. Traditional testing approaches must adapt to handle the variable nature of AI-generated code.

Challenges in the Vibe Coding Era

Despite the excitement, vibe coding faces significant challenges. A comprehensive survey on hallucinations in code generation LLMs catalogs how generated code can contain incorrect elements that only manifest under specific execution paths, making them difficult to detect before deployment.

The economic incentives are also concerning. An article on the perverse incentives of vibe coding describes how AI coding assistants operate on variable-ratio reinforcement – an unpredictable pattern that triggers dopamine like gambling. One developer reportedly spent over $1,000 "vibe coding" while discovering that AI companies charging by token count might incentivize verbose code generation.

And I agree, vibe coding is addictive. I go to bed with wide-open eyes like zero calcare comics, thinking 'I could give another prompt and get that thing improved'...like when I started learning to code. It’s a lot of fun indeed…but you know I’m a geek.

Democratizing Software Development

The implications extend far beyond individual developers. Vibe coding is taking more and more space, making it possible to create software that would probably never be coded. For example, when vibe coding from platforms designed for non-developers becomes reliable, we'll see many spreadsheets that non-developers create to track conferences, deliveries and processes becoming vibe-coded apps.

This democratization could unlock an entirely new category of applications created by non-developers for niche needs that never justified traditional development resources. The photography analogy is particularly insightful: having 90% of code written by AI by the end of 2026 doesn't mean developers will write less code. I expect that 10% could be even more than the 100% of code today. Think about photography: we're creating vastly more photos than before year 2000, 99% from smartphones, but professional photographers aren't shooting less ; they're shooting more because of how digital photography expanded the industry. Software will follow the same path.

This suggests vibe coding won't reduce demand for skilled developers but will increase total code volume, with developers focusing on the critical 10% requiring human expertise while the overall software ecosystem dramatically expands.

Microsoft's approach to giving LLMs access to Python debuggers (paper) points toward agentic vibe debugging, where AI helps debug code by leveraging execution information. This integration throughout the development lifecycle suggests a future where the entire process is AI-augmented.

2. Autonomous Innovators: How AI Agents Are Redefining Software Development

AI agents - autonomous systems that can perceive their environment, make decisions, and take actions to achieve specific goals - are rapidly evolving from research projects to practical tools with profound implications for software development. This trend intersects closely with vibe coding but extends beyond code generation to encompass a broader range of capabilities for independent problem-solving.

From Code Generation to Code Execution

A significant advancement in agentic AI is the ability to not just generate code but also execute it safely. MCP Run Python, an MCP server from Pydantic, enables running LLM-generated Python code in a sandbox. This could be very important for agents generating dynamic code to be executed as part of their workflow. Security and validation are key points here to avoid injection and other security risks. This capability allows AI agents to not just suggest code but actually test and refine it through execution, creating a more robust development process.

However, these capabilities also introduce new security challenges. The Model Context Protocol (MCP), introduced by Anthropic AI, connects LLMs with tools and data but lacks default security features, posing significant risks. Experts have warned about vulnerabilities, including prompt injections and tool tampering. As agents gain the ability to execute code, security becomes increasingly critical.

Some progress is being made on this front. A new paper from DeepMind describes strategies for defending against prompt injection attacks, which could help mitigate some of these security concerns. This may represent the first significant progress in defeating prompt injection after two and a half years of research, according to Simon Willison.

Models Evolving to Support Agency

The capabilities of foundation models continue to evolve in ways that support increasingly autonomous agents. Google's open Gemma 3 models have taken several steps forward, now supporting function calling and larger (128K) context windows. Quantization-aware training optimizes their performance to make the models accessible for less-powerful hardware: a single GPU or even a GPU-less laptop. These developments are particularly significant for local LLMs working on the same hardware as the agent instead of having API calls for LLMs, enabling more efficient and private agent implementations.

Meanwhile, Anthropic is reportedly preparing to launch advanced versions of Claude's Sonnet and Opus models featuring hybrid thinking and expanded tool use capabilities. These models are reportedly capable of alternating between reasoning and tool use, and can self-correct by stepping back to examine what went wrong. For coding specifically, the models can test their generated code, ID errors, troubleshoot with reasoning, and make corrections without requiring human intervention. This represents a significant advance in autonomous problem-solving capabilities.

Agents in Practice

The theoretical capabilities of AI agents are increasingly being realized in practical tools. Manus has eliminated its waitlist, offering broader access to its virtual desktop AI agent with one free daily task for all users and a one-time bonus of 1,000 credits. Manus also introduced image generation, allowing its agentic AI to accomplish visual tasks with step-by-step planning. Manus is a great general-use agent system, and now everyone can give it a try, because that agent technology is becoming more accessible to mainstream users.

Perhaps the most impressive example of agentic AI in practice is Google DeepMind's AlphaEvolve, an artificial intelligence agent that can invent brand-new computer algorithms. It pairs Google's large language models with an approach that tests, refines, and improves algorithms automatically. AlphaEvolve proposes code, tests it through automated evaluators, and builds upon successful approaches to develop increasingly effective algorithms across entire codebases. This process has yielded significant improvements across Google's infrastructure, from data center efficiency and chip design to AI training optimization, demonstrating the practical impact of agentic code generation and refinement.

The Exponential Growth of Agent Capabilities

A study from Metr.org measures AI agent performance in terms of the length of tasks they can complete, showing that this metric has been consistently exponentially increasing over the past 6 years, with a doubling time of around 7 months. Extrapolating this trend predicts that in under a decade, we will see AI agents that can independently complete a large fraction of software tasks that currently take humans days or weeks. Looking carefully at the charts suggests that the trajectory is not linear, but could be exponential (even if at the very beginning of the curve), highlighting the potential for rapid acceleration in agent capabilities.

Infrastructure for the Agent Economy

As agents become more capable of independent action, new infrastructure is emerging to support what some call the agent economy. An article on why agents need a new payment stack discusses the technical and practical hurdles that need to be cleared before AI agents can autonomously conduct transactions from discovery to purchase.

As I wrote last week in my article Ensuring Trust and Privacy in AI Agent Systems: Using Blockchain Smart Contracts, Performance Bonds, and Zero-Knowledge Proofs, it's not just a matter of payment, and I think connecting agentic AI to value could be important from many perspectives. This suggests that blockchain technology and other trust mechanisms could be critical infrastructure for autonomous agents that handle sensitive tasks or financial transactions.

Transforming the Web

The implications of AI agents extend beyond individual applications to reshaping the entire web. An analysis of how AI agents will change the web for users and developers suggests that AI agents will transform the web by autonomously interacting and exchanging content, significantly altering both user experience and web development practices. This may result in an autonomous internet where AI agents dominate interactions, prompting changes in content presentation, payment systems, and business models. Developers will need to adapt by creating APIs for AI agents and focusing on personalized, scalable user experiences.

3. Evolution Accelerated: The New Wave of Language Models Reshaping AI

The landscape of large language models (LLMs) continues to evolve at a breathtaking pace, with new architectures, capabilities, and approaches emerging constantly. This evolution is not merely incremental but represents fundamental shifts in how models are designed, trained, and deployed, with profound implications for AI engineers.

Strategic Model Development and Release Timing

Major AI labs are carefully timing their model releases, balancing competitive pressures with technical achievements. Meta is reportedly pushing back the projected June launch timeline for its Llama Behemoth model to the Fall due to a lack of significant improvement. This suggests that the competition between major labs has reached a stage where incremental improvements are no longer sufficient to justify major releases.

Meanwhile, Anthropic is reportedly preparing to launch advanced versions of Claude's Sonnet and Opus models featuring hybrid thinking and expanded tool use capabilities. An Anthropic model, codenamed Neptune, is undergoing safety testing, with some believing the name hints at a 3.8 (8th planet from the sun) release. The news coincides with Anthropic launching a new bug bounty program focused on testing Claude's principles on safety measures.

These developments highlight a growing emphasis on substantial qualitative improvements rather than just scaling, with particular focus on reasoning capabilities, self-correction, and tool use. For AI engineers, this signals a shift from simply adopting larger models to selecting models with specific architectural advantages for particular use cases.

New Architectures and Approaches

Beyond the conventional scaling race, novel architectural approaches are emerging that could fundamentally change how models think. Sakana AI unveiled Continuous Thought Machines (CTMs), a new type of model that makes AI more brain-like by allowing it to 'think' step-by-step over time instead of making instant decisions like current AI systems do. Unlike most AI that processes information in a static, one-shot way, the CTM considers how its internal activity unfolds over time, much like human brains do.

This approach draws inspiration from real brains, where the timing of when neurons activate together is crucial for intelligence. Sakana demonstrated the CTM solving complex mazes, visibly tracing possible paths through the maze as it thinks, and tackling image recognition by viewing different parts of an image and spending more time based on the difficulty of the task.

As the user notes, Sakana is a unique AI startup in its mission to bring 'nature-inspired' methods to AI models, and these CTMs provide a differentiator that could help bring the flexibility and adaptability of human brains to advanced systems — leading to AI that reasons, learns, and solves problems in a more human-like fashion. This represents a potentially significant shift in how models approach complex reasoning tasks.

In a similar vein, AM-Thinking-v1 advances the frontier of reasoning at 32B scale. This reasoning-optimized language model demonstrates state-of-the-art performance among dense models of its size by employing a meticulously designed post-training pipeline, including Supervised Fine-Tuning and Reinforcement Learning, to achieve reasoning capabilities comparable to larger Mixture-of-Experts models without relying on private data or massive architectures.

These developments suggest that architectural innovations and specialized training pipelines can achieve reasoning capabilities previously thought to require much larger models, potentially making advanced reasoning more accessible and efficient.

Models for Resource-Constrained Environments

A significant trend is the optimization of models for deployment on less powerful hardware. Google's open Gemma 3 models now support function calling and larger (128K) context windows, while quantization-aware training optimizes their performance to make the models accessible for less-powerful hardware: a single GPU or even a GPU-less laptop.

Similarly, Stability AI has open-sourced Stable Audio Open Small, a 341M parameter text-to-audio model optimized to run on Arm CPUs. It can generate 11-second audio clips on smartphones in under 8 seconds.

These developments expand the range of environments where sophisticated AI models can be deployed, enabling edge computing applications and reducing dependency on cloud-based API services. For AI engineers, this opens new possibilities for creating responsive, private, and cost-effective AI applications that run directly on user devices.

Understanding Model Limitations

As models become more capable, understanding their limitations becomes increasingly important. Research from Microsoft on how LLMs get lost in multi-turn conversations shows that LLMs perform significantly worse in multi-turn conversations, with an average 39% drop in task performance due to unreliability and early, incorrect assumptions. This study shows how important it is to write a good zero-shot prompt. This is particularly significant in vibe coding and agentic AI.

This research highlights the importance of prompt engineering and the need for strategies to maintain model performance across extended interactions. It suggests that AI engineers should carefully consider interaction design, particularly for applications involving multi-turn dialogues or extended problem-solving sessions.

Industry Dynamics and Competition

The LLM landscape is increasingly characterized by intense competition among major labs, with each trying to establish unique advantages. While OpenAI, Anthropic, and Google have been leading with flagship models, smaller players are finding niches through specialization and open approaches.

The competitive dynamics are pushing all players to innovate faster, but also raising questions about the sustainability of the current development pace. As models become more capable, the differentiating factors shift from raw capabilities to reliability, safety, and specialized features.

For AI engineers, this competitive environment creates both opportunities and challenges. On one hand, the rapid pace of innovation provides access to increasingly powerful tools. On the other, the fragmentation of the model ecosystem and the potential for sudden shifts in capabilities require flexible architectures that can adapt to evolving model landscapes.

4. Market Movers: How Companies Are Positioning in the AI Gold Rush

The business landscape around AI is evolving rapidly as major tech companies, startups, and VCs jockey for position in what many see as a transformative technological wave. Understanding these market dynamics helps AI engineers navigate career opportunities and technology adoption decisions.

Funding and Strategic Shifts

The scale of investment in AI continues to grow astronomically. SoftBank's $100B commitment towards OpenAI's Stargate is reportedly stalling due to concerns over U.S. tariffs and rising data center costs. Meanwhile, Perplexity is set to raise a $500M round that would value the company at $14B, showing continued investor confidence in specialized AI applications.

To foster their ecosystems, Google DeepMind launched the AI Futures Fund, giving AI startups early access to advanced models, funding, and technical expertise.

However, an analysis of points of friction in the VC industry suggests AI may fundamentally transform traditional venture capital. As software creation becomes more accessible through AI, valuable startups will increasingly tackle what remains difficult – selling to complex industries or building supplier networks others can't replicate, rather than simply assembling large engineering teams.

Strategic Partnerships and Platform Ambitions

Companies are forming strategic partnerships to position themselves for the evolving AI landscape. Perplexity and PayPal announced a new partnership enabling PayPal and Venmo checkout options on the AI platform. This shows how much payment system and AI are important for companies in this market, highlighting the emerging integration of AI with financial infrastructure.

Meanwhile, Sam Altman, CEO of OpenAI, shared a bold new vision: The company is working on a shared AI operating system based on ChatGPT that aims to become the central part of people's digital lives. This platform would offer smart interfaces across devices with a model that understands everything in a user's life, from emails and chats to books and videos. This positions OpenAI not just as an AI provider but as a potential foundational platform for future digital experiences.

Mainstreaming AI-Generated Content

Major platforms are bringing AI-generated content to mainstream audiences. TikTok, with its 1.8B monthly users, is rolling out AI Alive for transforming still images into videos. Similarly, Amazon's Audible is expanding its AI-narrated audiobook library with over 100 voices across multiple languages, while Spotify has made similar moves with ElevenLabs. These developments suggest AI-generated media is rapidly moving from novelty to mainstream, with significant implications for content creators across media formats.

Safety and Transparency Focus

As capabilities advance, companies are increasingly emphasizing safety and transparency. OpenAI launched a new Safety Evaluations Hub displaying test results for its models across metrics like harmful content generation, hallucination rates, and jailbreak attempts. The release comes after critiques that the company is not transparent with safety testing, representing a response to growing calls for greater accountability in AI development.

5. Beyond Code: AI-Powered Robotics Bridging Digital and Physical Worlds

While software AI captures most headlines, the integration of AI with robotics represents a crucial frontier where digital intelligence meets physical reality, creating new possibilities for automation and physical world interaction.

Vision-Language Models: The Eyes and Brain of Robot Systems

Vision-language models (VLMs) have become essential for robotics, enabling machines to understand and interact with their visual environment. Vision Language models are super important for robotics and many research in this area confirm this.

Hugging Face's VLM landscape analysis shows how these models have advanced with smaller, more capable architectures that enable reasoning, video understanding, and multimodal agents. This evolution makes it increasingly feasible to deploy sophisticated vision capabilities on robots with limited computational resources.

Several research papers demonstrate rapid progress in this domain:

Diffusion-VLA unifies diffusion models with autoregressive techniques to scale robot foundation models
DexVLA enhances vision-language models with diffusion experts for dexterous manipulation
TinyVLA creates efficient models requiring less data and computing power while enabling effective manipulation

These advances suggest robots will increasingly understand their environment through natural language instructions and visual perception, simplifying human-robot collaboration and enabling more flexible automation.

Robotics Becomes More Accessible

First of all, breaking news: I got the first part of my HuggingFace robot: I cannot wait to get the other parts and get started experimenting with it.

Meanwhile, established players continue making progress. Tesla's Optimus humanoid robot appears to be catching up to competitors, with CEO Elon Musk telling shareholders it represents a multi-trillion-dollar opportunity. Tesla recently released video showing an Optimus prototype dancing, and the company is already using robots in its factories.

These developments suggest robotics is following a similar trajectory to other technologies, with increasing accessibility enabling a broader range of participants to contribute to innovation.

Real-World Applications and Limitations

Beyond consumer applications, robotics continues making significant inroads in industrial and commercial settings. Amazon's warehouse stowing robot matches human performance in warehouse operations while highlighting robotics' current frontier - its specialized hardware and AI vision can successfully handle diverse items at scale, yet the 14% failure rate demonstrates why full warehouse automation remains elusive despite significant advances.

This example highlights an important reality: while robotics and AI are making impressive progress, physical world challenges often impose significant constraints that don't exist in purely digital domains. The 14% failure rate would be unacceptable in many production environments, suggesting human-robot collaboration remains optimal for many applications.

In South Korea, robot chefs are being deployed at highway restaurants, with tech companies introducing collaborative robots alongside humans in hotels, elder care, schools, and restaurants. The bots aim to address labor shortages in the rapidly aging nation, with the government planning to increase robot workers to 1 million by 2030. However, reactions have been mixed from both workers and customers, highlighting the social and cultural challenges that accompany technological transitions.

6. The Ripple Effect: AI's Broader Impact on Work, Health, and Society

Beyond technical innovations, AI is creating profound ripple effects throughout society, reshaping employment patterns, educational approaches, healthcare research, and fundamental social structures. Understanding these broader impacts is essential for AI engineers who want to create responsible, beneficial AI systems.

Employment Dynamics in the Age of AI

The relationship between AI and employment is complex and often controversial. Recent high-profile workforce reductions have put this issue in the spotlight. Klarna CEO says AI helped company shrink workforce by 40%, noting that the headcount reduction wasn't solely due to AI but also because of attrition. Similarly, Microsoft is laying off about 6,000 people, or 3% of its workforce.

There's ambiguity about the true causes of these workforce reductions. It's unclear if those are layoffs and a shrinking workforce, which have different causes, but it is easier to blame AI. This perspective suggests that AI may sometimes serve as a convenient explanation for workforce reductions driven by multiple factors.

A more nuanced view emerges from an analysis of AI's Second-Order Effects, which suggests that founders should explore AI's second-order effects, like workforce reallocation and regulatory compliance, for sustainable growth. It's important to consider workforce relocation and requalification to improve the company and the productivity in a new way, instead of shrinking the workforce in favour of 'just' AI. This perspective emphasizes that AI's impact on employment can be positive when it focuses on augmenting human capabilities rather than simply replacing them.

AI in Education

The educational impact of AI is also emerging as a significant area of research. A meta-analysis published in Nature shows that ChatGPT significantly boosts learning, performing best in problem-based scenarios. The analysis of 51 studies shows that ChatGPT substantially improves student learning performance while moderately enhancing learning perception and higher-order thinking. It was most effective in problem-based learning environments with consistent usage for 4-8 weeks.

This research suggests that AI may be particularly valuable as an educational tool when integrated into active, problem-solving approaches to learning rather than passive information consumption. For AI engineers, this highlights the importance of designing systems that engage users in collaborative problem-solving rather than simply providing answers.

Transforming Healthcare Research

Perhaps the most profound long-term impact of AI may be in healthcare research. Impact on health researches are just at the beginning, but could be mindblowing and the most important revolution.

OpenAI released HealthBench, a benchmark created with 262 physicians to evaluate how AI systems perform in health conversations and establish a new standard for measuring AI's safety and effectiveness in medical contexts. Recent models appear to perform much better on this benchmark, with OpenAI's o3 scoring 60% compared to GPT-3.5 Turbo's 16%. The results also revealed that smaller models are now much more capable, with GPT-4.1 Nano outperforming older options while also being 25x cheaper.

Several research projects highlight the potential of AI in healthcare:

TrialMatchAI is an end-to-end AI-powered clinical trial recommendation system that automates patient-to-trial matching by processing heterogeneous clinical data. Built on fine-tuned, open-source large language models within a retrieval-augmented generation framework, it ensures transparency and reproducibility while maintaining a lightweight deployment footprint suitable for clinical environments.
Integrating Single-Cell Foundation Models with Graph Neural Networks explores how AI-driven drug response prediction holds great promise for advancing personalized cancer treatment. The study investigates whether incorporating the pretrained foundation model scGPT can enhance the performance of existing drug response prediction frameworks.
Mass General Brigham's researchers introduced FaceAge, an AI tool that can estimate a person's biological age and improve cancer survival outcome predictions simply by analyzing their facial photograph. The study found that cancer patients, on average, appeared about 5 years older, with a higher FaceAge correlating with worse survival rates. In physician testing, doctors showed significant improvement in accuracy when predicting 6-month survival when adding FaceAge risk scores to clinical data.

While we're taught not to judge books by covers, our faces may actually reveal crucial health insights. By quantifying what physicians have intuitively observed for decades, this tech turns facial characteristics into actionable biomarkers that may help doctors personalize treatments more precisely than ever before.

Follow me: 🐦 X | 💼 LinkedIn | 📬 Substack | 📝 Medium (with voiceover)

Artificial Code

Discussion about this post