AI Weekly Trends: Highly Opinionated Signals from the Week [CY26W6]
🔗 Learn more about me, my work, and how to connect: maeste.it – personal bio, projects, and social links.
A week dominated by announcements from Anthropic with Claude Opus 4.6 and OpenAI with GPT-5.3-Codex just hours later. As you’ll find in the text, both bring significant improvements across various metrics, but especially in coding and the ability to develop code in long, multi-agent tasks with complex planning.
My first impressions, from some direct testing and from reading articles by developers I greatly respect (which I link in this newsletter), are remarkable. They even led me to write that, in this specific area, models are getting very close to the concept of AGI. I know that talking about AGI, where the G stands for “general,” doesn’t make sense for a single field, but I think you understand what I mean. Seeing a model that writes and improves itself (Codex), or one that writes from scratch a C compiler capable of compiling the Linux kernel (Opus), makes even those who have seen quite a lot in computer science tremble. Or perhaps especially us.
I mentioned AGI: I spoke at length about AGI in its more proper sense, and about possible futures, whether preferable or to be avoided, in an interview with Alessandro Maserati on Wednesday on the Risorse Artificiali podcast. On Saturday, in the episode with my co-hosts Alessio and Paolo, we talked about coding agents and how to use them. Listen to both and let me know what you think.
New AI Models and Research
Takeaways for AI Engineers
Takeaway 1: The near-simultaneous release of Claude Opus 4.6 and GPT-5.3-Codex signals a paradigm shift: models are no longer just “smarter,” but are becoming true agentic entities capable of planning and delegating complex tasks over long time horizons.
Takeaway 2: The evolution of benchmarks from simple pattern matching to evaluating complex generative (Banana) and strategic (Kaggle Game Arena) capabilities reflects the need to measure what now matters: long-term reasoning and social interaction capabilities.
Takeaway 3: The 1M token context and improvements in the human-machine interface (speech-to-text, advanced OCR) are not accessory features: they are the fundamental enablers that make practical use of these agentic capabilities possible.
Action Items:
Experiment with Claude Opus 4.6 on long-duration tasks to verify task persistence.
Watch models play on Kaggle. Warning: it can become addictive :)
What’s happening this week?
As I mentioned in the introduction, the most important news of the week is undeniably the release of the new models from Anthropic and OpenAI. Released just hours apart, they introduce significant advances across all their capabilities, especially in code writing and the ability to execute long tasks in an agentic manner. Additionally, Anthropic is experimenting with a much larger context window (up to 1M tokens), which, while not entirely new, matters a great deal because context size is currently one of the main limitations. The benchmark improvements, while significant, are not as spectacular as the jumps we saw in releases 12 to 18 months ago, but that is natural given how benchmarks measure the performance of these models. The results we see in the field are even more significant, because these models are acquiring evolved capabilities that allow them to break down and plan long tasks and assign them to multiple agents. I’ll return to this in the chapter on AI Assisted Coding.
This very evolution in behavior, and in the ability to break down a problem, creates the need to test these models’ responses differently. Two things are very interesting in this sense: the new benchmark called Banana Benchmark (which essentially measures models’ ability to perform complex and long tasks in terms of time) and the idea of making models play strategically complex games on Kaggle. Regarding the latter, I imagine many of you immediately thought of the finale of “WarGames,” when the computer is made to play tic-tac-toe against itself. Beyond the cinematic reference, the idea itself is interesting and deserves to be explored, both on Kaggle for models and perhaps even between agents, to better understand how far the planning and strategy capabilities of these models can be pushed today.
There is also news from Europe and China on the human-machine interface: a speech-to-text model and an OCR model. Perhaps they’re less impressive, but if you’re an attentive reader of this newsletter you know that the human-machine interface is one of the main components of the end-user experience, so steps forward in this area are always relevant.
Links of the week
Claude Opus 4.6 (12 min) Anthropic’s new flagship with improved agentic coding, longer task persistence and 1M token context window in beta. SOTA results on reasoning and coding.
Mistral Introduces Voxtral Transcribe 2 (3 min) Next-generation speech-to-text model with open weights, sub-200ms latency and accurate transcription in 13 languages at low cost.
GLM-OCR (Hugging Face) Multimodal OCR model for complex documents with CogViT visual encoder and lightweight connector for efficient token downsampling.
Banana Benchmark New benchmark from Peking University to evaluate LLMs on open generative tasks (creative writing, summaries, dialogue) with human annotations.
Kaggle Game Arena Updates (7 min) Google DeepMind expands Game Arena with Werewolf and poker to test social dynamics and risk management. Gemini 3 Pro and Flash dominate the chess leaderboard.
Agentic AI
Takeaways for AI Engineers
Takeaway 1: The battle for agents in the enterprise is officially on: after Anthropic’s Claude Cowork, OpenAI launches Frontier. Both focus on integration with existing systems (no replatforming) and on agent collaboration (along the lines of the A2A protocol), but the real winner will be whoever solves the security problem first.
Takeaway 2: “Context rot” is inevitable with current models: the subagent approach (and MCP as protocol) is the best pragmatic solution, embracing model limitations rather than fighting them.
Takeaway 3: Like e-commerce, the agent ecosystem needs a security stack: each layer handles what the others can’t. Companies building this infrastructure will have enormous opportunities.
Action Items:
Try Perplexity’s Model Council to compare strategic approaches between models.
Evaluate your agent security stack: what happens if an agent gets compromised?
What’s happening this week?
After the launch of Anthropic’s Claude Cowork a few weeks ago, OpenAI now offers its own recipe for bringing agents into companies. Like Cowork, it integrates AI with the systems already in use in everyday work. It also promises to be more distributed across the company network, favoring collaboration between agents, a bit like the A2A protocol. These systems need to be tried (they are currently available only to a limited set of users), because they will radically change the way companies work, at least as much as PCs did compared to paper.
Context rot and security are two fundamental themes when talking about agents. I’m linking two articles on these topics that are worth reading. I also spoke about this in a panel at Voxxed Days Ticino last week. Remember that “with great power comes great responsibility”... even if you’re not Spider-Man.
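To make the subagent idea more concrete, here is a minimal Python sketch. It is not taken from either linked article: `run_subagent` is a hypothetical stand-in for a real model call, and counting words is only a crude proxy for counting tokens. The sketch shows a single property: each subagent works in its own context, and the orchestrator keeps only a short summary instead of the full working transcript.

```python
from dataclasses import dataclass, field


@dataclass
class SubagentResult:
    summary: str           # short result handed back to the orchestrator
    transcript_words: int  # size of the working context that gets discarded


def run_subagent(task: str) -> SubagentResult:
    """Hypothetical stand-in for a model call made with a fresh context."""
    # Simulate a long working transcript (tool output, dead ends, retries).
    transcript = [f"step {i} while working on: {task}" for i in range(50)]
    words = sum(len(line.split()) for line in transcript)
    return SubagentResult(summary=f"done: {task}", transcript_words=words)


@dataclass
class Orchestrator:
    context: list[str] = field(default_factory=list)

    def delegate(self, task: str) -> SubagentResult:
        result = run_subagent(task)
        # Only the summary enters the orchestrator's context.
        self.context.append(result.summary)
        return result

    def context_words(self) -> int:
        return sum(len(entry.split()) for entry in self.context)


tasks = ["parse the config", "write the tests", "update the docs"]
orchestrator = Orchestrator()
results = [orchestrator.delegate(task) for task in tasks]
discarded = sum(r.transcript_words for r in results)
print("orchestrator context (words):", orchestrator.context_words())
print("discarded subagent context (words):", discarded)
```

The orchestrator’s context grows with the number of tasks, not with the amount of work each task required, which is why this pattern tolerates long sessions better than a single ever-growing conversation.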
Perplexity’s Model Council is the production version of Karpathy’s experiment we talked about in December. As I said then, there are niches where it can be useful, so it’s good that a vendor decided to make it accessible to everyone.
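The general “council” pattern can be sketched in a few lines of Python. This is an illustration of the idea only, not Perplexity’s implementation: the three model functions are hypothetical stubs, and the majority vote used to combine answers is an assumption chosen for simplicity (a real product would more likely ask another model to write the synthesis).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


# Hypothetical stand-ins for calls to three different frontier models.
def model_a(question: str) -> str:
    return "Paris"


def model_b(question: str) -> str:
    return "Paris"


def model_c(question: str) -> str:
    return "Lyon"


def ask_council(question: str, models: dict[str, Callable[[str], str]]) -> dict:
    # Fan out: send the same question to every model concurrently.
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = {name: pool.submit(fn, question) for name, fn in models.items()}
        answers = {name: future.result() for name, future in futures.items()}
    # Synthesize: a simple majority vote (assumption, see the text above).
    votes = Counter(answers.values())
    consensus, count = votes.most_common(1)[0]
    dissent = sorted(name for name, answer in answers.items() if answer != consensus)
    return {
        "consensus": consensus,
        "agreement": count / len(answers),
        "dissent": dissent,
    }


council = {"model_a": model_a, "model_b": model_b, "model_c": model_c}
verdict = ask_council("What is the capital of France?", council)
print(verdict)
```

Surfacing the dissenting models alongside the consensus is the useful part: disagreement between models is often a better signal than any single confident answer.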
Links of the week
OpenAI introduced Frontier (8 min) Enterprise platform to build, deploy and manage AI agents. Shares context across business systems, onboarding, learning and clear permissions.
Clawdbot’s Missing Layers (7 min) Like e-commerce, agents need a security stack. Each layer handles what the others can’t. Opportunities for infrastructure.
Context Management and MCP (10 min) Context rot is inevitable: the best solution is subagents. Pragmatic approach that embraces model limitations rather than fighting them.
Perplexity Model Council (6 min) Multi-model research that runs queries across several frontier AI models simultaneously, synthesizing outputs into a unified answer. Available for Perplexity Max.
AI Assisted Coding
Takeaways for AI Engineers
Takeaway 1: Claude wrote a C compiler in Rust and OpenAI states that 5.3-Codex developed itself. Agent coding capabilities have reached levels that challenge even the most radical skepticism.
Takeaway 2: Claude Code’s “agent teams” give each agent narrow scope and clean context: better reasoning, independent quality checks and natural checkpoints. It’s the architecture that solves the single-agent system limitation.
Takeaway 3: Coding is the main battlefield: Anthropic, OpenAI, Alibaba, Apple are all focusing here. Whoever wins on coding wins on agentic AI.
Action Items:
Enable CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 in Claude Code and try teams of specialized agents.
Read Addy Osmani’s article on agent teams: the clearest documentation on how swarm architecture works.
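For the first action item, the flag is an ordinary environment variable, so in a shell `export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1` is enough. If you launch your tools from a script, the Python sketch below sets the flag for a child process only, without touching your global environment. The child here is a stand-in that just prints the flag; the `["claude"]` command mentioned in the comment is an assumption about how Claude Code is installed on your machine.

```python
import os
import subprocess
import sys

FLAG = "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS"

# Copy the current environment and add the flag for the child process only.
env = dict(os.environ)
env[FLAG] = "1"

# Stand-in child process: prints the value of the flag it sees.
# To launch Claude Code instead, replace this command with your own
# (assumed here to be ["claude"]; adjust to your installation).
child = [sys.executable, "-c", f"import os; print(os.environ.get('{FLAG}'))"]
seen = subprocess.run(child, env=env, capture_output=True, text=True, check=True)
print("child process sees:", seen.stdout.strip())
```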
What’s happening this week?
As I said in the first chapter, the models released by Anthropic and OpenAI have made great strides across all metrics, but it’s clear that the big companies are focusing, at least for now, on coding and planning capabilities. And from this comes news that truly sounds like science fiction: Claude creating a C compiler written in Rust capable of compiling a Linux kernel, or OpenAI declaring that 5.3-Codex was developed by a preliminary version of itself. Taking this news and these statements at face value (though deniers will obviously tell you they were invented just to create hype), we are facing development capabilities in these models and agents that were absolutely unthinkable just a few months ago. AGI level, in this specific field. If you read the C compiler news carefully you’ll see that it has limitations, inefficiencies and dependencies that it perhaps shouldn’t have... but if I think carefully about the people I know who are capable of writing such a thing, I can count them on the fingers of one hand... and I guarantee you that I know many very high-level Open Source developers around the world. Oh, and none of those on my hand would have done it alone in that timeframe... skeptic or not, think about it.
On a more practical note, if you develop with Claude Code, I invite you to read Addy Osmani’s article carefully: agent teams are, in my opinion, Anthropic’s main innovation, and understanding how they work can only improve your workflow.
Just how much coding is the main focus of all the big tech companies right now can also be seen in the renewed push by Chinese labs on coding models on one side, and in Apple’s adoption of SOTA models in its development tool on the other.
Links of the week
Building a C compiler with a team of parallel Claudes (13 min) Multiple Claude instances in parallel built a C compiler in Rust for Linux kernel 6.9. $20K effort and 2000 sessions with reduced human supervision.
Claude in Xcode (1 min) Xcode 26.3 introduces native support for Claude Agent SDK: subagents, background tasks and plugins directly in Apple’s IDE.
GPT-5.3-Codex (11 min) Faster agentic coding model combining GPT-5.2-Codex coding performance with GPT-5.2 reasoning and professional knowledge. Used to find bugs in their own training runs.
Claude Code Swarms - Agent Teams (15 min) Agents working in parallel with narrow scope and clean context: better reasoning, independent quality checks and natural checkpoints. Enable with CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1.
Qwen3-Coder-Next for Agentic Coding (5 min) Alibaba’s open-weight model optimized for coding agents. Hybrid MoE architecture with strong performance on executable synthesis and RL-based interaction.
Business and Society
Takeaways for AI Engineers
Takeaway 1: The Super Bowl as cultural thermometer: Anthropic ran an advertisement during America’s most-watched event. While in Europe we question whether AI is a bubble, in the United States it’s already part of the cultural fabric, at the level of Coca-Cola and McDonald’s.
Takeaway 2: AI adoption is unprecedented in history: ChatGPT reached 100 million users in two months. No earlier technology, cellphones and the internet included, has ever had such rapid uptake.
Takeaway 3: Claude remains ad-free: Anthropic declares it will not show ads in conversations. An interesting position on trust and integrity, contrasting with the aggressive Super Bowl presence.
Action Items:
Read the article with 10 charts on the AI era: in a few minutes it gives you a clear picture of how fast this technology is establishing itself.
Read the Hugging Face article on the open source ecosystem to understand the post-DeepSeek trajectory and how open artifact sharing is driving momentum.
What’s happening this week?
Here I’m sharing some links that offer a window on the business impact AI is having and continues to have. Beyond the smile that Anthropic’s Super Bowl ad gave me, the significant point for me is this: while in Europe we ask whether these companies are just a bubble (and frankly some people hope so), one or more of them is running an ad at the Super Bowl. It’s not just a question of cost, but also of how popular that event is and how much it is part of the country’s culture. AI is now part of that culture too: if not as much as Coca-Cola or McDonald’s, at least enough to advertise at the Super Bowl.
OpenAI’s adoption data confirms this. If you want to grasp it in a few minutes, don’t miss the article with the 10 most significant charts on the topic: they give a clear picture of how rapidly this technology is establishing itself.
Links of the week
Claude Will Remain Ad-Free (3 min) Anthropic announces that Claude will not show ads or sponsored content in conversations, to preserve trust and integrity in AI interactions.
10 Charts That Explain the AI Era (7 min) ChatGPT reached 100M users in two months, highlighting unprecedented adoption compared to past technologies like cellphones and the internet.
Open Source AI Ecosystem (9 min) Explores the open source AI trajectory since the “DeepSeek moment,” highlighting long-term strategies and forecasting sustained momentum.
xAI Joins SpaceX (2 min) xAI joins SpaceX, integrating advanced AI research with aerospace engineering. Strategic merger between AI development and hardware/space exploration initiatives.


