The agent is a process, and it runs in a second-level operating system
🔗 Learn more about me, my work and how to stay in touch: maeste.it: personal bio, projects and social links.
This week I go back to a theme that has been with me for months and that my work on LINCE has forced me to handle up close: what an agent really is, and where its boundaries begin and end. In the deep dive I try to trace how the concept has evolved, from harness engineering to loop engineering, all the way to a thesis I care about: harness and loop are becoming the minimal unit through which we access agents, which makes an agent more a process running in a second-level operating system than an app with an LLM stuffed inside. I read it through the investments of recent weeks, from OpenAI acquiring Ona to Xiaomi with MiMo Code, all the way to the sandboxes from NVIDIA and LangChain. In the links section you will find the themes that frame the picture: Google pushing on local models with DiffusionGemma and Gemma 4 QAT, the Claude Fable 5 saga (launched, its system prompt leaked by Pliny, then suspended by a US directive), and two applications of AI to scientific research, Claude as a chemist and Codex simulating black holes. Enjoy.
My agenda
New interview: Roberto Stagi (Ratel AI) explains why an agent’s context saturates not because of MCP servers, but because the tool index stays inside the model. Open source, open benchmarks. The interview also covers “agent anxiety”: the unease of not having an agent at work while you are having lunch at the beach, more common than we admit.
Saturday saw the release of the episode “Writing code is a commodity: Fable and workflows”, where I hand Fable a multi-language task and it brings it home overnight with 40 agents in parallel, zero-shot. From there the episode moves on to loop engineering and Anthropic’s article “When AI builds itself”.
Our projects Lince.sh and AntiVocale (Google Play, GitHub): by now you know them well.
On my own:
I was in Catania as a speaker at Coderful, one of the best-organized conferences, with some of the best content, that I have come across recently. You can find my slides here; as soon as the video is available, you will find it there too.
On June 24 I will be in Milan as a speaker at AIConf.
Harness and loop: the new minimal unit of agentic AI
Part of my work over these months, especially on LINCE, has been trying to give a definition of an agent: where it begins, where it ends, where its boundaries run. I will not go into the details of that project here, but it forced me to look closely at how the very concept of an agent, and of its boundaries, has evolved in recent months. It is a story worth telling, because I believe it is changing the basic unit we reason with when we talk about agentic AI.
The starting point is something I have been saying for weeks: to build agents, great models and a few tools are not enough. The harness, that is, the scaffolding around the model, is becoming the central piece, and how you couple the model to the harness matters as much as the model itself. The skills of those who work seriously on these systems have shifted accordingly: we used to talk about harness engineering, and recently the community (Boris, the creator of Claude Code, first among them) has started talking about loop engineering. Behind these two terms there is a precise idea: beyond the context you give the LLM, there is a lot more to take care of.
Harness engineering adds, to context curation, the ability to define the limits within which we want the agent to move, the harness itself. It means giving it a sandbox, some evals, a way to verify its own work. By adding these boundaries, the agent can move with more autonomy and handle longer, more complex tasks. Loop engineering goes a step further: if we want even greater autonomy, we also have to define the limits of the loop within which the harness cycles to reach the result. A loop is made of an initial state, an event that starts it, a goal to reach, a set of consolidated behaviours (the skills), a working state (the memory) that keeps track of what has been done and what remains to be verified, and decision mechanisms to figure out whether to continue or whether the goal has been reached.
The distinction, if you want a compass, is this: the harness defines the space in which the agent can move, what it is allowed to do and what it is not; the loop defines the time and the decision, how many times to repeat and when to stop.
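As a rough sketch of that compass, and not the design of LINCE or of any real harness, the toy Python below keeps the two roles apart. Every name in it (Harness, Loop, run_agent, stub_model, the action strings) is hypothetical, and the LLM call is replaced by a stub: the harness only answers "is this action allowed?", while the loop owns the step budget, the working memory and the stop decision.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Harness:
    """Space: which actions the agent may perform (hypothetical allowlist)."""
    allowed_actions: Set[str]

    def permits(self, action: str) -> bool:
        return action in self.allowed_actions


@dataclass
class Loop:
    """Time and decision: how many times to repeat and when to stop."""
    goal_reached: Callable[[List[str]], bool]
    max_steps: int = 10
    memory: List[str] = field(default_factory=list)  # working state


def run_agent(propose: Callable[[List[str]], str], harness: Harness, loop: Loop) -> str:
    """Cycle until the goal is reached or the step budget runs out."""
    for _ in range(loop.max_steps):
        if loop.goal_reached(loop.memory):
            return "done"
        action = propose(loop.memory)  # stand-in for the LLM call
        if harness.permits(action):
            loop.memory.append(action)  # record what has been done
        else:
            loop.memory.append(f"blocked:{action}")  # boundary enforced by the harness
    return "done" if loop.goal_reached(loop.memory) else "budget_exhausted"


def stub_model(memory: List[str]) -> str:
    """Toy policy: tries a forbidden action first, then the permitted ones."""
    plan = ["delete_repo", "write_file", "run_tests"]
    return plan[min(len(memory), len(plan) - 1)]


harness = Harness(allowed_actions={"write_file", "run_tests"})
loop = Loop(goal_reached=lambda m: "run_tests" in m, max_steps=5)
status = run_agent(stub_model, harness, loop)
```

In this toy run the forbidden action is recorded as blocked rather than executed, and the loop stops as soon as its goal predicate is satisfied. Shrinking max_steps makes the same agent give up with "budget_exhausted" without touching the harness, which is the point of keeping space and time as separate knobs.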
Putting the two together, what we call an agent starts to look more and more like a process running inside an agentic operating system, a kind of second-level operating system, in which the harness defines the limits and the loop manages the processes. And this bond between LLM, harness and loop is defining a new minimal entity: a unit of work that we can move onto a machine, across the network, into the cloud. Not a microservice of the web or REST kind we are used to, but something closer to a pod.
Let me explain with an analogy I am fond of. When I interface with a database using SQL, I take for granted that logging, writing to disk and transaction management are handled by the server, without my having to plug them in every time. A database server writes to disk and keeps transactions, full stop. In the same way, when I interface with an agent (that is, with its harness and its loop), I take for granted that it has skills, evals, a sandbox. Thinking of evals, sandboxes or guardrails as pieces to stick on top of some code that talks to an LLM is a view that holds up less and less: those pieces are an integral part of the unit we work with.
And that is exactly what the investments of recent weeks tell us. You no longer build an application with the LLM tucked inside as a tool, which is what LangChain, LlamaIndex and the others did a couple of years ago. You build on top of the agent, understood as LLM plus harness plus loop, treating it as the system in which your AI-native applications live. OpenAI acquired Ona (formerly Gitpod) precisely to give Codex secure, preconfigured cloud environments and to orchestrate persistent, long-running tasks. Xiaomi released MiMo Code, an open source coding harness that, by their account, holds up over sequences beyond 200 steps with a persistent memory entrusted to subagents (self-reported numbers, which I take with a grain of salt, but the direction is clear). NVIDIA published SkillSpector to analyze agents’ skills for vulnerabilities before installing them. And even LangChain now offers hardware-isolated microVMs to give each agent its own dedicated computer.
It is a bit like writing your own app in HTML5. Underneath there are layers upon layers (the browser doing the rendering, the JavaScript engine, HTTP, TCP/IP, the sockets), each with its own mechanisms for security, tracing and verification, and each of which we take for granted. Nobody wires up the sockets by hand. The harness and the loop are becoming that kind of layer: in short, they are the new minimal entity through which we access agents.
The links that caught my eye this week
Local, even at Google
DiffusionGemma is a 26B MoE that generates blocks of text in parallel with text diffusion, up to 4x faster on GPU. Gemma 4 QAT brings quantization-aware trained checkpoints that run on mobile and laptop without losing quality.
DiffusionGemma and Gemma 4 QAT confirm the local trend I have been talking about in recent weeks: there is more and more attention on local models, even from Google. The two pieces of news, by the way, attack the two real bottlenecks of inference at home: latency, with parallel block generation, and memory, with quantization. The right applications for smaller models still need to be found, but the growth of hardware will lead to increasingly powerful models that can run locally. And given that we are starting to see governments, like the US one, restricting the use of powerful models, I believe we might soon need local models for real.
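To make the memory bottleneck concrete, here is some back-of-the-envelope arithmetic for weight storage alone. It is only a sketch: the 26B parameter count is the size quoted above for DiffusionGemma, the 16-bit and 4-bit widths are illustrative assumptions rather than the published Gemma 4 QAT configurations, and the estimate ignores activations, KV cache and runtime overhead. The helper name weight_memory_gb is hypothetical.

```python
def weight_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in gigabytes (10**9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


params_26b = 26e9  # parameter count assumed for illustration

fp16_gb = weight_memory_gb(params_26b, 16)  # 16-bit weights (assumed baseline)
int4_gb = weight_memory_gb(params_26b, 4)   # 4-bit weights (assumed quantized width)

print(f"16-bit: {fp16_gb:.0f} GB, 4-bit: {int4_gb:.0f} GB")
```

Under these assumptions the weights go from roughly 52 GB to roughly 13 GB, which is the difference between a server-class accelerator and a well-equipped laptop.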
The Fable 5 saga
Anthropic launches Fable 5 and Mythos 5, then a US export control directive suspends access over a possible jailbreak. Pliny leaks the system prompt, while Anthropic withdraws the policy that quietly degraded researchers’ requests.
Fable 5 is, or rather was, since the US government banned its use, a bombshell. I talk about it in last Saturday’s podcast: its ability to handle very complex tasks is genuinely remarkable. I could have built the deep dive around it and all the news it generated, from researchers railing (rightly) against overly strict limits on LLM work, which Anthropic then walked back, all the way to the US government banning its use. I deliberately did not, because I believe we will see more plot twists in the coming days and it seems too early to draw conclusions. Meanwhile Pliny (a well-known name in the hacking world) leaked the system prompt, and apparently using that system prompt on Opus yields better results than the vanilla version of Opus 4.8. I have not tried it yet, but it seems like further confirmation that the so-called in-context learning phase can no longer be overlooked: if a well-crafted system prompt shifts the results of an already powerful model like Opus 4.8, then much of the value lies not just in the weights, but in how the harness sets up the context. And that is exactly the thread of this week’s deep dive.
AI enters the lab
Claude predicts NMR spectra matching ChemDraw and MestReNova and proposes molecular structures from spectral data. Astrophysicist Chi-kwan Chan uses Codex to refine simulations of plasma and particles around black holes.
These are both applications of AI to computational scientific research, perhaps the next frontier where we will see agents perform the feats we now see on code. It is no coincidence that coding was the first domain to take off: like code, scientific simulation is an environment where verification, however hard, remains manageable, because a computation either matches the data or it does not. And those who have been reading me for a while will have learned that it is precisely where results are verifiable that agents give their best.


