Inside agent sandboxes: iOS-style permissions for AI
ClawHub shipped 800+ malicious skills. The reason is not bad review. It's that the underlying model has no permission boundary. Here's how Prix ports the iOS sandbox contract to agents.
When Apple shipped the App Store in 2008, the hardest problem was not the storefront. It was the sandbox. Every app ran in a jailed process, declared what it needed in advance, and had to ask permission for anything sensitive.
Nearly two decades later, that architecture is the reason you can install a random app on your phone without thinking about whether it will read your keychain or scrape your contacts.
The agent platforms shipping in 2026 are at the equivalent of 2008. Most of them are skipping the sandbox.
What ClawHub showed us
In February 2026, The Register documented over 800 malicious skills on ClawHub. Data exfiltration, credential theft, prompt injection payloads, and a long tail of "free utility" skills that quietly mirrored everything they touched to a third-party endpoint.
The failure was not that ClawHub didn't review submissions. It was that the underlying model had no permission boundary. When you installed a skill, it inherited the full capability of whatever process it ran in — filesystem, network, credentials, the lot. "Install" meant "trust completely."
This is the 1990s desktop-software threat model. The one where "just run this .exe" was a credible attack vector. Most agent marketplaces are shipping it again.
The capability stack, drawn
The iOS model, ported to agents
Prix is built on a different contract. Every agent declares its permissions in a manifest, before it is signed. Those declarations become the sandbox.
name: researcher
permissions:
  network:
    - api.openai.com
    - api.anthropic.com
  tools:
    - web_search
    - artifacts
  filesystem: none
  shell: false
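To make the deny-by-default reading of that manifest concrete, here is a minimal Python sketch of how a loader might turn it into an immutable permission set. The Manifest class, the load_manifest function, and the rule that unknown keys are rejected are all assumptions made for illustration, not Prix's actual loader. The manifest is passed in as an already-parsed dict so the sketch needs no YAML library.

```python
from dataclasses import dataclass

# Hypothetical set of permission keys, taken from the example manifest above.
ALLOWED_KEYS = {"network", "tools", "filesystem", "shell"}


@dataclass(frozen=True)
class Manifest:
    """Illustrative in-memory form of a manifest; every default is 'no access'."""
    name: str
    network: frozenset = frozenset()   # hostnames the agent may reach
    tools: frozenset = frozenset()     # tool names the agent may resolve
    filesystem: str = "none"
    shell: bool = False


def load_manifest(raw: dict) -> Manifest:
    """Build a Manifest from a parsed mapping, refusing anything unexpected."""
    perms = raw.get("permissions", {})
    unknown = set(perms) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"unknown permission keys: {sorted(unknown)}")
    shell = perms.get("shell", False)
    if not isinstance(shell, bool):
        raise ValueError("shell must be true or false")
    return Manifest(
        name=raw["name"],
        network=frozenset(perms.get("network", [])),
        tools=frozenset(perms.get("tools", [])),
        filesystem=str(perms.get("filesystem", "none")),
        shell=shell,
    )


researcher = load_manifest({
    "name": "researcher",
    "permissions": {
        "network": ["api.openai.com", "api.anthropic.com"],
        "tools": ["web_search", "artifacts"],
        "filesystem": "none",
        "shell": False,
    },
})
```

Anything the manifest leaves out falls back to the empty default, so an agent that declares nothing gets nothing.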
Three things happen at install time:
- The user sees every capability the agent will have, before it runs.
- The runtime enforces those permissions at the syscall boundary, not at the API layer. An agent that did not declare shell cannot call exec, even if the model decides it wants to. An agent that did not declare a domain cannot reach it, even through a redirect.
- The signature locks it. Permissions are part of what gets signed at publish. Change the manifest, break the signature, refuse to load.
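The "change the manifest, break the signature, refuse to load" step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Prix's signing scheme: it uses an HMAC with a shared demo key as a stand-in for a real publisher signature (a production system would more likely use public-key signatures), and the function names and the canonical-JSON encoding are hypothetical.

```python
import hashlib
import hmac
import json


def canonical_bytes(manifest: dict) -> bytes:
    """Serialize deterministically so the same manifest always signs the same."""
    return json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(manifest: dict, key: bytes) -> str:
    """Stand-in for the publish-time signature (HMAC-SHA256, for illustration)."""
    return hmac.new(key, canonical_bytes(manifest), hashlib.sha256).hexdigest()


def verify_or_refuse(manifest: dict, signature: str, key: bytes) -> dict:
    """Return the manifest only if it still matches what was signed."""
    if not hmac.compare_digest(sign(manifest, key), signature):
        raise PermissionError("manifest does not match signature; refusing to load")
    return manifest


PUBLISH_KEY = b"demo-key-not-a-real-secret"  # assumption: illustrative key only
manifest = {
    "name": "researcher",
    "permissions": {"network": ["api.openai.com"], "shell": False},
}
signature = sign(manifest, PUBLISH_KEY)

# Editing a permission after publish invalidates the signature.
tampered = json.loads(json.dumps(manifest))  # deep copy via JSON round trip
tampered["permissions"]["shell"] = True
```

Calling verify_or_refuse(manifest, signature, PUBLISH_KEY) returns the manifest, while the same call with tampered raises PermissionError, because the permissions are part of the signed bytes.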
This is not a policy engine. There is no runtime bypass, no "request permission at runtime" flow, no emergency escape hatch. The boundary is hard, and the boundary is the manifest.
Why the manifest matters more than the model
Most agent-security discussion is about prompt injection. How do you stop a malicious input from making the agent do bad things?
That is the wrong layer to defend at. Prompt injection is a real problem, but trying to defend against it inside the model is like defending against SQL injection by asking the database to be smarter. The model is not the security boundary. The sandbox is.
If an agent cannot rm -rf because the sandbox does not expose shell, it does not matter what a malicious prompt tells it. The attack surface is the manifest, not the prompt.
This is why declared permissions are non-negotiable on Prix. Every agent has them. Every capability is enforced at the runtime layer.
What the runtime actually does
Under the hood, the Prix runtime wraps every invocation in:
- A network filter that drops connections to any domain not in the manifest. fetch("api.evil.com") returns a DNS failure, not a helpful error.
- A tool registry that only resolves the tools the manifest declared. Anything else is unresolvable: the LLM does not even see that the tool exists.
- A filesystem jail that maps the agent's view of disk to a sandbox directory. fs.readFile("/etc/passwd") resolves to a path that does not exist.
- A shell policy that returns operation not permitted if the agent tries to spawn a process without shell: true.
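The first two wrappers in that list, the network filter and the tool registry, come down to allowlist lookups. The Python sketch below models only the decision logic. The class names are hypothetical, and real enforcement as described in this post happens below the agent, not in application code like this.

```python
from urllib.parse import urlsplit


class NetworkFilter:
    """Exact-match host allowlist (illustrative; not the Prix implementation)."""

    def __init__(self, allowed_hosts):
        self._allowed = frozenset(h.lower() for h in allowed_hosts)

    def allows(self, url: str) -> bool:
        # A real filter would run this check on every redirect hop as well.
        host = urlsplit(url).hostname
        return host is not None and host.lower() in self._allowed


class ToolRegistry:
    """Holds only the declared tools, so undeclared ones cannot even be listed."""

    def __init__(self, declared, available):
        self._tools = {n: fn for n, fn in available.items() if n in declared}

    def visible(self):
        return sorted(self._tools)

    def resolve(self, name):
        if name not in self._tools:
            raise LookupError(f"unknown tool: {name}")
        return self._tools[name]


net = NetworkFilter(["api.openai.com", "api.anthropic.com"])

# Hypothetical platform tools; only the two declared ones survive registration.
available = {
    "web_search": lambda query: f"results for {query}",
    "artifacts": lambda name: f"artifact {name}",
    "shell_exec": lambda cmd: f"ran {cmd}",
}
registry = ToolRegistry({"web_search", "artifacts"}, available)
```

Exact matching matters here: a lookalike host such as api.openai.com.evil.com is not in the set, so it is rejected just like any other undeclared domain.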
None of this is visible in the agent's code. The author writes Python or YAML. The sandbox wraps it.
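As a rough picture of what "the sandbox wraps it" could mean for the filesystem jail and the shell policy, here is a toy Python model. The Sandbox class, its method names, and the /sandbox/researcher root are assumptions for illustration. A real runtime would enforce this below the agent process, and the spawn method here only echoes its arguments instead of starting anything.

```python
import posixpath


class Sandbox:
    """Toy model of the wrapper; names and layout are illustrative only."""

    def __init__(self, root: str, shell: bool = False):
        self.root = root
        self.shell = shell

    def jail_path(self, requested: str) -> str:
        # Normalize against "/" first so ".." can never climb above the jail root.
        normalized = posixpath.normpath("/" + requested)
        return posixpath.join(self.root, normalized.lstrip("/"))

    def spawn(self, argv: list) -> list:
        # Deny by default: only a manifest with shell: true gets past this check.
        if not self.shell:
            raise PermissionError("operation not permitted")
        # A real runtime would start the process here; the sketch just echoes argv.
        return list(argv)


box = Sandbox(root="/sandbox/researcher", shell=False)
print(box.jail_path("/etc/passwd"))       # /sandbox/researcher/etc/passwd
print(box.jail_path("../../etc/passwd"))  # /sandbox/researcher/etc/passwd
```

Both requests land inside the sandbox directory, where no such file exists, and box.spawn(["rm", "-rf", "/"]) raises PermissionError because the manifest never declared shell.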
The trust model the marketplace needs
A marketplace is only worth as much as the trust it creates. ClawHub's problem is not the 800 bad skills. It is that users cannot tell them apart from the good ones, because the capability model gives both the same level of access.
Prix inverts this. Every agent, good or bad, runs inside the same capability boundary. The user approves what they are comfortable with at install. A malicious agent with broad permissions gets rejected. A malicious agent with narrow permissions cannot do much even if it tries.
This is how you make installing AI software safe for people who are not going to read the source code. It is also how "install an agent" becomes a thing non-technical users do in a marketplace without the equivalent of running random .exe files from 1998.
What this costs you as a developer
Permission declarations feel like friction the first time you write them. You have to think about what your agent actually needs, instead of having root access to everything.
That is the point.
The first time you write network: [api.stripe.com] and realize your agent was quietly calling three other domains you never intended, you have found a real bug. The manifest forces you to audit your own design.
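One way to run that audit before publishing is to compare the hosts your agent actually contacted against the ones you declared. The sketch below assumes you already have a list of request URLs from your own logs. The undeclared_hosts function and the example.* hostnames are hypothetical.

```python
from urllib.parse import urlsplit


def undeclared_hosts(declared, observed_urls):
    """Return the sorted hosts that were contacted but never declared."""
    allowed = {h.lower() for h in declared}
    seen = set()
    for url in observed_urls:
        host = urlsplit(url).hostname
        if host is not None:
            seen.add(host)
    return sorted(seen - allowed)


# Hypothetical request log from a local test run of the agent.
observed = [
    "https://api.stripe.com/v1/charges",
    "https://telemetry.example.com/ping",
    "https://cdn.example.net/lib.js",
    "https://api.stripe.com/v1/refunds",
    "https://logs.example.org/ingest",
]

surprises = undeclared_hosts(["api.stripe.com"], observed)
print(surprises)  # ['cdn.example.net', 'logs.example.org', 'telemetry.example.com']
```

Each host in the result is a decision to make: declare it in the manifest, or remove the call.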
This is the same trade-off iOS developers made in 2008. Slower to ship. Drastically safer to install. Nearly two decades of App Store data say users choose the safer platform every time, even when they cannot articulate why.
The agent marketplace that wins 2027 will be the one that is safe to install things from. Not the one with the most things to install.