OpenClaw — the open-source AI agent that hit 280K GitHub stars faster than any repo in history — had 245 vulnerabilities logged in its first three months. Its skill marketplace was 20% malware. Belgium’s national CERT issued an advisory. China restricted it at state banks.
The natural question: was this bad coding, intentional backdoors, or something else?
The security research consensus (Kaspersky, Microsoft, Oasis Security, Cato Networks) lands on architectural negligence. Not malice. The malware came from third-party attackers who flooded the marketplace; the core team patched critical bugs within 24 hours of disclosure. But the architecture made all of it possible.
The lethal trifecta
Simon Willison’s framing captures it cleanly. OpenClaw combined three properties that, individually, are fine — but together are explosive:
- Access to private data (messages, files, credentials)
- Exposure to untrusted content (13,700 community-uploaded skills, web inputs)
- Ability to act externally (send messages across 20+ platforms, execute commands)
Any two of the three are manageable. All three with no circuit breaker is an open door.
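The trifecta is a structural property, so it can be expressed as a structural check. A minimal sketch — the capability flags and class name are hypothetical, not any real audit schema:

```python
from dataclasses import dataclass


@dataclass
class AgentProfile:
    """Structural capability flags for an AI agent (hypothetical names)."""
    reads_private_data: bool        # messages, files, credentials
    ingests_untrusted_content: bool  # community skills, web inputs
    acts_externally: bool           # sends messages, executes commands
    has_circuit_breaker: bool       # human gate, sandbox, or capability limit


def lethal_trifecta(agent: AgentProfile) -> bool:
    """True when all three risky properties combine with no mitigation."""
    trifecta = (agent.reads_private_data
                and agent.ingests_untrusted_content
                and agent.acts_externally)
    return trifecta and not agent.has_circuit_breaker


# OpenClaw as described: all three capabilities, no breaker.
openclaw = AgentProfile(True, True, True, has_circuit_breaker=False)
assert lethal_trifecta(openclaw)
```

Note that flipping any single flag — removing one capability, or adding one mitigation — makes the check pass, which is the whole argument in four lines.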
The admin-level system access was on by default. Credentials sat in plaintext. The skill marketplace had no sandboxing. The local WebSocket gateway could be brute-forced by any website. Each individually fixable. But the architecture assumed trust everywhere, so attackers only needed to find one seam.
The circuit breaker principle
Compare this to Claude Code, which has the same theoretical trifecta — it reads your files, processes web content, and executes shell commands. The difference: a permission prompt sits between intent and action. Every file write, every bash command, every external call requires explicit human approval (unless deliberately bypassed).
One architectural decision — a synchronous human gate — collapses the attack surface from “any of 13,700 untrusted skills can exfiltrate your SSH keys” to “you have to click yes.”
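The gate itself is a few lines of code. This is a sketch of the pattern, not Claude Code's actual implementation — the function names and the approver interface are assumptions:

```python
def default_approver(description: str) -> bool:
    """Prompt a human; only an explicit 'y' counts as approval."""
    return input(f"Allow: {description}? [y/N] ").strip().lower() == "y"


def gated(description: str, action, approver=default_approver):
    """Circuit breaker: run `action` only if the approver says yes.

    `action` is a zero-argument callable with side effects. By default
    a human answers synchronously; denial raises instead of silently
    proceeding.
    """
    if not approver(description):
        raise PermissionError(f"Denied: {description}")
    return action()


# Usage (hypothetical helper names): every external side effect
# passes through the gate before it happens.
# gated("write ~/.ssh/config", lambda: write_file(path, contents))
# gated("run: rm -rf build/", lambda: run_shell("rm -rf build/"))
```

The design point is that the gate sits between intent and action, not after it: the callable is constructed but never invoked until approval arrives, so a malicious skill can request anything but execute nothing.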
This isn’t unique to AI agents. It’s the same principle behind sudo, UAC prompts, and OAuth consent screens. The novelty is that AI agents make the trifecta trivially easy to assemble. A weekend project can ship an agent with full disk access, internet connectivity, and a plugin ecosystem. The security architecture that took operating systems decades to develop gets skipped because the demo looks better without it.
What to evaluate
When assessing any AI agent — for your team, your organisation, or your own stack — the questions aren’t about specific CVEs. They’re structural:
- Does it have the trifecta? Data access + untrusted input + external action capability?
- Where’s the circuit breaker? Human approval? Sandboxing? Capability restrictions? At least one must exist.
- What’s the trust model for extensions? Open marketplace with anonymous uploads, or explicit installation of reviewed packages?
- What runs by default vs. what requires opt-in? Admin access on by default is a design choice, not a bug.
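The four questions above can be folded into a single structural audit. A hedged sketch — the profile schema and finding strings are invented for illustration, not an established assessment format:

```python
def audit_agent(profile: dict) -> list[str]:
    """Map the four structural questions to a list of findings.

    Hypothetical `profile` schema:
      data_access, untrusted_input, external_action -- bool capabilities
      circuit_breakers -- list of mitigations (may be empty)
      extension_model  -- "open-anonymous" or "reviewed"
      defaults         -- capabilities enabled without opt-in
    """
    findings = []
    if (profile["data_access"] and profile["untrusted_input"]
            and profile["external_action"]):
        findings.append("lethal trifecta present")
        if not profile["circuit_breakers"]:
            findings.append("no circuit breaker between intent and action")
    if profile["extension_model"] == "open-anonymous":
        findings.append("unreviewed, anonymous extension uploads")
    if "admin" in profile["defaults"]:
        findings.append("admin-level access on by default")
    return findings


# OpenClaw as described in the advisories: every check fires.
openclaw = {
    "data_access": True, "untrusted_input": True, "external_action": True,
    "circuit_breakers": [], "extension_model": "open-anonymous",
    "defaults": ["admin"],
}
```

Run against the OpenClaw-shaped profile, all four findings fire; a reviewed marketplace or a single approval gate would each remove one line from the report.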
OpenClaw’s crisis wasn’t a failure of implementation. It was a failure to ask these questions before shipping. The code quality was fine. The architecture was the vulnerability.
The uncomfortable implication
The tools that feel most powerful — the ones that “just work” without asking permission, that connect to everything, that let anyone extend them — are the ones most likely to be architecturally unsafe. Friction is a feature. The prompt that slows you down is the same prompt that stops the malicious skill from dumping your keychain.
Every time you bypass a permission gate for convenience, you’re removing a circuit breaker from a system that might have the trifecta. That’s a trade-off worth making consciously, not one that should be the default.