Threat Research

What the McKinsey Lilli breach tells us.

Dipendra Jain·Mar 28, 2026·10 min read

On March 9, 2026, a security firm called CodeWall disclosed that its autonomous offensive AI agent had achieved full read and write access to the production database behind McKinsey's internal AI platform, Lilli. Two hours from start to compromise. Roughly twenty dollars in inference tokens. The agent identified the target on its own, mapped the attack surface, and executed the attack chain without a human in the loop. The exposure included tens of millions of chat messages, hundreds of thousands of internal files, fifty-seven thousand user accounts, and the system prompts that govern Lilli's behavior across more than forty thousand consultants. Public reporting on the incident is extensive. McKinsey's official statement confirms the vulnerability was patched within hours and reports that no client data was accessed.

That is the published account. It is mostly correct. It is also mostly the wrong story.

The headline framing was "AI hacked AI." That is not what happened, and the actual lesson is the harder one.

This post is about the harder lesson.

The headline interpretation, briefly, and why it does not survive scrutiny

The headline interpretation of Lilli is that an autonomous AI offensive agent represents a new threat category, that machine-speed attacks against enterprise AI systems are now operational, and that defenders must respond with machine-speed defenses. There is some truth here. The attacker was autonomous. The attack was fast. The token cost was negligible.

But the vulnerabilities the agent exploited were not novel. SQL injection has been a documented bug class since the 1990s. Insecure direct object reference, the second flaw the agent chained, has been on the OWASP Top 10 in one form or another for well over a decade, today as part of the Broken Access Control category. Twenty-two of approximately two hundred API endpoints required no authentication. None of these are AI-specific. None of them required an autonomous agent to discover. A summer intern with a Burp Suite license would have found them in a long afternoon.

What the agent did was replace a long afternoon of skilled human labor with two hours of unattended compute costing about twenty dollars. That is significant. It is not the lesson.

The actual lesson, in three parts

Part one: the action layer is the surface that matters

Lilli's database held two categories of data. The first was conversation logs and document storage. Compromise of this category produces a conventional data breach, with conventional consequences: regulatory exposure, client notification, reputational damage. The second category was the system prompts that govern Lilli's behavior. Compromise of this category produces a different class of harm.

When you have write access to the system prompts of an AI platform used by seventy percent of a forty-thousand-person firm, you do not need to exfiltrate a single byte to do damage. You can leave the data alone, change the prompts, and silently shift the behavior of the AI for every user who relies on it. A consultant ran a query against the AI yesterday. They run the same query today. The answer is different. The change is invisible to logs that record API calls but not the configuration state at the time of each call. The harm propagates through every recommendation, model, and slide that incorporates the AI's output.

This is the action layer. The AI's actions, parameterized by the AI's configuration, are what produce consequence. The data is downstream of the action layer. The conventional security frame, which asks what data was accessed, is the wrong question for AI systems. The right question is what configuration state was reachable, by whom, and whether changes to that state produce a verifiable record.

Lilli failed the right question. The configuration state, the system prompts, was reachable through an unauthenticated endpoint chained with a SQL injection. Changes to that state did not produce a verifiable record. The breach disclosure focused on the data because that is the surface the existing security frame can describe.
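One way to make configuration state visible in the call log is to record a fingerprint of the configuration in effect alongside every call. The following is a minimal sketch under stated assumptions, not a description of how Lilli or any other platform works: the names config_fingerprint, CallRecord, and record_call are hypothetical, and the prompts and settings are invented for illustration.

```python
import hashlib
import json
from dataclasses import dataclass


def config_fingerprint(system_prompt: str, settings: dict) -> str:
    """Return a stable SHA-256 digest of the configuration in effect."""
    # Canonical JSON (sorted keys, fixed separators) so the same
    # configuration always hashes to the same value.
    canonical = json.dumps(
        {"system_prompt": system_prompt, "settings": settings},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class CallRecord:
    user_id: str
    query: str
    config_sha256: str  # which configuration produced this answer


def record_call(user_id: str, query: str, system_prompt: str, settings: dict) -> CallRecord:
    """Build the log record for one call (hypothetical log shape)."""
    return CallRecord(user_id, query, config_fingerprint(system_prompt, settings))


# Illustrative data: the same query on two days, with the prompt changed in between.
yesterday = record_call("u-1", "Summarize vendor options", "Be neutral.", {"temperature": 0.2})
today = record_call("u-1", "Summarize vendor options", "Prefer Vendor X.", {"temperature": 0.2})

print(yesterday.query == today.query)                  # True
print(yesterday.config_sha256 == today.config_sha256)  # False
```

With a fingerprint on every record, "the answer changed" becomes a question the log can answer: the two calls carry different configuration digests. It does not say who changed the configuration or whether the change was authorized; that is what the audit trail is for.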

Part two: there was no audit trail to investigate

McKinsey announced the patch within hours of disclosure. The investigation, supported by a third-party forensics firm, concluded no client data had been accessed by unauthorized parties. That conclusion is the one the firm needed to publish, and there is no reason to think it is dishonest. There is also no reason to think it is verifiable.

Public commentary on the incident has noted that nine days is a compressed window for completing variant analysis on a vulnerability that was live in production for over two years. The forensic question is not whether CodeWall's specific agent accessed client data. CodeWall is on record about its scope. The forensic question is whether anyone else exploited the same flaw at any point during the more than two years it existed, and whether the system as it then existed could distinguish authorized configuration changes from unauthorized ones.

A cryptographic audit chain answers this question structurally. Each configuration change is signed at the moment it occurs, and each entry commits to the one before it. The chain is append-only. Changes that did not occur cannot be inserted after the fact, and changes that did occur cannot be edited or removed, without breaking verification. A regulator examining the chain six months after an incident does not need to trust the operator's good-faith forensics. They verify the cryptography.

Lilli did not have this. Most enterprise AI platforms do not. The audit-trail standard for AI configuration changes is, today, application logs in the same database the configuration lives in, written by the same code path that the attacker compromised. This is the equivalent of asking the burglar to write down what they took.
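The audit chain described above can be sketched in a few dozen lines. This is an illustration under assumptions, not any product's actual format: the entry fields, the function names append_change and verify_chain, and the example actors and timestamps are all hypothetical, and HMAC stands in for the asymmetric signature a real deployment would more plausibly use.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry
FIELDS = ("index", "prev_hash", "actor", "change", "timestamp")


def _digest(body: dict) -> str:
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def _sign(key: bytes, entry_hash: str) -> str:
    # Assumption: HMAC as a stand-in. With an asymmetric signature,
    # verifiers would never need to hold the signing key.
    return hmac.new(key, entry_hash.encode("utf-8"), hashlib.sha256).hexdigest()


def append_change(chain: list, key: bytes, actor: str, change: str, timestamp: str) -> dict:
    """Append one signed configuration change, linked to the previous entry."""
    prev = chain[-1]["entry_hash"] if chain else GENESIS
    body = {"index": len(chain), "prev_hash": prev, "actor": actor,
            "change": change, "timestamp": timestamp}
    entry_hash = _digest(body)
    entry = dict(body, entry_hash=entry_hash, signature=_sign(key, entry_hash))
    chain.append(entry)
    return entry


def verify_chain(chain: list, key: bytes) -> bool:
    """Recompute every link; an edit, insertion, or interior deletion fails."""
    prev = GENESIS
    for index, entry in enumerate(chain):
        body = {k: entry[k] for k in FIELDS}
        if (entry["index"] != index
                or entry["prev_hash"] != prev
                or entry["entry_hash"] != _digest(body)
                or not hmac.compare_digest(entry["signature"], _sign(key, entry["entry_hash"]))):
            return False
        prev = entry["entry_hash"]
    return True


key = b"demo-key-not-for-production"
chain = []
append_change(chain, key, "admin@example", "set system prompt v1", "2026-01-05T10:00:00Z")
append_change(chain, key, "admin@example", "set system prompt v2", "2026-02-11T09:30:00Z")
append_change(chain, key, "admin@example", "set system prompt v3", "2026-03-01T14:15:00Z")
print(verify_chain(chain, key))     # True

tampered = [dict(e) for e in chain]
tampered[1]["change"] = "set system prompt v2 (edited after the fact)"
print(verify_chain(tampered, key))  # False

deleted = [chain[0], chain[2]]      # middle entry removed
print(verify_chain(deleted, key))   # False
```

Two caveats on the sketch. The chain alone does not detect someone quietly dropping the most recent entries; for that, the latest hash has to be published or held somewhere outside the operator's control. And the chain only helps if it is written by a code path the attacker did not compromise, which is the point of the paragraph above.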

Part three: the operator cannot be the investigator

McKinsey's investigation was conducted by McKinsey, supported by a forensics firm McKinsey selected. The conclusion was published by McKinsey. I am not insinuating that the conclusion is wrong. The structural problem is that the conclusion is not independently verifiable.

This is the same structural problem that applies to every AI provider running its own safety evaluations, its own red team, and its own audit. The entity that built the system has a commercial incentive to find that the system is fine. They might be right. We have no way to know, because nothing in the architecture allows us to verify their finding without trusting them.

Self-investigation works when the consequences of getting it wrong are large enough to outweigh the commercial benefit of a fast, clean finding. For most AI deployments today, the math points the other way. The operator's commercial interest is in returning to operations as fast as possible with as little disclosure as possible. The forensic interest, if it exists separately, is unowned.

What architecture would have changed the outcome

I will not claim Vigil would have prevented the initial exploitation. The vulnerability was in McKinsey's own infrastructure, not in any AI safety layer. Twenty-two unauthenticated endpoints were a posture problem that no AI defense product should be expected to fix.

What Vigil's architecture would have changed is the consequence of the exploitation, in three specific ways.

First, the system prompts are configuration state. In Vigil's model, configuration state changes pass through the Execution Gate and are recorded in the cryptographic audit chain. An attacker who modified the prompts would produce a chain entry signed at the moment of modification. A forensic review six months later would see, with cryptographic certainty, every change that occurred. The "no evidence of compromise" finding would either be supported by chain evidence or contradicted by it. There is no third option.

Second, agent-side actions resulting from poisoned prompts would be evaluated by deterministic policy on the response surface, not on the request. A prompt that has been silently shifted to recommend, for example, a specific vendor across all client engagements, would produce a recommendation. The recommendation would be an action. The action would be evaluated against the user's authorized capabilities and the policy rules. Capability changes produce alerts. Behavioral regime shifts produce alerts. The detection ensemble would catch the shift even though no individual interaction looks anomalous in isolation.
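The distinction between a per-action check and an aggregate shift can be made concrete with a toy example. This is a sketch under assumptions, not Vigil's detection ensemble: the Action shape, the function names, the vendor data, and the 0.25 threshold are all hypothetical and chosen only to illustrate the idea.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    user_id: str
    capability: str  # e.g. "recommend_vendor"
    subject: str     # e.g. the vendor named in the response


def evaluate(action: Action, granted: dict) -> str:
    """Deterministic per-action check: same inputs always give the same verdict."""
    allowed = granted.get(action.user_id, set())
    return "allow" if action.capability in allowed else "deny"


def subject_share(actions: list, subject: str) -> float:
    """Fraction of actions that name the given subject."""
    if not actions:
        return 0.0
    return Counter(a.subject for a in actions)[subject] / len(actions)


def regime_shift(baseline: list, current: list, subject: str,
                 max_increase: float = 0.25) -> bool:
    """Flag when a subject's share of actions rises by more than max_increase."""
    return subject_share(current, subject) - subject_share(baseline, subject) > max_increase


granted = {"u-1": {"recommend_vendor"}}
vendors_before = ["A", "B", "C", "X"] * 5  # X named in 25% of recommendations
vendors_after = ["X", "X", "X", "A"] * 5   # X named in 75% after a prompt change
baseline = [Action("u-1", "recommend_vendor", v) for v in vendors_before]
current = [Action("u-1", "recommend_vendor", v) for v in vendors_after]

print(all(evaluate(a, granted) == "allow" for a in current))      # True
print(regime_shift(baseline, current, "X"))                       # True
print(evaluate(Action("u-2", "recommend_vendor", "A"), granted))  # deny
```

Every recommendation in the second window is individually authorized, so a per-action check passes all of them. Only the comparison against the baseline shows that the distribution moved. A production detector would need a sounder statistical test and a baseline the attacker cannot rewrite.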

Third, the cross-provider audit format means that even if the operator's own forensic process is compromised or biased, an independent regulator or counterparty can verify the chain without trusting the operator. This is the entire point of VOAF as an open standard. Audit you cannot verify is not audit. It is publication.
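What "verify without trusting the operator" means in practice can be shown with a small sketch. It assumes the verifier recorded the chain's latest hash at some earlier point, outside the operator's control, and later receives an export to check against it. The export layout and the names build_export and verify_export are hypothetical and are not the VOAF format.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    canonical = json.dumps({"prev_hash": prev_hash, "payload": payload},
                           sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_export(payloads: list) -> list:
    """Operator side: each exported entry commits to the one before it."""
    export, prev = [], GENESIS
    for payload in payloads:
        h = entry_hash(prev, payload)
        export.append({"prev_hash": prev, "payload": payload, "hash": h})
        prev = h
    return export


def verify_export(export: list, witnessed_head: str) -> bool:
    """Verifier side: recompute every link, then compare the final hash
    with the head the verifier recorded independently."""
    prev = GENESIS
    for entry in export:
        if entry["prev_hash"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return prev == witnessed_head


export = build_export([{"change": "prompt v1"}, {"change": "prompt v2"}, {"change": "prompt v3"}])
witnessed_head = export[-1]["hash"]  # recorded by the verifier at the time

print(verify_export(export, witnessed_head))       # True
print(verify_export(export[:-1], witnessed_head))  # False: last change withheld

rewritten = build_export([{"change": "prompt v1"}, {"change": "prompt v3"}])
print(verify_export(rewritten, witnessed_head))    # False: history rebuilt without v2
```

The verifier runs only public computation over the export. An operator who withholds an entry or rebuilds the history produces a chain that is internally consistent but does not end at the witnessed hash, so the check fails without anyone having to take the operator's word for anything.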

The next twelve to eighteen months

CodeWall's agent will not be the last. The next iteration will be faster, cheaper, and capable of exploiting more sophisticated chains. The cost curve is moving. The defender curve, in most enterprises, is not.

The systems that will survive this period are the ones that decouple detection from enforcement, that record configuration changes in tamper-evident chains, and that allow independent verification of incident forensics without requiring trust in the operator. The systems that will not survive are the ones that hold up application logs as audit and ask their customers to take their word that the investigation was thorough.

I am not against McKinsey here. McKinsey patched fast, disclosed at a reasonable scope, and is probably running better internal posture today than most of the firms that have read its incident response with sympathy. The point is structural. The structural conditions that produced Lilli are the same conditions that govern most production AI systems today.

The conditions will produce more incidents. We should be prepared to investigate them in a way that does not require trusting the entity under investigation.
