McKinsey's Lilli (Reportedly) Hacked
An autonomous agent reportedly breached McKinsey’s Lilli in hours, exposing a new weak point inside AI systems: the prompt layer.
The “Lilli hack” story is less about McKinsey and more about your architecture
A blog post published on 9 March 2026 claims an autonomous offensive security agent gained read/write access to McKinsey’s internal AI platform, Lilli, within two hours—without credentials or insider knowledge. (codewall.ai)
If the report is accurate, the uncomfortable takeaway is not “AI is risky”. It’s more specific:
Most enterprise generative AI programmes are securing the model, but under-securing the system.
And in 2026, “the system” is bigger than an LLM endpoint. It’s APIs, storage, identity, retrieval pipelines, prompt configuration, workspaces, plug-ins, and the pathways that move internal documents into embeddings and back out into user-visible answers.
McKinsey has publicly described Lilli as a knowledge and synthesis platform that scans large internal repositories, returns summaries, and links users to relevant content and experts. The CodeWall post alleges that same kind of platform design exposed a classic web-app weakness (SQL injection) and something newer: the prompt layer as a crown-jewel asset.
That combination is the real story.
What the post claims happened (in plain terms)
According to CodeWall, their agent:
Found public API documentation with 200+ endpoints, and identified a subset that did not require authentication.
Exploited a SQL injection where JSON keys (field names) were concatenated into SQL, enabling database access.
Allegedly accessed large volumes of sensitive artefacts (messages, files, user accounts, workspaces), plus system prompts and RAG “document chunks”.
Highlighted a higher-impact possibility: because the database access was read/write, an attacker could potentially rewrite system prompts “silently” via an update statement—changing how the assistant behaves without a code deploy.
Reported a responsible disclosure timeline and claimed unauthenticated endpoints were patched quickly after notification.
Even if you ignore the dramatic numbers and focus on the mechanism, the pattern is familiar:
One “small” unauthenticated endpoint
One “old” bug class
One “new” blast radius because the application is an AI control plane for knowledge work
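To make the bug class concrete, here is a minimal Python sketch of what "JSON keys concatenated into SQL" looks like, and the allowlist fix. This is illustrative only; the table and column names are assumptions, not Lilli's actual code.

```python
# Parameterising VALUES is not enough when field NAMES are interpolated:
# an attacker can smuggle SQL through a JSON key even with placeholders.
import json
import sqlite3

def update_unsafe(conn, payload_json):
    """VULNERABLE: field names from the request are concatenated into SQL."""
    payload = json.loads(payload_json)
    sets = ", ".join(f"{key} = ?" for key in payload)  # attacker controls `key`
    conn.execute(f"UPDATE docs SET {sets} WHERE id = ?", (*payload.values(), 1))

ALLOWED_COLUMNS = {"title", "body"}  # the schema we actually expect

def update_safe(conn, payload_json):
    """Safer: only allowlisted column names ever reach the SQL string."""
    payload = json.loads(payload_json)
    unexpected = set(payload) - ALLOWED_COLUMNS
    if unexpected:
        raise ValueError(f"unexpected fields: {sorted(unexpected)}")
    sets = ", ".join(f"{key} = ?" for key in payload)
    conn.execute(f"UPDATE docs SET {sets} WHERE id = ?", (*payload.values(), 1))
```

With the unsafe version, a request body like `{"title = 'pwned', body": "x"}` rewrites a column the caller was never meant to touch, while every value is still "safely" parameterised.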
The risk: prompt integrity beats prompt safety
Most AI governance programmes obsess over prompt safety:
banned topics
jailbreaks
toxicity
policy refusal behaviour
Those matter, but they assume the prompts are yours.
The CodeWall post points at a different category: prompt integrity—whether the instructions, tool policies, and system routing logic can be altered without detection.
If a prompt is treated like “just configuration”, it often ends up:
stored in a database table
editable through an admin UI
changed without code review
not cryptographically signed
not monitored like production code
That’s fine until the prompt becomes the steering wheel for:
what internal sources are searched
what data can be surfaced
which tools can execute actions
what gets pasted into decks, emails, models, and client deliverables
At that point, prompt integrity is not “AI hygiene”. It is financial controls, reputation risk, and potentially market abuse risk if internal deal information can be surfaced or subtly shaped.
Why autonomous agents change the maths
Traditional attackers are bottlenecked by time and attention. Autonomous agents aren’t.
In the CodeWall narrative, the agent didn’t “run a scanner and stop”. It iterated—probing, learning from error messages, and escalating through multiple steps.
This matters because many enterprise teams still rely on:
periodic pen tests
checklist scanning
“we’ll harden it before broader rollout”
Agents flip this: your attack surface is tested continuously, cheaply, and creatively—including by actors who don’t need elite technical staff.
So the question becomes: are you building an AI platform that assumes clever, persistent, machine-speed enumeration?
The practical fixes (what to do on Monday)
Below is a hands-on control set that maps to how these platforms actually fail.
1) Make “no auth endpoints” a deploy-blocker
Create a release gate that fails builds if:
any endpoint is exposed without authentication (except explicitly allowed health checks)
API documentation is publicly reachable
staging/dev environments are reachable from the open internet
This is boring. That’s the point.
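As a sketch of such a gate, the check below walks an OpenAPI 3.x spec (loaded as a dict) and blocks the release if any operation has no effective security requirement. The allowlist and endpoint names are illustrative assumptions.

```python
# Minimal CI release gate over an OpenAPI 3.x spec.
HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}
ALLOWED_UNAUTHENTICATED = {("/health", "get")}  # explicit, reviewed exceptions

def unauthenticated_endpoints(spec):
    """Return (path, method) pairs whose effective security requirement is empty."""
    default = spec.get("security")  # spec-wide default, if declared
    findings = []
    for path, operations in spec.get("paths", {}).items():
        for method, operation in operations.items():
            if method not in HTTP_METHODS:
                continue
            effective = operation.get("security", default)
            if not effective and (path, method) not in ALLOWED_UNAUTHENTICATED:
                findings.append((path, method))
    return findings

def release_gate(spec):
    """Fail the build if anything reachable without auth slipped in."""
    findings = unauthenticated_endpoints(spec)
    if findings:
        raise SystemExit(f"release blocked: unauthenticated endpoints {findings}")
```

The same gate can carry the other two checks (public API docs, internet-reachable staging) as additional assertions against your deployment manifest.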
2) Treat the AI platform as a tier-0 system
If your genAI tool touches internal research, pricing, pipeline, M&A, or client work, it's not a productivity app. It's effectively:
a knowledge vault
a document router
a synthesis engine
(increasingly) an action engine
Apply the same tiering as finance systems:
strong identity, conditional access, device posture
least-privilege service accounts
production data segregation
full audit logs with anomaly detection
3) Lock down retrieval and storage paths
In RAG systems, the soft underbelly is often:
object storage paths
embedding stores
metadata tables that reveal filenames and internal structure
Assume filenames and folder structures are sensitive. The CodeWall post explicitly claims that filenames alone were high-risk.
Practical move: implement token-scoped, time-limited download URLs and enforce per-user authorisation at download time (not “if you know the link, you can fetch it”).
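A minimal sketch of that pattern, using HMAC over the path, user, and expiry; this stands in for whatever signing scheme and key management you already run, and the key handling here is an assumption.

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode

SIGNING_KEY = b"rotate-me"  # assumption: fetched from a secret store, rotated

def signed_url(path, user_id, ttl_seconds=300, now=None):
    """Mint a time-limited, user-bound download link."""
    expires = int(now if now is not None else time.time()) + ttl_seconds
    message = f"{path}|{user_id}|{expires}".encode()
    sig = hmac.new(SIGNING_KEY, message, hashlib.sha256).hexdigest()
    return f"{path}?{urlencode({'user': user_id, 'expires': expires, 'sig': sig})}"

def verify_download(path, user_id, expires, sig, now=None):
    """Reject expired or tampered links; per-user authorisation still runs after this."""
    if int(now if now is not None else time.time()) > int(expires):
        return False
    message = f"{path}|{user_id}|{int(expires)}".encode()
    expected = hmac.new(SIGNING_KEY, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig)
```

Note the second half of the rule: verification only proves the link is intact and fresh. The per-user authorisation check still runs at download time, so a leaked link is useless to anyone but its intended user.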
4) Separate “chat logs” from “crown jewels”
Many organisations store:
chat messages in plaintext
document references and snippets
user identifiers
workspace and project names
That becomes a map of what the organisation is thinking about right now.
Minimum viable controls:
short retention by default (with explicit legal holds)
encryption at rest and field-level protection for identifiers
strict access policies for analytics queries (yes, even for internal teams)
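The retention rule itself can be as small as this sketch; the message shape and field names are assumptions, not Lilli's schema.

```python
# Default-short retention with explicit legal holds.
from datetime import datetime, timedelta, timezone

DEFAULT_RETENTION = timedelta(days=30)

def purge_candidates(messages, legal_hold_workspaces, now=None):
    """Messages past retention that are not covered by a legal hold."""
    now = now or datetime.now(timezone.utc)
    return [
        m for m in messages
        if now - m["created_at"] > DEFAULT_RETENTION
        and m["workspace"] not in legal_hold_workspaces
    ]
```

The important design choice is the default direction: messages expire unless someone makes a deliberate, auditable decision to hold them, not the other way round.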
5) Introduce prompt integrity controls (this is the new bit)
Do for prompts what you already do for code:
Version control: prompts are pulled from a repo, not edited live in prod
Change approval: mandatory review, named approver, ticket reference
Signing: prompts (or prompt bundles) are cryptographically signed; the runtime verifies the signature
Runtime attestation: the app emits a “prompt hash” with every response for auditing
Integrity monitoring: alerts on drift, unauthorised changes, or hash mismatches
If you only do one thing from this article, do this.
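A minimal sketch of the signing and runtime-verification steps. HMAC stands in here for a real signature scheme (e.g. asymmetric keys held in an HSM), and key handling is assumed.

```python
import hashlib
import hmac
import json

SIGNING_KEY = b"from-your-secret-store"  # assumption: never ships in the repo

def bundle_hash(bundle):
    """Stable hash of the bundle contents (canonical JSON)."""
    return hashlib.sha256(json.dumps(bundle, sort_keys=True).encode()).hexdigest()

def sign_bundle(bundle):
    """Run in CI after review and approval, never in production."""
    digest = bundle_hash(bundle)
    sig = hmac.new(SIGNING_KEY, digest.encode(), hashlib.sha256).hexdigest()
    return {"bundle": bundle, "hash": digest, "sig": sig}

def load_bundle(signed):
    """Runtime: refuse to serve if the prompt fails verification."""
    digest = bundle_hash(signed["bundle"])
    expected = hmac.new(SIGNING_KEY, digest.encode(), hashlib.sha256).hexdigest()
    if digest != signed["hash"] or not hmac.compare_digest(expected, signed["sig"]):
        raise RuntimeError("prompt integrity check failed")
    return signed["bundle"], digest  # emit digest with every response for audit
```

A "silent" database update of the prompt text now fails `load_bundle` at the next request, and the digest attached to each response gives you the attestation trail.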
6) Govern the system, not just the model
OWASP’s LLM Top 10 is useful because it forces teams to look beyond jailbreaks—towards things like insecure output handling and excessive agency.
In practice, governance should cover:
who can create assistants/workspaces
which tools those assistants can call
what data connectors can be attached
how outputs can be exported (copy/paste, file generation, email integration)
The risk is not just what the model says. It’s what the system lets it touch.
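Enforcement can start as a plain allowlist checked at call time. The assistant, tool, and connector names below are hypothetical examples.

```python
# Per-assistant capability allowlists, enforced wherever a tool call,
# connector attach, or export is about to execute.
POLICIES = {
    "research-assistant": {
        "tools": {"search_internal", "summarise"},
        "connectors": {"sharepoint"},
        "export": {"copy"},  # no file generation or email for this assistant
    },
}

def authorise(assistant, kind, name):
    """Raise unless `name` is allowlisted for this assistant and capability kind."""
    allowed = POLICIES.get(assistant, {}).get(kind, set())
    if name not in allowed:
        raise PermissionError(f"{assistant}: {kind} '{name}' not allowlisted")
```

Default-deny is the point: an assistant nobody has written a policy for can touch nothing.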
7) Red-team with agents, not just humans
Human red teams remain essential. But you should add an agent layer that:
continuously enumerates new endpoints
fuzzes inputs and schemas (including JSON keys)
checks auth boundaries and IDOR patterns
tests prompt injection paths via documents and tool outputs
The goal isn’t “AI vs AI theatre”. It’s continuous coverage of a fast-changing surface.
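Two of those checks can be sketched in a few lines. The `fetch` callables are injected (an assumption) so the probes run against any HTTP client, or a test double.

```python
# Toy versions of two agent behaviours: auth-boundary sweeps and IDOR checks.

def probe_auth_boundaries(paths, fetch):
    """Flag documented endpoints that answer 2xx with no credentials attached."""
    return [(p, s) for p, s in ((p, fetch(p)) for p in paths) if 200 <= s < 300]

def probe_idor(resource_template, resource_ids, fetch_as):
    """Basic IDOR check: can a low-privilege user read other users' resources?"""
    hits = []
    for rid in resource_ids:
        if fetch_as("low-priv-user", resource_template.format(id=rid)) == 200:
            hits.append(rid)
    return hits
```

Run continuously against every new deploy, even checks this simple would have flagged the pattern the CodeWall post describes: documented endpoints answering without credentials.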
A simple board-level test: “Can we prove our AI hasn’t been silently steered?”
If your internal AI is used for decisions, you need to answer three questions crisply:
Can we prove which prompt bundle produced a given output?
Can we prove that the bundle wasn’t altered between approval and runtime?
Can we detect subtle behavioural drift within hours, not weeks?
If the honest answer is “not really”, then the prompt layer is currently a control gap—regardless of how good your model filters are.
The lesson
If the CodeWall account is directionally correct, the most important line is implicit:
Enterprise AI platforms are becoming the new operating layer for knowledge work — and they’re being defended like internal wikis.
The fix is not to slow down AI adoption. It’s to upgrade the security posture to match what these systems have become.
Because in 2026, “AI security” is no longer a niche discipline. It’s just enterprise security, pointed at a new control plane.