Dia is currently (as of last week) not vulnerable to this kind of exfiltration i...

simonw · 2025-08-09T18:15:37 1754763337

Given how important this problem is to solve, I would advise anyone with a credible solution to shout it from the rooftops and then make a ton of money out of the resulting customers.

benlivengood · 2025-08-09T19:20:42 1754767242

I believe you've covered some working solutions in your presentation. They limit LLMs to providing information/summaries and taking tightly curated actions.

There are currently no fully general solutions to data exfiltration, so things like local agents or computer use/interaction will require new solutions.

Others are also researching in this direction: https://security.googleblog.com/2025/06/mitigating-prompt-in... and https://arxiv.org/html/2506.08837v2, for example. CaMeL was a great paper, but complex.

My personal perspective is that the best we can do is build secure frameworks that LLMs can operate within, carefully controlling their inputs and their interactions with untrusted third-party components. There will not be inherent LLM safety precautions until we are well into superintelligence, and even those may not be applicable across agents with different levels of superintelligence. Deception/prompt injection as offense will always beat defense.
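
To make the "secure framework" idea concrete, here is a minimal sketch of a gate that sits outside the model: it only permits a tightly curated set of actions, and once the agent has read untrusted content it only permits actions with no side effects. Every name here (`ALLOWED_ACTIONS`, `SIDE_EFFECT_FREE`, `Request`, `authorize`) is hypothetical and made up for illustration; it is not taken from Dia, CaMeL, or the linked papers, and a real system would need far more than this.

```python
from dataclasses import dataclass

# Hypothetical curated action list; anything else is rejected outright.
ALLOWED_ACTIONS = {"summarize", "search_docs", "send_email"}

# Hypothetical subset that cannot move data out of the system.
SIDE_EFFECT_FREE = {"summarize"}


@dataclass(frozen=True)
class Request:
    action: str
    argument: str
    saw_untrusted_input: bool  # True once any third-party content entered the context


def authorize(req: Request) -> bool:
    """Decide outside the LLM whether a requested action may run."""
    if req.action not in ALLOWED_ACTIONS:
        return False
    # After exposure to untrusted input, assume the model may be compromised
    # and refuse anything that could carry data out.
    if req.saw_untrusted_input and req.action not in SIDE_EFFECT_FREE:
        return False
    return True
```

The design choice is that the check does not depend on the model noticing an injection: the decision is made from the action name and a taint flag that the framework, not the LLM, maintains.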

simonw · 2025-08-09T19:35:21 1754768121

I loved that Design Patterns for Securing LLM Agents against Prompt Injections paper: https://simonwillison.net/2025/Jun/13/prompt-injection-desig...

I wrote notes on one of the Google papers that blog post references here: https://simonwillison.net/2025/Jun/15/ai-agent-security/

NitpickLawyer · 2025-08-10T09:29:18 1754818158

> CaMeL was a great paper

I've read the CaMeL stuff and it's good, but keep in mind it's just "mitigation", never "prevention".

Terr_ · 2025-08-10T08:35:11 1754814911

Find the smallest secret you can't have stolen, calculate the minimum number of bits to represent it, and block any LLM output that has enough entropy to hold it. :P
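
Taken literally, that rule can be sketched in a few lines. This is a rough illustration under loud assumptions: the "entropy" of an output is replaced by a crude upper bound (length times log2 of an assumed 95-character printable ASCII alphabet), and the function names are invented for this example.

```python
import math


def secret_bits(num_possible_values: int) -> float:
    """Minimum bits needed to identify one secret among equally likely values."""
    return math.log2(num_possible_values)


def capacity_bits(output: str, alphabet_size: int = 95) -> float:
    """Crude upper bound on the information an output string can carry.

    Assumes every character is drawn from `alphabet_size` symbols
    (95 = printable ASCII, an assumption for this sketch).
    """
    return len(output) * math.log2(alphabet_size)


def allow_output(output: str, smallest_secret_values: int) -> bool:
    """Block any output with enough capacity to hold the smallest secret."""
    return capacity_bits(output) < secret_bits(smallest_secret_values)
```

With a 4-digit PIN as the smallest secret (10,000 possible values, about 13.3 bits), `allow_output("ok", 10_000)` passes while `allow_output("yes", 10_000)` is blocked, which shows why the rule leaves almost no usable output.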

saagarjha · 2025-08-09T18:58:53 1754765933

Guys we totally solved security trust me

benlivengood · 2025-08-09T19:21:48 1754767308

I'm out of this game now, and it solved a very particular problem in a very particular way with the current feature set.

See sibling-ish comments for thoughts about what we need for the future.