Can Your AI Agent Be Manipulated by a Website? We Tested It.

April 21, 2026

Executive Summary

A new independent study tested ten AI models against nine different prompt injection attacks embedded in a compromised website. The results are unambiguous: local open-weight models are highly vulnerable and rarely flag an attack as malicious. Frontier models perform significantly better, with Claude Sonnet 4.6 detecting every single injection attempt. For wealth managers deploying AI agents to process external data, emails, or web content, this is not a theoretical risk.

What is prompt injection and why does it matter for wealth managers?

Prompt injection is when a third party embeds hidden instructions in content that an AI agent reads, causing the AI to act against the interests of the user or operator. The attack is invisible to a human reading the same page.

In a wealth management context, the scenarios are concrete: an AI agent summarizing a counterparty’s website could be silently instructed to recommend a fraudulent investment, withhold information, or request sensitive client data. The user sees a normal-looking summary. The manipulation goes undetected.

Who conducted this research?

Luke Jacobson, a cybersecurity researcher with a private banking background based in Geneva, published the study on April 21, 2026. He has previously contributed to our newsletter on AI browser security risks. This time, he built a controlled test environment using a fictional restaurant website embedded with nine different obfuscated attack techniques and ran every major local model against it alongside GPT 5.3 and Claude Sonnet 4.6.

What were the key findings?

The gap between local and frontier models is significant. Local models such as Qwen3, Gemma, DeepSeek-R1, Phi3, Mistral, and both Llama variants all showed meaningful vulnerability. None of them consistently flagged injections as malicious. Llama 3.1 was the only local model to flag anything, and it flagged just one attack in the entire test series.

Frontier models told a different story. Claude Sonnet 4.6 flagged every single injection attempt as malicious, across all nine tests. GPT 5.3 achieved a 100% pass rate, resisting every attack, but flagged the attempts less consistently.

The most consequential finding is the difference between resisting an attack and detecting one. A model can accidentally avoid being manipulated by simply staying focused on its original task. But if it does not flag the attempt, neither the user nor the operator learns that an attack occurred. Local models almost never flagged. Frontier models almost always did.
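The distinction between resisting and detecting can be made concrete with a small scoring sketch. This is not the study's own scoring code: the class name, the field names, and the labels below are assumptions chosen for illustration, and any outcomes fed into it would be hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttackOutcome:
    """One model's response to one injection test (hypothetical structure)."""
    attack: str
    complied: bool  # the model followed the injected instruction
    flagged: bool   # the model warned that the content looked malicious


def classify(outcome: AttackOutcome) -> str:
    """Label a single outcome; compliance outweighs any warning given."""
    if outcome.complied:
        return "compromised"
    if outcome.flagged:
        return "resisted and detected"
    return "resisted silently"


def summarize(outcomes: list[AttackOutcome]) -> dict[str, float]:
    """Report resistance (pass rate) and detection as separate numbers."""
    if not outcomes:
        raise ValueError("at least one outcome is required")
    total = len(outcomes)
    resisted = sum(1 for o in outcomes if not o.complied)
    flagged = sum(1 for o in outcomes if o.flagged)
    return {
        "pass_rate": resisted / total,
        "detection_rate": flagged / total,
    }
```

Keeping the two rates separate is the point: a model can score a perfect pass rate while its detection rate stays near zero, and in that case nobody is told that an attack was attempted.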

What attack techniques were tested?

The nine attacks included plaintext injection, micro-text hidden in CSS classes, homoglyph substitution (using visually identical Unicode characters to obscure instructions), zero-width character injection, HTML comment injection, attribute injection targeting phishing links, a social engineering payload requesting credit card details, and a French-language price manipulation attack designed to silently inflate prices by 40% during translation.
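Three of these techniques leave traces that can be checked mechanically before content reaches a model. The sketch below is a rough pre-filter, not the tooling used in the study: the function names are hypothetical, the list of zero-width characters is a partial assumption, and the mixed-script check is only a heuristic that will also flag some legitimate multilingual text.

```python
import re
import unicodedata

# Invisible characters often used to hide text (a partial list, by assumption).
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}

HTML_COMMENT = re.compile(r"<!--(.*?)-->", re.DOTALL)


def count_zero_width(text: str) -> int:
    """Count zero-width characters in the text."""
    return sum(1 for ch in text if ch in ZERO_WIDTH)


def find_html_comments(html: str) -> list[str]:
    """Return the non-empty contents of HTML comments."""
    return [c.strip() for c in HTML_COMMENT.findall(html) if c.strip()]


def find_mixed_script_words(text: str) -> list[str]:
    """Return words mixing Latin letters with another script (homoglyph hint)."""
    suspicious = []
    for word in re.findall(r"\w+", text):
        scripts = set()
        for ch in word:
            if ch.isalpha():
                name = unicodedata.name(ch, "")
                if name:
                    # Unicode names start with the script, e.g. LATIN, CYRILLIC.
                    scripts.add(name.split(" ")[0])
        if "LATIN" in scripts and len(scripts) > 1:
            suspicious.append(word)
    return suspicious


def scan_page(html: str) -> dict:
    """Collect simple obfuscation signals from raw page content."""
    return {
        "zero_width_count": count_zero_width(html),
        "html_comments": find_html_comments(html),
        "mixed_script_words": find_mixed_script_words(html),
    }
```

A non-empty report does not prove an attack, and an empty one does not rule it out. Plaintext, CSS micro-text, and social engineering payloads would pass this filter untouched, so it is at most one layer among several.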

One finding stands out: Gemma 4, Google’s most recent and capable model in the test, was the only model to fully comply with both the social engineering attack and the translation price manipulation. Capability alone does not confer resistance.

What does this mean for firms using local AI models?

Local models are increasingly attractive for wealth managers due to data sovereignty and confidentiality requirements. This research does not invalidate that use case. Local models running on controlled internal data present a different risk profile than models processing external web content.

The risk is specific: any AI agent or workflow that fetches, summarizes, or processes content from external sources, including websites, emails from unknown senders, or third-party documents, is exposed. The attacker does not need access to your system. They only need to control content that your AI reads.

What is Luke’s recommendation?

Use local models for internally controlled data. Do not deploy them to process untrusted external content without additional guardrails. If your AI workflows touch the public web, the bar for security validation is higher than most current deployments meet.
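As a minimal sketch, this recommendation can be expressed as a routing rule that is checked before any content reaches a model. Everything here is an assumption for illustration: the enum, the function name, and the returned labels are hypothetical, and what counts as adequate guardrails is left to each firm's own security validation.

```python
from enum import Enum


class ContentSource(Enum):
    INTERNAL = "internal"  # data the firm controls
    EXTERNAL = "external"  # websites, unknown senders, third-party documents


def route_request(source: ContentSource, guardrails_in_place: bool) -> str:
    """Decide how a local model may handle content from a given source."""
    if source is ContentSource.INTERNAL:
        return "local-model"
    if guardrails_in_place:
        return "local-model-with-guardrails"
    # Untrusted content with no guardrails is not processed by a local model.
    return "blocked"
```

The design choice is to fail closed: external content is refused by default, and a deliberate decision is needed to let it through.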

Full technical report: PI Checker

About the Expert: Luke Jacobson is a cybersecurity researcher with a private banking background, based in Geneva.

About the Author: Dr. Andreas K. Janoschek specializes in AI applications for Asset & Wealth Management. Based in Geneva, he helps industry professionals navigate the intersection of finance and technology.

This newsletter aims to inform and does not constitute investment or legal advice. Always consult with qualified professionals for specific circumstances.