For most of its short public life, prompt injection was treated as a curiosity — a clever parlour trick that researchers demonstrated on chatbots to make them say inappropriate things. That framing is now dangerously out of date. As organisations deploy agentic AI systems such as Claude Opus and GPT-4o with access to APIs, internal codebases, email inboxes, and third-party data sources, the attack surface has grown from a single conversation window into an entire automated workflow. Attackers no longer need to sit in front of a model; they simply need to place malicious instructions somewhere the agent will read them.
The consequences are no longer embarrassing outputs — they are silent, multi-step actions taken on behalf of your organisation before a human ever reviews the result. For senior decision-makers and technical leads, this is precisely the moment to understand what is happening, why it matters, and what practical controls are available.
From Chatbot Quirk to Supply Chain Vector
Classic prompt injection exploited the fact that large language models struggle to reliably distinguish between instructions from a trusted operator and content from an untrusted user. In a simple chatbot, the worst realistic outcome was a model ignoring its system prompt. In an agentic context, the stakes are categorically different. An autonomous agent tasked with summarising supplier invoices, triaging support tickets, or reviewing pull requests will ingest content from the world at large — PDFs, web pages, database records, calendar entries — and act on what it finds.
Attackers have recognised this. There are now documented cases of malicious instructions embedded invisibly in web pages (using white text on a white background), hidden in PDF metadata, and buried in data fields that agents query via API. When the agent reads that content, it may follow the embedded instruction rather than its original task — exfiltrating data, modifying a record, sending an email, or approving a transaction. Because the action originates from a trusted automated process, it bypasses the human scrutiny that would normally catch anomalous behaviour. This is, in every meaningful sense, a supply chain attack: the compromise enters through content rather than code.
Why Agentic Architecture Amplifies the Risk
The risk is not simply that AI models are gullible — it is that the architectural patterns organisations are adopting create systemic vulnerabilities. Multi-agent pipelines, where one model orchestrates several specialist sub-agents, mean that a single injected instruction can propagate across an entire workflow. An instruction injected into a document processed by an ingestion agent may be faithfully passed downstream to an execution agent with write access to your CRM or cloud infrastructure. The principle of least privilege, a cornerstone of sound security architecture, is routinely ignored when teams provision tool access for AI agents, often because the full capability of the agent is not well understood at deployment time.
Retrieval-augmented generation (RAG) systems introduce a further dimension: the vector store or document index that the agent queries becomes a high-value target. If an attacker can insert a poisoned document into a knowledge base — through a compromised supplier upload, a misconfigured ingestion pipeline, or even a malicious insider — that document can shape the behaviour of every agent query that retrieves it. Unlike a traditional database injection attack, there is no syntax to sanitise and no schema to validate against. The attack surface is semantic, which makes automated detection genuinely hard.
What Good Governance Looks Like in Practice
The security community has begun to converge on a set of architectural and operational controls, even if no single standard yet exists. The most important shift is conceptual: treat all content ingested by an agent as untrusted input, regardless of its apparent source. This means maintaining a strict separation between the instruction plane (system prompts and operator configurations) and the data plane (content the agent reads), and designing agents that cannot be instructed to override their operating constraints by content encountered mid-task. In practice, this often requires custom prompt engineering and explicit output validation layers that check agent actions against a defined policy before execution.
At the infrastructure level, the principle of least privilege must be applied rigorously. Each agent should hold only the permissions necessary for its specific task, scoped to the narrowest possible set of resources. Logging and observability are non-negotiable: every tool call, API request, and data write made by an agent should be recorded with sufficient context to reconstruct the chain of reasoning that led to it. Anomaly detection on agent behaviour — flagging unexpected sequences of actions or out-of-pattern API calls — provides a meaningful second line of defence. Human-in-the-loop checkpoints for high-stakes actions (financial transactions, external communications, code deployments) remain the most reliable safeguard currently available.
Evaluating Your Exposure Before You Deploy
Red-teaming agentic systems for prompt injection is a distinct discipline from standard penetration testing, and most organisations have not yet invested in it. Specialist adversarial evaluation — where testers attempt to inject malicious instructions through every data source the agent touches — should be a mandatory gate before any agentic workflow goes into production. This includes testing indirect vectors: what happens if a supplier sends a maliciously crafted invoice? What if a support ticket contains hidden instructions? What if a web page the agent browses has been compromised?
Vendor due diligence matters here too. If your agentic AI solution is built on a third-party platform or model API, you need clear answers on how the provider isolates instruction context from data context, what logging is available, and what their own security testing covers. This is an area where procurement teams and security architects need to work closely together, applying the same rigour to AI system suppliers that you would to any other critical software vendor in your supply chain.
The organisations that will navigate this well are not those that avoid agentic AI — the productivity and capability advantages are too significant to ignore — but those that build security considerations into the design of these systems from the outset rather than retrofitting controls after an incident. Prompt injection is not a problem that model providers will solve entirely on your behalf; the architecture you deploy, the permissions you grant, and the oversight mechanisms you establish are all within your control.
At iCentric, we work with organisations designing and deploying bespoke AI-integrated systems, and we see this gap between capability and security maturity closing — but not quickly enough. If your organisation is building or evaluating agentic AI workflows, now is the right time to stress-test your assumptions. The threat is real, it is evolving, and the window to get ahead of it is narrowing.
What is the difference between prompt injection and indirect prompt injection?
Direct prompt injection involves a user deliberately crafting malicious input into a model's interface. Indirect prompt injection occurs when malicious instructions are embedded in external content — a document, web page, or database record — that an AI agent reads autonomously during a task. Indirect injection is more dangerous in agentic contexts because it requires no direct access to the model or its interface.
Which types of UK organisations are most exposed to this risk?
Any organisation deploying AI agents with access to external data sources is exposed, but the highest-risk sectors are those with complex supplier data flows, high volumes of inbound documents, or regulated outputs — including financial services, legal, healthcare, and professional services firms. Organisations using RAG-based internal knowledge tools are also particularly vulnerable if ingestion pipelines are not well governed.
Can current large language models be made immune to prompt injection?
No current model offers immunity to prompt injection — it is an inherent challenge arising from how LLMs process and prioritise text. Anthropic, OpenAI, and others are researching architectural mitigations, but the consensus is that robust defences require system-level controls, not model-level fixes alone. Treating the model as one layer in a broader security architecture is currently the most reliable approach.
How does prompt injection differ from traditional SQL injection or code injection?
Traditional injection attacks exploit predictable syntactic parsing rules, which means defences like input sanitisation and parameterised queries are highly effective. Prompt injection is semantic — there is no fixed syntax to strip out, and the boundary between instruction and data is determined probabilistically by the model. This makes automated filtering significantly harder and means defensive approaches must be architectural rather than purely syntactic.
What logging should we require from an agentic AI platform before procuring it?
You should require full audit trails of every tool call, API request, and data access the agent makes, timestamped and attributable to a specific task or session. Logs should capture the agent's reasoning or plan steps where available, not just the final action. You also need the ability to export logs to your own SIEM or security tooling, rather than relying solely on the vendor's dashboard.
Is red-teaming for prompt injection something an internal security team can do, or does it require specialist expertise?
While internal teams can perform basic adversarial testing, thorough red-teaming of agentic systems for prompt injection requires specialist expertise in both AI system behaviour and offensive security techniques. The range of indirect injection vectors — document metadata, API responses, RAG retrieval, inter-agent messaging — is broad enough that dedicated AI security specialists will identify vulnerabilities that general penetration testers are likely to miss.
How should we handle third-party documents ingested by an AI agent — for example, supplier invoices or client-submitted files?
Third-party documents should be treated as fully untrusted input. Before ingestion, strip unnecessary metadata, convert documents to plain structured formats where possible, and pass content through a validation layer that checks for anomalous instruction patterns. Critically, ensure that the agent processing these documents operates with minimal permissions and cannot take irreversible actions — such as financial approvals or external communications — without a human checkpoint.
What is a human-in-the-loop checkpoint and when is it appropriate?
A human-in-the-loop checkpoint is a designed pause in an automated workflow where a person reviews and approves an action before it is executed. They are appropriate for any action that is high-stakes, irreversible, or involves external parties — such as sending emails, executing financial transactions, modifying production data, or deploying code. As confidence in an agent's behaviour and your detection controls grows, some checkpoints can be relaxed, but they should be the default starting position.
Does the UK ICO or any regulatory body have specific guidance on agentic AI security?
As of now, the ICO has published guidance on AI and data protection more broadly, including considerations around automated decision-making under UK GDPR, but specific regulatory guidance on agentic AI security is still emerging. Organisations in regulated sectors should engage with their sector regulator proactively — the FCA, for example, has signalled increasing interest in AI governance — and document their risk assessment and control framework as evidence of due diligence.
How do we assess whether a vendor's agentic AI platform has adequate security controls?
Ask vendors for documentation on how they separate instruction context from data context within their architecture, what adversarial testing they conduct and how frequently, and whether they have achieved any third-party security certifications relevant to AI systems. Request their incident response process for AI-specific security events and check whether their logging capabilities meet your own compliance requirements. Treat this evaluation with the same rigour you would apply to any critical SaaS supplier.
More from iCentric Insights
View allGet in touch today
Book a call at a time to suit you, or fill out our enquiry form or get in touch using the contact details below