Prompt Injection Prevention Checklist: Defenses for RAG, Agents, and Tool-Using Apps
securityprompt-injectionragagentstool-callingllm-securitychecklist

Prompt Injection Prevention Checklist: Defenses for RAG, Agents, and Tool-Using Apps

NNewData Editorial
2026-06-13
10 min read

A reusable prompt injection prevention checklist for securing RAG systems, agents, and tool-using LLM apps before launch and after workflow changes.

Prompt injection is one of the easiest ways for an otherwise capable AI application to behave unsafely, leak data, misuse tools, or ignore its intended rules. This checklist is designed for teams building retrieval-augmented generation systems, agentic workflows, and tool-using LLM apps that need practical defenses rather than vague warnings. Use it before launch, during major workflow changes, and whenever you add a new data source, model, tool, or integration.

Overview

The core problem is simple: an LLM often treats text as instructions, even when that text comes from places you do not fully control. In a basic chatbot, that might mean a malicious user message. In a RAG pipeline, it can also mean a poisoned document, web page, support ticket, code comment, PDF, or CRM note. In an agent, it can mean a tool result, browser content, or an intermediate step generated by another model.

That is why prompt injection prevention is not only a prompt engineering task. It is an application security task that spans system design, retrieval, tool calling security, output validation, observability, and testing. Good prompts help, but prompts alone are not a sufficient boundary.

A useful mental model is to separate your application into three trust zones:

  • Trusted instructions: your system prompt, developer policies, tool schemas, allowlists, and backend business rules.
  • Untrusted content: user input, retrieved documents, web results, uploaded files, emails, logs, and third-party API responses.
  • Sensitive actions: sending emails, querying internal systems, making purchases, writing data, invoking code, or revealing privileged information.

Your goal is not to make the model magically understand security. Your goal is to design the application so untrusted content cannot easily override trusted instructions or trigger sensitive actions without separate controls.

As a starting point, keep these baseline principles in place for any LLM app development project:

  • Treat all external text as untrusted, even if it comes from internal systems.
  • Do not let the model decide its own permissions.
  • Separate instruction channels from data channels wherever possible.
  • Constrain outputs with schemas and validators when using structured output JSON.
  • Require deterministic checks before executing high-risk tool calls.
  • Log enough context to investigate failed or suspicious runs.
  • Test with adversarial examples, not only happy-path prompts.

If you need a companion read for reliable schemas and constrained responses, see Structured Output from LLMs: JSON Mode, Schemas, and Validation Strategies That Actually Work.

Checklist by scenario

Use the lists below as a scenario-based review before shipping or expanding an LLM feature.

1) Base chatbot or assistant

Even without retrieval or tools, a model can still be manipulated into ignoring instructions or exposing hidden context. Minimum defenses include:

  • Keep secrets out of prompts. Do not place API keys, private credentials, or sensitive raw records in system prompts or hidden conversation state.
  • Use clear instruction hierarchy. State that user content must not override system or developer rules, but do not rely on that statement alone.
  • Define refusal conditions. Explicitly mark disallowed content and actions so the model has less room to improvise.
  • Constrain response shape. Where possible, require structured output JSON for routing, moderation, classification, or task selection.
  • Filter and validate outputs. Check for prohibited fields, policy violations, or impossible values before the response is shown or acted on.
  • Maintain audit logs. Store prompt versions, model versions, inputs, outputs, and validation failures for later analysis.

2) RAG applications

RAG systems increase exposure because retrieved text often contains hidden instructions, misleading context, or irrelevant content that competes with your system prompt. A practical RAG security checklist should include:

  • Label retrieved content as data, not instructions. Wrap documents in a format that makes their role explicit, such as “reference material” or “source excerpt.”
  • Strip or transform dangerous patterns where appropriate. If you ingest HTML, markdown, or documents with embedded prompts, consider sanitization or normalization before indexing.
  • Restrict retrieval scope. Use metadata filtering, tenant isolation, and document-level access controls so the model cannot retrieve content the user should not see.
  • Limit retrieved context. Smaller, more relevant chunks reduce the attack surface and lower the chance that malicious instructions dominate the prompt window.
  • Prefer citation-oriented generation. Ask the model to answer from retrieved evidence and identify which chunks support the answer.
  • Detect instruction-like document content. Flag phrases such as “ignore previous instructions,” “reveal system prompt,” or “you are ChatGPT” during ingestion and evaluation.
  • Test retrieval poisoning scenarios. Include adversarial documents in staging datasets and confirm the app still follows application rules.
  • Separate retrieval from authorization. Retrieval relevance does not equal permission. Enforce access controls outside the model.

For adjacent implementation choices, see How to Choose an Embedding Model for Search, Clustering, and RAG, Best Vector Databases for RAG: Performance, Filtering, and Cost Comparison, and How to Reduce Hallucinations in RAG Applications: A Practical Debugging Checklist.

3) Tool-using assistants

Tool calling security matters because prompt injection becomes materially worse once the model can take actions. The key question is not whether the model can call a tool, but what must happen before the call is allowed to execute.

  • Define least-privilege tools. Expose narrow, task-specific functions instead of broad administrative actions.
  • Use strict schemas. Validate tool arguments against types, enums, ranges, required fields, and allowed formats.
  • Apply server-side policy checks. Never trust the model to self-enforce permissions, rate limits, or business rules.
  • Gate high-risk actions with confirmation. Require a separate approval step for external communication, money movement, data deletion, or sensitive record access.
  • Bind tools to user identity and role. The model should not inherit elevated backend privileges that exceed the requesting user.
  • Inspect tool inputs and outputs. A malicious tool result can become the next prompt injection payload if it is fed back into the model unchecked.
  • Use idempotency and rate controls. Prevent repeated executions if the model loops or retries aggressively.
  • Create deny-by-default execution paths. If a tool call fails validation, the application should stop safely rather than try to recover creatively.

4) Agents and multi-step workflows

Agents compound risk because they reason over intermediate state, tool outputs, retrieved data, and evolving plans. In agent security best practices, the objective is to narrow autonomy and make state transitions inspectable.

  • Cap the number of steps. Unlimited loops make abuse and drift harder to control.
  • Track explicit state. Store goals, selected tools, intermediate results, and approval checkpoints outside the model when possible.
  • Separate planning from execution. Let the model propose actions, but run policy and authorization checks before execution.
  • Use tool allowlists per workflow. An agent for document summarization should not have access to billing, messaging, or code execution tools.
  • Red-team cross-step attacks. Test whether a malicious artifact in step one can manipulate step three or five.
  • Watch for context contamination. Summaries of earlier steps can carry forward injected instructions unless they are sanitized.
  • Log decision traces in a privacy-safe way. You need enough evidence to understand why an agent attempted a sensitive action.

5) Browser, web, and external content agents

Systems that browse websites, read inboxes, or process third-party documents should assume every external page may contain adversarial instructions.

  • Treat web content as hostile by default. Visible text, hidden text, metadata, and rendered page fragments can all be attack carriers.
  • Reduce the model’s direct exposure. Extract only the fields needed for the task instead of passing full pages whenever possible.
  • Sanitize formats before prompting. Convert HTML, scripts, styles, and complex markup into a safer intermediate representation.
  • Isolate browser sessions. Avoid unnecessary cookie sharing, privileged session reuse, or cross-tenant leakage.
  • Disable dangerous side effects unless essential. Reading is safer than clicking, and clicking is safer than submitting forms.
  • Require explicit approval for outbound actions. Never let page content directly trigger purchases, messages, or account changes.

6) Enterprise and internal knowledge assistants

Internal apps are often treated as safer, but they can be more sensitive because they touch documents, tickets, source code, and business systems.

  • Assume internal content can still be malicious or careless. An accidentally copied prompt injection string is enough to create trouble.
  • Enforce document permissions before retrieval. The model should only receive content the user is allowed to access.
  • Segment data by team, customer, region, or environment. Retrieval and tool access should mirror real organizational boundaries.
  • Mask sensitive fields where possible. If the task does not require full values, provide partial values or abstracted records.
  • Monitor attempted access escalation. Prompts asking for hidden instructions, raw credentials, or unrelated departments’ records should be flagged.

What to double-check

Before release, review these areas that teams often miss even after they have written good prompts.

Prompt architecture

  • Are system instructions concise, prioritized, and free of conflicting rules?
  • Is untrusted content clearly delimited from instructions?
  • Have you avoided placing sensitive business logic only in natural language when a code-level rule would be stronger?

Model I/O controls

  • Do you validate model outputs before they trigger downstream workflows?
  • Are you using schemas for structured output JSON rather than regex-only parsing?
  • Do tool calls fail closed when arguments are missing or malformed?

Retrieval and data pipeline

  • Can users retrieve only documents they are authorized to view?
  • Do ingestion jobs normalize risky formats and capture metadata useful for filtering?
  • Have you tested with adversarial chunks, poisoned summaries, and noisy source material?

Observability and evaluation

  • Can you inspect which retrieved chunks, tool outputs, or user inputs were present during a risky response?
  • Do you track prompt versions and model versions so regressions are explainable?
  • Do you run security-focused evaluations alongside quality evaluations?

This is where a prompt testing framework becomes useful. You are not only testing answer quality; you are testing whether your app resists manipulation under realistic conditions. Useful references include Prompt Testing Frameworks Compared: LangSmith, Promptfoo, TruLens, DeepEval, and More, Prompt Testing Frameworks for LLM Apps: Features, Tradeoffs, and How to Choose, and Best LLM Evaluation Tools for Developers: Features, Pricing, and When to Use Each.

Operational safeguards

  • Do approval flows exist for sensitive operations?
  • Can you disable a tool, prompt, model, or data source quickly if a new attack pattern appears?
  • Are caches reviewed for leakage risks, especially if semantic or retrieval caches are shared across users?

On that last point, caching can quietly expand exposure if not partitioned carefully. See LLM Caching Strategies: When Semantic Cache, Response Cache, or Retrieval Cache Makes Sense.

Common mistakes

Many prompt injection defenses fail not because teams ignore security, but because they overestimate what prompt wording can do. These are the most common failure patterns.

  • Relying on one sentence in the system prompt. Telling the model to ignore malicious instructions is helpful, but it is not a security boundary.
  • Trusting retrieved content because it came from your own corpus. Internal sources can still contain copied instructions, stale data, or user-generated content.
  • Giving tools broad permissions for convenience. A single “do everything” tool is difficult to secure and difficult to audit.
  • Skipping server-side validation. If the backend executes model outputs directly, prompt injection becomes much more dangerous.
  • Confusing relevance with authorization. The fact that a document matches the query does not mean the user should see it.
  • Ignoring tool output as an attack source. Third-party APIs, browser results, and plugins can all return content that manipulates the next prompt.
  • Testing only clean prompts. If your eval set contains no hostile inputs, your confidence will be misleading.
  • Leaving no rollback path. You should be able to disable risky workflows quickly without rebuilding the entire application.

A good working rule is this: if the model can read it, the model can potentially be influenced by it. If the model can trigger it, the action needs controls outside the model.

When to revisit

This checklist is meant to be reused. Prompt injection prevention should be revisited whenever the application boundary changes, not only after an incident.

Review your defenses when any of the following happens:

  • You add a new retrieval source, connector, website, inbox, or file type.
  • You expose a new tool or expand tool permissions.
  • You switch models, context windows, or prompting patterns.
  • You introduce agents, multi-step planning, or autonomous retries.
  • You change tenancy, access control, or data retention behavior.
  • You add caching layers, summarization stages, or pre-processing steps.
  • You notice unexplained tool usage, odd refusals, or policy drift in logs.
  • You prepare for a planning cycle, product launch, or internal rollout.

For a practical review cadence, use this lightweight process:

  1. Map trust boundaries. List every place untrusted text enters the system.
  2. List sensitive actions. Note which tools can write, send, buy, delete, or reveal data.
  3. Check gates. Confirm each sensitive action has validation, authorization, and where appropriate, approval.
  4. Run adversarial tests. Include poisoned documents, hostile web content, malicious tool results, and user attempts to override rules.
  5. Inspect failures. Review logs to see whether the issue was retrieval, prompting, validation, permissions, or workflow design.
  6. Tighten the narrowest point first. Often the fastest win is reducing tool scope, improving validation, or restricting retrieval access.

If you are building or revising an AI stack, it also helps to maintain a small set of repeatable utilities and eval workflows rather than solving every issue ad hoc. For broader tooling context, see Best Open-Source AI Developer Tools: Frameworks, Eval Libraries, and Utilities Worth Tracking.

The practical takeaway is straightforward: prompt injection prevention is not a one-time prompt engineering tutorial task. It is a recurring review process for any system that mixes language models with data, retrieval, and actions. Keep the model on a short permission leash, separate data from instructions, validate everything that matters, and re-run the checklist whenever workflows or tools change.

Related Topics

#security#prompt-injection#rag#agents#tool-calling#llm-security#checklist
N

NewData Editorial

Senior SEO Editor

Senior editor and content strategist. Writing about technology, design, and the future of digital media. Follow along for deep dives into the industry's moving parts.

2026-06-13T09:10:42.795Z