EXEC WEBINAR
Join Cobalt CTO Gunter Ollmann to learn the 90% Rule for Securing AI Ecosystems
EXEC WEBINAR
Join Cobalt CTO Gunter Ollmann to learn the 90% Rule for Securing AI Ecosystems

LLM System Prompt Leakage: Prevention Strategies

LLM system prompt leakage represents an important addition to the Open Worldwide Application Security Project (OWASP) Top 10 for LLM Applications for 2025, addressing risks stemming from sensitive information exposed in large language model prompts. 

Unchecked, this vulnerability can disclose sensitive information attackers can leverage for unauthorized access, remote code execution, or other attacks. 

In this blog, we'll explain how system prompt leakage works, illustrate some examples, and offer mitigation strategies and resources.

System Prompt Leaking Explained

This vulnerability emerges from exposure to sensitive information contained within LLM system prompts. 

Large language models and other artificial intelligence applications rely on system prompts to set parameters for handling user queries. LLM developers use system prompts to control LLM responses to the questions by defining response goals, context, behavior, constraints, and tone. 

For example, system prompts may define the purpose of responses as answering questions about shopping recommendations by matching customer profiles and shopping history with products purchased by customers with similar backgrounds. System prompts also provide safety guardrails preventing LLMs from providing misleading, dangerous, or illegal output.

System prompts provided by developers precede and incorporate any prompts provided by users. LLMs treat system and user prompts together as a single set of commands. In this way, system prompts set a framework for responding to user prompts. System prompts help steer query responses toward desired outputs and away from answers that are irrelevant, inaccurate, or unethical.

However, system prompts can become sources of undesired outcomes when they contain sensitive information hackers can access. Bad actors can exploit prompts that display information about system functionality, rules, filters, or permissions. For example, a system prompt may contain credentials for accessing a database which in turn contains customer credentials. A hacker might craft a query that exposes the prompt, setting the stage for theft of customer credentials from the database.

System prompt exposure can open the door for a variety of risks. These risks stem not from the system prompt exposure itself, but from the information leaked through the exposure. The nature of the information determines the range of risk. Potential vulnerabilities include disclosure of sensitive information, bypassing of system parameters, and granting of restricted access permissions to unauthorized users.

Cover image: Mastering Best Practices to Secure AI

Examples of System Prompt Leaking

To illustrate how system prompt leaking can occur, let's consider a few examples. Here are some scenarios involving:

  • Credential leakage
  • Guardrails leakage
  • System architecture exposure
  • Internal decision-making rules exposure
  • Filtering criteria exposure
  • Permissions and access rules disclosure

Credential Leakage

We've already touched on this scenario with the example of customer credential leakage above, but it's one of the most common system prompt leakage vulnerabilities, so let's consider another illustration, using an actual case involving drag-and-drop LLM development tool Flowise. Flowise lets developers build LLM flows that integrate with data loaders, caches, and databases. You can build an AI agent to automate tasks such as sending REST API requests. Flowise provides an SDK for integrating with external apps. Developers typically integrate Flowise's native tools with external services such as AWS Bedrock, Confluence, GitHub, or OpenAI API.

Security provider Legit reviewed 959 Flowise servers and found that 45% were vulnerable to an authentication bypass exploit (CVE-2024-31621) that used LLM system prompts to expose sensitive data. Changing the URL for REST API requests from lower-case to upper-case letters enabled access to the API without requiring authentication. A hacker could use this exploit to communicate with the server's REST API endpoints and retrieve sensitive data. Additionally, older versions of Flowise stored passwords and API keys as plaintext, exposing these as well.

Attackers can use similar methods to access and leverage other sensitive data through system prompt leakage. Let's consider a few other examples.

Guardrails Leakage

In this scenario, consider an LLM that has a system prompt providing guardrails that prohibit offensive content generation, external links, and code execution. After uncovering the system prompt, an attacker can use a prompt injection attack to bypass the guardrails and execute code remotely.

System Architecture Exposure

In this example, a system prompt provides system architecture information identifying the database used for the app. Armed with this information, the attacker can launch an SQL injection attack. For example, an SQL query can be constructed to display all records in the database.

Internal Decision-making Rules Exposure

In this scenario, consider a financial application with a system prompt defining transaction limits and loan amounts for users. An attacker with access to system prompts can change these values, enabling them to conduct transactions and loans involving higher figures.

Filtering Criteria Exposure

In this example, envision a military LLM with a system prompt preventing unauthorized users from accessing classified information. Attempting unauthorized access normally triggers an error message. By accessing and manipulating the prompt, a hacker can bypass security safeguards and exfiltrate classified data.

Permissions and User Roles Disclosure

In a final example, consider an LLM system prompt that contains information about the app's role structures and permission levels. For example, some roles may only have permission to read documents, while others may have permission to edit, upload, or delete documents. By accessing the system prompt, an attacker can launch a privilege escalation attack and start uploading malicious software to the system.

Mitigation Strategies to Reduce System Prompt Leakage

How can you secure your LLM app against system prompt leakage? Here are four keys to mitigation:

  • Segregate sensitive data from system prompts
  • Avoid reliance on system prompts for critical behavior control
  • Implement external guardrails
  • Enforce independent security controls

Segregate Sensitive Data from System Prompts

Sensitive data should not be included in system prompts. This includes:

  • Passwords
  • API keys
  • Auth keys
  • User roles
  • User permissions
  • Database names

To avoid exposing this type of data in system prompts, externalize it to systems your LLM model does not directly access.

Avoid Reliance on System Prompts for Critical Behavior Control

Allowing LLMs to control model behavior runs the risk of attackers altering behavior by using prompt injections. To mitigate this risk, whenever possible, use external systems to ensure desired behavior. For example, external systems can be used to detect and prevent harmful content.

Implement External Guardrails

Similar considerations apply to guardrails. While it's possible to train LLMs not to reveal guardrails, this does not guarantee the system will consistently abide by restrictions. To improve adherence to guardrails, use independent systems to review behavior for compliance.

Enforce Independent Security Controls

Likewise, independent controls provide better enforcement of security guidelines than reliance on system prompts. LLMs lack the deterministic, auditable properties necessary for effective security controls. They should not be entrusted to protect controls such as:

  • Privilege separation
  • Authorization bounds checks

Instead, delegate responsibility for enforcing these controls to independent systems. In applications where agents are performing tasks that require multiple access levels, deploy multiple agents where each agent is configured independently with least privileges specific to its intended task.

Prevent LLM System Prompt Leakage with Cobalt

LLM system prompt leakage potentially opens your application to the risk of sensitive data exposure, unauthorized access, and other attacks. Protecting your LLM application requires taking security measures against insecure system prompts.

LLM pentesting services for AI-enabled applications can help you secure your application against system prompt leakage and other vulnerabilities. Our expert pentesting team works with OWASP to develop and maintain mitigations against common LLM security risks. Our platform makes it easy for your security team to collaborate with our experts and rapidly deploy customized tests of your application's security. Schedule a demo to see how our pentesting services can help keep your LLM secure against system prompt leakage and other risks.

Back to Blog
About Gisela Hinojosa
Gisela Hinojosa is a Senior Security Consultant at Cobalt with over 5 years of experience as a penetration tester. Gisela performs a wide range of penetration tests including, network, web application, mobile application, Internet of Things (IoT), red teaming, phishing and threat modeling with STRIDE. Gisela currently holds the Security+, GMOB, GPEN and GPWAT certifications. More By Gisela Hinojosa