FAST TRACK
See our Fast Start promotion and start your first pentest on The Cobalt Offensive Security Testing Platform for only $4,950.
FAST TRACK
See our Fast Start promotion and start your first pentest on The Cobalt Offensive Security Testing Platform for only $4,950.

AI Penetration Testing: Securing LLM-based Systems against Artificial Intelligence Vulnerabilities

As artificial intelligence and large language model adoption accelerates, AI penetration testing has become increasingly critical. 

While 82 percent of C-level executives say their business success depends on secure AI, just 24 percent of generative AI projects have a security component, a survey by AWS and IBM found. Many organizations currently limit GenAI use because of security concerns, and 27% have a ban in place, a Cisco survey shows. To realize the business value of AI and remain competitive, brands are depending on security teams to establish new security standards.

In this blog, we'll provide an introduction to AI penetration testing and how it can help protect brands against emerging vulnerabilities. First, we'll look at AI and LLM from a pentester perspective and identify some of the most prevalent vulnerabilities security teams are finding in these technologies. Then we'll share the testing methodology Cobalt uses to identify AI and LLM risks so they can be assessed and addressed.

Pentester Perspective: Pentesting for AI Applications and LLM-enabled Software

Today's AI and LLM applications may be new, but they typically use popular languages such as Python and JavaScript that are familiar to attackers. Traditional vulnerabilities remain prevalent in AI applications and LLM-enabled software, the Open Worldwide Application Security Project emphasizes.

To this point, we asked a member of the Cobalt Core about the prevalence of vulnerabilities within AI-enabled applications. Parveen Yadav, a contributing member of the OWASP LLM Top 10, says from his experience testing AI-enabled applications that, "AI Safety is a paramount concern...such as the ability of LLMs to perform actions beyond their intended scope such as allowing them to run system commands or queries beyond its intended requirements." The emergence of LLM-specific vulnerabilities underscored the necessity for a new OWASP Top 10 category, providing crucial guidance for developers and security professionals.

The OWASP Top 10 for Large Language Model Applications has identified some of the most prevalent vulnerabilities as prompt injections, data leakage, inadequate sandboxing, and unauthorized code execution. The full list of leading vulnerabilities includes:

  1. Prompt injections
  2. Insecure output handling
  3. Training data poisoning
  4. Model denial of service
  5. Supply chain vulnerabilities
  6. Sensitive information disclosure
  7. Insecure plug-in design
  8. Excessive agency
  9. Overreliance
  10. Model theft

SANS AI Survey Report 2024 Cover Image

Here's a brief breakdown of how each vulnerability manifests in AI and LLM applications:

1. Prompt injections

In prompt injection attacks, attackers misuse AI's ability to interpret natural language by issuing malicious instructions, disguised as legitimate prompts. Prompt injections can directly expose or overwrite underlying system prompts (jailbreaking), compromising backend systems and enabling access to insecure functions and sensitive data. External sources such as files and websites can indirectly enable attackers to hijack LLM conversations without alerting human users. Prompt injections can be used to query sensitive data, influence critical operations, conduct social engineering attacks, or perform other malicious actions.

Mitigation strategies include:

  • Enforcing privileges on backend access
  • Requiring human approval
  • Segregating external content from user prompts

2. Insecure output handling

This vulnerability arises when LLM output gets passed on to other system components without sufficient validation, sanitization, and handling. This can trigger cross-site scripting (XSS) and cross-site request forgery (CSRF) in browsers and server-side request forgery (SSRF), privilege escalation, and remote code execution in backend systems.

Mitigation strategies include:

  • Validation and sanitization procedures
  • Output encoding

Read more about insecure output handling

3. Training data poisoning

Bad actors can introduce tampered raw text into the data machine learning programs ingest for training. This can introduce security vulnerabilities, backdoors, and performance biases. Poisoned data can enter systems from sources such as Common Crawl, WebText, OpenWebText, and books.

Mitigation strategies include:

  • Verifying training data supply chains
  • Implementing network controls to strengthen sandboxing
  • Applying data filters

4. Model denial of service

Attackers can overload LLMs with input that triggers high resource consumption. For instance, a query that triggers recurring resource usage can overload an LLM. Denial of service attacks also can target context windows and exceed their limits. DoS vulnerabilities can compromise system performance and increase costs.

Mitigation strategies include:

  • Input validation and sanitization
  • Resource capping
  • API rate limit restrictions
  • Queued and total action limits

5. Supply chain vulnerabilities

Software and data supply chains can introduce vulnerabilities to LLM applications, causing biased behavior, security breaches, or system failure. Traditional software component vulnerabilities remain a source of risk for LLM. Machine learning systems also can be compromised by third-party pre-trained models, data, and plug-ins.

Mitigation strategies include:

  • Supplier vetting
  • Plug-in screening
  • Component vulnerability scanning

6. Sensitive information disclosure

Bad actors can manipulate LLMs to reveal confidential data, such as sensitive information or proprietary algorithms. This vulnerability can cause unauthorized data access, privacy violations, and security breaches.

Mitigation strategies include:

  • Data sanitization
  • Strict user policies

7. Insecure plug-in design

LLMs automatically call plug-ins, often without execution controls from model integration platforms and without input validation. Attackers can exploit this by making malicious requests to generate privilege escalation, remote code execution, data exfiltration, and other mischief. Poor access control and failure to track authorization cross plug-ins promote this vulnerability.

Mitigation strategies include:

  • Enforcing strict parameterized input
  • Applying input sanitization and validation
  • Validating plug-ins

8. Excessive agency

Developers often allow LLM-based systems the capability to act as agents in interacting with other systems and perform actions in response to input prompts or LLM outputs. Unexpected or ambiguous outputs can trigger undesired actions, such as deleting documents, calling unneeded functions, or activating unnecessary operations on other systems. This can happen because of poor design or because of attacker exploitation.

Mitigation strategies include:

  • Limiting functions to necessary operations
  • Limiting permissions

Read more about this LLM vulnerability with an overview of excessive agency.

9. Overreliance

Systems or people that rely on LLM output without oversight can suffer this vulnerability when LLMs generate results that are inaccurate, inappropriate, or unsafe. For example, LLMs may fabricate falsehoods and state them as facts (hallucination/confabulation). Overreliance vulnerabilities can generate misinformation, legal liability, and reputation damage as well as security risks.

Mitigation strategies include:

  • Filtering LLM output for self-consistency
  • Cross-checking output against trusted sources
  • Fine-tuning models to improve output quality

10. Model theft

With a LLM model theft attack, bad actors and advanced persistent threats can steal proprietary LLMs by targeting vulnerabilities to gain unauthorized access and copy or exfiltrate models. This can cause brands to suffer economic loss, reputation damage, and competitive disadvantage, while enabling unauthorized resource use and data access.

Mitigation strategies include:

  • Implementing strong access controls
  • Restricting LLM access to network resources, internal services, and APIs
  • Monitoring access logs for suspicious activity

What we test for

Recognizing the growing threat posed by LLM vulnerabilities, Cobalt has drawn on the expertise of our penetration testing community to develop AI pentesting procedures that address the threats OWASP has identified. Our procedures focus on the most critical risks facing today's AI and LLM implementations. Common test cases include:

  • Prompt injections: We catch prompt injection vulnerabilities by creating prompts to bypass LLM controls, reveal sensitive data, and bypass content filters through language patterns or tokens.
  • Jailbreaks: We assess jailbreak vulnerabilities by using semantic deception and social engineering to generate hostile content and using base64 encoding to bypass prompts preventing criminal NLP requests
  • Insecure input handling: We check LLM outputs to assess risks of remote code execution, SQL query execution, and JavaScript-based XSS attacks.

This list is illustrative, not exhaustive. Our testing procedures cover other vulnerabilities as well for comprehensive protection.

Start an AI Penetration Test with Cobalt

AI penetration testing has become imperative for security teams. Vulnerabilities such as prompt injections, insecure output handling, and data poisoning can compromise system security, brand reputations, company finances, and even legal liabilities.

Our AI pentesting services protect you against these risks by providing safeguards against today's most prevalent vulnerabilities. We help you secure your LLM-based systems by drawing from the expertise of our pentesting community. Our team works with your team to deliver you consistent, comprehensive coverage against today's biggest threats. Our simplified setup process and streamlined testing procedures make it easy for you to schedule a test without the headaches of negotiating custom scoping and statements of work.

 

Back to Blog
About Andrew Obadiaru
Andrew Obadiaru is the Chief Information Security Officer at Cobalt. In this role Andrew is responsible for maintaining the confidentiality, integrity, and availability of Cobalt's systems and data. Prior to joining Cobalt, Andrew was the Head of Information Security for BBVA USA Corporate Investment banking, where he oversaw the creation and execution of Cyber Security Strategy. Andrew has 20+ years in the security and technology space, with a history of managing and mitigating risk across changing technologies, software, and diverse platforms. More By Andrew Obadiaru