AI Penetration Testing: Securing LLM-based Systems against Artificial Intelligence Vulnerabilities

As artificial intelligence and large language model adoption accelerates, AI penetration testing has become increasingly critical.

While 82 percent of C-level executives say their business success depends on secure AI, just 24 percent of generative AI projects have a security component, a survey by AWS and IBM found. Many organizations currently limit GenAI use because of security concerns, and 27% have a ban in place, a Cisco survey shows. To realize the business value of AI and remain competitive, brands are depending on security teams to establish new security standards.

In this blog, we'll provide an introduction to AI penetration testing and how it can help protect brands against emerging vulnerabilities. First, we'll look at AI and LLM from a pentester perspective and identify some of the most prevalent vulnerabilities security teams are finding in these technologies. Then we'll share the testing methodology Cobalt uses to identify AI and LLM risks so they can be assessed and addressed.

Pentester Perspective: Pentesting for AI Applications and LLM-enabled Software

Today's AI and LLM applications may be new, but they typically use popular languages such as Python and JavaScript that are familiar to attackers. Traditional vulnerabilities remain prevalent in AI applications and LLM-enabled software, the Open Worldwide Application Security Project emphasizes.

To this point, we asked a member of the Cobalt Core about the prevalence of vulnerabilities within AI-enabled applications. Parveen Yadav, a contributing member of the OWASP LLM Top 10, says from his experience testing AI-enabled applications that, "AI Safety is a paramount concern...such as the ability of LLMs to perform actions beyond their intended scope such as allowing them to run system commands or queries beyond its intended requirements." The emergence of LLM-specific vulnerabilities underscored the necessity for a new OWASP Top 10 category, providing crucial guidance for developers and security professionals.

The OWASP Top 10 for Large Language Model Applications has identified some of the most prevalent vulnerabilities as prompt injections, data leakage, inadequate sandboxing, and unauthorized code execution. The full list of leading vulnerabilities includes:

Prompt injections
Insecure output handling
Training data poisoning
Model denial of service
Supply chain vulnerabilities
Sensitive information disclosure
Insecure plug-in design
Excessive agency
Overreliance
Model theft

Here's a brief breakdown of how each vulnerability manifests in AI and LLM applications:

1. Prompt injections

In prompt injection attacks, attackers misuse AI's ability to interpret natural language by issuing malicious instructions, disguised as legitimate prompts. Prompt injections can directly expose or overwrite underlying system prompts (jailbreaking), compromising backend systems and enabling access to insecure functions and sensitive data. External sources such as files and websites can indirectly enable attackers to hijack LLM conversations without alerting human users. Prompt injections can be used to query sensitive data, influence critical operations, conduct social engineering attacks, or perform other malicious actions.