
LLM Overreliance: What It Is and How to Prevent It

Even back in Episode V, C-3PO expected R2-D2 to know better than to trust a strange computer, but nearly half a century later, LLM overreliance still ranks among the Open Worldwide Application Security Project (OWASP) Top 10 vulnerabilities for LLM applications and generative AI.

News organizations, academic researchers, and even IT professionals have fallen victim to this surprisingly widespread vulnerability. Despite some highly publicized incidents, many organizations have yet to take steps to mitigate it.

In this blog, we'll provide some guidance on how to prevent LLM overreliance. We'll cover:

  • What LLM overreliance is
  • Examples of LLM overreliance
  • How to avoid LLM overreliance

What Is LLM Overreliance?

LLM overreliance occurs when human or digital users accept or act upon false statements or data provided by large language models without due diligence in fact-checking. The OWASP Top 10 for LLM Applications and Generative AI classifies overreliance as LLM09.

In other words, it happens when people trust LLMs, even though audiences have known better since HAL assured them in 2001: A Space Odyssey, "No 9000 computer has ever made a mistake or distorted information. We are all, by any practical definition of the words, foolproof and incapable of error."

LLM overreliance typically happens when poor human oversight of LLMs fails to catch artificial intelligence hallucinations. An LLM generates output by analyzing patterns in its training and input data, but a biased data sample or statistical errors can yield inaccurate interpretations. LLMs lack the critical reasoning needed to distinguish merely plausible combinations of data, or remote statistical possibilities, from real-world facts. Human reviewers must supply that missing critical judgment to avoid falling prey to AI hallucinations.
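
One way to build that judgment into a workflow is to make sure model output is never acted on automatically. The Python sketch below is a minimal illustration of a human-review gate, not a prescription; the ask_llm placeholder stands in for whatever model client you actually use.

    from dataclasses import dataclass

    @dataclass
    class LLMAnswer:
        """A model response paired with its human review status."""
        prompt: str
        text: str
        approved: bool = False

    def ask_llm(prompt: str) -> str:
        """Placeholder for a real model call; wire this to your own client."""
        raise NotImplementedError

    def reviewed_answer(prompt: str) -> LLMAnswer:
        """Return model output only after an explicit human sign-off.

        Downstream code receives an LLMAnswer and must check `approved`
        before using the text in any decision, document, or code change.
        """
        draft = ask_llm(prompt)
        print(f"--- LLM draft ---\n{draft}\n-----------------")
        verdict = input("Accept this output as fact-checked? [y/N] ")
        return LLMAnswer(prompt=prompt, text=draft, approved=verdict.strip().lower() == "y")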

Consequences of LLM overreliance can range from the ridiculous to the deadly serious. Failing to supervise LLM output can lead to chatbot misinformation, plagiarism, academic fraud, coding errors, security breaches, sensitive information leakage, and legal liability. When AI interfaces with physical and mechanical systems such as cars and medical devices, LLM overreliance and a lack of validation can even have fatal consequences.

Example Attacks: LLM Overreliance

To illustrate how LLM overreliance happens and the risks it can create, here are some examples:

LLM Overreliance Example 1: Fake News

In November 2023, a website called Global Village Space published an article claiming that Israeli Prime Minister Benjamin Netanyahu's psychiatrist had committed suicide. The report was soon promoted by Iranian media outlets. Upon investigation, it turned out that Global Village Space had used AI to generate the story from a 2010 piece published by a satirical news site.

AI-powered fake news sites have proliferated over the last two years, from 49 identified sites in May 2023 to 1,075 in September 2024, according to news tracker NewsGuard. A University of Montreal study of the role of large language models in fake news found that many LLMs have no safeguards to prevent fake news generation, and that LLM-generated fabrications can be more difficult to detect than human-generated ones.

LLM Overreliance Example 2: Academic Fraud

AI hallucinations can mislead academic researchers as well as news readers. In February 2023, medical researchers Hussam Alkaissi and Samy I. McFarlane published their research on artificial hallucinations in ChatGPT and their implications for scientific writing. In one test, they asked ChatGPT to produce a paragraph on the mechanism involved in a particular type of osteoporosis. ChatGPT produced a paragraph mixing true and false information. When asked to provide references for its work, ChatGPT cited five non-existent papers indexed to the PubMed IDs of unrelated papers.

Another medical research team publishing in npj Digital Medicine found that scientists have trouble distinguishing AI-generated abstracts of scientific papers from human-generated summaries. The difficulty distinguishing AI-generated content and hallucinations from human work leaves academic researchers open to perpetuating plagiarism and fraud.
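
A lightweight guard against fabricated references of that kind is to cross-check every citation a model supplies against the source of record before it reaches a manuscript. The Python sketch below is an illustration, not a complete solution: it looks up a claimed PubMed ID through NCBI's public E-utilities ESummary endpoint and compares the registered title against the one the model asserted. The 0.8 similarity threshold and the requests dependency are assumptions to adapt to your own workflow.

    import difflib

    import requests

    ESUMMARY_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"

    def pubmed_title(pmid: str) -> str | None:
        """Fetch the recorded article title for a PubMed ID via NCBI E-utilities."""
        resp = requests.get(
            ESUMMARY_URL,
            params={"db": "pubmed", "id": pmid, "retmode": "json"},
            timeout=10,
        )
        resp.raise_for_status()
        record = resp.json().get("result", {}).get(pmid, {})
        return record.get("title")

    def citation_matches(pmid: str, claimed_title: str, threshold: float = 0.8) -> bool:
        """Accept a citation only if the title on record resembles the claimed title."""
        actual = pubmed_title(pmid)
        if not actual:
            return False  # the PMID does not resolve, so treat the citation as unverified
        similarity = difflib.SequenceMatcher(None, claimed_title.lower(), actual.lower()).ratio()
        return similarity >= threshold

    # Example: a PMID that points at an unrelated paper will fail the title comparison.
    # citation_matches("12345678", "Title the model claimed for this reference")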

On the flip side, academics also run risks when using AI to detect fraud. The University of California, Davis was forced to clear law student Louise Stivers after the school used beta-tested AI plagiarism-detection software to falsely accuse her of cheating.

LLM Overreliance Example 3: Malicious Code

Software developers who rely on LLMs to help write and edit code run similar risks. A 2023 Bilkent University study of AI-assisted code generation tools compared the performance of ChatGPT, GitHub Copilot, and Amazon CodeWhisperer. The study found that ChatGPT generated correct code 65.2% of the time, while GitHub Copilot succeeded 46.3% of the time, and Amazon CodeWhisperer had a 31.1% success rate. The average technical debt incurred to correct errors ranged from 5.6 minutes for Amazon CodeWhisperer to 9.1 minutes for GitHub Copilot.

LLMs can be useful for generating boilerplate code or brainstorming algorithms, but LLM programming tools fall short in other areas of coding, says data scientist Sahin Ahmed. LLMs can struggle with creative problem-solving, context, big-picture thinking, domain-specific knowledge, debugging, and user-centric design. Worse, University of Connecticut researchers have found that bad actors can poison LLM-powered coding tools so that they suggest vulnerable code.
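
One practical safeguard is to treat every AI-generated snippet as untrusted until it passes the project's own checks. The sketch below is a minimal illustration of that gate, assuming a pytest-based test suite; the file names candidate_module.py and tests/test_candidate.py are hypothetical, and passing tests is a floor, not a substitute for code review and static analysis.

    import ast
    import os
    import subprocess
    import tempfile
    from pathlib import Path

    def accept_generated_code(candidate: str, test_path: str = "tests/test_candidate.py") -> bool:
        """Accept LLM-generated code only if it parses and the existing tests pass."""
        # Cheap structural check first: refuse anything that is not valid Python.
        try:
            ast.parse(candidate)
        except SyntaxError:
            return False

        with tempfile.TemporaryDirectory() as scratch:
            # Write the suggestion to a scratch module that the tests can import.
            Path(scratch, "candidate_module.py").write_text(candidate)
            env = dict(os.environ, PYTHONPATH=scratch)
            # Run the human-written test suite against the generated module.
            result = subprocess.run(
                ["pytest", test_path], env=env, capture_output=True, text=True
            )
        return result.returncode == 0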

How to Prevent LLM Overreliance

Software developers using LLMs for coding can mitigate overreliance by following best practices recommended by OWASP. These include:

  • Following secure coding best practices
  • Monitoring LLM output with checks such as self-consistency, voting techniques, and comparison of output across multiple models (see the sketch after this list)
  • Manually and automatically cross-referencing LLM output with trusted external and real-world sources, such as Data Commons, leveraged by Google's DataGemma open models
  • Fine-tuning models with techniques such as prompt engineering, chain-of-thought prompting, parameter-efficient tuning (PET), and full model tuning
  • Breaking down complex tasks into subtasks assigned to different agents
  • Disclosing LLM limitations and risks to users
  • Building responsible-use safeguards into APIs and user interfaces with tools such as content filters, user warnings, and AI-generated content labeling
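
As an illustration of the self-consistency and voting idea above, the sketch below samples a model several times and accepts an answer only when most samples agree. The ask callable, the sample count, and the 0.6 agreement threshold are placeholders; real pipelines typically normalize answers (for example, extracting just a final number or label) before voting.

    from collections import Counter
    from typing import Callable

    def self_consistent_answer(
        ask: Callable[[str], str],
        prompt: str,
        samples: int = 5,
        min_agreement: float = 0.6,
    ) -> str | None:
        """Sample the model several times and keep an answer only if most samples agree.

        Returns None when no answer reaches the agreement threshold, signaling
        that the output needs human review instead of automatic use.
        """
        answers = [ask(prompt).strip() for _ in range(samples)]
        top_answer, count = Counter(answers).most_common(1)[0]
        if count / samples >= min_agreement:
            return top_answer
        return None  # disagreement across samples: escalate to a human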

These guidelines can help developers enjoy the benefits of LLM coding while reducing risk exposure.

Prevent LLM Vulnerabilities with Cobalt Pentesting

Overreliance is just one of the top 10 LLM and AI vulnerabilities identified by OWASP. Other risks range from prompt injection, insecure output handling, and poisoned training data to insecure plugins, model theft, supply chain vulnerabilities, and excessive agency. Any of these can harm LLM users in its own right, and each can also magnify the impact of overreliance.

Cobalt AI penetration testing services help you safeguard your LLM applications with next-generation professional security testing. Our team of experienced pentesters is led by a core of experts who contribute to the OWASP LLM Top 10 and work with industry leaders to develop and implement LLM security standards. Our user-friendly platform makes it easy to rapidly schedule customized tests that help secure your LLM attack surface against malicious actors. We help you detect vulnerabilities in real time and mitigate them before attackers can exploit them.

Contact Cobalt to discuss how we can help you secure your AI-enabled applications and networks.

About Ernest Li
10+ years of experience in threat intelligence, threat detection, threat research, and security operations, with a master's degree from the University of Oxford.