New types of malicious attacks involving AI systems are emerging alongside this new technology. One way for attackers to manipulate systems in their favor is to poison an AI’s data set.
The cybersecurity industry refers to these attacks as data poisoning attacks.
These attacks cause a range of problems, from breaking an email spam filter to supporting the creation of deep fake content.
Learn more about what a data poisoning attack is, how data poisoning supports deep fake technology, and what steps to take to mitigate the risk.
What is an AI Poisoning Attack?
An Artificial Intelligence poisoning attack occurs when an AI model's training data is intentionally tampered with, affecting the outcomes of the model's decision-making processes. Because AI models often behave as black boxes, the tampering can be hard to spot while the attack steers the AI system into making incorrect or harmful decisions.
In an AI poisoning attack, adversaries inject malicious or misleading data into the training dataset. The attacker introduces subtle modifications that can taint the learning process, creating bias and causing incorrect outputs or faulty decision-making from the AI model.
By poisoning the training data, the attacker can manipulate the behavior of these deep learning systems in the way they desire.
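To make the mechanism concrete, here is a minimal Python sketch of training data poisoning. It assumes a toy nearest-centroid classifier and a made-up one-dimensional feature (the fraction of spam-like words in an email). The function names, data, and labels are all hypothetical and chosen only for illustration; they are not taken from any real spam filter.

```python
from statistics import mean


def train_centroids(samples):
    """Return the mean feature value per label for (feature, label) pairs."""
    by_label = {}
    for feature, label in samples:
        by_label.setdefault(label, []).append(feature)
    return {label: mean(values) for label, values in by_label.items()}


def predict(centroids, feature):
    """Assign the label whose centroid is closest to the feature value."""
    return min(centroids, key=lambda label: abs(centroids[label] - feature))


# Hypothetical clean training set: (fraction of spam-like words, label).
clean = [
    (0.05, "ham"), (0.10, "ham"), (0.15, "ham"),
    (0.80, "spam"), (0.85, "spam"), (0.90, "spam"),
]

# Assumed attack: spam-like emails injected with the wrong label.
poison = [(0.85, "ham"), (0.90, "ham"), (0.95, "ham"), (0.95, "ham")]

clean_model = train_centroids(clean)
poisoned_model = train_centroids(clean + poison)

suspicious_email = 0.65
print("clean model:", predict(clean_model, suspicious_email))
print("poisoned model:", predict(poisoned_model, suspicious_email))
```

In this toy setup the clean model labels the test email as spam, while the poisoned model labels it as ham, because the mislabeled samples drag the "ham" centroid toward spam-like values. Real models are far more complex, but the principle of shifting what the model learns is the same.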
Types of Data Poisoning Attacks
Both external attackers and insiders with access to training data can poison an AI system. This increases the importance of establishing a basic understanding of these different attacks.
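- Label Poisoning (Backdoor Poisoning): Adversaries inject mislabeled or malicious data into the training set to influence the model's behavior during inference. In the backdoor variant, the poisoned samples carry a hidden trigger, so the model misbehaves only when that trigger appears in its input. (Source)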
- Training Data Poisoning: In training data poisoning, the attacker modifies a significant portion of the training data to influence the AI model's learning process. The misleading or malicious examples allow the attacker to bias the model's decision-making towards a particular outcome. (Source)
- Model Inversion Attacks: In model inversion attacks, adversaries exploit the AI model's responses to infer sensitive information about the data it was trained on. By crafting queries and analyzing the model's output, the attacker can extract private information or details about the dataset. Unlike the other attacks in this list, model inversion does not alter the training data; it targets the confidentiality of that data. (Source)
- Stealth Attacks: In stealth attacks, the adversary strategically manipulates the training data to create vulnerabilities that are difficult to detect during the model's development and testing phases. The attack aims to exploit these hidden weaknesses once the model is deployed in real-world scenarios. (Source)
These attacks pose a significant threat to the reliability, trustworthiness, and security of AI systems, particularly when these systems are used in critical applications such as autonomous vehicles, medical diagnosis, or financial systems.
Given these risks, companies and researchers need to build and train their models with caution.
How Attackers Use Data Poisoning to Create Deep Fakes
Attackers can use data poisoning techniques to manipulate AI systems, including those used for generating deep fakes.
Deep fakes are realistic but synthetic media, such as images or videos, created using AI.
When an AI model that generates deep fakes is poisoned, it can produce deep fakes that exhibit specific characteristics chosen by the attacker or that behave unrealistically.
Attackers use this strategy to deceive viewers or manipulate content to spread misinformation or defame individuals. Poisoning is not limited to deep fakes, however. For example, cybercriminals could poison the AI model behind Gmail’s spam filter with misleading training data so that spam bypasses the filter. As a result, spam emails could reach a far greater number of people.
These attacks can also skew an AI model's understanding of facial features, expressions, or voice patterns. This can lead to deceptive deep fakes that have serious privacy and identity implications. For example, if an AI home security system is attacked, attackers could trick the system into believing that someone other than the rightful owner controls it.
An Example of an Adversarial Attack on an AI System
One of the most well-known examples of an adversarial attack on an AI system is the manipulation of images to deceive image classification models. An early example of a poisoning-style attack is Tay, Microsoft's Twitter chatbot released in 2016. Microsoft intended for Tay to be a friendly bot that Twitter users could interact with. Tay worked until malicious actors decided to feed her nothing but offensive and vulgar tweets. This altered her output, and there was little Microsoft could do other than take Tay offline.
This may change in the near future, but today most organizations do not build AI models from scratch. Instead, they build on top of already available large language models (LLMs) supplied by companies like OpenAI.
While using an established LLM may sound like a way to avoid data poisoning attacks from the outside, it is not. One group of researchers found that with less than $100, they were able to influence AI biases in ways undetectable by humans, such as altering Wikipedia entries and uploading influential images to a website.
Researchers and practitioners are also developing defense mechanisms to combat each of the attacks highlighted above as new AI systems and features are rolled out.
Over time the industry has compiled a working list of best practices to help decrease the severity of attacks and strengthen AI systems.
Best Practices for Stopping Data Attacks
Businesses should implement multiple best practices to defend against data attacks. Here are some key strategies Cobalt recommends:
- Data Sanitization and Preprocessing: Implement data sanitization techniques to filter out potential attacks, such as removing anomalies and suspicious patterns or carefully verifying the integrity of data sources.
- Anomaly Detection: Employ statistical methods or machine learning-based anomaly detection to monitor incoming data and identify suspicious patterns.
- Adversarial Training: Train models to withstand malicious inputs by augmenting the training data with carefully crafted adversarial examples.
- Model Architectures: Design model architectures that protect against data attacks. This includes architectures with built-in defenses against adversarial inputs, such as robust optimization algorithms, defensive distillation, or feature squeezing.
- Continuous Monitoring: Continuously monitor the performance and behavior of your AI models in real-world scenarios, including comparing outputs to expected behavior and searching for anomalous patterns indicative of a data attack.
- Input Validation and Verification: Examine input to ensure that the incoming data meets your criteria, such as checking data integrity, verifying the authenticity and trustworthiness of data sources, and employing techniques like checksums or digital signatures.
- Secure Data Handling: Establish strict cybersecurity measures, such as encryption, secure data storage, and access control mechanisms, to protect the training data from unauthorized modifications or tampering.
- Training Procedures: Ensure that the training process is resilient to attacks by using secure environments for training, verifying the integrity of training data sources, and implementing protocols for managing the training pipeline.
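As one illustration of the data sanitization and anomaly detection practices above, the following Python sketch flags numeric outliers using the modified z-score, which is based on the median absolute deviation (MAD). The function name `filter_outliers`, the sample values, and the cutoff of 3.5 (a commonly cited rule of thumb) are assumptions for this example. A production pipeline would need checks suited to its own data, and carefully crafted poison may not look like an outlier at all.

```python
from statistics import median


def filter_outliers(values, threshold=3.5):
    """Split values into (kept, flagged) using the MAD-based modified z-score."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # No spread around the median: flag anything that differs from it.
        kept = [v for v in values if v == med]
        flagged = [v for v in values if v != med]
        return kept, flagged

    kept, flagged = [], []
    for v in values:
        # 0.6745 rescales the MAD so the score is comparable to a z-score
        # when the underlying data is roughly normal.
        score = 0.6745 * abs(v - med) / mad
        if score > threshold:
            flagged.append(v)
        else:
            kept.append(v)
    return kept, flagged


# Hypothetical feature column with two injected extreme values.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 55.0, 10.2, -40.0]
kept, flagged = filter_outliers(readings)
print("kept:", kept)
print("flagged for review:", flagged)
```

The median and MAD are used here instead of the mean and standard deviation because a few extreme injected values distort them far less. Flagged records are best routed to human review rather than silently dropped.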
As another example, OpenAI provides API users with a list of best practices in its documentation. Many of the practices are similar to the ones listed above, and others relate directly to using GPT-4.
By implementing AI best practices, organizations can enhance the resilience of their AI systems against data attacks and mitigate the risks associated with malicious data inputs. It's important to continuously stay updated on emerging attack techniques and adopt evolving defense strategies.
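The checksum technique mentioned under input validation can be sketched in a few lines of Python. This example assumes a hypothetical manifest of SHA-256 digests recorded when the training files were last known to be good. The file names, file contents, and the `verify_dataset` helper are invented for illustration, and in-memory bytes stand in for files on disk.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_dataset(files: dict, manifest: dict) -> list:
    """Return sorted names of files that are missing, altered, or unexpected."""
    problems = set()
    for name, expected in manifest.items():
        content = files.get(name)
        if content is None:
            problems.add(name)  # listed in the manifest but not received
        elif not hmac.compare_digest(sha256_hex(content), expected):
            problems.add(name)  # content differs from the trusted digest
    # Files that were never recorded in the manifest are also suspicious.
    problems.update(set(files) - set(manifest))
    return sorted(problems)


# Hypothetical trusted snapshot and its manifest.
trusted = {
    "train.csv": b"text,label\nfree prize inside,spam\n",
    "labels.csv": b"id,label\n1,spam\n",
}
manifest = {name: sha256_hex(content) for name, content in trusted.items()}

# Simulated tampering: one label is flipped before training.
received = dict(trusted)
received["train.csv"] = b"text,label\nfree prize inside,ham\n"

print("clean copy:", verify_dataset(trusted, manifest))
print("tampered copy:", verify_dataset(received, manifest))
```

A check like this only detects changes made after the manifest was created; it cannot tell whether the data was already poisoned when it was collected. The manifest itself also needs protection, for example with a digital signature, since an attacker who can rewrite it can hide the tampering.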
Safeguarding Your AI Systems
AI poisoning attacks pose a significant threat to the reliability and security of AI systems. Learn more about other emerging LLM vulnerabilities with an overview of excessive agency.
If you are an organization seeking to protect your AI systems from data manipulation, it is critical to understand the nature of these attacks, consider their relation to deep fakes, and implement effective countermeasures. See how Cobalt helps companies secure their LLM-enabled applications and networks with AI penetration testing services.