Want to know how the bad guys attack AI systems? MITRE’S ATLAS can show you

MITRE’s ATLAS threat landscape knowledge base for artificial intelligence is a comprehensive guide to the tactics and processes bad actors use to compromise and exploit AI systems.

Hooded spy in underground hideout trying to steal valuable data by targeting governmental websites with weak security. Espionage specialist doing cyber attacks to gain access to sensitive info

Credit: DC Studio / Shutterstock

It’s one thing to understand that artificial intelligence introduces new and rapidly evolving threats, but it’s quite another — incredibly daunting — task to stay on top of what those threats look like, where they’re coming from, and how severe they are.

The Adversarial Threat Landscape for AI Systems (ATLAS) is an attempt to do just that, so you don’t have to. Developed by the nonprofit technology research organization MITRE and modeled after its widely popular MITRE ATT&CK repository, ATLAS is a “living knowledge base of adversary tactics and techniques based on real-world attack observations and realistic demonstrations from AI Red Teams and Security Groups”.

MITRE recommends using ATLAS for activities such as security analysis, AI development and implementation, threat assessments, and red-teaming and reporting attacks on AI-enabled systems. The ATLAS flows from left to right and demonstrates an attack lifecycle from initial reconnaissance through ultimate impact.

MITRE

MITRE

Reconnaissance

Reconnaissance activities will often involve searching for publicly available research materials from victims. This may include examples such as journals and conference proceedings, preprint repositories, or technical blogs.

The process matches that of attackers who may also look for publicly available adversarial vulnerability analysis. This can include information on vulnerabilities in models, services/providers, platforms and underlying technologies and helps inform successful AI-focused attacks, whether using known exploitation techniques or creating new ones.

Resource development

After attackers have conducted their reconnaissance, they look to establish resources they can use for their malicious activities. This includes activities such as creating and purchasing resources to support their activities or compromising and stealing existing resources which may offer both cost savings and make their activities opaque and hard to attribute.

We see this quite often with cloud infrastructure more recently, but historically activities such as botnets as well for DDoS type attacks.

This tactic in ATLAS involves seven different techniques. For the sake of brevity, we won’t cover them all but they include things such as:

Acquiring public ML artifacts
Obtaining/developing capabilities
Acquiring infrastructure
Poisoning data and publishing poisoned datasets

Techniques in this tactic not only involve traditional resources but also look to craft adversarial data, create proxy machine learning models, and publish poisoned datasets publicly, similar to the way in which attackers take advantage of the open-source ecosystem by poisoning software packages.

Initial access

Once an attacker has done their reconnaissance and is developing resources for their malicious activity, they will seek to gain initial access to the AI/ML system, typically via networks, mobile devices, edge systems or a combination of them. The systems may also be local to the enterprise or be hosted in a cloud environment or managed service provider.

There are many ways attackers can establish initial access to a system. Some examples ATLAS gives include:

ML supply chain compromise
Valid accounts
App exploitation
LLM prompt injection
Phishing
Model evasion

While some of these techniques are common in other cyberattacks, some are more novel for AI/ML, such as compromising the ML supply chain through GPU hardware, data, and ML software or even the model itself.

Model evasion is a technique that involves the attacker crafting adversarial data and inputs to the ML model that can cause a desired effect on a target model. LLM prompt injection has been perhaps one of the most discussed attack types against generative AI and LLM systems. It involves crafting malicious prompts to input to the LLM to get it to act in unintended ways.

ML model access

A unique technique in attacking AI/ML systems is ML model access. Attackers are often seeking access to the ML model to gain information, develop attack techniques, or input malicious data into the model for nefarious purposes. They can also get access to the model through various paths, such as the underlying hosting environment, via an API, or by interacting directly with it.

Techniques involved in ML model access include:

ML model inference API access
ML-enabled product or service
Physical environment access
Full ML model access

Organizations are increasingly utilizing ML and AI in their products and services, either directly through an AI provider or direct integration of ML and AI into their product portfolio. Attackers may look to get access to the underlying ML model through these products and services, or even glean insights from logs and metadata.

Execution

Now we’re starting to have the rubber hit the road, as the attacker looks towards execution. This involves looking to run malicious code within the ML artifacts or software, either locally or on a remote system. It also aids broader activities, from moving laterally the stealing of sensitive data.

There are three potential techniques involved in this tactic:

User execution
Command and scripting interpreter
LLM plugin compromise

Execution may involve the user taking specific actions, such as executing unsafe code through techniques such as social engineering or attachments. Attackers may also use commands and scripting to embed initial access payloads or help establish command and control.

Persistence

Once an initial foothold has been established through execution, attackers are striving to establish persistence. This often occurs through ML artifacts and software and is aimed at helping the attacker keep access beyond system restarts or credential rotations that would normally eliminate their access.

The techniques cited for persistence include:

Poison training data
Backdoor ML model
LLM prompt injection

Persistence, of course is a common facet of cyberattacks, but the method in which the attacker establishes it for AI/ML systems can be unique. This may involve poisoning datasets the ML model uses or its underlying training data and labels to embed vulnerabilities or inserting code which can be triggered later when needed, such as a backdoor.

Privilege escalation

Gaining initial access and persistence are key but often an attacker wants to escalate their privilege to achieve an intended impact, whether it is full organizational compromise, impacting models or data, or exfiltration of data. Attackers typically take advantage of system weaknesses, misconfigurations, and vulnerabilities to escalate their level of access.

The three techniques ATLAS identifies include:

LLM prompt injection
LLM plugin compromise
LLM jailbreak

Given that we have discussed the first two techniques several times already, we will focus on the LLM jailbreak. An LLM Jailbreak includes using a prompt injection to put the LLM into a state that lets it freely respond to any user input, disregarding constraints, controls, and guardrails the LLM system owner may have put in place.

Defense evasion

Getting access to a system and persisting achieves much for the attacker but detection could lead to the elimination of access or severely impacting an attacker’s goals, making defensive evasion is key.

Similar to previous tactics, the techniques involved here include:

Evading ML model
LLM prompt injection
LLM jailbreak

This may aid in activities such as evading ML-based virus and malware detection or network scanning to ensure their activities are not discovered.

Credential access

It should be no surprise to see credential access and compromise listed. While ATLAS lists account names and passwords, this should be expanded to any sort of credentials, including access tokens, API keys, GitHub Privileged Access Tokens and more, as credential compromise remains a leading attack vector and we see the rise of non-human identities (NHI) as well, due to APIs, microservices, cloud, and the current digital landscape.

The only technique ATLAS lists under credential access is:

Unsecured Credentials

They discuss insecurely stored credentials such as plaintext files, environment variables, and repositories.

Discovery

Discovery is similar to reconnaissance, but this takes place within your environment, rather than on the outside. The attacker has established access and persistence and is now looking to gain insights about the system, network, and ML environment.

The four techniques listed include:

Discover ML model ontology
Discover ML model family
Discover ML artifacts
LLM meta prompt extraction

Here attackers are looking to understand the ML model, its ontology, the family of models it belongs to, how it responds to inputs, and more to tailor their attacks accordingly. They also are looking to understand how an LLM handles instructions and its internal workings so it can be manipulated or forced to disclose sensitive data.

Collection

In this phase of the attack lifecycle according to ATLAS, the attacker is gathering ML artifacts and other information to aid in their goals. This often is a precursor to stealing the ML artifacts or using the collected information for next steps in their attacks. Attackers are often collecting information from software repos, container and model registries and more.

The techniques identified are:

ML artifact collection
Data from information repositories
Data from local systems

ML staging attack

Now that information has been collected, bad actors start to stage the attack with knowledge of the target systems. They may be training proxy models, poisoning the target model, or crafting adversarial data to feed into the target model.

The four techniques identified include:

Create proxy ML model
Backdoor ML model
Verify attack
Craft adversarial data

Proxy ML Models can be used to simulate attacks and do so offline while the attackers hone their technique and desired outcomes. They can also use offline copies of target models to verify the success of an attack without raising the suspicion of the victim organization.

Exfiltration

After all the steps discussed, attackers are getting to what they really care about — exfiltration. This includes stealing ML artifacts or other information about the ML system. It may be intellectual property, financial information, PHI or other sensitive data depending on the use case of the model and ML systems involved.

The techniques associated with exfiltration include:

Exfiltration via ML inference API
Exfiltration via cyber means
LLM meta prompt extraction
LLM data leakage

These all involve exfiltrating data, whether through an API, traditional cyber methods (e.g. ATT&CK exfiltration), or using prompts to get the LLM to leak sensitive data, such as private user data, proprietary organizational data, and training data, which may include personal information. This has been one of the leading concerns around LLM usage by security practitioners as organizations rapidly adopt them.

Impact

Unlike exfiltration, the impact stage is where the attackers create havoc or damage, potentially causing interruptions, eroding confidence, or even destroying ML systems and data. In this stage, that could include targeting availability (through ransom, for example) or maliciously damaging integrity.

This tactic has six techniques, which include:

Evading ML models
Denial of ML service
Spamming ML systems with chaff data
Eroding ML model integrity
Cost harvesting
External harms

While we have discussed some of the techniques as part of other tactics, there are some unique ones here related to impact. For example, denial of an ML service is looking to exhaust resources or flood systems with requests to degrade or shut down services.

While most modern enterprise grade AI offerings are hosted in the cloud with elastic compute, they still can run into DDoS and resource exhaustion, as well as cost implications if not properly mitigated, impacting both the provider and the consumers.

Additionally, attackers may look to erode the ML model’s integrity instead with adversarial data inputs that impact ML model consumer trust and cause the model provider or organization to fix system and performance issues to address integrity concerns.

Lastly, attackers may look to cause external harms, such as abusing the access they obtained to impact the victim system, resources, and organization in ways such as related to financial and reputational harm, impact users or broader societal harm depending on the usage and implications of the ML system.

Want to know how the bad guys attack AI systems? MITRE’S ATLAS can show you

MITRE’s ATLAS threat landscape knowledge base for artificial intelligence is a comprehensive guide to the tactics and processes bad actors use to compromise and exploit AI systems.

Reconnaissance

Resource development

Initial access

ML model access

Execution

Persistence

Privilege escalation

Defense evasion

Credential access

Discovery

Collection

ML staging attack

Exfiltration

Impact

From our editors straight to your inbox

More from this author

Understanding OWASP’s Top 10 list of non-human identity critical risks

Why honeypots deserve a spot in your cybersecurity arsenal

Secure by design vs by default – which software development concept is better?

Top 10 cybersecurity misconfigurations: Nail the setup to avoid attacks

Kicking dependency: Why cybersecurity needs a better model for handling OSS vulnerabilities

Application detection and response is the gap-bridging technology we need

3 key strategies for mitigating non-human identity risks

Top 10 open source software security risks — and how to mitigate them

Show me more

Linux, macOS users infected with malware posing as legitimate Go packages

8 obstacles women still face when seeking a leadership role in IT

What is risk management? Quantifying and mitigating uncertainty

CSO Executive Sessions: How cybersecurity impacts company ratings - A fey factor for investors and consumers

CSO Executive Sessions: Guardians of the Games - How to keep the Olympics and other major events cyber safe

CSO Executive Session India with Dr Susil Kumar Meher, Head Health IT, AIIMS (New Delhi)

CSO Executive Sessions: How cybersecurity impacts company ratings - A fey factor for investors and consumers

CSO Executive Sessions: How should software solution providers keep themselves and their enterprise clients safe?

CSO Executive Sessions: Open Source Institute’s Eric Nguyen on supply chain risks to critical infrastructure (Part 2)