MITRE’s ATLAS threat landscape knowledge base for artificial intelligence is a comprehensive guide to the tactics and processes bad actors use to compromise and exploit AI systems. Credit: DC Studio / Shutterstock It’s one thing to understand that artificial intelligence introduces new and rapidly evolving threats, but it’s quite another — incredibly daunting — task to stay on top of what those threats look like, where they’re coming from, and how severe they are. The Adversarial Threat Landscape for AI Systems (ATLAS) is an attempt to do just that, so you don’t have to. Developed by the nonprofit technology research organization MITRE and modeled after its widely popular MITRE ATT&CK repository, ATLAS is a “living knowledge base of adversary tactics and techniques based on real-world attack observations and realistic demonstrations from AI Red Teams and Security Groups”. MITRE recommends using ATLAS for activities such as security analysis, AI development and implementation, threat assessments, and red-teaming and reporting attacks on AI-enabled systems. The ATLAS flows from left to right and demonstrates an attack lifecycle from initial reconnaissance through ultimate impact. MITRE MITRE Reconnaissance Reconnaissance activities will often involve searching for publicly available research materials from victims. This may include examples such as journals and conference proceedings, preprint repositories, or technical blogs. The process matches that of attackers who may also look for publicly available adversarial vulnerability analysis. This can include information on vulnerabilities in models, services/providers, platforms and underlying technologies and helps inform successful AI-focused attacks, whether using known exploitation techniques or creating new ones. Resource development After attackers have conducted their reconnaissance, they look to establish resources they can use for their malicious activities. This includes activities such as creating and purchasing resources to support their activities or compromising and stealing existing resources which may offer both cost savings and make their activities opaque and hard to attribute. We see this quite often with cloud infrastructure more recently, but historically activities such as botnets as well for DDoS type attacks. This tactic in ATLAS involves seven different techniques. For the sake of brevity, we won’t cover them all but they include things such as: Acquiring public ML artifacts Obtaining/developing capabilities Acquiring infrastructure Poisoning data and publishing poisoned datasets Techniques in this tactic not only involve traditional resources but also look to craft adversarial data, create proxy machine learning models, and publish poisoned datasets publicly, similar to the way in which attackers take advantage of the open-source ecosystem by poisoning software packages. Initial access Once an attacker has done their reconnaissance and is developing resources for their malicious activity, they will seek to gain initial access to the AI/ML system, typically via networks, mobile devices, edge systems or a combination of them. The systems may also be local to the enterprise or be hosted in a cloud environment or managed service provider. There are many ways attackers can establish initial access to a system. Some examples ATLAS gives include: ML supply chain compromise Valid accounts App exploitation LLM prompt injection Phishing Model evasion While some of these techniques are common in other cyberattacks, some are more novel for AI/ML, such as compromising the ML supply chain through GPU hardware, data, and ML software or even the model itself. Model evasion is a technique that involves the attacker crafting adversarial data and inputs to the ML model that can cause a desired effect on a target model. LLM prompt injection has been perhaps one of the most discussed attack types against generative AI and LLM systems. It involves crafting malicious prompts to input to the LLM to get it to act in unintended ways. ML model access A unique technique in attacking AI/ML systems is ML model access. Attackers are often seeking access to the ML model to gain information, develop attack techniques, or input malicious data into the model for nefarious purposes. They can also get access to the model through various paths, such as the underlying hosting environment, via an API, or by interacting directly with it. Techniques involved in ML model access include: ML model inference API access ML-enabled product or service Physical environment access Full ML model access Organizations are increasingly utilizing ML and AI in their products and services, either directly through an AI provider or direct integration of ML and AI into their product portfolio. Attackers may look to get access to the underlying ML model through these products and services, or even glean insights from logs and metadata. Execution Now we’re starting to have the rubber hit the road, as the attacker looks towards execution. This involves looking to run malicious code within the ML artifacts or software, either locally or on a remote system. It also aids broader activities, from moving laterally the stealing of sensitive data. There are three potential techniques involved in this tactic: User execution Command and scripting interpreter LLM plugin compromise Execution may involve the user taking specific actions, such as executing unsafe code through techniques such as social engineering or attachments. Attackers may also use commands and scripting to embed initial access payloads or help establish command and control. Persistence Once an initial foothold has been established through execution, attackers are striving to establish persistence. This often occurs through ML artifacts and software and is aimed at helping the attacker keep access beyond system restarts or credential rotations that would normally eliminate their access. The techniques cited for persistence include: Poison training data Backdoor ML model LLM prompt injection Persistence, of course is a common facet of cyberattacks, but the method in which the attacker establishes it for AI/ML systems can be unique. This may involve poisoning datasets the ML model uses or its underlying training data and labels to embed vulnerabilities or inserting code which can be triggered later when needed, such as a backdoor. Privilege escalation Gaining initial access and persistence are key but often an attacker wants to escalate their privilege to achieve an intended impact, whether it is full organizational compromise, impacting models or data, or exfiltration of data. Attackers typically take advantage of system weaknesses, misconfigurations, and vulnerabilities to escalate their level of access. The three techniques ATLAS identifies include: LLM prompt injection LLM plugin compromise LLM jailbreak Given that we have discussed the first two techniques several times already, we will focus on the LLM jailbreak. An LLM Jailbreak includes using a prompt injection to put the LLM into a state that lets it freely respond to any user input, disregarding constraints, controls, and guardrails the LLM system owner may have put in place. Defense evasion Getting access to a system and persisting achieves much for the attacker but detection could lead to the elimination of access or severely impacting an attacker’s goals, making defensive evasion is key. Similar to previous tactics, the techniques involved here include: Evading ML model LLM prompt injection LLM jailbreak This may aid in activities such as evading ML-based virus and malware detection or network scanning to ensure their activities are not discovered. Credential access It should be no surprise to see credential access and compromise listed. While ATLAS lists account names and passwords, this should be expanded to any sort of credentials, including access tokens, API keys, GitHub Privileged Access Tokens and more, as credential compromise remains a leading attack vector and we see the rise of non-human identities (NHI) as well, due to APIs, microservices, cloud, and the current digital landscape. The only technique ATLAS lists under credential access is: Unsecured Credentials They discuss insecurely stored credentials such as plaintext files, environment variables, and repositories. Discovery Discovery is similar to reconnaissance, but this takes place within your environment, rather than on the outside. The attacker has established access and persistence and is now looking to gain insights about the system, network, and ML environment. The four techniques listed include: Discover ML model ontology Discover ML model family Discover ML artifacts LLM meta prompt extraction Here attackers are looking to understand the ML model, its ontology, the family of models it belongs to, how it responds to inputs, and more to tailor their attacks accordingly. They also are looking to understand how an LLM handles instructions and its internal workings so it can be manipulated or forced to disclose sensitive data. Collection In this phase of the attack lifecycle according to ATLAS, the attacker is gathering ML artifacts and other information to aid in their goals. This often is a precursor to stealing the ML artifacts or using the collected information for next steps in their attacks. Attackers are often collecting information from software repos, container and model registries and more. The techniques identified are: ML artifact collection Data from information repositories Data from local systems ML staging attack Now that information has been collected, bad actors start to stage the attack with knowledge of the target systems. They may be training proxy models, poisoning the target model, or crafting adversarial data to feed into the target model. The four techniques identified include: Create proxy ML model Backdoor ML model Verify attack Craft adversarial data Proxy ML Models can be used to simulate attacks and do so offline while the attackers hone their technique and desired outcomes. They can also use offline copies of target models to verify the success of an attack without raising the suspicion of the victim organization. Exfiltration After all the steps discussed, attackers are getting to what they really care about — exfiltration. This includes stealing ML artifacts or other information about the ML system. It may be intellectual property, financial information, PHI or other sensitive data depending on the use case of the model and ML systems involved. The techniques associated with exfiltration include: Exfiltration via ML inference API Exfiltration via cyber means LLM meta prompt extraction LLM data leakage These all involve exfiltrating data, whether through an API, traditional cyber methods (e.g. ATT&CK exfiltration), or using prompts to get the LLM to leak sensitive data, such as private user data, proprietary organizational data, and training data, which may include personal information. This has been one of the leading concerns around LLM usage by security practitioners as organizations rapidly adopt them. Impact Unlike exfiltration, the impact stage is where the attackers create havoc or damage, potentially causing interruptions, eroding confidence, or even destroying ML systems and data. In this stage, that could include targeting availability (through ransom, for example) or maliciously damaging integrity. This tactic has six techniques, which include: Evading ML models Denial of ML service Spamming ML systems with chaff data Eroding ML model integrity Cost harvesting External harms While we have discussed some of the techniques as part of other tactics, there are some unique ones here related to impact. For example, denial of an ML service is looking to exhaust resources or flood systems with requests to degrade or shut down services. While most modern enterprise grade AI offerings are hosted in the cloud with elastic compute, they still can run into DDoS and resource exhaustion, as well as cost implications if not properly mitigated, impacting both the provider and the consumers. Additionally, attackers may look to erode the ML model’s integrity instead with adversarial data inputs that impact ML model consumer trust and cause the model provider or organization to fix system and performance issues to address integrity concerns. Lastly, attackers may look to cause external harms, such as abusing the access they obtained to impact the victim system, resources, and organization in ways such as related to financial and reputational harm, impact users or broader societal harm depending on the usage and implications of the ML system. SUBSCRIBE TO OUR NEWSLETTER From our editors straight to your inbox Get started by entering your email address below. Please enter a valid email address Subscribe