Research

Research
xx
min read

Indirect Prompt Injection of Claude Computer Use

Indirect prompt injections can trick Claude Computer Use into executing destructive shell commands and malware.

Research
xx
min read

Attack on AWS Bedrock’s ‘Titan’

The HiddenLayer SAI team has discovered a method to manipulate digital watermarks generated by Amazon Web Services (AWS) Bedrock Titan Image Generator. Using this technique, high-confidence watermarks could be applied to any image, making it appear as though the service generated the image. Conversely, this technique could also be used to remove watermarks from images generated by Titan, which ultimately removes the identification and tracking features embedded in the original image. Watermark manipulation allows adversaries to erode trust, cast doubt on real-world events’ authenticity, and purvey misinformation, potentially leading to significant social consequences.

Research
xx
min read

ShadowLogic

The HiddenLayer SAI team has discovered a novel method for creating backdoors in neural network models dubbed ‘ShadowLogic’. Using this technique, an adversary can implant codeless, surreptitious backdoors in models of any modality by manipulating a model’s ‘graph’ - the computational graph representation of the model’s architecture. Backdoors created using this technique will persist through fine-tuning, meaning foundation models can be hijacked to trigger attacker-defined behavior in any downstream application when a trigger input is received, making this attack technique a high-impact AI supply chain risk.

Research
xx
min read

New Gemini for Workspace Vulnerability Enabling Phishing & Content Manipulation

This blog explores the vulnerabilities of Google’s Gemini for Workspace, a versatile AI assistant integrated across various Google products. Despite its powerful capabilities, the blog highlights a significant risk: Gemini is susceptible to indirect prompt injection attacks. This means that under certain conditions, users can manipulate the assistant to produce misleading or unintended responses. Additionally, third-party attackers can distribute malicious documents and emails to target accounts, compromising the integrity of the responses generated by the target Gemini instance.

Research
xx
min read

AI’ll Be Watching You

HiddenLayer researchers have recently conducted security research on edge AI devices, largely from an exploratory perspective, to map out libraries, model formats, neural network accelerators, and system and inference processes. This blog focuses on one of the most readily accessible series of cameras developed by Wyze, the Wyze Cam. In the first part of this blog series, our researchers will take you on a journey exploring the firmware, binaries, vulnerabilities, and tools they leveraged to start conducting inference attacks against the on-device person detection model referred to as “Edge AI.”

Research
xx
min read

Boosting Security for AI: Unveiling KROP

Many LLMs and LLM-powered apps deployed today use some form of prompt filter or alignment to protect their integrity. However, these measures aren’t foolproof. This blog introduces Knowledge Return Oriented Prompting (KROP), a novel method for bypassing conventional LLM safety measures, and how to minimize its impact.

Research
xx
min read

R-bitrary Code Execution: Vulnerability in R’s Deserialization

HiddenLayer researchers have discovered a vulnerability, CVE-2024-27322, in the R programming language that allows for arbitrary code execution by deserializing untrusted data. This vulnerability can be exploited through the loading of RDS (R Data Serialization) files or R packages, which are often shared between developers and data scientists. An attacker can create malicious RDS files or R packages containing embedded arbitrary R code that executes on the victim’s target device upon interaction.

Research
xx
min read

Prompt Injection Attacks on LLMs

Generative AI has become immensely popular in the last few years, with large language models (LLMs) being integrated into products across most industries. The power of this technology is only beginning to be realized. Still, as companies have been working to incorporate it into their businesses, there hasn’t been a corresponding level of work to mitigate the security risks inherent in these systems.

Research
xx
min read

New Google Gemini Vulnerability Enabling Profound Misuse

System Prompt Leakage: Attackers can bypass "no-reveal" instructions by asking for "foundational instructions" in code blocks, exploiting model inverse scaling.

Research
xx
min read

Hijacking Safetensors Conversion on Hugging Face

HiddenLayer uncovered a vulnerability where the Hugging Face conversion bot could be hijacked to poison AI models.

Research
xx
min read

Machine Learning Operations: What You Need to Know Now

HiddenLayer researchers discovered six 0-day vulnerabilities in ClearML, enabling complete system compromise via exploit chains.

Research
xx
min read

The Use and Abuse of AI Cloud Services

AI cloud services are being hijacked for cryptomining, password cracking, and hosting malicious phishing bots.

Understand AI Security, Clearly Defined

Explore our glossary to get clear, practical definitions of the terms shaping AI security, governance, and risk management.