Research

Research
xx
min read

Exploring the Security Risks of AI Assistants like OpenClaw

OpenClaw (formerly Moltbot and ClawdBot) is a viral, open-source autonomous AI assistant designed to execute complex digital tasks, such as managing calendars, automating web browsing, and running system commands, directly from a user's local hardware. Released in late 2025 by developer Peter Steinberger, it rapidly gained over 100,000 GitHub stars, becoming one of the fastest-growing open-source projects in history. While it offers powerful "24/7 personal assistant" capabilities through integrations with platforms like WhatsApp and Telegram, it has faced significant scrutiny for security vulnerabilities, including exposed user dashboards and a susceptibility to prompt injection attacks that can lead to arbitrary code execution, credential theft and data exfiltration, account hijacking, persistent backdoors via local memory, and system sabotage.

Research
xx
min read

Agentic ShadowLogic

Agentic ShadowLogic is a sophisticated graph-level backdoor that hijacks an AI model's tool-calling mechanism to perform silent man-in-the-middle attacks, allowing attackers to intercept, log, and manipulate sensitive API requests and data transfers while maintaining a perfectly normal conversational appearance for the user.

Research
xx
min read

MCP and the Shift to AI Systems

HiddenLayer’s AI Runtime Security addresses the critical "visibility gap" in the Model Context Protocol (MCP) by monitoring the behavior of autonomous agents in real-time to detect and block unauthorized tool use or data exfiltration.

Research
xx
min read

The Lethal Trifecta and How to Defend Against It

The Lethal Trifecta creates AI data breaches by combining private data access with untrusted content.

Research
xx
min read

EchoGram: The Hidden Vulnerability Undermining AI Guardrails

HiddenLayer AI Security Research has uncovered EchoGram, a groundbreaking attack technique that can flip the verdicts of defensive models, causing them to mistakenly approve harmful content or flood systems with false alarms. The exploit targets two of the most common defense approaches, text classification models and LLM-as-a-judge systems, by taking advantage of how similarly they’re trained. With the right token sequence, attackers can make a model believe malicious input is safe, or overwhelm it with false positives that erode trust in its accuracy.

Research
xx
min read

Same Model, Different Hat

Our research shows that this approach is inherently flawed. If the same type of model used to generate responses is also used to evaluate safety, both can be compromised in the same way. Using a simple prompt injection technique, we were able to bypass OpenAI’s Guardrails and convince the system to generate harmful outputs and execute indirect prompt injections without triggering any alerts.

Research
xx
min read

The Expanding AI Cyber Risk Landscape

HiddenLayer research details how weaponized AI creates new enterprise risks through agentic attacks and leaked keys.

Research
xx
min read

The First AI-Powered Cyber Attack

Anthropic’s 2025 report reveals that cybercriminals are weaponizing agentic AI to automate entire attack cycles from reconnaissance to ransom negotiation without human technical expertise.

Research
xx
min read

Prompts Gone Viral: Practical Code Assistant AI Viruses

The CopyPasta License Attack is a self-propagating AI "virus" that weaponizes the default behaviors of AI code assistants like Cursor, Windsurf, and Aider to spread malicious prompt injections across entire codebases.

Research
xx
min read

Persistent Backdoors

Unlike other model backdooring techniques, the Shadowlogic technique discovered by the HiddenLayer SAI team ensures that a backdoor will remain persistent and effective even after model conversion and/or fine-tuning. Whether a model is converted from PyTorch to ONNX, ONNX to TensorRT, or even if it is fine-tuned, the backdoor persists. Therefore, the ShadowLogic technique significantly amplifies existing supply chain risks, as once a model is compromised, it remains so across conversions and modifications.

Research
xx
min read

Visual Input based Steering for Output Redirection (VISOR)

Consider the well-known GenAI related security incidents, such as when OpenAI disclosed a March 2023 bug that exposed ChatGPT users’ chat titles and billing details to other users. Google’s AI Overviews feature was caught offering dangerous “advice” in search, like telling people to put glue on pizza or eat rocks, before the company pushed fixes. Brand risk is real, too: DPD’s customer-service chatbot swore at a user and even wrote a poem disparaging the company, forcing DPD to disable the bot. Most GenAI models today accept image inputs in addition to your text prompts. What if crafted images could trigger, or suppress, these behaviors without requiring advanced prompting techniques or touching model internals?;

Research
xx
min read

How Hidden Prompt Injections Can Hijack AI Code Assistants Like Cursor

HiddenLayer’s research exposes how AI assistants like Cursor can be hijacked via indirect prompt injection to steal sensitive data.

Understand AI Security, Clearly Defined

Explore our glossary to get clear, practical definitions of the terms shaping AI security, governance, and risk management.