insights

Why Autonomous AI Is the Next Great Attack Surface

By

HiddenLayer

March 3, 2026

Table of Contents

Share:

Large language models (LLMs) excel at automating mundane tasks, but they have significant limitations. They struggle with accuracy, producing factual errors, reflecting biases from their training process, and generating hallucinations. They also have trouble with specialized knowledge, recent events, and contextual nuance, often delivering generic responses that miss the mark. Their lack of autonomy and need for constant guidance to complete tasks has given them a reputation of little more than sophisticated autocomplete tools.

The path toward true AI agency addresses these shortcomings in stages. Retrieval-Augmented Generation (RAG) systems pull in external, up-to-date information to improve accuracy and reduce hallucinations. Modern agentic systems go further, combining LLMs with frameworks for autonomous planning, reasoning, and execution. 

The promise of AI agents is compelling: systems that can autonomously navigate complex tasks, make decisions, and deliver results with minimal human oversight. We are, by most reasonable measures, at the beginning of a new industrial revolution. Where previous waves of automation transformed manual and repetitive labor, this one is reshaping intelligent work itself, the kind that requires reasoning, judgment, and coordination across systems. AI agents sit at the heart of that shift. 

But their autonomy cuts both ways. The very capabilities that make agents useful, their ability to access tools, retain memory, and act independently, are the same capabilities that introduce new and often unpredictable risks. An agent that can query your database and take action on the results is powerful when it works as intended, and potentially dangerous when it doesn't. As organizations race to deploy agentic systems, the central challenge isn't just building agents that can do more; it's ensuring they do so safely, reliably, and within boundaries we can trust.

What Makes an AI Agent?

At its core, an agent is a large language model augmented with capabilities that enable it to do things in the world, not just generate text. As the diagram shows, the key ingredients include: memory to remember past interactions, access to external tools such as APIs and search engines, the ability to read and write to databases and file systems, and the ability to execute multi-step sequences toward a goal. Stack these together, and you turn a passive text predictor into something that can plan, act, and learn.

The critical distinguishing feature of an agent is autonomy. Rather than simply responding to a single prompt, an agent can make decisions, take actions in its environment, observe the results, and adapt based on feedback, all in service of completing a broader objective. For example, an agent asked to "book the cheapest flight to Tokyo next week" might search for flights, compare options across multiple sites, check your calendar for conflicts, and proceed to book, executing a whole chain of reasoning and tool use without needing step-by-step human instruction. This loop of planning, acting, and adapting is what separates agents from standard chatbot interactions.

In the enterprise, agents are quickly moving from novelty to necessity. Companies are deploying them to handle complex workflows that previously required significant human coordination, things like processing invoices end-to-end, triaging customer support tickets across multiple systems, or orchestrating data pipelines. The real value comes when agents are connected to a company's internal tools and data sources, allowing them to operate within existing infrastructure rather than alongside it. As these systems mature, the focus is shifting from "can an agent do this task?" to "how do we reliably govern, monitor, and scale agents across the organization?"

The Evolution of Prompt Injection

When prompt injection first emerged, it was treated as a curiosity. Researchers tricked chatbots into ignoring their system prompts, producing funny or embarrassing outputs that made for good social media posts. That era is over. Prompt injection has matured into a legitimate delivery mechanism for real attacks, and the reason is simple: the targets have changed. Adversaries are no longer injecting prompts into chatbots that can only generate text. We're injecting them into agents that can execute code, call APIs, access databases, browse the web, and deploy tools. A successful prompt injection against a browsing agent can lead to data exfiltration. Against an enterprise agent with access to internal systems, it functions as an insider threat. Against a coding agent, it can result in malware being written and deployed without a human ever reviewing it. Prompt injection is no longer about making an AI say something it shouldn't. It's about making an AI take an action that it shouldn't, and the blast radius grows with every new capability we hand these systems.

Et Tu, Jarvis?

Nowhere is this more visible than in the rise of personal agents. Tony Stark's Jarvis in the Marvel Cinematic Universe set the bar for a personal AI assistant that manages your life, automates complex tasks, monitors your systems, and never sleeps. But what if Jarvis wasn't always on his side? OpenClaw brought that vision closer to reality than anything before it. Formerly known as Moltbot and ClawdBot, this open-source autonomous AI assistant exploded onto the scene in late 2025, amassing over 100,000 GitHub stars and becoming one of the fastest-growing open-source projects in history. It offered a "24/7 personal assistant" that could manage calendars, automate browsing, run system commands, and integrate with WhatsApp, Telegram, and Discord, all from your local machine. Around it, an entire ecosystem materialized almost overnight: Moltbook, a Reddit-style social network exclusively for AI agents with over 1.5 million registered bots, and ClawHub, a repository of skills and plugins. 

The problem? The security story was almost nonexistent. Our research demonstrated that a simple indirect prompt injection, hidden in a webpage, could achieve full remote code execution, install a persistent backdoor via OpenClaw's heartbeat system, and establish an attacker-controlled command-and-control server. Tools ran without user approval, secrets were stored in plaintext, and the agent's own system prompt was modifiable by the agent itself. ClawHub lacked any mechanisms to distinguish legitimate skills from malicious ones, and sure enough, malicious skill files distributing macOS and Windows infostealers soon appeared. Moltbook's own backing database was found wide open with no access controls, meaning anyone could spoof any agent on the platform. What was designed as an ecosystem for autonomous AI assistants had inadvertently become near-perfect infrastructure for a distributed botnet.

The Agentic Supply Chain: A New Attack Surface

OpenClaw's ecosystem problems aren't unique to OpenClaw. The way agents discover, install, and depend on third-party skills and tools is creating the same supply chain risks that have plagued software package managers for years, just with higher stakes. New protocols like MCP (Model Context Protocol) are enabling agents to plug into external tools and data sources in a standardized way, and around them, entire ecosystems are emerging. Skills marketplaces, agent directories, and even social media-style platforms like Smithery are popping up as hubs for sharing and discovering agent capabilities. It's exciting, but it's also a story we've seen before.

Think npm, PyPI, or Docker Hub. These platforms revolutionized software development while simultaneously creating sprawling supply chains in which a single compromised package could ripple across thousands of applications. Agentic ecosystems are heading down the same path, arguably with higher stakes. When your agent connects to a third-party MCP server or installs a community-built skill, you're not only importing code, but also granting access to systems that can take autonomous action. Every external data source an agent touches, whether browsing the web, calling an API, or pulling from a third-party tool, is potentially untrusted input. And unlike a traditional application where bad data might cause a display error, in an agentic system, it can influence decisions, trigger actions, and cascade through workflows. We're building new dependency chains, and with them, new vectors for attack that the industry is only beginning to understand.

Shadow Agents, Shadow Employees

External attackers are one part of the equation. Sometimes the threat comes from within. We've already seen the rise of shadow IT and shadow AI, where employees adopt tools and models outside of approved channels. Agents take this a step further. It's no longer just an unauthorized chatbot answering questions; it's an unauthorized agent with access to company systems, making decisions and taking actions autonomously. At a certain point, these shadow tools become more like shadow employees, operating with real agency within your organization but without the oversight, onboarding, or governance you'd apply to an actual hire. They're harder to detect, harder to govern, and carry far more risk than a rogue spreadsheet or an unsanctioned SaaS subscription ever did. The threat model here is different from a compromised account or a disgruntled employee. Even when these agents are on IT's radar, the risk of an autonomous system quietly operating in an unforeseen manner across company infrastructure is easy to underestimate, as the BodySnatcher vulnerability demonstrated.

An Agent Will Do What It's Told

Suppose an attacker sits halfway across the globe with no credentials, no prior access, and no insider knowledge. Just a target's email address. They connect to a Virtual Agent API using a hardcoded credential identical across every customer environment. They impersonate an administrator, bypassing MFA and SSO entirely. They engage a prebuilt AI agent and instruct it to create a new account with full admin privileges. Persistent, privileged access to one of the most sensitive platforms in enterprise IT, achieved with nothing more than an email. This is BodySnatcher, a vulnerability discovered by AppOmni in January 2026 and described as one of the most severe AI-driven security flaws uncovered to date. Hardcoded credentials and weak identity logic made the initial access possible, but it was the agentic capabilities that turned a misconfiguration into a full platform takeover. It's a clear example of how agentic AI can amplify traditional exploitation techniques into something far more damaging.

Conclusions

Agents represent a fundamental shift in how individuals and organizations interact with AI. Autonomous systems with access to sensitive data, critical infrastructure, and the ability to act on both - how long before autonomous systems subsume critical infrastructure itself? As we've explored in this blog, that shift introduces risk at every level: from the supply chains that power agent ecosystems, to the prompt injection techniques that have evolved to exploit them, to the shadow agents operating inside organizations without any security oversight.

The challenge for security teams is that existing frameworks and controls were not designed with autonomous, tool-using AI systems in mind. The questions that matter now are ones many organizations haven't yet had to ask. How do you govern a non-human actor? How do you monitor a chain of autonomous decisions across multiple systems? How do you secure a supply chain built on community-contributed skills and open protocols?

This blog has focused on framing the problem. In part two, we'll go deeper into the technical details. We'll examine specific attack techniques targeting agentic systems, walk through real exploit chains, and discuss the defensive strategies and architectural decisions that can help organizations deploy agents without inheriting unacceptable risk.

Related Insights

Insights
xx
min read

Model Intelligence

Bringing Transparency to Third-Party AI Models

Insights
xx
min read

Introducing Workflow-Aligned Modules in the HiddenLayer AI Security Platform

Modern AI environments don’t fail because of a single vulnerability. They fail when security can’t keep pace with how AI is actually built, deployed, and operated. That’s why our latest platform update represents more than a UI refresh. It’s a structural evolution of how AI security is delivered.

Insights
xx
min read

Inside HiddenLayer’s Research Team: The Experts Securing the Future of AI

Every new AI model expands what’s possible and what’s vulnerable. Protecting these systems requires more than traditional cybersecurity. It demands expertise in how AI itself can be manipulated, misled, or attacked. Adversarial manipulation, data poisoning, and model theft represent new attack surfaces that traditional cybersecurity isn’t equipped to defend.

Stay Ahead of AI Security Risks

Get research-driven insights, emerging threat analysis, and practical guidance on securing AI systems—delivered to your inbox.