ChromaToast Served Pre-Auth

Esteban Tonglet

May 18, 2026

Introduction

ChromaDB is an open-source vector database used to enable semantic matching in AI applications. It is one of the most widely adopted vector databases, with 13 million monthly pip downloads and 27,500 GitHub stars. Companies including Mintlify, Weights & Biases, and Factory AI have publicly described using ChromaDB in production, and Capital One and UnitedHealthcare are featured on Chroma's homepage.

ChromaDB's Python FastAPI server can instantiate user-controlled embedding function settings before checking access permissions. This allows an unauthenticated attacker with HTTP API access to trigger remote code execution (RCE) by supplying a malicious HuggingFace model reference, giving the attacker full control of the server process. The vulnerability was introduced in version 1.0.0 and is unpatched as of version 1.5.8. Of internet-exposed ChromaDB instances we discovered via Shodan, 73% are running version 1.0.0 or later, the version range in which the vulnerable feature exists.

Demo

Demonstration of CVE-2026-45829

On ChromaDB’s built-in API docs page, POST /api/v2/tenants/{tenant}/databases/{db}/collections is listed as an authenticated route. That label matters because it tells users the endpoint is protected and that unauthenticated requests will be rejected. However, as shown in the demo video, we achieved remote code execution by sending a collection creation request to this endpoint without supplying credentials. The only unusual field in the request is the embedding function configuration, where we set `model_name` to a model we control on HuggingFace and pass `trust_remote_code: true` in the kwargs. Although no credentials are provided, the server accepts the request, reaches out to HuggingFace, downloads our model, and executes it. Only then does the server run its authentication check and reject the request. From the outside, it appears to be a failed API call. On the attacker’s end, there is a shell on the server.

At that point, the attacker can access everything the server process can reach: environment variables, API keys, mounted secrets, and all the data stored on disk.

Breaking It Down

Too trusting by design

Embedding models are neural networks that convert text into numerical vectors, capturing semantic meaning in a format that can be searched and compared at scale. In a vector database like ChromaDB, they are what make it possible to find documents that are conceptually similar to a query, even when they share no exact words. Not all embedding models are the same; one may perform better on technical documentation, another on multilingual content, another on short queries versus long passages. Because of that variety, ChromaDB has to support many different embedding function configurations, letting users specify which model to use and how to configure it when setting up a collection.
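
As a rough illustration of what "conceptually similar" means in practice, the sketch below ranks documents by the cosine similarity of their vectors to a query vector. The three-dimensional vectors and document titles are made up for the example; real embedding models produce hundreds or thousands of dimensions, and this is not ChromaDB's implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "embeddings" (invented for illustration, not produced by a real model).
documents = {
    "how to reset a password": [0.9, 0.1, 0.0],
    "recovering account access": [0.8, 0.3, 0.1],
    "quarterly sales figures": [0.0, 0.2, 0.9],
}
query_vector = [0.85, 0.2, 0.05]  # stands in for "I forgot my login"


def rank(query, docs):
    """Return document titles sorted from most to least similar."""
    return sorted(
        docs,
        key=lambda title: cosine_similarity(query, docs[title]),
        reverse=True,
    )


ranked = rank(query_vector, documents)
```

The two account-related documents rank above the unrelated one even though the query shares no exact words with either of them, which is the property a vector database relies on.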

That flexibility is where the problem starts. When creating a collection, clients pass a full embedding function configuration in the request, including the model name and any additional parameters. The server fetches and loads that model directly from HuggingFace. The model name and its parameters come from the client, and the server acts on them without restriction.

One of those parameters is `trust_remote_code`. This is a standard HuggingFace flag that, when set to `true`, tells the library to download and execute Python module files shipped inside the model repository. It exists for legitimate reasons, as some model architectures require custom code, but it also means that whoever controls the model repository controls what runs on any machine that loads it with this flag set. ChromaDB validates kwargs only by checking that their values are primitive types. A boolean passes, so `trust_remote_code: true` flows from the client request all the way through to `AutoModel.from_pretrained()` without being stripped or blocked. Three of ChromaDB's registered embedding functions are reachable this way, each passing the attacker-controlled kwargs directly to its underlying model loading call.

This is the same class of risk we have written about before in the context of malicious models on HuggingFace and unsafe deserialization in ML artifacts. A model is not passive data. It is code, and loading one from an untrusted source is equivalent to running untrusted code.
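
To make the kwargs check described above concrete, here is a minimal sketch of why validating value types alone is not enough. The function names and the `BLOCKED_KWARGS` set are hypothetical and are not taken from ChromaDB's code; the point is only that a boolean satisfies a primitive-type check regardless of what the key means.

```python
PRIMITIVE_TYPES = (str, int, float, bool)

# Hypothetical denylist; a real fix might prefer an allowlist of known-safe keys.
BLOCKED_KWARGS = {"trust_remote_code"}


def primitives_only(kwargs):
    """Type-only validation: accept kwargs whose values are all primitives."""
    return all(isinstance(value, PRIMITIVE_TYPES) for value in kwargs.values())


def primitives_and_safe_keys(kwargs):
    """Type validation plus a check on which keys are present."""
    return primitives_only(kwargs) and BLOCKED_KWARGS.isdisjoint(kwargs)


client_kwargs = {"trust_remote_code": True}

type_check_passes = primitives_only(client_kwargs)
key_check_passes = primitives_and_safe_keys(client_kwargs)
```

Here `type_check_passes` is `True` and `key_check_passes` is `False`: the dangerous flag only gets caught once the validator looks at key names as well as value types.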

A race the attacker always wins

The other half of the vulnerability is timing. The `create_collection` endpoint is authenticated; however, the server loads and instantiates the embedding function as part of processing the request, and it does this before the authentication check is executed:

# Line 813: embedding function instantiated here, model is downloaded and loaded
configuration = load_create_collection_configuration_from_json(create.configuration)

# Line 818: authentication check runs here, after model loading has already occurred
self.sync_auth_request(...)

The authentication is not missing, just in the wrong place. By the time it fires, the model has already been fetched and executed. The server rejects the request and returns a 500, but the attacker's payload has already run. The same ordering defect exists in the V1 endpoint, which cannot be disabled, so there is no way to block one path and stay protected on the other.
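
The effect of that ordering can be reproduced in a few lines. The sketch below is a simplified model, not ChromaDB's handler: `load_configuration`, `check_auth`, and both `create_collection_*` functions are stand-ins invented for illustration, and the "load" step only records that it ran instead of fetching anything.

```python
class AuthError(Exception):
    """Raised when a request carries no valid credentials."""


calls = []  # records which steps actually ran, in order


def load_configuration(raw_config):
    # Stand-in for the side-effecting step (model download and load).
    calls.append("load")
    return dict(raw_config)


def check_auth(credentials):
    calls.append("auth")
    if credentials != "valid-token":  # placeholder credential check
        raise AuthError("unauthenticated")


def create_collection_vulnerable(raw_config, credentials):
    configuration = load_configuration(raw_config)  # side effects happen first
    check_auth(credentials)  # rejection comes too late
    return configuration


def create_collection_fixed(raw_config, credentials):
    check_auth(credentials)  # reject before touching client input
    return load_configuration(raw_config)


def steps_run(handler):
    """Call a handler with no credentials and report which steps executed."""
    calls.clear()
    try:
        handler({"model_name": "example/model"}, credentials=None)
    except AuthError:
        pass
    return list(calls)
```

With no credentials, `steps_run(create_collection_vulnerable)` reports both the load and the auth step, while `steps_run(create_collection_fixed)` reports only the auth step. Both handlers raise the same error to the caller, which is why the vulnerable version looks like an ordinary failed request from the outside.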

Mitigations

Full remediation in the code would be to move the authentication check before configuration loading and to strip any keys named “kwargs” from requests in both the V1 and V2 create_collection handlers. However, this is not patched as of ChromaDB 1.5.8. We therefore recommend the following to mitigate the risk:

Favor the Rust-based deployment path (`chroma run`, Docker Hub images since 1.0.0) over the Python FastAPI server. The Rust frontend is not affected.
If running the Python FastAPI server, restrict network access to the ChromaDB port to trusted clients only.
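
For the code-level remediation mentioned above, stripping keys named "kwargs" from an incoming configuration could look roughly like the following. This is a sketch under assumptions: `strip_kwargs` is a hypothetical helper, and the shape of `request_config` is invented for the example and does not reflect ChromaDB's actual request schema.

```python
def strip_kwargs(value):
    """Return a copy of a JSON-like value with every "kwargs" key removed."""
    if isinstance(value, dict):
        return {k: strip_kwargs(v) for k, v in value.items() if k != "kwargs"}
    if isinstance(value, list):
        return [strip_kwargs(item) for item in value]
    return value


# Hypothetical request shape, invented for this example.
request_config = {
    "embedding_function": {
        "name": "example",
        "config": {
            "model_name": "org/model",
            "kwargs": {"trust_remote_code": True},
        },
    }
}

cleaned = strip_kwargs(request_config)
```

The helper walks nested dictionaries and lists so that a "kwargs" key is dropped at any depth, and it returns a new structure instead of mutating the request. On its own it does not fix the ordering problem; it would have to be combined with running the authentication check first.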

Conclusion

The root cause of CVE-2026-45829 is two independent failures that compound each other. The server trusts client-supplied model identifiers without restriction, and acts on that trust before authenticating the user sending the request. Either defect alone would be a problem, but together, they make every deployment of the Python server with a network-reachable port exploitable by anyone who can send an HTTP request.

Fixing the auth ordering closes this specific path, but it does not change the underlying dynamic: any application that fetches and executes model code from a public registry inherits the trust assumptions of that registry. Malicious trust_remote_code payloads have identifiable characteristics in the module files they ship, and scanning model artifacts before they reach any runtime is a practical way to catch them, regardless of what the application does with the model once it arrives.
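
As a very small example of what scanning the module files in a model repository can mean, the sketch below parses Python source with the standard `ast` module and flags a few imports and calls. The heuristics in `SUSPICIOUS_MODULES` and `SUSPICIOUS_CALLS` are assumptions chosen for illustration; they will produce false positives on legitimate model code, are easy to evade, and do not describe any particular scanning product.

```python
import ast

# Illustrative heuristics only; real scanners use far richer signals.
SUSPICIOUS_MODULES = {"os", "subprocess", "socket", "pty"}
SUSPICIOUS_CALLS = {"exec", "eval", "compile", "__import__"}


def scan_source(source):
    """Return a sorted list of findings for one Python module's source text."""
    findings = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                root = alias.name.split(".")[0]
                if root in SUSPICIOUS_MODULES:
                    findings.add(f"import:{root}")
        elif isinstance(node, ast.ImportFrom):
            root = (node.module or "").split(".")[0]
            if root in SUSPICIOUS_MODULES:
                findings.add(f"import:{root}")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in SUSPICIOUS_CALLS:
                findings.add(f"call:{node.func.id}")
    return sorted(findings)


benign = "import torch\n\nclass Model:\n    pass\n"
suspect = "import subprocess\nsubprocess.run(['id'])\neval('1 + 1')\n"

benign_findings = scan_source(benign)
suspect_findings = scan_source(suspect)
```

Because the source is only parsed and never imported or executed, the check can run on untrusted files before they reach any runtime that would load them.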

Until a patched version is available, the safest option is to run the Rust-based deployment path and restrict network access to the ChromaDB port to trusted clients only.

Disclosure timeline

February 17th, 2026 - Initial disclosure to ChromaDB per their security page https://www.trychroma.com/security.
February 24th, 2026 - Attempted follow-up through other trychroma emails.
March 5th, 2026 - Attempted contact through IT-ISAC.
April 16th, 2026 - Attempted final follow-up through all previous channels and social media.
