The Hugging Face breach has left many in the AI community questioning the security of their model supply chain. A vulnerability in the popular Transformers library, used by millions of developers, was exploited to gain access to sensitive data.
The incident underscores the importance of secure model supply chain management, which covers not only the development and deployment of AI models but also the secure storage and sharing of model weights and other sensitive data.
It also demonstrates the potential consequences of a compromised supply chain: in this case, attackers were able to reach sensitive data, including model weights and user credentials.
Developers therefore need to take a proactive approach to securing their model supply chain, implementing robust measures such as encryption and access controls to protect sensitive data.
Impact and Consequences
The Hugging Face breach had a significant impact on the AI community, with many researchers and developers voicing concern about its potential consequences.
The breach compromised sensitive data, including model weights and training data, which could be used to create malicious models that mimic the behavior of the original models.
The compromised data also included user credentials, which could be used to gain unauthorized access to other systems and applications.
The breach highlighted the importance of data security and the need for robust security measures to protect sensitive information.
The Hugging Face team responded quickly to the breach, taking steps to contain and mitigate the damage, including notifying affected users and providing guidance on how to secure their accounts.
Hackers Steal Auth Tokens
Hugging Face's Spaces platform was breached, allowing hackers to access authentication secrets for its members.
The breach was detected by Hugging Face's own team, which found unauthorized access to secrets stored on the Spaces platform.
A subset of Spaces' secrets could have been accessed without authorization, raising concerns about the security of user data.
Hugging Face has already revoked authentication tokens in the compromised secrets and notified those impacted by email.
The company recommends that all Hugging Face Spaces users refresh their tokens and switch to fine-grained access tokens, which provide tighter control over who has access to AI models.
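As a concrete example, rotating credentials on a developer machine could look like the following minimal sketch, which assumes the huggingface_hub client library and that a replacement fine-grained token has already been generated in the account settings (the token value shown is a placeholder):

```python
# A minimal sketch, assuming the huggingface_hub client library is installed
# and a new fine-grained token has already been created in the Hugging Face
# account settings. The token below is a placeholder, not a real credential.
from huggingface_hub import login, whoami

# Store the new token in the local credential cache, replacing the old one.
login(token="hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx")  # placeholder value

# Verify which account the cached credential now maps to.
print(whoami())
```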
Hugging Face is working with external cybersecurity experts to investigate the breach and report the incident to law enforcement and data protection agencies.
The company has been tightening security over the past few days, including removing org tokens, implementing a key management service for Spaces secrets, and improving its ability to identify leaked tokens.
How Malicious Models Work
Malicious models can execute arbitrary code when they are loaded, a consequence of how certain model formats are serialized and deserialized.
Developers often use the torch.load() function to load PyTorch models for use with Transformers. It deserializes the model from a file, including the model's architecture, weights, and configuration.
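As a rough illustration (a minimal sketch, assuming PyTorch 1.13 or later and a hypothetical local file named model.bin), the difference between default pickle-based loading and the more restrictive weights_only mode looks like this:

```python
# A minimal sketch, assuming PyTorch >= 1.13 and a local file "model.bin"
# (hypothetical path) containing pickled model weights.
import torch

# Default pickle-based loading: unpickling can run arbitrary Python code
# embedded in the file, so it should only be used with trusted sources.
state_dict = torch.load("model.bin", map_location="cpu")

# Safer variant: restrict unpickling to plain tensors and primitive types,
# which rejects objects that carry executable payloads.
state_dict = torch.load("model.bin", map_location="cpu", weights_only=True)
```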
The malicious payload in a PyTorch model file was injected using the __reduce__ method of Python's pickle protocol, which allows attackers to insert arbitrary code into the deserialization process.
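The mechanism can be demonstrated harmlessly. The sketch below is not the actual payload; it simply shows how an object's __reduce__ method lets pickle call an arbitrary function at deserialization time:

```python
# A harmless sketch of the mechanism, not the actual payload: __reduce__ lets
# an object tell pickle to call an arbitrary function at unpickling time.
import pickle

class Demonstration:
    def __reduce__(self):
        # On unpickling, pickle calls print(...) instead of rebuilding the
        # object; an attacker would substitute something like os.system.
        return (print, ("code executed during deserialization",))

payload = pickle.dumps(Demonstration())
pickle.loads(payload)  # prints the message, proving code ran on load
```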
Hugging Face's security protections include malware scanning, pickle scanning, and secrets scanning, but they don't outright block pickle models from being downloaded. Instead, they're marked as "unsafe", allowing someone to still download and execute potentially harmful models.
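One practical precaution (a minimal sketch, assuming the huggingface_hub client library; the repository id is a placeholder) is to check whether a repository ships safetensors weights, which cannot carry executable code, before pulling pickle-based files:

```python
# A minimal sketch, assuming the huggingface_hub client library; the repo id
# below is a placeholder, not one of the repositories discussed above.
from huggingface_hub import HfApi

api = HfApi()
files = api.list_repo_files("some-org/some-model")  # hypothetical repo id

# Prefer repositories that ship safetensors weights over pickle-based
# .bin / .pt / .pkl files, which can embed executable payloads.
has_safetensors = any(f.endswith(".safetensors") for f in files)
has_pickle = any(f.endswith((".bin", ".pt", ".pkl")) for f in files)
print(f"safetensors available: {has_safetensors}, pickle files present: {has_pickle}")
```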
TensorFlow Keras models, the second-most prevalent model type on Hugging Face, can also execute arbitrary code. While this method is not as easy for attackers to exploit, it's still a concern.
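On the Keras side, recent releases expose a safe_mode flag on load_model that refuses to deserialize Lambda layers containing arbitrary serialized Python code. A minimal sketch, assuming Keras 3 (or tf.keras 2.13+) and a placeholder model path:

```python
# A minimal sketch, assuming Keras 3 (or tf.keras >= 2.13), where load_model
# exposes a safe_mode flag; "model.keras" is a placeholder path.
import keras

# With safe_mode=True (the default in recent releases), deserialization of
# Lambda layers carrying arbitrary serialized Python code is refused.
model = keras.models.load_model("model.keras", safe_mode=True)
```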
Security Measures
Hugging Face, a prominent AI development platform, has over 500,000 AI models and 250,000 datasets, making it a significant target for cyber threats.
The platform's security measures were found wanting when researchers at Lasso Security uncovered over 1,500 exposed API tokens, allowing access to 723 organizations' accounts, including major players like Meta and Microsoft.
To mitigate this risk, AI developers should take advantage of newer tools such as Huntr, a bug-bounty platform tailored specifically to AI vulnerabilities, to improve the security posture of AI models and platforms.
The Lasso Security team's investigation revealed that API tokens were found hard-coded in public repositories, lacking basic protections.
Secure Your Model Supply Chain with Artifactory
Artifactory is the ultimate solution for safeguarding your AI model supply chain.
By integrating JFrog Artifactory with your environment, you can download models securely and leverage JFrog Advanced Security to block any attempts to download malicious models.
The malicious models database in Artifactory is continuously updated with the latest findings from the JFrog Security Research team and other public data sources.
This provides real-time protection against emerging threats and shields your supply chain from potential risks.
Whether you're working with PyTorch, TensorFlow, or other pickle-based models, Artifactory ensures the integrity of your AI ecosystem.
You can stay ahead of security threats by following the JFrog Security Research blog and using its findings to enhance the security of your products and applications.
Artifactory acts as a secure proxy for models, empowering you to innovate with confidence.
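In practice, the huggingface_hub client can be pointed at such a proxy through the HF_ENDPOINT environment variable. The sketch below is illustrative only; the Artifactory URL is a placeholder whose exact path depends on how the remote repository is configured on your server:

```python
# A minimal sketch, assuming an Artifactory instance configured with a
# Hugging Face remote repository; the URL below is a placeholder and the
# exact path depends on how the repository is named in Artifactory.
import os

# Point the huggingface_hub client at the proxy instead of huggingface.co,
# so model downloads are resolved (and screened) through Artifactory.
os.environ["HF_ENDPOINT"] = "https://artifactory.example.com/artifactory/api/huggingfaceml/hf-remote"

from huggingface_hub import hf_hub_download

path = hf_hub_download(repo_id="some-org/some-model",  # hypothetical repo id
                       filename="model.safetensors")
print(path)
```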
Third-Party API Usage
Third-party API usage is a critical aspect of API security. Securing internally developed APIs is not enough; safe consumption of the third-party APIs an organization leverages is just as essential.
API supply chain calls span both internal and third-party APIs, so organizations need a clear picture of which third-party APIs are in use, what they do, and what data is associated with them in order to assess risk.
Developers need to understand the ramifications of mishandling privileged API keys. This can lead to significant security breaches, as seen in the Hugging Face API Token Exposure breach where over 1,500 exposed API tokens were found, granting access to 723 organizations' accounts.
Organizations should implement technologies to prevent secrets like static API tokens from being exposed in code and repositories. This can be achieved by constantly scanning repos and revoking any OAuth token, GitHub App token, or personal access token when it's pushed to a public repository or public gist.
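As a rough illustration of that kind of scan, here is a minimal sketch; the "hf_" prefix and length check are approximations of the Hugging Face token format, and a real deployment would rely on a dedicated secret scanner plus automatic revocation:

```python
# A minimal sketch of the scan described above: walk a repository checkout
# and flag strings that look like Hugging Face user access tokens (which use
# the "hf_" prefix). The length check is an approximation.
import os
import re

TOKEN_PATTERN = re.compile(r"hf_[A-Za-z0-9]{30,}")

def scan_repo(root: str) -> list[tuple[str, str]]:
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", encoding="utf-8", errors="ignore") as fh:
                    for match in TOKEN_PATTERN.finditer(fh.read()):
                        hits.append((path, match.group()))
            except OSError:
                continue
    return hits

for path, token in scan_repo("."):
    # Print only a prefix so the scan itself never re-exposes the secret.
    print(f"possible exposed token in {path}: {token[:6]}...")
```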
Trust and Ecosystem
The recent Hugging Face breach is a stark reminder of the importance of trust in the AI ecosystem. Vulnerabilities like CVE-2023-6730 can compromise the privacy and security of organizations using Hugging Face repositories.
Initiatives like Huntr, a bug bounty platform tailored for AI CVEs, play a crucial role in enhancing the security posture of AI models and platforms. This collective effort is imperative in fortifying Hugging Face repositories.
Providers Need to Foster Trust
As AI technology becomes increasingly integrated into our daily lives, trust is a crucial aspect that needs to be addressed. Providers need to foster trust in APIs and beyond.
API attacks are on the rise, and organizations integrating with generative AI technologies may face the same risks and consequences. This is a concern that needs to be taken seriously.
Karl Mattson, CISO of API security firm Noname Security, emphasizes the importance of maintaining trust by building secure API implementations and protecting third-party transactions with good security hygiene. This is essential for organizations to ensure the integrity of their systems.
Organizations are already using generative AI from various vendors and channels, including integrating it into in-house application development, incorporating it into third-party applications, or accessing it directly via API from providers such as OpenAI or Google's Bard. This widespread adoption highlights the need for robust security measures.
The AI industry will need to work together to build trust and protect against API attacks, which are increasing in frequency and severity. This collective effort will help to safeguard the integrity of AI systems and maintain the trust of users.
Playground for Researchers
Hugging Face has become a playground for researchers striving to counteract new threats, often using tactics to bypass security measures.
Researchers and bug bounty hunters attempt to gain code execution for seemingly legitimate purposes, and their proof-of-concept payloads are often flagged as "malicious".
The runpy module is being used to execute arbitrary Python code while bypassing Hugging Face's current malicious-model scanning.
Harmless code-execution demonstrations, such as the one in paclove/pytorchTest, are being used to test and improve security measures.
A model uploaded by the research team, MustEr/m3e_biased, was able to bypass the malicious models scan using the runpy module.
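To illustrate why runpy is attractive for this, the harmless sketch below uses runpy.run_path to execute Python source that was never imported as a regular module; the real proofs of concept wire similar calls into a model's deserialization step, whereas this snippet only prints a message:

```python
# A harmless sketch: runpy can execute Python source without a normal import,
# which is part of why payloads routed through it may slip past pattern-based
# scanners. The "payload" here is just a print statement.
import runpy
import tempfile

# Write a harmless stand-in "payload" to a temporary .py file.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as fh:
    fh.write('print("executed via runpy, not via a normal import")\n')
    script_path = fh.name

# run_path executes the file in a fresh namespace, much like "python file.py".
runpy.run_path(script_path)
```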
Analysis and Response
Researchers have previously found AI security risks on Hugging Face, a platform where the ML community collaborates on models, data sets, and applications.
This risk is not just theoretical: in the past, researchers were able to access large language model repositories including Meta-Llama, Bloom, and Pythia using unsecured API access tokens found on GitHub and Hugging Face.
To mitigate this risk, AI developers can use new tools like Huntr, a bug-bounty platform tailored specifically for AI vulnerabilities.
Payload Analysis
The payload uploaded by baller423 opened a reverse shell connection to an actual IP address, 210.117.212.93, making it notably more intrusive and potentially malicious than typical research proofs of concept.
This IP address range belongs to Kreonet, a high-speed network in South Korea that supports advanced research and educational endeavors.
A fundamental principle in security research is refraining from publishing real working exploits or malicious code, but this principle was breached when the malicious code attempted to connect back to a real IP address.
Researchers at JFrog found that the same payload with varying IP addresses was encountered shortly after the model was removed, highlighting the persistence of the security threat.
This is just one instance of a potentially malicious model, as further investigation into Hugging Face uncovered some 100 potentially malicious models.
Deeper Analysis Required
The recent discovery of malicious AI models on Hugging Face highlights the need for more rigorous security measures in the AI industry. The malicious payload uploaded by baller423 initiated a reverse shell connection to an actual IP address, indicating a potential security threat.
Researchers at Lasso Security found over 1,500 exposed API tokens on Hugging Face, which granted access to 723 organizations' accounts, including major players like Meta and Microsoft. This breach shows that API tokens can be a significant vulnerability in the AI supply chain.
The Lasso Security team's investigation into Hugging Face's security measures was driven by its popularity in the open-source AI community. They discovered that API tokens were often exposed in public repositories, lacking basic protections.
The malicious payload was injected into the PyTorch model file using the __reduce__ method of the pickle module, enabling attackers to insert arbitrary Python code into the deserialization process. This method can potentially lead to malicious behavior when the model is loaded.
Hugging Face's security protections, including malware scanning and pickle scanning, don't outright block or restrict pickle models from being downloaded. Instead, they mark them as "unsafe", which means someone can still download and execute potentially harmful models.
The growing number of publicly available and potentially malicious AI/ML models poses a major risk to the supply chain, particularly for attacks aimed at specific targets such as AI/ML engineers and pipeline machines.
Sources
- https://jfrog.com/blog/data-scientists-targeted-by-malicious-hugging-face-ml-models-with-silent-backdoor/
- https://www.maginative.com/article/hugging-face-api-token-exposure-is-a-wake-up-call-for-ai-security/
- https://www.bleepingcomputer.com/news/security/ai-platform-hugging-face-says-hackers-stole-auth-tokens-from-spaces/
- https://www.darkreading.com/application-security/hugging-face-ai-platform-100-malicious-code-execution-models
- https://www.reversinglabs.com/blog/5-lessons-learned-from-the-huggingface-api-breach