Over 1,500 Hugging Face API Tokens Exposed in Supply Chain Security Breach

Security researchers at Lasso Security have found more than 1,500 exposed API tokens for Hugging Face, the popular open-source AI and machine learning platform, in public code repositories. The leaked credentials left hundreds of organizations, including Meta, Microsoft, and Google, open to devastating supply chain attacks. The disclosure, made public on December 4, 2023, highlights a growing intersection between artificial intelligence infrastructure and cybersecurity threats at a time when Bitcoin trades near $42,000 and the broader crypto market surges past $1.5 trillion in total capitalization.

The Exploit Mechanics

The breach was not the result of a sophisticated zero-day or a novel attack vector. Instead, Lasso Security researchers used straightforward substring searches across public repositories on both Hugging Face and GitHub to locate hardcoded API tokens that developers had accidentally committed to public code. By enumerating the first two characters that follow the token prefix, the researchers narrowed each query enough to work around GitHub’s 100-result search limit and systematically collected thousands of exposed credentials. Using the Hugging Face whoami API endpoint, they validated each token, identified its owner, mapped organizational memberships, and catalogued permission levels. The scale was staggering: over 1,500 tokens were confirmed active, granting access to 723 organizational accounts on the platform.
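The prefix trick can be illustrated with a small offline simulation. This is a sketch under stated assumptions, not the researchers' tooling: `capped_search` is a hypothetical stand-in for a search that returns at most 100 hits, the `hf_` prefix and the letters-only alphabet are assumptions, and the corpus is synthetic.

```python
import string
from itertools import product

RESULT_CAP = 100                 # mirrors the 100-result limit described above
ALPHABET = string.ascii_letters  # assumption: characters after the prefix are ASCII letters


def make_demo_corpus(size=500):
    """Build synthetic 'files', each containing one fake token (not real credentials)."""
    corpus = []
    for i in range(size):
        a = ALPHABET[i % 52]
        b = ALPHABET[(i // 52) % 52]
        corpus.append(f'token = "hf_{a}{b}{"X" * 28}{i:04d}"')
    return corpus


def capped_search(corpus, needle, cap=RESULT_CAP):
    """Hypothetical search: substring match that returns at most `cap` hits."""
    hits = [doc for doc in corpus if needle in doc]
    return hits[:cap]


def partitioned_search(corpus, prefix="hf_"):
    """Issue one narrow query per two-character extension of the prefix."""
    found = set()
    for a, b in product(ALPHABET, repeat=2):
        found.update(capped_search(corpus, f"{prefix}{a}{b}"))
    return found


corpus = make_demo_corpus()
broad = capped_search(corpus, "hf_")   # a single broad query is truncated at the cap
narrow = partitioned_search(corpus)    # many narrow queries together cover the whole corpus
print(len(broad), len(narrow))
```

In this simulation the single broad query stops at the cap while the partitioned queries recover every synthetic file, which is why a result limit on its own does little to protect credentials that are already public.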

Affected Systems

Of the 1,500-plus tokens discovered, 655 carried write permissions, the most dangerous level of access. These write-enabled tokens affected 77 organizations, including some of the most prominent names in AI development. Meta’s Llama 2 project, EleutherAI’s Pythia models, and BigScience Workshop’s Bloom language model were all potentially compromised. Hugging Face hosts more than 500,000 AI models and 250,000 datasets, making it a critical piece of global AI infrastructure. The exposed tokens could have allowed attackers to modify datasets downloaded tens of thousands of times per month, poison training data for widely used models, or steal over 10,000 private AI models outright. The three organizations named above (Meta, EleutherAI, and BigScience Workshop) alone serve models with millions of downloads, meaning any successful tampering would have cascaded across countless downstream applications and users.

The Mitigation Strategy

Upon notification, the affected organizations moved quickly to revoke the exposed tokens and close the access paths. Hugging Face itself had already deprecated its organization API tokens (org_api), which researchers found could still be exploited for read access and billing manipulation despite the deprecation. The platform’s Python library was patched to add token-type verification in the login function, but researchers also found that minor modifications to the login flow let the deprecated tokens bypass the intended restrictions. Hugging Face’s own secret scanning tool, similar to GitHub’s Secret Scanning feature, is available to alert users when tokens are hardcoded into projects, but the tool clearly was not being used by enough developers. The research underscores that platform-level scanning tools are necessary but insufficient when developers fail to adopt them consistently.
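A token-type check of the kind described above might look roughly like the following. This is a minimal sketch, not the library's actual code: the `api_org` prefix, the function name, and the exception class are assumptions made for illustration.

```python
# Assumption: deprecated organization tokens can be recognized by this prefix.
ORG_TOKEN_PREFIX = "api_org"


class UnsupportedTokenError(ValueError):
    """Raised when a token type should not be accepted at login (hypothetical)."""


def verify_token_type(token: str) -> str:
    """Return the cleaned token, or raise if it is empty or an organization token."""
    cleaned = token.strip()
    if not cleaned:
        raise UnsupportedTokenError("no token supplied")
    if cleaned.startswith(ORG_TOKEN_PREFIX):
        raise UnsupportedTokenError(
            "organization API tokens are deprecated; use a personal access token"
        )
    return cleaned


def try_login(token: str) -> bool:
    """Illustrative wrapper: True if the token passes the client-side check."""
    try:
        verify_token_type(token)
    except UnsupportedTokenError:
        return False
    return True


print(try_login("hf_exampleexampleexample"))  # placeholder value, not a real token
print(try_login("api_org_example"))           # rejected by the prefix check
```

A check like this runs on the client, so anyone who edits the login flow can skip it. That matches the bypass the researchers reported and is the reason the server also has to refuse deprecated tokens.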

Lessons Learned

This incident maps directly to three of OWASP’s Top 10 risks for Large Language Models: supply chain vulnerabilities, training data poisoning, and model theft. The exposed tokens represented all three threat vectors simultaneously. Data poisoning, where an attacker subtly corrupts the datasets used to train AI models, is among the most critical threats facing the AI ecosystem today. If attackers had modified datasets used to classify types of network traffic, for example, the downstream effects could include misallocated resources and network performance degradation across enterprise environments. The fact that this was discovered by researchers rather than malicious actors is fortunate but should not breed complacency. Hardcoding API tokens in public repositories remains one of the most common and preventable security failures in software development, and the AI sector is not immune.

User Action Required

For developers and organizations using Hugging Face or similar AI platforms, immediate steps are essential. First, audit all repositories for hardcoded API tokens and rotate any credentials that may have been exposed. Second, enable platform-provided secret scanning tools on all projects. Third, implement least-privilege token policies — most operations do not require write permissions. Fourth, for organizations managing high-profile models or datasets, establish continuous monitoring for unauthorized access attempts. As the AI industry continues its rapid expansion — with Bitcoin hovering around $41,980 and Ethereum at $2,243 on this date, reflecting a broader risk-on environment — the security of AI supply chains will only grow more critical. The tools and practices for securing these platforms exist; adoption is the missing piece.
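The first step, auditing a repository for hardcoded tokens, can be sketched in a few lines. The token pattern below (`hf_` followed by at least 30 letters or digits) is an assumption about the token format, and the function names are illustrative. The `HF_TOKEN` environment variable is one common way to supply a token without committing it.

```python
import os
import re
from pathlib import Path

# Assumption: user tokens look like "hf_" followed by 30 or more letters or digits.
TOKEN_PATTERN = re.compile(r"\bhf_[A-Za-z0-9]{30,}")


def find_tokens(text):
    """Return (line_number, redacted_token) for each suspected token in `text`."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in TOKEN_PATTERN.finditer(line):
            # Keep only the first six characters so the report does not leak the secret.
            findings.append((lineno, match.group(0)[:6] + "..."))
    return findings


def scan_tree(root):
    """Scan readable text files under `root`, skipping anything inside .git."""
    results = {}
    for path in Path(root).rglob("*"):
        if not path.is_file() or ".git" in path.parts:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file
        hits = find_tokens(text)
        if hits:
            results[str(path)] = hits
    return results


def token_from_env(var="HF_TOKEN"):
    """Read the token from the environment instead of hardcoding it in source."""
    token = os.environ.get(var)
    if not token:
        raise RuntimeError(f"{var} is not set; refusing to fall back to a hardcoded token")
    return token


if __name__ == "__main__":
    for file_path, hits in scan_tree(".").items():
        for lineno, redacted in hits:
            print(f"{file_path}:{lineno}: possible token {redacted}")
```

A scan like this only covers the current working tree. A token that was committed and later deleted still sits in the git history, so any credential that ever reached a public repository should be rotated rather than just removed.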

Disclaimer: This article is for informational purposes only and does not constitute financial or investment advice. Always conduct your own research before making any financial decisions.

sigfault_

January 12, 2024 at 9:14 am

brute forcing the first two chars of a token prefix to bypass GitHub search limits is genuinely clever. simple but effective

Tomer K.

January 18, 2024 at 4:42 pm

655 tokens with write access. let that sink in for a second. someone could have pushed malicious code to models downloaded millions of times

pwn_hunter
February 10, 2026 at 6:33 pm

Tomer K. and the worst part is most of those tokens were months old. developers just commit them once and forget. secret rotation is basically nonexistent in ML

segfault_42
May 22, 2026 at 6:30 pm

secret rotation in ML is basically nonexistent because researchers are not engineers. they publish notebooks not production code
  
Dmitri V.
April 18, 2026 at 11:15 am

exactly this. write access to org repos means you can backdoor model weights. nobody would notice until thousands of downloads later

null_pointer_42
May 21, 2026 at 11:45 am

655 tokens with write access to org repos including Meta and Microsoft. one leaked credential and you poison the entire ML supply chain downstream

darknode_42

February 3, 2024 at 11:28 am

hard agree with Tomer. the write access part is what makes this way scarier than a typical credential leak

Kwame A.

February 12, 2026 at 11:07 am

Lasso Security did the industry a favor by disclosing. imagine if a nation state had found those 1500 tokens first

Ewa K.

May 23, 2026 at 4:28 pm

brute forcing the first two chars of a token prefix to bypass GitHub limits is embarrassingly simple. 1500+ exposed tokens means nobody was rotating credentials

leak_sommelier

June 8, 2026 at 2:12 pm

1500 tokens and im willing to bet half of them are still active today. nobody learns