
Over 1,500 Hugging Face API Tokens Exposed in Supply Chain Security Breach

Security researchers at Lasso Security have uncovered a major credential exposure affecting Hugging Face, the popular open-source AI and machine learning platform: more than 1,500 exposed API tokens left hundreds of organizations, including Meta, Microsoft, and Google, open to devastating supply chain attacks. The disclosure, made public on December 4, 2023, highlights a growing intersection between artificial intelligence infrastructure and cybersecurity threats at a time when Bitcoin trades near $42,000 and the broader crypto market surges past $1.5 trillion in total capitalization.

The Exploit Mechanics

The breach was not the result of a sophisticated zero-day or a novel attack vector. Instead, Lasso Security researchers ran straightforward substring searches across public repositories on both Hugging Face and GitHub to locate hardcoded API tokens that developers had accidentally committed to public code. GitHub’s search returned at most 100 results per query, so the researchers brute-forced the first two characters of token prefixes, splitting one broad search into many narrow ones that each stayed under the limit, and in this way systematically collected thousands of exposed credentials. Using the Hugging Face whoami API endpoint, they validated each token, identified its owner, mapped organizational memberships, and catalogued permission levels. The scale was staggering: over 1,500 tokens were confirmed active, granting access to 723 organizational accounts on the platform.
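The query-splitting idea can be sketched in a few lines. This is a minimal illustration only, assuming user tokens begin with `hf_` and continue with alphanumeric characters; the prefix, the alphabet, and the function name `partition_queries` are assumptions for this sketch and are not taken from the research itself. The code only builds the narrower search strings and makes no network requests.

```python
from itertools import product
import string

# Assumed token prefix and character set (not confirmed by the article).
TOKEN_PREFIX = "hf_"
ALPHABET = string.ascii_letters + string.digits


def partition_queries(prefix: str = TOKEN_PREFIX, depth: int = 2) -> list[str]:
    """Return one search substring per combination of `depth` characters.

    Each substring matches a smaller slice of all possible tokens, so a
    search interface that caps results per query can still be covered by
    issuing many narrow queries instead of one broad one.
    """
    return [prefix + "".join(chars) for chars in product(ALPHABET, repeat=depth)]


queries = partition_queries()
print(len(queries))  # number of narrow queries generated
print(queries[:3])   # first few search strings
```

The same partitioning is useful defensively: an organization auditing its own public code can split a capped search the same way to make sure nothing is missed.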

Affected Systems

Of the 1,500-plus tokens discovered, 655 carried write permissions, the most dangerous level of access. These write-enabled tokens affected 77 organizations, including some of the most prominent names in AI development. Meta’s Llama 2 project, EleutherAI’s Pythia models, and BigScience Workshop’s Bloom language model were all potentially compromised. Hugging Face hosts more than 500,000 AI models and 250,000 datasets, making it a critical piece of global AI infrastructure. The exposed tokens could have allowed attackers to modify datasets downloaded tens of thousands of times per month, poison training data for widely used models, or steal over 10,000 private AI models outright. Meta, EleutherAI, and BigScience Workshop alone collectively serve models with millions of downloads, so any successful tampering would have cascaded across countless downstream applications and users.
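The kind of triage behind these figures (how many tokens can write, and which organizations those tokens reach) can be sketched as a small aggregation. The records below are made-up placeholders, not real findings, and the field names `owner`, `role`, and `orgs` are a hypothetical shape chosen for this example rather than the actual format of any API response.

```python
from collections import Counter

# Made-up placeholder records, NOT real findings. The field names
# ("owner", "role", "orgs") are a hypothetical shape for this sketch.
records = [
    {"owner": "user-a", "role": "write", "orgs": ["org-1", "org-2"]},
    {"owner": "user-b", "role": "read", "orgs": ["org-2"]},
    {"owner": "user-c", "role": "write", "orgs": ["org-2", "org-3"]},
]


def summarize(records):
    """Count tokens per permission level and collect the organizations reached."""
    role_counts = Counter(r["role"] for r in records)
    # Organizations reachable through at least one write-enabled token.
    write_orgs = {org for r in records if r["role"] == "write" for org in r["orgs"]}
    # Organizations reachable through any token at all.
    all_orgs = {org for r in records for org in r["orgs"]}
    return role_counts, write_orgs, all_orgs


role_counts, write_orgs, all_orgs = summarize(records)
print(role_counts["write"], len(write_orgs), len(all_orgs))
```

Counting organizations as a set matters here: one organization is often reachable through several tokens, so summing memberships per token would overstate the blast radius.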

The Mitigation Strategy

Upon notification, the affected organizations moved quickly to revoke the exposed tokens and close the access paths. Hugging Face itself had already deprecated its organization API tokens (org_api), yet researchers found these tokens could still be exploited for read access and billing manipulation. The platform’s Python library was patched to add token-type verification in the login function. Researchers also identified a weakness where the deprecated tokens could bypass intended restrictions with minor modifications to the login flow. Hugging Face’s own secret scanning tool, similar to GitHub’s Secret Scanning feature, is available to alert users when tokens are hardcoded into projects, but too few developers were using it. The research underscores that platform-level scanning tools are necessary but insufficient when developers fail to adopt them consistently.
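A token-type check of the kind described can be sketched as follows. This is an illustration under stated assumptions, not the library’s actual code: it assumes organization tokens are recognizable by a fixed prefix, and the value `api_org` used here is an assumption (the article refers to the token type as org_api). The function name `check_user_token` is hypothetical.

```python
# Assumed prefix identifying deprecated organization tokens.
# This value is an assumption for the sketch, not confirmed by the article.
ORG_TOKEN_PREFIX = "api_org"


def check_user_token(token: str, org_prefix: str = ORG_TOKEN_PREFIX) -> str:
    """Return the cleaned token, or raise if it is empty or an organization token."""
    token = token.strip()
    if not token:
        raise ValueError("No token provided.")
    if token.startswith(org_prefix):
        # Reject deprecated organization tokens before any login attempt.
        raise ValueError("Organization tokens are not accepted; use a personal access token.")
    return token


print(check_user_token("  hf_example_placeholder  "))
```

A client-side check like this is only a convenience. As the research showed, a caller can modify or skip the login flow, so the server must enforce the same restriction.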

Lessons Learned

This incident maps directly to three of OWASP’s Top 10 risks for Large Language Models: supply chain vulnerabilities, training data poisoning, and model theft. The exposed tokens represented all three threat vectors simultaneously. Data poisoning, where an attacker subtly corrupts the datasets used to train AI models, is among the most critical threats facing the AI ecosystem today. If attackers had modified datasets used to classify types of network traffic, for example, models trained on them would mislabel that traffic, and the downstream effects could include misallocated resources and degraded network performance across enterprise environments. The fact that this was discovered by researchers rather than malicious actors is fortunate but should not breed complacency. Hardcoding API tokens in public repositories remains one of the most common and preventable security failures in software development, and the AI sector is not immune.

User Action Required

For developers and organizations using Hugging Face or similar AI platforms, immediate steps are essential. First, audit all repositories for hardcoded API tokens and rotate any credentials that may have been exposed. Second, enable platform-provided secret scanning tools on all projects. Third, implement least-privilege token policies, since most operations do not require write permissions. Fourth, for organizations managing high-profile models or datasets, establish continuous monitoring for unauthorized access attempts. As the AI industry continues its rapid expansion, with Bitcoin hovering around $41,980 and Ethereum at $2,243 on this date reflecting a broader risk-on environment, the security of AI supply chains will only grow more critical. The tools and practices for securing these platforms exist; adoption is the missing piece.
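The first step, auditing a repository for hardcoded tokens, can be approximated with a short local scan. This is a rough sketch rather than a replacement for a real secret scanner: the pattern assumes tokens look like `hf_` or `api_org_` followed by at least 20 alphanumeric characters, which is an assumption about the token format, and the helper names `find_tokens` and `scan_tree` are hypothetical.

```python
import re
from pathlib import Path

# Assumed token shape: a known prefix followed by 20+ alphanumeric characters.
# The prefixes and the length threshold are assumptions for this sketch.
TOKEN_PATTERN = re.compile(r"\b(?:hf_|api_org_)[A-Za-z0-9]{20,}\b")


def find_tokens(text: str) -> list[tuple[int, str]]:
    """Return (line_number, redacted_token) for every token-like string in text."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in TOKEN_PATTERN.finditer(line):
            # Redact the match so the report does not leak the secret again.
            hits.append((line_no, match.group(0)[:6] + "..."))
    return hits


def scan_tree(root: str) -> dict[str, list[tuple[int, str]]]:
    """Scan every readable file under root and map file paths to their hits."""
    results = {}
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip it
        hits = find_tokens(text)
        if hits:
            results[str(path)] = hits
    return results


sample = 'x = 1\ntoken = "hf_' + "A" * 34 + '"\n'
print(find_tokens(sample))
```

Any hit should be treated as compromised and rotated, not just deleted from the file, because the token remains in version history. Reading the token from an environment variable or a secrets manager at runtime keeps it out of source code in the first place.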

Disclaimer: This article is for informational purposes only and does not constitute financial or investment advice. Always conduct your own research before making any financial decisions.


10 thoughts on “Over 1,500 Hugging Face API Tokens Exposed in Supply Chain Security Breach”

  1. brute forcing the first two chars of a token prefix to bypass GitHub search limits is genuinely clever. simple but effective

  2. 655 tokens with write access. let that sink in for a second. someone could have pushed malicious code to models downloaded millions of times

    1. Tomer K. and the worst part is most of those tokens were months old. developers just commit them once and forget. secret rotation is basically nonexistent in ML

      1. secret rotation in ML is basically nonexistent because researchers are not engineers. they publish notebooks not production code

    2. exactly this. write access to org repos means you can backdoor model weights. nobody would notice until thousands of downloads later

    3. null_pointer_42

      655 tokens with write access to org repos including Meta and Microsoft. one leaked credential and you poison the entire ML supply chain downstream

  3. brute forcing the first two chars of a token prefix to bypass GitHub limits is embarrassingly simple. 1500+ exposed tokens means nobody was rotating credentials
