Microsoft AI Researchers Accidentally Leak Sensitive Data on GitHub
Cloud Security Startup Discovers Terabytes of Confidential Information Exposed
Microsoft’s AI research division unintentionally revealed a substantial amount of sensitive data, including private keys and passwords, while sharing open-source training data on GitHub. Cloud security firm Wiz made this discovery during its ongoing investigation into the inadvertent exposure of cloud-hosted data.
The GitHub repository in question offered open-source code and AI models for image recognition. Users were instructed to download these models from an Azure Storage URL. However, Wiz found that the URL was configured to grant access to the entire storage account rather than only the intended files, unintentionally exposing additional private data.
The accidental exposure comprised 38 terabytes of sensitive data, including personal backups from two Microsoft employees’ computers. Those backups contained Microsoft service passwords, secret keys, and more than 30,000 internal Microsoft Teams messages from hundreds of employees.
The URL had been misconfigured since 2020, and it granted full-control rather than read-only permissions. The oversight stemmed from an overly permissive shared access signature (SAS) token embedded in the URL; SAS tokens are used to grant scoped access to Azure Storage account data.
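The mitigation for this class of mistake is to scope SAS tokens narrowly. As a minimal sketch, assuming the azure-storage-blob v12 Python SDK, the snippet below generates a read-only, short-lived token for a single container; the account and container names are hypothetical placeholders, not details from the incident.

```python
# Minimal sketch: a container-scoped, read-only SAS token with a short expiry,
# in contrast to an account-wide, full-control token like the one exposed.
from datetime import datetime, timedelta, timezone

from azure.storage.blob import ContainerSasPermissions, generate_container_sas

sas_token = generate_container_sas(
    account_name="examplestorage",        # hypothetical storage account
    container_name="model-artifacts",     # hypothetical container holding the models
    account_key="<account-key>",          # never embed a real key in shared code
    permission=ContainerSasPermissions(read=True, list=True),  # read/list only
    expiry=datetime.now(timezone.utc) + timedelta(hours=24),   # short-lived
)

# The resulting URL grants access only to this container, only to read and
# list, and only for 24 hours.
download_url = (
    f"https://examplestorage.blob.core.windows.net/model-artifacts?{sas_token}"
)
print(download_url)
```

A token like this limits the blast radius on all three axes that mattered here: scope (one container, not the whole account), permissions (read-only, not full control), and lifetime (hours, not years).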
Wiz reported its findings to Microsoft on June 22, and Microsoft revoked the SAS token two days later, completing its investigation by August 16. Microsoft stated that no customer data was exposed and no other internal services were put at risk as a result of the incident.
“AI unlocks huge potential for tech companies,” Wiz co-founder and CTO Ami Luttwak told TechCrunch.
In response, Microsoft has expanded GitHub’s secret scanning service, which now monitors all public open-source code changes for exposed credentials or secrets, including SAS tokens with overly permissive settings or expirations.
This incident underscores the importance of robust security measures when handling vast amounts of data, especially in the realm of AI development, where data exposure can have far-reaching consequences.