Copilot exposes private GitHub pages, some removed by Microsoft
Repositories once set to public and later to private remain accessible through Copilot.

Dan Goodin - Feb 27, 2025 6:43 pm

Credit: Microsoft

Microsoft's Copilot AI assistant is exposing the contents of more than 20,000 private GitHub repositories from companies including Google, Intel, Huawei, PayPal, IBM, Tencent and, ironically, Microsoft.

These repositories, belonging to more than 16,000 organizations, were originally posted to GitHub as public, but were later set to private, often after the developers responsible realized they contained authentication credentials allowing unauthorized access or other types of confidential data. Even months later, however, the private pages remain available in their entirety through Copilot.

AI security firm Lasso discovered the behavior in the second half of 2024. After finding in January that Copilot continued to store private repositories and make them available, Lasso set out to measure how big the problem really was.

Zombie repositories

"After realizing that any data on GitHub, even if public for just a moment, can be indexed and potentially exposed by tools like Copilot, we were struck by how easily this information could be accessed," Lasso researchers Ophir Dror and Bar Lanyado wrote in a post on Thursday. "Determined to understand the full extent of the issue, we set out to automate the process of identifying 'zombie repositories' (repositories that were once public and are now private) and validate our findings."
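Lasso hasn't published its tooling, but the core check is simple enough to sketch: a repository is a "zombie" if a search-engine cache or archive still lists it while the live GitHub page no longer answers anonymous requests (GitHub returns HTTP 404 for both private and deleted repositories). A minimal illustration follows, assuming a hypothetical list of owner/repo names already harvested from cached search results; the harvesting step itself is not shown.

import requests

# Hypothetical candidates: owner/repo names scraped from cached
# search-engine results (the harvesting step is not shown here).
candidates = ["example-org/internal-tools", "example-org/payments-api"]

def is_zombie(full_name: str) -> bool:
    """Return True if the repo is gone from public view.

    GitHub's API answers unauthenticated requests with 404 for both
    private and deleted repositories, so a 404 for a repo that still
    appears in a cache marks it as a 'zombie'."""
    resp = requests.get(
        f"https://api.github.com/repos/{full_name}",
        headers={"Accept": "application/vnd.github+json"},
        timeout=10,
    )
    return resp.status_code == 404

for repo in candidates:
    if is_zombie(repo):
        print(f"{repo}: still cached publicly, but no longer public on GitHub")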
After discovering Microsoft was exposing one of Lasso's own private repositories, the Lasso researchers traced the problem to the cache mechanism in Bing. The Microsoft search engine indexed the pages when they were published publicly and never removed the entries once the pages were changed to private on GitHub. Since Copilot used Bing as its primary search engine, the private data was available through the AI chatbot as well.

After Lasso reported the problem in November, Microsoft introduced changes designed to fix it. Lasso confirmed that the private data was no longer available through Bing cache, but it went on to make an interesting discovery: a GitHub repository that had been made private following a lawsuit Microsoft had filed was still available in Copilot. The suit alleged the repository hosted tools specifically designed to bypass the safety and security guardrails built into the company's generative AI services. The repository was subsequently removed from GitHub, but as it turned out, Copilot continued to make the tools available anyway.

Screenshot showing Copilot continues to serve tools Microsoft took action to have removed from GitHub. Credit: Lasso

Lasso ultimately determined that Microsoft's fix involved cutting off public access to a special Bing user interface, once available at cc.bingj.com. The fix, however, didn't appear to clear the private pages from the cache itself. As a result, the private information was still accessible to Copilot, which in turn would make it available to any Copilot user who asked.

The Lasso researchers explained:

"Although Bing's cached link feature was disabled, cached pages continued to appear in search results. This indicated that the fix was a temporary patch and, while public access was blocked, the underlying data had not been fully removed. When we revisited our investigation of Microsoft Copilot, our suspicions were confirmed: Copilot still had access to the cached data that was no longer available to human users."

In short, the fix was only partial: human users were prevented from retrieving the cached data, but Copilot could still access it.

The post laid out simple steps anyone can take to find and view the same massive trove of private repositories Lasso identified.

There's no putting toothpaste back in the tube

Developers frequently embed security tokens, private encryption keys, and other sensitive information directly into their code, despite best practices that have long called for such data to be supplied through more secure means. The potential damage worsens when this code is made available in public repositories, another common security failing. The phenomenon has recurred over and over for more than a decade.

When these sorts of mistakes happen, developers often make the repositories private quickly, hoping to contain the fallout. Lasso's findings show that simply making the code private isn't enough. Once exposed, credentials are irreparably compromised. The only recourse is to rotate all credentials.
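Part of why embedded credentials are compromised so quickly is that they follow well-known formats and are trivial to harvest at scale. The sketch below is a deliberately simplified illustration of that pattern matching; production scanners such as gitleaks and truffleHog use far more rules and also walk the full git history, not just the files currently on disk.

import re
from pathlib import Path

# Simplified patterns for a few well-known token formats. Real
# scanners (gitleaks, truffleHog, etc.) cover hundreds of formats.
PATTERNS = {
    "AWS access key ID": re.compile(r"AKIA[0-9A-Z]{16}"),
    "GitHub token": re.compile(r"gh[pousr]_[A-Za-z0-9]{36}"),
    "private key header": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
}

def scan(root: str) -> None:
    """Flag files in a checkout that contain likely credentials."""
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue
        for name, pattern in PATTERNS.items():
            if pattern.search(text):
                print(f"{path}: possible {name}")

scan(".")

Finding a hit like this after the fact is exactly the situation Lasso describes: deleting the file or flipping the repository to private does nothing for a credential that has already been indexed, which is why rotation is the only remedy.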
This advice still doesn't address the problems that result when other sensitive data is included in repositories that are switched from public to private. Microsoft incurred legal expenses to have tools removed from GitHub after alleging they violated a raft of laws, including the Computer Fraud and Abuse Act, the Digital Millennium Copyright Act, the Lanham Act, and the Racketeer Influenced and Corrupt Organizations Act. Company lawyers prevailed in getting the tools removed. To date, Copilot continues to undermine this work by making the tools available anyway.

Microsoft representatives didn't immediately respond to an email asking if the company plans to provide further fixes.

Dan Goodin is Senior Security Editor at Ars Technica, where he oversees coverage of malware, computer espionage, botnets, hardware hacking, encryption, and passwords. In his spare time, he enjoys gardening, cooking, and following the independent music scene. Dan is based in San Francisco. Follow him on Mastodon and Bluesky. Contact him on Signal at DanArs.82.