Internet Archive played crucial role in tracking shady CDC data removals
arstechnica.com
"Deletion disobedience" Internet Archive played crucial role in tracking shady CDC data removals Internet Archive makes it easier to track changes in CDC data online. Ashley Belanger Feb 4, 2025 4:18 pm | 30 Credit: alengo | E+ Credit: alengo | E+ Story textSizeSmallStandardLargeWidth *StandardWideLinksStandardOrange* Subscribers only Learn moreWhen thousands of pages started disappearing from the Centers for Disease Control and Prevention (CDC) website late last week, public health researchers quickly moved to archive deleted public health data.Soon, researchers discovered that the Internet Archive (IA) offers one of the most effective ways to both preserve online data and track changes on government websites. For decades, IA crawlers have collected snapshots of the public Internet, making it easier to compare current versions of websites to historic versions. And IA also allows users to upload digital materials to further expand the web archive. Both aspects of the archive immediately proved useful to researchers assessing how much data the public risked losing during a rapid purge following a pair of President Trump's executive orders.Part of a small group of researchers who managed to download the entire CDC website within days, virologist Angela Rasmussen helped create a public resource that combines CDC website information with deleted CDC datasets. Those datasets, many of which were previously in the public domain for years, were uploaded to IA by an anonymous user, "SheWhoExists," on January 31. Moving forward, Rasmussen told Ars that IA will likely remain a go-to tool for researchers attempting to closely monitor for any unexpected changes in access to public data.IA "continually updates their archives," Rasmussen said, which makes IA "a good mechanism for tracking modifications to these websites that haven't been made yet."The CDC website is being overhauled to comply with two executive orders from January 20, the CDC told Ars. 
The "Defending Women from Gender Ideology Extremism and Restoring Biological Truth to the Federal Government" order requires government agencies to remove LGBTQ+ language that Trump claimed denies "the biological reality of sex" and is likely driving most of the CDC changes to public health resources. The other executive order the CDC cited, the "Ending Radical And Wasteful Government DEI Programs And Preferencing" order, would seemingly largely only impact CDC employment practices.

Additionally, "the Office of Personnel Management has provided initial guidance on both Executive Orders and HHS and divisions are acting accordingly to execute," the CDC told Ars.

Rasmussen told Ars that the deletion of CDC datasets is "extremely alarming" and "not normal." While some deleted pages have since been restored in altered versions, removing gender ideology from CDC guidance could put Americans at heightened risk. That's another emerging problem that IA's snapshots could help researchers and health professionals resolve.

"I think the average person probably doesn't think that much about the CDC's website, but it's not just a matter of like, 'Oh, we're going to change some wording' or 'we're going to remove these data,'" Rasmussen said. "We are actually going to retool all the information that's there to remove critical information about public health that could actually put people in danger."

For example, altered Mpox transmission data removed "all references to men who have sex with men," Rasmussen said. "And in the US those are the people who are not the only people at risk, but they're the people who are most at risk of being exposed to Mpox. So, by removing that DEI language, you're actually depriving people who are at risk of information they could use to protect themselves, and that eventually will get people hurt or even killed."

Likely the biggest frustration for researchers scrambling to preserve data is dealing with broken links.
On social media, Rasmussen has repeatedly called for help flagging broken links to ensure her team's archive is as useful as possible.

Rasmussen's group isn't the only effort to preserve the CDC data. Some are creating niche archives focused on particular topics, like journalist Jessica Valenti, who created an archive of CDC guidelines on reproductive rights issues, sexual health, intimate partner violence, and other data the CDC removed online.

Niche archives could make it easier for some researchers to quickly survey missing data in their field, but Rasmussen's group is hoping to take next steps to make all the missing CDC data more easily discoverable in their archive.

"I think the next step," Rasmussen said, "would be to try to fix anything in there that's broken, but also look into ways that we could maybe make it more browsable and user-friendly for people who may not know what they're looking for or may not be able to find what they're looking for."

CDC advisers demand answers

The CDC has been largely quiet about the deleted data, only pointing to Trump's executive orders to justify removals. That could change by February 7. That's the deadline a congressionally mandated advisory committee to the CDC's acting director, Susan Monarez, set in an open letter asking for answers to a list of questions about the data removals.

"It has been reported through anonymous sources that the website changes are related to new executive orders that ban the use of specific words and phrases," their letter said. "But as far as we are aware, these unprecedented actions have yet to be explained by CDC; news stories indicate that the agency is declining to comment."

At the top of the committee's list of questions is likely the one frustrating researchers most: "What was the rationale for making these datasets and websites inaccessible to the public?"
But the committee also importantly asked what analysis was done "of the consequences of removing access to these datasets and website" prior to the removals. They also asked how deleted data would be safeguarded and when data would be restored.

It's unclear if the CDC will be motivated to respond by the deadline. Ars reached out to one of the committee members, Joshua Sharfstein, a physician and vice dean for Public Health Practice and Community Engagement at Johns Hopkins University, who confirmed that as of this writing, the CDC has not yet responded. And the CDC did not respond to Ars' request to comment on the letter.

Rasmussen told Ars that even temporary removals of CDC guidance can disrupt important processes keeping Americans healthy. Among the potentially most consequential pages briefly removed were recommendations from the congressionally mandated Advisory Committee on Immunization Practices (ACIP).

Those recommendations are used by insurance companies to decide who gets reimbursed for vaccines and by physicians to determine vaccine eligibility, and Rasmussen said they "are incredibly important for the entire population to have access to any kind of vaccination." And while, for example, the Mpox vaccine recommendations were eventually restored unaltered, Rasmussen told Ars that she suspects that "one of the reasons" preventing interference currently with ACIP is that it's mandated by Congress.

Seemingly, ACIP could be weakened by the new administration, Rasmussen suggested. She warned that Trump's pick for CDC director, Dave Weldon, "is an anti-vaxxer" (with a long history of falsely linking vaccines to autism) who may decide to replace ACIP committee members with anti-vaccine advocates or move to dissolve ACIP. And any changes in recommendations could mean "insurance companies aren't going to cover vaccinations [and that] physicians will not recommend vaccination."
And that could mean "vaccination will go down and we'll start having outbreaks of some of these vaccine-preventable diseases."

"If there's a big polio outbreak, that is going to result in permanently disabled children, dead children; it's really, really serious," Rasmussen said. "So I think that people need to understand that this isn't just like, 'Oh, maybe wear a mask when you're at the movie theater' kind of CDC guidance. This is guidance that's really fundamental to our most basic public health practices, and it's going to cause widespread suffering and death if this is allowed to continue."

Seeding deleted data and doing science to fight back

On Bluesky, Rasmussen led one of many charges to compile archived links and download CDC data so that researchers can reference every available government study when advancing public health knowledge.

"These data are public and they are ours," Rasmussen posted. "Deletion disobedience is one way to fight back."

As Rasmussen sees it, deleting CDC data is "theft" from the public domain and archiving CDC data is simply taking "back what is ours." But at the same time, her team is also taking steps to be sure the data they collected can be lawfully preserved. Because the CDC website has not been copied and hosted on a server, they expect their archive should be deemed lawful and remain online.

"I don't put it past this administration to try to shut this stuff down by any means possible," Rasmussen told Ars. "And we wanted to make sure there weren't any sort of legal loopholes that would jeopardize anybody in the group, but also that would potentially jeopardize the data."

It's not clear if some data has already been lost. Seemingly the same user who uploaded the deleted datasets to IA posted on Reddit, clarifying that while the "full" archive "should contain all public datasets that were available" before "anything was scrubbed," it likely only includes "most" of the "metadata and attachments."
So, researchers who download the data may still struggle to fill in some blanks.

To help researchers quickly access the missing data, anyone can help the IA seed the datasets, the Reddit user said in another post providing seeding and mirroring instructions. Currently dozens are seeding it for a couple hundred peers.

"Thank you to everyone who requested this important data, and particularly to those who have offered to mirror it," the Reddit user wrote.

As Rasmussen works with her group to make their archive more user-friendly, her plan is to help as many researchers as possible fight back against data deletion by continuing to reference deleted data in their research. She suggested that effort (doing science that ignores Trump's executive orders) is perhaps a more powerful way to resist and defend public health data than joining in loud protests, which many researchers based in the US (and perhaps relying on federal funding) may not be able to afford to do.

"Just by doing things and standing up for science with your actions, rather than your words, you can really make, I think, a big difference," Rasmussen said.

Ashley Belanger is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.