
The man trying to capture the internet before it disappears
www.vox.com
Government websites have undergone massive changes since President Donald Trump returned to office.

Some of the changes are routine, like swapping in the current president and vice president for their predecessors on the White House's official site.

But other changes go much further. Several sites, like USAID.gov, ReproductiveRights.gov, and the Spanish-language version of WhiteHouse.gov, have gone offline. Remaining sites have been scrubbed of certain data and terminology in order to comply with Trump's executive orders targeting gender ideology and DEI.

It's an acceleration of a problem known as digital decay or linkrot. Large quantities of the internet are disappearing as media outlets go under, companies upgrade their web infrastructure, or organizations take down information they believe is no longer valuable or relevant. A recent Pew Research Center study found that 38 percent of webpages that existed in 2013 are no longer available. Because so much of our culture now happens online, losing those pages means losing part of the record of ourselves.

Mark Graham, director of the Wayback Machine, joined Sean Rameswaram on Today, Explained to talk about digital decay, what his team is doing to combat the problem both generally and during Trump's second term, and why internet preservation is so important.

Below is an excerpt of the conversation, edited for length and clarity. There's much more in the full podcast, so listen to Today, Explained wherever you get podcasts, including Apple Podcasts, Spotify, and Stitcher.

For people who have maybe stumbled upon your website but don't really know what you do, can you give them a sense of the things that you guys have saved in 30 years?

Where do I begin? It's like walking into a very large library and saying, "Show me your favorite book."

Last year, there was a big news story that MTV News was shut down.
The founding editor wrote about it on LinkedIn, and there were a lot of other editors talking about it: "My God, all of our articles are gone. They're missing." And I just casually waded into the conversation and went, "Hi, um, check the Wayback Machine." They were like, "Oh my God, you guys got it all. What did you do?"

We didn't do anything when the site went down because we've been doing our job all along. We've been working to archive the public web, as it's published, on an ongoing, continuous basis. If we have to start paying attention to something after it's gone down, that means we screwed up.

So what are you guys doing in advance of these sites going down to make sure that people can find out what Everlast was singing about in 2004?

We set our web crawlers and archiving software out on a mission every day to identify and to download web pages and related web-based resources. We bring in millions and millions of URLs every day that are signals of where new material is being published on the web. And we make sure that we archive all of those URLs and all the web pages associated with those URLs. Then, we look at those pages, and we identify links to other pages. And then we go to those pages and we archive them. That's where you get this metaphor of crawling like a spider throughout this web.

The net result of it is that we add more than a billion archived URLs to the Wayback Machine every day. This material that's added to the Wayback Machine is indexed, and it's immediately available to people who go to web.archive.org and enter a URL. They are then able to see a history of the archives that we have of the web page that was available from that URL at any given time.

I want to talk about government websites, because that's the reason we're having this conversation today. I think most people probably think the government will take care of archiving government websites.
But here we are in a new administration and websites are disappearing, coming back online, and people are worried. When you, an archivist of the internet, see this happening, how do you react to that? Is it better or worse than regular, non-governmental websites going offline?

Well, as an American, my tax dollars help pay for some of this stuff, and much of it is a benefit to people. Certainly my first reaction is: That might not be such a good thing.

I do want to underscore that the National Archives and Records Administration does do archiving as well, and the Library of Congress. So it's not like we're the only game in town. But for whatever reason, we seem to be one of the main players in the space of trying to archive much of the public web, including, and right now especially, US government websites, and making those archives available in near real time.

Were you caught off-guard when you saw the new administration removing web pages, removing websites?

In some respects, this is normal and expected. It's what's happened, frankly, for each administration in the time that we've been working on this effort.

I mean, look, it's under new management, right? You wouldn't expect the WhiteHouse.gov website under any new presidential administration to be the same as it was before. You're going to see the bios of the people that are part of the current administration, the news of that administration. We go out of our way to try to anticipate the frequency in which web pages should be archived so that we have a pretty good shot at getting those changes.

You're saying that the WhiteHouse.gov site obviously changes administration to administration. I think to some degree people understand that: Joe Biden's administration probably wouldn't have been posting trolly Valentines about immigration to their Instagram account a year ago. But what we're seeing here is websites that people need, websites that record public health information, going offline briefly, permanently, what have you.
Is that a different degree of erasing the historical record or messing with the historical record than we've seen?

That's true. It is. It's different. It's certainly different in terms of the number [of changes], seemingly! We're still in the early stages of this administration, but yeah, I'd say on the face of it, you're right. Historically, we haven't seen major US government websites taken offline like we did, for example, with regard to USAID. But I'm going to leave that kind of analysis to others, and really just focus on trying to archive the material.

The Wayback Machine and the Internet Archive are mostly funded through donations: the generosity of people, institutions, even governments. Is that going to be enough to archive the internet to the extent that future generations will want and need?

"Enough" is a very subjective term. As an archivist, for me, it's never enough. I don't know, and no one knows, what is going to be of use, value, importance in the future, maybe even the near future of tomorrow, much less the very far-off future.

Since millions of people use our site on a daily basis, we get a lot of feedback from them. It motivates us, but it also helps direct us and inspires us to continuously try to do a better job at being the best library that we can be.

You guys have been at this for nearly three decades. Certainly, you've saved a lot of stuff. Certainly, a lot of stuff has fallen through the cracks. I wonder, is there something that slipped through the cracks that might suggest to our audience what is lost when we can't archive to the extent we want to, or need to?

Okay, I got one! This is just in recent history. Apparently there was a page up on the CDC website about bird flu last week that was only up for a few minutes, and no one got it.

And by losing that fleeting web page, that one maybe minor, maybe major web page about bird flu on the CDC website, what are we losing?

Well, we're losing part of the story, right?
We're losing part of our understanding of the evolution of arguably a significant health issue. We don't know where this is going to go.

I guess that's the other point, right? You don't know now what is going to be very important in the near or longer term.

In the time of Martin Luther, there were raging debates. Much of that debate took the form of things that were written on pamphlets. The pamphlets at the time were considered of little value: People read them and they shared them, but they didn't necessarily save them. So today, for a scholar of that time, or someone like me who is strangely curious: what I would give for a collection of those pamphlets.

You are comparing, in a way, a CDC website to the Protestant Reformation. But I think you mean it, don't you?

I do! Because I don't know. One really can't know without the benefit of the long historical view. That's not something that we have access to today. Why? Because we don't have a real time machine.