The U.S. Library of Congress is going forward with plans to permanently archive every public tweet ever sent and make them available for perusal.
According to Library spokesperson Jennifer Gavin, the project, which was first announced two years ago and has since been rumored to be cancelled, is definitely happening, and the Library is actively figuring out how to make it work.
“The process of how to serve it out to researchers is still being worked out, but we’re getting a lot closer,” Gavin told the Nieman Journalism Lab. “I couldn’t give you a date specific of when we’ll be ready to make the announcement.
“We began receiving the material, portions of it, last year. We got that system down. Now we’re getting it almost daily,” she also said. “And of course, as I think is obvious to anyone who follows Twitter, it has ended up being a very large amount of material.”
“Large amount of material” is an understatement. Since its inception six years ago, Twitter has processed billions of tweets, and that number is rising: the microblogging site currently handles some 400 million tweets a day, according to its CEO. Storing all that material will take many petabytes.
Twitter is fully supportive of the project, gladly giving the Library access to its archive and calling it “very exciting that tweets are becoming part of history.”
But the company also clarified exactly how the tweets may be used.
“It should be noted that there are some specifics regarding this arrangement. Only after a six-month delay can the Tweets be used for internal library use, for non-commercial research, public display by the library itself, and preservation,” the site said in a blog post.
Gavin says that the archive will be available only at the Library’s site in Washington, D.C., to anyone with a library card. “My understanding is that at this time we do not intend to make it available by web,” she says, although that may change.