The web: a museum of our everyday lives – The Varsity's Magazine Archive

Whenever I make a post on social media, I wonder who it will reach, not just in the present but in the future. Hundreds of years from now, will a researcher studying a hashtag on Instagram labelled ‘dog’ meticulously analyze the editing choices I made for a photo of my dog? Will historians piecing together the lives of millennial university students investigate my tweets? Will my social media accounts exist at all?

I learned while writing this piece that my curiosity might not be as weird or narcissistic as it sounds: archives of our generation’s social media and web pages are currently being compiled, investigated, and utilized across U of T and the world.

Like cave paintings in France or clay tablets from Mesopotamia, our social media posts are artifacts that will offer future historians insights into our daily lives, our society, and our politics. Our social media accounts are museums of our everyday lives, self-curated time capsules for future researchers. Such a large — and constantly expanding — collection of the thoughts and behaviours of ordinary people has never been available to researchers before. While this wealth of data will be invaluable to future researchers and historians, it also presents unique problems that don’t have conclusive solutions.

Preserving our digital data

Last December, volunteers gathered at U of T to archive climate change and environmental data that was at “high risk” of being deleted or of being made unavailable to the public under Donald Trump’s then-incoming presidency.

This “Guerrilla Archiving” event was held in collaboration with the Internet Archive’s “End of Term 2016” project. The Internet Archive is an online non-profit library that has recorded around 279 billion web pages for future historians to use. Its Canadian headquarters are located on the seventh floor of Robarts Library at UTSG.

Matt Price, a sessional lecturer at U of T’s Department of History, was one of the organizers of the event. Price explained it was important to copy these pages not just for historical reasons, but for the sake of documenting the truth: our understanding of climate and its relation to human health comes from these long stretches of data, which is why it’s imperative for them to stay publicly accessible.

Sam-chin Li is the Reference/Government Publications Librarian at Robarts Library who assisted volunteers at the archiving event. According to Li and Nich Worby, a Government Information and Statistics Librarian at Robarts Library, government information is now only available digitally and only on government websites. Without strong enforcement, this digital content could be at risk of being edited or deleted.

“That is why preserving government websites is not only essential for researchers, historians and scientists to do their work in the future, it is also critical for the opposition and public to keep government accountable,” wrote Li and Worby in an email.

According to Li and Worby, future historians and researchers can use archived web content to gain a better understanding of our “history and heritage.” Platforms like Twitter reveal valuable information about the lives of ordinary people and contain relevant interactions between governments and citizens.

Wendy Duff, a professor and dean in the Faculty of Information at U of T, thinks our social media archives will be “incredibly valuable” to future researchers trying to understand our societies, and that only they will be able to provide information about certain demographics. Primary sources from the past, like letters and diaries, came from a small, specific group of people: those who were literate and had the free time to write. Now, many different groups have access to the internet, and with it the ability to inadvertently share glimpses of their daily lives with future historians.

Piecing together our lives

Back in April 2010, the Library of Congress announced that it would preserve all public tweets — excluding private account information or deleted tweets, as well as pictures and links — for future generations and historians. In addition to tweets, the Library of Congress is also collecting online information about American and select international election candidates, select Facebook pages and news sites, and websites related to important historical events.

Price underscored that an archive of the lives of ordinary people has never been available to historians before. Historians of earlier centuries have a “scarcity of sources,” while historians of the early 21st century will be overwhelmed by sources. “Their problem is going to be that there’s so many documents that it’s going to be very difficult to sort them,” said Price.

“There will be a massive amount of records, and you will not be able to read them all,” agreed Duff.

For example, a researcher studying a president from the 1800s might have the ability to read every letter sent from the president’s office, but a researcher studying a president from the 21st century almost certainly could not read all the relevant emails and tweets sent out, Duff explained.

“So you will have to have electronic tools to be able to understand certain patterns.”

To sort through these sources, historians of the early 21st century will need to use computational methods, such as searching for keywords or running more complex queries, as well as physical analyses of outside texts or sources, explained Price. For some media, like tweets, statistical analysis is the only way to work with them. One tweet doesn’t reveal enough; historians would have to examine an aggregation of tweets and consult relevant Twitter threads in order to gain enough context.
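As a rough illustration of the keyword searching and aggregation Price describes, the sketch below counts how many tweets mention a keyword on each day. It is only a minimal example under stated assumptions: the function name `keyword_counts_by_day`, the `(date, text)` tuple layout, and the sample tweets are all made up for illustration and do not reflect any real archive or tool mentioned in this article.

```python
import re
from collections import Counter
from datetime import date


def keyword_counts_by_day(tweets, keyword):
    """Count, per day, how many tweets mention `keyword` as a whole word.

    `tweets` is an iterable of (date, text) pairs. This layout is a
    hypothetical stand-in for whatever format a real archive would use.
    """
    # Whole-word, case-insensitive match; also matches hashtags such as "#climate".
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    counts = Counter()
    for day, text in tweets:
        if pattern.search(text):
            counts[day] += 1
    return counts


# Made-up sample tweets, for illustration only.
sample_tweets = [
    (date(2016, 12, 17), "Archiving climate data at U of T"),
    (date(2016, 12, 17), "Climate records matter"),
    (date(2016, 12, 18), "A photo of my dog"),
    (date(2016, 12, 18), "#climate change data rescue"),
]

for day, n in sorted(keyword_counts_by_day(sample_tweets, "climate").items()):
    print(day.isoformat(), n)
```

A single matching tweet says little on its own. The daily totals are the kind of aggregate a historian might start from before reading individual threads for context.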

‘Fake news’ and self-curation

Our social media accounts are near-shrines of our idealized versions of ourselves: we only post edited photos, we only tweet our wittiest thoughts, and we only share our most ‘likeable’ life events.

A more insidious issue is the spread of misinformation — popularly known as ‘fake news’ — on platforms like Facebook and Twitter. The proliferation of false news stories and even fake first-hand accounts has been a pressing concern, especially over the past year. How will researchers hundreds of years from now be able to navigate our social media posts, all of which have varying degrees of reliability and bias?

Fiorella Foscarini, an associate professor and Director of Concurrent Registration Option at U of T’s Faculty of Information, says that fake news, forged records, and unreliable information have always been around, especially in the personal sphere or other environments with little outside control.

“What we are experiencing with social media, with the current proliferation of partial accounts or completely fabricated facts, is an interesting cultural phenomenon,” said Foscarini. “But it is also worrisome, because many people do not seem to have the critical instruments necessary to evaluate their sources.”

Archivists can prevent the spread of unreliable information by verifying the identity of the data at hand, providing resources for cross-examination, and monitoring the use of information to detect any modifications, Foscarini explained. However, outside of official archival spaces, these best practices might not be implemented.

Price explained that, regardless of genre, every source historians deal with has an “agenda,” and that historians have to learn to “read between the lines” of people’s self-presentations.

“Social media today are different in genre from the kinds of texts produced 100 or 200 years ago, in part because they offer a very strange hybrid of public and private with highly curated visions of oneself,” said Price.

Instead of looking for answers about what people were “really like,” future researchers should turn to social media to see how people curated themselves and the conventions for this self-curation — or, in Price’s words, what “kind of cultural representations were dominant in a particular moment.”

Price also said it would be a good idea to use tweets as ways to learn about how events or ideas “travelled and became meaningful to the historical actors,” rather than to learn what was really happening during an event or crisis.

Archiving social media

While trying to capture tweets about the Hong Kong Umbrella Movement as part of a school project, Alexander Herd and his fellow group members ran into a problem: some of these Twitter accounts were being blocked or shut down by authorities trying to censor the information and ideas being shared.

In order to prevent these deleted tweets from being lost forever, Herd — who completed a master’s degree in Library and Information Science at U of T in 2016 — and his group members placed them in a “dark archive.” Dark archives usually contain “sensitive information” about an ongoing event and can hold political tweets for 25–50 years.

Copyright was another issue for Herd’s group. Despite tweets being public record, “many users are not comfortable with their tweets being archived for eternity.”

“By extension, there has been discussion over who owns copyright of a tweet,” said Herd. According to Herd, there isn’t a clear resolution yet, but dark archiving tweets for a long period of time is a possible solution.

Herd and his group also consulted U of T librarians for their project, including Li and Worby.

According to Li and Worby, Twitter’s Developer Agreement currently prohibits permanently sharing archived tweets. Researchers can only share ‘Tweet IDs’ with the public. Tweet IDs are unique, unsigned integers that contain a timestamp, worker number, and sequence number, and that other researchers can use to retrieve the full content of tweets. However, deleted tweets are not available, which raises yet another ethical issue when it comes to public figures and their ability to delete their tweets.
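To make the Tweet ID layout described above more concrete, here is a minimal sketch of how such an ID could be split into its timestamp, worker number, and sequence number. It assumes the publicly documented "Snowflake" layout (a millisecond timestamp above the low 22 bits, a 10-bit worker field, and a 12-bit sequence) and an epoch offset of 1288834974657 milliseconds. Both are assumptions that go beyond what Li and Worby described, and older IDs issued before this scheme was adopted would not decode meaningfully. The helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple

# Assumed Snowflake epoch offset in milliseconds since the Unix epoch.
TWITTER_EPOCH_MS = 1288834974657
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


class TweetIdParts(NamedTuple):
    created_at: datetime  # UTC time encoded in the ID
    worker: int           # 10-bit worker (machine) number
    sequence: int         # 12-bit per-millisecond sequence number


def decode_tweet_id(tweet_id: int) -> TweetIdParts:
    """Split a Snowflake-style Tweet ID into its three fields."""
    if tweet_id < 0:
        raise ValueError("Tweet IDs are unsigned integers")
    ms_since_unix_epoch = (tweet_id >> 22) + TWITTER_EPOCH_MS
    worker = (tweet_id >> 12) & 0x3FF   # next 10 bits
    sequence = tweet_id & 0xFFF         # lowest 12 bits
    created_at = UNIX_EPOCH + timedelta(milliseconds=ms_since_unix_epoch)
    return TweetIdParts(created_at, worker, sequence)


def encode_tweet_id(unix_ms: int, worker: int, sequence: int) -> int:
    """Build an illustrative ID from its parts (hypothetical helper for round-trip checks)."""
    return ((unix_ms - TWITTER_EPOCH_MS) << 22) | ((worker & 0x3FF) << 12) | (sequence & 0xFFF)


# Round trip with made-up values; this is not a real tweet.
example_id = encode_tweet_id(1_500_000_000_000, worker=37, sequence=5)
print(decode_tweet_id(example_id))
```

Decoding an ID in this way recovers only when and where it was generated. The text of the tweet still has to be requested from Twitter, which is why a deleted tweet cannot be recovered from its ID alone.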

A ‘digital dark age’

In grad school, Price was part of a digital preservation effort at Stanford University that involved lucrative restriction enzyme patents. The archives at the library were given a collection of all the emails sent between the researchers working on this project in the 1970s.

However, the data was on eight-inch floppy disks. Price watched as the researchers moved the data to 5.25-inch floppy disks, and then to 3.5-inch floppy disks, and then to disk drives, and then to small hard drives before printing out the emails.

It’s inevitable that researchers hundreds of years from now will run into the same technological problems. It’s possible that the technology of the future may not be able to support our current technology or read our files — and leave future researchers in a ‘digital dark age.’

Duff said it would be a “huge detriment” if we were to lose all records of our digital data. Data loss is already happening every time someone accidentally deletes a file or breaks a phone full of pictures, Duff pointed out.

Price said that there could be some “massive social upheavals” in the future, especially in the wake of global climate change, which might compromise digital sources — which are currently stored in large buildings that depend on electricity to stay online.

“We know that paper can survive, sometimes for thousands of years, but there’s no evidence that digital data can survive in that way,” said Price.

Preserving digital data

Unlike books, web pages change “unpredictably and continuously,” explained Price, which means that archivists need to frequently make copies of these pages in order to truly capture our history.

“Archiving dynamic, interactive, ubiquitous digital information is much more challenging than archiving stable, almost unchanging analog records,” said Foscarini.

Despite the difficulties posed by technological obsolescence, Foscarini said that preserving websites and social media is no longer perceived as completely “unsurmountable.” The problem lies in ensuring these digital materials will still “make sense” hundreds of years from now.

“What kind of metadata do we need to retain, or to add, in order to provide enough context that would allow future generations to understand what that tweet or that meme meant to communicate?” said Foscarini.

Emily Maemura is a fourth-year PhD candidate at U of T’s Faculty of Information whose research centres on archiving and preserving the web.

“Long-term preservation of digital media is perhaps less like letters or newspapers, and more like audio-visual collections, which requires monitoring and attention since software and hardware become obsolete over time,” said Maemura.

Maemura is researching another challenge web archivers must face: deciding which social media posts to actually keep, since archiving the web takes time, money, and resources.

“I think there’s an assumption that it’s possible to capture ‘everything’ that’s out there,” said Maemura.

Maemura explained that this is an “impossible goal” because there is a finite amount of data that can be sustained and because technological limits make it difficult to capture certain kinds of dynamic data.

“So it’s important to be aware of, and be critical of, the kinds of selection processes that happen, who decides what is preserved, and who is responsible for the ongoing access and maintenance,” said Maemura.

So, next time you retweet a viral meme or make an online post, consider the possibility of researchers and archivists centuries from now studying it. What will it say about who we are today?