iQ The RIMPA Quarterly Professionals Magazine | September 2021


SPECIAL FEATURE NEW DEVELOPMENTS AND INNOVATIONS

However, the challenge is that a tweet contains other information alongside the text and metadata. Are there attached images or a video, for example? The JSON helpfully contains links to these objects at various qualities so you can choose which to download (usually the best quality). The files are typically small, except where Twitter is used for live streaming, where the video can be many gigabytes.

These media files need to be permanently associated with the JSON and metadata, bringing us to the concept of the “multi-part asset”. This was introduced in Preservica v6 and combines all the metadata and files associated with a single piece of information into an atomic asset that must be handled as a whole.
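As a rough illustration of choosing the best-quality media link and tying the files to the JSON, here is a minimal Python sketch. It assumes a tweet JSON laid out in the style of Twitter's v1.1 API (`extended_entities.media`, with `video_info.variants` carrying a `bitrate`); the article does not say which API version was used. The `build_manifest` function and its layout are hypothetical and are not Preservica's actual multi-part asset format.

```python
import hashlib
from typing import Dict, List, Optional


def best_video_variant(media: dict) -> Optional[dict]:
    """Return the highest-bitrate MP4 variant of a media item, or None for photos."""
    variants = media.get("video_info", {}).get("variants", [])
    # Streaming playlists carry no bitrate, so only MP4 files are compared.
    mp4s = [v for v in variants if v.get("content_type") == "video/mp4"]
    return max(mp4s, key=lambda v: v.get("bitrate", 0), default=None)


def media_urls(tweet: dict) -> List[str]:
    """List one download URL per attached media item, preferring the best video."""
    urls = []
    for media in tweet.get("extended_entities", {}).get("media", []):
        best = best_video_variant(media)
        urls.append(best["url"] if best else media["media_url_https"])
    return urls


def build_manifest(tweet_json: bytes, parts: Dict[str, bytes]) -> dict:
    """Describe the tweet JSON and its media files as one unit with checksums.

    Hypothetical layout for illustration only (an assumption, not a real format).
    """
    files = {"tweet.json": tweet_json, **parts}
    return {
        "files": {
            name: {"size": len(data), "sha256": hashlib.sha256(data).hexdigest()}
            for name, data in sorted(files.items())
        }
    }
```

Recording a checksum for every part is one simple way to express that the asset is atomic: if any file is missing or altered, the manifest no longer matches and the asset as a whole fails verification.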

The next extraction challenge is where the tweet contains a URL link to an external web page. This is contained in the JSON and can be shown in the render tool, but it introduces the concept of link rot: how do we know that the information linked is the same as at the time of tweeting, or that the URL even still exists? It is possible to take a snapshot of the web page, either as an image, PDF or WARC [iv] file, and to add that to the multi-part asset, but what are the copyright issues relating to this? This remains to be solved.

At Preservica we have a proof of concept running, acquiring tweets as multi-part assets and creating links between tweets for conversation tracking, for example for retweets and quotes. Whilst extraction changes the information, it can be done almost immediately after the tweet is posted, and the result is comprehensive and appears to be identical to the original tweet.

Of course, the APIs themselves are now a critical part of the process. They are licensed and often have stringent and yet ambiguous terms and conditions, and these can change without notice. At Preservation and Archiving Special Interest Group (PASIG) 2019, Amelia Acker of The University of Texas at Austin explored this in more detail and showed how the APIs themselves, and certainly their terms and conditions, should be preserved alongside the content extracted.
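The link rot question raised above can be made concrete with a small sketch. This one assumes the page body was captured at the time of tweeting and compares a SHA-256 digest of that capture with a later fetch. The names `LinkSnapshot`, `take_snapshot` and `link_status` are hypothetical, and fetching the page is deliberately left out so the example stays self-contained.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LinkSnapshot:
    url: str
    captured_at: str  # ISO 8601 timestamp supplied by the caller
    sha256: str       # digest of the page body as captured


def take_snapshot(url: str, body: bytes, captured_at: str) -> LinkSnapshot:
    """Record what the linked page looked like when the tweet was captured."""
    return LinkSnapshot(url, captured_at, hashlib.sha256(body).hexdigest())


def link_status(snapshot: LinkSnapshot, current_body: Optional[bytes]) -> str:
    """Classify a link as 'gone', 'unchanged' or 'changed' against its snapshot.

    current_body is None when the URL can no longer be retrieved.
    """
    if current_body is None:
        return "gone"
    if hashlib.sha256(current_body).hexdigest() == snapshot.sha256:
        return "unchanged"
    return "changed"
```

A byte-level digest is a blunt instrument: pages with rotating adverts or timestamps will report "changed" even when the substance is the same, so a check like this can only flag links for review rather than settle whether the information has altered.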

[i] API is the acronym for Application Programming Interface, which is a software intermediary that allows two applications to talk to each other.

[ii] Lossy compression is a way of getting even smaller squeezed files than lossless. This technology strips out data it has been programmed to regard as either unnecessary or redundant. (Ron Goldberg)

[iii] JavaScript Object Notation (JSON) is a standard text-based format for representing structured data based on JavaScript object syntax. It is commonly used for transmitting data in web applications (e.g., sending some data from the server to the client, so it can be displayed on a web page, or vice versa).

[iv] WARC is a file format for the long term preservation of digital data. It stores web pages and other digital resources including images and meta information in their original source code.


So, Twitter preservation has introduced some interesting digital preservation concepts. It has shown that a good quality API can be very useful in exporting a comprehensive copy of the information held within the system so it can be re-used and trusted. It has also introduced the concept of a multi-part asset, which contains multiple files that combine to present a single indivisible piece of information.

ABOUT THE AUTHOR

Jon Tilbury is Chief Innovation Officer at Preservica and is responsible for ensuring innovation and reliability in Preservica’s Digital Preservation technology. Since graduating from Oxford University, Jon has gained over 30 years' experience in the IT industry, working in development, design, managerial and leadership roles. Starting on the original Digital Preservation research projects, Jon has been a key part of the evolution of this sector and oversaw the creation of Preservica’s platform as a product, before founding Preservica as an independent business. Jon brings a passion for establishing Digital Preservation as a ubiquitous technology embedded into daily life. Outside Preservica, Jon is a keen photographer, cyclist and traveller.

