Back Talk — A Lesson from the Middle Ages about Digital Preservation
Column Editor: Jim O’Donnell (University Librarian, Arizona State University) <jod@asu.edu>
I once wrote about a medieval manuscript copied in the sixth century A.D. and then lost for about 1000 years because somebody had tossed it up on the top of an armoire in their monastery library and just forgot it was there. When they took it down in 1711, it was good as new. The scholar who found it printed it, and it became part of the established record of the ancient world.
That’s one kind of preservation that we all know about, and it can be very successful: do everything by hand, do it on animal skins with durable ink, bind it carefully, leave it in an almost climate-controlled environment, and it’ll do just fine, even for a thousand years.
But I have a different story to tell about preservation in our time. It starts in the early 1990s when the publishing house of Chadwyck-Healey saw the opportunity in digital information and began looking for works of established scholarship that could be digitized and made available through libraries to scholars and users around the world.1 They chose a remarkable publishing accomplishment of a former age: Patrologia Latina, a 225-volume comprehensive set of the Latin works of early Christian writers between the years 200 and 1200 A.D. The set was published and priced to sell to village priests in the mid-19th century by an obsessive French clergyman, the Abbé Migne, whose career we know best because we have his Paris police dossier to go by.2 For over 100 years this set was the standard place to consult for anything written by Christian authorities in Latin before the year 1200. Modern editions have begun to replace and supplement what was in this great set, but some parts of it are still indispensable, and its role as a standard reference for 100 years has given it a continuing canonical authority in many respects.
When Chadwyck-Healey digitized Patrologia, many of us were alarmed to see a publisher take out-of-copyright, public-domain content and monetize it by digitization, largely because we’d never seen this done before. Now it happens all the time.
But Chadwyck-Healey did a very good job on this complicated set of books. The volumes are reprint editions of these authors from as far back as the 16th century all the way to the 19th century, usually with an abundance of other Latin text in them by way of introductions, biographies, footnotes, appendices, and that sort of thing. Chadwyck-Healey had the set digitized by offshore double-keying, a practice now familiar to us, and then proofread the living daylights out of the result. I’ve been using this set myself for 30 years and I have to say they did a good job. I can search for unusual words, quotations from authors, authors who quote the Bible, etc.: specialty searches of very high value. I’ve reconciled myself to paying for public-domain content, and by now it’s not very expensive at all.
But publishers come and go. Readers of this column will mostly recognize that Chadwyck-Healey sold up a long time ago to Bell and Howell, which in turn was swallowed up by ProQuest in 2001.
When ProQuest took over, it kept Patrologia exactly as it was, and those of us who were used to it continued exactly as we had. But about six months ago, ProQuest decided that it would incorporate the resource into its larger set of databases and make it available under a generic search engine interface that it uses for many of its collections.
When I began to use the new interface, I was mightily frustrated. Searches that should have brought up 1,000 hits came up empty, but worse than that, the specialty searches were hard and sometimes impossible to do. For example, if I want to search for an unusual theological term in medieval texts, I need to make sure that the search is confined to texts actually written between 200 and 1200. But the complete database is full of all of those introductions, conclusions, etc., written by later editors. In the former Patrologia database, it was easy to do a search restricted to medieval authors only. In the new ProQuest interface, it’s somewhere between difficult and impossible.
Other scholars are equally frustrated, some giving up entirely. Luckily, I am now in the library profession myself and have ProQuest representatives I know and can talk to, so I raised the question. The good news is they are taking the issue seriously, and we have hopes that there will be restoration of functionality sometime soon. Where the searches come up empty, there is a serious glitch in the structure of the database and the searching, and they need desperately to work on that. The specialty searches still need to be reproducible. When a user is doing the searches and getting frustrating results, they have no way of knowing the difference between a bad interface and a glitch, and they just go away, absolutely frustrated.
There are a couple of lessons from this. First, preserving digits is possible. Yes, it’s a little scary when an important resource changes owners, and we do have to worry about what happens on that future day (it will come) when ProQuest is no more.
We hope that someone else cares by that time to take over the curation of the past that ProQuest has accumulated, but we also have strategies for preservation from our colleagues of the Internet Archive, at LOCKSS, at CLOCKSS, at Portico, and similar initiatives to fall back on. We can hope.
The scary lesson I learned from this experience is that it’s not all about preserving the ones and zeroes in the right order. The reason digital Patrologia could be so useful was that the people who created it understood the print resource they were emulating and took the time and trouble to reproduce its functionality, first in the way they structured the metadata and second in the way they structured the search engine and its interface to its users. Those things also need to be preserved if digital preservation is to be effective, but they require a much more intentional and sophisticated assessment of the data.
I spend enough time already worrying whether folks will continue to learn Latin to pursue studies that I think are important. But now I get to worry about the ownership, the management, the transition, the care, and the attention to user experience and user interface that make the critical difference between having a whole lot of digits on your hard drive and having a genuinely useful resource.
I began this column hoping I could end by saying we’ll know if we’re successful with digital preservation if we check back in 950 years, but now I’m afraid I have to say, check back in 10 years and let’s hope we get lucky. I’d rather not have to rely on luck.
Endnotes
1. Sir Charles Chadwyck-Healey has now published his memoir as a leader in digital publishing: Publishing for Libraries: At the Dawn of the Digital Age (2020).
2. His biography is by Howard Bloch, God’s Plagiarist (1994).