Collecting from the Web: Collection Development Policy in the Born-Digital Universe

By Carol A. Mandel (Dean Emerita, New York University Division of Libraries, and Distinguished Presidential Fellow, Council on Library and Information Resources) <carol.mandel@nyu.edu>

If, before the events of the last two years, there were any lingering doubts that the wild west of the web, despite all its click-bait and conspiracy theories, held a treasure chest of critical documentary content, those concerns have surely been dispelled. The global and personal stories of the COVID-19 pandemic, painful videos exposing racial injustice, and the stunning evidence presented to the House select committee investigating the January 6, 2021 insurrection at the U.S. Capitol are but the most prominent examples of the historical, social, and cultural evidence that web archiving will carry to the immediate and long-term future.

We, as librarians and as individuals in society, are indebted to the vision and foresight of the Internet Archive and of many national libraries for taking up the challenge of capturing, managing, and providing enduring access to segments of that otherwise ephemeral evidence. What we as librarians have not yet done is fully consider what the vast, complex cornucopia of born-digital content in the web can mean for the concept of our own library collecting and collections. If libraries have a responsibility to ensure society’s access to a world of knowledge, this is not content to be ignored. Librarians have been leaders in the transition from print to electronic resources; in transitioning collecting from “just in case” to “just in time”; in creating libraries that are service-based rather than collection-based;1 in moving to new models of open access; in working to de-colonize our approach to access and collections. The benefits of this work have been strikingly evident in the last two years, when access to the fullest possible range and depth of online content has meant survival for a locked-in world. Yet even as we are immersed in a universe of born-digital content that far expands the bounds of traditional electronic publishing, we have not yet stretched our arms fully around the potential role of that content in libraries. (The idiom “getting arms around” is apt here in all its meanings: embracing, understanding, managing.) It’s time, as the old Apple ads used to say, to think different.

In a paper issued in late 2019, I discussed the need to frame the nature and challenges of born-digital content into issues that, in turn, can lead to strategies for access, collection, and preservation.2 One significant component of that task is to match the opportunities and complexities of web content to areas relevant to the mission, priorities, and capacities of a range of different types of libraries. This issue of Against the Grain devoted to web archiving is a welcome opportunity to begin that considered look. A key step is to consider whether the concept of “archiving the web” has created a too daunting connotation, one that implies a vast archival task that falls only into the realm of a national library.

“The web” can be viewed as an overarching, if messy, entity. But as my friend and colleague, Clifford Lynch, describes so well in his companion article in this journal, “the web” is not what most of us casually think it is, and not what it used to be. It is not a monolith, but rather more than a billion disparate sites, many containing ever more items in ever more forms, accessed through the portal of a web browser. From a content perspective, a web portal can be viewed as the world’s largest third-party distributor. It is the pointer to most networked born-digital content: the good, the bad, and the ugly. And much of the good content is worthy of consideration for library collecting. For libraries, web harvesting is an important tool by which valuable (in fact, essential) born-digital content can be targeted and collected.3

A small number of libraries have begun to add web collecting to their development of special collections strength.4 If a library has deep strength in, say, 20th century student protest movements, collecting from the web is essential to continue that collection into the 21st century. But how “special” need a collection be to merit the addition of digital-only content? In a century when so much substantive and valuable material is available by harvesting it from the web, the red-line distinction between “general” and “special” is not meaningful as a collecting determination. Special collections grew up as separate entities because of the need to physically segregate and safeguard rare material.5 Brilliant and active special collections librarians have expanded the reach and function of their work, but web collecting need not, and should not, be bounded by the special collections realm.

This is not to say that collecting from the web is, as yet, as efficient as core collecting streamlined by processes such as approval plans and demand-driven acquisitions. Web collecting requires selector time and expertise to establish and monitor profiles (as do many approval plans), along with (relatively modest) operational support to participate in a program such as Archive-It. But for any library that serves its community by shaping collections, at least around the edges of a “readymade” plan, selector expertise is always part of the picture. And on the support side, libraries have triumphed in the 21st century through their ability to retool operations. Over the last 50 years, libraries have adapted to many new forms of collecting, adding images, video, and databases. Selectors have grappled with, to use some now dated terminology, grey literature, technical reports, government documents, and more recently open access publications. Shared content has been expanded through microfilming and then digitization, and through collaborative acquisition. Collecting strategies have never remained stagnant as new forms of content and new distribution models have emerged. It is time to embrace a new phase.

Here are just a few examples where collecting from the web enables rich opportunities for materials that strengthen and enhance collections.

• The Arts. If your collections include interest in contemporary artists or music, the websites of artists and composers are a treasure trove. Regional collecting often includes capturing information about the work of local artists, composers, theaters, etc., and there is no better source than the web.

• Poetry. Only a handful of well-known poets publish on paper. Poetry journals are almost entirely online and open access, and as labors of love by their editors, their sites are susceptible to “going out of print.” If your library collects poetry, it is not possible to ignore web harvesting.

• Area studies. Area studies librarians have long struggled to capture ephemeral and hard-to-find publications. Today web content illuminates countless aspects of society and culture. As older area studies documents are being digitized and shared on the web,6 are not the online-only statistical documents of 2021 as valuable, in the near and long term, as the statistical records of 1911?

• Hyper-local. Public libraries, and many other institutions, have long taken responsibility to serve as a source of current and historical town and regional information. From hometown newspapers, to local exhibition catalogs, to community cookbooks, such local content has been invaluable both as a community resource and as essential historical and cultural documentation. Most of this content is now published via the web, and will be ephemeral without local/regional library intervention.

• Partnerships with scholars. Scores of exceptional digitization projects have resulted from partnerships between scholars, whose research and expertise expose important content, and libraries that have translated that content into widely accessed online collections. From medieval manuscript collections to ethnographic video, a world of rich and now widely studied content has been delivered through these collaborations. Today there are scholars in social and cultural studies who know where the web treasure resides for their field; partnerships with those scholars today can transform teaching and research for the future.

• Diverse perspectives. Web content offers an unprecedented source of grass-roots contributions and unfiltered voices. Community archiving and story-telling projects are creating extraordinary resources of past and present experience and cultural perspective, content that we now recognize must be represented in library collections.

Given the content value that lies just within these few examples, why then has web collecting not become a widespread practice? Has initiating and integrating the practice into operations seemed too daunting? Have time pressures on selectors caused them to avoid any handcrafted selecting? Adopting web collecting is non-trivial, but our track record clearly demonstrates that barriers can be addressed if there is a will to collect this material. Wrangling grey literature and government documents was non-trivial. Creating institutional repositories was non-trivial. The transitions to electronic licensing and to digitization were non-trivial. At each stage in our transformations there has been help from collaborative projects, grant funds, regional hubs, centers of excellence, consortia, and sheer mission-driven enthusiasm. We are not prepared, yet, for all of the complex content that Clifford Lynch describes, but we are ready for next steps. Thanks to pioneering work by the Internet Archive, national libraries, and many forward-looking others, tools and services for collecting important born-digital resources are available now. It’s time to recognize — and address — the born-digital gap in our ability to deliver today’s knowledge to future generations.

Endnotes

1. Dempsey, Lorcan and Malpas, Constance (2018) “Academic Library Futures in a Diversified University System,” in Higher Education in the Era of the Fourth Industrial Revolution, ed. Gleason, Nancy. Singapore: Palgrave Macmillan. Pp. 65-90. Dempsey and Malpas characterize the shift in academic libraries away from describing themselves in terms of collections and instead in terms of the research and teaching needs of their parent institution.

2. Mandel, Carol (2019) Can We Do More? An Examination of Potential Roles, Contributors, Incentives, and Frameworks to Sustain Large-Scale Digital Preservation. Washington, DC: Council on Library and Information Resources. Accessed at https://www.clir.org/can-we-do-more/.

3. The groundbreaking strategy for collecting digital content developed by the Library of Congress in 2017 sets out this conceptualization clearly and well. See Library of Congress Collection Development Office, Collecting Digital Content at the Library of Congress, accessed at https://www.loc.gov/acq/devpol/CollectingDigitalContent.pdf. The plan has been successful and is now entering a new phase; see Puccio, Joe, Developing a New Digital Collections Strategy at the Nation’s Library, accessed at https://blogs.loc.gov/thesignal/2021/05/new-digitalcollections-strategy/.

4. The libraries participating in the Ivy Plus Libraries Confederation offer a good role model as described by Samantha Abrams in this issue. Abrams recently held the position of Web Resources Collection Librarian for the Confederation while administratively based at Columbia, a position other libraries may well want to institute.

5. Joyce, William L. (1988) “The Evolution of the Concept of Special Collections in American Research Libraries.” Rare Books and Manuscripts Librarianship 3(1): 19-29.

6. See, for example, the excellent work of the South Asia Open Archives, accessed at https://www.jstor.org/site/saoa/, and the Digital South Asia Library, accessed at https://dsal.uchicago.edu/. The contemporary counterpart of much of this material is now “published” as websites.

This article is from: