Open Web Curation
By Gary Price (Co-Founder, infoDJ; Co-Founder, Editor, infoDOCKET; Editor, ARL Day in Review) <gprice@gmail.com> and Curtis Michelson (CEO, infoDJ) <curtis@infodj.io>
More than ever, citizens and especially students need ready access to authoritative, trustworthy information. It’s out there, but given the sheer volume of material and the frenetic pace of information sharing, markers of credibility are eroding. Twitter, TikTok and the like dominate students’ news gathering, and nefarious actors traffic in conspiracy theories on those platforms. Meanwhile, high-quality born-digital initiatives and resources struggle to be discovered and shared. We call this less-explored universe of trustworthy information the “credible web.” It is large and growing.
We hope that libraries and allied organizations can find more ways to uncover, index and share this dynamic and growing region of the open Internet with their patrons and the wider world. This article provides some backstory and context for the mission of mining the credible web, along with practical ways to begin. We humbly propose an updated mission for libraries and librarians, one that moves beyond collection management toward active curation and publishing.
What Is the Credible Web?
So what kinds of artifacts comprise the credible web? We’re talking about things like reports coming out of think tanks, NGOs and research institutes, and government reports along with the figures, tables and graphics embedded therein. These high-quality (often peer-reviewed) sources of information can bring rich context to students’ social media-fed news diets and classroom discussions. We also include the various specialty datasets and interactive visualizations from institutions like the Smithsonian, NOAA, the International Energy Agency and the like. For a more representative list of this latter category, see the searchable database we have compiled at Open Web Tools.
The question that comes up is: if this material is open, available and of such high quality, why is it not more widely discovered and shared? Put simply, the good stuff is hiding in plain sight, or rather on the Nth page of Google search results. One definition of the “invisible web” is simply the Google results below the first scroll. Complicating the situation is the fact that, once discovered, reports take time to read and digest. These are not headlines that roll easily through a TikTok or news feed; a 250-page PDF needs a wider screen and a bit of time to review.
Libraries and librarians are, and always will be, ideally suited for this task. They gather selections of materials from a variety of sources, make enough sense of them to mark them up with metadata, and then share them in a variety of ways with users. In the age of digitization, our users can be both near and far, an IP address anywhere in the world. The process of gathering, organizing and exposing these materials to our global users is perhaps a new facet of collection development, and perhaps one day a part of library science degree programs.
How Did We Get Here?
In the early days of the web, there were any number of library-led projects working to help organize it. Within a few years, most of these projects had disappeared. Here are just three noteworthy examples from the “scholarly graveyard”:
• InfoMine from UC Riverside
• Librarians’ Index to the Internet
• Resource Discovery Network (UK)
These were well-intentioned, but many years on, we see that the information world has exploded and splintered into niche channels, niche search experiences and interfaces. Some of those niche channels are extremely high quality and relevant, or what we call credible. Look at The Global Jukebox, ReliefWeb or Jurist. Again, see Open Web Tools for more.
Call To Action
Perhaps it’s time for library communities to think anew. Much of this content is what libraries have long aspired to track: to discover and share relevant, credible sources with their patrons. Now they can. The challenges presented by this opportunity do not come from negotiating big deals or monitoring subscription license charges (95%+ of this information is free to the end user). Rather, now that this trove of authoritative content is right before us, the issue is one of internal resource allocation: management and library leadership need to develop strategies for advancing discovery and use of the credible web in a timely manner for their various end users.
As we will demonstrate in our attached video (see below), there are relatively low-cost ways to begin wading into this wild, wiggly web. With simple tools, we can begin to collect and share wild content, and do so rigorously by applying “classic” library concepts of collection development (e.g., reputation of publisher, update frequency, etc.). Will we need to make modifications or adaptations? Absolutely. To take one example, there is a real phenomenon of links or sub-pages on sites (so-called “deep links”) disappearing or changing (see sidebar “Measuring the Ephemeral Web”). That is a genuine challenge: open web pages may change in small or even large ways over time. But perhaps that’s part of the new library mission too: keeping a digital breadcrumb trail to versions of these documents over time, which can also provide fascinating timelines and context. We will demonstrate some ways to do this in our video.
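To make the breadcrumb-trail idea concrete, here is a minimal Python sketch of what such monitoring could look like. It assumes the third-party requests library and the Internet Archive’s public Save Page Now endpoint; the watchlist URLs are placeholders for a team’s own curated deep links, not a recommended list.

```python
# A minimal sketch of link-rot monitoring, assuming the third-party
# "requests" library and the Internet Archive's public Save Page Now
# endpoint (https://web.archive.org/save/<url>).
import requests

WATCHLIST = [  # placeholders for a team's curated deep links
    "https://reliefweb.int/updates",
    "https://www.jurist.org/news/",
]

def check_and_archive(url: str) -> None:
    """Flag links that no longer resolve; snapshot the ones that do."""
    try:
        resp = requests.head(url, allow_redirects=True, timeout=10)
    except requests.RequestException as exc:
        print(f"UNREACHABLE {url}: {exc}")
        return
    if resp.status_code >= 400:
        print(f"BROKEN ({resp.status_code}) {url}")
        return
    # Ask the Wayback Machine to capture today's version of the page,
    # quietly building the trail of versions described above.
    requests.get(f"https://web.archive.org/save/{url}", timeout=60)
    print(f"OK, snapshot requested for {url}")

if __name__ == "__main__":
    for link in WATCHLIST:
        check_and_archive(link)
```

Run on a weekly schedule over a curated watchlist, a small job like this both flags link rot early and accumulates the version history a future patron might need.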
Librarian as Curator, Library as Publisher
We may be suggesting a “flipping of the script” for libraries. Rather than just acquiring big collections and opening the gates to them via generic OPAC interfaces, we envision a future supplemental stream of library activity that curates companion open web content to supplement and enhance the use of journals, books and course materials. With new tools in our hands, is it time to try again? And if libraries are curating, then perhaps when they share these just-in-time goodies with their patrons, they are really publishing. Imagine a subset of library staff who publish topic-specific news feeds to classes or professors. This is a higher level of service, and it certainly has costs attached. But what might the conversation sound like at the next annual budget meeting if one were to propose reallocating some funds toward curatorial publishing?
Why do this? There are as many reasons as there are materials and users. Here are just a few key ones:
• Currency: Providing our users with the latest data on a topic.
• Provenance: Getting material directly from the source. These are primary and secondary sources that can and will likely be cited in the formal scholarly record later.
• Credibility: With globally maintained library authority files such as VIAF, virtually every source of information, whether a person, an institution/corporation, or even a body hiding behind a pseudonym, can be identified on the fly (a minimal lookup sketch follows this list).
• Context: This material adds the backstory and trend lines to stories in the news. See the Project Information Literacy report, especially recommendation #4.
• Open Access: End users don’t hit paywalls.
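As a rough illustration of the authority-file point, the following Python sketch queries VIAF’s public AutoSuggest endpoint to match a name against authority records. It assumes the third-party requests library, and it reads the response defensively because the exact JSON schema may differ from what is shown here.

```python
# A minimal sketch of checking a source name against an authority file,
# assuming the "requests" library and VIAF's public AutoSuggest endpoint.
# Response fields are read defensively; the schema is an assumption.
import requests

def viaf_candidates(name: str, limit: int = 5):
    """Yield candidate authority records matching a person or body."""
    resp = requests.get(
        "https://viaf.org/viaf/AutoSuggest",
        params={"query": name},
        timeout=10,
    )
    resp.raise_for_status()
    results = (resp.json() or {}).get("result") or []
    for record in results[:limit]:
        yield record.get("term"), record.get("nametype"), record.get("viafid")

if __name__ == "__main__":
    for term, nametype, viafid in viaf_candidates("Smithsonian Institution"):
        print(f"{term} ({nametype}) -> VIAF {viafid}")
```

A curator would still confirm the match by eye; the point is that an identity check against a shared authority file can sit inside the curation workflow rather than being a separate research task.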
The Google Paradox
Inevitably, we hear: “It’s in Google, why do we need to do this ourselves?” While it’s technically the case that almost all the content we are speaking about is somewhere in Google, that doesn’t mean the end user will ever find it, either because Google’s relevance algorithm buries it or because of people’s casual search habits. If there is too much to fit on a single page of results, users miss it. What we’re describing is not a recreation of Google but something more like what was once called SDI, or selective dissemination of information. Getting better control of this material (both text and audiovisual) helps librarians and others in the selection of materials for LibGuides, webliographies and other tools.
Where To Begin
As a start, let’s establish some working principles. To share the credible web in a robust, scalable way, it would be ideal if each library’s effort were:
• Cooperative: How might staff and even student teams organize and work together?
• Diverse: Made up of a mix of both generalists and subject experts.
• Inclusive: Drawing in experts from the digital preservation, web archiving, machine learning and multimedia production communities.
• Partnered: Later, working with vendors and publishers to assist with technology, metadata and distribution.
Consider asking a colleague or two to join you in the following process of initial source gathering:
• Select a few topics to begin, something of relevance to your institution or to a campus department
• Consider inviting a faculty member from that department to advise you
• Investigate who the key providers of credible materials on this topic are (go beyond books and journals; consider reports from think tanks, topic-specific news outlets and trade organizations).
• Consider including multimedia content such as podcasts and even video feeds (YouTube, etc.)
• Identify key voices on these topics. Do they offer newsletters or feeds that can alert you to new materials when they become available? Capture their Twitter or other social handles (a minimal feed-monitoring sketch follows this list).
• Leverage your subscription databases to alert you to new materials and reports as well.
• See if trade/business press publications cover the same topics. Can they be used to help identify people, organizations, reports, etc.?
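Once a handful of feeds are identified, the alerting step can be as simple as the Python sketch below. It assumes the third-party feedparser library; the feed URLs are placeholders standing in for the newsletters and RSS feeds your team has gathered.

```python
# A minimal sketch of feed-based alerting, assuming the third-party
# "feedparser" library; the feed URLs below are placeholders for the
# sources a curation team has identified.
import feedparser

FEEDS = {
    "Hypothetical think tank": "https://example.org/reports/rss.xml",
    "Hypothetical trade outlet": "https://example.com/news/feed",
}

def latest_items(max_per_feed: int = 5):
    """Yield (source, title, link) for recent items in each feed."""
    for source, url in FEEDS.items():
        parsed = feedparser.parse(url)
        for entry in parsed.entries[:max_per_feed]:
            yield source, entry.get("title", "(untitled)"), entry.get("link", "")

if __name__ == "__main__":
    for source, title, link in latest_items():
        print(f"[{source}] {title}\n    {link}")
```

Piped into a shared document or a class-facing newsletter, a digest like this is already a small act of the curatorial publishing described above.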
Once these sources are named and captured, and you begin surfacing the documents they make available, read them back to front to mine the reference lists for more sources to monitor. Rinse, repeat. The same goes for captions on charts and tables, which often point to additional credible sources for underlying data and datasets.
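This reference mining can itself be partly automated. The following Python sketch pulls candidate URLs out of a downloaded report, assuming the third-party pypdf library; the file name is hypothetical, and a human curator would still vet each candidate against the collection criteria above.

```python
# A minimal sketch of mining a report's reference list for new sources,
# assuming the third-party "pypdf" library. The file path is hypothetical.
import re
from pypdf import PdfReader

URL_PATTERN = re.compile(r"https?://[^\s)>\]]+")

def urls_in_pdf(path: str) -> set[str]:
    """Extract candidate source URLs from a downloaded report."""
    reader = PdfReader(path)
    found = set()
    for page in reader.pages:
        text = page.extract_text() or ""
        found.update(URL_PATTERN.findall(text))
    return found

if __name__ == "__main__":
    for url in sorted(urls_in_pdf("downloaded_report.pdf")):
        print(url)
```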