Transnational Collaborative Web Archiving: The International Internet Preservation Consortium
By Abigail Grotke (Assistant Head, Digital Content Management Section, Web Archiving Program, Library of Congress, and 2021 Chair of the International Internet Preservation Consortium) <abgr@loc.gov>
and Olga Holownia (Senior Program Officer, International Internet Preservation Consortium) <olga@netpreserve.org>
“Web archiving is a pretty heavy rucksack but it is full of interesting challenges of all types: scientific, technical, legal
and operational.” — John Tuck, From Integration to Web Archiving, 2007
Introduction
Even in its early days, the Internet posed challenges for those who recognized the need to capture and archive it for future generations. No single institution could effectively tackle the myriad challenges of web archiving by itself, so collaboration was deemed necessary from the beginning.
Archives and libraries began archiving the Internet starting in the mid-1990s. Many are aware of the Internet Archive (IA) and its Wayback Machine. Less well-known is that national libraries, charged with preserving the output of their citizens through legal mandates or legislation, noted the importance of documenting this global resource about the same time that IA began crawling the web. While not at the same scale as IA, and with different missions, approaches and legal frameworks, national libraries and archives began to capture portions of the Internet that were important to them, focusing on preserving entire national domains, for instance, or documenting events such as elections in their countries.
Teaming Up to Preserve a Global Web
The first international web archiving collaboration started in 1997 as the Nordic Web Archive (NWA) and involved the National Libraries of Denmark, Finland, Iceland, Norway and Sweden. In 2003, eleven national libraries (the NWA members plus the British Library, Library and Archives Canada, the Library of Congress, the National Library of Australia, the National Library of France, and the National Library of Italy in Florence) and IA signed an agreement that established the International Internet Preservation Consortium (IIPC). Since its inception, the IIPC has grown five-fold to include members from over 35 countries across the world. About 85 percent are libraries (national, regional, academic), with the rest being non-profit organizations, audiovisual institutes, and service providers. Each is committed to sharing best practices and developing tools and resources for the global cultural heritage community.
To achieve its goals, IIPC members have formed working groups and task forces to develop and recommend standards for collecting, preserving and providing long-term access, and, more recently, to produce training materials, create transnational web archive collections and offer resources for researchers. Building on the early ideas to create a “web archiving toolkit,” the IIPC has facilitated the development and sustainability of open source software and tools through a number of funded projects led by members. Through its annual conference, various communication platforms, and collaboratively maintained repositories of resources, the IIPC also provides a forum for sharing knowledge about web archiving and raising awareness of Internet preservation issues. Supported by membership fees, the IIPC funds a number of strategic initiatives, including projects, working group activities, training events, conferences and one full-time Senior Program Officer. The consortium structure includes an executive board and a steering committee. Apart from the Senior Program Officer, officer roles are filled by individuals from member organizations who volunteer their time, and all working group and project leadership roles are performed on top of regular work duties at their home institutions.
Building Community through Collaborative Projects and Working Groups
Collaboration and community building have been at the core of all IIPC activities. The current IIPC member obligations state that “every member is expected to work collaboratively, within its country’s legislative framework, to identify, develop and facilitate implementation of solutions for selecting, harvesting, collecting, preserving and providing access to Internet content.” IIPC members hold a unique combination of expertise. Participants range from program managers and library administrators, to technical staff and curatorial teams that perform a variety of web archiving tasks at their home organizations. Member institutions primarily make contributions to the IIPC by dedicating personnel time to projects. Through working groups, and portfolios that focus on member engagement, partnerships and outreach, and tools development, the IIPC community has been actively involved in organizing a number of technical, curatorial, educational, and outreach projects.
A Foundation in Collaboratively Developed Tools and Standards
The consortium began with six working groups, which focused on defining metrics for web archiving, access, content management, the deep web, creating shared frameworks for web archiving activities, and identifying researcher requirements. Initial projects centered on topics critical to the members: harvesting, access, and preservation, with tools and standards at the forefront. It was clear early on that collaborating would be more efficient and cost-effective than each member developing tools in isolation, and we still develop tools collaboratively today. Early outputs included crawling tools that could be used by consortium members and others, and standards that would enable long-term preservation of the archived content, notably the WARC preservation format, now an ISO standard. Projects have also developed requirements for access tools that reflect the needs of member organizations and their researchers, including the open source web archive replay tools needed to provide access to web archives. This work has included the development and maintenance of OpenWayback and, more recently, support for transitioning to a Python version of the web archive replay tool.
Since 2019, the IIPC has funded a series of projects through our Discretionary Funding Program. Each funded project must involve at least two members, and must benefit the larger field of web archiving. The majority of funding has supported the development of tools, although the focus of these new projects is shifting towards use of web archives, easier access, and visualizations.
Collaborative Collections, Training, and Research
While the work on tools development is now supported by a dedicated Portfolio and the new funding program, current working groups focus on the consortium’s other strategic goals, which include Content Development, Training, and Research.
The Content Development Working Group (CDG), formed in 2015, leads an effort to create collaborative collections. It expanded upon efforts that began in 2010 with the first transnational collection, focused on the Winter Olympics. Led by volunteer web curators, the IIPC has developed eight large, thematic collections in the past six years which are “broader than any one member’s responsibility or mandate” and cover topics such as Olympics and Paralympics, climate change, Artificial Intelligence, the European refugee crisis, Intergovernmental Organizations, and, the most recent and largest effort, a Novel Coronavirus (COVID-19) web archive. The Training Working Group was formed in 2017 to fulfill the vision of making IIPC the world leader for training on web archiving. Training materials geared toward beginners have been developed collaboratively by IIPC with the Digital Preservation Coalition and are available openly online, along with video case studies from experienced practitioners.
The IIPC initiated collaboration with researchers shortly after the consortium was created. Research use of web archives has also been one of the recurrent themes at our annual conference, and we have collaborated with initiatives such as RESAW (Research Infrastructure for the Study of Archived Web Materials) and the Archives Unleashed Project to promote research use of web archives. The Research Working Group, chartered in 2018, engages with the key existing networks, the CDG, and individual members to share information about web archiving research projects and tools, to facilitate the dissemination and discussion of use cases, and to enable access to and use of our collaborative collections.
Outreach and Advocacy
A major IIPC goal is to “raise awareness of Internet preservation issues and initiatives through activities such as collaborative collecting, conferences, workshops, training events and publications.” We do this in a number of ways, most notably, through our annual General Assembly meeting for members, and a Web Archiving Conference open to all.
Our advocacy work also extends to working with the wider community and researchers, for instance on initiatives such as the previously mentioned Archives Unleashed Project (https://archivesunleashed.org), RESAW (resaw.eu), Web ARChive studies network researching web domains and events (warcnet.eu), and the International GLAM Labs Community (glamlabs.io).
Conclusion
Although the strategic goals and emphases of our work have changed over the years in response to members' interests, the IIPC has only grown stronger through collaboration and through the incredible generosity of our web archiving community members in sharing their expertise. Even as expertise has grown in our own organizations, we still rely on these external relationships to learn new things and to tackle challenging problems. The past year and a half has refocused the way we work together across our borders, and we have been able to try out new ideas to determine the best ways to continue to engage our community and make it easier for more members (including those who are not from web archiving teams) to be actively involved. It remains reassuring and helpful, nearly 20 years after the founding of the IIPC, to know that we are not alone in tackling the issues of web archiving.
Web Archiving Week in London, 2017, a combined WAC & RESAW conference.
Building Web Collections: Cooperation Past and Future continued from page 13
tions and individuals, we’ve navigated many rocky shoals, from policy challenges to new technological hurdles. Working together, I believe we can surmount many of the barriers to building digital collections looming ahead. In fact, only by working together can we succeed in preserving humanity’s online culture in the future.
Endnotes
1. https://web.archive.org/web/20020228175146/http://web.archive.org:80/collections/e2k.html
2. https://www.washingtonpost.com/archive/politics/2001/01/19/transition-onthe-web-the-cyber-house-rules/11a977ad-9b97-4699-be4d-2a6802357d58/
3. https://netpreserve.org/
4. https://communitywebs.archive-it.org/
5. http://blog.archive.org/2021/11/02/24-arts-organizations-join-the-collaborative-art-archive-carta/
6. https://archive-it.org/
7. https://web.archive.org/save/
8. https://webservices.archive.org/wewa