The National Archives Web Continuity Project


Web continuity matters

An overview and update for government organisations


Foreword

Over half of all interaction between government and the public now happens online. Website links are commonly used in a wide range of documentation, from answers to Parliamentary Questions, forming part of Hansard’s official record, through to academic research and public awareness campaigns. Increasingly, government websites are the first port of call for politicians and their civil servants, for business and professional communities, and for the general public. The integrity of Web links is therefore crucial to the business of government and to ensuring public access to information.

In April 2007 Jack Straw, as leader of the House of Commons, wrote to the Cabinet Office minister Hilary Armstrong, asking her to ensure continuing access to online documents. The Archiving Digital Assets and Link Management working group was formed as a consequence. Its membership comprised representatives from The National Archives, the British Library, the Parliamentary Libraries, the Central Office of Information, and website managers from several government departments. The group agreed that all web-based information should be treated as a valuable asset, and that, in particular, all online information referenced through Web links should remain available and accessible.

Web links are easily broken when the information, or its uniform resource locator (URL), is moved, for example as the result of government reorganisation, rapidly changing technology, or regular archiving and site closures.

The working group recommended appropriate policies and practices to address this, including that permanent references be made to information in the government’s Web estate.

There can’t be many of us who haven’t experienced the frustration of ‘page not found’. In fact, between 1997 and 2006, over 60 percent of the URLs cited in Hansard in response to Parliamentary Questions were broken.

The Web continuity project has been set up and funded by The National Archives, to implement their suggested solution.

This document presents the solution being developed by The National Archives to ensure that there are no broken URL links in government websites and official publications. It develops the proposals outlined in Maintaining the integrity of the online public face of government, a consultation document first circulated in September 2007.

This brochure sets out our plans for ensuring the availability and accessibility of online government information.

David Thomas
Technology Director and Chief Information Officer, The National Archives

Analysis by the House of Commons’ Library, and research on Hansard conducted by The National Archives, revealed:

• Departments are increasingly citing URLs in answers to Parliamentary Questions, which form part of the official record (including links to non-governmental websites).
• 60 percent of links in Hansard to UK government websites, for the period 1997 to 2006, are now broken.
• This is the same as links in non-government websites – but it does indicate a lack of special provision for official information that needs addressing.
• Departments vary considerably. For one, every link works; for another, every link is broken. This demonstrates that it is possible to maintain links where there is an appreciation of the issues and a will to do so.
• During the process of website rationalisation, much of the Web estate will be archived. Information will continue to be found only if we agree a way to handle link management effectively.


What is Web continuity?

• It is a solution for archiving and linking to online information in perpetuity.
• It is about maintaining the integrity of the online face of government.

Our solution

The Web continuity project is developing a solution that:

• provides effective and practical processes for delivering on links;
• relates search to a broader pan-Government search project; and
• provides possibilities for long-term preservation and access.

We aim to deliver this solution in two phases.

Our objectives

Our vision is to ensure that every Web link works, and that every piece of online information cited remains accessible, in perpetuity, for the public face of government in the digital world. At present this is not the case. More than half of the time a link in Hansard produces a 404 ‘page not found’ error, rather than the information required. There are a number of possible causes:

• information has been deleted;
• information has been moved and the link not updated;
• the link was incorrectly recorded.

We want to ensure that no cited information is lost through deletion, by adopting an effective and comprehensive Web archiving strategy and related processes. As information is moved either to the archive, or relocated for other reasons, we want to make sure that there are policies and processes in place to maintain links to current information, or take the user to the content in the website archive. Links should also be correctly given and recorded in official records.

Despite best endeavours, there may still be some cases where links are broken. As a fallback, we want to ensure that the information referenced is findable through wider search mechanisms. This work is being taken forward by the pan-Government search project.

Finally, the Web may not continue to be the dominant publishing medium it is at present. At some point in the future it may undergo significant technical transformations. Long-term preservation of information is, therefore, critical in ensuring that there is a historical record available which is accessible.
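The order of fallbacks described above (the live page, then the archived copy, then wider search) can be sketched in Python. This is only an illustration under stated assumptions, not the project's software: the names `resolve` and `Resolution`, the archive and search addresses, and the use of in-memory sets in place of a real website and archive are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from urllib.parse import quote

# Hypothetical base addresses, used only for illustration.
ARCHIVE_PREFIX = "https://archive.example/"
SEARCH_URL = "https://search.example/?q="


@dataclass(frozen=True)
class Resolution:
    source: str    # "live", "archive" or "search"
    location: str  # address the user is sent to


def resolve(url: str, live: set[str], archived: set[str]) -> Resolution:
    """Apply the fallback order: live site, then Web archive, then search."""
    if url in live:
        # The information is still where the link says it is.
        return Resolution("live", url)
    if url in archived:
        # Assumption: an archived copy is addressed by prefixing the original URL.
        return Resolution("archive", ARCHIVE_PREFIX + url)
    # Last resort: pass the cited address to a wider search mechanism.
    return Resolution("search", SEARCH_URL + quote(url, safe=""))
```

A cited address found in neither set is not reported as ‘page not found’; it is handed to search, which mirrors the fallback role given to the pan-Government search project.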

Phase 1: All links work in perpetuity

• The National Archives will comprehensively archive websites, in collaboration with partners.
• The National Archives will configure a software component that will be installed on all government websites. This will deliver the information requested by the user whether it is on the live website, or retrieved from the archive and presented appropriately. The component will be validated using the CSIA Claims Tested Mark (CCT Mark) Scheme. This scheme is a government quality approval scheme, predominantly focused on information assurance (IA).
• The National Archives will prepare guidance, which will then be issued by the Transformational Government team at the Central Office of Information. This guidance will be regularly reviewed and amended as necessary.

Phase 2: Deliver long-term access and preservation

• The British Library will work with The National Archives, and the other UK copyright libraries, to develop systems for identifying and retrieving documents required for long-term preservation from government websites, and transferring them to a digital store.
• In particular, the British Library will work with the Transformational Government team to develop metadata standards. This will help with both the identification of documents and with the creation of records for the Library’s catalogue.
• It will help to define the set of government departments, agencies and non-departmental public bodies that will be included in the Web continuity project, and from whose websites we would expect to harvest documents.


What benefits will Web continuity bring?

The solution developed by the Web continuity project will be easy to implement and use. The benefits are that:

• government can manage its information more efficiently, removing non-current content, confident that it has been captured in the Web archive;
• information is easily accessible to all;
• information is retained for the short-term enabling government to re-use information or retrieve it;
• information will be preserved for the long-term, aiding researchers, historians and other users;
• the public has greater confidence in government data handling through fewer instances of broken links.

Timeline

November 2007 – April 2008
• Feasibility studies and development work by The National Archives.

March 2008
• Sign up for briefings – email webcontinuity@nationalarchives.gov.uk
• Solution piloted by The National Archives, Ministry of Justice and the Department for International Development.

May 2008
• Briefings to government: Heads of e-Comms; Website Managers; Government Publishing/Publicity Managers; Departmental Records Officers.

June – November 2008
• Solution software and guidance will be available for use via the Digital People Network (currently under development by Central Office of Information).
• Solution implemented by government departments, supported by The National Archives.
• Digital People Network used to share best practice, for horizontal support for departments across government, and as a mechanism for governance.

November 2008
• Project implemented and monitoring in place.

What will the project deliver?

Comprehensive archiving of the government web estate by The National Archives. This builds on our earlier programme of archiving selected websites of the major central government departments.

Guidance on the use of sitemap generation software for website managers. This will enable more comprehensive capture of website content.

The recommended use of XML (Extensible Markup Language) sitemaps as a supplementary means of directing website crawls. The benefits of using sitemaps are:

• controlled, systematic harvesting using a single, scalable, internationally accepted protocol;
• better coverage – ‘hidden’ (unlinked-to) pages can be listed, as can ‘virtual’ pages generated by dynamic (CMS or database-driven) applications;
• a low-level auditable trail – each sitemap is, in effect, a complete, time-stamped catalogue of a website’s archivable contents.

A single, centrally hosted registry as the means of auditing website crawls. This database will be available to all government stakeholders and will provide an online authoritative record of the archived government Web estate. It will provide the definitive record of all necessary administrative and technical detail: who, what, where, when, how often, how much, how long and how well.

Installation of a software component on government organisations’ websites, which redirects users to the Web archive if a link is no longer active, but has been captured in the archive. This will ensure that links persist over time. We are currently developing a component which will run on Apache and Microsoft IIS (Internet Information Server) Web servers, and will consider developing components for other variations.

Guidance to government webmasters on best practice for website design and maintenance for archiving purposes.

A methodology for monitoring both the efficacy of and compliance with the new system and guidance. This will include a League Table of results arranged by government organisation.
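To illustrate the XML sitemaps recommended under ‘What will the project deliver?’, the sketch below builds a small sitemap in Python. It assumes the public sitemaps.org protocol (the `urlset`, `url`, `loc` and `lastmod` elements) is the one meant. The sitemap generation software covered by the guidance is not named in this brochure, so the function name `build_sitemap` and the example addresses and dates are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: list[tuple[str, str]]) -> str:
    """Return a sitemap document for (URL, last-modified date) pairs."""
    # Serialise the sitemap namespace as the default (unprefixed) namespace.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages: a home page and a database-driven ('virtual') page
# that a crawler following links alone might never reach.
example = build_sitemap([
    ("http://www.dept.example/", "2008-03-01"),
    ("http://www.dept.example/report?id=7", "2008-02-14"),
])
```

Listing the database-driven page explicitly is what gives the ‘better coverage’ benefit, and the `lastmod` dates are what make each sitemap a time-stamped catalogue.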


An action plan for Government departments

Who: Directors of Communications

• What: Facilitate implementation; encourage best practice; cascade key messages.
  When: March 2008.
  How: Forward this brochure to interested/impacted parties; encourage use of the Digital People Network.

Who: Heads of e-Comms

• What: Facilitate implementation; encourage best practice; cascade key messages; identify and brief IT contacts with responsibility for Web servers; identify and brief key contacts in non-departmental public bodies (NDPBs), including Heads of e-Comms and website managers; consider when work can be scheduled.
  When: March 2008.
  How: Forward this brochure to interested/impacted parties; encourage use of the Digital People Network; email webcontinuity@nationalarchives.gov.uk with the names of agency/NDPB contacts who need to be included in briefings.
• What: Attend project briefing session.
  When: May 2008.
  How: Organised by The National Archives.
• What: Work with IT contacts to schedule the installation of the redirection component, and the sitemap generation software.
  When: June 2008 and beyond.
  How: Software and guidance will be available via the Digital People Network.

Who: Website Managers

• What: Identify and brief IT contacts with responsibility for Web servers; consider when work can be scheduled.
  When: March 2008.
  How: Pass on this brochure.
• What: Work with key IT contacts to schedule the installation of the redirection component, and the sitemap generation software.
  When: June 2008 and beyond.
  How: Software and guidance will be available via the Digital People Network.
• What: Attend project briefing session.
  When: May 2008.
  How: Organised by The National Archives.

Who: Government Publishing/Publicity Managers

• What: Attend project briefing session.
  When: May 2008.
  How: Organised by The National Archives.
• What: Use guidance.
  When: June 2008.
  How: Guidance will be available via the Digital People Network.

Who: IT Managers

• What: Organise installation of redirection component.
  When: From March 2008 and beyond.
  How: Schedule for year 2008-09; discuss with IT suppliers if necessary; obtain software and guidance from The National Archives (from June 2008).

Who: Departmental Record Officers (DROs)

• What: Engage with website managers to ensure that the preservation of key information is ongoing.
  When: May 2008 and beyond.
  How: The Digital People Network is aimed at the information management community as well as e-comms and IT groups.
• What: Attend project briefings.
  When: May 2008.
  How: Organised by The National Archives.


How the project is governed

The Senior Responsible Owner for the project is The National Archives’ Chief Executive, Natalie Ceeney. Together with the members of the senior steering group she provides support and guidance for the project.

The steering group members are:

• Natalie Ceeney (Senior Responsible Owner), The National Archives
• Andrew Stott, Deputy CIO for Government, Cabinet Office
• John Pullinger, Director General, Information Services, House of Commons
• Alex Butler, Transformational Strategy Director, COI
• Jeremy Gould, Head of Internet Communication, Ministry of Justice

A Project Assurance Group, made up of the members of the original working group, also supports the project. The assurance group ensures that the Web continuity project meets its objective of providing a link management service and comprehensive website archiving to government. Its membership is drawn from The British Library, the Central Office of Information, the Department for Information Services at the House of Commons, the Department for International Development, the Ministry of Justice, and The National Archives. It:

• meets periodically to review the progress made by The National Archives project team;
• provides expert knowledge and support to the project team;
• reviews and agrees project documentation and guidance to be issued in support of the technical solution, etc;
• ensures that the needs of its stakeholders are addressed in each phase of the solution.

The project is managed by a Project Team at The National Archives.

Any questions?

We hope this brochure answers your questions about Web continuity, but if you’d like more information please do email us: webcontinuity@nationalarchives.gov.uk

About The National Archives

The National Archives is at the heart of information policy – setting standards and supporting innovation in information and records management across the UK, and providing a practical framework of best practice for opening up and encouraging the re-use of public sector information.

The National Archives
Kew | Richmond | Surrey TW9 4DU
www.nationalarchives.gov.uk/webcontinuity



Published March 2008

