Technology & Standards Watch (TechWatch) www.jisc.ac.uk/techwatch
Horizon Scanning report 10_01 First published: Sept. 2010
Data mash-ups and the future of mapping
University College London Centre for Advanced Spatial Analysis (CASA): Michael Batty, Andrew Crooks, Andrew Hudson-Smith, Richard Milton
University of Nottingham Centre for Geospatial Science (CGS): Suchith Anand, Mike Jackson, Jeremy Morley
Reviewed by: James Reid, Team Leader and Business Development Manager, Geoservices, EDINA; Andrew Turner, Deputy Director, Centre for Computational Geography, University of Leeds
To ensure you are reading the latest version of this report you should always download it from the original source.
Original source: http://www.jisc.ac.uk/techwatch
Version: 1.0
This version published: Sept. 2010
Publisher: JISC: Bristol, UK
Copyright owner: Suchith Anand, Michael Batty, Andrew Crooks, Andrew Hudson-Smith, Mike Jackson, Richard Milton, Jeremy Morley
Executive Summary

The term 'mash-up' refers to websites that weave data from different sources into new Web services. The key to a successful Web service is to gather and use large datasets and harness the scale of the Internet through what is known as network effects. This means that data sources are just as important as the software that 'mashes' them, and one of the most profound pieces of data that a user has at any one time is his or her location. In the past this was a somewhat fuzzy concept, perhaps as vague as a verbal reference to being in a particular shop or café or an actual street address. Recent events, however, have changed this.

In the 1990s, President Bill Clinton's policy decision to open up military GPS satellite technology for 'dual-use' (military and civilian) resulted in a whole new generation of location-aware devices. Around the same time, cartography and GIScience were also undergoing dramatic, Internet-induced changes. Traditional, resource-intensive processes and established organizations, in both the public and private sectors, were being challenged by new, lightweight methods. The upshot has been that map making, geospatial analysis and related activities are undergoing a process of profound change. New players have entered established markets and disrupted routes to knowledge and, as we have already seen with Web 2.0, newly empowered amateurs are part of these processes. Volunteers are quite literally grabbing a GPS unit and hitting the streets of their local town to help create crowdsourced datasets that are uploaded to both open source and proprietary databases.

The result is an evolving landscape which Tim O'Reilly, proponent of Web 2.0 and always ready with a handy moniker, has labelled Where 2.0. Others prefer the GeoWeb, Spatial Data Infrastructure, Location Infrastructure, or perhaps just location-based services. Whatever one might call it, there are a number of reasons why its development should be of interest to those in higher and further education.

Firstly, since a person's location is such a profound unit of information and of such value to, for example, the process of targeting advertising, there has been considerable investment in Web 2.0-style services that make use of it. Understanding these developments may provide useful insights into how other forms of data might be used.

Secondly, education, particularly research, is beginning to realize the huge potential of the data mash-up concept. As Government, too, begins to get involved, it is likely that education will be expected to take advantage of, and indeed come to relish, the new opportunities for working with data. Since, as this report makes clear, data mash-ups that make use of geospatial data in some form or other are by far the most common mash-ups to date, they are likely to provide useful lessons for other forms of data. In particular, the education community needs to understand the issues around how to open up data, how to allow data to be added to in ways that do not compromise accuracy and quality, and how to deal with issues such as privacy and working with commercial and non-profit third parties. The GeoWeb is a test ground for much of this.

Thirdly, new location-based systems are likely to have educational uses by, for example, facilitating new forms of fieldwork. Understanding the technology behind such systems and the way it is developing is likely to be of benefit to teachers and lecturers who are thinking about new ways to engage with learners.
And finally, there is a future watching aspect. Data mash-ups in education and research are part of an emerging, richer information environment with greater integration of mobile applications, sensor platforms, e-science, mixed reality, and semantic, machine-computable data. This report starts to speculate on forms that these might take, in the context of map-based data.
Table of Contents
Executive Summary
1. Introduction
1.1 Background and context
1.2 Summary and rationale
2. State of Play: Maps, mash-ups and the GeoWeb
2.1 Harnessing the power of the crowd
2.2 Individual production and user-generated content
2.3 Openness
2.4 Network effects and the architecture of participation
2.5 Data on an epic scale
3. Technologies and Standards
3.1 The role of Ajax and other advances in Web technology
3.2 Map mash-up basics
3.3 Specific technologies for map mash-ups
3.4 Standards and infrastructure
4. The future of data mash-ups and mapping
4.1 Semantic mash-ups
4.2 Mobile mash-ups
4.3 Geo-location on the social Web
4.4 Augmented Reality
4.5 Sensors
4.6 3-D and immersive worlds
4.7 HTML5
4.8 Policy, standards and the wider context
Conclusions and recommendations
About the Authors
References
1. Introduction

What they are all seeing is nothing less than the future of the World Wide Web. Suddenly hordes of volunteer programmers are taking it upon themselves to combine and remix the data and services of unrelated, even competing sites. The result: entirely new offerings they call 'mash-ups'.
Hof, 2005 (online)

Originally the term mash-up was used to describe the mixing or blending together of musical tracks. The term now refers to websites that weave data from different sources into new Web services (also known simply as 'services'), as first noted by Hof (2005). Although 'mash-up' can refer to fusing disparate data on any particular topic, the focus in this report is on mash-ups with some spatial or geographic element. In fact, most map mash-ups blend software and data, using one or more APIs1 provided by different content sites to aggregate and reuse data, as well as adding a little personalized code or scripting to create either a new and distinct Web service or an individualized, custom map.

The Traffic Injury Map2 illustrates a typical example of a map mash-up. It presents injuries resulting from traffic accidents, drawn from the UK Data Archive and the US National Highway Traffic Safety Administration, on a Google Maps cartographic base (see Figure 1). The mash-up enables users to identify areas with frequently occurring accidents and allows specification of categories to distinguish between accident victims such as cyclists or children.
Figure 1: Traffic Injury Map showing incidents in the Nottingham area
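This pattern, a base map plus a thin layer of data-wrangling code, is worth making concrete. The sketch below (in Python) shows the kind of lightweight 'glue' a mash-up author might write: it converts a few accident records into a GeoJSON layer that a mapping API can overlay on its cartographic base. The record values and file name are hypothetical illustrations, not taken from the Traffic Injury Map itself.

    import json

    # Hypothetical accident records: (latitude, longitude, victim category).
    # A real mash-up would read these from a published dataset instead.
    accidents = [
        (52.9548, -1.1581, "cyclist"),
        (52.9500, -1.1400, "child"),
        (52.9600, -1.1700, "pedestrian"),
    ]

    def to_geojson(records):
        """Convert (lat, lon, category) tuples into a GeoJSON FeatureCollection."""
        features = []
        for lat, lon, category in records:
            features.append({
                "type": "Feature",
                # GeoJSON orders coordinates as longitude, latitude.
                "geometry": {"type": "Point", "coordinates": [lon, lat]},
                "properties": {"category": category},
            })
        return {"type": "FeatureCollection", "features": features}

    with open("accidents.geojson", "w") as f:
        json.dump(to_geojson(accidents), f, indent=2)

A client-side script would then fetch this file and draw one marker per feature, filtering on the 'category' property to reproduce the kind of victim-type selection described above.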
Map mash-ups are the most commonly developed type of mash-up application. According to statistics from ProgrammableWeb,3 a website detailing APIs and mash-up news, mapping APIs (e.g. Google Maps, Microsoft Virtual Earth [now Bing Maps] and Yahoo! Maps) constituted 52% of the most popular APIs in July 2009 (see Figure 2a) – although by October 2009 this was down to 42% (see Figure 2b), and with a different mix of APIs (Google Maps, GeoNames and Google Maps for Flash). This shows the dynamism of this area, albeit with Google dominating the mapping APIs. In this case, the reduction in the total percentage of APIs that are map based is mostly due to the rapid rise in mash-ups involving the social micro-blogging site Twitter (from 5% in July to 20% in October 2009).
1 Application Programming Interfaces (APIs) are defined here as software programs that interact with other software to reduce the barriers to developing new applications. For map mash-ups, applications can be created with nothing more than a simple text editor, with the API providing customizable map 'tiles'.
2 http://www.road-injuries.info/map.html
3 http://www.programmableweb.com/
Figure 2: Most popular APIs for mash-ups as listed by ProgrammableWeb (a) in July 2009 (b) in October 2009
1.1 Background and context

Data mash-ups utilize the ideas of Web 2.0 (e.g. data on an epic scale), but because Web 2.0 as a term is starting to fall out of favour it is important to be clear about how it is used in this report. For some, talking simply about technology developments, 'Web 3.0' may seem to be somehow more current, describing 'the next phase' of the Web's progression, perhaps the semantic Web. However, it is often forgotten that O'Reilly's original description of Web 2.0 (2005) is as much about the ideas behind technology developments as it is about the technologies themselves. These ideas are still current – they have not yet changed in response to new technology developments – and are therefore still valid as an analytical framework for understanding data mash-ups and speculating on how they will progress.

This section will therefore discuss some of the background and context to the emergence of Web 2.0-influenced geospatial systems and map mash-ups. For example, by understanding something of the way that geospatial information systems developed before the arrival of the Internet and then the Web, and how traditional data capture and spatial analysis is undertaken, it is possible to gauge the impact of new ideas, particularly Web 2.0. In this section we will review these developments briefly through the concepts of software and data as a service. Finally, we will place these developments within the context of some of the existing activities within JISC and the wider higher/further education (HE/FE) sector.

1.1.1 Pre-Web Graphical Information Systems
Computer cartography started almost as soon as the idea of computer graphics emerged in the 1950s and '60s with the invention of pen plotters and line printers. A key development in this regard was at Harvard, where the SYMAP (Synagraphic Mapping) system was developed. However, it was only when graphics tube technology was replaced by screen memories (graphics cards), as a consequence of the PC revolution that began in the late 1970s, that graphics and, subsequently, computer-based mapping really began to take off. There were no mapping systems developed for the original Apple II but the early Apple Mac and the later IBM PC had simple systems. The Domesday Survey of Britain was available for the BBC Micro as a kind of simple but passive map. Rudimentary mapping software and then GIS4 began to appear in the mid to late 1980s but initially they consisted of migrations down from mainframe systems via specialist workstations. Even in the late 1980s, unless you were working with graphics and mapping on workstations, most graphics processing, even on minicomputers, was based on processing first and then separate display.

4 Geographic Information Systems (GIS) is a generic term that refers to technology that stores, processes, analyses and displays geospatial data. This kind of software tends to be expensive to produce, requires highly skilled professionals to operate it, and is used to handle large, complex datasets for end-users with very precise requirements. In addition, the geographic data that these systems make use of can also be expensive to procure and users of the data must often adhere to licence agreements designed to prevent unlawful sharing or copying.

The Digital Map of the World ran on PCs in the early 1990s but it was not until Windows 95 that desktop GIS really made its entry with ArcView (notwithstanding slightly earlier desktop systems such as MapInfo). Desktop systems began to mature as workstations disappeared and graphics and related computer memories got ever larger.

In the late 1990s, two things came into this mix that built on the Web. Firstly, what might be called Internet GIS: servers that held a GIS system and associated data and which served maps and analysis functions via the network to client-side machines, usually specialist workstations developed mainly for professional use. Xerox PARC was a pioneer in this area, with the Xerox Map Viewer as a notable example. Secondly came online maps displayed through the Web browser, although the map was not really very manipulable and was used simply for basic navigation with zoom and pan facilities. By the late 1990s, various products that enabled users to 'find their way' – gazetteers and atlases – had appeared5 and in the early 2000s, value was being added to these interfaces as they began to be customized to provide new layers of spatial data that users could query. A good example in the UK is UpMyStreet, which was first created in 1998 and now contains a wealth of local information targeted around the search for property and local services.

5 such as MapQuest

It is important to be clear that such browser-based systems offered little or no spatial analysis, which is the heartland of professional GIS. In fact, this distinction between professional GIS functionality (which we will call spatial analysis) and computer maps is still one that dominates the field, and most Web-based map mash-ups today do not deal with spatial analysis.

1.1.2 GIS data capture and management

Traditionally, geospatial work is carried out using GIS software. This is a specialized area of work that is generally known as Geographic Information Science or GIScience (Goodchild, 1992). The specialized nature of GIS applications means their use is often restricted, with several barriers to widespread operation. Firstly, data and software are expensive and secondly, trained personnel are required to successfully operate and manage these systems.

Historically, capturing geographic data has been expensive and time-consuming. Primary data are collected by surveyors who make use of advanced equipment such as total stations, Real Time Kinematic GPS, aerial photography, and terrestrial and airborne laser scanners (LIDAR), amongst other tools, in order to capture detailed measurements and positions of objects in three-dimensional space. The cost of acquiring the equipment and the knowledge needed to operate and deploy it for mapping purposes means that geographic survey remains a skill carried out by highly trained personnel. Secondary data capture involves the digitization of existing geographic data contained in paper maps, and this may include scanning them to create an image file or tracing the cartography to create a geographic database of the features. The latter method is usually preferable although it is a time-consuming process.
In Britain, topographic map data capture and supply is most commonly carried out by Ordnance Survey, with companies such as NAVTEQ and Tele Atlas supplying much of the consumer-grade data used for navigation devices and electronic maps. There are also Government suppliers and large data vendors of remote/satellite imagery, such as NASA, DigitalGlobe and GeoEye, and a variety of other infrastructure, business and sensor information suppliers. There is similarly both public sector and commercial acquisition and supply of a wide range of thematic map data (geology, soils, hydrography, land use, etc.). The importance of the Ordnance Survey, however, is that it operates as a Government trading fund producing the high quality, base map information that is needed for many different applications, and these data have to be accessed using a licence fee price model.6 The high cost of data capture means the prices of geographic products can therefore also be high. Although some Ordnance Survey data have been opened up for free public access, this does not include the bulk of data from which most of their revenue comes.

To manage spatial data and undertake geographic analysis, GIS software will typically be used. The software provides a powerful tool for manipulating spatial data and, as with the data capture stage, can be expensive. Typical GIS software is supplied on a 'seat licence' basis, allowing multiple users to operate the application. Systems vary considerably, from those that contain elaborate and extensive sets of toolbox functions that let the user customize the software and link it to other software, to more integrated packages that have many fewer toolbox functions. Pitney Bowes' MapInfo, for example, tends to be a more integrated package whereas ESRI's ArcGIS is now much more of a GIS toolbox. The complexity of the systems means that either users need to be trained or specialist operators are required, thereby adding to the overall cost to the purchaser. Moreover, providing a Web-based GIS solution typically requires further skills, software and hardware and will probably need a separate data licence agreement. All of these factors mean that GIS users have to solve a further set of technical and legal issues if they wish to display their geospatial data on the Web. As a result, the raw geospatial dataset is usually closed to the wider world, whether deliberately or not.

In summary, it is important when making sense of the rest of this report to realize that mapping, and the reproduction of maps on a computer screen, is a much broader and more general topic than specialized GIScience. Both, however, have been impacted by the introduction of the Web and, more recently, the ideas of Web 2.0.

1.1.3 Software and data as a service

A key element in my thinking about Web 2.0 has always been that it's about data as a service, not just software as a service. In fact, the two ideas become inseparable.
Tim O'Reilly, in Turner and Forrest, 2008 (p.1)

It is into this landscape that Google launched its Google Maps service in February 2005, closely followed by its API in June of that year. Gibson and Erle (2006) argue that three things set Google apart from previous Web and Internet-based mapping systems: a clean and responsive user interface; fast-loading, 2-D map images consisting of pre-rendered tiles (the backcloth, or cartographic base); and, most of all, a client-side API that allowed users to customize the backcloths by overlaying additional data and then embed the finished result into their own website. The new service integrated map, satellite and aerial imagery via a simple interface which included Google's trademark, easy-to-use and accurate search facilities as well as what were then the innovative 'pan and zoom' or 'slippy' maps (activated by clicking and dragging) that have now become an everyday part of life for many computer users.
Critically, Google does not give access to the raw data that underpins its map backcloths. This is partly because it was initially buying its data from third party suppliers and the conditions of use meant that Google could only supply rasterized maps based on the data, and partly because of the kind of caching needed in order to deliver at the Internet scale (see section 3.2.2). While these factors have worked in Google's favour in terms of providing a service to millions of users who do not necessarily understand how to deal with raw datasets, they do impose limitations on what it is possible to create with map mash-ups, particularly with respect to accuracy.

6 The Ordnance Survey trading fund model is different from other Government trading funds that produce spatial data (e.g. Met. Office and United Kingdom Hydrographic Office [UKHO]) although it is likely that it will be brought into line with other data-rich, trading fund models in the future (DCLG, 2010).

The result of this is that we now have a software ecosystem, essentially a more diverse market for software and services, where desktop GIS exists alongside Web-based GIS (as part of enterprise architectures within large organizations), 2-D Web map systems that provide basic navigation (e.g. Multimap), customizable 2-D Web map systems, and 3-D Web map systems such as Google Earth and Microsoft Bing Maps. In the mix are a variety of services that tend not to be large-scale and are usually locally developed (e.g. UCL's MapTube7).

This is underpinned by a data supply ecosystem. Data providers that collect and sell professionally-produced data (e.g. Ordnance Survey, Infoterra) and resellers that repackage it in order to sell it on as a 'value added' product (e.g. LandMark) have been joined by: data providers that collect their own data and make it available via a service; data providers that collect publicly available data and repackage it, keeping it free of charge, for example via the portals that are emerging under the Making Public Data Public initiative, which has launched the data.gov.uk site; and data providers that collect crowdsourced data and make it available for free (e.g. OpenStreetMap). While the issue of data licensing remains a thorny one (see section 2.3), it is important to note the fluidity of the boundaries between the different software and data models and the potential to 'mix and match'.

1.1.4 Mash-ups in Higher Education

The use of geospatial data mash-ups in higher education (HE) forms part of the wider debate about the implications of Web 2.0 technologies and mash-ups (Lamb, 2007). Despite their relative newness there is growing interest in their potential, demonstrated, for example, by the session given over to geospatial services and the benefits for educators at JISC's 2009 conference, which provided a number of examples of how Web 2.0-style data mash-ups can be used, including: geo-tagged referential archives for geography students; adding layers of geo-tagged data to photographs; and integrating the new domain of neogeography8 with social science (the Geography Undergraduates INtegrating NEo-geographies and Social Science [GUINESS] project).

In addition, JISC's Shared Infrastructure Services Landscape Study (Chapman and Russell, 2009) surveyed and reviewed the use of Web 2.0 tools and services in the UK HE sector. As part of this work it reviewed the use of maps and map-based mash-ups by institutions and noted:

• General purpose or administrative uses: Google Maps for 'how to get there' and events-based information, often mashed with local data such as weather or traffic updates. Google Maps was felt to have 'overtaken' StreetMap and Multimap due to 'ease of access and functionality, a well-documented API and reliability' (p. 13).
• More specialist use of geo-related data for specialized services, where tools such as JISC ShareGeo and GoGeo! were being used.
7 http://www.casa.ucl.ac.uk/websites/maptube.asp
8 Defined as geographical tools used for personal and community activities by a non-expert user (see section 2.2.2.1).
For the latter, more specialist area of use, there are two broad areas of impact: research, and teaching and learning. In terms of teaching, Liu et al. (2008) argue that mash-ups offer 'exciting new possibilities for classroom instruction, leading to potentially innovative uses of existing Web applications and data' (p. 245). The authors cite a number of examples, mainly American, in which Web-based maps are used as a focus for groups of students to explore educational themes. These maps, often based on Google Maps, allow video, audio, graphic and print-based materials to be geo-tagged, added to the map as an additional layer and used to create educational, theme-based explorations. Closer to home, the University of Leicester is leading on a consortium contract with the University of Nottingham and University College London to deliver Spatial Literacy in Teaching9 (SPLINT), a HEFCE-funded Centre for Excellence in Teaching and Learning (CETL) focusing on the pedagogy of geospatial technologies, the pedagogy of taught postgraduates and the enhancement of spatial literacy in HE. It is worth noting in passing that many of these activities make use of the closely related technical areas of location-based media and the use of mobile phones and PDAs in the field. Further discussion of the teaching and learning implications of this is provided in the JISC TechWatch report on location-based services (Benford, 2005) and EDINA's Alternative Access Mobile Scoping Study (Butchart et al., 2010).

With regard to the research agenda, the mashing of data from disparate sources and disciplines has the potential to open up new areas of research investigation, with Macdonald (2008) citing a number of examples of researchers currently using geo-referenced data and Web 2.0-style mapping facilities, including:

• Bjørn Sandvik's Thematic Mapping: http://thematicmapping.org
• Johns Hopkins University's Interactive Map Tool: http://www.cer.jhu.edu/maptool.html
• Minnesota Interactive Mapping Project: http://maps.umn.edu/
• GeoVista at Pennsylvania State University: http://www.geovista.psu.edu/main.jps
• University of Maine's Commons of Geographic Data: http://geodatacommons.umaine.edu
• Project Saxta: http://saxta.geog.umd.edu
Medical schools and public health researchers are also making use of map mash-up technology, a key example being Health Map10 which shows the current, global state of infectious diseases and their effect on human and animal health (Boulos et al., 2010). This website integrates outbreak data of varying reliability, ranging from news sources (such as Google News) to curated personal accounts (such as ProMED) to validated official alerts (such as the World Health Organization). One of the key effects of geospatial data mash-up technology on research is its ability to foster inter-disciplinary work. The WISERD geo-portal11 is an example of this, aiming to support the interdisciplinary work of the WISERD Centre by providing the central GIS framework to integrate, manage, analyse and disseminate quantitative and qualitative data relevant to the programme (Berry et al., 2010).
9 http://www.le.ac.uk/geography/splint/
10 http://www.healthmap.org/en
11 http://www.wiserd.ac.uk/research/data-integration-theme/the-wiserd-geo-portal/
JISC is also active in this area, supporting a number of geospatial and mapping related projects and services.12 The JISC Geospatial Working Group13 provides advice on collection and development priorities for geospatial resources to JISC Collections, through a process that identifies and responds to user needs and supports the execution of various strategies. In addition, the JISC Standards Catalogue includes a number of geospatial related standards.14 Other salient work includes:

EDINA
Services provided by the JISC national academic data centre (EDINA15) include:

• Go-Geo!: the UK HE sector's geoportal, providing a geospatial resource discovery tool and ancillary support services
• Digimap: a core collection of various spatial framework datasets including Ordnance Survey, British Geological Survey, SeaZone and Landmark Information Group
• UKBORDERS: an ESRC service funded under the Census Programme that provides a broad range of census and administrative boundary data for the UK, for example local election wards
• agcensus: agricultural census data
• Unlock (formerly Geo Cross Walk): middleware services for georeferencing and geoenabling, including a comprehensive database of geographical features with their name, type and location, plus a set of simple APIs (the Unlock API) and a natural language 'geoparser' service for assisting in geoenabling extant resources (a minimal query sketch follows this list).
MIMAS
MIMAS16 provides Landmap, a range of geospatial data for use in HE/FE that includes optical, radar and elevation-based data collections derived from satellite, aerial photography and similar sources. MIMAS also provides GeoConvert, an online geography matching and conversion service based on the National Statistics Postcode Directory, and runs CASWEB, which provides access to census data for academic use.

NCeSS
The National Centre for e-Social Science (NCeSS) has produced MapTube, a free resource for viewing, sharing, mixing and mashing data that have a locational element. It describes itself as a 'place to put maps' and, because users of the site who put up or index the maps they create do not in general collude or co-operate, the site acts as an archive for map mash-ups. MapTube was first developed as part of the work undertaken by the Geographic Virtual Urban Environments (GeoVUE) team based at University College London's Centre for Advanced Spatial Analysis (CASA). The focus in GeoVUE was on visualization and the node has now merged with the MoSeS node at the University of Leeds to augment the visualization capabilities based on maps with the development of spatial and geographical models that require such visualization. Further development has taken place under the NCeSS follow-on project Generative e-Social Science for Socio-Spatial Simulation (Genesis) and the National e-Infrastructure for Social Simulation (NeISS) project, which is funded by JISC as part of its Information Environment programme.
12 See, for example, Project Erewhon (http://erewhon.oucs.ox.ac.uk/) and the winners of the 'Edina' category at the JISC-sponsored Dev8D workshop (http://dev8d.jiscinvolve.org/wp/2010/03/08/dev8dchallenge-ideas-and-winners/).
13 http://www.jisc-collections.ac.uk/About-JISC-Collections/Advisory-Groups/Geospatial-wg/
14 http://standards-catalogue.ukoln.ac.uk/index/JISC_Standards_Catalogue
15 http://edina.ac.uk/
16 http://mimas.ac.uk/
1.2 Summary and rationale

The release of Google Maps has popularized the idea that non-expert users can not only view maps on the Web but also engage in some sort of manipulation. To date the interaction is quite limited and really depends on the providers of the maps and related software opening their products to user intervention through some sort of API or embedding functions directly into the maps themselves. In this sense the current state of map mash-ups is fairly primitive in terms of the potential for non-expert users to create their own map content and tailor their cartography to very specific applications. The majority of the features developed so far are simply tools for the display – visualization – of basic or simply derived geographic information. They do not provide any of the complexity of spatial analysis per se, merely the visualization of spatial data whose financing is underpinned by income generation through advertising. This is symptomatic of Web 2.0, but there is rapid change in that users are beginning not only to create more sophisticated maps, but also to define where such information is produced, who uses it and at what time it is created and applied.

The implications of location-based systems are of interest in themselves since they are likely to have a profound impact on society in general, and therefore education. However, over and above that, valuable learning may be gleaned more generally about data and how it may be used in the future. Pervasive computing, mobile devices, sensor networks and spatial search are already shaping how the mash-up scene is developing. The importance of location with respect to mash-ups and these other new technologies is reflected in the difficulty of discussing them without referring to positional or spatial information. In this respect, educational institutions need to understand the potential impact of location-based systems on areas such as teaching and research but, as producers and keepers of data about huge numbers of students, also need to understand the potentially far-reaching consequences of how data of various kinds might be used.

This report will look at the evolving landscape that O'Reilly, always ready with a handy moniker, has labelled 'Where 2.0'. Others prefer the GeoWeb, Spatial Data Infrastructure, Location Infrastructure, or perhaps just location-based services. In particular it will look at the development and changing nature of map-based data mash-ups. It will explain the basic concepts behind map mash-ups, how geospatial data gathering and analysis has changed and how new technologies and standards are impacting on this. It will also look at the wider context, including some of the policy that is driving the development of map mash-up technology and some of the longer-term technology developments.

In the process of explaining the changes that the ideas behind Web 2.0 (outlined in section 2) are bringing to the world of cartography and geospatial analysis, the report will discuss pertinent issues that relate to the way that data in general is gathered and used. One of the key outcomes of this report, it is to be hoped, is a better understanding of some of the issues surrounding the concept of 'data as a service'. These include data gathering, accuracy, provenance, quality, trust, rights and what conditions are attached to its use in the future. As the report notes, map mash-ups are currently the most popular data mash-up category.
In this sense they provide a crucible in which many of these issues will be tested.
2. State of Play: Maps, mash-ups and the creation of the GeoWeb

The geospatial industry has traditionally been based around what O'Reilly (2005) would call the 'Web 1.0' business model: software is packaged and sold as 'desktop' applications according to scheduled software releases; revenue is based on selling expensive licences to a few specialized customers (doing business with the head, not the tail), etc. Google Maps, on the other hand, leverages the ideas of Web 2.017: it has developed mapping services that take advantage of the ongoing technical evolution of the Web and, something that Google has made its trademark, can turn the ever-increasing scale of the Internet to its advantage (although its mapping apps are really just data collation tools and do not have the spatial analysis functions of GIS). Google is no longer alone; others quickly followed, most notably Microsoft with its Virtual Earth product (now Bing Maps). Proprietary GIS software vendors have responded with their own approach, including a greater focus on enabling Web applications on top of their users' GIS databases, better Web map clients, and offering virtual globes (akin to Google Earth) and mapping APIs that can integrate data from the vendor's systems with map images hosted by the GIS vendor. An example is ESRI's ArcGIS Web Mapping API, which can integrate maps generated by ArcGIS Server (and its associated analysis functions) with maps served by ESRI. Users can access these APIs at no cost to build and deploy applications for internal or non-commercial use.

However, as a footnote to this discussion, it should be noted that although the power of the mash-up idea promises the prospect of interactive map creation by people with little or no programming knowledge, current creators of mash-ups need to know their way around XML, cartographic map projection and GIS functionality at a high level. In addition, it is only very recently that data sources (as opposed to customizable 'backcloths') have started to become freely available, along with the tools needed to enable immediate and direct visualization.

Finally, it is worth noting that map mash-ups are just the visible tip of the GeoWeb, which in turn is rooted in pre-Web cartography and GIScience. Where the Web is a series of interlinked HTML documents, the term GeoWeb has been coined to describe the 'interconnected, online digital network of discoverable geospatial documents, databases and services' (Turner and Forrest, 2008, p. 2). When we look at the detail of what this means we can see that, in fact, the GeoWeb encompasses 'a range of application domains from global weather and geological sensors to family travel blogs, public transit information, and friend trackers' (ibid.), and this has prompted some to claim that the GeoWeb is actually more about the overlap between geospatial computation and Web 2.0.

While this distinction between Web 1.0 and Web 2.0 may be problematic (Anderson, 2007), the concepts that have evolved under the Web 2.0 moniker, which enable us to talk about how the Web is changing, are important. These concepts were crystallized as 'six big ideas' by Anderson (2007) in his TechWatch report, from an earlier statement of principles from O'Reilly (2005):

• Harnessing the power of the crowd
• Individual production and user-generated content
• Openness
• Network effects
• Architecture of participation
• Data on an epic scale
It is almost impossible to count the number of map mash-ups that have been developed since Google released its API in June 2005. It would appear that our ability to mix and match different data and software for different applications depends on a wide spectrum of programming skills, with no coherent picture of how such mash-ups are produced. So far there have not been any attempts to classify them, and thus we use the six ideas as a way of understanding the direction of travel for map mash-ups and speculating on where it will go next.

17 See section 1.1 for a discussion of the difference between the ideas and technologies of Web 2.0.
2.1 Harnessing the power of the crowd

The TechWatch report into Web 2.0 (Anderson, 2007) examined the idea of 'collective intelligence' as described by O'Reilly in his 2005 article. Without revisiting the whole debate here, the key thing to note about harnessing the power of the crowd, in terms of the state of play for data mash-ups, is not to make assumptions about why people take part. This relates particularly to the notion of crowdsourcing, which is often assumed to be a collaborative activity. In fact, the original Wired magazine article (Howe, 2006) that first highlighted it brought together varied examples. These included individual action that can be aggregated to create a collective result (see below) and competitions to find individuals who can undertake certain tasks: from amateur photographers providing cheap or free labour, to experienced research scientists rewarded with six or seven figure 'prizes' and recruited via the Web to solve specific problems for commercial R&D organizations. The range of activity that is covered by the term crowdsourcing is therefore wide and it is not necessarily an altruistic undertaking.
2.2 Individual production and user-generated content

There is no widely accepted, precise definition of user-generated content, but it is generally considered to be content that: is made publicly available over the Internet; reflects a degree of creative effort; and is created outside professional routines and practices (OECD, 2007). When applied to mapping, Goodchild (2007) calls it Volunteered Geographic Information (VGI). With respect to map mash-ups there are two aspects that are important: crowdsourcing location-related data to create maps, and crowdsourcing other data (here, socio-economic data) to overlay onto a map backcloth.

2.2.1 Crowdsourcing socio-economic data

One example of how user-generated content can be used within a mapping application is provided by UCL's MapTube portal. MapTube is a free resource for viewing, sharing, mixing and mashing data that have a locational element. It describes itself as a 'place to put maps' and, because users of the site who put up or index the maps they create do not in general collude or co-operate, the site acts as an archive for map mash-ups. The maps are created using an application called GMapCreator, a service that makes it easier to use Google Maps, and rather than storing the whole map on the MapTube server, only a link to where the map has been published is stored. When maps are shared in this way, information about what the map is and what it shows is entered by the owner, along with the link to where the map is published. As the maps comprise the pre-rendered tiled images from the GMapCreator, the raw data are never stored on the Internet. This means it is a safe way of sharing a map without giving away the raw data used to create it.
Figure 3: MapTube and Crowdsourcing: The Credit Crunch Mood Map. 3(a) details the Radio 4 website on the Mood Map for the Credit Crunch; 3(b) the user website questionnaire; 3(c) a distribution based on early responses; 3(d) the final Credit Crunch Map
Radio 4, BBC South, BBC Look East and BBC North have all used MapTube to enable users to respond to specific survey questions, through dedicated websites, where they were asked to give their postcode so that geographic 'mood maps' could be created (Hudson-Smith et al., 2009). The process was first used to create a mood map of the 2008 economic recession in the UK: working with BBC Radio 4 and BBC TV Newsnight, a survey was created that asked people to choose the one factor (of six) affecting them most during the recession. No personal information was collected with respect to the 23,000 responses. Figure 3 demonstrates a typical map, produced online in real time through the MapTube portal and constructed as a map mash-up using the Google Maps API. Since then, the same team have developed SurveyMapper,18 where users create their own online survey which in turn creates georeferenced responses that can be mapped in real time.

18 www.surveymapper.com

Data were created by individuals acting independently, with the technology providing the aggregation power to visualize the collective result. In fact, with the emergence of many social networking sites, millions of users can create their own data and respond to others through various methods of online communication, all of which can be tagged with respect to location. There are, for example, various mash-ups that build up pictures of places from Flickr data that have been tagged using a consistent series of tags such as geocodes, often added after the pictures are produced or when they are uploaded.

One last point is relevant to these kinds of middleware that support map mash-ups. They can be used not only for maps but also for any data that needs to be displayed in two dimensions and which requires the functionality that is offered by the basic map platform, of which pan and zoom are the obvious features. Pictures and related artwork are obvious sources.

2.2.2 Crowdsourcing geospatial data

Companies are beginning to employ the techniques of crowdsourcing to both extend and improve their datasets. In the area of consumer-grade applications such as personal navigation devices (PNDs19) there has been some experimentation in methods of data collection and validation. Turner and Forrest (2008) divide these into active and passive, where active data collection allows users to enter and validate information, and passive data collection captures user actions and behaviours to infer intent and 'interest': for example, using Web analytics to track how long people look at particular webpages and advertisements, and what links they click. As an example of active data collection the authors cite the error correction services provided by TomTom and Dash. The TomTom system allows a driver to 'hit one button' to notify TomTom that the displayed route is incorrect, while the Dash system deploys an Internet-connected PND to identify where 'multiple driver routes diverge from expected roadways, indicating an error in the road data' (ibid, p.4). Not only can Dash update its own data, but it can also provide updated information to other data collection companies.

The authors go on to describe how Nokia's decision to purchase data provider NAVTEQ suggests that in emerging markets such as Asia and Africa (where there is limited high quality geospatial data available, and where Nokia controls a high proportion of the handset market) the company is able to use its position in the market to leverage geospatial data collection, through its mobile devices, to add to the NAVTEQ base. In this way, Nokia is able to 'dramatically expand' its data holdings in an area of the world where there is currently a lack of professionally collected geospatial data.

2.2.2.1 New geography and the rise of the amateur

In general, one of the main precursors for the rise in user-generated content was the widespread adoption of cheap, fairly high quality digital cameras, video cameras, mobile- and smart-phones (Anderson, 2007). This is no different for mapping technologies, where the decreasing cost and ongoing improvement of GPS receivers has offered users the ability to easily capture their own spatial information. Prior to May 2000, civilian GPS receivers were subject to selective availability, a feature imposed deliberately by the US Department of Defense to degrade positioning signals, thereby limiting the devices' accuracy. Now, however, cheap handheld receivers are increasingly available for leisure and hobby purposes and are commonly found in other devices such as mobile phones and cameras. Where GIS work has traditionally been a highly specialized profession, increasingly, a community of amateur map enthusiasts are using new technologies to capture map data.
They may record single locations of objects, perhaps the place where a photograph was taken, or use the device in a logging mode to record many points, which may be edited later on a computer to create a road or other geographic feature.
19 These include the dashboard-mountable navigation devices sold by companies such as TomTom and Garmin.
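A handheld GPS logger of this kind typically writes its track as a GPX file, a simple XML format. The Python sketch below, which assumes a GPX 1.1 file named track.gpx, extracts the logged points so they can be cleaned up or traced into a road or footpath geometry.

    import xml.etree.ElementTree as ET

    # GPX 1.1 elements live in this XML namespace.
    GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}

    def read_track_points(path):
        """Return the (lat, lon) pairs logged in a GPX track file."""
        tree = ET.parse(path)
        points = []
        # Track points sit under trk/trkseg/trkpt; coordinates are attributes.
        for trkpt in tree.getroot().findall(".//gpx:trkseg/gpx:trkpt", GPX_NS):
            points.append((float(trkpt.get("lat")), float(trkpt.get("lon"))))
        return points

    track = read_track_points("track.gpx")
    print(len(track), "points logged")

Editing then amounts to deleting spurious points and attaching attributes (street name, path type) to the cleaned-up line, which is essentially the workflow behind the OpenStreetMap contributions described below.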
The term neogeography has been coined in an attempt to capture the essence of this voluntary action, although the term itself has been the subject of some debate.20 On the Platial blog, Eisnor (2006) describes neogeography as '…a diverse set of practices that operate outside, or alongside, or in a manner of, the practices of professional geographers'. Turner (2006) takes this further by stating that rather than making claims about scientific standards, methodologies of neogeography tend towards intuitive, expressive, personal, absurd, artistic, or maybe just simply idiosyncratic applications of 'real' geographic techniques. While neither of these descriptions mentions Web 2.0 explicitly, the sense in which non-experts create maps and manipulate map data, thereby extending the area of practice beyond that of professional geographers, geographic information scientists and cartographers, means that there is obvious common ground between neogeography and crowdsourcing. This is not to say that these practices are of no use to the cartographic/geographic sciences, but that they usually do not conform to the protocols of professional practice (Haklay et al., 2008).

The application of neogeography is demonstrated by the OpenStreetMap project (OSM). Launched on 9th August 2004, OSM is the brainchild of ex-UCL student Steve Coast. Behind it is a simple concept: to create freely available geographic information without any legal or technical restrictions. In this sense, it is a kind of Wikipedia for geographic information that relies on crowdsourced spatial data. Contributors take handheld GPS devices (equipped with open source software) with them on journeys or go out specifically to record GPS 'tracks'. This may take the form of 'mapping parties', where they record street names, village names and other features using notebooks, digital cameras and voice-recorders to collect data (for more on this see the OSM Wiki21). Once the event is complete, the data are added to the central database. Additions such as street names, type of path, links between roads etc. are made based on the notes taken en route, and contributions are moderated by more experienced users. These data are subsequently processed to produce detailed street-level maps, which can be published, freely printed and copied without restriction. Anyone is able to take part if they have a GPS unit and the desire to see their work as part of the map.

Since 2006, Yahoo! has allowed OSM to use its aerial imagery and to an extent this has lessened the need for GPS traces, although it still requires community effort to gather street names and provide details of road types, road restrictions etc. One example dates from 2007, when OSM began to use Yahoo! imagery to map the streets of Baghdad, Iraq by remote sketching, combined with calls to participants in the vicinity to help refine the road layout information. Figure 4 details the layout that was completed by 5th May 2007 on all roads that are visible in the sourced imagery.
20 As a starting point for some of the debate, see: http://highearthorbit.com/neogeography-towards-a-definition/
21 http://wiki.openstreetmap.org/wiki/Main_Page
Figure 4: The OSM crowdsourced map of Baghdad, May 2007 (from: http://wiki.openstreetmap.org/wiki/Baghdad)
The increase in OSM's activity and coverage has, in turn, fuelled its usefulness and encouraged both corporate and community contributions. Automotive Navigation Data, a Dutch data company, turned over its data on China and the Netherlands because it saw little value in owning an incomplete dataset. Its hope is that by opening up its data via OSM, it will be able to create datasets that are 100% accurate (Turner and Forrest, 2008).
2.3 Openness

The development of the Web has seen a wide range of legal, regulatory, political and cultural developments surrounding the control, access and rights to digital content. However, the Web has also always had a strong tradition of openness: working with open standards, using open source software [...] making use of free data, re-using data and working in a spirit of open innovation.
Anderson, 2007, p. 25

As well as being an example of crowdsourcing, OSM is important for another reason: openness. Originally set up in response to the high cost and restrictive licensing of Ordnance Survey data, OSM is dedicated to providing a source of geospatial data that is free from technical and legal restrictions, in particular intellectual property rights (IPR) pertaining to the use and reuse of map-related data.

2.3.1 Intellectual Property Rights

IPR are perhaps the most significant challenge facing the widespread use of geospatial data and the creation of data mash-ups more generally. Using a mapping API can mean that the user is relieved of many worries relating to rights and licences to display map data on the Web – these are essentially wrapped up in the licence to use the API – although this does not provide freedom from IPR per se. As it stands, mapping that has been derived using Ordnance Survey data as a base may not be presented on a Google Map (OS, 2008). This is the case whether the derived data is passed to Google via their API or kept completely separate from them by using a bespoke map interface. The ongoing freeing up of Government data, discussed in section 4.8, will help to alleviate some of these issues and ease access to some datasets. However, the problem does not only apply to datasets produced by the proprietary data providers. All data shown on Google Earth or Google Maps are protected by US copyright laws. This includes any derivative products, although the licence for Google Earth and Google Maps allows for non-commercial personal use, e.g. websites and blogs. Bing Maps and Yahoo! Maps have similar copyright restrictions and non-commercial personal use exemptions.

A key organizational challenge is therefore to educate university staff to become equally familiar with both the potential and the limitations of mash-up technologies. The first issue is the terms of service of any map API employed in the mash-up, as these may restrict what can be done and, even more than this, may claim rights over data submitted through the API.

2.3.2 'Open' APIs

APIs simplify things for mash-up developers. They provide a way for them to make software that interacts easily with other software through a well-defined interface. An API that does not require the programmer to pay licence fees or royalties is often described as 'open'. Such APIs have helped Web 2.0 services develop rapidly and have facilitated the creation of mash-ups of data from various sources. However, the JISC report into Web 2.0 (Anderson, 2007) cites Brian Behlendorf's encapsulation of one of the common misconceptions of 'open' when applied to services and APIs: just because something is available on the Internet does not necessarily mean that it is open. We need to distinguish further and say that just because something is available free of charge does not necessarily mean that it is open either. Deciding whether something is open or not depends on a variety of factors, e.g. what standards does it adhere to and how open are those standards?

2.3.3 Open source software and open data

Open source software is starting to have an effect on IPR and how they are perceived. This is being compounded by the crowdsourcing model, which relies on a huge number of usually amateur 'creators' who do not rely on being paid for their content and often choose to give up some of their copyright protections. This of course has a potential knock-on effect: data mash-ups may be republishing material that has been produced to varying degrees of accuracy and for which the process of assigning rights has been obscured.

2.3.3.1 Open source geospatial software

Open source geospatial software tools offer new opportunities for developers to create new mash-up applications more quickly and at lower cost. Having a community of developers creates an ecosystem for rapid production of software applications that are robust and may even have greater reliability than some proprietary software solutions. However, while these products may be suitable for Web mash-ups, the functionality is unlikely to be suitable for spatial analysis.

Governments have realized the benefits of open source and are actively promoting them; the UK Government Action Plan on Open Source, Open Standards and Re-Use is working in this capacity. Tom Watson MP, former Minister for Digital Engagement, states the Government's commitment to open source: 'Over the past five years many Government departments have shown that Open Source can be best for the taxpayer – in our Web services, in the NHS and in other vital public services' (Cabinet Office, 2009, p.1).

The Open Source Geospatial Foundation (OSGeo) is a not-for-profit organization whose mission is to support and promote the collaborative development of open geospatial technologies and data.
The foundation provides financial, organizational and legal support to the broader open source geospatial community. It also serves as an independent legal entity to which community members can contribute code, funding and other resources, secure in the knowledge that their contributions will be maintained for public benefit. OSGeo also serves as an outreach and advocacy organization for the open source geospatial community, and provides a common forum and shared infrastructure for improving cross-project collaboration. The foundation's projects are all freely available and usable under an OSI-certified open source licence. The project development statistics for the various software projects under the OSGeo umbrella give the bigger picture of the potential of open source software in the geospatial domain.22

2.3.3.2 Open data

Increasingly, discussions over what constitutes openness have moved beyond the parameters of open source software and into the meaning of openness in the context of a Web-based service like Google (O'Reilly, 2006). Some argue that for a service it is the data rather than the software that needs to be open, and there are those who hold that to be truly open the user's data should be capable of being moved or taken back by the user at will. On his blog, Tim Bray, an inventor of XML, argues that a service claiming to be open must agree that: 'Any data that you give us, we'll let you take away again, without withholding anything, or encoding it in a proprietary format, or claiming any intellectual-property rights whatsoever' (Bray, 2006).

OpenStreetMap started with the aim of creating freely available geographic information without any legal or technical restrictions. Contributors use GPS receivers, paper sketches or even draw over aerial imagery to map anywhere in the world, and data are released under a Creative Commons Attribution-ShareAlike (CC-BY-SA) 2.0 licence. This means that the source vector data used to make the maps are available for download. Data may be used for free, including for commercial gain, as long as they are correctly attributed as CC-BY-SA together with the copyright owner; this is as simple as adding 'CC-BY-SA 2009 OpenStreetMap'. It is supported by tools and applications for using and repurposing the data, as well as other services (e.g. OSM Cycle Map), and is complemented by other open data projects such as OpenCellID and WiGLE (base station and Wi-Fi locations).

An alternative to OSM is the Google Map Maker service, which began in June 2008. It is similar to OSM in that it crowdsources maps in countries where current mapping data are unavailable or sketchy, but in contrast to OSM, its licence terms require that all data submitted and maps created are the intellectual property of Google (Turner and Forrest, 2008). Users are able to trace features in a way that is similar to OSM's use of Yahoo! data: they can sketch directly onto imagery and add roads, railways, etc., even building layouts and business locations, and data are checked by more experienced users. Both OSM and Google Map Maker have varying levels of accuracy and, as Haklay (2010a) notes, there seems to be friction between them as to which organization will ultimately prevail amongst Government and NGO users.
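Because OSM's source vector data are downloadable, as noted above, reusing them can be as simple as an HTTP request. The Python sketch below pulls the raw data for a small bounding box from the main OSM API (version 0.6 at the time of writing) and counts the nodes and ways it contains; note that the API caps the size of the area that can be requested in one call, so the box is kept deliberately small.

    import urllib.request
    import xml.etree.ElementTree as ET

    # Bounding box as left,bottom,right,top (min lon, min lat, max lon, max lat):
    # here, a small patch of central Nottingham.
    bbox = "-1.16,52.94,-1.14,52.96"
    url = "https://api.openstreetmap.org/api/0.6/map?bbox=" + bbox

    with urllib.request.urlopen(url) as response:
        root = ET.parse(response).getroot()

    nodes = root.findall("node")  # individual points with lat/lon attributes
    ways = root.findall("way")    # ordered lists of node references (roads etc.)
    print(len(nodes), "nodes,", len(ways), "ways")

Whatever is built from the download must, of course, carry the attribution described above ('CC-BY-SA OpenStreetMap' plus the copyright owner).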
2.4 Network effects and the architecture of participation
[Architecture of participation] is a subtle concept, expressing something more than, and indeed building on, the ideas of collaboration and user production/generated content. The key to understanding it is to give equal weight to both words: this is about architecture as much as participation, and... the architecture of participation occurs when, through normal use of an application or service, the service itself gets better. To the user, this appears to be a side effect of using the service, but in fact, the system has been designed to take the user interactions and utilise them to improve itself. Anderson, 2007, p. 19.
22 http://wiki.osgeo.org/wiki/Project_Stats
The architecture of participation (AoP) utilizes the power of network effects, a general economic term describing the increase in value of a service to its existing users, where the service involves some form of interaction with others, as more and more people start to use it (Klemperer, 2006; Liebowitz and Margolis, 1994). It is most commonly used to describe the increase in usefulness of a telecoms system as more and more users join. Anderson (2007) elaborates on this definition of network effects and describes some of the implications for users of Web 2.0 services such as social networking sites. The key to harnessing network effects and AoP for map mash-ups is being able to operate successfully at the Internet scale. One of the ways Google did this was by developing caching for map images, something which has now become an OGC standard (see section 3.2.2). To date, the focus of the benefits of the AoP has been on the benefit to the user. However, a system that gets better the more it is used is also of significant value to the company providing the service. The key point for our discussion here is that there are two other aspects to network effects: the commercial value of the data collected through mash-up APIs, which the architecture of participation has made possible, and the value of holding complete datasets.
2.5 Data on an epic scale
In the world of geographic information, the value has long belonged to those companies that control the underlying data. Turner and Forrest, 2008, p.3

The importance of data and controlling datasets has always been at the forefront of the ideas around Web 2.0. As Tim O'Reilly said, when speaking to the Open Business forum (2006): 'The real lesson is that the power may not actually be in the data itself but rather in the control of access to that data.' This is no less true in the world of map mash-ups. Traditionally, the geospatial data marketplace has been dominated by players such as NAVTEQ and Tele Atlas and, in the UK, by the Ordnance Survey. New technology and the ideas of Web 2.0 are shaking up this market. Google, Yahoo!, Microsoft and others have broken into the market, and data mash-ups are just one of the tools that they are using to collect not only geospatial data but all sorts of other data as well. By crowdsourcing large amounts of geospatial data, mobile device manufacturers are also becoming data collectors and purveyors; Turner and Forrest (2008) note that Dash (a navigation device company) was set up, right from the very beginning, to handle geospatial data. This increasing interest in data is connected to its market value, with NAVTEQ being sold to Nokia in October 2007 for $8.1 billion (Erkheikk, 2007) and Tele Atlas being sold to TomTom in July 2007 for €2 billion (Hoef, 2007).23

This is about more than the simple collating of very large geospatial datasets alone. There are also issues concerning how those datasets are integrated and made interoperable with other products and services. To take one example, Google makes much use of its highly sophisticated infrastructure of both server capacity and existing software services to leverage increased value from its growing geospatial data collection (see section 3.2.3). There are of course important questions of accuracy, error and precision for crowdsourced data, not least the extent to which datasets that are built up from individual and thus partial records are representative. This is a difficult question to answer in a general form. The
23 More recently, Google has entered the 'turn-by-turn' market. It is expected that all Android devices from Google will use the company's own geospatial dataset rather than rely on third parties. See: http://abovethecrowd.com/2009/10/29/google-redefines-disruption-the-%E2%80%9Cless-than-free%E2%80%9D-business-model/
requirements will vary between uses of map data: backdrop mapping (a map image as a background for one's own mapping) can usually tolerate much more error than, for example, a car routing application. There are also a number of dimensions to spatial data accuracy and revision:

• Positional accuracy: e.g. are the features correctly located, within the scale constraints?
• Completeness and currency: e.g. are all the real-world features present in the data (moderated by the scale) and out-of-date features removed?
• Attribute accuracy: e.g. are the geometric features correctly and completely described and annotated; for example, do points have associated town names?
• Logical consistency: e.g. are the topological relationships between features correct; for example, is the routing of the roads correct, and do rivers and roads meet at the appropriate features (bridge, ford, tunnel, etc.)?
To an extent these questions lie beyond the scope of this report, but they are central to evaluating how good a map mash-up actually is with respect to the purpose for which it is generated. Positional accuracy, at least, is straightforward to quantify, as the sketch below shows.
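A minimal sketch (Python, standard library only; the coordinate pairs are hypothetical) that measures the discrepancy in metres between a crowdsourced position and a surveyed reference position, using the haversine formula as a spherical approximation:

    import math

    def haversine_m(lat1, lon1, lat2, lon2):
        # Great-circle distance in metres between two WGS84 points
        # (spherical approximation).
        r = 6371000.0  # mean Earth radius in metres
        p1, p2 = math.radians(lat1), math.radians(lat2)
        dp, dl = p2 - p1, math.radians(lon2 - lon1)
        a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
        return 2 * r * math.asin(math.sqrt(a))

    # Error of a hypothetical crowdsourced feature against a surveyed position (~27 m).
    print(round(haversine_m(51.52460, -0.13400, 51.52475, -0.13370), 1), "metres")

Completeness, attribute accuracy and logical consistency are harder to score automatically, which is one reason crowdsourced projects rely on review by more experienced contributors, as noted in section 2.3.3.2.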
3. Technologies and Standards
Turner and Forrest (2008) argue that the tools and ways of working with technology associated with the emerging GeoWeb can be considered to provide a 'GeoStack' that allows the creation, publication, sharing and consumption of map-based information. In this section we will outline some of the key elements of this stack and discuss their implications. Formally, we can say that a mash-up is 'an application development approach that allows users to aggregate multiple services, each serving its own purpose, to create a new service that serves a new purpose' (Lorenzo et al., 2009). In the context of the GeoWeb, such mash-ups make use of geospatial-related services and data feeds. The map image is the fundamental layer of information onto which other data are superimposed. In general, data and services are made available to others through two principal methods: by exposure via a Web API or as a data feed through RSS or Atom. These services and feeds are basic 'ingredients' to be mixed and matched with others to form new applications.
3.1 The role of Ajax and other advances in Web technology
A large part of the success of mash-ups can be attributed to the ongoing increase in the capabilities of Web browsers. Asynchronous processing means that data and page content can be accessed and displayed without reloading the entire webpage, resulting in greatly improved user interaction and response times. In addition, XML has become established as a standard format for transmitting data and messages on the Web. The combination of these technologies is usually referred to as Ajax (Asynchronous JavaScript and XML). The now familiar ability to zoom or pan the map by clicking and dragging (a 'slippy' map), using Ajax methods for map interactions, has changed users' expectations of mapping on the Web. While Ajax improves the interaction with websites for the end user, APIs offer a means of simplifying things for the mash-up developer. They provide a way to make software that interacts easily with other software through a well-defined interface. Often using Web scripting languages, such as JavaScript, individuals can create applications with nothing more than a simple text editor, with the API taking care of map data supply. Another important technical development has been the ability for users to request additional data, from a separate Web service, from within the API. A Web service, also known simply as a service, provides a defined functionality across a local network or the Internet for other applications to use.
3.2 Map mash-up basics
Web-based mapping solutions fall into two distinct categories: 2-D and 3-D (see section 4.6 for more on 3-D). In order to display 2-D map data on an HTML webpage, a mapping API is used to run code on the page. APIs for these 2-D, Web-based maps fit into two groups:

• lightweight JavaScript-based APIs (e.g. Google Maps, OpenLayers)
• those based around a more complex technology such as ActiveX, Silverlight, WPF or Flash (which are used in Bing Maps and Yahoo! Maps).
Both types of system work by serving a map that has been reduced to a set of tiled images with a fixed number of rows and columns that can be partitioned further. The size of the tiles is arbitrary, but a common implementation is to use a single 256x256 pixel tile, at the first
zoom level, which covers the whole world. At the next zoom level, there are four tiles, then sixteen, then sixty-four etc. (according to the structure of the relevant quadtree24). It is important to understand that access is usually provided to the tiled images of the map and not to the actual geo-referenced data per se.25 With proprietary maps from Google, Microsoft and Yahoo! the tiles are already rendered (turned from raw data into images) and stored at the portal. With OSM, users do have access to an open source version of these data, so they can render their own tiles. While the proprietary map providers have their own bespoke tile renderers, open data require open source tile rendering software to function. A common combination is OSM data with the OpenLayers API for the map, using Mapnik or OSMarender to render the data using a rules file or style that defines how the OSM data are drawn. So, for example, OSMarender takes in an OSM data file and a rule file describing how the map data are to be marked up, then outputs an image for display in a browser (in the form of SVG). 3.2.1 Vector versus pre-rendered tiled data As we have noted, GIS tends to make use of raw data in the form of vector files rather than pre-rendered, tiled images containing annotations. This is because in order to really do some of the more advanced spatial analysis functions many of the operations need to be on the data and on measures of the geometry of the maps. Thus if one is forced to use tiled images, much of the functionality of spatial analysis is not possible or is very slow. Allowing access to the raw data has one other particular advantage: it enables the user to change the base map projection. At present, all the major 2-D tiled map systems use the same map projection, namely 'Spherical Mercator' (EPSG:3785, 4326, and 900913), a projection that assumes the world to be a perfect sphere rather than an ellipsoid.26 While this works adequately for the most populated areas of the world, any data shown on the map above or below 85º north or south contains errors that are significant for many applications.27 Some users, involved for example in weather forecasting or climate science, require data to be truly global where a polar stereographic projection would make more sense. For these applications, a custom tile renderer with a different projection could be constructed using OSM data, or other open sources of world outline files. To date, little use has been made of custom tile renderers using a different projection, although this is being discussed for visualising environmental data. 3.2.2 Sending images or data to the client Whether pre-rendered tiled images or raw vector data are used there needs to be a process of transfer to the Web client. Before 2005, Web-based maps utilized two OGC standards: Web Map Service for maps sent to the client as images, or Web Feature Service for maps where actual vector data were sent to the client. Neither of these solutions could cache requests (this was implicit in the way the standards were written) and were therefore not scalable to large numbers of users (as each request from every user meant drawing a new area of the map, so creating a load on the server that was correspondingly higher).
24 A form of data structure used in computer science in which each internal node of a tree has exactly four children. 25 See: http://wiki.osgeo.org/wiki/Tile_Map_Service_Specification for an example of how this works. 26 The Earth is not actually a perfect sphere. Assumptions are made for various navigational and historical reasons. See for discussion: Introduction to Spatial Coordinate Systems: Flat Maps for a Round Planet: http://msdn.microsoft.com/en-us/library/cc749633.aspx 27 See: http://docs.openlayers.org/library/spherical_mercator.html
In contrast, Google knew early on that it required very high scalability and so used an appropriate architecture. Google Maps allows tiled images to be cached both by the system, for rendering and storing, and by the Web server and client browser. This ability to scale has allowed Internet-scale adoption of Google Maps by large numbers of users. In response to these developments, the OGC now has a standard called Web Map Tile Service, which adopts the fixed-location, tiled images implemented by Google.

3.2.3 Infrastructure
It is also important to note that while OpenLayers and the Google Maps API are intrinsically similar, OpenLayers is simply an open source JavaScript library. The Google Maps API is a library, but it is also backed by other Google infrastructure. One example will illustrate the extra capabilities afforded by Google's use of its own in-house technology. This concerns a common Web security restriction, the same-origin policy, which is designed to prevent cross-site scripting. In essence this means that the browser will only load data from the same site as the page that it is displaying. When drawing a Web-based map it is common to overlay annotations on the map (e.g. placemarks, images, polygon symbols, textual descriptions, etc.). Usually these data are stored as a KML file (see section 3.4.1.2). However, if the KML data to be overlaid come from a different Web server then the process will fall foul of this restriction. Google gets around it by using a KML proxy which is part of their 'free to use' Web infrastructure. This proxy technology does not exist in the case of the OpenLayers API as there is no corresponding infrastructure. These technical details are important with respect to the nature of the mash-up that ultimately emerges. They not only affect speed of access and the size of tile that can be displayed, they determine to an extent what might be possible in any application.
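The tile-addressing arithmetic underlying all of these systems is compact enough to sketch in full. The fragment below (Python, standard library only) shows the quadtree growth in tile counts and converts a WGS84 coordinate to tile indices using the standard Spherical Mercator 'slippy map' formula; the OSM tile URL at the end illustrates how fixed tile locations map onto cacheable URLs:

    import math

    def deg2tile(lat_deg, lon_deg, zoom):
        # Convert a WGS84 lat/lon to Spherical Mercator tile indices (x, y).
        lat = math.radians(lat_deg)
        n = 2 ** zoom  # the world is an n x n grid of 256x256 pixel tiles at this zoom
        x = int((lon_deg + 180.0) / 360.0 * n)
        y = int((1.0 - math.log(math.tan(lat) + 1 / math.cos(lat)) / math.pi) / 2.0 * n)
        return x, y

    for z in range(4):
        print("zoom", z, "->", (2 ** z) ** 2, "tiles")  # 1, 4, 16, 64 ...

    x, y = deg2tile(51.5246, -0.1340, 15)  # central London at zoom 15
    print("https://tile.openstreetmap.org/15/%d/%d.png" % (x, y))

Because every tile has a fixed, predictable address, any layer of the stack (server, proxy or browser) can cache it, which is precisely the property the WMTS standard codifies.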
3.3 Specific technologies for map mash-ups
An API, or more specifically a JavaScript library, of particular relevance to map-based mash-ups is the Mapstraction28 library. Each mash-up vendor provides a different API system, and if developers only learn to use one type of API this can create a form of 'lock-in'. Mapstraction gets around this by providing a common platform where developers can switch the mapping provider with a simple code change, without having to rework the entire application (a toy sketch of the idea follows Figure 5). This is a valuable service as it allows map-based mash-ups to be shared without necessarily promoting any one company's services. It also allows some independence from the continued availability of any one particular mapping API. Also of relevance to map mash-up developments are the services offered by Cloudmade,29 a recent start-up created to leverage the OSM data resources. In particular, Cloudmade offers a map API similar to Google's, based on an improved OSM database. This API includes options for user-defined styling, allowing the map cartographic styles to be adjusted to suit the application (see Figure 5).30
28 http://www.mapstraction.com/ 29 http://www.cloudmade.com 30 See also Google's API version 3 which has introduced user-defined styling.
Figure 5: Cloudmade map style editor tool http://maps.cloudmade.com/editor
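To make the provider-independence idea concrete, here is a toy sketch in the spirit of Mapstraction, though written in Python rather than JavaScript and much simplified; the second URL template is hypothetical. Application code depends on one small interface, and the provider is swapped with a one-line change:

    # Tile URL templates keyed by provider; only the OSM entry is a real service.
    PROVIDERS = {
        "osm":  "https://tile.openstreetmap.org/{z}/{x}/{y}.png",
        "acme": "https://tiles.example.com/{z}/{x}/{y}.png",  # hypothetical vendor
    }

    class TileMap:
        def __init__(self, provider="osm"):
            self.template = PROVIDERS[provider]

        def tile_url(self, z, x, y):
            return self.template.format(z=z, x=x, y=y)

    m = TileMap("osm")  # switching vendor is TileMap("acme"); nothing else changes
    print(m.tile_url(15, 16371, 10893))  # a tile covering central London (cf. section 3.2)

The real Mapstraction library applies the same idea to whole map APIs (markers, events and controls), not just tile URLs.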
3.4 Standards and Infrastructure
Web-based mapping and map mash-ups rely on some form of standardization, although this is in a state of flux, as might be expected in an area that is dominated by non-expert users demanding easier functionality. For many simple map mash-ups there is a reliance on de facto standards, implied by using a vendor's API with its particular data format requirements. Before the Web, organizations that needed to share data used either de facto standards or software such as the Feature Manipulation Engine (to push data from one format to another). With the arrival of the Web came the Open Geospatial Consortium (OGC), which grew out of the need to share data, and later services, more effectively. The OGC is an international industry consortium of nearly 400 companies, Government agencies and universities. It coordinates its work closely with the ISO TC211 group and facilitates a consensus process to develop publicly available interface standards known as OpenGIS. OpenGIS standards support interoperable solutions that 'geo-enable' the Web, wireless and location-based services and mainstream IT, with standards that cover spatial data formats, protocols and structures for storing and accessing data, as well as various methods for querying, assembling and aggregating data. The standards allow technology developers to make spatial information and services accessible to other applications.

3.4.1 Key standards
3.4.1.1 ESRI
Currently, the main de facto standard with respect to spatial data is probably the shapefile, a proprietary binary format for vector data developed by ESRI for use in the popular ArcView and ArcGIS software packages. An attempt was made to bring the shapefile specification into the OGC standardization process, but while the data specification has been published, ESRI retains control of future development and changes to the specification. However, a freely available specification exists for all to use and it is supported by almost all proprietary GIS software. The structure of shapefiles is relatively simple and is based on points, lines and polygons with a linked .dbf database for storing attribute information. Shapefiles are mainly used for spatial data storage as they are capable of handling large amounts of geographic data in any projection and can also be used for spatial data exchange. However, limitations with shapefiles (including size limits, lack of topological information and general inflexibility) led ESRI to introduce a replacement format—the ESRI
Geodatabase.31 It is a powerful format, allowing unlimited file sizes, multi-user editing and sophisticated spatial relationship structures such as topological networks. It is, however, a closed format that can only be accessed using ESRI's software development kit (SDK), and it has not yet 'taken off' in the manner that the shapefile format has.

3.4.1.2 Open Geospatial Consortium
The OGC does not support the shapefile format and instead recommends the Geography Markup Language (GML)32 and the Keyhole Markup Language (KML)33 as basic standards (OGC, 2008). KML provides a lightweight means to encode spatial data in an open format, and hence has had wide uptake. GML is an XML schema for expressing geospatial features and is an alternative to KML and the shapefile. It offers a more complete system for data modelling and as such is more usually used as a basis for scientific applications and for international interoperability, e.g. as the foundation for INSPIRE (a European geo-data harmonization initiative) and GEOSS (the Global Earth Observation System of Systems). A lightweight form of GML, the Simple Features Profile, is also available.34 Other important OGC standards include:

The Web Feature Service interface standard35 (WFS) describes a simple HTTP interface for requesting geographical features. While the client can specify a geographic area of interest in the same way as for WMS, additional filters also allow fine control over the features returned. For example, a 'QUERY' request might specify all roads within a given area. Unlike WMS, the WFS standard returns geographic features in the form of GML or shapefiles. In addition to 'QUERY' operations, WFS also supports 'INSERT', 'UPDATE', 'DELETE', 'LOCK' and 'DISCOVERY'.

The Web Map Service interface standard36 (WMS) describes a simple HTTP interface for requesting maps using layers. The maps are drawn by the server and returned to the client as images (e.g. .jpeg or .png). The client specifies the bounding box of the map together with the layers required and receives the map back as a single image, unlike the WMTS service, which returns a set of images as tiles (a sketch of a WMS request appears at the end of this section).

A Styled Layer Descriptor37 (SLD) extends the WMS standard and provides for encoding user-defined symbolization and colouring of geographic feature and coverage data. It allows software to control how geospatial data are visualized. Tile server software like Mapnik or GeoServer uses an SLD document to define how the map tiles are drawn from the geographic feature data. In the case of thematic or choropleth maps, the SLD is extended by the Symbology Encoding specification in order to render data that are not provided for in the base SLD specification. The Symbology Encoding document defines how
31 Note that the term geodatabase is typically used much more generally to include any possible spatial database format. Geodatabases are generally considered more complicated to use, and this could be part of the reason for the enduring popularity of the shapefile. 32 http://www.opengeospatial.org/standards/gml 33 The Keyhole Markup Language is named after the company that originally developed it as part of its work on what became, after a take-over, Google Earth. In particular, the styling components of KML were developed with a view to what Google Earth needed to be capable of, and were therefore not designed with interoperability in mind. However, Google recently turned KML over to the OGC to bring it into the standardization process. 34 http://xml.coverpages.org/ni2005-07-07-a.html 35 http://www.opengeospatial.org/standards/wfs 36 http://www.opengeospatial.org/standards/wms 37 http://www.opengeospatial.org/standards/sld
feature and coverage data are portrayed visually on the map. This is an XML encoding using symbolizers and filters to define how attribute data are displayed.

The Web Map Tile Service standard38 (WMTS) aims to improve performance and increase the scalability of Web map services through caching. It is modelled on the large-scale tiled map systems used by Google, Microsoft and Yahoo!, where requests are made for discrete map tiles which can be cached both in the server and the client browser.

The Web Coverage Service interface standard39 (WCS) defines a standard interface for access to coverage data, e.g. satellite images, aerial photos, digital elevation and terrain data, LIDAR or any other raster-based source (as opposed to WFS, which defines a standard interface for access to vector data in the form of points, lines and polygons). Data falling within a bounding box can be queried and the raw coverage data returned to the client.

The Grid Coverage Service40 (GCS) refers to data that are raster in nature rather than vector. Examples include satellite images, whether visible light or any other sensor, digital aerial photos, LIDAR, and elevation and terrain data. The GCS document defines standards for requesting, viewing and analysing raster data.

GeoRSS41 feeds are designed to be consumed by geographic software such as map generators. While RSS is used to encode feeds of non-spatial Web content (such as news articles), content consisting of geographical elements, defined, for example, by vertices with latitude and longitude co-ordinates, is usually encoded using GeoRSS.

3.4.1.3 Semantic Data Standards
A number of organizations are currently developing Semantic Web technologies for representing their geographic data (e.g. the UK Ordnance Survey and US Census), and this involves creating geospatial data in the Resource Description Framework (RDF) format. In general, the power of this technique is that all the data are machine-readable. SPARQL is a key standard in this respect: a query language for semantic data. In some ways this is analogous to how SQL is used to query a database, but applied to the semantic Web instead. In the UK this is backed by the Talis platform, which offers 50 million 'triples' or 10GB of storage for open data as long as it is publicly accessible and under an open data licence. In addition, the Ordnance Survey has developed an administrative geography ontology that contains knowledge about areas. The LinkedGeoData project is working to extract RDF triples from the OSM dataset and make them available to semantic Web researchers.42 There are also plans for EDINA's Unlock service to provide a triple store version.

3.4.1.4 Database Standards
Database storage plays an important role in Web-based mapping systems, and Simple Features for SQL43 defines the storage and retrieval of geographic feature data in SQL databases. At present, the following spatial databases support this standard: SQLite, Microsoft SQL Server 2008, MySQL, PostGIS, Oracle Spatial, ESRI ArcSDE, Informix and IBM DB2.
38 http://www.opengeospatial.org/standards/wmts 39 http://www.opengeospatial.org/standards/wcs 40 http://www.opengeospatial.org/standards/gc 41 http://www.georss.org/Main_Page 42 http://linkedgeodata.org/About 43 http://www.opengeospatial.org/standards/sfs
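As an illustration of how simple these HTTP interfaces are, the sketch below (Python, standard library only) assembles a WMS 1.1.1 GetMap request by hand; the server address and layer name are hypothetical, but the query parameters are the ones the specification defines:

    import urllib.parse

    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": "topographic",       # hypothetical layer name
        "STYLES": "",                  # default styling
        "SRS": "EPSG:4326",            # WGS84 lat/lon
        "BBOX": "-0.5,51.3,0.3,51.7",  # minx,miny,maxx,maxy around Greater London
        "WIDTH": "512",
        "HEIGHT": "256",
        "FORMAT": "image/png",
    }
    print("http://wms.example.org/wms?" + urllib.parse.urlencode(params))

Fetching the resulting URL returns a single, freshly rendered image per request, which is exactly the behaviour that limits the scalability of WMS compared with the tiled WMTS approach described in section 3.2.2.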
3.4.2 Technical Infrastructure
A basic requirement of most mash-ups is the availability of a Web server to host the site and any extra data. However, this is not always necessary, and various hosted services and cloud computing solutions are emerging (Hobona et al., forthcoming). For example, Yahoo! Pipes takes existing content from Web feeds and pages, which users can integrate using a visual programming environment. The mash-up is saved online as part of an account, avoiding the need for a user to manage a separate server or Web host. Similarly, Geocommons provides a free and accessible platform for users to create maps from either their own data or a large repository of map layers. The application provides functionality for cartographic styling and a variety of base maps from the main map mash-up vendors. An extension to the application, called GeoIQ, enables analysis of datasets to be carried out. Such tools will become more prevalent as map mash-ups continue to develop. Cloud and grid computing architectures are used for reasons of efficiency and scalability, but when implemented as a shared resource they will also facilitate increasingly complicated analyses of large datasets in real time, which may then be presented as a mash-up. Rather than managing a centralized server, institutions may use a third-party, cloud-based architecture that supplies processing, data storage and applications in a secure environment. Furthermore, the prevalence of software-as-a-service applications will reduce the need for powerful desktop or mobile processors. Mash-up APIs using this type of system will allow users to create applications based on complex algorithms, modelling techniques or simulations in order to help their understanding.

An alternative to the cloud-based solution is to develop an institutional mash-up site. An example of this is CASA's MapTube, which curates user-contributed choropleth maps for display on a Google Maps base. This is associated with a software application, GMapCreator, which takes GIS data and creates the necessary map image tiles and XML configuration file to be served through MapTube. However, MapTube still requires the user to serve the data from their own Web server (not least to free CASA from obligations relating to users presenting unlicensed data). The user's server needs only to be configured as a standard Web server, so the MapTube data could be served from a user account on an institutional Web service. MapTube is limited to presenting a certain type of map on a particular underlying map API. More complicated and/or large datasets need an appropriate database and associated server software to operate efficiently, as is the case when displaying more than a simple point dataset on a map. This presents a potential barrier to the adoption of more sophisticated mash-ups. Associated with this are issues of security and authentication—firstly to ensure that the mash-up's Web server is secured through its software stack (from server down to the database), and secondly to present data only to authorized users.
4. The future of data mash-ups and mapping
In terms of the future of data mash-ups in general, the importance of mapping and geospatial data mash-ups is likely to be just the beginning. They have popularized the idea of data mash-ups largely because the potential for rich visual experiences is a powerful driver for the uptake of the associated services, which, in turn, encourages users to contribute huge amounts of data through each service's API. We are now starting to see similar sorts of hacks occurring in many other areas of social and intellectual life, where ideas in one field are being transposed and applied to another. There will be new software developments – temporal maps, for example – and animations to accompany these will become routine. Browser technology is advancing rapidly, with indications that the Read/Write Web is moving towards a Read/Write/Execute status, sometimes called Web 3.0. This will allow software to be run directly within the browser, providing a platform for spatial analysis, advanced map mash-ups and sophisticated data mining. Toolkits that are currently available are likely to develop rapidly and become available beyond the current specialists and interest groups. There will also be an increase in the development of 3-D mash-ups and spatial visualizations, allowing the wider communication of complex datasets. In the meantime, questions need to be asked about the epic scale of data being produced, its accuracy, and the potential problems for privacy when data are combined in ways that were not necessarily anticipated when the datasets were developed. For HE there are also particular questions about copyright, about research data being given up to organizations that operate proprietary services, and about the potential for privacy and confidentiality breaches. Over the next ten years or so there are several technologies and applications that are likely to become increasingly important to HE:
4.1 Semantic mash-ups
Despite the emergence of Web 2.0 tools such as Yahoo! Pipes and standard geo-data formats like KML, the task of identifying and integrating datasets of interest must still be done manually. Automatic data mapping is only just beginning to be explored and requires the ideas of the semantic Web to be incorporated into mash-up development. The 'semantic mash-up' is the idea that computers help humans discover and integrate data. It forms part of a research area known as the semantic geospatial Web, which requires the availability of semantically enriched, machine-readable geospatial and location data (Egenhofer, 2002), and this is likely to be a significant research endeavour in geo-data mash-ups in the coming decade. Lefort (2009) states that there are two main classes of service being explored: legacy semantic mash-ups and opportunistic semantic mash-ups. The former transform existing geospatial mash-up data in XML into RDF and so into a semantic application. The latter scrape HTML, or in some cases RDFa44 data, from websites as required. An early example of the latter is provided by the DBpedia project, which seeks to create a semantic Web knowledge base by extracting marked-up data from Wikipedia. Although much of Wikipedia is free text, there are various forms of structured information which are marked up using wiki code, for example quite a lot of the data that are held in what are called
44 RDFa is a technique to sprinkle RDF data within existing webpage content (see: http://www.w3.org/TR/xhtml-rdfa-primer/).
infoboxes.45 This is extracted by DBpedia and turned into a knowledge base which contains data on, for example, people (including name, birthplace, birth date etc.) and buildings (latitude, longitude, architect, style etc.). The extraction process makes use of Wikipedia's live article update feed46 and therefore is updated automatically on a regular basis. Data are organized within DBpedia using the techniques of the semantic Web in the form of millions of RDF triples.
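As a flavour of how such a knowledge base is queried, the sketch below (Python, standard library only) sends a SPARQL query to DBpedia's public endpoint asking for the coordinates extracted from the Wikipedia article on London; the endpoint and the W3C WGS84 vocabulary are publicly documented, though the exact results will change as the dataset evolves:

    import urllib.parse, urllib.request

    query = """
    PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
    SELECT ?lat ?long WHERE {
      <http://dbpedia.org/resource/London> geo:lat ?lat ;
                                           geo:long ?long .
    }
    """
    url = "https://dbpedia.org/sparql?" + urllib.parse.urlencode(
        {"query": query, "format": "application/sparql-results+json"})
    print(urllib.request.urlopen(url).read().decode("utf-8"))

The same pattern, a declarative query against a public triple store, underlies the services that the Talis platform and data.gov.uk (section 4.8.2) expose for open datasets.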
4.2 Mobile mash-ups
Simon (2007) argues for a future vision wherein mobile phones will serve as generic hardware and software platforms for a variety of geospatial information services. The necessary advanced navigation features are already being integrated into state-of-the-art mobile devices, and we can expect this to be even more widespread in the near future. The more advanced mobile phones now include location technology as standard. Existing specialist GPS device manufacturers are also starting to move into the smartphone market (e.g. Garmin's Nuvifone). All these devices include at least GPS technology, and some may also include software for determining position based on triangulation techniques using phone mast positioning and Wi-Fi base station location.47 Developments in this area are likely to focus on providing a sophisticated hybrid of these different location techniques. Turner and Forrest (2008) cite, for example, the XPS chipset developed by SiRF, a vendor of GPS chipsets, in collaboration with SkyHook. In addition, new types of positioning sensor will be introduced into mobile phones, and Simon (2007) discusses the role for more accurate 3-D location based on compass, tilt and altitude sensor technology. Turner and Forrest (2008) argue that wider adoption of accurate location detection in mobile phones will spur a major new market in location-based services and applications such as Google's Latitude. Typically, the user explicitly controls which services the phone makes use of (explicit interaction) although, based on the user's context, profile and the preferences published by the device, the environment can also trigger services automatically (implicit interaction). In all these situations context, and particularly location, acts as an important filter for selecting the most suitable services (Ipiña et al., 2007). By integrating multiple data sources into one experience, new services can be created that are tailored to the user's personal needs and, by using local sensor data on a mobile device, this experience can be adapted to the user's current situation (Brodt et al., 2008). However, an issue that is still being explored is that although all these applications offer specific functions to mobile phone users, and need to be downloaded and installed on the mobile phone 'one by one', many of them actually implement the same process of accessing the device's location (from the GPS on the device), transferring it to a server
45 An infobox is a fixed-format table designed to be added to the top right-hand corner of articles and which presents a summary of some of the key points of the article. 46 Wikipedia offers a Wikipedia OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) feed. This is a protocol used for harvesting metadata descriptions of records in an archive. It is widely used by libraries, institutional repositories and archives. Data providers provide their metadata in XML in the Dublin Core format, although other formats can be used. See: http://meta.wikimedia.org/wiki/Wikimedia_update_feed_service 47 Many people want to see GPS on their phone when actually what they really need is location information. GPS provides a simple latitude/longitude reading. This can be determined by a number of techniques other than GPS, which, apart from anything else, is not particularly useful indoors as it needs line of sight to a satellite.
application, and receiving and displaying the location-based information.48 The wide range of mobile platforms (e.g. different Symbian editions, BlackBerry, iPhone, Windows Mobile, Google Android, etc.) and software development environments, as well as the fast life-cycle of mobile phones and their operating systems, makes such a distribution approach expensive. An alternative approach being explored is to add 'location capabilities' to mobile Web browsers instead (Karpischek, 2009). Adding location capabilities to mobile Web browsers enables location-based services (LBS) for a wide range of heterogeneous mobile devices while requiring significantly fewer resources. The integration of browser and geospatial data is also being explored as part of work to develop the forthcoming HTML5 standard.49
4.3 Geo-location on the Social Web Increasingly, location-based information is being integrated with Web 2.0 applications such as social networking. This development is closely related to the mobile phone developments discussed above. There are a number of existing services and these are likely to expand in forthcoming years. Examples include Loopt, Foursquare, Gowalla, Plazes and BuddyBeacon. FireEagle is an important development in this respect, providing a centralized brokerage service to allow users to control how their location data are shared and has been used in the EDINA personalization geolocator prototype.50 As data mash-up systems begin to incorporate a user's location into their visualization capabilities such services will become increasingly significant.
4.4 Augmented Reality
Augmented Reality (AR) applications, where information is overlaid onto our view of the physical world, are likely to become widespread, and AR applications for mobile phones are gaining recognition as a special kind of location-based mash-up. One method uses two-dimensional barcodes (QR codes) which, when viewed with the camera of an appropriately enabled device, open or stream some form of media. The second method is to superimpose data onto images taken by the device's camera and display this to the user in real time. While the technology is not particularly new (the iPhone, Android and other mobile environments that incorporate a digital camera and sufficient processing power offer a suitable platform for AR application development), the AR sector is forecast to grow dramatically. A recent report from ABI Research concluded that revenue from AR will rise from $6 million in 2008 to $350 million in 2014 (ABI Research, 2009). Although these applications are currently enjoying popularity, certain aspects of the technology, such as mobile device localization and usability, still need to improve: GPS in a mobile device does not provide high enough accuracy, inertial sensors are subject to inaccuracy and loss of calibration, and more user evaluation studies are required for AR to develop further (Schmalstieg et al., 2009). Significant developments in locational technology, such as the inclusion of a built-in digital compass, GPS and accelerometers in mobile phones, have allowed not only location but also heading and pitch to be detected and therefore incorporated into data display systems. These built-in technologies have brought AR to the wider public, and the phones themselves have sparked a market-driven boom in fusing AR with location-based services.
48 However, this transfer of data may compromise privacy. Some users may wish to use GPS without passing on their position to a third party server, or at least to be informed that such a transaction is taking place. 49 See the W3C's Geolocation API at: http://dev.w3.org/geo/api/spec-source.html 50 https://www.wiki.ed.ac.uk/display/EdinaPersonalisation/API#API-UsingBroker
In the UK, Acrossair launched an AR application in late 2009 – Nearest Tube – that leads users to the nearest tube (subway) stations in London and New York, illustrated in Figure 6.
Figure 6: Acrossair's nearest tube application for iPhone.51
Operating on the iPhone 3GS, Nearest Tube is typical of current applications, which make use of the user's location and provide a visualization of the current surroundings as a background to the interface. Linked to Google Maps, the user can spin around and select a restaurant shown on their mobile handset screen, and the location of the restaurant will then be provided on Google Maps (Figure 7).
Figure 7: Architecture of Acrossair's augmented reality application
Central to all AR applications is geospatial data. Being in the actual location supersedes the inclusion of a map, relegating the map to merely an optional extra. As an example of this, Layar52 displays information from a range of content layers residing on its server. The content data are overlaid onto the display from the phone's camera to show places of interest (e.g. restaurants) within the local vicinity. Content data are added via an API to which anyone can contribute. Figure 8 provides an insight into the service architecture.
51 http://www.acrossair.com/acrossair_app_augmented_reality_nearesttube_london_for_iPhone_3GS.htm 52 http://www.layar.com/
Figure 8: Layar Service Architecture
Currently, applications are in their infancy and mainly focus on specific topics such as 'show me where the closest x is'. This, however, represents the tip of the iceberg, and with the addition of a GIS into the mix there is notable potential for the industry (Sung and Hudson-Smith, 2010).
4.5 Sensors Location is not the only information that mobile devices are being engineered to gather from the local environment. Existing equipment within the phone is also being used in tandem with geo-data mash-up services to provide new applications. An example of this is NoiseTube,53 a research project that shows how participative sensing of noise pollution by the general public can be conducted using a low-cost mobile platform (Maisonneuve et al., 2009). By downloading an application to a GPS-enabled mobile phone, users are able to record the noise level of their surrounding environment and submit an associated tag with the measurement. Each measurement is then stored and collated so that users contribute to a collective noise map (see Figure 9). As mobile technology develops, other environmental sensing devices are likely to be incorporated into the device. Data recorded from sensors may range, for example, from simple physical measurements such as temperature or altitude, to object recognition in video camera footage obtained from an unmanned aerial vehicle. The combination of the data gathered from these sensors together with the location of the user offers a whole new generation of what is being referred to as 'reality mining' (Eagle and Pentland, 2006). The ability to mash data from these different sources is a major future direction for the technology. Passive crowdsourcing of data via mobile phones will also generate sources of data in real time and some mobile applications currently demonstrate this potential. Waze, for example, provides traffic information based on crowdsourced accelerometer and GPS readings, and Citysense analyses phone data to visualize and predict popular locations. Again, issues of harmonization between datasets and data quality are of importance here. The OGC recently published a report outlining the technological issues that need to be addressed for fusing data from different sensors and databases (OGC, 2010).
53 http://www.noisetube.net/
Figure 9: An example display from the NoiseTube application
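Readings of this kind are easy to publish in a form that existing map clients can consume, for example as GeoRSS (section 3.4.1.2). A minimal sketch (Python, standard library only; the measurement values and tag are hypothetical, and the georss namespace would be declared on the enclosing feed element):

    import time

    # A hypothetical crowdsourced noise reading with its position.
    reading = {"db": 72.4, "lat": 51.5246, "lon": -0.1340, "tag": "traffic"}

    item = (
        "<item>\n"
        "  <title>Noise reading: %(db).1f dB(A)</title>\n"
        "  <description>Crowdsourced measurement tagged '%(tag)s'</description>\n"
        "  <pubDate>%(date)s</pubDate>\n"
        "  <georss:point>%(lat).4f %(lon).4f</georss:point>\n"  # GeoRSS-Simple: lat lon
        "</item>"
    ) % dict(reading, date=time.strftime("%a, %d %b %Y %H:%M:%S GMT", time.gmtime()))

    print(item)

Any GeoRSS-aware client can then plot such a feed directly, turning a stream of individual sensor readings into a live map layer.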
A much more radical form of crowdsourcing is to take the geo-locations from real-time responses in SMS texts, or from status updates on micro-blogging services such as Twitter or identi.ca (if the user is willing to activate the GPS sensing technology in their device or provide details of their location in another way). CASA is currently experimenting with monitoring such data in different places and is developing a toolkit to replicate the ability to crowdsource data for any user in any geographic area worldwide. Data can be pulled in directly from social network websites for specific phrases, locations or trends and mapped, detailing the spatial relationships of these networks. Figure 10 shows one of a series of New City Landscapes produced by CASA and based on mined, location-based Twitter data. The contours visualize the density of tweets sent, and an interactive version is available at: http://londonist.com/2010/06/londons_twitter_traffic_mapped_as_c.php
Figure 10: Mining Twitter locations for New City Landscapes
4.6 3-D and immersive worlds
Thanks to developments in Web technology, map and geospatial data visualization is no longer limited to static or 2-D formats, but can take advantage of immersive and highly interactive virtual environments to explore and present dynamic geospatial data (MacEachren, 2001). Rising computing power (specifically graphics card technology), crowdsourcing techniques and changes in data licensing models are rapidly moving map data into 3-D environments. These were originally built using computer-aided design (CAD) systems but have now been extended into a range of multimedia, particularly virtual worlds and gaming environments, which are being opened up for the addition of external content.54 An important, emerging area of geospatial-related data visualization is that of immersive worlds. These offer, through the browser window, high-resolution, street-level views of locations based on photographs. Users can 'walk' around the location by clicking on various keys. Primary examples include Google Street View, Everyscape, Bing Maps, EarthMine, MapJack, and the open source alternative Planet Earth. Such services form part of a continuum with virtual worlds such as Second Life (see Figure 11 below).
Figure 11: Importing mash-ups created using GMapCreator into the Second Life virtual world
There is also a developing branch of virtual reality known as 'mirror worlds', in which the physical world is replicated in a lifelike virtual model using advanced 3-D computer graphics, which a user explores through their browser. These mirror worlds tend to use graphics to replicate buildings, streets, etc. rather than actual photographs. In a recent report for JISC, de Freitas (2008) noted that future developments in these worlds include the integration of geospatial data and other mash-ups, most likely through forms of service-oriented architecture. Turner and Forrest (2008) note that there is a move to integrate these mirror worlds with social networking technology. They cite the example of SceneCaster, which allows virtual scenes to be embedded within Facebook. This is a large and rapidly developing area, and readers are directed to the JISC report and the Eduserv-funded Virtual Worlds
54 Google Earth has an iPhone app allowing 3-D information to be viewed and overlaid with data while on the move, and Bing Maps is being integrated with ArcGIS 9.3, allowing two- and three-dimensional data to be ported into ESRI's flagship proprietary GIS.
Watch55 for further information. Some of the educational implications of these new technologies were explored by the JISC-funded Open Habitat project.56

A major trend in this area is that, increasingly, 3-D and immersive world services are reaching out to their own users by incorporating crowdsourced data. Google Earth is typical of this trend: its data are currently a mix of 3-D cities created by the company itself via automated photogrammetry techniques and crowdsourced models created by users through its free SketchUp and Google Building Maker modelling applications.57 Google SketchUp was released in 2006 to complement the professional version of SketchUp, a well-known 3-D modelling program. Users are encouraged to use the software to model their local neighbourhoods as part of a crowdsourcing exercise to create 3-D content where automated processes would be cost-prohibitive. The process is similar in many ways to Google Map Maker, which operates under similar terms and conditions. Model submissions are reviewed internally by Google as and when the user selects the option in SketchUp indicating that a model is 'Google Earth Ready'. The model is checked by Google employees to determine whether the building is 'real, current, and correctly-located'. If the model passes the review process, it is added to the '3-D Warehouse Layer', making it publicly viewable in Google Earth when the box in the sidebar labelled '3-D Buildings' is ticked. So far, users have been encouraged to model sections of the earth via a series of 'model your town' competitions in which Google exhorts the user to 'Show your civic pride (and maybe win a prize) by creating a 3-D portrait of your community and sharing it with the world. You have the power to get your town on the map – and there's no bigger map than Google Earth' (SketchUp website, 2010). Such an approach is typical of crowdsourcing, although the Google terms and conditions are more stringent than, say, OSM's, and have a much more focused and controlled aim in mind. Whether we can include these as map mash-ups takes us to the very edge of our interest here, but at the very least it is representative of the new ways in which non-expert users can create their own geographical content for their own use.

In fact, any user can import map data into the 3-D environment of Google Earth if they are able to represent their data as a KML file (a minimal example follows Figure 12). There are now plenty of free plug-ins to do this, and many GIS systems are able to import and export KML files. The Free Geography Tools website58 contains a variety of such converters, not only for Google Earth but also for OSM and other mapping systems. CASA has produced GEarthCreator, which enables users to convert files into KML and display them in Google Earth, as demonstrated in Figure 12 (a). The software can also use products such as Google Earth directly in order to exploit the power of the 3-D software to augment other software that lacks such display capability. An example is shown in Figure 12 (b) for a land use transportation model of Greater London, in which 2-D data are plotted continually as users explore the model data, outputs and predictions, but also wish to see the data in 3-D. A link to Google Earth enables the user to add additional data from third-party suppliers and compare this with the data exported from the user's own analysis.
55 http://virtualworldwatch.net 56 http://magazine.openhabitat.org 57 Google Building Maker was introduced in late 2009 and allows the user to model directly on top of oblique aerial imagery using a range of simple shapes. The technique is reminiscent of the CANOMA software tool by Adobe, released in 1999, and now operating over the Web using pre-defined imagery. 58 http://freegeographytools.com/
Figure 12: 3-D mash-ups using Google Earth showing (a) conventional import of a KML file of GDP (b) the exporting of 2-D thematic maps from a land use transport model into Google Earth
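Representing one's own data as KML, as described above, requires nothing more than writing a small XML file. A minimal sketch (Python, standard library only; the placemark names and coordinates are illustrative) that produces a file Google Earth and most GIS packages can open:

    placemarks = [("University College London", -0.1340, 51.5246),
                  ("University of Nottingham", -1.1987, 52.9384)]

    kml = ['<?xml version="1.0" encoding="UTF-8"?>',
           '<kml xmlns="http://www.opengis.net/kml/2.2">', '<Document>']
    for name, lon, lat in placemarks:
        # KML coordinates are written lon,lat,altitude.
        kml.append('  <Placemark><name>%s</name>'
                   '<Point><coordinates>%f,%f,0</coordinates></Point></Placemark>'
                   % (name, lon, lat))
    kml += ['</Document>', '</kml>']

    with open("campuses.kml", "w") as f:
        f.write("\n".join(kml))

The same file, zipped together with any supporting imagery, becomes a KMZ archive; tools such as GEarthCreator automate this kind of conversion at scale.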
4.7 HTML5
In 2007, the W3C formed a working group chartered to work with the Web Hypertext Application Technology Working Group (WHATWG) on the development of the HTML5 specification, the latest version of the core markup language of the Web.59 Of particular note in the specification is the Canvas element,60 which allows for dynamic, scriptable rendering of bitmap images. A canvas consists of a drawable region within which JavaScript code may be used to provide dynamically generated graphics. This new technique means that a new generation of more flexible geospatial data services is being created. The first noted example of geography-specific information served via HTML5 is Cartagen,61 an open source vector mapping framework developed at the MIT Media Lab's Design Ecology group. Introducing the system, Boulos et al. (2010) note that as map data become richer and we strive to present multi-layered data in a variety of projections and map zoom levels, traditional Web mapping techniques start to become too inflexible. Instead of sending pre-rendered tiled images for every zoom level, Cartagen draws maps dynamically on the client side, using the canvas element of HTML5. Moving such elements to the client side, known as local rendering, is a notable step forward in vector-based mapping. However, the value of raster, tile-based data should not be underestimated, both in terms of lightening the load on the user's machine and distributing copyright-sensitive data.

4.7.1 Other standards developments
Work on new OGC standards is ongoing, with community-specific application schemas being introduced to extend GML for use in particular domains. As one example, CityGML is an
59 A key aim of HTML5 is to reduce the need for rich Internet application technologies such as Adobe Flash, Microsoft Silverlight and Oracle-Sun JavaFX (Boulos et al., 2010). 60 See: http://en.wikipedia.org/wiki/Canvas_element. Initially introduced by Apple in Safari 1.3 for use inside their own Mac OS X WebKit component to power applications like Dashboard widgets and the Safari browser, Canvas was later adopted by Gecko browsers and Opera and standardized by the WHATWG in new proposed specifications for next-generation Web technologies. 61 http://eco.media.mit.edu
encoding standard for the representation, storage and exchange of virtual 3-D city and landscape models. GML 3.0 is an XML markup for geographic data defining points, lines, polygons and coverages; CityGML extends this schema to model 3-D vector data along with other semantic data related to a city.
4.8 Policy, standards and the wider context
There is widespread agreement that the effective use of geospatial data (e.g. within the context of higher education) requires the establishment of a geospatial data framework to both catalogue the available datasets and provide access to the data (Owen et al., 2009). While there have been attempts to do this in the past, the rapid rate of technological change, driven primarily by commercial interests, has meant that policy and standards have inevitably been playing catch-up with real-world developments. In this section we examine some of the key issues that are currently shaping how mash-ups will develop in the future.

4.8.1 Geospatial data frameworks
The impact of data mash-ups and related technological developments has triggered major national and international programmes to achieve harmonization of geospatial datasets and interoperability of the Web service components that utilize these data. Examples include the 'Joined-up Geography' initiative in UK local and central Government, EU programmes such as INSPIRE, GMES and SEIS, and global programmes such as GEOSS. These programmes are usually specified by Government and managed through a collaborative top-down structure, which can produce sound, consensus-based solutions in many circumstances. From the user's perspective, changes to formats, new metadata standards, upgraded software to use the new datasets, and training to use the new software are all issues that will need to be tackled (Owen et al., 2009). However, the pace of evolution of mash-ups and mapping, and the innovative approaches taken by the business community, make it near impossible for conventional, committee-based governmental approaches to maintain the necessary responsiveness. There is a danger that long-term programmes (e.g. transport charging, INSPIRE, GMES, and approaches to national security for the Olympics) will be undermined and made either redundant or inferior to technology solutions arising from the ground up and driven by developments directed at mass consumer applications. Recognition of this danger is driving research initiatives at a number of UK universities and at research organizations worldwide. Current work on U-cities (settlements with ubiquitous information technology) is looking at the use of crowdsourced data for Government-led projects, such as planning, which are usually approached in a top-down fashion. A governmental crowdsourcing model is suggested in which urban residents may submit opinions and information to aid participation with policy makers (Jackson et al., 2009). Other uses for crowdsourced data are being explored by the Future Data Demonstrator at the University of Nottingham, where the aim is to combine data from authoritative Ordnance Survey datasets with feature-rich, informal OSM data. The context for this work is the progress of national and international spatial data infrastructures such as the UK Location Programme and INSPIRE, contrasted against crowdsourced geospatial databases. While initiatives such as INSPIRE tend towards a top-down process of harmonized data models and services using ISO and OGC standards, the OSM approach tags data with democratically agreed preferred attribute tags that can change over time (with inherent related issues of data quality). The basic research question behind the demonstrator is how to capture the best of each approach (Anand et al., 2010).
4.8.2 UK Government policy on mash-ups and open data

Government plays an important role in deciding how mash-ups and mapping will develop in future, particularly as its stance on the openness of public datasets and on open source software is changing dramatically. Sir Tim Berners-Lee and Professor Nigel Shadbolt, acting as advisers to Government, have proposed that public data should be much more open, in particular recognizing the importance of location data. Partly in response to these suggestions, the UK Government published, in December 2009, a consultation paper on policy options for geographic information from Ordnance Survey. The purpose of the consultation was to seek views about how best to formally implement proposals previously made by the Prime Minister to make certain Ordnance Survey datasets available for free, with no restrictions on re-use. On 31st March 2010, the Department for Communities and Local Government published its response to the consultation exercise, examining the way forward for Ordnance Survey data. At the same time the Government confirmed that it was releasing, from 1st April, a range of Ordnance Survey data and products, free of charge, to be known collectively as OS OpenData.62 This package includes a 1:250 000 scale colour raster, a 1:50 000 gazetteer, OS Street View and Meridian 2. In addition, to help enable the Semantic Web, Ordnance Survey will develop a service that allows its TOIDs (unique 16-digit identifiers for geographical objects) to be openly referenced and located. The details of this latter service will be announced in due course. These developments form part of a wider agenda. The Smarter Government white paper (HMG, 2009) describes a variety of ways in which public sector data is to be made available, including proposals to release Public Weather Service information, NHS Choices details and live transport timetables (with 80% of buses using GPS sensors by 2015). The new coalition Government has confirmed its commitment to continuing this process and announced, on 1st June 2010, the formation of a Public Sector Transparency Board to oversee further developments towards opening Government data and increasing public transparency. On 21st January 2010, Government began the process of releasing this kind of public data with the launch of www.data.gov.uk, and over 3,000 datasets are now available through this service. Formally, data.gov.uk implementation is being led by the Transparency and Digital Engagement team in the Cabinet Office. As well as providing an access point for the newly opened datasets, the site brings interested developers together to discuss technical and policy issues relating to the use of UK Government data sources and the development of Linked Data.63 The site shows commitment to the adoption of the Semantic Web, publishing many datasets in RDF and including information on how to manipulate them using SPARQL; a minimal sketch of such a query follows.
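The following Python fragment queries a SPARQL endpoint using the third-party SPARQLWrapper library. The endpoint URL is a placeholder rather than a real data.gov.uk address, and the query simply lists labelled resources; the actual endpoints and vocabularies are documented on the site itself.

```python
# A hedged sketch of querying a Linked Data SPARQL endpoint from Python.
# Requires the SPARQLWrapper library; the endpoint URL is a placeholder.
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("http://example.data.gov.uk/sparql")  # hypothetical URL
endpoint.setQuery("""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?resource ?label
    WHERE { ?resource rdfs:label ?label . }
    LIMIT 10
""")
endpoint.setReturnFormat(JSON)

results = endpoint.query().convert()
for row in results["results"]["bindings"]:
    print(row["resource"]["value"], "-", row["label"]["value"])
```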
4.8.3 Standards, quality and data reliability

The trust placed in geographic data and products is an important issue. These technologies will only prove useful if they are fit for their intended purpose, and uncertainty regarding the quality of user-generated content is often cited as a major obstacle to its wider use. Critics argue that amateur contributions may be highly erroneous and as such essentially invalid for serious academic or industrial uses, with Goodchild (2009) arguing that a crowdsourcing project should publish additional documentation and assessments reviewing quality issues. This is an important issue for HE: while lower-grade data may be suitable for some consumer applications (recent studies indicate that the quality of OSM data is at least comparable with that of the Ordnance Survey's Meridian 2 product [Haklay, 2010b]), this is not necessarily the case for Government-funded research projects. It would therefore appear essential to develop suitable metadata so that users can judge for themselves. Standards bodies are aware of the concerns surrounding the quality of user-generated content and, as both mash-ups and user-generated map data flourish, standards are being drafted in an attempt to deal with issues of quality and metadata. One example, prompted by the increasing pervasiveness of ubiquitous computing technologies, is the work currently being undertaken within ISO/TC 211 as project 19154 (ISO/TC 211: Project 19154, Standardization Requirements for Ubiquitous Public Access, 2009). This is concerned with ubiquitous public access: an environment, or infrastructure, that provides geographic information 'in every place, at any time, for any device'. To manage crowdsourced data, project 19157 (ISO/TC 211: 19157, 2009 [draft]) is looking at how to evaluate quality and at setting standards for defining how this should be done; a sketch of one common quality test follows.
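One common approach in comparative studies of this kind is a buffer-overlap test: buffer the authoritative geometry by a tolerance and measure how much of the volunteered geometry falls inside. The following Python sketch, using the shapely library, shows the idea with invented coordinates; real assessments, such as those in the spirit of Haklay (2010b), work in projected coordinates (metres) over whole datasets.

```python
# A minimal sketch of a buffer-overlap quality test for volunteered
# geographic data. Coordinates are invented; requires the shapely library.
from shapely.geometry import LineString

reference = LineString([(0, 0), (100, 0), (200, 10)])    # authoritative road
volunteered = LineString([(0, 2), (100, 3), (200, 14)])  # crowdsourced trace

tolerance = 5.0                        # buffer width in map units (metres)
zone = reference.buffer(tolerance)     # corridor around the reference line
inside = volunteered.intersection(zone).length

print(f"{inside / volunteered.length:.0%} of the volunteered line "
      f"lies within {tolerance} m of the reference")
```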
4.8.4 Privacy and confidentiality

While new Web and mobile geospatial services and applications will provide clear benefits to users, there are still ethical and social factors that are not yet fully understood or addressed. There are issues concerning user-generated and 'open' data, and ethical and privacy issues regarding the handling of data, all of which will have a bearing on the development of location-based information and services. In particular, the privacy issues around broadcasting personal location information, particularly within social networks, may have unintended consequences beyond the simple usage and value that these applications provide.64 The other key concern is the extent to which adding together different layers of data, each of which independently remains within the bounds of confidentiality imposed by the data provider and by Government, leads to breaches of confidentiality. Owen et al. (2009) demonstrate the measures that are already being taken in the public sector to limit the amount of geographical information provided in individual-level datasets and to create secure environments for data analysis, in recognition of the perceived high risk of disclosure of confidential data on individuals. However, companies and open data projects are not subject to such stringent oversight and, in addition, the prevalence of crowdsourced data and content produced by individuals who are not necessarily trained in data management or legal issues may result in breaches of privacy or confidentiality, through ignorance as much as intent. Unfortunately, where these mash-ups are created through a company's API, the data will be stored and, depending on the API's terms and conditions, may be replicated and concatenated further.

62 www.ordnancesurvey.co.uk/opendata
63 A process of linking data across the Web rather than HTML documents. It was first proposed by Tim Berners-Lee in 1998 with four basic principles derived from the original ideas of the Web. See, for example: http://www.youtube.com/watch?v=OM6XIICm_qo&feature=related and http://www.youtube.com/watch?v=qMjkI4hJej0
64 For example, Privacy International, one of the watchdogs working to help deal with issues of privacy, identified that Google Latitude’s location sharing facility inadequately protected users from unintentionally broadcasting their position. See: http://news.bbc.co.uk/1/hi/7872026.stm
Conclusions and recommendations

Data mash-ups in education and research are part of an emerging, richer information environment with greater integration of mobile applications, sensor platforms, e-science, mixed reality and semantic, machine-computable data. For example, the development of augmented reality learning experiences will be of particular relevance to those enrolled on distance learning programmes and to disabled students, while mixed reality applications that integrate the physical and virtual worlds will become particularly relevant to discovering additional, related knowledge and to helping visualize data. In the longer term this should be facilitated by the benefits accrued from more open licensing arrangements with data providers. In the meantime there are several issues that HE will need to take account of:

1. As mash-ups become more widespread, and with students more aware of and inclined to carry out this type of work, a suitable technical infrastructure will be required. Providing this will present challenges for institutional ICT support teams. While some technicians may already be familiar with these issues in specialist departments, staff in other areas of research will increasingly be affected. This may indicate a need for more centralized management of these ICT issues. In the JISC context, research into the use of, and guidelines for, Shibboleth or similar technologies to authenticate map and other Web services, such as the Wstieria project,65 will be valuable. From an institutional perspective, there may be disadvantages in a rise in the number of users or groups managing Web servers for separate mash-ups, along with added security risks.

2. A key organizational challenge is to educate staff and students to become equally familiar with both the potential and the limitations of mash-up technologies, so that they are aware of the implications of combining data sources and making the results available on the Internet. Students, in particular, need to become more aware of the legal and ethical issues involved in using mash-up technologies, particularly for university-based work. Training on the ethical use of data should be considered.

3. In general, Web 2.0 places an emphasis on making use of the information in the vast databases that are populated by user contributions (through the architecture of participation). However, the terms and conditions of these databases vary considerably and may be revised at any time. Just because a service is free at the point of use does not make it 'open'. There is therefore a question mark over the degree of genuine openness of access to the data, and there are implications for institutions working on joint projects with commercial organizations. Within HE, there has been a wide-ranging debate within the academic and publishing communities over open access to scientific and humanities research and the role of journals in this regard, and this is not unconnected to moves within the research community to expose experimental data. The tension between the desire to undertake mash-ups and the requirement to ensure open access to data in the future needs to be resolved. It is recommended that JISC undertake work in this area to clarify the issues and to provide advice for institutions.

4. New sources of data are becoming available and easy-to-use toolkits are opening up spatial analysis beyond the traditional user, offering considerable opportunities throughout HE.
However, as data availability increases, especially in terms of crowdsourced data, students and educators need to be aware of both the risks and the benefits of such approaches. Crowdsourcing may be an acceptable route for data collection as long as standards are put in place to ensure sound survey principles and sampling methods. Data produced using such methods should be clearly identified.
65 http://edina.ac.uk/projects/wstieria_summary.html
About the authors

Suchith Anand is Ordnance Survey Research Fellow at the Centre for Geospatial Science, University of Nottingham. He is co-Chair of both the ICA working group on Open Source Geospatial Technologies and the Open Source GIS UK conference series. His details are available at: http://www.nottingham.ac.uk/~lgzwww/contacts/staffPages/SuchithAnand/Suchith%20Anand.htm

Michael Batty CBE FBA FRS is Bartlett Professor of Planning at University College London, where he directs the Centre for Advanced Spatial Analysis (CASA). His research work involves the development of computer models of cities and regions. He is an expert member of the Advisory Panel on Public Sector Information (APPSI) and Chair of the ESRC Census Advisory Committee.

Andrew Crooks is an assistant professor in the Department of Computational Social Science and a member of the Center for Social Complexity at George Mason University, and a Visiting Research Fellow in CASA working on the EPSRC project on Global Dynamics and Complexity. He was formerly GLA Economics Research Fellow in CASA at UCL. His research interests relate to exploring, understanding and communicating urban built and socio-economic environments using GIS, spatial analysis and agent-based modelling methodologies.

Andrew Hudson-Smith is a Senior Research Fellow at CASA, University College London. He is Editor-in-Chief of the journal Future Internet, an elected Fellow of the Royal Society of Arts and author of the Digital Urban blog. His research interests relate to urban mapping, 3D visualization and the Internet of Things.

Mike Jackson was appointed to the Chair of Geospatial Science at the University of Nottingham in April 2005, where he has established the Centre for Geospatial Science. He is a Fellow of the Royal Institution of Chartered Surveyors, a Fellow of the Royal Geographical Society, a non-executive director of the Open Geospatial Consortium Inc. (OGC) and Chairman of Commission 5 (Networks) of EuroSDR.

Richard Milton is a Research Fellow in CASA, where he works on the Generative eSocial Science (Genesis) project. He is also the developer of the MapTube website and has released the 'GMapCreator' and 'Image Cutter' software. Previously, he worked on the Equator project, where he used GPS-tracked sensors to make fine-scale maps of carbon monoxide distribution.

Jeremy Morley has been Deputy Director of the Centre for Geospatial Science at the University of Nottingham since September 2009. He is the UK representative to EuroSDR and a member of the UK Location Programme's User Group. His interests lie in the interface between formal, OGC- or SDI-based online GIS and informal, mash-up-style online content, and in the effects of ubiquitous computing and sensing on our understanding of the world.
References
[All links last accessed 4th September 2010]
ABI RESEARCH. 2009. ABI Research Anticipates "Dramatic Growth" for Augmented Reality via Smartphones (Press Release). ABI Research, 22nd October. Available online at: http://www.abiresearch.com/press/1516ABI+Research+Anticipates+%93Dramatic+Growth%94+for+Augmented+Reality+via+Smartphones
ANAND, S., MORLEY, J., WENCHAO, J., DU, H., HART, G. & JACKSON, M. 2010. When worlds collide: combining Ordnance Survey and Open Street Map data. AGI GeoCommunity conference, Stratford-upon-Avon, UK, 28th-30th September.

ANDERSON, P. 2007. What is Web 2.0? Ideas, Technologies and Implications for Education. JISC, February 2007. Available online at: http://www.jisc.ac.uk/whatwedo/services/techwatch/reports/horizonscanning/hs0701.aspx

BENFORD, S. 2005. Future Location-Based Experiences. JISC, January 2005. Available online at: http://www.jisc.ac.uk/whatwedo/services/techwatch/reports/horizonscanning/hs0501.aspx

BERRY, R., FRY, R., HIGGS, G. & ORFORD, S. 2010. Building a geo-portal for enhancing collaborative socio-economic research in Wales using open-source technology. Journal of Applied Research in Higher Education, 2 (1), pp. 77-92. Available online at: http://jarhe.research.glam.ac.uk/media/files/documents/2010-01-29/Berry_d2_web.pdf

BOULOS, M. N. K., SCOTCH, M., CHEUNG, K.-H. & BURDEN, D. 2008. Web GIS in practice VI: a demo playlist of geo-mashups for public health neogeographers. International Journal of Health Geographics, 7. Available online at: http://www.ij-healthgeographics.com/content/7/1/38

BOULOS, M. N. K., WARREN, J., GONG, J. & YUE, P. 2010. Web GIS in practice VIII: HTML5 and the canvas element for interactive online mapping. International Journal of Health Geographics, 9 (14). Available online at: http://www.ij-healthgeographics.com/content/9/1/14

BRAY, T. 2006. OSCON - Open Data. ongoing (weblog), 30th July. Available online at: http://www.tbray.org/ongoing/When/200x/2006/07/28/Open-Data

BRODT, A., NICKLAS, D., SATHISH, S. & MITSCHANG, B. 2008. Context-Aware Mashups for Mobile Devices. Proceedings of Web Information Systems Engineering (WISE) 2008, Auckland, New Zealand, 1st-3rd September 2008. Springer: Berlin.

BUTCHART, B., KING, M., POPE, A., VERNON, J., CRONE, J. & FLETCHER, J. 2010. Alternative Access Project: Mobile Scoping Study Final Report. EDINA, June 2010. Available online at: http://go2.wordpress.com/?id=725X1342&site=mobilegeo.wordpress.com&url=http%3A%2F%2Fmobilegeo.files.wordpress.com%2F2010%2F07%2Fdigimap-mobile-scoping-study-final-projectv131.doc&sref=http%3A%2F%2Fmobilegeo.wordpress.com%2F2010%2F07%2F16%2Fmobilescoping-study-report%2F

CABINET OFFICE. 2009. Open Source, Open Standards and Re-Use: Government Action Plan. UK Government Cabinet Office, 24th February. Available online at: http://www.cabinetoffice.gov.uk/media/318020/open_source.pdf

CHAPMAN, A. & RUSSELL, R. 2009. Shared Infrastructure Services Landscape Study. JISC, 15th December 2009. Available online at: http://ie-repository.jisc.ac.uk/438/1/JISC-SIS-Landscape-reportv3.0.pdf
DCLG. 2010. Policy options for geographic information from Ordnance Survey (Government Response). Department for Communities and Local Government, March 2010. Available online at: http://www.communities.gov.uk/publications/corporate/ordnancesurveyconresponse

de FREITAS, S. 2008. Serious Virtual Worlds report. JISC, 3rd November. Available online at: http://www.jisc.ac.uk/publications/reports/2008/seriousvirtualworldsreport.aspx

EAGLE, N. & PENTLAND, A. 2006. Reality mining: sensing complex social systems. Personal and Ubiquitous Computing, 10 (4), pp. 255-268. Available online at: http://www.springerlink.com/content/l562745318077t54/

EGENHOFER, M. J. 2002. Toward the semantic geospatial web. Proceedings of the 10th ACM International Symposium on Advances in Geographic Information Systems, McLean, Virginia, USA, 8th-9th November. ACM.

EISNOR, D. 2006. What is neogeography anyway? Platial News (weblog), 27th May 2006. Available online at: http://platial.typepad.com/news/2006/05/what_is_neogeog.html

ERKHEIKKI, J. 2007. Nokia to Buy Navteq for $8.1 Billion, Take on TomTom (Update7). Bloomberg.com, 1st October. Available online at: http://www.bloomberg.com/apps/news?pid=newsarchive&sid=ayyeY1gIHSSg

GIBSON, R. & ERLE, S. 2006. Google Map Hacks. O'Reilly Media Inc.: Sebastopol, CA.

GOODCHILD, M. 2009. NeoGeography and the nature of geographic expertise. Journal of Location Based Services, 3 (2), pp. 82-96. Available online at: http://www.informaworld.com/smpp/content~db=all~content=a911734343

GOODCHILD, M. F. 1992. Geographical Information Science. International Journal of Geographical Information Systems, 6 (1), pp. 31-45.

GOODCHILD, M. F. 2007. Citizens as sensors: the world of volunteered geography. GeoJournal, 69 (4), pp. 211-221.

HAKLAY, M. 2010a. Haiti - how can VGI help? Comparison of OpenStreetMap and Google Map Maker. Po Ve Sham (weblog), 18th January. Available online at: http://povesham.wordpress.com/2010/01/18/haiti-how-can-vgi-help-comparison-of-openstreetmap-and-google-map-maker/

HAKLAY, M. 2010b. How good is volunteered geographical information? A comparative study of OpenStreetMap and Ordnance Survey datasets. Environment and Planning B: Planning and Design, 37 (4), pp. 682-703. Available online at: http://www.envplan.com/abstract.cgi?id=b35097

HAKLAY, M., SINGLETON, A. & PARKER, C. 2008. Web Mapping 2.0: The Neogeography of the Geospatial Internet. Geography Compass, 2 (6), pp. 2011-2039. Available online at: http://www3.interscience.wiley.com/journal/121528007/issue

HOBONA, G., JACKSON, M. & ANAND, S. (Forthcoming). Implementing Geospatial Web Services for Cloud Computing. In: Zhao, P. & Di, L. (Eds.) Geospatial Web Services. IGI Publishing.

HOEF, M. V. D. & KANNER, J. 2007. TomTom Agrees to Acquire Tele Atlas for EU2 Billion (Update10). Bloomberg.com, 23rd July. Available online at: http://www.bloomberg.com/apps/news?pid=newsarchive&sid=agT1Po33faG4&refer=home
HOF, R. D. 2005. Mix, Match, And Mutate. BusinessWeek, 25th July. Bloomberg: New York, USA. Available online at: http://www.businessweek.com/magazine/content/05_30/b3944108_mz063.htm

HOWE, J. 2006. The Rise of Crowdsourcing. Wired, June 2006. Condé Nast Digital: New York, USA. Available online at: http://www.wired.com/wired/archive/14.06/crowds.html

HUDSON-SMITH, A., BATTY, M., CROOKS, A. & MILTON, R. 2009. Mapping for the Masses: Accessing Web 2.0 Through Crowdsourcing. Social Science Computer Review, 27 (4). Available online at: http://ssc.sagepub.com/content/27/4/524.abstract

IPIÑA, D. L. D., VAZQUEZ, J. I. & ABAITUA, J. 2007. A context-aware mobile mashup platform for ubiquitous web. Proceedings of the 3rd IET International Conference on Intelligent Environments, University of Ulm, Germany, 24th-27th September 2007.

JACKSON, M. J., GARDNER, Z. & WAINWRIGHT, T. 2009. The future of ubiquitous computing and urban governance. Internal draft report, University of Nottingham.

KARPISCHEK, S., MAGAGNA, F., MICHAHELLES, F., SUTANTO, J. & FLEISCH, E. 2009. Towards location-aware mobile web browsers. Proceedings of the 8th International Conference on Mobile and Ubiquitous Multimedia, Cambridge, United Kingdom, 22nd-25th November. ACM.

KLEMPERER, P. 2006. Network Effects and Switching Costs: Two Short Essays for the New Palgrave. Available online at: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=907502

LAMB, B. 2007. Dr. Mashup; or, Why Educators Should Learn to Stop Worrying and Love the Remix. EDUCAUSE Review, 42 (4), pp. 12-25. Available online at: http://www.educause.edu/EDUCAUSE+Review/EDUCAUSEReviewMagazineVolume42/DrMashuporWhyEducatorsShouldLe/161747

LEFORT, L. 2009. Review of semantic enablement techniques used in geospatial and semantic standards for legacy and opportunistic mashups. Proceedings of the Australasian Ontology Workshop, Melbourne, 1st December.

LIEBOWITZ, S. J. & MARGOLIS, S. 1994. Network Externality: An Uncommon Tragedy. Journal of Economic Perspectives, 8 (2). Available online at: http://www.utdallas.edu/~liebowit/jep.html

LIU, M., HORTON, L., OLMANSON, J. & WANG, P.-Y. 2008. An Exploration of Mashups and Their Potential Educational Uses. Computers in the Schools, 25 (3), pp. 243-258. Available online at: http://www.informaworld.com/10.1080/07380560802368090

LORENZO, G. D., HACID, H., PAIK, H.-Y. & BENATALLAH, B. 2009. Data integration in mashups. SIGMOD Record, 38 (1), pp. 59-66. ACM.

MACDONALD, S. 2008. Data Visualisation Tools: Part 2 - Spatial Data in a Web 2.0 environment. University of Edinburgh, 17th October 2008. Available online at: http://edina.ac.uk/cgi-bin/news.cgi?filename=datasharebriefing2-20081028.txt

MACEACHREN, A. M. & KRAAK, M. 2001. Research Challenges in Geovisualization. Cartography and Geographic Information Science, 28 (1), pp. 3-12. Available online at: http://www.cartogis.org/publications/abstracts/cagisab0101.html

MAISONNEUVE, N., STEVENS, M., NIESSEN, M. & STEELS, L. 2009. NoiseTube: Measuring and mapping noise pollution with mobile phones. Proceedings of the 4th International ICSC Symposium, Thessaloniki, Greece, 28th-29th May. Springer.
O'REILLY, T. 2005. What is Web 2.0: Design Patterns and Business Models for the Next Generation of Software. O'Reilly Media Inc., 30th September 2005. Available online at: http://oreilly.com/web2/archive/what-is-web-20.html

O'REILLY, T. 2006. Open Source Licenses are Obsolete. O'Reilly Radar (weblog), 1st August. Available online at: http://radar.oreilly.com/archives/2006/08/open_source_licenses_are_obsol.html

OECD. 2007. Participative Web: User-created content. Organisation for Economic Co-operation and Development, 12th April 2007. Available online at: http://www.oecd.org/dataoecd/57/14/38393115.pdf

OGC. 2008. OGC Reference Model. Open Geospatial Consortium Inc., 11th November. Available online at: http://www.opengeospatial.org/standards/orm

OGC. 2010. Fusion Standards Study Engineering Report. Open Geospatial Consortium Inc., 21st March. Available online at: http://www.opengeospatial.org/standards/per

OPEN BUSINESS. 2006. People Inside & Web 2.0: An Interview with Tim O'Reilly. Open Business (weblog). Available online at: http://www.openbusiness.cc/2006/04/25/people-inside-web-20-an-interview-with-tim-o-reilly/

OS. 2008. Use of Google Maps for display and promotion purposes. Ordnance Survey. Available online at: http://www.freeourdata.org.uk/docs/use-of-google-maps-for-display-and-promotion.pdf

OWEN, D., GREEN, A. & ELIAS, P. 2009. Review of Geospatial Resource Needs. ESRC, December. Available online at: http://www.esrc.ac.uk/ESRCInfoCentre/Images/Geospatial%20report%20with%20cover%20Dec09_tcm6-35008.pdf

SCHMALSTIEG, D., LANGLOTZ, T. & BILLINGHURST, M. 2009. Augmented Reality 2.0. Proceedings of the International Symposium on Mixed and Augmented Reality (ISMAR), Orlando, FL, USA, 19th-22nd October. IEEE.

SIMON, R. & FRÖHLICH, P. 2007. A mobile application framework for the geospatial web. Proceedings of the 16th International Conference on World Wide Web, Banff, Alberta, Canada, 8th-12th May. ACM.

SKETCHUP WEBSITE. 2010. Available online at: http://sketchup.google.com/competitions/modelyourtown/index.html

SUNG, H. J. & HUDSON-SMITH, A. 2010. Augmented Reality in 2015. Association for Geographic Information. Available online at: http://www.agi.org.uk/storage/foresight/datatechnology/GIS%20and%20Augmented%20Reality%20in%202015.pdf

TURNER, A. 2006. Introduction to Neogeography. O'Reilly Media Inc.: Sebastopol, CA. Available online at: http://oreilly.com/catalog/9780596529956/

TURNER, A. & FORREST, B. 2008. Where 2.0: The State of the Geospatial Web. O'Reilly Media Inc., September 2008. Available online at: http://radar.oreilly.com/2008/10/radar-report-on-where-20the-s.html