THE STATE OF DATA: HOW DATA IS CHANGING GOVERNMENT
Big Data, Open Data, Open Government, and You.
LEARN MORE: GOVDELIVERY.COM ∙ EMAIL: INFO@GOVDELIVERY.COM
Two of the most significant data trends on the government radar, Big Data and Open Data, are cloaked in mystery, misinterpretation, and misunderstanding. This brief defines these two different concepts and explains how governments can use them to build more transparent relationships with the people they serve, improve the quality of the services they provide, and spur economic growth. Examples of how government is currently using Big Data and Open Data are included to inspire further exploration of these essential data initiatives.
TABLE OF CONTENTS
IN A NUTSHELL
NAVIGATING CONVERGING CONCEPTS
BIG DATA IN REAL-WORLD GOVERNMENT
OPEN DATA IN REAL-WORLD GOVERNMENT
    Measuring (Dis)Satisfaction in D.C.
    Windy City Blows Open Data out of the Water
THE DATA MOVEMENT CONTINUES
TAPPING THE POTENTIAL OF OPEN DATA IN GOVERNMENT
IN A NUTSHELL
Big Data describes very large, complex datasets that are difficult to process with traditional data processing applications. The information technology (IT) industry characterizes Big Data as high-volume, high-velocity, and/or high-variety information assets that require new forms of processing to “crunch” the data and transform it into insights for making more informed decisions and optimizing processes. Since “big” is a relative term, what is considered Big Data today may not seem as big in a few years, as data analysis and computing technology improve. Digesting Big Data poses inherent challenges in analysis, capture, curation, search, sharing, storage, transfer, visualization, and privacy protection, but the value outweighs them. Analyzing a single large set of related data, rather than separate smaller sets with the same total amount of data, can reveal additional correlations that help spot business trends, prevent diseases, combat crime, and much more. For example, decoding the human genome originally took 10 years; with today’s data processing technologies, it can be achieved in less than a day. NASA’s Center for Climate Simulation (NCCS) stores 32 petabytes (1 petabyte = 1,000,000 gigabytes) of climate observations and simulations on the Discover supercomputing cluster to analyze and potentially stem climate change.
Open Data is accessible public data that people, companies, and organizations can use to launch new ventures, analyze patterns and trends, make data-driven decisions, and solve complex problems.
The data must be publicly available for anyone to use, and it must be licensed in a way that allows for its reuse. Open Data also should be relatively easy to use, although here there are gradations of “openness.” And there’s general agreement that Open Data should be free of charge or available at a minimal cost.
Open Government employs Open Data to:
• Engage citizens in government with collaborative strategies;
• Release data about its operations, like spending and budgets; and
• Release data it collects about issues of public interest, such as health, environment, demographics, and different industries.
NAVIGATING CONVERGING CONCEPTS
While Big Data and Open Data differ, they intersect with Open Government to create six subtypes of data shown on the diagram:
[Diagram: overlapping areas labeled BIG DATA, OPEN DATA, and OPEN GOVT, with six numbered regions]
1. Non-public data for marketing, business analysis, national security
2. Citizen engagement programs not based on data (e.g. petition websites)
3. Large datasets from scientific research, social media, or other non-govt. sources
4. Public data from state, local, federal govt. (e.g. budget data)
5. Business reporting, other business data
6. Large public government datasets (e.g. weather, GPS, Census, SEC, health care)
1. BIG DATA THAT’S NOT OPEN DATA
A lot of Big Data falls in this category, including some Big Data that has great commercial value. All of the data that large retailers hold about customers’ buying habits, that hospitals hold about patients (as regulated by HIPAA), or that banks hold about their credit-card holders (again regulated), falls here. It’s information that the data-holders own and can use for commercial advantage. National security data, like the data collected by the NSA, is also in this category.
2. OPEN GOVERNMENT WORK THAT’S NOT OPEN DATA
This is the part of Open Government that focuses purely on citizen engagement. For instance, the White House has started a petition website, called We the People, for obtaining citizen input. While the site makes its data available, publishing Open Data – beyond numbers of signatures – is not its main purpose.
3. BIG, OPEN, NON-GOVERNMENTAL DATA
Here we find scientific data-sharing and citizen science projects like Zooniverse. Big Data from astronomical observations, from large biomedical projects like the Human Genome Project, or from other sources realizes its greatest value through an open, shared approach. While some of this research may be government-funded, it’s not “government data” because it’s not generally held, maintained, or analyzed by government agencies. This category also includes a very different kind of Open Data: the data that can be analyzed from Twitter and other forms of social media.
4. OPEN GOVERNMENT DATA THAT’S NOT BIG DATA
Government data doesn’t have to be Big Data to be valuable. Modest amounts of data from states, cities, and the federal government can have a major impact when released. This kind of data fuels the participatory budgeting movement, where cities around the world invite their residents to look at their city’s budget and help decide how to spend it. It’s also the fuel for apps that help people use city services like public buses or health clinics.
5. OPEN DATA – NOT BIG, NOT FROM GOVERNMENT
This includes private-sector data that companies choose to share for their own purposes – for example, to satisfy potential investors or to enhance their reputations. Environmental, social, and governance (ESG) metrics fall here. In addition, reputational data, such as data from consumer complaints, is highly relevant to business and falls in this category.
6. BIG, OPEN, GOVERNMENT DATA
These datasets may have the most impact of any category. Government agencies have the capacity and funds to gather very large amounts of data, and making those datasets open can have major economic benefits. National weather data and GPS data are just a couple of important examples. U.S. Census data, and data collected by the Securities and Exchange Commission and the Department of Health and Human Services, are others. The U.S. Open Data Policy, enacted in 2013, will make this category larger, more robust, and even more significant over time.
BIG DATA IN REAL-WORLD GOVERNMENT
Thirty-two Chicago food inspectors are responsible for auditing the city’s more than 15,000 restaurants. Traditionally, they are assigned beats, or groups of restaurants, that they inspect a few times a year, depending on each restaurant’s assessed risk level: how complex its menu items are and how likely its ingredients are to trigger food poisoning. The city is experimenting with a new technology to guide where those inspections should occur, based on factors such as current weather, nearby construction, and past health code violations. The Chicago Department of Public Health has been testing the food inspection model for the past few months, and the model will remain in the pilot phase until the algorithm is more refined.
Like Chicago, New York is among a handful of cities trying to modernize their inspection protocols. NYC’s Department of Health and Mental Hygiene is testing software that scans online reviews from websites such as Yelp, flagging mentions of potential food-poisoning incidents. In July, IBM unveiled an application for public health officials that processes data such as retail records and food poisoning reports to trace incidents back to particular contaminated products. Chicago’s health department is also applying a similar predictive model to inspections for other public health risks, such as lead-paint exposure in residential buildings.
Currently, Chicago’s food inspection software aggregates information from various publicly available data sources, including records of building- and sanitation-code violations, demographic characteristics of nearby residents, and lists of restaurants with liquor licenses. It analyzes about 10 years’ worth of historical data, across more than 20 variables, to determine which factors most strongly predict inspection failures. For instance, fluctuations in weather that might cause ingredients to rot were more strongly correlated with failure than a restaurant’s location or a history of past violations.
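To make the idea of ranking predictive factors concrete, here is a minimal sketch in Python. It is not the city's algorithm, which this brief does not describe; it simply ranks candidate factors by the strength of their Pearson correlation with inspection failure. The factor names (`temp_swing`, `past_violations`), the `failed` flag, and every record below are invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / sqrt(var_x * var_y)


def rank_factors(records, outcome="failed"):
    """Return (factor, correlation) pairs, strongest absolute correlation first."""
    factors = [key for key in records[0] if key != outcome]
    outcomes = [rec[outcome] for rec in records]
    scores = {f: pearson([rec[f] for rec in records], outcomes) for f in factors}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Invented toy records (NOT real inspection data): one row per past inspection.
INSPECTIONS = [
    {"temp_swing": 2, "past_violations": 1, "failed": 0},
    {"temp_swing": 3, "past_violations": 0, "failed": 0},
    {"temp_swing": 10, "past_violations": 1, "failed": 1},
    {"temp_swing": 12, "past_violations": 0, "failed": 1},
    {"temp_swing": 4, "past_violations": 2, "failed": 0},
    {"temp_swing": 11, "past_violations": 1, "failed": 1},
]

if __name__ == "__main__":
    for factor, r in rank_factors(INSPECTIONS):
        print(f"{factor}: r = {r:+.2f}")
```

A production model would likely use many more variables and a proper statistical or machine-learning method; simple correlation is used here only because it is easy to inspect by hand.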
In tests covering several hundred restaurants, the software has helped inspectors identify 4 percent more critical violations of the health code than before they used the system. In its early stages, such an analytic system may be limited. In the case of food poisoning, it can only analyze the incident reports that restaurant patrons actually file.
Since citizens are often more likely to post about a bad meal than they are to file a formal report with the city, the department developed a program to mine Twitter for tweets with words linked with food poisoning, such as “vomit.” The department then responds to the people who posted the comments, encouraging them to file a formal report. It has since collected a few hundred additional reports this way. Currently, reports found through Twitter are sent directly to food inspectors, who then decide whether each one is worth a follow-up inspection.
While not a standalone solution for tracking foodborne disease, the system can help allocate limited funding and staff to the areas most likely to be affected, solving the problem where it hurts. For instance, Chicago uses a similar software model, based on publicly available data, to identify homes at risk for lead-paint exposure, especially those likely to be occupied by women and young children. Inspectors can then prioritize those homes for inspections and outreach.
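The keyword mining described above can be sketched in a few lines of Python. This is a hedged illustration, not the department's program: it works on a plain list of post texts rather than any Twitter API, and only “vomit” comes from this brief. The other keywords and the sample posts are assumptions made up for the example.

```python
import re

# Only "vomit" is named in the brief; the other terms are illustrative guesses.
KEYWORDS = ("vomit", "food poisoning", "stomach ache")


def build_pattern(keywords):
    """Compile one case-insensitive pattern matching words that start with any keyword."""
    alternatives = "|".join(re.escape(keyword) for keyword in keywords)
    return re.compile(rf"\b(?:{alternatives})", re.IGNORECASE)


def flag_posts(posts, keywords=KEYWORDS):
    """Return the posts that mention at least one keyword, in their original order."""
    pattern = build_pattern(keywords)
    return [post for post in posts if pattern.search(post)]


if __name__ == "__main__":
    sample_posts = [  # invented examples
        "Great pizza tonight!",
        "Been vomiting since lunch at that diner",
        "Pretty sure I got FOOD POISONING downtown",
    ]
    for post in flag_posts(sample_posts):
        print("Follow up:", post)
```

Keyword matching is deliberately crude and will produce false positives, which is one reason flagged posts go to a person for review rather than triggering an inspection automatically.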
OPEN DATA IN REAL-WORLD GOVERNMENT
Measuring (Dis)Satisfaction in D.C.
Have you ever been frustrated with government? Think about the last time you had a long wait to get your driver’s license renewed, visited a park with lots of litter, or hit a pothole and had to get your car realigned. In recent years, people in several cities have used services like SeeClickFix and Open311 to report problems and get a response. Now initiatives on the East and West Coasts are giving city governments more overall, synthesized feedback on their performance and making city operations more transparent to citizens.
Washington, D.C. is using sentiment analysis, a technique for analyzing opinions from social media, to give public letter grades to its city agencies and services. Sentiment analysis develops Open Data from many individuals’ publicly posted opinions and, in this case, makes that data available for both public and government use.
The city started with five agencies in the pilot: the DMV, Department of Transportation, Parks and Recreation, Public Works, and Consumer and Regulatory Affairs. The company conducting the analysis examined Twitter, Facebook, blogs, and online forms that city agencies made available to people who used their services. In the first evaluation, four of the five were graded with a C-minus; Public Works was at the head of the class with a C-plus. Knowledge is power when it comes to working toward change and improvement, and the city’s movement toward Open Data is the source of that knowledge.
Windy City Blows Open Data out of the Water
Chicago mandates that every city agency contribute data to its Open Data Portal. Each agency must have an open data coordinator who serves on the City’s Open Data Advisory Group. Few cities require this level of staff participation and resources for Open Data. Beyond making it easier for residents to hold their government accountable, Open Data serves as a platform for innovative tools that improve the lives of all Chicago’s residents. According to its “2013 Open Data Annual Report,” the City of Chicago offers nearly 600 datasets on its portal, more than double what it had in 2011. Which datasets take priority? The most “valuable” ones come first. These are often large datasets that update frequently. They may also be smaller but useful for a large audience.
Overall, Chicago has increased the size of the datasets on its portal. For example, it offers a traffic dataset that is updated every 10 minutes using the GPS on its city buses. Its crime dataset spans 10 years. And, it plans to expand its 311 dataset by including more types of requests beyond the 12 most common, such as aircraft noise or improper apartment building heating or cooling systems. Chicago’s open datasets have received more than 15.6 million page views since 2010. And, perhaps the most impressive statistic for Chicago is its data usage. Between November 2012 and November 2013, data downloads from the Chicago open data portal grew from 2 terabytes to 6.4 terabytes, an increase of more than 200 percent. Usage means participation and innovation—exactly what Chicago wants.
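As a quick check on the download figures above, the growth from 2 terabytes to 6.4 terabytes can be worked out directly. The short Python sketch below uses a helper name, `percent_increase`, that is ours rather than anything from the Chicago report.

```python
def percent_increase(old, new):
    """Percentage growth from old to new, relative to old."""
    if old <= 0:
        raise ValueError("old value must be positive")
    return (new - old) / old * 100.0


if __name__ == "__main__":
    # Figures quoted in the brief: 2 TB (Nov 2012) to 6.4 TB (Nov 2013).
    growth = percent_increase(2.0, 6.4)
    print(f"Download growth: {growth:.0f}%")  # prints "Download growth: 220%"
```

The result, roughly 220 percent, is consistent with the stated increase of more than 200 percent.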
Chicago’s investment in Open Data and the people who use it has spawned new businesses that make use of public data hosted on the portal. These startups offer services that improve life in Chicago and other cities, and employ dozens of local analysts, designers, and developers. For instance:
• Purple Binder aggregates social services information for social workers and healthcare professionals about services available to their clients.
• DataMade, one of the first companies in Chicago to focus exclusively on Open Data, creates apps such as Councilmatic and crimearound.us for the City of Chicago and other clients.
• Cartografika uses the building footprints datasets on Open Data portals (Chicago and other cities) to create beautiful maps and drawings for wall and other décor.
• Rob Paral & Associates, a social services policy consultancy for more than 25 years, has enjoyed greater efficiencies and expanded capacity using Open Data.
• Smart Chicago Collaborative hired DataMade when it was just getting started and also employs Purple Binder, Rob Paral & Associates, and other new companies to create technology products that improve life in Chicago.
THE DATA MOVEMENT CONTINUES
Some federal agencies are already making large volumes of data available to the public. For instance, the Centers for Disease Control and Prevention (CDC) has research data available on a huge range of health-related topics that span viral and systemic diseases, illnesses, mortality, genetics, immunizations, injuries, obesity, physical activity, and many more categories. In addition, its website provides tools and resources for leveraging the data. One example, Health Data Interactive, presents tables with national health statistics for infants, children, adolescents, adults, and older adults on a variety of topics such as health insurance access, health care use, health care expenditures, and life expectancy, to name a few. Tables can be customized by age, gender, ethnicity, and geographic location to explore different trends and patterns.
Another example, the Health Indicators Warehouse (HIW), provides access to high-quality data that improves understanding of a community’s health status and determinants, and facilitates the prioritization of interventions. The HIW is a collaboration of many agencies and offices within the Department of Health and Human Services (HHS). It is maintained by the CDC’s National Center for Health Statistics, with data, support, and funding provided by:
• Centers for Medicare & Medicaid Services
• Department of Health and Human Services
• Office of the Deputy Secretary
• Office of Adolescent Health
• Office of Disease Prevention and Health Promotion
• Office of Minority Health
• Office of the Assistant Secretary for Planning and Evaluation
• Health Resources and Services Administration
The HIW provides a single, user-friendly source for national, state, and community health indicators. It serves as the data hub for the HHS Community Health Data Initiative, a flagship HHS open government initiative to release data, encourage innovative application development, and catalyze change to improve community health.
As part of its work to end extreme global poverty and enable resilient, democratic societies to realize their potential, the U.S. Agency for International Development (USAID) is committed to the President’s Open Government initiative, upholding the values of transparency, participation, and collaboration in tangible ways that benefit the American people:
• Transparency. Government should provide citizens with information about what their government is doing so that government can be held accountable.
• Participation. Government should actively solicit expertise from outside Washington so that it makes policies with the benefit of the best information.
• Collaboration. Government officials should work together with one another and with citizens as part of doing their job of solving national problems.
USAID supports innovative applications of development data by the public sector, private sector, donors, partners, and beneficiaries. It has built USAID.gov/developer to connect citizen developers with the tools they need to unlock government data while improving transparency, collaboration, and impact.
In addition, open data initiatives around the world have helped to combat government corruption in Brazil, support campaign fairness in Chile, improve safety in New York City, and enhance healthcare in the U.K., to cite just a few successes.
TAPPING THE POTENTIAL OF OPEN DATA IN GOVERNMENT
Open Data is already giving rise to hundreds of entrepreneurial businesses and helping established companies to segment markets, define new products and services, and improve the efficiency and effectiveness of operations. The previous examples demonstrate its potential for enabling government to empower citizens, streamline change, improve the delivery of public services, and foster economic growth.
A McKinsey report suggests that Open Data may generate significant economic value, helping unlock $3 trillion to $5 trillion across seven sectors of the global economy (education, transportation, consumer products, electricity, oil and gas, health care, and consumer finance). This economic carrot belongs to those government organizations that are willing to continue exploring an iterative approach to implementing Open Data.
However, government has to cultivate a vibrant open-data ecosystem and implement policies to protect people. It requires an investment in technologies and talent to collect and analyze data. For citizens, it means being vigilant, savvy providers and users of open data. And there is much work to be done by governments, companies, and consumers to craft policies that protect privacy and intellectual property, as well as establish standards to speed the flow of data that is not only open but also “liquid.”
Ready for the next step of the journey? Interested in learning more? GovDelivery continues to serve over 1,000 public sector organizations around the globe, and we can help serve yours too. Call us at (866) 276-5583 or email info@govdelivery.com today.
RESOURCES:
Gurin, J. (2013, November 8). Big Data vs Open Data - Mapping It Out. Retrieved October 3, 2014, from http://www.opendatanow.com/2013/11/new-big-data-vs-open-datamapping-it-out/#.VDV_XildWpJ
Fiorenza, P., & Pavia, A. (2014). Innovations That Matter: Examining The Big Data Frontier. GovLoop. Retrieved October 3, 2014, from http://www.govloop.com/resources/innovations-that-matter-examining-the-big-data-frontier-new-govloop-guide/
How Chicago Is Growing Its Open Data Economy. (n.d.). Retrieved October 3, 2014, from http://www.socrata.com/case-study/chicago-growing-open-data-economy/
Sunlight Foundation. (n.d.). Retrieved October 3, 2014, from http://sunlightfoundation.com/
B. Bushey, Personal Interview, October 1, 2014.
Manyika, J. (2013). Open data: Unlocking innovation and performance with liquid information. S.l.: McKinsey & Company.
LEARN MORE: GOVDELIVERY.COM ∙ EMAIL: INFO@GOVDELIVERY.COM
CALL: U.S. (866) 276-5583 ∙ U.K. 0800 032 5769
reachthepublic.com ∙ govloop.com ∙ facebook.com/govdelivery ∙ youtube.com/govdelivery ∙ @govdelivery
©2014 GovDelivery, all rights reserved.