INNOVATIONS THAT MATTER

CONTENTS

THE NEXT FRONTIER OF PUBLIC SECTOR INNOVATION: BIG DATA

Cities as Centers for Innovation: Louisville, Ky.
Building the Data-Driven Culture: LouieStat
Data Analysis Improves Emergency Health Services
The New Normal: How Data Drives Your Mission Forward
The U.S. Army's Big Data Problem: A Sea of Databases
The EMDS Solution: Rapid, Holistic, and Visual
EMDS as an Evolving Process
A Big Data Imperative: Having a Flexible Data Infrastructure

MAKING BIG DATA STICK AT YOUR AGENCY

Understanding Government's Big Data Challenges
Our 8 Best Practices for Big Data Analysis
How In-Memory Computing is Changing How We Think About Big Data
GovLoop's Big Data Cheat Sheet
About GovLoop
Acknowledgements
BIG DATA
THE NEXT FRONTIER OF PUBLIC SECTOR INNOVATION
Although we are living in historic times, today's digital age holds many parallels to the Industrial Revolution. That era saw countless breakthroughs in manufacturing, the creation of new products, workforce changes and movements to form unions to protect workers' rights. All of these advancements were shaped by the technological advances of the 18th and 19th centuries. We also witnessed the development of new government regulations and the emergence of new markets, propelling America to the status of global economic leader. Now we are in the midst of another technological era, one that is reshaping the cultural and socioeconomic fabric of society. As we sit on the cusp of remarkable innovations, we must remember that this time, modern innovations are powered by data.

Today, governments are using data to improve our standard of living. This is not only an American phenomenon. Governments worldwide have recognized that by leveraging data, they can improve services to citizens and gain a competitive advantage in a flattening world. This report is part of GovLoop's "Innovations That Matter" series, in which we explore top innovations and how to make them stick in your agency. We'll explore how organizations are using big data technologies and solutions to transform their organizations. We've talked with thought leaders and experts in government and industry to understand the best practices for leveraging big data as a mission-centric tool that drives innovation for the public sector. GovLoop reports are designed to empower you and give you the tools to excel with big data. Here's what you'll find in this report:
• A local government spotlight showing how the city of Louisville, Ky., uses big data to improve performance management.
• A federal government case study highlighting the Army's Enterprise Management Decision Support program.
• Industry insights on the current big data landscape.
• Eight strategies and best practices for smart big data adoption and analysis.
• GovLoop's big data cheat sheet.

Understanding big data will turn data into insights and transform your agency's operations. Now is the time to take charge of your data and learn the best practices and methods to modernize your agency. We understand that innovation is an intimidating word. But we're here to help. Let's start with the effect that big data analysis is having across government.
Understanding Big Data's Effect on the Public Sector // Big data analysis offers public-sector institutions an unprecedented opportunity to transform themselves. But first they have to understand and effectively interpret big data. Many agencies struggle with that because each has defined and invested in big data analysis differently. Across government, agencies need software that can quickly and securely unlock insights from their information. We can start by defining big data as data that arrives in such enormous quantities, at such a fast rate and in such a diverse array of mediums that it is impossible to understand or leverage through traditional methods. Similarly, a big data problem describes a situation in which you are not able to use your data to accomplish a goal or deliver a service because one of those characteristics (volume, velocity or variety) prevents you from capturing its value within a predetermined window of opportunity. This is just a starting point for defining big data. The real issue at hand is understanding what data analysis can do for your agency. For the public sector, the key is to appreciate how big data has the power to drive changes to organizational workflows, improve productivity and efficiency, and reduce costs and overhead.
A recent GovLoop survey of 256 public-sector employees explored the benefits of analytics and big data in government. Respondents cited improved efficiency and productivity, improved decision-making, greater transparency and accountability, help in controlling waste, fraud and abuse, and the ability to manage resources, budgets and costs. The survey also highlighted four common ways agencies have adopted big data programs: performance analytics, predictive analytics, business intelligence and financial analytics.

To fully leverage big data, leaders still face multiple challenges. Respondents noted several in our survey: lack of organizational support, unclear mission and goals, lack of clarity on metrics, data governance and data locked in legacy systems. These challenges are explored throughout this guide, along with strategies and best practices to avoid roadblocks to adopting big data programs.
Respondents also noted additional benefits, such as improved morale and communication between groups. One respondent said big data analytics "shows the best return on investment, benefits all stakeholders and drives innovation." The message is clear across government: To understand the power of big data, agencies need to invest the time to define, learn from and capitalize on their information. This report is a first step toward encouraging the sharing of best practices and use cases across government. To start, we learned some remarkable insights about performance measurement from the city of Louisville, Ky.
CITIES AS CENTERS FOR INNOVATION: LOUISVILLE, KY.

Cities are already complex ecosystems to manage. Today, local government leaders are faced with dwindling budgets, deteriorating infrastructure and the need to attract and retain business for economic growth. We can confidently say that local government administrators are challenged like never before. Yet many governments are moving quickly to transform their cities into centers of innovation, reigniting service delivery by capitalizing on their most important commodity: data.
Faced with unprecedented challenges, cities no longer have the option to do business as usual. In Louisville, business is far from usual. Led by Mayor Greg Fischer and the chief of the Office of Performance Improvement (OPI), Theresa Reno-Weber, Louisville has positioned itself to use data in transformative ways. Through their vision and leadership, Louisville has become rooted in data and offers a framework for data innovation that other municipalities can follow.

Louisville leaders have defined data as an imperative for innovation and as the engine for economic growth. As a pledge of the city's commitment, Fischer signed an executive order mandating that data is open by default. The policy covers many of the Open Data Policy guidelines from the Sunlight Foundation, a nonprofit working toward greater government openness and transparency. When the executive order was announced, Louisville became one of the first municipalities to codify the "open by default" provision.

"City government has so much data across so many different data systems we are missing opportunities," Reno-Weber said. "We're missing synergies between some of our departments and missing clues into information that can predict issues that we will need to address."

That challenge has led the city not only to improve data sharing across agencies, but also to create open-data strategies. In doing so, the city has created a platform for civic innovation for citizens, businesses and developers. "We are excited about the potential for both innovation and economic growth as it relates to making data more open and available to the public," Reno-Weber said. "We've already seen some of that with our open-data push."

In this section of our report, we highlight two innovations that matter from Louisville: the LouieStat program and how the city has leveraged data to improve medical response times.
Building the Data-Driven Culture: LouieStat // One of the city's premier data initiatives has been the LouieStat program, which launched in 2012. Administered by Reno-Weber, it was designed to help citizens understand the effectiveness and efficiency of government programs by providing easy access to answers to three fundamental questions:
• What are the key services Metro Government performs?
• How well is Louisville Metro Government performing?
• How can Louisville Metro Government perform better?
The program brings Louisville city leaders together to discuss consistent metrics and benchmark success within their departments. These discussions draw on various data streams and have built a culture of continuous improvement. "There have been a lot of opportunities that we have identified to eliminate waste and improve the tactical way we deliver services for citizens in the city," Reno-Weber said.
On LouieStat, the data that is aggregated includes:

• Hours not worked
• Dollars spent on overtime
• Overtime hours paid
• Hours lost due to work-related illness and injury
• OSHA recordable injury rate
• Employees with high sick leave consumption

One example is the work that has been done on Louisville's emergency management services. (See below.) Another comes from the Public Works and Assets Department, whose mission is to deliver superior customer service; efficiently manage solid waste and recycling; proactively maintain and enhance city facilities, assets, infrastructure and the Metro fleet; initiate and support progressive environmental and energy conservation programs; and champion innovative business practices.
LouieStat also pulls data from the Office of Management and Budget, the Louisville Zoo, Human Resources, and Public Health and Wellness. The resource provides a snapshot of the city's performance and links together data from various sources to improve accountability and transparency. OPI staff meets regularly with department leadership teams to define and assess key performance indicators. This focus lets decision-makers know what results they are trying to achieve and how to measure success. Next, OPI works to develop benchmarks and spot areas for improvement. Each key metric is routinely evaluated to track whether it has met its goal, is approaching its goal or is off goal. Additionally, every six to eight weeks department leads provide a report to the mayor that highlights performance. The initiative was inspired by the work of Baltimore's CitiStat and Maryland's StateStat. Reno-Weber shared one specific example of success from LouieStat, highlighting work done to improve emergency health services, described below.
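To make the "met goal / approaching goal / off goal" evaluation concrete, here is a minimal sketch of how such a check might be scripted. The 90 percent "approaching goal" threshold and the sample metrics are illustrative assumptions for demonstration, not Louisville's actual rules or data.

```python
# Illustrative sketch of a LouieStat-style KPI status check.
# The 90% "approaching goal" threshold and the sample metrics
# are assumptions for demonstration, not Louisville's rules.

def kpi_status(actual, goal, higher_is_better=True, near=0.9):
    """Classify a metric as 'met goal', 'approaching goal' or 'off goal'."""
    ratio = actual / goal if higher_is_better else goal / actual
    if ratio >= 1.0:
        return "met goal"
    if ratio >= near:
        return "approaching goal"
    return "off goal"

metrics = [
    # (name, actual, goal, higher_is_better)
    ("On-time trash pickup rate (%)", 94.0, 95.0, True),
    ("Overtime hours paid", 1200.0, 1000.0, False),
]

for name, actual, goal, better in metrics:
    print(f"{name}: {kpi_status(actual, goal, better)}")
```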
Data Analysis Improves Emergency Health Services // Every municipality is faced with the challenge of improving emergency health services. Local governments invest significant time in understanding how to build more efficient emergency relief systems without increasing budgets. Reno-Weber's team decided to explore how to reduce the time emergency medical services (EMS) staff spent at the hospital after dropping off a patient. Any reduction would increase efficiency and allow staff to deliver emergency health services more effectively. "One of the things we were looking at was: How do we get more efficient use out of the ambulances that we already have?" said Reno-Weber. With large expenses tied to increasing ambulance fleets and staffing, her team was tasked with finding an innovative way to increase ambulance pick-ups. This was no easy task, since there is a series of protocols that EMS staff must follow when they arrive at the hospital. They must fill out paperwork, clean and disinfect the ambulance, and prepare it for the next departure. In a metro area as large as Louisville, every second counts to get EMS back on the road and assisting citizens in need of health services.

To discover new efficiencies, the city decided to observe the drop-off process at one of the busiest hospitals in Louisville. The team took notes and collected data, breaking down each process step. "We put someone in the emergency room tracking the time of the crews and figuring out what the different process steps were," said Reno-Weber. Within the first three months of observing and analyzing the process, EMS drop-off times decreased by five minutes. "We said, 'There's definitely an opportunity to improve,'" Reno-Weber said. After analyzing the data, the team put a new standard in place: Ambulances must leave the hospital within 30 minutes of dropping off a patient, and if that time needs to be extended, crews must notify a supervisor.
Reno-Weber's team conducted the study between September and December 2013. Today, crews are leaving the hospital in 30 minutes or less 90 percent of the time. "That is basically the equivalent of placing two additional ambulances on the street," she said. Compared with the same time frame in 2012, the city of Louisville was able to deliver 18,000 more patients to the hospital. This is an example of how a commitment to understanding data can improve services within a city government. The Louisville case study shows the power of understanding data and turning it into insights. Powered by data, Louisville has improved transparency, citizen engagement and the quality of the services it provides. You can follow Theresa Reno-Weber at @RenoWeber on Twitter.
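A back-of-the-envelope version of that 30-minute analysis might look like the sketch below. The timestamps are invented for illustration; real figures would come from EMS run logs.

```python
# Sketch of the 30-minute drop-off analysis described above.
# Timestamps are invented; real data would come from EMS run logs.
from datetime import datetime

STANDARD_MINUTES = 30

drop_offs = [
    # (arrived_at_hospital, departed_hospital)
    (datetime(2013, 9, 3, 14, 5), datetime(2013, 9, 3, 14, 31)),
    (datetime(2013, 9, 3, 16, 40), datetime(2013, 9, 3, 17, 22)),
    (datetime(2013, 9, 4, 9, 12), datetime(2013, 9, 4, 9, 38)),
]

durations = [(out - in_).total_seconds() / 60 for in_, out in drop_offs]
compliant = sum(1 for d in durations if d <= STANDARD_MINUTES)

print(f"Average turnaround: {sum(durations) / len(durations):.1f} min")
print(f"Within {STANDARD_MINUTES} min: {100 * compliant / len(durations):.0f}%")
```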
Learn more at www.cloudera.com
The New Normal: How Data Drives Your Mission Forward // Today, government agencies are looking at new ways to transform themselves through big data analysis. As Joey Echeverria, chief architect of Cloudera Government Solutions, said, "Big data presents a lot of opportunities to change the way government agencies and organizations manage their business." For government, big data is forcing leaders to think about data analysis as a way to move their mission forward, rather than as a by-product of their service delivery efforts. This is a new way of thinking. As Echeverria noted, "Government never really looked at data as something core to what it is trying to accomplish; that is the real thing that big data is changing. It's giving governments an opportunity to see that they can integrate data and information into their everyday processes." Integrating data into process is essential for agencies, facilitating new insights, findings and better decision-making. To help agencies integrate data and information, Cloudera powers the enterprise data hub: one place to store all data, for as long as desired or required, in its original fidelity; integrated with existing infrastructure and tools; and flexible enough to run a variety of enterprise workloads with the robust security, governance, data protection and management that organizations require. The advantage of an enterprise data hub is that you can bring together any kind of data and access it in ways that make sense for your agency. It purposely tries not to be prescriptive about how you will end up accessing your data. This means you end up with the right techniques and framework for your specific needs, without being locked into a predefined solution. Another key benefit is the ability to centralize your data management strategies.
Having data in a central location provides many benefits, especially for accessing information and collaborating across your agency. "Having one place to store all your data is going to be far more inexpensive than duplicating your data and storing it across numerous tools," Echeverria said. "And that's really where our platform and offerings differ from what's been available in the past." Centrally storing data allows organizations to be not only fully integrated, but also information-driven. By capturing data from web portals, websites and the various other ways government provides services, agencies are creating a framework to understand citizen needs. Capturing that data is just the first step; the second is using that information to improve service delivery. "If you are providing services to your citizens through portals or websites, you should be harvesting the logs and the interactions that those websites generate. This information should flow back into feedback on how you model and run your agency," said Echeverria. Like many technology adoption projects, the IT deployment is only one stage. To succeed with big data, organizations need to think about changes to culture. Cloudera provides professional consulting services to help meet these objectives. "We work with our customers and clients so that they learn our best practices, not just for how to use technology, but how that technology can be applied to their specific problem," said Echeverria. Big data takes time and commitment. "A best practice is don't go into a big data project with the idea of revolutionizing your agency on day one," said Echeverria. Government leaders must build consensus and support, and cast a vision of how big data can affect their agency.
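As a minimal sketch of the log-harvesting idea Echeverria describes, the snippet below counts which services citizens request most, and where they hit dead ends, from web server access logs. The log lines, URL paths and common-log format shown here are assumptions for illustration, not any agency's actual data.

```python
# Minimal sketch of harvesting web logs for service feedback, per
# Echeverria's suggestion. The log lines and URL paths are invented;
# a real deployment would read from the agency's actual web servers.
import re
from collections import Counter

sample_log = """\
203.0.113.7 - - [12/May/2014:10:02:11 -0400] "GET /permits/apply HTTP/1.1" 200 5120
203.0.113.9 - - [12/May/2014:10:03:40 -0400] "GET /pay-water-bill HTTP/1.1" 200 2048
203.0.113.7 - - [12/May/2014:10:05:02 -0400] "GET /permits/status HTTP/1.1" 404 512
"""

request_re = re.compile(r'"(?:GET|POST) (?P<path>\S+) HTTP')

hits = Counter()
not_found = Counter()
for line in sample_log.splitlines():
    m = request_re.search(line)
    if not m:
        continue
    path = m.group("path")
    hits[path] += 1
    if " 404 " in line:          # pages citizens looked for but missed
        not_found[path] += 1

print("Most requested services:", hits.most_common(3))
print("Pages citizens could not find:", dict(not_found))
```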
The U.S. Army’s Big Data Problem: A Sea of Databases
As mentioned in the introduction, big data problems usually involve more than just juggling petabytes of data. In its most essential form, a big data problem is the inability to fully capture the value of data within a predetermined window of opportunity, most often due to the size, speed or format of the data collected. The U.S. Army had one such problem. Its inventory of information technology systems numbers more than 3,600, many with unique access requirements. These systems are used to track 10,000 operating units comprising more than 1.1 million soldiers. They also track millions of pieces of equipment -- from tanks to batteries -- and thousands of training events. The Army's data infrastructure is large, complex and dispersed among stovepiped and legacy systems, and it operates under various data owners and governance rules. As a result, the process of gathering data to inform deployment decisions and assess unit readiness was cumbersome and time-consuming. It also meant that leaders were basing their decisions primarily on information within their particular specialty area, often missing the rich layer of supplemental information available in other systems.

The response to this problem was the Army's Enterprise Management Decision Support (EMDS) system, an initiative designed to serve as a decision support tool for senior leaders. Its goal is to provide a single access portal for culling pertinent data from key databases. "What we do is reach across those systems, find the most relevant information for our customers and bring that into one place," EMDS Division Chief Lt. Col. Bobby Saxon said. "So it is one username, one password, one location to go to, so they can see critical information about personnel, on readiness, on equipment, on installations and on training." In this way, EMDS serves as both an aggregator and curator of information.
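In spirit, the aggregation EMDS performs resembles the following sketch: one query fans out across stovepiped systems and returns a single merged view keyed by unit. The system names, fields and values are hypothetical stand-ins for illustration, not the Army's actual schemas.

```python
# Conceptual sketch of EMDS-style aggregation: one query fans out
# across stovepiped systems and returns a single merged view.
# System names, fields and values are hypothetical stand-ins.

personnel_db = {"1-502 IN": {"assigned": 620, "authorized": 650}}
equipment_db = {"1-502 IN": {"vehicles_ready": 0.88}}
training_db  = {"1-502 IN": {"last_gunnery": "2013-11-04"}}

def unit_snapshot(unit_id):
    """Merge one unit's records from every source into a single view."""
    snapshot = {"unit": unit_id}
    for source in (personnel_db, equipment_db, training_db):
        snapshot.update(source.get(unit_id, {}))
    return snapshot

print(unit_snapshot("1-502 IN"))
```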
The EMDS Solution: Rapid, Holistic, & Visual // Now that the EMDS system performs the front-end work of piecing together data from disparate database systems, the process of leveraging the data is more streamlined and timely. However, speed was not the primary motivation for EMDS. Saxon noted that making decisions under very aggressive time-frames is simply par for the course in the Army. Instead, EMDS provides a crucial additional layer of understanding to this existing capability.
"It takes not only the information about the personnel community, but it lays on top of that the information for the equipment or the training," he said. "Then, we array across the Army's Force Generation model [the Army's equivalent of a supply-chain management system], so they see much more of the picture." In other words, decision-makers gain a much broader, more comprehensive understanding of a given situation, which allows them to see the wider context in which they are making their decisions. EMDS provides a holistic view of the situation while also giving access to granular details in relevant specialty areas when necessary.

Given these dual, somewhat opposing objectives, visualization was an integral design component. Before EMDS, information came in a variety of forms, such as spreadsheets, charts and graphs. Multiply that output by the multitude of different systems accessed, and users were left with the formidable task of managing an unwieldy collection of information. By contrast, EMDS has a number of dashboards specifically designed to illustrate the situation in a friendlier, easy-to-understand format. Perhaps more importantly, this format is now standardized across the enterprise, giving users in different areas access to the same information in a common format. Finally, EMDS provides users with the opportunity to discover new or previously unknown pieces of information, adding a further layer of comprehension. The goal is to ensure that users can focus on their primary objectives, rather than piecing together ad hoc reports themselves.
EMDS as an Evolving Process // When the EMDS program launched in 2008, the Army encountered a few challenges. For one, the surrounding big data community, both proprietary and open source, was nowhere near as advanced as it is today. More important, however, were the cultural barriers to sharing information across the enterprise. "There was not a clear understanding from stakeholders about how valuable all this information in one place could be," Saxon said. "We still deal with some of these challenges today. Information is king, and if you are someone who has information that other people may not have, it may give you a bit of a leg up." The Army is still working through these cultural barriers, but the progress thus far has been transformative. "Senior leaders are continuing to push that all data is the Army's data and that it should be there for all of us to use to make decisions," Saxon said.
EMDS has also evolved from providing only a near-real-time snapshot of the present into a product that offers historical and projected pictures, with the foundation in place for a planned predictive analytical component. "The mind-set now is the belief that we have access to nearly everything. Now people are starting to think, 'Well, where is the data? How do I get my hands on it?' Previously the thinking was that there was no way we could even know what that information is," Saxon said. Thus, EMDS has been a revolution not only in IT business process flows but also in organizational thinking, which is the hallmark of a successful big data solution.
MarkLogic is the proven platform for government big data applications designed to drive real-time analysis, situational awareness and service delivery. For over a decade, MarkLogic has delivered a powerful, agile and trusted Enterprise NoSQL database that enables government agencies to turn all data into valuable and actionable information. Key features include ACID transactions, horizontal scaling, real-time indexing, high availability and disaster recovery, government-grade security, and more. Organizations around the world rely on MarkLogic's enterprise-grade technology to power a new generation of information applications.
For more information, please visit www.marklogic.com.
A Big Data Imperative: Having a Flexible Data Infrastructure // Discussions about big data in government are generally dominated by the volume, velocity and variety of the data our agencies produce today. What is often missing is a matching discussion about the changing requirements for the agency's data infrastructure, which is just as essential to improving service delivery as the data itself. Kevin Shelly, vice president of MarkLogic Public Sector, spoke of the changing data landscape: "The traditional way to store operational data is in relational database systems, which have been around for 30 years. While they are very good at what they do, their specialty is transactional processing on structured data, and they haven't really changed with the times." Today, government agencies understand the opportunity that big data presents. Many are looking to capitalize on the various data streams to which they have access, whether social media, video, photos, documents, mobile or other forms of structured and unstructured data. It is this mix of unstructured and structured data that presents a challenge for organizations: How can we glean new insights from it? Governments also recognize that it is no longer an either-or proposition; they need to capitalize on both forms of data to drive innovation and meet mission needs in new ways. Shelly notes, "Agencies need a database that can link structured and unstructured data so you can leverage 100% of the data that is out there to accomplish your mission or get closer to your constituents." To find value in structured and unstructured data, agencies must invest in new-generation databases. For many, this means looking at NoSQL ("Not Only SQL") databases. Relational databases were developed primarily for structured data, while NoSQL allows much more flexibility and agility to collect, process and analyze a wide variety of data sources. Unlike relational database systems, NoSQL databases also let you ingest data without spending time and money up front developing a data model, so you can develop and deploy systems much more quickly. That reduced time to value is just one advantage of the NoSQL approach. "I think there will be a very large shift towards NoSQL databases," said Shelly. Ultimately, MarkLogic provides a richer experience, leading to improved mission capability and enhanced services to constituents. "MarkLogic can operate on premise or in the cloud, ingest all data types, provide extensive search capabilities, and deliver the data to various devices, whether it's a mobility platform, laptop or desktop computer," said Shelly.

For many government agencies, now is the time to start investing in big data and thinking about how data can drive a more efficient and effective agency. Data will continue to grow at a rapid pace, and as Shelly observes, "If the haystack keeps getting bigger, it just takes more work to find the needle." Now is the time for smart investments, such as an enterprise-grade NoSQL database with integrated search and application services that can handle all of the data, not just some of it. Yet navigating the big data frontier does not come without challenges. Agency leaders must navigate the cultural changes that accompany new technology. As Shelly noted, "Most of the challenges are cultural; the MarkLogic technology is trusted, proven, and innovative. Big data represents a cultural change, where people need to look at things differently and do things differently than their predecessors." This can mean everything from a new business process to changed workflows to a new way of delivering services. Now is the time to invest and re-imagine how services are delivered and missions accomplished. That process can start with assessing your data infrastructure and ensuring you have the right database to support your agency's needs.
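The schema-on-read flexibility Shelly describes can be illustrated with a toy document store: records of different shapes are ingested as-is, with no up-front data model, and structure is applied only at query time. This is a conceptual sketch in plain Python, not MarkLogic's API; the documents are invented examples.

```python
# Toy illustration of schema-on-read ingestion: documents of
# different shapes go in as-is, with no up-front data model.
# Plain Python for illustration only, not MarkLogic's API.

store = []  # the "database": a bag of heterogeneous documents

store.append({"type": "permit", "id": 101, "status": "approved"})
store.append({"type": "tweet", "user": "@citizen", "text": "Pothole on 5th St"})
store.append({"type": "photo", "file": "bridge.jpg", "tags": ["inspection"]})

def search(predicate):
    """Schema-on-read query: apply structure only when you ask."""
    return [doc for doc in store if predicate(doc)]

# Find every document that mentions infrastructure, whatever its shape.
mentions = search(lambda d: "Pothole" in d.get("text", "")
                  or "inspection" in d.get("tags", []))
print(mentions)
```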
MAKING BIG DATA STICK AT YOUR AGENCY

Understanding Government's Big Data Challenges // Before we get to the best practices for big data adoption, it's important to set a foundation. From our research, we've identified four challenges and important trends to be aware of as you begin to deploy big data programs.
1. Large Capital Investments The primary big data challenge for governments is that most agencies have spent an enormous amount of capital on existing IT infrastructure, most of which was implemented long before big data was even on the radar. This often means that governments have the ability to capture large quantities of data, but no feasible way to process it in a timely manner. In other cases, these systems were not built for the variety of data sources currently available for capture.
2. Data Rests in Silos Perhaps most crucially, many systems exist in their own data silos, without a clear method for extraction and integration. This challenge is coupled with the fact that investing in big data solutions can mean adopting technologies that are potentially expensive, untested, less secure and developed by a third party outside the government agency.
3. Finding the Needle in the Haystack Governments collect so much data -- and have such a variety of sources and mediums to choose from -- that it is difficult to pinpoint value. Even in the private sector, it is common for corporations to find the majority of value in a minor percentage of the total data they collect. There is therefore a real risk of investing in an analytics apparatus that collects and processes redundant sources of information, especially since existing systems operate in isolation. At the same time, requirements and use cases may evolve, and new capabilities may unlock value that was previously out of reach. The challenge for governments is to figure out the smartest, most efficient way to get the most out of their investments.
4. No One-Size-Fits-All Solution There really is no one-size-fits-all big data solution, just as the challenges and opportunities for data use vary for each agency. This presents potential adopters with the challenge of finding the right solution with the right attributes while dealing with limited capital and space to experiment.
Our 8 Best Practices for Big Data Analysis // Given these challenges, we have identified eight best practices to help you capture the most value while expending the fewest resources.

1. Executive Leadership The shift from traditional data analytics to big data may require more of a revolution in thought and organization than in the technical solution itself. For one, big data operations cannot exist within a silo. They must be integrated across the enterprise, with a common design layout and operating procedures, as the EMDS example illustrates. This requires leadership and vision that go beyond individual use cases to a platform for future integration at all entry points. Additionally, there may be friction among internal units that must be addressed for proper institutional cohesion. The Army's senior leaders are still transforming the culture to ensure all data is shared across the enterprise, which is vital to the success of EMDS. Finally, there will be costs -- financial, time, labor -- so any successful implementation will require a leader to ask difficult questions to ensure the investment is worth the cost and to see the process through periods of complication or difficulty.
2. Business before Treasure: Use Cases Come Before Technical Solutions The key here is that the technical solutions should be matched to business requirements and not the other way around. It is very easy to become seduced by the technology before clearly defined use cases have been found. This places the organization at risk of wasting valuable time and resources implementing a broad program with only marginal value added. Therefore, it is vital that data owners consult with stakeholders, users and executive leaders to come up with very specific business requirements or problems.
“Technology is important, but the people are critical,” Saxon said. It is also important that program managers integrate their plans into the larger strategic objectives for the organization. “A best practice and lesson learned from Louisville is that [big data] needs to be tied to your strategic vision or your strategic goal or whatever it is you are trying to accomplish in your area,” Reno-Weber said.
3. Know Thyself: Define Initial Use Cases With the emphasis now on potential use cases, you must clearly establish how big data will specifically address your data problems. Think of the three Vs mentioned earlier in this guide: volume, velocity and variety. There are different solutions for each of these problems, so the best practice is to find your own definition of what a big data solution means to your agency. "Ours isn't as much a volume problem as it is a 'difficult to understand' problem," Saxon said. "It is the ability to answer questions that were previously beyond reach." The Army peeled away the broader, general definitions of a big data solution and crafted its own based on its specific needs. But how is this done? First, clearly establish your realization criteria (such as reduced transaction times), business requirements (such as security requirements) and performance metrics. It is better to start small and specific, building on your initial deployment, than to try to capture everything at once.
"The questions you ask matter. There is a lot of data in city government and in government in general. You could run yourself ragged trying to analyze all of it and make it all good quality," Reno-Weber said. The key is to identify a clear project or goal, such as Louisville's goal of reducing ambulance time at the hospital. Although every agency's requirements are different, you can learn from others' experiences to identify where big data solutions may be applicable to your organization.
4. Leverage Existing Resources & then Augment In most cases, good big data solutions are built on an already robust IT infrastructure. This allows organizations to initially make relatively narrow and precise upgrades to address big data problems without remaking their entire technological enterprise. It is helpful to think of this process like upgrading components on your personal computer and analyzing them for efficacy with each phase. This is in contrast to replacing your computer each time a new innovation in processing or storage is released. “Don’t be limited by thinking big data requires a huge infrastructure and a lot of specialized tools,” Saxon said. “Start a database. Start accumulating some of the data. You could even use Microsoft Excel to start doing data analytics to move down that path. You will find nuggets of information that will help you sell your story and help you keep moving forward.”
Similarly, any new augmentation should be able to do two things: address the immediate data problems and be easily scalable to include future requirements for later phases.
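Saxon's "start a database" advice can be taken quite literally, as in the sketch below: Python's built-in sqlite3 module is enough to begin accumulating and querying data with no new infrastructure. The table and sample rows are invented for illustration.

```python
# Starting small, per Saxon's advice: Python's built-in sqlite3
# is enough to begin accumulating and querying data with no new
# infrastructure. The table and rows here are invented examples.
import sqlite3

conn = sqlite3.connect("agency_metrics.db")
conn.execute("""CREATE TABLE IF NOT EXISTS service_requests (
                    received DATE, category TEXT, days_to_close REAL)""")
conn.executemany(
    "INSERT INTO service_requests VALUES (?, ?, ?)",
    [("2014-05-01", "pothole", 6.5),
     ("2014-05-02", "streetlight", 2.0),
     ("2014-05-02", "pothole", 9.0)],
)
conn.commit()

# Even one query can surface a "nugget" worth acting on.
for row in conn.execute("""SELECT category, AVG(days_to_close)
                           FROM service_requests GROUP BY category"""):
    print(row)
conn.close()
```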
5. Integrate Legacy Systems The vast majority of systems currently operating in government were not calibrated to handle big data. As a result, it may be difficult to quickly process or extract pertinent data. These systems also may not be equipped to handle unstructured data such as video feeds or social media. This makes augmentations and upgrades a near necessity. However, integrating older systems is essential to the enterprise-wide analytical power big data provides. Extra time and consideration must be built into program planning to allow for proper field-matching techniques, data integration and processing output times. Reno-Weber said Louisville has a number of systems for fire, health, 311, finance and human resources, and her teams understand the importance of sharing and connecting these various data streams. "You see the power of the data when it is put in the hands of the users and the operational teams," she said. "We ask, 'How can we get our departments to be able to access their own data and use insights to best allocate resources?' You can't do that if only one or two people can get to information."
6. Carefully Select Your Vendors/Architecture The explosion of advancements in execution strategies and technological solutions makes this a unique time in the relatively short history of big data. In contrast to just a few years ago, data owners now have a wealth of options to build out their big data infrastructure, from competing vendors to a range of architectural options. Therefore, it is vital that you carefully select a solution -- and a partner -- that best matches your needs. In this case, it is best to slow down to speed up. Taking your time at this critical juncture will ensure future phases do not require large-scale redesigns.
7. Focus on Governance and Data Quality Reno-Weber reminds us of the importance of data quality and governance strategies: "Data is not always going to be perfect, but that should not stop you from analyzing and working it, because the minute you start doing that is the moment data quality starts to get better." She also notes the importance of consistency in how data is entered. "The challenge that I think a lot of us see is that bad information in equals bad information out," she said. "A lot of the work that we have done over the last two years has been to get the quality of the input data as good as possible, so what we are analyzing on the back end is as valid as possible."
8. Have a Deep Bench: Train Employees to Serve in Support Roles Data is changing the roles and responsibilities of the future workforce. Having only a few employees who know how to derive insights from data will not work. Instead, the capability must extend across the agency. "You need to have a deeper bench as it relates to people who can access and analyze big data in your organization," Reno-Weber said. Agencies must place an emphasis on training and educating staff at all levels.

Leaping into the world of big data, however you choose to define it, is not a casual or easy process. Following the best practices presented here will help you take the right first steps toward capturing value from your data, which is vital for success. Each step builds on the last, and careful planning and execution on the front end can make all the difference on the other side.

Big data is changing the way our government operates and is transforming communities. For public-sector leaders, now is the time to understand how to leverage data in new and transformative ways. There's no denying the power of data. Our report has given you access to case studies, lessons learned and best practices from big data leaders. Now, it's up to you to take this information and bring your big data knowledge into your agency. GovLoop is always here to help. Send us your questions, tell us your case studies and let's work in partnership to transform the public sector.
How In-Memory Computing is Changing How We Think About Big Data // Government has been tasked to modernize in a dynamic, quickly changing landscape. While most agencies have recognized the need to leverage their data in transformative ways, they still face challenges in building a sustainable infrastructure for the future. "With shrinking budgets, agencies are trying to find ways to modernize, but also be more cost effective. One critical way to accomplish this goal is for more agencies to realize that there are technology developments that will enable them to scale up and out while simultaneously leveraging their existing infrastructure," said Michael Ho, vice president for Software AG Government Solutions. One such development is in-memory computing. "In-memory computing is a transformative technology that is already becoming a core component of high-performing applications within government, and with great results," said Ho. In-memory computing is about achieving massive scale within your existing infrastructure while supercharging the speed of your enterprise applications so you can tackle big data quickly. Fundamentally, the challenge for any database environment today is that, even with the best-built applications, the amount of data each application user carries is growing exponentially. This taxes your database: every time users do anything, your application must make a round-trip to the database, slowing it down and leaving users waiting. So what happens? Organizations feel forced to keep buying more servers and database licenses, which is costly. In-memory computing moves agency data sets into the application's memory, which means less work for your database and more speed for your users. Ho said, "What it allows you to do is take any size of data, whether it's massive amounts or even smaller amounts, and move that data closer to where the user ultimately consumes it. This eliminates the need to wait for the transaction gaps you get with normal databases. What took hours or several minutes can take microseconds with in-memory computing."

Ho believes in-memory computing is the next evolution of data storage and processing technology. "In-memory allows us to take all the systems of record that you have in place, whether that be your Hadoop cluster, data warehouse or existing databases and applications, and speed them up to help you meet the massive scale of data and demand that you are going to receive from your end users and enterprise," said Ho. Modernizing does not need to be a scary proposition for agencies. Often, it is as much about learning to leverage existing technology as it is about investing in new technology. Ho noted the benefits of in-memory computing for capitalizing on data: "That's where in-memory computing is really great; it allows you to complement your existing systems and blend the speed and scale to your unique infrastructure." For public-sector organizations, in-memory computing means you do not have to start building from the ground up. "You don't need to re-design your entire strategy from square one. It's about taking your existing investments and making them more valuable by adding the proper pieces to them," said Ho. Ho believes in-memory computing is changing how organizations think about big data. "Big data as a notion is going to be more than just having terabytes of data. It's going to be about helping folks make the decisions they want to make, by enabling them to see and analyze the data they have, in the time window of relevance that's optimal. That's where these technologies are focused," said Ho. New technologies will continue to emerge that help organizations capitalize on their data. The future of big data is not about the size of data; it's about the speed and processing needed to maximize the value of the information collected.
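The round-trip problem Ho describes is easy to see in miniature: a read-through cache keeps hot data in application memory so repeated reads skip the database entirely. The sketch below is a generic illustration of the concept, not Software AG's product; the simulated latency and key names are invented.

```python
# Generic illustration of the in-memory idea Ho describes: a
# read-through cache keeps hot data in application memory so
# repeated reads skip the slow database round-trip entirely.
import time

def slow_database_read(key):
    """Stand-in for a database round-trip (simulated latency)."""
    time.sleep(0.05)
    return f"record-for-{key}"

cache = {}

def read(key):
    if key not in cache:          # miss: pay the round-trip once
        cache[key] = slow_database_read(key)
    return cache[key]             # hit: served from memory

start = time.perf_counter()
for _ in range(100):
    read("unit-42")               # 1 miss, then 99 in-memory hits
elapsed = time.perf_counter() - start
print(f"100 reads in {elapsed * 1000:.0f} ms (uncached: ~5000 ms)")
```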
GovLoop's Big Data Cheat Sheet // Looking to get smart fast on big data? Look no further.

What is Big Data? Big data is characterized as data that arrives in such enormous quantities, at such a fast rate and in such a diverse array of mediums that it is impossible to understand or leverage through traditional methods. Similarly, a big data problem describes a situation in which you are not able to use your data to accomplish a goal or deliver a service because one of those characteristics (volume, velocity or variety) prevents you from capturing its value within a predetermined window of opportunity.
Why Does Big Data Matter? Data is at the epicenter of the next technological revolution. The organizations best poised to thrive are those that treat their data as a valuable commodity – a raw resource, similar to petroleum or a precious metal. Big data solutions provide unique opportunities to turn these raw resources into key insights and ultimately transform the way your agency does business.
How Can I Start My Big Data Journey? // These statistics illustrate how important big data is:

• The Obama administration is investing $200 million in big data research projects. (Source)
• Research firm IDC estimates that by 2020, business transactions on the Internet will reach 450 billion per day. (Source)
• Ventana Research found that 94 percent of Hadoop users perform analytics on large volumes of data not possible before, 88 percent analyze data in greater detail and 82 percent can now retain more of their data. (Source)
• More than 5 billion people are calling, texting, tweeting and browsing on mobile phones worldwide, according to a white paper from SAS, a business analytics software and services firm. (Source)
• Poor data across businesses and the government costs the U.S. economy $3.1 trillion a year. (Source)
• By 2018, the United States could have a shortage of 140,000 to 190,000 people with deep analytical skills to fill the demand of big data jobs, according to research firm McKinsey Global Institute. (Source)
See more data by visiting: http://wikibon.org/blog/bigdata-statistics/
There are a number of significant challenges to implementing big data solutions in the public sector, such as prior investments and the fact that any big data solution requires customization to fit each agency's specific needs. Here are eight best practices to overcome them:
• Executive leadership: Effective program stewardship begins with strong executive leadership and support.
• Business before treasure: Identify use cases and associated business requirements before building your technical solution.
• Know thyself: Define a specific big data problem and solve it before ramping up throughout your enterprise.
• Strong foundation: It is essential to have a robust existing IT infrastructure. Leverage existing resources and then augment them, being careful to allow room for scalability.
• Integrate with legacy systems: Big data solutions cannot exist in an innovation silo.
• Carefully select your vendors/ architecture: Choose a vendor that best meets your needs and provides desired functionality.
• Focus on governance and data quality: The quality of the data going in determines the quality of the data coming out. Be sure you have a standard approach and are enforcing your governance policies to ensure high-quality data.
• Have a deep bench: Your agency must invest the proper time in training employees across your agency. Having a deep bench ensures that the entire team can step in and assist as needed.
About GovLoop
//
GovLoop’s mission is to “connect government to improve government.” We aim to inspire public-sector professionals by serving as the knowledge network for government. GovLoop connects more than 100,000 members, fostering cross-government collaboration, solving common problems and advancing government careers. GovLoop is headquartered in Washington, D.C., with a team of dedicated professionals who share a commitment to connect and improve government.
Acknowledgements
//
Thank you to Cloudera, MarkLogic and Software AG Government Solutions for their support of this valuable resource for public-sector professionals.

Authors: Patrick Fiorenza, senior research analyst, GovLoop, and Adrian Pavia, GovLoop research fellow

Designers: Jeff Ribeira, senior interactive designer, GovLoop, and Tommy Bowen, junior designer, GovLoop

Editor: Steve Ressler, president and founder, GovLoop
1101 15th St NW, Suite 900
Washington, DC 20005
Phone: (202) 407-7421
Fax: (202) 407-7501