SPONSORED CONTENT
INNOVATION IN GOVERNMENT: EXPANDING THE REACH OF ADVANCED ANALYTICS
Agencies are exploring new ways to leverage the power of data to improve their operations and achieve their missions.
Learn more at carahsoft.com/innovation
CONTENTS
Agencies Map Out an Advanced Analytics Future
Big Data's Untapped Potential
Intelligent Search Creates Connections
Visual Search Software for the 21st Century
Accelerating the Impact of Big Data Initiatives with Data Wrangling
Turning Machine Data into Operational Intelligence
Is Your Database Stuck in the Past?
Data Blending
Wrangling Big Data
Fast, Secure and Ready for Government Clouds
One-on-One with GSA's Kris Rowley
AGENCIES MAP OUT AN ADVANCED ANALYTICS FUTURE
Each new implementation reveals new possibilities for leveraging the power of data.
THE RAPID DEVELOPMENT of the advanced analytics field is being driven in part by technology and in part by imagination. Without a doubt, the last several years have brought significant leaps in the capabilities of tools and methodologies for capturing and analyzing massive amounts of data of all types. Equally important, however, is the manner in which government leaders are beginning to understand the vast potential for advanced analytics in nearly every aspect of government operations and services.

Chief data officers (CDOs) and other data experts increasingly find their task is to help agency leaders imagine what is possible. Instead of saying, "Here's what advanced analytics can do for you," they are saying, "Tell me the problem you want to solve, and let me see how I can help."

"This is a people job, first and foremost," says Daniel Morgan, CDO at the Transportation Department, speaking at a June 30 event organized by the Advanced Technology Academic Research Center. "As a chief data officer, you just can't live behind your desk."

A CDO also can help agency leaders understand how much data they have on hand—and its value. For example, last year, Linda Powell, CDO at the Consumer Financial Protection Bureau (CFPB), had her team create a catalog of CFPB data for the bureau's internal stakeholders.

Rapid Response

At other times, however, the value of data becomes most apparent when the need is most pressing. That is the case right now with the fight against Zika, a mosquito-borne virus that has been spreading across the Americas. Public health officials hope advanced analytics can provide more insight into where and how it is spreading.

One such effort was the Austin Zika Hackathon, held in May. The event, hosted by Cloudera, assembled more than 50 data scientists, engineers, and students from the University of Texas at Austin. The compute power was provided by Wrangler, a supercomputer at the Texas Advanced Computing Center (TACC). During the hackathon, participants looked at how they might combine different data sets—outbreak reports, information on potential breeding grounds, social media feeds—to see what patterns might emerge, according to a TACC report.

"If you can see where all the water sources are and then overlay how the reports of outbreaks are happening, then you can create a model for how it's spreading and how it will spread in the future based on where the water sources are," says Ari Kahn, human translational genomics coordinator at TACC.

The summer flooding in Louisiana also brought a rapid response from data experts. Both NASA and the National Oceanic and Atmospheric Administration already collect and analyze data related to weather. As the situation unfolded, both agencies provided response teams with valuable real-time data analysis. Last year, NASA created a rapid-response team to support emergency responses. This team works with officials at FEMA and other organizations to figure out what types of data might be most useful in a response (see GCN, "NOAA, NASA support flood response with data," Aug. 22).

However, government agencies also are increasingly interested in how advanced analytics could help address larger-scale problems, particularly in the social arena. For example, the Obama administration launched the Data-Driven Justice Initiative in June. It aims to help state and local agencies develop strategies for reducing the number of low-level, low-risk offenders sitting in jail cells. Among other goals, the initiative aims to "[combine] data from across criminal justice and health systems to identify the individuals with the highest number of contacts with police, ambulance, emergency departments, and other services, and, leverage existing resources to link them to health, behavioral health, and social services in the community," the plan states.

More to Come

The recent surge in activity in advanced analytics clearly is due in part to advances in the technology itself, with each new generation of tools adding more capabilities while also growing easier to deploy and use. Expect that pattern to continue.

Another pattern has emerged as well: the "aha" moment. It often seems that once agency leaders see a compelling, real-life example of advanced analytics in action, their imaginations begin running wild.

In a recent report, International Data Corp., a market research and consulting firm, described big data and analytics as "game changers" for government agencies. Agencies, the report states, "need to effectively evolve their big data abilities as these capabilities are becoming increasingly critical to achieving mission outcomes."
ADVANCED ANALYTICS: THE ROAD AHEAD
The field of advanced analytics is likely to expand in ways agencies can’t even yet imagine. Still, new and emerging tools suggest some important new directions.
ENTERPRISE DATA HUBS
When it comes to launching new initiatives, there's no reason to start from scratch. An enterprise data hub provides a reliable platform for managing operations and security of advanced analytics initiatives.

DATA INTEGRATION
Advanced analytic initiatives are often stymied because potentially valuable data is spread across so many different silos. Enterprise-level NoSQL provides a means of storing, querying, and searching a wide array of structured and unstructured data.

LOCATION-BASED INTELLIGENCE
Social media feeds have become important data sources in many advanced analytics initiatives. In many cases, the value of social data increases significantly when enriched with user-defined location data.

OPEN-SOURCE SOLUTIONS
Solutions based on open source provide agencies with a way of building on cost-effective, proven technology without getting locked into proprietary technology.

BIG TEXT
From the early days of big data, experts have always discussed the value of mining text, but early solutions often failed to deliver meaningful results. A new generation of technology and methodology has begun to deliver on that promise.

IN-MEMORY DATABASE
The sheer volume of data involved in today's advanced analytics initiatives can overwhelm traditional compute platforms. An in-memory database overcomes that obstacle, improving performance while reducing the data footprint and hardware and operational costs.

VISUAL SEARCH
Images and video often contain a wealth of information that goes untapped because of the sheer labor involved in sifting through all that material. That is changing, thanks to tools that automate the process of analyzing, indexing, and tagging visual content.

AUTOMATED DATA PREPARATION
The intensive work needed to prepare data for analysis usually requires a data expert, which often creates a nearly insurmountable logjam. The more that process can be automated, the more quickly an agency can get initiatives off the ground.

MACHINE-GENERATED DATA
Many agencies don't realize the value or the volume of data generated by their IT infrastructures. This data can provide unparalleled insight into the performance and security of the IT enterprise.
BIG DATA’S UNTAPPED POTENTIAL
Open source technology provides strong, cost-effective foundation for data analysis.
WILLIAM SULLIVAN, VICE PRESIDENT, PUBLIC SECTOR, CLOUDERA

GOVERNMENT AGENCIES are just now beginning to tap into the vast potential of the data they ingest from multiple sources. They're using it to improve operations and help manage their most pressing problems, from cybersecurity to healthcare and benefits fraud to enhancing citizen services and engagement. With the growing popularity of open source-based tools designed for a broad range of users, agencies can now more easily make data analytics an integral part of their operations.

Examples of advanced data analytics supporting the mission of government agencies include big data being used to predict the spread of the Zika virus, reduce the risks associated with space missions, deter predatory behavior, prevent veteran suicides, and advance precision medicine and genomics. More practically, agencies can use analytics to provide better, faster and more efficient government services while providing opportunities to engage with citizens through social media or similar channels.

Most agencies, however, have only just scratched the surface of data utilization, partly because of organizational and cultural challenges that can get in the way. These include a slow, cumbersome budget and appropriation process that makes acquiring modern technologies challenging, as well as the diversion of resources to the large number of legacy systems that agencies continue to support and fund.

There's also the problem of sheer data volume. A recent study commissioned by Cloudera revealed nearly 40 percent of agency data goes unanalyzed because there is simply too much of it. Agencies lack the systems necessary to gather the data they need and, when they do, the information is no longer timely when it reaches those who need it.

To truly drive effective change, agencies must be able to harness the synergistic potential of social, mobile, data analytics, cloud and the Internet of Things. Without the most effective tools and methods in place, the volume, variety and velocity of this data will ultimately overwhelm them.

There are many ways to handle big data, but open source technologies are an effective solution and a natural fit for agencies grappling with this challenge in a fast, easy and secure manner. And while security remains a primary concern for many agencies, open source technologies, particularly hybrid or commercially backed open source software, often can be made more secure than proprietary counterparts. This can be achieved by integrating the necessary data security controls into the platform in addition to delivering role-based authorization, allowing for secure sharing.

Apache Hadoop is an open source framework for large-scale data processing and analysis, core to Cloudera's products and solutions. It's gaining momentum in government because it facilitates analyzing any type or volume of data. Support for Hadoop also addresses the administration's desire for agencies to share and reuse source code rather than buying duplicative custom software.

More frequently, agencies are recognizing the power of data analytics to support their mission by improving operations and service delivery, despite limited resources. The emergence of more powerful, precise and cost-effective tools has made advanced analytics a viable option for a broad range of potential users—not just data specialists.

It's time for agencies to tap into the full potential of their data so they can make more informed decisions and better respond to the diverse situations they face every day. Hadoop can help them do this in a cost-effective, secure and powerful way.

William Sullivan is Vice President, Public Sector, for Cloudera.
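The Hadoop framework described above is typically used through engines layered on top of it. As a rough illustration only, here is a minimal PySpark sketch of the kind of large-scale log analysis such a cluster supports; the HDFS path, log format, and field positions are hypothetical assumptions, not details from the article.

```python
# Hypothetical sketch: counting failed-login events per source host from raw,
# semi-structured log lines stored on a Hadoop-compatible file system (HDFS).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("agency-log-analysis").getOrCreate()

# Read raw text logs; the path and line layout are assumptions for illustration.
logs = spark.read.text("hdfs:///data/security/auth_logs/*.log")

failed = (
    logs.filter(logs.value.contains("FAILED LOGIN"))         # keep only failure events
        .selectExpr("split(value, ' ')[0] as source_host")   # assume host is the first token
        .groupBy("source_host")
        .count()
        .orderBy("count", ascending=False)
)

failed.show(20)   # top hosts by failed-login count
spark.stop()
```

The same job scales from a sample file on a laptop to the full cluster without code changes, which is the practical appeal of running analysis where the data already lives.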
The importance of data-driven government Enhanced data insight has become critical to the success of public sector agencies. A modern data analytics platform from Cloudera enables government organizations to more effectively address their mission by using data to drive highly targeted decision making across the enterprise.
Carahsoft.com/innovation/Cloudera-analytics
INTELLIGENT SEARCH CREATES CONNECTIONS
New application applies existing text analytics to take a predictive approach.
JOHN FRANK, CEO, DIFFEO

EVEN WITH the progress government has made analyzing its massive volumes of data, much remains to be gained by infusing the research and analysis process with machine intelligence. Most agencies rely on text analytics and big data algorithms, and yet those tools stop short of truly intelligent search.

How do I know what I missed? Traditional search cannot help. It's considered wrong for a search engine to return a document that does not match the user's query. To retrieve data, a user must have some idea of what she could find.

A new frontier in text analytics is expanding users' reach by automatically analyzing in-progress notes, such as an email in Outlook or a page in OneNote. This allows machine intelligence algorithms to automatically evaluate vast data repositories to recommend nuggets that fill in knowledge gaps in the user's document—without waiting for the user to randomly stumble across key facts or manually formulate an explicit query.

Machine intelligence is key to these new content recommender tools. Users should expect content recommenders to learn from authoring actions, such as quoting a source or highlighting a key phrase. Once users wrap their heads around the idea of an autonomous research assistant that reads their in-progress notes, they like being prompted by the software to consider new documents.

It's a common practice to run text analytics on finished documents. Applying the same algorithms to a user's in-progress document is a natural extension, and the results are transformational. This allows the machine to continuously re-evaluate all available data for insertion into the user's document. Such machines operate at human-level accuracy and enable humans to operate at machine scale.

By participating in the user's iterative discovery workflow, this new kind of engine can also track what the analyst discards. Such feedback allows the machine to learn automatically and actively throughout the user's long-running exploration. The recommender algorithm can follow the document even as a user shares it with her colleagues and multiple people engage in editing and discovery.

This new application of text analytics accelerates both junior and senior analysts. It learns from the user's triage of new content and provides a quick picture of the knowledge graph surrounding a user's notes. By keeping track of which recommendations a user accepts or rejects, the tool "hangs in there" as the researcher's thoughts evolve.

Researchers can also use this type of intelligent search assistant to track complex connections. For example, with traditional search tools, a researcher tracking an organization's activities might miss a document indicating that one of its partners or suppliers built a new facility next door, a classic unknown unknown that this new approach can detect. A content recommender can automatically formulate queries to multiple backend data stores, allowing the user to focus on the content rather than hand-jamming dozens of queries.

Many tasks previously considered "AI hard" become tractable when organized as recommender tasks. For example, a content recommender can accelerate a cyber threat analyst gathering evidence for links between hackers, tools, infrastructure and victims. Content recommenders can pull data from across languages and disparate data sources multiple hops out from the user's current notes.

All organizations that process information will benefit from this new kind of intelligent search. From advanced threat intelligence to solving a backlog of information requests, putting machine intelligence into your in-progress documents allows you to uncover the whole story, so you can ask the big questions immediately.

John Frank is CEO of Diffeo.
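The article does not spell out Diffeo's algorithms, so the sketch below is only a toy illustration of the general idea of a recommender that learns from accept and reject feedback: the analyst's notes and candidate documents are compared as TF-IDF vectors, and a Rocchio-style update nudges the interest profile after each triage action. The example documents, weights, and helper function are assumptions made for illustration.

```python
# Toy relevance-feedback recommender: score candidates against in-progress notes,
# then update the interest profile from accept/reject actions.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

notes = "suspected supplier built a new logistics facility near the port"
candidates = [
    "permit filings show a warehouse expansion adjacent to the port authority",
    "quarterly earnings call transcript for an unrelated retailer",
    "shipping manifests list a new distribution hub operated by the supplier",
]

# Represent the notes and every candidate document as TF-IDF vectors.
vectors = TfidfVectorizer(stop_words="english").fit_transform([notes] + candidates).toarray()
profile, docs = vectors[0], vectors[1:]

def rank(profile, docs):
    """Order candidate indices by similarity to the current interest profile."""
    scores = cosine_similarity(profile.reshape(1, -1), docs).ravel()
    return np.argsort(scores)[::-1], scores

order, scores = rank(profile, docs)
print("initial ranking:", list(order), scores.round(3))

# Pretend the analyst accepted the top suggestion and rejected the bottom one;
# a Rocchio-style update nudges the profile toward what was kept.
accepted, rejected = [order[0]], [order[-1]]
profile = profile + 0.75 * docs[accepted].mean(axis=0) - 0.25 * docs[rejected].mean(axis=0)

order, scores = rank(profile, docs)
print("after feedback:", list(order), scores.round(3))
```

Repeating the feedback step as the analyst triages more items is what lets a profile like this "hang in there" while the research evolves.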
Find Your Unknown Unknowns With every line you write, Diffeo learns more about your interests and how you see your world. Behind the scenes, it uses that information to scour the deep Web and your private data sources to build a dynamic graph of the people, places, things, and events that matter to you. As you work, Diffeo recommends sources that you may never have known existed, and makes connections even when they are deeply buried or hidden on purpose. With Diffeo, your workflow is unbroken, and the insight into your world gets clearer and clearer.
Discover the whole story. Learn more at carahsoft.com/innovation/diffeo-analytics
ASK AND YOU SHALL FIND: VISUAL SEARCH SOFTWARE FOR THE 21ST CENTURY
Identifying and analyzing visual images is critical for security applications.
JOSEPH SANTUCCI, PRESIDENT & CEO, PIXLOGIC INC.

THE GROWTH OF SOCIAL MEDIA and the widespread use of mobile devices have led to an explosion of photos and videos posted on the Internet. These images are an increasingly important part of research, security, and investigations at all levels of government. But the sheer volume makes it nearly impossible for those assigned to review these images to come up with keywords for every frame and make analysis possible.

Being able to locate and analyze all types of images based on their visual content is something the artificial intelligence community has been working on for the past 40 years. It requires teaching a machine how to see and understand by encapsulating the way a person sees and comprehends using both their eyes and brain.

Working with some demanding customers, including the US government, my company has been able to create software that's capable of analyzing and searching pictures, video, and satellite images based on their visual content, much like an actual person would. The software can separate man-made items from naturally occurring things, understand context in an image, distinguish faces and facial features, identify, recognize and automatically label objects, detect and recognize text in several languages, and create descriptions on the fly for everything it sees.

The technology makes it possible to recognize, locate, and count objects such as planes on a tarmac, cars on a road, or, in a military supply chain context, parts and equipment on shelves in a warehouse. All of this generates data that is very useful and can be mined for a variety of purposes, thereby putting some very interesting image-understanding capabilities in the hands of regular users.

Facial recognition is another valuable function in the software. People can be detected, recognized, and automatically named. This capability can be used in a surveillance context (e.g., monitoring street cameras to detect, recognize, and track people of interest) or to mine and discover faces on social media (e.g., discovering connections among people).

There are many other use cases as well. For instance, if you can detect a face, an object, or a text string, you can also redact it. Law enforcement and military staff can now use this technology to automatically create redacted versions of their images and videos in which all or specific faces, objects or text can be blacked out or blurred for subsequent distribution to partners or the public.

Computer vision, machine learning, and artificial intelligence are all very difficult problems. We have worked hard to make these technologies work for the user in a transparent way so that they can interact with the system intuitively. For instance, because the software can recognize multiple objects, faces, and other items in a picture, users can express not only simple queries (find me this person or this object) but also more sophisticated ones. They can upload a snapshot of a person and a snapshot of a car, for example, and then request all video segments and images that include one but not the other, or both in the same shot, greatly speeding the analysis process.

Pictures and video have long been effective weapons in the fight for national security and are invaluable to our intelligence and defense communities in their efforts to expose and track potential threats. With new advances in AI-based software, we are making powerful analysis capabilities more broadly available.

Joseph Santucci is the President and CEO of piXlogic Inc.
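The detect-then-redact workflow described above can be illustrated with generic tools. The sketch below is not piXlogic's software; it uses OpenCV's bundled Haar cascade face detector to find faces and blur them, and the input file name is a hypothetical placeholder.

```python
# Minimal detect-then-redact sketch: find faces and blur them before distribution.
import cv2

image = cv2.imread("surveillance_frame.jpg")            # hypothetical input image
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)
faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:
    region = image[y:y + h, x:x + w]
    image[y:y + h, x:x + w] = cv2.GaussianBlur(region, (51, 51), 0)  # blur each face

cv2.imwrite("surveillance_frame_redacted.jpg", image)
print(f"Redacted {len(faces)} face(s)")
```

The same pattern generalizes: swap the face detector for an object or text detector and the blur for a solid rectangle, and you have automated redaction of whatever the detector can find.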
The Visual Search Engine
Visual search solutions that automatically analyze, index, and tag the contents of images and videos, providing an unparalleled level of search functionality.
Content Discovery: Find pictures or videos that contain specific objects, scenes, text, or people of interest.
Content Auto-Tagging: Automatically label an image or video.
Content Alerting: Inform users when items of interest appear in live video streams or web crawls.
The piXlogic product suite offers a comprehensive set of solutions for applications in E-Discovery, Security, and Retail. Whether you are a small workgroup or a large enterprise, there is a version of the piXserve software that addresses your needs.
piXlogic solutions are now available on Amazon Web Services (AWS) Marketplace for the U.S. Intelligence Community (IC)
LEARN MORE
carahsoft.com/innovation/piXlogic-analytics
Amazon Web Services and AWS Marketplace logo are trademarks of Amazon.com, Inc. or its affiliates in the United States and/or other countries.
ACCELERATING THE IMPACT OF BIG DATA INITIATIVES WITH DATA WRANGLING
Employing data wrangling techniques can help expedite advanced data analytics.
ADAM WILSON, CEO, TRIFACTA

TO BEAT THE CLOCK on time-sensitive big data initiatives—security threat detection, disease suppression, emergency evacuation planning—today's organizations are taking a hard look at their traditional processes to identify what could be made more efficient. The biggest culprit? Data wrangling, or the process of converting raw data into a format that is usable for analysis, which consumes up to 80 percent of any analysis process.

It's not just the volume of data that makes data wrangling so time-consuming, but the variety, too. Today's organizations must contend with a range of complex data—everything from weather data to call detail records to medical files—which, taken at face value, can be indecipherable to business-oriented analysts. In most cases, the work of wrangling this data into a digestible format gets shipped out to their tech-savvy counterparts, who are limited in number (not to mention tasked with other work) and can take weeks, even months, to deliver on the data wrangling requirements.

With the right tools, organizations can eschew traditional processes for a more agile solution, one that automates the bulk of this process and allows analysts to work directly with the raw data and quickly transform it into the required format for analysis. That has tremendous impact on the entire analysis process. Cutting out the (technical) middlemen allows organizations to scale wrangling operations across the entire organization, accomplishing more analysis in less time.

The Centers for Disease Control and Prevention knows the importance of data wrangling all too well. When faced with an HIV/AIDS outbreak in Austin, Indiana, last year, officials sought out the source to quickly put a stop to the spread of the disease. Leveraging a combination of internal CDC data and historical, geospatial and clinical data, the team of analysts was ultimately able to determine that the root problem was intravenous drug use and contaminated needles, isolated to a particular strain of HIV.

Not only was the CDC able to more efficiently identify the cause of this outbreak, it was also able to be more effective. The analysts with the most context on the problem at hand were working directly with the data, instead of needing to delegate requirements about a dataset to others. That gave them the opportunity to iterate early on, reduced the cycle time of those iterations, and undoubtedly gave them more confidence in their final results.

Beyond the CDC, data wrangling has proven its worth to a multitude of organizations with varying use cases, from cybersecurity and fraud detection to natural disaster mitigation to predictive maintenance and repair. When it comes to detecting fraud, for example, agencies are using advanced data wrangling to combine and standardize a diverse set of user, system, and application data that allows them to more quickly identify suspicious behavior. To address evacuation plans for natural disasters, agencies are leveraging real-time and historical weather data with geospatial data and third-party travel systems data to better predict the impact of an impending disaster. Finally, predictive maintenance use cases allow military agencies to combine sensor and log data from aircraft and ships with historical maintenance records and flight log data to proactively address maintenance.

While seemingly simple, the time saved from more efficient data wrangling can mean all the difference in suppressing an outbreak, stopping a security breach, or identifying and correcting a defective airplane part. Data wrangling has never been so readily available as it is today, empowering today's organizations to unlock the potential of their data.

Adam Wilson is CEO of Trifacta.
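As a small, concrete picture of the wrangling step described above, the hypothetical pandas sketch below standardizes two raw extracts with inconsistent names, dates, and units and joins them into one analysis-ready table. The file names and columns are illustrative assumptions, not the CDC's actual data.

```python
# Hypothetical wrangling sketch: standardize two raw extracts and join them.
import pandas as pd

# Raw clinical extract: inconsistent column names and string dates.
clinical = pd.read_csv("clinical_cases_raw.csv")          # e.g. CaseID, ReportDate, County
clinical = clinical.rename(columns={"CaseID": "case_id",
                                    "ReportDate": "report_date",
                                    "County": "county"})
clinical["report_date"] = pd.to_datetime(clinical["report_date"], errors="coerce")
clinical["county"] = clinical["county"].str.strip().str.title()

# Raw geospatial extract: county-level attributes with its own naming scheme.
geo = pd.read_csv("county_attributes_raw.csv")            # e.g. county_name, pop_thousands
geo = geo.rename(columns={"county_name": "county"})
geo["population"] = geo["pop_thousands"] * 1_000          # normalize units
geo["county"] = geo["county"].str.strip().str.title()

# Blend into one analysis-ready table and derive a simple rate for analysts.
cases_by_county = (clinical.dropna(subset=["report_date"])
                           .groupby("county", as_index=False)["case_id"].count()
                           .rename(columns={"case_id": "case_count"}))
analysis = cases_by_county.merge(geo[["county", "population"]], on="county", how="left")
analysis["cases_per_100k"] = 100_000 * analysis["case_count"] / analysis["population"]

print(analysis.head())
```

The point of self-service wrangling tools is that the analyst who understands the outbreak can perform and iterate on exactly these steps without handing requirements to a separate engineering team.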
TURNING MACHINE DATA INTO OPERATIONAL INTELLIGENCE
Machine data is the most valuable segment of big data analytics.
KEVIN DAVIS, VICE PRESIDENT, PUBLIC SECTOR, SPLUNK

A DIGITAL REVOLUTION is underway that is changing every aspect of every mission and agency. It is shifting business models to online and mobile platforms, opening new opportunities while introducing uncertain risks, and making agile and on-demand interactions the new normal. Gaining deep insight into patterns and behaviors—both machine and human—is imperative to navigating this transformation. Technology leaders are looking at the promise of big data and analytics for this purpose, hoping that endeavors in this realm will help them manage risks better, gain enhanced situational awareness, better understand constituent needs, ensure faster resolutions, and even foresee issues and prescribe remediations to ensure mission success.

At the core is machine data—the digital exhaust continuously created by all the systems, technologies, users, and infrastructure at the center of this digital transformation. Machine data is generated when a security device senses an event, when a citizen accesses a website, when a warfighter uses a mobile device—literally from any digital activity. It has proven to be the most complex, yet most valuable, segment of big data.

Turning machine data into Operational Intelligence—real-time insights that provide an understanding of what's transpiring across an agency, its activities, and the supporting infrastructure—enables informed and confident decisions for traversing this transformation successfully. The unprecedented visibility gained when machine data is collected, correlated, enriched, and analyzed has proven to resolve complex and tedious issues across an organization, from security to IT operations to mission challenges.

Big data attributes—volume, variety and velocity—require technologies that can match them in terms of scale, ingestion, and agility. Given adaptability as one of its core attributes, machine learning has come to the forefront in facing these challenges and is winning favor as a preferred method. Machine learning can analyze vast amounts of data much faster, helping separate the signal from the noise and leaving the cognitive work of inquiry and decision making to humans. The more data it is fed, the more accurate the results. It allows organizations to quickly build models that mimic real-world scenarios and environments, create baselines of normal activities, set dynamic thresholds to account for acceptable variances, and quickly identify anomalies based on situational context.

When machine data is enriched with other structured and unstructured information and overlaid with machine learning, organizations can optimize IT operations, enhance security, manage risk, detect and even anticipate incidents, and take action before any adverse impact.

The rising appetite for big data and analytics is driving a flood of offerings on the supply side. While point products may solve niche problems, most of them are not built to account for all the complex attributes of big data. Organizations would be well served in considering a platform approach that is inherently built with these attributes in mind. To derive value across the agency, leaders should ensure that the platform can manage data collection and associated flows, make ingestion easy without requiring explicit normalization, and provide visualizations that help them make sense of the information quickly in the context of their objectives.

Kevin Davis is Vice President of Public Sector for Splunk.
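The baseline-and-threshold idea described above can be sketched generically. This is not Splunk's implementation; it is a short pandas example that builds a rolling baseline over per-minute event counts from a parsed machine-data log and flags minutes that exceed a dynamic threshold. The log file, columns, and window size are assumptions.

```python
# Generic sketch: rolling baseline + dynamic threshold over machine-data event counts.
import pandas as pd

# Assume a parsed event log with one row per event and a timestamp column.
events = pd.read_csv("firewall_events.csv", parse_dates=["timestamp"])

# Aggregate to a per-minute event count (the machine-data signal to monitor).
counts = (events.set_index("timestamp")
                .resample("1min")
                .size()
                .rename("event_count")
                .to_frame())

# Baseline of normal activity plus a dynamic threshold of baseline + 3 standard deviations.
window = 60                                   # one hour of history
counts["baseline"] = counts["event_count"].rolling(window, min_periods=window).mean()
counts["stdev"] = counts["event_count"].rolling(window, min_periods=window).std()
counts["threshold"] = counts["baseline"] + 3 * counts["stdev"]

# Anomalies are minutes whose volume exceeds the situational threshold.
anomalies = counts[counts["event_count"] > counts["threshold"]]
print(anomalies.tail())
```

Because the threshold moves with the baseline, the same rule tolerates a busy Monday morning and still flags a genuinely unusual spike at 3 a.m.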
HE SAVED MILLIONS LAST YEAR. HOW? HE’LL NEVER SAY.
Splunk® solutions help organizations turn machine-generated big data into Operational Intelligence—valuable insight that can make them more efficient and secure. Government organizations use Splunk software and cloud services to reduce costs and streamline operations, but only a few of them will talk about it.
Gain Operational Intelligence. Get the Splunk platform through Carahsoft. Learn more at www.carahsoft.com/innovation/splunk-analytics
© 2016 Splunk Inc.
IS YOUR DATABASE STUCK IN THE PAST?
Open source technology delivers cost and speed benefits for modern apps.
KELLY STIRMAN, VICE PRESIDENT, STRATEGY, MONGODB

WHILE THE TERM "BIG DATA" has become a tired cliché, there are reasons why people inside and outside of technology are still talking about it nearly a decade after the term was coined. The reality is that data volumes are growing exponentially, and 80 percent of the data we now generate is semi-structured or unstructured, which doesn't lend itself to being stored in the rigid row-and-column format of traditional relational databases.

While relational databases have served us well for over 30 years, they are not the right solution for building modern apps that are often running in the cloud and need to store, process, and extract insights from rapidly changing, multi-structured data generated by billions of sensors, mobile devices, and social networks. User expectations have also changed. Around-the-clock connectivity is a must, and digital systems cannot appear slow or go down, even for fractions of a second; "slow is the new down," as the kids say.

Government agencies face additional challenges, with a number of regulations to meet and an increased need to be agile and responsive to new application demands. On top of this, government agencies need to find ways to modernize and serve large audiences while working within the constraints of limited budgets.

Fortunately, there is a solution. NoSQL database technologies have become pervasive because they allow companies and government agencies to modernize. Organizations are realizing the benefits of NoSQL technologies in terms of building applications faster; adapting to changes in their mission rapidly; easily scaling to accommodate increases in demand; delivering an "always-on" experience to their users; and attracting top technical talent to their teams.

But it's not just about the technology; it's also about the licensing model. Most NoSQL products are open source, and this approach fundamentally changes the dynamic between government organizations and technology vendors. First, popular open source projects have wide adoption, which means there are many individuals with skills in the project, as well as many complementary technologies that integrate with and support the open source project. Second, open source means ongoing costs for using the technology remain low. Organizations can use the project freely and pay only for complementary offerings they feel are of value, such as a subscription whose benefits can be reevaluated each year. Ultimately, the benefits of the open source model accrue to the user, allowing government agencies to focus their budgets and teams on delivering their mission much more effectively.

The City of Chicago took a bet on NoSQL and saw breathtaking results. With one person using one laptop and MongoDB's technology, the organization was able to build an intelligent operations platform in just four months that pulled together seven million different pieces of data from departments around the city and presented them on visual maps in real time. With MongoDB, the City of Chicago was able to improve services, cut costs, and create the foundation for a truly "smart city."

"Big Data" as a term may be tired, but the underlying challenges associated with this trend are very real. Harnessing this data opens the door for positive change. Open source technologies such as MongoDB are an important part of the way forward and need to become part of the common vocabulary and tooling for every technology decision maker.

Kelly Stirman is the Vice President, Strategy, at MongoDB.
DATA BLENDING
Harnessing big data across the enterprise requires a centralized approach.
QUENTIN GALLIVAN, CEO, PENTAHO & SENIOR VP, HITACHI DATA SYSTEMS

ENTERPRISE-SCALE DATA analytics requires drawing on vast and varied data sources. However, most organizations have their data sitting in silos, in multiple locations, and on different platforms. This makes it hard for organizations to integrate, prep, and govern their data, especially as they undertake modernization efforts such as data center consolidation and grapple with trends such as the Internet of Things and cloud computing. Government agencies have the added challenge of demonstrating compliance with government regulations and ensuring their data management practices meet specific standards.

So how can agencies best access and integrate data in a timely manner, keep it secure, and use it as a strategic business tool? With disparate data sources, manually coding transformations for each source would be time consuming and difficult to manage. What they need is a centralized approach. They must develop a clear plan for their big data and data integration projects to address end-to-end delivery of integrated and governed data. The plan should also include business analytics so data becomes a strategic asset that supports the mission.

To maximize the ROI on their investments, data professionals need to consider how each phase of the analytics pipeline adds value and supports the overall business goals, from raw data to end user analytics. The plan should be based on an open source framework such as Hadoop that can support big data processing and keep pace with data demands. The framework must also incorporate strong data governance practices tailored to how the data is used.

Agencies should empower employees to integrate and process Hadoop data, have an onboarding process that can support many different data sources, and be able to turn data into analytic data sets for users on demand. They should create a single "data refinery" to automate delivering information to a large number of users. A big data orchestration platform such as Pentaho's can help support the need that government agencies and enterprises have for strong data governance, diverse data blending, and delivering data analytics at scale.

A good example is NOAA's Office of Response and Restoration, which responds to environmental disasters along the coast. The agency collects a wide range of scientific data typically stored in disparate files and databases and based on different data standards. This makes sharing information and collaborating with teams a challenge. The limitation became more evident during efforts to contain the Deepwater Horizon spill in 2010. After that, NOAA built a new data analytics platform using open source software and data warehouse and business intelligence tools to integrate information into a common framework. As it added new or updated information, data was automatically audited and pulled into a common repository any user could access.

As data becomes more contextually rich, it also becomes more valuable. Advanced analytics has the power to transform government agency operations and provide insight that will help agencies be smarter and more capable and provide better transparency and services to the public.

Quentin Gallivan is CEO of Pentaho and Senior VP of Hitachi Data Systems.
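The "data refinery" idea above, an onboarding process that accepts many differently formatted sources and lands them in one governed store, can be pictured with a toy example. This is not Pentaho's platform; it is a hypothetical Python sketch that maps a CSV feed and a JSON feed onto a shared layout, stamps each record with its source and load time for auditability, and appends everything to a single common table.

```python
# Toy "data refinery": onboard heterogeneous sources into one governed table.
import json
import sqlite3
from datetime import datetime, timezone

import pandas as pd

# Source-specific readers map each feed onto a common layout (hypothetical files/fields).
def read_sensor_csv(path):
    df = pd.read_csv(path)                                # columns: site, reading, taken_at
    return df.rename(columns={"taken_at": "observed_at", "reading": "value"})

def read_field_reports_json(path):
    with open(path) as fh:
        records = json.load(fh)                           # list of {"station", "value", "time"}
    df = pd.DataFrame(records)
    return df.rename(columns={"station": "site", "time": "observed_at"})

refinery = sqlite3.connect("refinery.db")                 # the single common repository
for source_name, reader, path in [
    ("coastal_sensors", read_sensor_csv, "sensors.csv"),
    ("field_reports", read_field_reports_json, "reports.json"),
]:
    frame = reader(path)[["site", "value", "observed_at"]]
    frame["source"] = source_name                         # audit: where the record came from
    frame["loaded_at"] = datetime.now(timezone.utc).isoformat()
    frame.to_sql("observations", refinery, if_exists="append", index=False)

print(pd.read_sql("SELECT source, COUNT(*) AS records FROM observations GROUP BY source", refinery))
```

Adding a new feed means writing one more small reader, not redesigning the downstream analytics, which is what makes the refinery pattern scale across an agency.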
The Power of Big Data at Work
LEARN MORE
carahsoft.com/innovation/pentaho-analytics
WRANGLING BIG DATA
In-memory computing offers agencies an efficient way to create value from data.
LAURA GRANT, HEAD OF DIGITAL INNOVATION, PUBLIC SECTOR, SAP

ALL GOVERNMENT AGENCIES are seeing data volumes explode. Agencies are managing data from emails, transactional systems, videos, social media, sensors, and other sources that are creating treasure troves of information about customers, citizens, products, and services. Yet less than 0.5 percent of all of this data is ever analyzed, even though organizations that leveraged the full power of big data could decrease their operating costs by as much as 30 percent.

Why are agencies not seizing these opportunities? They are trying, but as data volumes grow, data environments grow increasingly complex. Agencies struggle with how to get the most out of their data as IT budgets decrease. How can government organizations efficiently bring disparate data sources together and analyze them to make decisions that positively impact the mission?

These challenges require a new way of thinking. In the past, analytical data was nicely structured and kept in relational databases and data warehouses. As data demands increased, organizations simply added hardware and designed custom applications. These approaches, however, took more time and cost more than ever intended. Workarounds, such as complex spreadsheets that are unmanageable at scale, have proliferated across departments. Agencies are dealing with this data explosion while also being expected to provide proactive, continuous intelligence and automated responses that are integrated into their operational model on any device.

As a result, IT departments must think differently about their approach to garnering insight from data. Instead of traditional data warehouses, organizations should adopt an agile platform approach that leverages the latest in-memory technology and has the ability to support geospatial, predictive, text, and sensor data.

In traditional analytical environments, due to technology limitations of the past, data scientists pulled a subset of data out of a database onto their local machines, built models, and then moved the models to a production environment to use them against more data. This is no longer needed. Agencies can now keep data, modeling, and analysis within the same environment, which is much more efficient. Agencies can reduce or remove nightly, weekly, and monthly data extractions and batch jobs.

By combining data analytics and in-memory computing, agencies are achieving impressive results. One organization is crunching a century's worth of weather data to provide more accurate climate information and drive decision making based on quantitative analysis. Another has been able to reduce crime rates in targeted areas by 55 percent, and several federal agencies have improved their financial visibility and audit preparedness by reducing the time and effort required to perform data acquisition and validation.

The impact of analyzing data at the lowest levels can be seen in a state organization that was struggling to find the root cause of its high infant mortality rates in data residing in more than 800 individual systems. After a quick implementation, it has been able to analyze agency information in a factual, real-time, collaborative environment. This type of granular analysis allows agencies to tailor solutions to the people who need help most.

Organizations are starting to come to grips with the growing complexity of their data environments and the limitations of old technology. It's time to embrace an agile platform to store, access, and most importantly, analyze data. After all, what good is data if you can't use it?

Laura Grant is the Head of Digital Innovation for Public Sector at SAP.
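To make the in-memory approach above concrete at toy scale, the sketch below uses SQLite's in-memory mode as a stand-in. It is not SAP HANA and it is not columnar, but it shows data being loaded, modeled, and queried inside the same in-memory environment rather than being extracted to a separate tool; the table and figures are invented for illustration.

```python
# Toy illustration: load, aggregate, and query entirely inside an in-memory database.
import sqlite3

db = sqlite3.connect(":memory:")            # everything below stays in memory
db.execute("""
    CREATE TABLE inspections (
        facility TEXT, region TEXT, score REAL, inspected_on TEXT
    )
""")
db.executemany(
    "INSERT INTO inspections VALUES (?, ?, ?, ?)",
    [
        ("Plant A", "North", 92.5, "2016-07-01"),
        ("Plant B", "North", 61.0, "2016-07-03"),
        ("Depot C", "South", 88.0, "2016-07-02"),
    ],
)

# Analysis runs against the same in-memory store, with no nightly extract step.
rows = db.execute("""
    SELECT region, COUNT(*) AS facilities, ROUND(AVG(score), 1) AS avg_score
    FROM inspections
    GROUP BY region
    ORDER BY avg_score
""").fetchall()

for region, facilities, avg_score in rows:
    print(region, facilities, avg_score)
```

The design point is the absence of a separate extract-and-model step: the data, the model (here just an aggregate view), and the query all live in one place.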
HOW CAN YOU KNOW IN REAL TIME IF EVERYTHING IS RUNNING ON TIME?
WITH ANALYTICS SOLUTIONS FROM SAP, YOU’LL KNOW. Data is all around us – but how can your organization tap into it in time to act? Analytics solutions from SAP help you analyze all of your data in real time. Accelerating knowledge to provide simple insight into every business question, and every business process. And redefining the definition of agility for your enterprise. For more, visit Carahsoft.com/innovation/SAP-analytics
FAST, SECURE, AND READY FOR GOVERNMENT CLOUDS
It's time to consider Enterprise NoSQL.
IDRISS MEKREZ, PUBLIC SECTOR CTO, MARKLOGIC

DURING THE PAST 25 years, relational databases have been instrumental in helping agencies store, manage, and analyze everything from financial and payroll data to personnel data—in short, anything requiring the query, analysis, and reporting of structured data. In most cases, relational databases are still the gold standard in these areas.

Over the last decade, though, agencies have gathered a staggering amount of information, and most of it is unstructured data from documents, emails, social media, geospatial sources, institutional knowledge, and sensors. These sources have become critically important in helping organizations make decisions and identify areas of concern. Also, agencies are collaborating with each other more than ever before. These factors have strained relational databases to the breaking point. In many cases, it is virtually impossible to modify the databases and schemas fast enough to react to the constant flow of new data types, much less analyze that data.

Here is a simple example: There was a time when government could keep up with world events by monitoring a few TV and radio channels. With this information, officials would be sufficiently informed to respond to everything from financial crises to emergencies. Today, however, a tweet from a private citizen can beat a large media outlet in relaying important news, which means agencies would have to monitor thousands of news channels and multiple social media outlets to even have a chance of keeping up.

For situations where agencies have to ingest and quickly analyze many different sources of data, it makes sense to consider a NoSQL database. Unlike relational databases, NoSQL databases are ideal for ingesting, organizing, and analyzing large volumes of disparate data types, from geospatial and temporal data to semantic, textual, and statistical information. For mission-critical workloads—when security and data integrity are paramount—agencies can leverage an Enterprise NoSQL database to gain insight and make timely decisions.

Monitoring suspicious activity is a critical task for many government agencies these days. Doing this effectively means tracking and analyzing many types of information, including social media, Internet footprints, location data, incident reports, log files, cyber alerts and more—all in real time. With a NoSQL database supporting this process, analysts can quickly find hidden relationships and patterns within the data that can help the mission succeed.

Whether considering a NoSQL database for better situational awareness, intelligence analysis, information sharing, or simply to manage multiple data types and feeds, it's important to ensure the NoSQL database you choose can handle government requirements. Not all NoSQL databases are enterprise-grade, but that capability is critical for government. Enterprise-grade NoSQL solutions can safely store and manage mission-critical information in compliance with government policies and regulations.

More than ever, now is the right time to consider NoSQL. Government agencies must manage, analyze and share more disparate data sets quickly and securely. At the same time, NoSQL databases are easier to use and more feature-rich than ever before. Today's enterprise-grade NoSQL databases are secure and ready for government clouds, which helps agencies accelerate their missions while decreasing costs.

Idriss Mekrez is Public Sector CTO of MarkLogic.
INTEGRATE. STORE. MANAGE. SEARCH.
Data Integration at Mission Speed You need a 360 degree view of data, but getting it is near impossible. Data is spread across disconnected silos and data integration lags the speed of operations. It’s time to rethink what’s expected of a database. MarkLogic is the world’s best database for integrating data from silos. Our operational and transactional Enterprise NoSQL database platform integrates data better, faster, with less cost. Learn more at carahsoft.com/innovation/marklogic-analytics WWW.MARKLOGIC.COM
Executive Viewpoint
ONE-ON-ONE WITH KRIS ROWLEY
GSA’s chief data officer shares his views on making data a part of the conversation.
KRIS ROWLEY, CHIEF DATA OFFICER, GSA

The General Services Administration is one of the most data-heavy agencies in the federal government. It generates millions of pieces of information that are useful for internal and external stakeholders. Kris Rowley is GSA's chief data officer. He sat down with Francis Rose, host of Government Matters on ABC 7 and News Channel 8, for an exclusive interview on how GSA uses its data, makes sure its data is clean, and what the future holds for data management and use at the agency.

Rose: What are the most common or the most popular kinds of data your internal stakeholders want to use?

Rowley: I can split that into two categories. One is the operational program data, the work our Federal Acquisition Service and Public Building Service provide to our federal partners. The second bucket is on the management side. That includes the C-level organizations, like the CIO, the CFO, the Chief Human Capital Officer, and so on. The data crosses over a lot, specifically from the management side to the business side, so they can better assess how much it costs them to do business.
Rose: How are you merging the legacy data that you've had around for a long time with what you're able to collect now digitally?

Rowley: I think technology over the last few years has made it easier to do that. What technology hasn't done is make it easier to ensure that we have high quality data that we're blending together. The people who were managing data 10 or 15 years ago didn't envision the types of analysis or visualization tools that we have today. In a lot of cases, the data quality suffered, because it wasn't going to be used in that manner.

Rose: What are the best practices that you've established about data hygiene?

Rowley: To me, attempting to go back and clean up all of your data is a high risk for failure. The position I've been pushing people towards has been taking the data as it exists from the system, consuming that in an environment where you can blend datasets together to your heart's content, and then layering on an analytics tool or visualization tool to present the data. If during that presentation data quality issues come up, don't panic about that, but proceed. As long as there's about an 80 to 85 percent accuracy within all the data, you can still think it's worth presenting, and it still could provide some insights, even if you have to add an asterisk in the disclaimer. But don't manually clean up that data for the particular presentation. Highlight it during some of the initial discussions and briefings for the executives. It's so easy for people to say, "Well, I know what that zip code should be for that city, I'm just going to go in and change the number for my presentation," and I keep telling them, those changes have to be made in the source system, by the people who are managing the systems; otherwise you'll just never stop cleaning up data manually.

Rose: Are we far enough along now in this data revolution where that's an acceptable answer to people who aren't data professionals?

Rowley: I would say maybe one or two percent. I think it's a work in progress. Sometimes they're focused on the four or five percent of the data that's not accurate or shows something funny, and they say, "I just can't trust this data. You've got to clean this up before I look at it." That still happens a lot, and my response is, "I can clean this up in five minutes, but that's not going to change what's in your system of record, and that doesn't necessarily change the intent of this conversation." That's an uphill climb that we're still climbing.

We're making some headway at GSA. Probably the biggest tool in our toolbox is formalizing an executive governing body over data management. I chair that, and it includes deputy-level commissioners, the CFO, the CIO, and the Chief of Staff. They all sit in for an hour every four to six weeks, where I talk about the data projects I'm working on, the things I'm doing, and the challenges we're having. I continue to talk to them about things that are going well and things that aren't going well. It's an interesting dialog, and it's one that I think needs to be happening more across the federal government.

Rose: I wonder if sometimes we focus a lot on data for data's sake, and don't think about it for the decision making, which is the entire purpose of being able to look at it and use it.

Rowley: I think there's an art to presenting analytics. This isn't a PowerPoint deck where I'm sitting down, feeding you information and at the end giving you two recommendations for you to make a decision. Data analysis is a detailed and complicated issue. The discussion should be around where the information is coming from, what all the possible options are, what all the possible pitfalls are, and diving into the conversation on the issue you're having.

Rose: I imagine a lot of those conversations involve helping the non-data analytics people understand that they should be able to ask questions of the data, and use the answers to those questions in the decision-making process.

Rowley: I think there are two ends of that spectrum, though. For example, say I start off at the beginning of a meeting with, "Rate what I'm about to present you from one to ten, one being 'This is completely useless; it does not meet my needs and I will never make a decision off this information,' and ten being 'This solves all my problems; let's meet every day because this is going to be the easiest thing I've ever done in my life.'" If I end up at a one or a ten, I've failed. I'm hoping for a 7, which means "This is really useful, but I completely understand that there has to be context associated with this, and I need the business team to weigh in as to why a predictive analytical presentation may or may not come true."

Data's not the beginning or the end. It's just part of the conversation, and I'm just trying to insert it within the conversation as much as possible.
Learn more at carahsoft.com/innovation.