Dark Data Management: The Next Frontier for Government Data
Research Brief
Introduction

It’s no secret that the growth of data in government is occurring at exponential rates. This has caused several issues within the public sector: rising costs of storing that data, difficulty accessing and analyzing it, and more. But there’s another data problem affecting the public sector: the growth of data whose content or value is unknown. That data is called dark data.

Think about dark data in these terms: Picture a person who hoards everything in his garage. It may be gardening supplies, boxes of books, sentimental items, clothing and various other items. At one point, it made sense to store these items in the garage: they might be needed later on, or they have a particular value, whether it’s economic or sentimental. But we all know people whose hoarding has gotten out of hand. They store items that have no significance because they convince themselves they’ll need them down the line. And eventually there comes a time when they run out of space. They can’t even find the items they need anymore. The garage may be so full that it crowds out a parking space. In short, the hoarder has taken an approach of storing everything without thinking strategically about its value or if it even makes sense to store it. This causes numerous problems and leads to lost time and money.

That’s very similar to the government’s dark data dilemma. Agencies amass data that has no value assigned to it; it is ROT: redundant, obsolete or trivial. This ROT data is stored in government systems, where it continues to cost money and obscure more relevant data.

Across the public sector, various types of data, structured and unstructured, current and old, sensitive and trivial, pile up in disparate systems like tiny needles in an ever-growing haystack. And this dark data is costing government in terms of time and value. Not only is it costly to store dark data, but it also prevents agencies from finding data of real value that could inform decision-making to better serve citizens.

To better understand the cost of dark data in government and how agencies can overcome it, GovLoop partnered with DLT Solutions and Veritas, a leader in data management, for this research brief. To gain additional insights from the government community, GovLoop surveyed 203 public-sector employees on everything from how dark data limits their day-to-day jobs to the roadblocks preventing them from addressing dark data problems and more. We also interviewed Matt Malone, Sales Engineer, Enterprise Data Management at DLT Solutions, and Darryl Richardson, Senior Systems Engineer at Veritas, to learn how they are helping current federal customers overcome dark data challenges.
The use and importance of data in government is growing rapidly. When used effectively, data analytics can help save lives, improve efficiency, reduce costs and allow the government to deliver better citizen services. But when the useful and relevant data used to make those improvements is overshadowed by ROT data, the efficiencies are no longer there, and decisions can get muddled. So as the importance of data analytics in government grows, it becomes ever more important to reduce the amount of dark data obscuring the data that provides real value and insight.
The Reality of Dark Data in Government Today

FIGURE 1: Is the amount of data your organization stores growing exponentially?
Yes: 79.9%
It’s staying the same: 10.3%
No: 9.8%

FIGURE 2: Is dark data a challenge for your organization?
Yes: 68.2%
No: 31.8%

FIGURE 3: What percent of your total data do you estimate might be dark data?
0-25%: 29.1%
25-50%: 38.8%
50-75%: 22.4%
75% or above: 9.7%
GovLoop’s survey respondents are indeed feeling the pain, not just of dark data but of data growth overall in their organizations. Eighty percent said that data is growing exponentially at their organization. (Figure 1) This is a fact reflected across all of society: Big data and its growth are here to stay. Consider that 2.7 zettabytes of data exist in the digital universe today; 100 terabytes of data are uploaded daily to Facebook, and data production will be 44 times greater in 2020 than it was in 2009.

It’s no wonder dark data is becoming such a significant challenge for government, a fact that GovLoop’s survey respondents acknowledged. Sixty-eight percent admitted that dark data presented a challenge for their organization (Figure 2), and nearly 40 percent believed that between one-quarter and one-half of all the data their agency had was dark data. (Figure 3) More alarmingly, 10 percent believed more than 75 percent of their data was dark data.

None of these facts surprised Malone or Richardson. “Government data growth year over year is close to 40 percent, and storage capacity only grows at about 9 percent,” Richardson said. “One petabyte of data equals about 2 billion files, with an average file size of 40 kilobytes. So you can just imagine the human element that needs to filter through these files. And then you’ve got other challenges. You have multiple departments that have different decisions, you’ve got very few records management personnel in these organizations that are responsible for looking at all of this unstructured data, and then manually classifying it, tagging it and so on. It’s a huge task for people to have to deal with.”

So why is this huge growth in data and dark data happening today? Besides the sheer volume of data being created at never-before-seen levels, a variety of other factors contribute to the issue. According to the GovLoop survey, the No. 1 reason (nearly 49 percent) respondents struggle with dark data is that they simply lack the time to strategically address data issues. Coming in second (27 percent) was “Users treat our storage systems as a data ‘dumping ground.’” Other issues rounding out the causes include “We base our budgets and IT strategies on the volume of data stored and processed, not its value” (10 percent); “Automated applications generate data that is not removed once no longer needed” (9 percent); and “There’s a belief we no longer need to worry about where our data resides while we freely adopt cloud applications and storage” (4 percent). (Figure 4)
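To put the growth rates Richardson cites above in perspective, the short Python sketch below compounds them year over year. It is only an illustration: the brief does not say that the rates compound or what the starting volumes are, so the sketch assumes annual compounding and an equal starting point for data and capacity. The function names are hypothetical.

```python
def gap_after(years: int, data_growth: float = 0.40,
              capacity_growth: float = 0.09) -> float:
    """Ratio of data volume to storage capacity after `years`.

    Assumes (for illustration only) that both start equal and that
    both rates compound annually.
    """
    return ((1 + data_growth) / (1 + capacity_growth)) ** years


def years_until_ratio(data_growth: float, capacity_growth: float,
                      target_ratio: float) -> int:
    """First whole year in which data / capacity reaches target_ratio."""
    if data_growth <= capacity_growth:
        raise ValueError("data never outgrows capacity at these rates")
    data = capacity = 1.0
    years = 0
    while data / capacity < target_ratio:
        data *= 1 + data_growth          # data grows ~40% per year
        capacity *= 1 + capacity_growth  # capacity grows ~9% per year
        years += 1
    return years


if __name__ == "__main__":
    for year in range(1, 6):
        print(f"year {year}: data is {gap_after(year):.2f}x capacity")
    print("years until data is 2x capacity:",
          years_until_ratio(0.40, 0.09, 2.0))
```

Under these assumptions, data volume reaches twice the available capacity within three years, which is one way to read the urgency behind the survey responses.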
FIGURE 4: What do you think is the main cause of the growth of dark data in your organization?
No time to strategically address our data issues: 48.9%
Users treating storage systems as a data “dumping ground”: 27.4%
We base our budgets and IT strategies on the volume of data stored and processed, not its value: 10.4%
Automated applications generate data that is not removed once no longer needed: 8.9%
A belief that we have no need to worry where our data resides while we freely adopt cloud applications & storage: 4.4%
These factors are not at all surprising in today’s government environment, Malone and Richardson agreed. Basing budgets and IT strategies on the volume of data stored and processed, rather than the data’s value, rewards bad behavior, and there is a real risk in the “free storage” myth. “This belief makes us think we have no need to worry where our data resides while we freely adopt cloud applications,” said Malone.

Data dumping grounds are a real issue, too. Statistics on personal use of file storage in government are not known, but according to a recent Veritas survey of the private sector, employees amass all kinds of unstructured data, from personal photos to personal ID and legal documents, as well as music, games, videos and more.

Based on their work with government agencies, Richardson and Malone have found that the biggest issues driving the growth of dark data are a lack of leadership, strategies and accountability around data. Many wonder: How did we get here? What are the main causes?

“Agencies that have data ownership and procedures for dealing with data already in place typically don’t have an exponential or dark data growth problem,” said Malone. “Some agencies are doing something right with their procedures, their policies, with what they have in place to curb that data growth. It really comes down to making the data creators, or their supervisors, accountable to management for everything that is stored on the system. That doesn’t happen enough in the public sector.”

And until leaders and those responsible for data compliance in agencies become more proactive in their strategies, the startling realities and ramifications of dark data will continue to unfold. At 66 percent, the clearest ramification of dark data, according to respondents, was that agencies “cannot derive the best insights from our data.” Fifty-seven percent said that dark data costs them time, while 36 percent said it hinders them from effectively meeting their mission. Other challenges posed by dark data included compliance issues, user problems, security issues and difficulty in justifying hardware purchases. (Figure 5)

In short, a growing amount of dark data can drag a government agency down, causing inefficiencies, poor decision-making, rising costs, increased risk and potential compliance issues. So what can be done?

FIGURE 5: What issues does having dark data present for you as an organization? (Pick top three)
Means we can’t derive the best insights from our data: 65.7%
Costs us time: 56.7%
We can’t meet our mission as easily or effectively: 35.8%
Costs us money: 32.1%
Gives us security issues: 24.6%
Causes user problems: 22.4%
Gives us compliance issues: 20.1%
Difficulty in justifying hardware purchases until we can best identify what we are storing: 15.7%
Solving Dark Data With the Right Technology

It’s clear that a proactive approach to governance of data is mandatory. There is a sharp need for automation and accountability to support large-scale data management and data protection efforts. But according to GovLoop’s survey results, the right solution to meet these dark data management needs has not been found.

While just over 50 percent of our survey respondents already have a solution in place to deal with managing data (Figure 6), 58 percent are still looking for a solution that better meets their needs in this arena. (Figure 7) And overall, nearly half of the survey respondents do not have a data management solution in place at all, leaving them at a loss for how to proceed.

Clearly, a majority of government employees are looking for a new technology approach to better manage dark data and its ramifications. So what capabilities are agencies looking for in a dark data management solution?

FIGURE 6: Do you already have a technology solution in place to deal with managing data?
Yes: 50.4%

FIGURE 7: Are you looking to implement a better solution to deal with dark data?
Yes: 58.3%
Far and away the top answer, at 65 percent, was technology that offered an easy-to-use interface, with analytics capabilities close behind at 56 percent, and better visibility, governance abilities and a records classification workflow nearly tied for third place. (Figure 8)

These results reflect growing trends in government data, storage and analytics, Malone and Richardson said. “A data management platform must be easy for anybody to use and understand,” Malone said. This is because today, more and more non-technical people are involved in the creation, discovery, analysis and classification of all sorts of data across departments.

Analytics capabilities are a must, given that most data in government is used to inform critical decisions and answer questions that support mission needs. And visibility, governance and automation workflows are crucial, because without those capabilities, the dark data that we’ve discussed will continue to multiply. These capabilities can be hard to achieve, however, because of legacy technology that is still used by agencies. So what can be done?

FIGURE 8: What capabilities do you need a data management solution to be able to offer your organization? (Pick your top three)
Easy-to-use interface: 65.2%
Analytics: 56.1%
Better visibility: 39.4%
Governance abilities: 37.1%
Records classification workflow: 36.4%
Backup and recovery: 22.7%
Archiving: 22%
How to Shine a Light on Your Data

Veritas and DLT can help organizations improve unstructured data governance to reduce costs and risk, and achieve compliance through actionable intelligence into data ownership, usage and access controls. The reporting, analytics and visualization capabilities in their products can shine a light on the data by giving organizations an understanding of what data exists, how it is being used, who owns it and who has access to it.

“We offer products that will give public-sector employees insights and assessments into their data,” Malone said. “It can help them analyze systems, automate the process of workflow, identify data curators, identify who owns what and keep that information automated.” Once Veritas and DLT have helped an organization analyze the existing data, they then work to help agencies identify dark data, expose the risks and extract the value from the information.
To simplify the decision-making process associated with growing unstructured data, Veritas also provides an integrated File Governance solution to help organizations scan their unstructured data environment, classify their information and decide whether to retain it. The Veritas Data Insight solution helps organizations improve unstructured data governance to reduce costs and risk, and achieve compliance through actionable intelligence into data ownership, usage and access controls. The reporting, analytics and visualization capabilities in Data Insight help government understand their data and who can access it. “We want to help make your process in your daily life much easier, and that’s what our data management solutions are ultimately for,” Richardson said.
CASE STUDY: Creating Efficiencies & Saving Money With Dark Data Assessments

A large financial institution in the U.S. discovered that it was about to be hit with millions of dollars in fines from the Federal Communications Commission (FCC) because its dark data problem had gotten out of control without its knowledge, and it was about to be out of compliance with FCC regulations. To rapidly deal with the problem, the company called the Veritas team in for its expertise and dark data management solutions.

“Our team went in, evaluated some of the things that they had, and saw right away that they had too much data sitting there that had no way of effectively being identified,” Richardson said. Veritas’ first step was to scan the file data, collect custodian information and the date ranges of the information. “We also worked within the regulations of the institution,” said Richardson, which meant developing a system that could identify user data, and discard any user data that was more than two years old unless it needed to be retained for legal reasons.

“This financial institution turned out to have, estimated, about 250 billion files over 72 petabytes of data,” Malone said. “Now, with our automated solution, as opposed to a legacy or manual one, they’re capable of removing 40 terabytes a month of old information because they can identify and filter it. Clearly, that whole verification process is the key.”
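The retention rule described above (discard user data older than two years unless it must be kept for legal reasons) can be sketched as a simple filter. The Python below is a minimal illustration and not the Veritas implementation: the `FileRecord` type, the `legal_hold` flag and the choice to treat two years as 730 days are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumption: "two years" is approximated as 730 days.
RETENTION = timedelta(days=730)


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical metadata gathered by a file scan."""
    path: str
    custodian: str
    last_modified: datetime
    legal_hold: bool = False  # True if the file must be retained


def is_disposable(rec: FileRecord, now: datetime) -> bool:
    """A file may be discarded only if it is past retention and not on hold."""
    if rec.legal_hold:
        return False
    return now - rec.last_modified > RETENTION


def split_by_retention(records, now):
    """Partition scanned records into (keep, discard) lists."""
    keep, discard = [], []
    for rec in records:
        (discard if is_disposable(rec, now) else keep).append(rec)
    return keep, discard
```

In practice the scan metadata, the definition of "user data" and the legal-hold source would come from the organization's own systems and policies; the point of the sketch is only that the rule becomes automatable once custodian and date information has been collected.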
In addition to implementing systems that automate the data retention, identification and archiving processes, DLT and Veritas also provide dark data assessments for their clients.

“In one case, we worked with a client who could not figure out where their data was being created, so we started out with a dark data assessment for them,” Malone said.

The client had an application that logged troubleshooting issues, and those logs were then stored on a file server that had 3 petabytes of storage. In just over a month, however, that server storage was already 80 percent full. What was happening?

“We did a dark data assessment on one file and saw that there were literally tens of thousands of these 10 gigabyte log files in a specific location,” Malone said. “So we went back, we analyzed what the log was and we found out it was coming from this application.” The Veritas team went into the application, and discovered that Deep Level Logging had been unintentionally turned on. Just by turning that off, this client was able to immediately rid itself of almost 2 petabytes of log files that it had never even intended to store.

“Here they are, thinking that they’re running out of storage, when in fact it was really a human issue of turning on a logging feature by accident,” said Malone. “So the dark data assessments that we can do are very effective in identifying storage growth that may be happening for reasons you’re not even aware of.”
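The kind of finding Malone describes, a single location quietly accumulating very large log files, is what a per-directory size summary tends to surface. The Python sketch below illustrates the idea using only the standard library. It is not the Veritas assessment tooling, and the function names are hypothetical.

```python
import os
from collections import defaultdict


def summarize_by_directory(root):
    """Return {directory: (file_count, total_bytes)} for directories under root.

    Only directories that directly contain at least one file appear
    in the result.
    """
    totals = defaultdict(lambda: [0, 0])
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(full)
            except OSError:
                continue  # skip files that vanish or cannot be read
            totals[dirpath][0] += 1
            totals[dirpath][1] += size
    return {d: (count, size) for d, (count, size) in totals.items()}


def top_hotspots(summary, n=5):
    """Directories holding the most bytes, largest first."""
    return sorted(summary.items(), key=lambda item: item[1][1], reverse=True)[:n]
```

Running `top_hotspots(summarize_by_directory(path))` against a file share would list the directories consuming the most space, which is a reasonable first place to look for unintended output such as runaway application logs.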
Conclusion

Today’s rapid pace of data growth provides agencies with unprecedented insight into the needs of the citizens they serve and the missions they need to meet. With more data, however, come new challenges around organizing, storing and protecting it all. There’s also the challenge of managing the dark data that inevitably accumulates as existing data and information grow. By working with the right partners and having the right data management strategies and accountability around their data, federal agencies can deliver the information insights they need to meet their missions, while protecting themselves from an information crisis.
About Veritas

Veritas Technologies enables agencies to harness the power of their information to drive mission success, with solutions designed to serve the world’s most complex, heterogeneous environments. Veritas works with Federal agencies to help them improve their data availability and unlock insights to make them more successful in support of the overall mission. From traditional data centers to private, public, and hybrid clouds, Veritas helps agencies protect, identify, and manage data regardless of the environment through a comprehensive product strategy and roadmap focused on Federal agency needs. Veritas’ products help automate information management and reduce manual efforts. To learn more, visit: www.veritas.com/solution/government.

About DLT Solutions

For 25 years, DLT Solutions has been dedicated to solving public-sector IT challenges. Guided by our relentless focus, we have grown to be one of the nation’s top providers of world-class IT solutions. Leveraging our strategic partnerships with top IT companies, we develop best-fit solutions for our federal customers. To learn more, visit: www.dlt.com/governmentproducts/veritas.
About GovLoop

GovLoop’s mission is to “connect government to improve government.” We aim to inspire public-sector professionals by serving as the knowledge network for government. GovLoop connects more than 250,000 members, fostering cross-government collaboration, solving common problems and advancing government careers. GovLoop is headquartered in Washington, D.C., with a team of dedicated professionals who share a commitment to connect and improve government. For more information about this report, please reach out to info@govloop.com. www.govloop.com | @GovLoop
1152 15th St. NW Suite 800 Washington, DC 20005 P: (202) 407-7421 | F: (202) 407-7501 www.govloop.com @GovLoop