White Paper
Closing the Big Data Management and Security Gap
By Nik Rouda, Senior Analyst
October 2014
This ESG White Paper was commissioned by Zettaset and is distributed under license from ESG. © 2014 by The Enterprise Strategy Group, Inc. All Rights Reserved.
White Paper: Closing the Big Data Management & Security Gap 2
Contents
Big Data Is Gaining Momentum, but Increasing Concerns, Too
    Big Data Projects Still Rely Heavily on Professional Services
    Security Still a Top Concern for Big Data Platforms
How Organizations Should Automate and Secure Big Data Deployments
Zettaset Delivers a Safer, More Automated and Secure Solution
The Bigger Truth
All trademark names are property of their respective companies. Information contained in this publication has been obtained by sources The Enterprise Strategy Group (ESG) considers to be reliable but is not warranted by ESG. This publication may contain opinions of ESG, which are subject to change from time to time. This publication is copyrighted by The Enterprise Strategy Group, Inc. Any reproduction or redistribution of this publication, in whole or in part, whether in hard-copy format, electronically, or otherwise to persons not authorized to receive it, without the express consent of The Enterprise Strategy Group, Inc., is in violation of U.S. copyright law and will be subject to an action for civil damages and, if applicable, criminal prosecution. Should you have any questions, please contact ESG Client Relations at 508.482.0188.
© 2014 by The Enterprise Strategy Group, Inc. All Rights Reserved.
Big Data Is Gaining Momentum, but Increasing Concerns, Too
More and more companies are exploring new opportunities offered by big data and advanced analytics, across a broad range of industries and functional lines of business. Data-driven decision making is being seen not as a luxury, a management fad, or an area for future innovation, but as an essential need in order to compete successfully in the modern world. In parallel or even driving this interest, emerging technologies like Hadoop and NoSQL databases are finding a ready market and are increasingly being chosen as the primary platforms for accommodating the intense demands of big data. The appetite and applications are virtually endless, applicable to nearly any business process or activity, and limited more often by managerial creativity and institutional resistance to change than by technology today. IT budgets are suddenly reflecting this fundamental shift as well, and recent ESG research found 56% of companies surveyed are increasing their investments in big data and analytics by more than 10% in 2014, as compared with the previous year.[1] This rapid increase further indicates that most organizations are now moving beyond small pilots and proof-of-concept stages into enterprise-wide production deployments.
However, as big data projects migrate from pilot to production deployment and extend beyond the exclusive realm of IT and into the business unit, new factors come into play. How will the enterprise efficiently scale a technology that is still relatively immature and overly dependent on manual installation and configuration processes? How will the enterprise lock down sensitive data in Hadoop and NoSQL environments, which are built on big data technologies that were never conceived with security in mind?
Big Data Projects Still Rely Heavily on Professional Services
Development of a big data solution is still a complex, highly interdisciplinary undertaking that requires specialized personnel to provide operational support. There may not be enough in-house expertise to understand all the requirements of the new big data platforms, making users more reliant on professional services. Persistent skills gaps in various IT disciplines impact projects, and these include shortages in security (25% of those surveyed), architecture planning (24%), BI and analytics (20%), and database administration (17%), as shown in Figure 1.[2] If unaddressed, these staff gaps will often lead to unforeseen delays and risks in new initiatives.
Hadoop and NoSQL technology is rapidly evolving, but has not yet reached the level of maturity and sophistication that traditional relational databases offer. As a result, users expecting lower operational costs by using Hadoop software and infrastructure can sometimes find they must spend significant sums for software support and maintenance in the form of recurring subscription fees to vendors of branded Hadoop and NoSQL distributions. It could be argued that since professional services represent a substantial revenue source for some distribution vendors, they have less incentive to incorporate more process automation into their respective offerings. While this model may have worked during the early phases of Hadoop deployment in pilot environments, it often becomes a resource issue for organizations wishing to scale their deployments in an efficient and cost-effective manner. More automation of management tasks could help organizations avoid spending inordinate sums on outside support and maintenance of a technology that has been touted as cost-saving.
[1] Source: ESG Research Report, Enterprise Data Analytics Trends, May 2014.
[2] Ibid.
Figure 1. Top Ten Skills Shortages Impacting Initiative Success
In which of the following areas do you believe your IT organization currently has a problematic shortage of existing skills? (Percent of respondents, N=545, multiple responses accepted)
Information security: 25%
IT architecture/planning: 24%
Mobile application development: 21%
Business intelligence/data analytics: 20%
Server virtualization/private cloud infrastructure: 20%
Mobile device management: 19%
Application development: 18%
Database administration: 17%
Data protection (i.e., backup and recovery): 17%
Source: Enterprise Strategy Group, 2014.
Security Still a Top Concern for Big Data Platforms
As the number of distinct data sources and total data volumes grow exponentially, correspondingly more strategic planning and tactical administration are required, and this basic talent problem is magnified to potentially deleterious effect. This problem can manifest in different ways, but when asked about it by ESG, 38% of respondents cited security requirements as being a top-order challenge due to unchecked size growth and proliferation of databases.[3] So not only is there more data, in more places, and too few people to steer projects, but also the stakes are raised for protecting this sensitive information in the age of malicious hackers, advanced persistent threats, and occasional internal malfeasance.
One implication is that these new big data projects can’t be led solely by the data scientists, analysts, and database administrators. While they may possess the know-how to design in new functionality and support new applications, they may not have the detailed understanding and skill set required to manage the security nuances. A copy of privileged data in a test and development environment is still a copy susceptible to breach, and more worryingly, the end goal of consolidating as much information as possible into a central data lake or hub can further compound the exposure if not handled appropriately. As such, ESG research found that 84% of respondents in a recent enterprise data survey say it is important or crucial that security teams are actively involved in development of new big data and analytics initiatives.[4] This is borne out in customers’ lists of technology evaluation criteria for selecting an enterprise data management platform in Figure 2, below. Security is tied for first place as the most important factor according to survey respondents when defining requirements for new initiatives in big data, analytics, or business intelligence.[5] With these various challenges in mind, most customers are looking for already proven approaches to achieving better security in the face of pressure to deliver new deployments in the most efficient and cost-effective way.
[3] Source: ESG Research Report, Enterprise Database Trends in a Big Data World, July 2014.
[4] Source: ESG Research Report, Enterprise Data Analytics Trends, May 2014.
[5] Ibid.
Figure 2. Top Five Most Important Criteria in Evaluating a Big Data Solution
Which of the following attributes are most important to your organization when considering technology solutions in the area of business intelligence, analytics, and big data? (Percent of respondents, N=375, three responses accepted)
Security: 26%
Cost, ROI and/or TCO: 26%
Reliability: 22%
Performance: 21%
Ease of integration with other applications, APIs: 20%
Source: Enterprise Strategy Group, 2014.
How Organizations Should Automate and Secure Big Data Deployments
The good news is that as adoption has accelerated and more production deployments are being settled into enterprise environments, there are now some emerging best practices to follow to automate and secure a Hadoop environment. The bad news is that the requisite functionality is by no means yet a standardized part of any particular distribution, and many customers will need to look carefully at vendors’ glib promises to determine for themselves which are most up for the deployment and security challenge.
A typical CISO will be interested in establishing sound methodologies for security efficacy, operational efficiency, and enabling the business to conduct activities in a safe manner without undue burden. Both IT and line of business leaders should take an interest and demand the best-of-breed capabilities outlined in Table 1 from any production solution.
Table 1. Four Primary Considerations in Selecting a Secure Big Data Platform
Common Enterprise Requirements | Impact / Benefit
Deployment (incl. automation and integration of tested configurations) | Faster time to production and reduced risk of security gaps
Encryption (both at rest and in motion) and/or data masking as appropriate | Safer ETL and storage of everything in data lake/hub
Key management (incl. policies, HA, and key management interoperability protocol - KMIP) | Simplified key admin and more reliable access
User authentication and access control by role for users and administrators | Only approved people can see only appropriate data
Source: Enterprise Strategy Group, 2014.
While setup and configuration of a few management and data nodes in a Hadoop cluster may be touted as relatively easy to do, the manual effort introduces a chance of error that grows with each additional instance. Having an automated system for deployment simplifies this process, making for both a more scalable and more reliably protected environment.
Encryption may seem like a common “tick box” option on many Hadoop distributions, but not all follow the same conventions or coverage model. Ensure that all data on disk is covered with strong encryption, and take steps to also guard against network attacks on data being transferred between nodes; during extract, transform, and load activities; and when exporting information. Data masking can also be useful if certain fields need to be identifiably unique for analytics without exposing their actual contents.
Though encryption itself may seem quite simple to turn on, key management is often the weak point of solutions, particularly in larger, more varied, or more dynamic environments. Unique keys should be generated and controlled via customizable policies, kept in and provided from a highly available source, and managed in compliance with KMIP definitions. Key management should also have role-based administration and auditing capabilities.
Even if the whole environment is defended from external attacks using these mechanisms, steps should be taken to limit access to particular data sets to authenticated users only. This access control should be fine-grained, role-based, automatically tied into AD and LDAP, and carry over permissions as specified from these proven access control systems.
From a broader perspective, additional steps should be explored as best practices, including establishing a security zone for the analytics servers, deploying these servers in a hardened configuration, frequent scanning and timely patching, and traffic monitoring. These approaches are not necessarily different for Hadoop environments, however, and should be considered as a standard part of a larger IT security framework. Although this is a non-trivial undertaking, IT decision makers should build these requirements into their “must have” evaluation criteria, and select products that have functionality to match.
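As an illustration of the data masking point above, the following Python sketch shows one common way to keep a field identifiably unique for analytics without exposing its contents: replacing the value with a keyed hash (HMAC-SHA256). This technique is an assumption chosen for illustration and is not named in the ESG research or attributed to any vendor. The function names, the field names, and the hard-coded demo key are all hypothetical; in practice the key would be supplied by a key management system rather than embedded in code.

```python
import hashlib
import hmac


def mask_value(value: str, secret_key: bytes) -> str:
    """Return a deterministic token for a sensitive value.

    The same input always yields the same token, so joins and
    distinct counts still work, but the original value cannot be
    read back from the token.
    """
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def mask_record(record: dict, sensitive_fields: set, secret_key: bytes) -> dict:
    """Return a copy of the record with the sensitive fields masked."""
    masked = {}
    for name, value in record.items():
        if name in sensitive_fields:
            masked[name] = mask_value(str(value), secret_key)
        else:
            masked[name] = value
    return masked


# Hypothetical demo key; a real key would come from a key manager.
DEMO_KEY = b"demo-key-not-for-production"

# Hypothetical record and field names, for illustration only.
row = {"customer_id": "C-1001", "ssn": "123-45-6789", "region": "EMEA"}
masked_row = mask_record(row, {"customer_id", "ssn"}, DEMO_KEY)
```

Because the token is deterministic for a given key, two records for the same customer still match after masking, which preserves the "identifiably unique" property the text describes. Rotating the key changes every token, so key handling for masked data needs the same care as encryption keys.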
Zettaset Delivers a Safer, More Automated and Secure Solution
While many companies, young and old, are rushing to capitalize on the new opportunities afforded by big data, many vendors are seeking to provide them with the technology to do so. Of these, some focus on performance, some on connectivity, and some on vertical-specific applications. Zettaset differentiates itself with a focus on building rock-solid, enterprise-ready management and security applications that augment and improve the branded open-source distribution frameworks. In doing so, Zettaset enables other vendors’ big data solutions to also better meet enterprise operational requirements. As already noted, these requirements may not be top of mind for the DBA or data scientist, but they must be met before IT infrastructure and operations teams can adopt the new solutions and begin enterprise-wide production deployments.
Zettaset’s Orchestrator provides a more mature, more comprehensive approach to managing big data environments, automating and standardizing common activities like cluster configuration, node deployment, setup of interfaces to applications, general administration, and not least, securing Hadoop environments. With the recent Fast-PATH addition, Orchestrator process automation reduces reliance on manual efforts and accelerates database cluster deployment. In the company’s internal benchmark testing, Zettaset found Fast-PATH was able to fully install a 50-node Hadoop cluster in 140 minutes, which would almost certainly be quicker and less error-prone than a manual effort. The benchmark time includes installation of the Hadoop distribution, as well as installation of Kerberos, HBase, Hive, Encryption, Key Management, and Zettaset’s patented High-Availability framework on all nodes. Orchestrator Fast-PATH dramatically lowers operational costs, reduces the IT resource requirements necessary to implement Hadoop, and reduces time to value from weeks to hours.
Now Zettaset is going a step further and modularizing key components, like Hadoop security and their patented multi-service high availability and automated failover, to more easily complement and integrate with popular Hadoop distributions from Cloudera and Hortonworks. This enterprise-class add-on functionality enhances the management and security mechanisms of most branded distributions, and will help address the considerations outlined in Table 1. Specific modularized Big Data management and security capabilities include:
• Data-at-rest Encryption – Zettaset offers a standards-based, low-overhead approach linking up AES-256 bit disk partition encryption with existing frameworks, and smoothly interoperates with KMIP-compliant key management, PKCS hardware security modules, and a wide range of leading Hadoop distributions and NoSQL databases. This complements open source encryption approaches for data in motion in Hadoop clusters, and also ensures the Orchestrator console communications are safe.
• Multi-Service High Availability – Hadoop cluster environments are complex, and require multiple services to productively function. Zettaset Orchestrator uniquely delivers enterprise class high availability with automated fail-over for all Hadoop services running in a cluster, eliminating single points of failure that exist in open source Hadoop, and delivering the robust security and compliance capabilities that enterprises expect and need.
• Fine-Grained, Role-based Access Control – Because Hadoop may often contain a wide range of information, both management tools and data itself must be restricted to those who “need to know.” Fine-grained controls ensure that roles and permissions can be easily customized, and that only appropriate administrators and users can make changes or access sensitive information.
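To make the general idea of fine-grained, role-based access control concrete, the following Python sketch shows a deny-by-default permission check in which directory groups map to roles and roles map to (data set, action) pairs. It is a generic illustration only and does not represent Zettaset Orchestrator or any Hadoop distribution. Every group name, role name, data set name, and function name here is hypothetical, and the group list is assumed to have been retrieved already from a directory such as AD or LDAP.

```python
from typing import Dict, Iterable, Set, Tuple

# A permission is a (data set, action) pair.
Permission = Tuple[str, str]

# Hypothetical mapping from directory groups to roles.
GROUP_TO_ROLES: Dict[str, Set[str]] = {
    "cn=analysts": {"analyst"},
    "cn=hadoop-admins": {"cluster_admin"},
}

# Hypothetical mapping from roles to permitted (data set, action) pairs.
ROLE_PERMISSIONS: Dict[str, Set[Permission]] = {
    "analyst": {("sales_summary", "read")},
    "cluster_admin": {
        ("sales_summary", "read"),
        ("cluster_config", "write"),
    },
}


def roles_for(groups: Iterable[str]) -> Set[str]:
    """Collect every role granted by the user's directory groups."""
    roles: Set[str] = set()
    for group in groups:
        roles |= GROUP_TO_ROLES.get(group, set())
    return roles


def is_allowed(groups: Iterable[str], dataset: str, action: str) -> bool:
    """Deny by default; allow only if some role grants the exact pair."""
    return any(
        (dataset, action) in ROLE_PERMISSIONS.get(role, set())
        for role in roles_for(groups)
    )


analyst_can_read = is_allowed(["cn=analysts"], "sales_summary", "read")
analyst_can_configure = is_allowed(["cn=analysts"], "cluster_config", "write")
```

Keeping permissions attached to roles rather than to individual users means a change in the directory (adding or removing a group membership) is enough to change what a person can see, which is the "carry over permissions" behavior described earlier in this paper.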
Zettaset has a bigger vision, too, including smoother deployments, better reliability, improved performance, and easier support and administration for broader big data environments. Centralizing and certifying management of all required functions to meet enterprise operational standards will go a long way to facilitating the adoption of technologies that are still evolving and maturing. Modularizing the Zettaset offerings opens them up to the wider community with a flexible “a la carte” menu to suit specific enterprise requirements, while also paving the way for an expanded, more comprehensive, and fully integrated solution for big data management and security.
The Bigger Truth
Big data is rapidly entering the mainstream, and new data platforms like Hadoop and NoSQL databases are becoming increasingly popular tools to capture and serve up more enterprise data than ever before, spanning sensitive personal profile, health, financial, and sometimes R&D information. Not only is more data being collected and compiled into a single repository, but also more people are being given access to this data across multiple lines of business for application development and for analysis and reporting. Yet these emerging technologies are not fully mature in their security capabilities, increasing the risk of a “super breach.” The financial repercussions and brand damage of an incident are well documented, as are the limitations of simple perimeter-based security products.
While many are leaping into the big data opportunity with enthusiasm, the need to build a robust, manageable, and safe solution is paramount. Many vendors are paying lip service to these issues, but few have really understood the scope of the problem or yet endeavored to design and implement a truly protected product. Zettaset has focused on building more comprehensive security and management functionality, and offers a great complementary solution that addresses the inherent risks of Hadoop distribution frameworks.
20 Asylum Street | Milford, MA 01757 | Tel: 508.482.0188 Fax: 508.482.0218 | www.esg-global.com