VIEWPOINT
A NEW WAY TO DATA RAVI NAIK, CIO AT SEAGATE TECHNOLOGY, ON DATA GRAVITY AND ITS IMPACT ON DATA STORAGE INFRASTRUCTURE
CXO INSIGHT ME
SEPTEMBER 2021

Data gravity affects the entire IT infrastructure; it should be a major consideration when planning data management strategies. It's important to ensure that no single dataset exerts an uncontrollable force on the rest of the IT and application ecosystem. Data is now an essential asset to businesses in every vertical, just as physical capital and intellectual property are. Data growth, with ever-increasing quantities of both structured and unstructured data, will continue at unprecedented rates in the coming years. Meanwhile, data sprawl, the increasing degree to which business data no longer resides in one location but is scattered across data centers and geographies, adds complexity to the challenges of managing data's growth, movement, and activation. Enterprises must implement a strategy to efficiently manage mass data across cloud, edge, and endpoint environments. And it's more critical than ever to
develop a conscious and calculated strategy when designing data storage infrastructure at scale. What worked for terabytes doesn't work for petabytes.

As enterprises aim to overcome the cost and complexity of storing, moving, and activating data at scale, they should seek better economics, less friction, and a simpler experience: simple, open, limitless, and built for the data-driven, distributed enterprise. The concept of data gravity is an important element to consider in these efforts.

According to the new Seagate-sponsored report from IDC, Future-proofing Storage: Modernising Infrastructure for Data Growth Across Hybrid, Edge and Cloud Ecosystems, as storage associated with massive datasets continues to grow, so will its gravitational force on other elements within the IT universe.

Generally speaking, data gravity is a consequence of data's volume and level of activation. Basic physics provides a suitable analogy: a body with greater mass has a greater gravitational effect on the bodies surrounding it. "Workloads with the largest volumes of stored data exhibit the largest mass within their 'universe,' attracting applications, services, and other infrastructure resources into their orbit," according to the IDC report.

A large and active dataset will, by virtue of its complexity and importance, necessarily affect the location and treatment of the smaller datasets that need to interact with it. Data gravity therefore reflects data lifecycle dynamics and should help inform IT architecture decisions.

Consider two datasets: one is 1 petabyte, the other 1 gigabyte. To integrate the two, it is far more efficient to move the smaller dataset to the location of the larger one. As a result, the storage system holding the 1-petabyte set now stores the 1-gigabyte set as well. Because large datasets "attract" smaller ones, large databases tend to accrete data, further increasing their overall data gravity.
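The petabyte-versus-gigabyte comparison above can be made concrete with a quick back-of-the-envelope calculation. The sketch below is illustrative only: the 10 Gbps link speed is an assumed figure, not one taken from the article or the IDC report.

```python
# Sketch of the data-gravity trade-off from the 1 PB vs 1 GB example.
# The link speed (10 Gbps) and dataset sizes are illustrative assumptions.

def transfer_seconds(size_bytes: float, link_gbps: float = 10.0) -> float:
    """Time to move a dataset over a network link of the given speed."""
    link_bytes_per_sec = link_gbps * 1e9 / 8  # convert bits/s to bytes/s
    return size_bytes / link_bytes_per_sec

PETABYTE = 1e15
GIGABYTE = 1e9

# Moving the 1 GB set to the 1 PB set's location: well under a second.
small_move = transfer_seconds(GIGABYTE)
# Moving the 1 PB set instead: on the order of nine days.
large_move = transfer_seconds(PETABYTE)

print(f"Move 1 GB: {small_move:.1f} s")
print(f"Move 1 PB: {large_move / 86400:.1f} days")
```

The asymmetry is six orders of magnitude, which is why the smaller dataset is pulled into the larger one's orbit rather than the reverse.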
Managing, analysing and activating data also relies on applications and services, whether those are provided by a private or public cloud vendor or