“Consider data as if it were a planet or other object with sufficient mass.” The term Data Gravity was coined back in 2010 in a blog post by Dave McCrory. Back then, McCrory worked as the VP of engineering at GE Digital. Today, he works as the VP of growth and global head of insights and analytics at Digital Realty - thanks largely, I imagine, to that blog post.
In the past couple of years, data gravity has become one of the most high-profile trends in the data centre and cloud industries, driven largely by Digital Realty’s own spirited push to get people to pay attention to the phenomenon.
The concept, McCrory wrote in 2010, compares the impact of increasingly large data sets to objects with a physical mass. “As data accumulates (builds mass) there is a greater likelihood that additional services and applications will be attracted to this data. This is the same effect gravity has on objects around a planet. As the mass or density increases, so does the strength of gravitational pull.”
More than a decade later, the digital economy is feeling that pull more than ever. With the advent of the internet of things (IoT), the way that data flows through the air and connective fibres of the modern world is changing.
By 2025, connected devices alone will generate an estimated 79 zettabytes of information. Consider that, in 2016, the entire volume of all data on earth amounted to just 18 zettabytes - or about 720bn Blu-ray copies of Blade Runner: The Final Cut - and it all starts to get a little mind-boggling.
This monumental growth in data creation is largely thanks to the IoT. In the pre-IoT era, the majority of data was created by people - think emails, social media posts, and high-resolution jpegs of Jeff Goldblum lying on that table in Jurassic Park. Now, with an estimated 11.7bn connected devices active in the world at the end of 2020 - a figure which is expected to almost triple in the next decade - people are no longer the ones making all the data.
In the Grip of Data Gravity
All this data - being generated by smart water meters, smart doorbells, even smart screwdrivers - is collected as part of massive data sets which, as McCrory noted over a decade ago, are very hard to move around.
Data centres are seeing demand rise to unprecedented levels as a result, which is driving the increased adoption of public cloud, spectacular growth across the hyperscale industry, and the expansion of edge networks in order to create the necessary tools to capitalise on the vast amounts of information being generated.
AI will be essential in transforming huge modern data sets into useful insights, as well as parsing which data to keep and, perhaps most importantly, where it should be kept. If enterprises can achieve a synergistic relationship between their data and their AI applications, the possibilities are virtually endless. However, finding the right strategy to apply and execute using AI is not always a clear-cut issue.
Where the AI lives
“All AI starts with data, and analysing it requires a lot of accelerated computing. Most companies adopting AI tend to take a mixed approach,” says Charlie Boyle, VP and GM of DGX Systems at NVIDIA, adding that the growth of IoT is driving adoption of this mixed approach, in which decision-makers use the “public cloud, like AWS, Azure and Google Cloud, and private clouds in on-premises servers to deliver applications with lower latency, in industry parlance, to customers and partners while maintaining security by limiting the amount of sensitive data shared across networks.”
Boyle explains that, as customers increasingly adopt AI, they’re beginning to look at not just how to run their workloads, but also how to optimise and automate them. Finding the right place in which to run an AI workload - whether it’s deep learning model-training, or inference, which applies the trained model to an application - is a key problem to solve in order to increase efficiency. “Luckily,” says Boyle, “running AI workloads creates telemetry data that could be used to automate decision making and boost efficiency.”
This telemetry data can then be used to monitor efficiency, data location and system availability. Combining it with historical data can allow enterprises to predict future performance outcomes in order to understand the best place to run their AI workloads. “For example, if a model predicted that a massive workload will take two weeks to run on one platform, but could be completed in 48 hours using another, that information could help a data science team communicate the resources they need for a project to be successful, while also making sure they meet their deadlines,” he continues. “Without these predictions, teams are stuck making educated guesses and manual choices without real insight on the costs, time to solution, and the expected outcome.”
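As a rough illustration of the kind of prediction Boyle describes, the sketch below estimates how long a new workload might take on each platform from past telemetry, then picks the fastest. Everything in it is an assumption made for illustration: the platform names, the figures in `HISTORY`, the abstract "work units", and the simple average-rate model are hypothetical and do not come from NVIDIA or any real telemetry system.

```python
from collections import defaultdict

# Hypothetical telemetry records: (platform, work units, observed runtime in hours).
# The numbers are invented for illustration only.
HISTORY = [
    ("on_prem", 100, 52.0),
    ("on_prem", 200, 98.0),
    ("public_cloud", 100, 14.0),
    ("public_cloud", 200, 26.0),
]


def hours_per_unit(history):
    """Average observed runtime per unit of work, for each platform."""
    totals = defaultdict(lambda: [0.0, 0.0])  # platform -> [total hours, total units]
    for platform, units, hours in history:
        totals[platform][0] += hours
        totals[platform][1] += units
    return {p: hours / units for p, (hours, units) in totals.items()}


def predict_runtimes(history, units):
    """Predicted runtime in hours for a workload of `units` on each platform."""
    return {p: rate * units for p, rate in hours_per_unit(history).items()}


def fastest_platform(history, units):
    """Name of the platform with the lowest predicted runtime."""
    predictions = predict_runtimes(history, units)
    return min(predictions, key=predictions.get)


if __name__ == "__main__":
    for platform, hours in predict_runtimes(HISTORY, 300).items():
        print(f"{platform}: about {hours:.0f} hours")
    print("Suggested platform:", fastest_platform(HISTORY, 300))
```

A real system would weigh cost, data location and availability alongside runtime, and would use a far richer model than a per-platform average; the point here is only that historical telemetry can turn an educated guess into a number a team can plan around.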