The DPU dilemma: life beyond SmartNICs
There’s a major shift happening in server hardware, and it’s emerged from a surprising direction: the humble network card.
Peter Judge, Global Editor
“We are near the start of the next major architectural shift in IT infrastructure,” says Paul Turner, vice president of product management for vSphere at VMware, in a blog post. He’s talking about new servers built with maximum programmability in a single, cost-effective form factor.
And what made this revolution possible is a simple thing: making network cards smarter. The process began with the smart network interface card, or SmartNIC, and has led to a specialized chip: the DPU, or data processing unit - an ambiguously named device with a wide range of applications.
“As these DPUs become more common we can expect to see functions like encryption/decryption, firewalling, packet inspection, routing, storage networking and more being handled by the DPU,” predicts Turner.
The birth of SmartNICs
Specialized chips exist because x86-family processors are great at general purpose tasks, but for specific jobs they can be much slower than a purpose-built system. That’s why graphics processing units (GPUs) have boomed, first in games consoles, then in AI systems.
“The GPU was really designed to be the best at doing the math to draw triangles,” explains Eric Hayes, CEO of Fungible, one of the leading exponents of the new network chips. “Jensen Huang at Nvidia was brilliant enough to apply that technology to machine learning, and realize that architecture is very well suited to that type of workload.”
Like GPUs, SmartNICs began with a small job: offloading some network functions from the CPU, so network traffic could flow faster. And, like GPUs, they’ve eventually found themselves with a wide portfolio of uses.
But the SmartNIC is not a uniform, one-size-fits-all category. They started to appear as networks got faster, and had to carry more users’ traffic, explains Baron Fung, an analyst at Dell’Oro Group. “10Gbps is now more of a legacy technology,” Fung explained, in a DCD webinar. “Over the last few years we’ve seen general cloud providers shift towards 25 Gig, many of them are today undergoing the transition to 400 Gig.”
At the same time, “cloud providers need to consolidate workloads from thousands and thousands of end users. SmartNICs became one of the solutions to manage all that data traffic."
Servers can get by with standard or “foundational” NICs up to around 200Gbps, says Fung. “Today, most of the servers in the market have standard NICs.”
Above that, network vendors have created “performance” NICs, using specialized ASICs to offload network functions, but SmartNICs are different.
“SmartNICs add another layer of performance over Performance NICs,” says Fung. “Essentially, these devices boil down to being fully programmable devices with their own processor, operating system, integrated memory and network fabric. It’s like a server within a server, offering a different range of offload services from the host CPU.”
It’s a growth area: “SmartNICs are relatively small in numbers now, but in the next few years, we see increasing adoption of these devices.”
SmartNICs are moving from a specialist market to more general use: “Today, most of the smart devices are exclusive to cloud hyperscalers like Amazon and Microsoft, who are building their own SmartNICs for their own data centers,” says Fung. “But as vendors release more innovative products, and better software development frameworks for end users to optimize their devices, we can see more adoption by the rest of the market, as well.”

SmartNICs will grow at around three percent annually over the next few years, but will remain a small part of the overall market, because they are pricey: “Today, they are a three to five times premium over a standard NIC. And that high cost needs to be justified.”
In a general network application, SmartNICs can justify their cost by making networks more efficient. “They also prolong the life of infrastructure, because these smart devices can be optimized through software. It's really a balance, whether or not a higher price point for SmartNICs is justifiable.”
But there is potential for confusion, as different vendors pitch them with different names, and different functions. Alongside SmartNICs and DPUs, Intel has pitched in with the broadly similar infrastructure processing unit (IPU). “There’s many different acronyms from different vendors, and we’ve seen vendors trying to differentiate with features that are unique to the target applications they're addressing,” says Fung.
Enter Fungible
One of those vendors is Fungible. The company is the brainchild of Pradeep Sindhu, a formidable network re-builder and former Xerox PARC scientist, who founded Juniper Networks in 1996.
Juniper was based on the idea of using special purpose silicon for network routers, instead of using software running on general purpose network switches. It rapidly took market share from Cisco.
In 2015, Sindhu founded Fungible to make special purpose devices again - only this time network accelerators that he dubbed “data processing units,” or DPUs. He’s now CTO, and the CEO role has been taken up by long-time silicon executive Eric Hayes.
Hayes says the Fungible vision is based on the need to move more data from place to place: “There's data everywhere, and everybody's collecting data and storing data. And the question really comes down to how do you process all that data?”
Equinix architect Kaladhar Voruganti gives a concrete example: “An aeroplane generates about 4.5 terabytes of data per day per plane. And if you're trying to create models or digital twins, you can imagine the amount of data one has to move,” says Voruganti, who serves in the office of the CTO at Equinix.
CPUs and GPUs aren’t designed to help with the tasks of moving and handling data, says Hayes: “When you start running those types of workloads on general purpose CPUs or GPUs, you end up being very inefficient, getting the equivalent of an instruction per clock. You’re burning a lot of cores, and you’re not getting a lot of work done for the amount of power that you’re burning.”
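Some rough arithmetic makes that inefficiency concrete. The sketch below uses illustrative assumptions (1,500-byte packets, a 3GHz core, all processing on a single core), not Fungible’s figures, to show how few CPU cycles each packet gets at modern line rates:

```python
# Rough cycle budget per packet at various line rates.
# Illustrative assumptions: 1,500-byte packets, a 3GHz core,
# and all processing handled on a single core.

PACKET_BYTES = 1500
CLOCK_HZ = 3_000_000_000  # 3GHz

for gbps in (10, 100, 400):
    bits_per_sec = gbps * 1_000_000_000
    packets_per_sec = bits_per_sec / (PACKET_BYTES * 8)
    cycles_per_packet = CLOCK_HZ / packets_per_sec
    print(f"{gbps:>3}Gbps: {packets_per_sec / 1e6:6.1f}M packets/s, "
          f"~{cycles_per_packet:,.0f} cycles per packet")
```

At 400Gbps the budget works out to roughly 90 cycles per packet. At the one instruction per clock Hayes describes, that leaves almost no room for encryption or a full TCP stack - which is the gap hardware acceleration targets.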
Hayes reckons there’s a clear distinction between SmartNICs and DPUs, which go beyond rigid network tasks: “DPUs were designed for data processing. They’re designed to do the processing of data that x86 and GPUs can’t do efficiently.”
He says the total cost of ownership benefit is clear: “It really comes down to what is the incremental cost of adding the DPU to do those workloads, versus the amount of general purpose processing you’d have to burn otherwise.”
According to Hayes, the early generations of SmartNICs are “just different combinations of Arm or x86 CPUs, with FPGAs and hardwired, configurable pipelines. They have a limited amount of performance trade-off for flexibility.”
By contrast, Fungible’s DPU has “a custom designed CPU that allows a custom instruction set with tightly coupled hardware acceleration. So the architecture enables flexibility and performance at the same time.”
The Fungible chip has a MIPS 64-bit RISC processor with tightly coupled hardware accelerators: “Tightly coupled hardware accelerators in a data path CPU: this is the definition of a DPU.”
The DPU can hold “a very, very efficient implementation of a TCP stack, with the highest level of instructions per clock available, relative to a general purpose CPU.”
What do DPUs do?
DPUs make networked processing go faster, but Fungible is looking at three specific applications which shake up other parts of the IT stack.
The first one is the most obvious: speeding up networks.
Networks are increasingly implemented in software, thanks to the software defined networking (SDN) movement. “This goes all the way back to the days of Nicira [an SDN pioneer bought by VMware],” says Hayes. SDN networks make the system more flexible by handling their functions in software. But when that software runs on general purpose processors, it is, says Hayes, “extremely inefficient.”
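At its core, an SDN data plane is a per-packet match-action lookup - a tight loop that is cheap in silicon but costly on a general purpose core. Here is a toy sketch of the pattern, not any vendor’s implementation:

```python
# Toy match-action flow table, the core pattern of an SDN data plane.
# Every arriving packet pays for this lookup on a general-purpose core
# unless the work is offloaded to a SmartNIC or DPU.

flow_table = {
    # (src_ip, dst_ip, dst_port) -> (action, argument)
    ("10.0.0.5", "10.0.1.9", 443): ("forward", "port2"),
    ("10.0.0.5", "10.0.2.7", 22):  ("drop", None),
}

def process(packet):
    key = (packet["src"], packet["dst"], packet["dport"])
    # Unknown flows are punted to the SDN controller, as in OpenFlow-style designs.
    return flow_table.get(key, ("send_to_controller", None))

print(process({"src": "10.0.0.5", "dst": "10.0.1.9", "dport": 443}))
# ('forward', 'port2')
```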
SmartNICs take some steps toward improving SDN functionality, Hayes says, but “not at the level of performance of a DPU.”
Beyond simple SDN, SmartNICs will be essential in more intelligent network ecosystems, such as the OpenRAN (open radio access network) systems emerging to get 5G delivered.
Rewriting storage
The next application is much more ambitious. DPUs can potentially rebuild storage for the data-centric age, says Hayes, by running memory access protocols over TCP/IP and offloading them to create “inline computational storage.”
NVMe, or non-volatile memory express, is an interface designed to access flash memory, usually attached via the PCI Express bus. Running NVMe over TCP/IP, and putting that whole stack on a DPU, offloads the whole memory access job from the CPU, and means that flash memory no longer has to be directly connected to the CPU.
“The point of doing NVMe over TCP is to be able to take all of your flash storage out of your server,” says Hayes. “You can define a very simple server with a general purpose x86 for general purpose processing, then drop in a DPU to do all the rest of the storage work for you.”
As far as the CPU is concerned, “the DPU looks like a storage device, it acts like a storage device, and it offloads all of the drivers that typically have to run on the general purpose processor. This is a tremendous amount of work that the x86 or Arm would have to do - and it gets offloaded to the DPU, freeing up all of those cycles to do what you wanted the server for in the first place.”
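Conceptually, NVMe over TCP turns a block read into a network request. The toy client below sketches that shape; the wire format here is invented for illustration, while the real protocol exchanges capsules defined by the NVM Express specification. On a DPU-equipped server, this entire exchange runs on the DPU, not the host:

```python
import socket
import struct

# Toy remote block read, illustrating the idea behind NVMe over TCP:
# flash lives across the network rather than on the local PCIe bus.
# The wire format is invented for illustration only.

BLOCK_SIZE = 4096

def read_block(target_ip: str, target_port: int, lba: int) -> bytes:
    with socket.create_connection((target_ip, target_port)) as sock:
        # Request: a one-byte opcode (0x02 = read) plus a logical block address.
        sock.sendall(struct.pack("!BQ", 0x02, lba))
        data = b""
        while len(data) < BLOCK_SIZE:
            chunk = sock.recv(BLOCK_SIZE - len(data))
            if not chunk:
                raise ConnectionError("target closed connection")
            data += chunk
        return data

# On a DPU-equipped server, none of this runs on the host CPU: the DPU
# presents an ordinary NVMe drive to the operating system and speaks
# TCP to the remote storage pool on the host's behalf.
```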
Flash devices accessed over TCP/IP can go from being local disks to becoming a centralized, pooled storage device, says Hayes. “That becomes very efficient. It’s inline computational storage, and that means we can actually process the data coming into storage or going back out. In addition to that, we can process it while it’s at rest. You don’t need to move that data around; you can process it locally with the DPU.”
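Here is a minimal sketch of the inline idea, using compression and checksumming as stand-ins for the transformations a DPU’s accelerators would apply in hardware as data streams toward storage:

```python
import zlib
import hashlib

# Sketch of an inline write path: data is compressed and fingerprinted
# as it streams toward storage, rather than in a separate pass through
# host memory. On a DPU these stages run in dedicated hardware.

def inline_write_path(payload: bytes) -> dict:
    compressed = zlib.compress(payload)             # inline compression
    checksum = hashlib.sha256(payload).hexdigest()  # inline integrity hash
    return {
        "stored_bytes": compressed,
        "checksum": checksum,
        "ratio": len(compressed) / len(payload),
    }

record = inline_write_path(b"sensor telemetry " * 1000)
print(f"compressed to {record['ratio']:.0%} of original size")
```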
Speeding GPUs
In a third application, DPUs meet the other great offload workhorse, the GPU, and help to harness it better - because, after all, there’s a lot of communication between CPUs and GPUs.
“In most cases today, you have a basic x86 processor that’s there to babysit a lot of GPUs,” says Hayes. “It becomes a bottleneck as the data has to get in and out from the GPUs, across the PCI interface, and into the general purpose CPU memory.”
Handing that communication task over to a DPU lets you “disaggregate those GPUs,” says Hayes, making them into separate modules which can be dealt with at arm’s length. “It can reduce the reliance on the GPU-PCI interface, and gives you the ability to mix and match whatever number of GPUs you want, and even thin-slice them across multiple CPUs.”
This is much more efficient, and more affordable in a multi-user environment, than dedicating sets of GPUs to specific x86 processors, he says.
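Rough bandwidth figures suggest why this is plausible. PCIe 4.0 delivers about 2GB/s per lane, so an x16 slot carries roughly 32GB/s, while a 400Gbps fabric carries about 50GB/s - approximate raw rates, ignoring protocol overhead:

```python
# Rough comparison between the PCIe link that chains a GPU to its host
# CPU and a fast network fabric. Approximate raw rates, ignoring
# protocol overhead.

pcie4_x16_gbytes = 16 * 2.0   # 16 lanes at ~2GB/s each (PCIe 4.0)
fabric_gbytes = 400 / 8       # 400Gbps expressed in GB/s

print(f"PCIe 4.0 x16:   ~{pcie4_x16_gbytes:.0f} GB/s")
print(f"400Gbps fabric: ~{fabric_gbytes:.0f} GB/s")
# A pooled GPU reached over the fabric need not be starved relative to
# a local one - which is what makes disaggregation plausible.
```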
A final use case for DPUs is security. They can be given the ability to speed up encryption and decryption, says Hayes, and network providers welcome this. “We want to ensure that the fabric that we have is secure,” says Voruganti.
Easier bare metal for service providers and enterprises
Equinix is keen to use DPUs, and it has a pretty solid application for them: Metal, the bare metal compute-on-demand service it implemented using technology from its recent Packet acquisition.
In Metal, Equinix offers its customers access to physical hardware in its facilities, but it wants to offer them flexibility. With DPUs, it could potentially allow the same hardware to perform radically different tasks, without physical rewiring.
“What I like about Fungible’s solution is the ability to use the DPU in different form factors in different solutions,” says Voruganti. “I think in a software-defined composable model, there will be new software to configure hardware for instance as AI servers, or storage controller heads, or other devices.
“Instead of configuring different servers with different cards and having many different SKUs of servers, I think it will make our life a lot easier if we can use software to basically compose the servers based on the user requirements.”
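A hypothetical composition request gives a flavor of what that could look like. The schema below is invented for illustration; it is not Equinix’s or Fungible’s actual API:

```python
# Hypothetical declarative requests to compose bare-metal servers.
# All field names and values are invented for illustration only.

ai_server = {
    "profile": "ai-training",
    "cpu": {"sockets": 2, "arch": "x86-64"},
    "gpus": {"count": 8, "attachment": "fabric"},  # pooled, not local PCIe
    "storage": {"protocol": "nvme-tcp", "capacity_tb": 50},
    "network": {"bandwidth_gbps": 400, "encryption": "inline"},
}

storage_head = {
    "profile": "storage-controller",
    "cpu": {"sockets": 1, "arch": "x86-64"},
    "gpus": {"count": 0},
    "storage": {"protocol": "nvme-tcp", "capacity_tb": 500},
    "network": {"bandwidth_gbps": 400, "encryption": "inline"},
}

# The same physical SKU, fitted with a DPU, could satisfy either
# request: the DPU is reprogrammed rather than the server recabled.
```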
That may sound like a fairly specialized application, but many enterprises have needs similar to those of bare-metal service providers like Equinix.
There’s a big movement right now under the banner of “cloud repatriation," where disillusioned early cloud customers have found they have little control of their costs when they put everything into the cloud. So they are moving resources back into colos or their own data center spaces.
But they have a problem, says Hayes. “You’ve moved away from the uncontrolled costs of all-in cloud, but you still want it to look like what you’ve been used to in the cloud.”
These new enterprise implementations are “hybrid,” but they want flexibility. “A lot of these who started in the cloud haven’t necessarily got the networking infrastructure and IT talent of a company that started with a private network,” says Hayes. DPU-based systems, he says, “make it easy for them to build, operate, and deploy these types of networks.”
Standards needed
But it’s still early days, and Voruganti would like the vendors to sort out one or two things: “We’re still in the initial stages of this, so the public cloud vendors have different flavors of quote-unquote SmartNICs,” he says.
“One of the things that operators find challenging is we would like some standardization of the industry, so that there is some ability for the operator to switch between vendors for supply chain reasons, and to have a multi-vendor strategy.”
Right now, however, with DPU and SmartNIC vendors offering different architectures, “it is an apples-to-oranges comparison among SmartNIC vendors.”
With some definitions in place, the industry could have an ecosystem, and DPUs could even become a more-or-less standard item.
Power hungry DPUs?
He’s got another beef: “We’re also concerned about power consumption. While vendors like Fungible work to keep within a power envelope, we believe that the overall hardware design has to be much more seamlessly integrated with the data center design.”
Racks loaded with SmartNICs are “increasing the power envelope,” he says. “We might historically have 7.5kW per rack, in some cases up to 15kW. But we’re finding with the new compute and storage applications the demand for power is now going to between 30 and 40kW per rack.”
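The arithmetic behind that jump is simple enough. A sketch with assumed, illustrative per-server figures (not Equinix’s numbers) shows how accelerator-dense servers blow past legacy rack budgets:

```python
# Illustrative rack power budget, showing how accelerator-heavy servers
# push racks past legacy power envelopes.
# All per-server wattages are assumptions for illustration only.

servers_per_rack = 30
legacy_server_w = 250    # CPU-only server
dense_server_w = 1200    # server with GPUs, DPUs, and fast NICs

legacy_rack_kw = servers_per_rack * legacy_server_w / 1000
dense_rack_kw = servers_per_rack * dense_server_w / 1000

print(f"legacy rack: {legacy_rack_kw:.1f}kW")  # ~7.5kW
print(f"dense rack:  {dense_rack_kw:.1f}kW")   # ~36kW
```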
It’s no good just adding another power-hungry chip type into a data center designed to keep a previous generation of hardware cool: “I think the cooling strategies that are being used by these hardware vendors have to be more seamlessly integrated to get a better cooling solution.”
Equinix is aiming to bring special processing units under some control: “We’re looking at the Open19 standards, and we’re starting to engage with different vendors and industry to see whether we can standardize so that it's easy to come up with cooling solutions.”
Standards - or performance?
Hayes takes those points, but he isn’t keen on commoditizing his specialized product, and says you’ll need specialist hardware to avoid that overheating: “It’s all about software. In our view, long term, the winner in this market will be the one that can build up all those services in the most efficient infrastructure possible. The more efficient your infrastructure is, the lower the power, the more users you can get per CPU and per bit of flash memory, so the more margin dollars you’re gonna make.”
Fung, the analyst, can see the difficulties of standardization: “It would be nice if there can be multi-vendor solutions. But I don't really see that happening, as each vendor has its own kind of solution that's different.”
But he believes a more standardized ecosystem will have to emerge, if DPUs are to reach more customers: “I’m forecasting that about a third of the DPU market will be in smaller providers and private data centers. There must be software development kits that enable these smaller companies to bring products to market, because they don’t have thousands of engineers like AWS or Microsoft.”