February 2018
VIRTUALIZATION AND CONTAINERIZATION OF THE MOBILE NETWORK: KEY FOUNDATION OF A 5G NETWORK
CONTENTS

Introduction
Containers: Technical Briefing
Why Are Containers Interesting to Telcos?
Architecture Overview
IMS Core Microservices
The Demonstration
  Start of Day
  Instantiation
  Video
Reality Check: Are Containers Ready for Prime Time?
Summary
About the Authors
© 2018 Metaswitch Networks. All Rights Reserved
INTRODUCTION

As LTE and VoLTE are becoming mainstream, the industry is turning to meet the new challenges presented by 5G, even though the exact details of the relevant standards are still being discussed. The 5G vision is one of empowering the network society through three main use cases: enhanced mobile broadband, massive machine-type communications, and ultra-reliable, low-latency communications. These use cases lead to a diverse and often contradictory set of requirements which can only be realized economically through low-footprint, cloud-native network functions providing discrete network slices. Virtualization and containerization are therefore essential foundations for this new 5G core.

NFV leaders such as AT&T® are clear that containers are an important part of their strategy. “We have a very robust and extensive container strategy that’s not just confined to the core network or to our software stack sitting above it,” said Andre Fuetsch, President of AT&T Labs and CTO of AT&T. “Not everything is suited for virtual machines,” Fuetsch said. “When you start looking at various parts of the network where you need speed, reliability, redundancy, there’s some benefits you can get from containers that you can’t get from the alternatives.”

This paper documents a Proof of Concept project carried out at Telenor Research in Oslo, Norway, with several partners including Metaswitch Networks®, to investigate the state of the art in virtualization and containerization of the mobile core. This project ultimately demonstrated that it is possible to virtualize the entire mobile core, and that containerization of some elements is possible today, with more being possible soon.
CONTAINERS: TECHNICAL BRIEFING

From a technical standpoint, containers are application processes running on a shared kernel, which they all inherit to some degree. They are standard Linux® processes which run in their own isolated network, process and mount namespaces. Namespaces have been a part of the Linux kernel since 2008, so are nothing particularly new in themselves. The fact that a container runs within its own network namespace, for example, means that the process has its own set of networking interfaces, network devices, IP addresses, routing tables, sockets and so on. Because it runs in its own process namespace, the container cannot “see” any processes other than itself and any processes it has spawned (or that its children have spawned, and so on).

Containers can be as lightweight as a single process or as heavyweight as an entire operating system, depending on what the developer has chosen. Containers can share support libraries and even file systems, or have distinct, unique copies where necessary.

From an application standpoint, containers are a way of wrapping up an application along with all of its dependencies in such a way that the entire package is distributed as a single, distinct entity. This makes them highly portable across a wide range of different environments, allowing simple continuous integration/continuous delivery (CI/CD) toolchains to be leveraged. For example, the developer builds and tests the container on their own machine. Once the code is committed, the CI/CD environment puts
the container onto different system test machines to pass through a range of automated regression tests. Once passed, it can then pass to QA testing, then to pre-production testing in the labs of end-customers, and potentially even into production once that pre-production testing is complete. It is the combination of the application and its dependencies that makes that flow easy and frictionless.
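The namespace isolation described above is visible directly on any Linux host, no container runtime required. The following sketch simply lists the namespace entries the kernel exposes for a process (Linux-only: it assumes a mounted /proc filesystem):

```python
import os

def namespaces(pid="self"):
    """List the namespaces a process belongs to by reading /proc/<pid>/ns.

    Each entry (e.g. 'net', 'pid', 'mnt') is a namespace of the kinds
    discussed above; a containerized process simply gets fresh entries
    for some of them instead of sharing the host's.
    """
    return sorted(os.listdir(f"/proc/{pid}/ns"))

# On a typical modern kernel this includes 'mnt', 'net', 'pid', 'uts', ...
print(namespaces())
```

Comparing the output for a host process and a containerized process shows identical namespace kinds but different namespace instances, which is the whole trick.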
[Figure: virtual machines vs. containers. A virtual machine provides a single application, isolated from all other applications in other VMs; containers have their own, generally lighter-weight, OS user space.]
It is worth comparing containers with virtual machines, since to many they appear similar. A virtual machine makes use of a hypervisor, such as ESXi from VMware® or KVM, which is part of Linux. The only thing running directly in the host operating system of the host server is that hypervisor. The hypervisor creates virtual machines which are entirely independent of each other. Each virtual machine (VM) has its own operating system (OS), termed the guest OS. The guest OS in each VM can be different from the others and from the host OS. Each VM provides a single application, which is entirely isolated from all other applications in other VMs.

By contrast, with containers there is no need for a hypervisor, nor for separate guest operating systems. The applications running in each container inherit a portion of the underlying kernel, but have their own OS user space. This OS user space is generally more lightweight than that utilized by most VMs, although that is not a fundamental requirement. Containers also benefit from the fact that there are more standard ways to get configuration directly from the host, and to share diagnostics back to the host, which removes some of the function and complexity which VMs and VM orchestrators need to cope with.

Although containers do not require a hypervisor, some deployment models do suggest use of containers running within VMs on top of hypervisors. These are potentially interesting to those who already have a VM infrastructure in place, with orchestration and the like already available, since it means they can evolve to using containers without having to rebuild their existing infrastructure, and additionally means that they can run container and VM workloads in the same environment.

[Figure: side-by-side comparison. Virtual machines: Application A and B each run in their own guest user space on their own guest kernel, over a hypervisor on the host kernel. Containers: Application A and B each have their own user space and namespace, sharing a layered union file system and the host kernel.]
WHY ARE CONTAINERS INTERESTING TO TELCOS?

There are several reasons why containers are of interest to Telcos, which are covered below.

INCREASED DENSITY

Since there is no replication of the guest operating system within each container, and in fact in many cases containers can share much of the operating system, the resource overheads consumed by each container are significantly lower than those needed for virtual machines. The exact saving will depend on the workloads and on how the VMs and containers are implemented, but it can be significant.

EFFICIENT SCALING

Although not mandatory, use of containers makes it easier to implement VNFs (Virtual Network Functions) as several microservices1, with each microservice being provided by a different type of container. Deploying the microservices in distinct containers allows the VNF to be scaled at a much finer granularity than would be sensible when using VMs, meaning the service can be tuned to deliver what is needed with the most efficient resource usage. However, care must be taken to ensure redundancy is built into the design, to avoid adding single points of failure as part of this process.
1 Microservices is a software architecture style in which complex applications are composed of small, independent processes communicating with each other using language-agnostic APIs. These services are highly decoupled and focus on doing a single small task well, facilitating a modular approach to system-building.
SPEED OF DEPLOYMENT

Starting a container can be as simple as starting a new process, which can take seconds. This compares very favorably with starting a VM, which requires several minutes to boot an entire operating system. This allows VNFs to respond dynamically to changes in load rather than having to predict underlying trends and attempt to be ready in advance of significant variations. For example, the speed of startup of a lightweight container allows web-scale companies to provide a unique container to service each incoming request: evidence from Google Cloud Platform™ shows that the latency of HTTP request processing is normally under 200ms, even though that includes the time to create a new container for each request.
PORTABILITY AND IMMUTABILITY

As was touched on above, the fact that a container packages up the application and all of its dependencies makes CI/CD toolchains possible, and simplifies the test effort needed at each stage of the deployment process from developer to production network.

There are also benefits due to smaller batch sizes. Each microservice container represents a much smaller element of the overall system than previous monolithic implementations. Upgrading just one of these therefore represents a much smaller change, and such changes can be made more frequently with lower risk of destabilization.
The combination of some of these factors then brings further benefits. Each microservice provides a well-defined, RESTful API to its peers. The vendor can therefore make changes within a microservice to add new function or improve performance, and can then provide the new version as a new container. Portability means that operators can rely on this container having no dependencies which are not provided for, and the microservice architecture with its stable RESTful APIs means upgrading one container is much less impactful than upgrading the entire VNF, allowing faster flow-through of new elements into production networks.
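A stable, versioned REST interface of the kind described can be sketched as follows. The endpoint path and payload here are hypothetical illustrations, not Clearwater's actual API: the point is that peers depend only on the versioned contract, so the container behind it can be swapped freely.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class MicroserviceStub(BaseHTTPRequestHandler):
    """Minimal microservice exposing a small, versioned REST API."""

    def do_GET(self):
        if self.path == "/api/v1/health":
            # The response shape is part of the v1 contract; internal
            # implementation changes must not alter it.
            body = json.dumps({"status": "ok", "api_version": "v1"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, fmt, *args):  # silence per-request logging
        pass

def start_stub():
    """Start the stub on an ephemeral port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), MicroserviceStub)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

if __name__ == "__main__":
    srv = start_stub()
    url = f"http://127.0.0.1:{srv.server_port}/api/v1/health"
    with urllib.request.urlopen(url) as resp:
        print(resp.status, json.load(resp))
    srv.shutdown()
```

A new container version that keeps `/api/v1/...` intact can be rolled out without touching its peers; breaking changes would instead appear under a new version prefix.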
ORCHESTRATION

Although there has been significant progress in the management and orchestration space for virtual machines, it is still immature and complex. Containers are much simpler than VMs, with far fewer management and configuration requirements. As a result, container orchestration is a much simpler problem to solve, and a number of solutions exist, with the open-source Kubernetes™ from Google® generally considered the most advanced and becoming a de facto standard. Red Hat® OpenShift Container Platform is the leading enterprise distribution of Kubernetes, optimized for continuous application development and multi-tenant deployment. Red Hat is a leading contributor to the Kubernetes project and the Cloud Native Computing Foundation.
Containers also provide the potential for cloud bursting, where Telcos are able to temporarily expand into public or private cloud infrastructure when faced with more load than their normal deployment is able to support. That could be a spike caused by the Super Bowl or Mother's Day, for example, or even the response to a disaster at one of the normal deployment sites. In this latter case, the operator is effectively able to use the public cloud as a disaster recovery site, at much lower ongoing cost than maintaining an entire redundant site. This is only possible because of the speed with which containers can come online in the public cloud, and because their portability makes deployment in public clouds realistic.
ARCHITECTURE OVERVIEW

This Proof of Concept implemented a mobile network using virtualization and containerization, integrating five different vendor solutions as shown in Figure 1.
[Figure 1: PoC architecture. The vIMS (Metaswitch), vBS (Altiostar), vEPC (Affirmed Networks) and vPCRF (Openet) each run in guest OSes over a KVM hypervisor on Red Hat OpenStack Platform and Red Hat Enterprise Linux (RHEL), with the vIMS hosted on Red Hat OpenShift Container Platform and connectivity out to the internet.]

All elements were deployed as VNFs in an NFV environment. The NFV environment itself was Red Hat OpenStack® Platform. All elements were simple virtual machines apart from the elements of the IMS core, which were containers running inside virtual machines on a container engine.
In detail, the elements were:
• Virtualization Environment – Red Hat OpenStack Platform
• Container Engine – Red Hat OpenShift Container Platform
• Virtual RAN – Altiostar®
• Virtual EPC – Affirmed Networks®
• Virtual PCRF – Openet®
• Virtual IMS Core – Metaswitch Networks

As shown in Figure 2, the NFVI utilized for the project comprised Intel® High Performance Compute (HPC) servers hosting the Red Hat OpenStack Platform and Red Hat OpenShift Container Platform environment, and HP® servers for internal management and accessibility (VPN) duties. The switches were HP high-density, ultra-low-latency Top-of-Rack (ToR) devices, a mix of 10GbE and 40GbE. Lastly, the cloud RAN used a remote radio head (RRH) with a baseband unit (BBU) connecting it into the NFV cloud.
[Figure 2: NFVI hardware, comprising Intel HPC servers and HP servers.]
The OpenStack word mark and the Square O Design, together or apart, are trademarks or registered trademarks of OpenStack Foundation in the United States and other countries, and are used with the OpenStack Foundation’s permission. Red Hat, Inc. is not affiliated with, endorsed by, or sponsored by the OpenStack Foundation or the OpenStack community.
[Figure: detailed deployment view. The Altiostar BBU connects to the Affirmed Networks vMME, vS-GW and vP-GW and the Openet vPCRF; the Metaswitch IMS, including P-CSCF A and B, sits on the Red Hat OpenShift Container Platform container engine; all of this runs over KVM, Red Hat OpenStack Platform and Red Hat Enterprise Linux (RHEL), with connectivity out to the internet.]
The above diagram shows a more detailed view of the key elements. The key point to note is that the IMS core elements, except for the P-CSCF, are deployed as containers rather than virtual machines. See the later reality check on container technology for Telcos, specifically the section on networking support, for details of why the P-CSCF is a special case here.

IMS CORE MICROSERVICES

The microservices deployed in independent containers for the proof of concept were:
• The SIP routing microservice.
• The HSS proxy microservice.
• A redundant, distributed timer database.
• An in-memory open-source database (Memcached) coupled with a replication layer.
• A file-based, open-source, distributed database (Cassandra®).
• An open-source configuration distribution service (etcd).

Readers familiar with container technology may be wondering about the role of Red Hat OpenShift Container Platform. It is a secure, enterprise-grade container platform that combines the industry-leading container orchestration engine with advanced application build and delivery automation features that can span private, public and hybrid infrastructures. It is built on open-source innovation and industry standards, including Kubernetes for container orchestration and Red Hat Enterprise Linux. The relationship is similar to that between Red Hat Enterprise Linux and Fedora, or between Red Hat OpenStack Platform and the community RDO distribution of OpenStack.
CLEARWATER CORE

Clearwater is an implementation of IMS built using web development methods to provide voice, video and messaging services to millions of users. Architected from the ground up for massively scalable deployments within virtualized public or private elastic compute clouds, Clearwater combines the economics of over-the-top (OTT) style service platforms with the standards compliance and reliability expected of telco-grade communications network solutions. The web services-oriented design inherent to Clearwater makes it ideal for instantiation within NFV (network functions virtualization) environments as a virtualized network function (VNF).

Clearwater leans heavily on established design patterns for building and deploying massively scalable web applications, adapting these patterns to fit the requirements of SIP and IMS. It was built in a manner that enables all components to scale out horizontally using simple, stateless load balancing. Long-lived state is not stored on individual nodes, avoiding the need for complex data replication schemes; instead, it is stored in cloud-optimized clustered storage technologies such as Apache Cassandra and Memcached. Characteristic of innovative internet software architectures, interfaces between the front-end SIP components and the back-end services use RESTful web services APIs, while interfaces between the various components use connection pooling with statistical recycling of connections, ensuring that traffic loads are spread evenly as nodes are added and removed from each layer.

Clearwater Core is helping revolutionize the telecommunications marketplace by easing the transition to new software-defined service function chains that are uniquely flexible, resilient and scalable.

THE DEMONSTRATION

The culmination of the Proof of Concept project was a presentation and live demonstration of the test deployment to a diverse range of senior management and operators from various parts of the global Telenor organization. The pretext of the demonstration was a wilderness festival in one of the less populated regions of Norway, away from mainstay regions of mobile coverage. However, the thousands of international visitors to the wilderness festival represent a significant revenue opportunity for an enterprising mobile operator. The demonstration centered around instantiating, scaling and healing the containerized elements of the IMS core. The remainder of the virtualized mobile core was left operational, because the focus of the demonstration was on the manageability and speed of containerized VNFs.

START OF DAY

Unsurprisingly, with no IMS core instantiated, any attempt to place a mobile call at the site of the wilderness festival was unsuccessful.

INSTANTIATION

Using a pre-packaged script to drive the Red Hat OpenShift Container Platform engine, the various VNFIs required to bring up the IMS core were brought into being in an impressive 3 minutes 47 seconds. Compare this to the months, if not years, that have been required to install and commission an IMS core! Whilst there is still design and test effort that needs to be accounted for outside of that, this represents many orders of magnitude improvement over what was previously possible.

The VNFIs are architected such that they do not need to be created in any particular order. If there are interdependencies between VNFIs which are not fulfilled as they spin up, they merely retry after a delay.
This approach avoids the need for complex scripts to ensure nodes are brought up in a specific order, and is just one of the many features which differentiate truly cloud-native VNFs from those which have merely been ported from the physical domain as-is. The new IMS core was immediately pressed into service by showing that the same test call which had failed just a few minutes previously now succeeded.
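The retry-on-unmet-dependency behavior described above can be sketched as follows. This is a simplified, hypothetical illustration of the pattern, not the PoC's actual start-up code:

```python
import time

def wait_for_dependencies(check, max_attempts=10, delay=0.1):
    """Poll a dependency check until it passes, instead of relying on
    external scripts to enforce a strict start-up order."""
    for attempt in range(1, max_attempts + 1):
        if check():
            return attempt
        time.sleep(delay)  # dependency not ready yet; back off and retry
    raise TimeoutError("dependencies never became ready")

# Simulated dependency that only becomes ready on the third poll,
# standing in for a peer VNFI that is still spinning up.
class SlowDependency:
    def __init__(self, ready_after=3):
        self.polls = 0
        self.ready_after = ready_after

    def ready(self):
        self.polls += 1
        return self.polls >= self.ready_after

dep = SlowDependency()
attempts = wait_for_dependencies(dep.ready, delay=0.01)
print(f"ready after {attempts} polls")  # ready after 3 polls
```

Because every component tolerates its peers being absent at start-up, the orchestrator can create all instances in parallel and the system converges on its own.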
SCALING AND HEALING

The simplicity of the Red Hat OpenShift Container Platform orchestrator was demonstrated in a couple of ways. The key workhorse of the IMS core, the SIP routing microservice, was scaled up first using a scripting interface and then through the intuitive GUI. This simulated the mobile operator growing the resources available to the IMS core as more people arrive at the festival. The GUI clearly shows both the actual deployment state and the desired state, as shown in the screenshot below.
The demo also showed healing by the Red Hat OpenShift Container Platform engine: one of the running subscriber database containers was destroyed to simulate a failure of some kind, and the platform automatically brought a new container into being to replace the failed one and return the deployment to its desired state.

VIDEO

An abridged video of the PoC demo, conducted in Oslo on 12th October 2017, can be viewed at the following link. https://youtu.be/fgbjcqaf87c
Screen capture of Red Hat OpenShift Container Platform (RHOCP)
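The desired-state model visible in the GUI, and the self-healing it enables, can be sketched as a simple reconciliation step. This is a hypothetical illustration of what orchestrators such as Kubernetes do internally, not OpenShift's actual implementation:

```python
# Compare the observed replicas with the desired count, then emit the
# create/delete actions needed to converge on the desired state.
def reconcile(desired, running):
    """Return the list of (action, name) pairs needed to converge."""
    actions = []
    if len(running) < desired:
        # Scale up (or heal after a failure): create the missing replicas.
        actions += [("create", f"replica-{i}")
                    for i in range(len(running), desired)]
    elif len(running) > desired:
        # Scale down: remove the surplus replicas.
        actions += [("delete", name) for name in running[desired:]]
    return actions

# Scaling up: the operator raises the desired count from 2 to 4.
print(reconcile(4, ["replica-0", "replica-1"]))
# Healing: one of three replicas has failed and disappeared.
print(reconcile(3, ["replica-0", "replica-1"]))
```

Both the scaling and the healing shown in the demo are the same mechanism: the orchestrator continuously runs this comparison and acts on the difference, whether it was caused by an operator changing the desired count or by a container dying.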
REALITY CHECK: ARE CONTAINERS READY FOR PRIME TIME?
There are several concerns which have been raised in relation to the use of containers in production Telco networks, and these are discussed below.

GENERAL MATURITY

Although it has been five years since the initial white paper which brought Network Functions Virtualization to the Telco world, it is only in the last couple of years that production networks based upon virtual machines have become commonplace. Virtualization as a concept had existed in the Enterprise IT space for over a decade prior to the publication of that NFV White Paper, which demonstrates how cautious Telcos can be when it comes to adopting new technology like this. Containers have been a part of the Enterprise IT space for nearly ten years, with adoption growing over that period, yet are still considered slightly bleeding-edge by some. This is an issue only time and proven experience can address.

ORCHESTRATION COMPLEXITY

The management and orchestration of containers is fundamental, and Kubernetes is today the main tool used in that regard. It supports large numbers of compute nodes, containers and topologies2. However, the higher the number of containers, the higher the orchestration complexity. Using VMs for some workloads may considerably reduce the number of containers, so the design should be balanced to target a reasonable complexity.
INTEGRATION WITH THE NFV ORCHESTRATOR

Operators should expect environments with both containers and VMs interacting with each other in order to provide end-to-end services. In that scenario, the integration of conventional NFV orchestrators such as OSM with container orchestrators like Kubernetes is still an open challenge that demands further study and development effort.

NETWORKING SUPPORT

Enterprise workloads typically only require a single IP address, and so existing container orchestrators are only equipped to configure them in this way. Telco workloads are different: operators typically want separation between signaling and management traffic, and certainly require demarcation at the edge of their networks from elements such as session border controllers. Although containers can be set up with multiple interfaces to meet this need, the orchestrators are only just catching up. Notably, the multus3 container network interface project from Intel is making significant progress in this area for Kubernetes.
2 Up to 5,000 compute nodes, 100 containers per node, and no more than 300,000 containers in total. See https://Kubernetes.io/docs/admin/cluster-large/
3 See https://github.com/Intel-Corp/multus-cni
SPECIALIZED HARDWARE SUPPORT

Community container orchestrators provide no way to identify specific hardware capabilities or configuration. Telco workloads may benefit from access to DSPs for transcoding, or SR-IOV for accelerated media relay, but the container orchestrator cannot provide these. The Enhanced Platform Awareness (EPA)4 project in Kubernetes™ is seeking to address this concern.

PERFORMANCE IMPACT OF RUNNING IN CONTAINERS

Many operators are familiar with the concept of a “virtualization tax” incurred when applications are run on top of a hypervisor as opposed to in a bare metal environment. This tax results from the extra processing which the hypervisor needs to perform that just doesn’t exist when the application is running on bare metal. There is a concern that containerization comes with its own such tax. Evidence to date is that there is only negligible performance impact in most cases, although it can become relevant in some more esoteric networking scenarios.
RELIABLE PERFORMANCE

As it stands, container orchestrators have no mechanism to support features such as core pinning and isolation. These are features used in the VM space to give certain workloads reserved access to resources, ensuring that they can handle real-time workloads promptly and do not introduce latency to media flows. Another Intel project, CPU-Manager-for-Kubernetes™ 5 (CMK), seeks to close this gap. It guarantees that high-priority workloads are pinned to exclusive cores, which improves throughput and addresses the “noisy neighbors” issue which can otherwise result when different workloads interact to reduce performance.

SECURITY

Concerns have been expressed that the attack surface of a container is both different from and greater than that of a virtual machine. To date, insufficient analysis and development has been done on this topic, although 2018 is becoming widely acknowledged as “the year of container security”. There is no fundamental reason why containers should be any less secure than VMs, but it is true that more work needs to be done both to achieve that and to prove that it is the case. That work is increasingly getting priority and moving forwards.
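The core pinning discussed under Reliable Performance ultimately rests on the Linux scheduler-affinity API, which tools like CMK build on. A minimal sketch (Linux-only: `os.sched_setaffinity` is not available on all platforms, and real deployments also isolate the pinned cores from the general scheduler, which this sketch does not do):

```python
import os

def pin_to_cores(cores):
    """Restrict the calling process to the given CPU cores and return
    the affinity mask the kernel actually applied."""
    os.sched_setaffinity(0, set(cores))  # pid 0 means "this process"
    return os.sched_getaffinity(0)

allowed = os.sched_getaffinity(0)  # cores we are currently allowed on
one_core = min(allowed)
print(pin_to_cores([one_core]))    # e.g. {0}
```

Pinning a latency-sensitive workload this way keeps the scheduler from migrating it between cores, which is one ingredient of the predictable media-path performance Telcos require.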
4 See https://github.com/kubernetes-incubator/node-feature-discovery
5 See https://github.com/Intel-Corp/CPU-Manager-For-Kubernetes
SUMMARY

Containers represent the only viable route for Telcos to build and scale networks which can deliver the diverse and contradictory requirements which 5G use cases will place on them. This is a technology which Telcos must embrace if they are going to be able to respond to the demands of a 5G world. Although we are not quite ready for fully containerized, cloud-native telco deployments, this proof of concept shows that those deployments will be realizable in the not too distant future. Earlier this year, industry analyst 451 Research6 forecast that the application container market is set to grow from $762 million in revenues in 2016 to nearly $2.7 billion in 2020.

This paper has considered the areas where further work is needed to go from proof of concept to production, and has pointed to promising developments which show a strong and clear direction of travel from a wide ecosystem of players.
6 See https://451research.com/images/Marketing/press_releases/Application-container-market-will-reach-2-7bn-in-2020_final_graphic.pdf
ABOUT THE AUTHORS

MIKE DELL, SENIOR DIRECTOR OF PLM, METASWITCH NETWORKS

Mike Dell is Senior Director of Product Line Management at Metaswitch Networks and lead Product Manager for the Clearwater IMS Core. He joined Metaswitch in 2002 and has held roles in Software Engineering, Sales and PLM.

DR. ANDRES GONZALEZ, RESEARCH SCIENTIST, TELENOR GROUP

Dr. Andres Gonzalez is a Research Scientist at Telenor Group, working actively on several NFV and SDN Proofs of Concept (PoCs) and trials, as well as contributing to the specification of diverse RFQs for new Telenor deployments towards 5G.

DR. PÅL GRØNSUND, RESEARCH SCIENTIST, TELENOR GROUP

Dr. Pål Grønsund is a Research Scientist at Telenor doing research towards 5G with a main focus on NFV, SDN, Cloud and Network Slicing. He participates in several international research projects, and is Vice Chair of the Open Source MANO (OSM) project. He has deep experience in cloud implementation projects. Pål is an experienced speaker at academic and commercial conferences and other venues, and is the author of several publications in highly ranked journals, conferences and books.
www.metaswitch.com
FURTHER INFORMATION
Metaswitch Networks’ CTO, Martin Taylor, spoke recently on the subject of containers at the SDN/NFV Conference in The Hague. A recording of the presentation can be found at the following link.
https://play.webvideocore.net/popplayer.php?it=cxhaomwqlz40&utm_content=62749458&utm_medium=social&utm_source=linkedin
See also the Metaswitch Networks White Paper “The Application of Cloud-Native Design Principles to Network Functions Virtualization”, which can be downloaded at the link below.
https://www.metaswitch.com/resources/the-application-of-cloud-native-design-principles-to-network-functions-virtualization
All other trademarks are the property of their respective owners.