Ethernet: The Key Catalyst for Network Convergence Inside the Data Center VARADHARAJAN PARTHASARATHY Director, Technology, Data Communication Products
RAJKUMAR PAULRAJ Director, Technology, Data Communication Products
www.aricent.com
ETHERNET: THE KEY CATALYST FOR NETWORK CONVERGENCE INSIDE THE DATA CENTER

Motivated by a strong need for improved operations, like load balancing and the availability of multiple paths within data centers, several alternatives for converging multiple application-specific data center networks have been proposed. This paper attempts to demystify these new standards and alternatives to help telecom equipment manufacturers (TEMs) understand and choose the best options for data center network convergence, and overcome the challenges of implementing them.

INTRODUCTION

Data centers are looking to converge multiple application-specific networks (e.g., Ethernet-based LANs for data and Fiber Channel for storage networks) over a single Ethernet network. Several extensions to Ethernet have been proposed by the IEEE in order to provide the lossless, guaranteed, low-latency services critical to data centers. Some of these protocols, like Enhanced Transmission Selection (ETS), Priority Flow Control (PFC), Congestion Notification (CN), and Data Center Bridging Exchange (DCB-X), need enhancements in the hardware and control plane software of data center switches. In addition to this, enhancements and alternatives to the Spanning Tree Protocol (STP) for layer 2 switching, like Transparent Interconnection of Lots of Links (TRILL) and Shortest Path Bridging (SPB), have also been proposed. This rapid development in data center technologies has created a critical need for understanding the various options available and identifying the best route to convergence for enabling better data center operations.

Today's data centers have grown into a complex milieu of technology standards and protocols. Each technology innovation serves its individual intended purpose, but combined with other technologies it creates enormous overheads in terms of hardware acquisition costs, data center operations, administration and maintenance, and network cabling. Besides these, interoperability and new-deployment issues constantly add pressure to the already escalating total cost of ownership of data center management and maintenance.

THE DATA CENTER ENVIRONMENT

There are typically three different types of traffic in a data center environment:

> Storage Area Network (SAN) traffic
> I/O traffic
> Data traffic

Storage area networks handle traffic between the client server and storage devices, and among storage devices. Such traffic requires fast and reliable delivery. Fiber Channel supports all these characteristics and is therefore the medium of choice for SAN deployments. The increased need for virtualization and cluster computing has led to a high amount of I/O traffic (e.g., moving virtual machines across physical servers requires high bandwidth and low latency). InfiniBand technology, with its high reliability, low latency, and high bandwidth, is an ideal fit for such traffic. In addition to this, data centers carry data traffic, for which Ethernet is the best fit.
NETWORK CONVERGENCE INSIDE THE DATA CENTER

The increasing pressure to improve service levels, increase availability, and reduce costs is driving organizations toward better manageability of their IT environment. Being the central location for shared computing resources, the data center is the single most concentrated, complex, and strategic component of the IT environment, and is therefore the starting point for pursuing reduced complexity and better manageability. The convergence of data center technologies for SAN and LAN traffic is a significant effort in this direction.
Convergence of the SAN and LAN in the data center cuts costs through fewer server adapters, reduced cabling, and a cheaper access layer, without any need to retrofit the existing installed base of LAN and SAN. Besides these benefits, a consolidated data center offers an easier operational model. The large strides taken by Ethernet in bandwidth and reliability, coupled with its cost effectiveness, have led to the growing adoption of data center Ethernet standards and protocols for convergence.

Figure: Consolidated data center. Servers with CNAs and blade chassis connect through FCoE switches to the LAN and to SAN A and SAN B, with Data Center Bridging and FCoE running over Ethernet on the converged side and Fiber Channel (FC) on the storage side.

There are three enabling technologies that make this convergence possible:

> Ethernet enhancements
  - Lossless Ethernet (PFC)
  - Bandwidth Management (ETS)
  - Congestion Management (CN)
  - Consistent Management of Configurations (DCB-X)
> Fiber Channel over Ethernet (FCoE)
> Advanced data center protocols

ETHERNET ENHANCEMENTS

Ethernet was not the technology of choice for data center environments in the past due to its bandwidth limitations. But with the evolution of Ethernet technology from 10G to 100G, it now supports higher transmission rates and bandwidth requirements. However, Ethernet still remains a best-effort service and relies on the transport layer to provide reliability. To overcome this limitation, DCB adaptations to Ethernet have been made to ensure bandwidth allocation for all traffic types, reliable delivery, and congestion avoidance, thereby overcoming the traditional limitations and making Ethernet a medium of choice for the converged data center. These enhancements ensure that the Ethernet medium remains lossless and provides low latency for specific kinds of traffic.

Priority Flow Control (PFC)

IEEE 802.3x PAUSE frames provide a feedback mechanism in Ethernet networks when a link gets congested. However, all traffic on the link is blocked when PAUSE frames are generated. PFC, as defined by the standard IEEE 802.1Qbb, provides a mechanism by which PAUSE frames can be generated on a per-priority basis (class of service). PFC makes Ethernet lossless by ensuring that high-priority traffic like storage is non-blocking, while low-priority network traffic like data can be blocked when the link bandwidth is exceeded.
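As an illustration of how priority-granular pausing differs from plain 802.3x PAUSE, the Python sketch below constructs a PFC frame: a MAC Control frame (EtherType 0x8808) carrying the PFC opcode 0x0101, a priority-enable vector, and eight per-priority pause timers. The helper name, the chosen priority, and the example MAC addresses are illustrative, not taken from any particular implementation.

```python
import struct

PFC_DEST_MAC = bytes.fromhex("0180c2000001")   # reserved multicast address used for MAC Control frames
MAC_CONTROL_ETHERTYPE = 0x8808                  # MAC Control EtherType
PFC_OPCODE = 0x0101                             # Priority-based Flow Control opcode

def build_pfc_frame(src_mac: bytes, pause_quanta: dict) -> bytes:
    """Build a PFC (802.1Qbb) frame pausing the given priorities.

    pause_quanta maps priority (0-7) -> pause time in 512-bit-time quanta;
    0xFFFF asks the peer to stop that priority for the maximum time,
    0 resumes it immediately.
    """
    enable_vector = 0
    timers = [0] * 8
    for prio, quanta in pause_quanta.items():
        enable_vector |= 1 << prio        # bit set => the timer for that priority is valid
        timers[prio] = quanta
    payload = struct.pack("!HH8H", PFC_OPCODE, enable_vector, *timers)
    frame = PFC_DEST_MAC + src_mac + struct.pack("!H", MAC_CONTROL_ETHERTYPE) + payload
    return frame.ljust(60, b"\x00")       # pad to the minimum Ethernet size (before FCS)

# Example: pause only priority 3 (commonly used for FCoE/storage) while data classes keep flowing.
frame = build_pfc_frame(bytes.fromhex("001122334455"), {3: 0xFFFF})
print(frame.hex())
```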
Enhanced Transmission Selection (ETS)

The purpose of ETS is to maximize link utilization by removing bandwidth constraints. ETS is defined by the standard IEEE 802.1Qaz. ETS extends the traditional 802.1p strict priority scheme to support a flexible, drop-free scheduler that can prioritize traffic on the basis of traffic classes. In converged Ethernet environments, it is imperative for traffic of different types to flow without any restrictions. ETS achieves this by providing the following functionalities (a simplified allocation sketch follows the list):

> Grouping traffic of the same priority into traffic groups termed priority groups (PG); for example, SAN and IPC traffic can be defined as a single group. ETS defines 15 PG groups, with PG group 15 having no bandwidth restrictions
> Assigning priorities to the traffic groups
> Allocating bandwidth as a percentage of the total link bandwidth and defining scheduling parameters
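The bandwidth-sharing behaviour ETS targets can be pictured with a small software model: each priority group receives its configured percentage of the link, strict-priority PG 15 is served first, and capacity left unused by groups with little demand is redistributed to groups that still have traffic. This is a simplified sketch of that idea, not the 802.1Qaz hardware scheduler; the class names, group numbers, and figures are illustrative.

```python
from dataclasses import dataclass

@dataclass
class PriorityGroup:
    pgid: int            # 0-14 for bandwidth-managed groups, 15 = no bandwidth limit (strict)
    bandwidth_pct: int   # configured share of the link, in percent
    demand_gbps: float   # how much the group currently wants to send

def ets_allocate(link_gbps: float, groups: list) -> dict:
    """Toy ETS-style allocation: PG 15 is served first, the rest get their
    configured share, and leftover bandwidth is redistributed to groups
    whose demand is not yet satisfied."""
    alloc = {}
    remaining = link_gbps
    # PG 15 is not bandwidth limited: serve its demand first.
    for g in (g for g in groups if g.pgid == 15):
        alloc[g.pgid] = min(g.demand_gbps, remaining)
        remaining -= alloc[g.pgid]
    managed = [g for g in groups if g.pgid != 15]
    # First pass: give each managed group its configured percentage (capped by its demand).
    for g in managed:
        share = remaining * g.bandwidth_pct / 100.0
        alloc[g.pgid] = min(g.demand_gbps, share)
    # Second pass: hand spare capacity to groups that still have unmet demand.
    spare = remaining - sum(alloc[g.pgid] for g in managed)
    for g in managed:
        if spare <= 0:
            break
        extra = min(g.demand_gbps - alloc[g.pgid], spare)
        alloc[g.pgid] += extra
        spare -= extra
    return alloc

# Example: a 10G link with a SAN group at 60%, a LAN group at 30%, and an IPC group at 10%.
groups = [PriorityGroup(0, 60, 7.0), PriorityGroup(1, 30, 1.0), PriorityGroup(2, 10, 3.0)]
print(ets_allocate(10.0, groups))
```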
Congestion Notification (CN)

Congestion Notification, defined by the standard IEEE 802.1Qau, provides a mechanism for the traffic sink to give feedback to the traffic source in order to reduce congestion. This ensures that congestion is controlled not only at the link level but also at the network edge itself. CN ensures that PFC does not get triggered repeatedly. The CN algorithm consists of two entities:

> Congestion Point (CP): The congestion point is essentially a switch that samples incoming packet buffers and generates Congestion Notification Messages (CNMs) during congestion.
> Reaction Point (RP): The reaction point is the source that generates traffic of different priorities, listens for CNMs, and slows down the rate of traffic on receiving CNMs. In the absence of CNMs, it increases the rate of traffic in a phased manner.
The CN algorithm introduces a new tag in the packet, called the CN-tag, which contains a Flow ID to uniquely identify the flow. The algorithm also defines a CN domain in which all devices, switches, and hosts are configured to treat a particular priority as congestion controlled. By this definition, all devices employ the CN algorithms on the queue associated with this priority.

The devices at the edge of the congestion control domain ensure that traffic received from devices that do not support CN, but that carries one of the congestion-controlled priorities, is mapped to a non-congestion-controlled priority. The devices also ensure that the CN-tag is deleted before a packet is sent out of the congestion domain. The CNMs sent by the congestion point carry the MAC address of the RP as the destination MAC address and the Flow ID (taken from the received packet), which can be used to uniquely identify the flow in the RP.
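The CP/RP interaction can be viewed as a small control loop. The sketch below follows the general shape of the 802.1Qau (QCN) algorithm: the congestion point derives a feedback value from how far the sampled queue sits above its equilibrium setpoint and how fast it is growing, while the reaction point cuts its rate multiplicatively on each CNM and recovers in phases when CNMs stop arriving. The constants, gains, and method names are illustrative defaults rather than values mandated by the standard.

```python
class CongestionPoint:
    """Samples the congestion-managed queue and computes the feedback carried in CNMs."""
    def __init__(self, q_eq: int = 26000, w: float = 2.0):
        self.q_eq = q_eq          # desired equilibrium queue occupancy (bytes)
        self.w = w                # weight on the queue-growth term
        self.q_old = 0
    def feedback(self, q_len: int) -> float:
        q_off = q_len - self.q_eq          # how far above the setpoint the queue is
        q_delta = q_len - self.q_old       # how fast the queue is growing
        self.q_old = q_len
        fb = q_off + self.w * q_delta
        return max(fb, 0.0)                # a CNM is generated only when feedback is positive

class ReactionPoint:
    """Rate limiter at the traffic source: cuts rate on CNMs, recovers otherwise."""
    def __init__(self, line_rate_gbps: float, gd: float = 1 / 128):
        self.rate = line_rate_gbps
        self.target = line_rate_gbps
        self.gd = gd
    def on_cnm(self, fb: float) -> None:
        self.target = self.rate                      # remember the rate before the cut
        self.rate *= max(1.0 - self.gd * fb, 0.5)    # multiplicative decrease, capped at 50%
    def on_recovery_tick(self) -> None:
        self.rate = (self.rate + self.target) / 2    # phased increase back toward the target

cp, rp = CongestionPoint(), ReactionPoint(10.0)
fb = cp.feedback(q_len=40000)     # queue well above the setpoint, so feedback is positive
if fb > 0:
    rp.on_cnm(fb)
rp.on_recovery_tick()
print(round(rp.rate, 3))
```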
Data Center Bridging Exchange (DCB-X)

Data Center Bridging Exchange, defined by the standard IEEE 802.1Qaz, is used to eliminate the need for configuring all nodes in the network. DCB-X defines a protocol by which DCB protocols like PFC and ETS can automatically configure themselves based on the configurations in the peer nodes. Additionally, DCB-X can be used to detect incorrect configurations in the network. DCB-X works over LLDP, which is the industry-standard layer 2 protocol for exchanging information. DCB-X supports both symmetric mechanisms (PFC exchanges are symmetric because both sides of the link should have the same parameters) and asymmetric mechanisms. DCB-X is used to exchange both the administrative and the operational parameters (which can diverge on the basis of the DCB-X messages exchanged).
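In practice, DCB-X carries per-feature TLVs (ETS, PFC, application priority) inside LLDP frames, each with a "willing" bit that decides whose parameters win. The sketch below shows only that resolution step, under the usual convention that a willing port adopts a non-willing peer's configuration; the data structures and field names are hypothetical and do not reproduce an actual DCBX TLV layout.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EtsConfig:
    # Bandwidth percentage per traffic class (the managed classes should sum to 100).
    tc_bandwidth: tuple = (50, 30, 20, 0, 0, 0, 0, 0)

@dataclass
class DcbxPeerState:
    willing: bool
    ets: EtsConfig

def resolve_operational_ets(local_admin: EtsConfig, local_willing: bool,
                            peer: Optional[DcbxPeerState]) -> EtsConfig:
    """Willing-bit resolution as commonly implemented for DCBX:
    a willing port adopts the configuration advertised by a non-willing peer;
    otherwise it keeps its own administrative configuration.
    (Tie-breaking when both ends are willing is omitted here.)"""
    if peer is not None and local_willing and not peer.willing:
        return peer.ets           # operational parameters now differ from the admin parameters
    return local_admin

# Example: an access switch is willing; its peer (e.g., an FCoE-aware core switch) is not,
# so the access switch runs with the peer's ETS bandwidth table.
local = EtsConfig((40, 40, 20, 0, 0, 0, 0, 0))
peer = DcbxPeerState(willing=False, ets=EtsConfig((60, 20, 20, 0, 0, 0, 0, 0)))
print(resolve_operational_ets(local, local_willing=True, peer=peer))
```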
FIBER CHANNEL OVER ETHERNET (FCoE)

FCoE extends Fiber Channel over a lossless Ethernet network. So, from an Ethernet point of view, FC is just another upper-layer protocol like IP; from an FC point of view, Ethernet is just another new type of cable. FC frames are mapped natively to Ethernet frames and are switched as regular Ethernet frames. In addition to lossless Ethernet, Ethernet switches should be capable of supporting jumbo frames, since FC frames can be larger than 1,518 bytes.

FCoE also needs a control plane that helps discover and connect FC entities over an Ethernet cloud. The FCoE control plane functionality is implemented by the FCoE Initialization Protocol (FIP).
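A rough picture of the data-plane mapping: the FC frame is carried untouched inside an Ethernet frame with the FCoE EtherType (0x8906; FIP uses 0x8914), between an FCoE header that carries the start-of-frame delimiter and a trailer that carries the end-of-frame delimiter. The sketch below captures only this overall structure; the exact reserved-field layout is defined by FC-BB-5, and the SOF/EOF code values used here are illustrative.

```python
import struct

FCOE_ETHERTYPE = 0x8906   # FCoE data plane
FIP_ETHERTYPE = 0x8914    # FCoE Initialization Protocol (control plane)

SOF_I3, EOF_T = 0x2E, 0x42   # example SOF/EOF delimiter codes (illustrative)

def encapsulate_fc_frame(dst_mac: bytes, src_mac: bytes, fc_frame: bytes,
                         sof: int = SOF_I3, eof: int = EOF_T) -> bytes:
    """Wrap a raw Fibre Channel frame in an FCoE Ethernet frame.

    Simplified FC-BB-5-style layout: Ethernet header, FCoE header
    (version, reserved bytes, SOF), the untouched FC frame, then an
    EOF trailer. Reserved-field details are omitted for clarity."""
    eth_header = dst_mac + src_mac + struct.pack("!H", FCOE_ETHERTYPE)
    fcoe_header = bytes([0x00]) + bytes(12) + bytes([sof])   # version/reserved + SOF
    fcoe_trailer = bytes([eof]) + bytes(3)                    # EOF + reserved
    return eth_header + fcoe_header + fc_frame + fcoe_trailer

# A full FC frame (header plus payload) can exceed 2,100 bytes, hence the need for jumbo frames.
fc_frame = bytes(2148)
print(len(encapsulate_fc_frame(bytes(6), bytes(6), fc_frame)))   # well above 1,518 bytes
```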
ADVANCED DATA CENTER PROTOCOLS

Existing layer 2 protocols provide insufficient scalability due to:

> Difficulty in converging large networks; STP has issues scaling beyond 100 nodes
> Inability to choose multiple paths to decrease congestion and balance traffic load
> Inability to choose paths more efficiently to decrease convergence time

There are two competing, and in certain ways similar, attempts to improve layer 2 network control plane performance: TRILL and SPB.

IETF's TRILL is based on a very simple idea: encapsulate Ethernet frames in a new TRILL header and route these frames using IS-IS. The frames are decapsulated at the receiver. While TRILL inherits all the advantages of a link-state protocol approach, it also throws open a number of sticky questions that need to be resolved before it can be deployed, including:

> How do you manage operations, administration, and maintenance (OAM)?
> TRILL needs modifications in the forwarding plane hardware.
> How do you automate configurations in the layer 2 network, and how do you handle zero-configuration deployment?

Conventional layer 2 switches are called Bridges. Nodes that support TRILL are called RBridges. TRILL proposes RBridge peering over Bridges, and hence the coexistence of Bridges and RBridges.
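The encapsulation idea can be sketched directly: the ingress RBridge prepends a small TRILL header (hop count plus egress and ingress RBridge nicknames) to the original frame and adds an outer Ethernet header addressed hop by hop to the next RBridge, using the TRILL EtherType 0x22F3. This is a simplified illustration of the RFC 6325 format; options and multi-destination handling are ignored, and the function name and example values are hypothetical.

```python
import struct

TRILL_ETHERTYPE = 0x22F3   # EtherType assigned to TRILL

def trill_encapsulate(inner_frame: bytes, ingress_nickname: int, egress_nickname: int,
                      next_hop_mac: bytes, my_mac: bytes, hop_count: int = 0x3F) -> bytes:
    """Wrap a native Ethernet frame for transit across a TRILL campus.

    Simplified per RFC 6325: a 6-byte TRILL header (version/flags and hop
    count in the first 16 bits, then the egress and ingress RBridge
    nicknames) is prepended to the original frame, and an outer Ethernet
    header addressed hop by hop to the next RBridge is added in front.
    Options and the multi-destination (M) bit are ignored here."""
    first16 = hop_count & 0x3F            # version=0, reserved=0, M=0, op-length=0
    trill_header = struct.pack("!HHH", first16, egress_nickname, ingress_nickname)
    outer_eth = next_hop_mac + my_mac + struct.pack("!H", TRILL_ETHERTYPE)
    return outer_eth + trill_header + inner_frame

# Each RBridge along the IS-IS-computed path rewrites the outer MACs and decrements the hop count;
# the egress RBridge strips the outer header and TRILL header, restoring the original frame.
inner = bytes(64)
print(len(trill_encapsulate(inner, ingress_nickname=0x0001, egress_nickname=0x0002,
                            next_hop_mac=bytes(6), my_mac=bytes(6))))
```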
IEEE's SPB provides an improved layer 2 control plane protocol while reusing existing OAM, and promises coexistence with legacy STP deployments.

Figure: RBridge peering. RBridges peer with each other directly or across intervening conventional Bridges, while routers and end stations attach at the edge of the TRILL campus.
The following table summarizes the trade-offs between SPB and TRILL:

Features/Aspects            SPB                                           TRILL
Encapsulation               Ethernet                                      New TRILL encapsulation
Multipath support           Yes: 16 * head-end                            Yes: N * transit-hash
E-Tree/multicast support    Shortest path                                 None
OAM                         Ethernet/ITU                                  Work in progress
Hardware                    Existing forwarding plane; only control       Control plane and forwarding plane changes
                            plane changes                                 (new encapsulation; must handle TRILL/STP
                                                                          boundaries)
Backward compatibility      SPBV works with MSTP regions                  TRILL RBridges can work with Ethernet clouds
ARICENT'S EXPERTISE IN DATA CENTER ETHERNET

Aricent has developed extensive experience in data center technology by delivering high-performance, top-of-rack switches to leading vendors. Aricent engineers have been involved in the development of software from requirements, design, and porting to silicon platforms, through testing and delivery to customers.

Aricent's Intelligent Switching Solution (ISS) can reduce the time required for equipment vendors to develop and deploy a data center switching solution. The Aricent ISS DCB offering is fully compliant with the PFC, ETS, CN, and DCB-X standards. Coupled with a comprehensive set of layer 2 and layer 3 features, ISS enables equipment manufacturers to develop high-quality data center switches with an accelerated time to market. The modular ISS architecture, with open interfaces, facilitates third-party stack integration to provide unique product differentiation.

Figure: Architecture of a typical top-of-rack switch. Management (CLI, SNMP, Web); routing block (RIP, OSPF, IS-IS, BGP, RTM, multicast); layer 2 (VLAN, STP, LACP, IGMP); data center blocks (PFC, ETS, Congestion Notification, DCBX, LLDP); QoS (hierarchical, multiple scheduling schemes) and ACL management; IP forwarding; policy engine; chassis management and system monitoring; all running on data-center-enabled silicon.
VARADHARAJAN PARTHASARATHY is a Director, Technology for Data Communication products at Aricent, focusing on routing and switching solutions including Aricent's widely deployed ISS. He has over 16 years of industry experience in the design and development of telecom applications, security solutions, and gateways.

RAJKUMAR PAULRAJ is a Director, Technology for Data Communication products at Aricent, focusing on routing and switching solutions including Aricent's widely deployed ISS. He has over 15 years of industry experience in designing and implementing routing and switching products.
INNOVATION SERVICES FOR THE CONNECTED WORLD The Aricent Group is a global innovation and technology services company that helps clients imagine, commercialize, and evolve products and services for the connected world. Bringing together the communications technology expertise of Aricent with the creative vision and user experience prowess of frog, the Aricent Group provides a unique portfolio of innovation capabilities that seamlessly combines consumer insights, strategy, design, software engineering, and systems integration. The client base includes communications service providers, equipment manufacturers, independent software vendors, device makers, and many other Fortune 500 brands. The company’s investors are Kohlberg Kravis Roberts & Co., Sequoia Capital, The Family Office, Delta Partners, and The Canadian Pension Plan Investment Board.
Engineering excellence. Sourced. Aricent is the world's #1 pure-play product engineering services and software firm. The company has 20-plus years of experience co-creating ambitious products with the leading networking, telecom, software, semiconductor, Internet, and industrial companies. The firm's 10,000-plus engineers focus exclusively on software-powered innovation for the connected world. frog, the global leader in innovation and design, based in San Francisco, is part of Aricent. The company's key investors are Kohlberg Kravis Roberts & Co. and Sequoia Capital. info@aricent.com
© 2014 Aricent. All rights reserved. All Aricent brand and product names are service marks, trademarks, or registered marks of Aricent in the United States and other countries.