DCD Magazine #51 - Iceland's AI moment


Issue 51 • December 2023 datacenterdynamics.com

ICELAND'S ARTIFICIAL INTELLIGENCE MOMENT

Also inside: Cologix's CEO • Rebuilding Ukraine • Quantum Computing




Contents December 2023

6 News: Cyxtera sold, Portugal's PM out, Aligned leaves Quantum Loophole
14 Iceland's AI moment: A new cable, cold climes, and renewable energy
22 The Awards winners: Here's who won the DCD>Awards
28 Cologix's CEO, Laura Ortman: How to build a healthy culture
31 The AI & Networking supplement: Operating at AI scale
47 Bring on the farming Edge: Smart cows and lasering weeds
51 Security by design: How to make data centers both pretty and secure
57 Insecurity by staff shortages: Analyzing a Microsoft outage, and why lightning could strike twice
61 Greening software by the grassroots: We probably could have made this mag greener
68 Embracing a European market: What comes after FLAP-D?
72 Amazon's generative AI show: We head to Vegas to see AWS' cards
76 Quantum matures: How data centers and quantum computers can coexist
84 Rebuilding Ukraine's network: Surviving the war and planning for 5G
89 Powering London: The capital's grid challenges
95 An Openreach update: Connecting the UK, a work in progress
98 Different flavors of UPS: Static or rotary?
101 Microfluidics - Cooling inside the chip: Thought immersion was the end game of cooling? Think again.
104 Op-ed: The next great business idea






From the Editor

The year of AI comes to a close

It's been hard to escape AI this year. Every headline, every investment decision, and every data center design has been dominated by the generative AI boom. Because we're as inventive as a large language model, this magazine is no different.

The land of AI and fire
For the cover, we head to Iceland to hear the country's pitch to AI developers. We talk to the country's president, its biggest operators, and go inside a geothermal power plant to learn about carbon sequestration.

"With decreased latency demands, AI could be offloaded to greener lands."

We also travel to Vegas to see Amazon Web Services' gamble to catch up with Microsoft Azure with generative AI-focused services. And there's a whole supplement looking at the challenges of operating an AI-scale data center. We're talking high-density racks, memory bottlenecks, and all-optical dreams.

The winners
Every year, the industry's biggest figures come to central London to eat fancy food, hear bad jokes, and learn who won the latest DCD Awards. We profile the 2023 winners, including a new category - Editor's pick.

Try to build a healthy company
Much of our coverage of this sector is naturally focused on financial results and technological achievements. But, for this issue's CEO focus, we spoke to Cologix's Laura Ortman about an equally critical component: staff morale and team building.

The war is still going
The West's coverage of the Ukraine war has waned as new conflicts have arisen, and fatigue has set in. But the tragedy is still unfolding, and the nation continues to fight off Russia while rebuilding its infrastructure. On page 84, we look at the state of the country's telco sector.

London's power crunch
The UK is in a much more fortunate position, but has still seen its energy sector rocked by the Ukraine war. Underinvestment and poor planning have also meant that the grid can't support rapid data center growth. What's next?

Pretty, secure
Can data centers still be fortresses if they don't look like them? The need for physical security is butting up against local desires for more attractive neighborhoods. Can the two designs meet in the middle?

Plus more
Can we cut emissions at the software level? What comes after FLAP-D? How are quantum computers adapting to colocation data centers? We've covered a lot of topics this year, but there are still so many more stories to tell. Luckily, we'll have some new hires starting before the next mag...

Sebastian Moss
Editor-in-Chief

7.6GW
The total potential capacity that could be harnessed from Icelandic geothermal

Meet the team
Publisher & Editor-in-Chief Sebastian Moss @SebMoss
Executive Editor Peter Judge @Judgecorp
News Editor Dan Swinhoe @DanSwinhoe
Telecoms Editor Paul Lipscombe
Compute, Storage & Networking Editor Charlotte Trueman
Reporter Georgia Butler
Head of Partner Content Claire Fletcher
Partner Content Editor Chris Merriman @ChrisTheDJ
Partner Content Editor Graeme Burton @graemeburton
Designer Eleni Zevgaridou
Head of Sales Erica Baeta
Conference Director, Global Rebecca Davison
Content & Project Manager - Live Events Gabriella Gillett-Perez
Content & Project Manager - Live Events Matthew Welch
Channel Management Team Lead Alex Dickins
Channel Manager Kat Sullivan
Channel Manager Emma Brooks
CEO Dan Loosemore

Head Office
DatacenterDynamics
22 York Buildings, John Adam Street, London, WC2N 6JU

Dive even deeper
Follow the story and find out more about DCD products that can further expand your knowledge. Each product is represented with a different icon and color, shown below.
Intelligence | Events | Debates | Training | Awards | CEEDA

© 2023 Data Centre Dynamics Limited. All rights reserved. No part of this publication may be reproduced or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, or be stored in any retrieval system of any nature, without prior written permission of Data Centre Dynamics Limited. Applications for written permission should be directed to the editorial team at editorial@datacenterdynamics.com. Any views or opinions expressed do not necessarily represent the views or opinions of Data Centre Dynamics Limited or its affiliates. Disclaimer of liability: Whilst every effort has been made to ensure the quality and accuracy of the information contained in this publication at the time of going to press, Data Centre Dynamics Limited and its affiliates assume no responsibility as to the accuracy or completeness of and, to the extent permitted by law, shall not be liable for any errors or omissions or any loss, damage or expense incurred by reliance on information or any statement contained in this publication. Advertisers are solely responsible for the content of the advertising material which they submit to us and for ensuring that the material complies with applicable laws. Data Centre Dynamics Limited and its affiliates are not responsible for any error, omission or material. Inclusion of any advertisement is not intended to endorse any views expressed, nor products or services offered, nor the organisations sponsoring the advertisement.



Whitespace


News

The biggest data center news stories of the last three months

Brookfield Infrastructure acquires Cyxtera for $775 million, will combine it with Evoque

Brookfield is to acquire Cyxtera, following the latter's bankruptcy. The company, which filed for Chapter 11 in early June, will be combined with Brookfield's other US colo provider, Evoque.

The colocation company announced that it has entered into an asset purchase agreement (APA) under which Brookfield Infrastructure Partners and its institutional partners will acquire "substantially all" of Cyxtera's assets for $775 million. Brookfield Infrastructure Partners said it will "generate strategic value" by combining Cyxtera with Evoque to create a retail colocation data center provider with more than 330MW of capacity in high-demand areas across North America.

The total purchase price for the data centers and associated real estate underlying the sites is approximately $1.3 billion, inclusive of transaction costs and net of proceeds received from concurrently selling non-core Cyxtera sites to a third party.

Brookfield will also purchase the real estate at seven of Cyxtera's US data centers from several landlords. Cyxtera said these transactions will increase existing facility ownership, secure expansion opportunities, and give the company more control over its cost structure. At least four of those sites belong to Digital Realty and two to Digital Core REIT.

Cyxtera will leave at least six more data centers during 2024 (three in the US and three abroad), which it currently leases from Digital Realty. Digital said it would spend $55 million of the proceeds to buy out Cyxtera's leases in three colocation data centers in Singapore and Frankfurt.

Brookfield has also granted Digital Realty a purchase option to acquire a colocation data center outside of London, in the Slough Trading Estate.

Cyxtera has also signed an agreement to sell its interest in its Montreal and Vancouver data centers to Cologix.

Formed out of CenturyLink's colocation business, Cyxtera went public via a SPAC in 2021. The company combined with the Nasdaq-listed Starboard Value Acquisition Corp. (SVAC) in a $3.1bn deal. However, within a year, the company was rumored to be looking to go private again as it faced maturing debts.

bit.ly/CyxteraExtera

NEWS IN BRIEF

Former Interxion CEO David C. Ruberg dies
Ruberg passed away on November 14, aged 77. He led the European colocation company Interxion from 2007 until 2020, when Interxion combined with Digital Realty, and he then became EMEA CEO.

Data center fire in Dhaka, Bangladesh, kills three
A fire broke out on October 26 on the 13th floor of the 14-story Khawaja Tower in the Bangladeshi capital. Three deaths and 10 injuries were reported, and millions of people were taken offline.

AWS uses reclaimed wastewater for data center cooling at 20 locations
20 Amazon Web Services data centers are using purified wastewater, rather than potable water, for their cooling systems. The company is using this process in Virginia and California.

American Tower & Crown Castle CEOs to retire, ATC founder passes away
American Tower CEO Tom Bartlett will retire from February 2024. Crown Castle CEO Jay Brown will also retire from his company next year. American Tower co-founder Tom Stoner passed away on October 19, aged 88.

Oracle's Ellison: We're building 100 data centers
Oracle founder and CTO Larry Ellison said the company is in the process of expanding 66 of its existing cloud data centers and building 100 new cloud data centers. Ellison said Oracle will launch in 20 Microsoft data centers in the coming months, totaling more than 2,000 racks.

DXN surrenders Sydney data center to landlord


Australian data center firm DXN is exiting its Sydney facility and aims to exit its other two locations to focus on its modular container business. As a result, the company will exit the lease at 5 Parkview Drive – which had nine more years left to run – and leave the site by the end of February 2024. DXN currently operates two other data centers in Tasmania and Darwin.




Portuguese PM resigns amid corruption probe involving data centers

Portugal's prime minister, António Costa, has resigned amid a corruption probe involving a data center campus in Sines, a nearby hydrogen plant, and several lithium mines. Several executives from the Start Campus company have been arrested.

Costa resigned hours after prosecutors examining alleged corruption involving lithium and "green" hydrogen deals announced that he was under investigation. Police have searched dozens of addresses, including Costa's official residence and the environment and infrastructure ministries.

Costa stood down, but said he has a "clear conscience" and "complete trust in justice." "The dignity of the Prime Minister's duties is not compatible with any suspicion regarding his integrity, his good conduct and, even less, with the suspicion of the commission of any criminal act," Costa said in a statement.

Portuguese President Marcelo Rebelo de Sousa has called a snap election to choose a new prime minister, rather than select a replacement from Costa's ruling Socialist Party (Partido Socialista).

The Portuguese Public Prosecution Service is investigating alleged "misuse of funds, active and passive corruption by political figures, and influence peddling" involving lithium mining concessions, a hydrogen production project, and the Start data center campus in Sines.

Start confirmed to the Lusa news agency and Negócios that its facilities were searched as part of the investigation. "The company is cooperating with the authorities, providing all necessary and requested information, to ensure a complete and impartial investigation of all necessary facts," Start said. Start reaffirmed its "total commitment to transparency, legality and integrity of all its operations and in all phases of the development of its investment in Portugal" and said "that it will continue its operations and investment in Portugal."

The mayor of Sines, Nuno Mascarenhas, two administrators from the Start Campus company, and a consultant were arrested. Local press are naming the Start employees as CEO Afonso Salema and chief legal & sustainability officer Rui Oliveira Neves.

Start, owned by US investment firm Davidson Kempner and the UK's Pioneer Point Partners, is planning a 495MW campus on more than 60 hectares of land adjacent to a recently decommissioned coal power plant in the city of Sines on the Portuguese coast.

The project was announced in 2021, and ground was broken last year. The first of six buildings, offering 5,000 square meters (53,800 sq ft) and 15MW of capacity across six data halls, is due live next year. The remaining buildings are set to be built out until 2028.

Previously announced partners for the project include EllaLink, Exa, DE-CIX, Colt, and Nautilus.

bit.ly/CostaCorruption

Digital Realty launches $7bn data center JV with Blackstone

Digital Realty plans to set up a joint venture with funds affiliated with Blackstone's Infrastructure, Real Estate, and Tactical Opportunities businesses.

The $7 billion JV will develop four hyperscale data center campuses in Frankfurt, Germany; Paris, France; and Northern Virginia in the US. Together, they are expected to support 500MW of total IT load across 10 data centers upon full build-out. 46MW is already under construction, and the campuses are 33 percent pre-leased. Around 20 percent of the total potential IT capacity is expected to be delivered through 2025.

Blackstone will acquire an 80 percent ownership interest in the joint venture for approximately $700 million of initial capital contributions, while Digital Realty will maintain a 20 percent interest.

In July, Digital Realty announced a joint venture with real estate investment company TPG Real Estate, shifting three already-operating Northern Virginia data centers into a $1.5bn partnership. It also has a $200m JV with Realty Income Corporation. Blackstone, meanwhile, in July pumped $8bn into data center business QTS to prepare for a "once in a generation" AI boom.

bit.ly/DRTxBS





Altice sells data centers to Morgan Stanley, launches new colo brand

French telco group Altice has formed a new company holding more than 250 data centers in France and is selling a majority stake to Morgan Stanley. Altice France announced in November that it had agreed to sell 70 percent of the new company, UltraEdge, to Morgan Stanley Infrastructure Partners (MSIP). The transaction values UltraEdge at €764 million ($836.5m) and is expected to close in the first half of 2024.

UltraEdge is described as a 'nationwide independent distributed colocation provider,' with 257 data centers plus office space across France, totaling more than 45MW of installed capacity. The facilities are currently run by Altice's mobile operator subsidiary SFR. Altice's release says the passive infrastructure and equipment of the data centers will be transferred to UltraEdge, with servers and active equipment retained by SFR.

UltraEdge is one of several new regional European colo companies to launch this quarter. In December, investment firm Arcus Infrastructure Partners acquired a German data center operator and launched a new European data center platform known as Portus. Portus Data Centers combines the newly acquired IPHH, which operates two data centers in Hamburg, with Arcus' previous investments EDH in Luxembourg and Portus Data Centers Munich (formerly SDC SpaceNet DataCenter). The Portus portfolio currently totals four facilities across Germany and Luxembourg, offering more than 12MW across 12,000+ sqm (129,160 sq ft).

December also saw InfraRed Capital Partners launch a new European data center platform known as NexSpace. The new company will target regional cities and towns across Germany, Austria, and Switzerland (aka the DACH region). NexSpace's first facility is a newly-built 2MW data center in Heidelberg, Germany, that launched in November. Located in the Heidelberg Innovation Park (HIP), NexSpace's debut data center is a three-story facility offering capacity for 252 racks across 4,000 sqm (43,055 sq ft) – reportedly expandable to 8,000 sqm (86,110 sq ft). Ground was broken on the facility in November 2022. Any plans for future NexSpace data centers haven't been shared. InfraRed is part of SLC Management, the institutional alternatives and asset management business of Sun Life. The company manages more than $14 billion of equity capital.

In the Netherlands, I Squared Capital's nLighten acquired the GyroCenter in Amsterdam. Launched this year, the company has acquired France's Euclyde and the UK's Proximity. November saw Digital9 sell European operator Verne to Ardian. Verne operates sites in Iceland, the UK, and Finland (see cover feature).

bit.ly/UltraExtraMegaEdge

Aligned pulls out of Quantum Loophole campus in Maryland

Aligned Data Centers has pulled out of a plan to build 264MW of hyperscale capacity on the Quantum Loophole campus in Maryland after the state limited the number of diesel generators it would be allowed to deploy there. Aligned planned to build 3.3 million sq ft and 264MW of data center capacity at the campus, set on a former Alcoa aluminum smelting plant near Adamstown, in Frederick County. The company applied for exemptions to the state’s emissions rules so it could site 168 diesel generators there, but the exemptions were rejected in August, with the State’s Public Service Commission ruling that they amounted to the equivalent of a single 504MW power plant. A provisional order from the Commission would have limited Aligned to installing up to 70MW, the maximum capacity Maryland applies for a generating station, the equivalent of 20 3.3MW diesel generators. Aligned rejected that order and pulled out of the project, telling the commission that its order “did not present Aligned Data Centers ... with sufficient relief to permit the project to proceed.” Meanwhile, another provider, Rowan Digital Infrastructure, has announced plans for a deployment on a different plot at the Maryland campus. However, it is not clear at this point whether that development will be subject to the same limitations on diesel backup which stymied Aligned. bit.ly/MisAligned






Microsoft announces in-house Arm CPU and AI chip, and custom racks

Microsoft has announced two in-house designed semiconductors - an Arm CPU and a dedicated AI accelerator. The Microsoft Azure Cobalt CPU is designed for general workloads, with a focus on performance per watt. The Microsoft Azure Maia AI Accelerator, meanwhile, is optimized for artificial intelligence tasks and generative AI.

The company said that it would begin rolling out the chips to its data centers early next year, initially for internal services like Microsoft Copilot and Azure OpenAI Service. They will then be made available on Microsoft Azure more generally, but are unlikely to be sold individually.

"Microsoft is building the infrastructure to support AI innovation, and we are reimagining every aspect of our data centers to meet the needs of our customers," said Scott Guthrie, EVP of Microsoft's Cloud and AI Group. "At the scale we operate, it's important for us to optimize and integrate every layer of the infrastructure stack to maximize performance, diversify our supply chain, and give customers infrastructure choice."

The Azure Cobalt 100 CPU will join rival Arm chip Ampere on Microsoft Azure. It is currently being used for internal Microsoft products such as Azure SQL servers and Microsoft Teams. It has 128 Neoverse N2 cores on Armv9 and 12 channels of DDR5, and is based on Arm's Neoverse Genesis CSS (Compute Subsystem) Platform.

As for Maia, it is built on TSMC's 5nm node and has 105 billion transistors on a monolithic die. The company claims a performance of 1,600 Tflops of MXInt8 and 3,200 Tflops of MXFP4 - figures that best rival Google's TPUv5 and Amazon's Trainium. It has a memory bandwidth of 1.6TBps, above Trainium, but below the TPUv5.

The chip will be deployed in a custom-designed rack and cluster known as Ares, SemiAnalysis reports. The servers are not standard 19" or OCP and are reportedly "much wider." Ares will only be available as a liquid-cooled configuration, requiring some data centers to deploy water-to-air CDUs. Microsoft said that each rack will have a 'sidekick,' where cooling infrastructure is located on the side of the system, circulating liquid to cold plates. Each server features four Maia accelerators, with eight servers per rack.

bit.ly/ARMedWithChips

Prince William County officials OK PW Digital Gateway

Prince William County officials have voted in favor of the PW Digital Gateway data center project in Manassas, Virginia, after another marathon 24-hour public meeting.

QTS and Compass are proposing to develop thousands of acres of greenfield land in Manassas for a massive data center development. The two companies were applying to rezone the land to allow for data centers.

Supervisors voted 4-3 in favor of the project, representing the next step in the multi-million sq ft project. Supervisors Wheeler, Bailey, Franklin, and Angry voted in favor; Supervisors Weir, Lawson, and Vega voted against. Democratic Supervisor Kenny Boddye of the Occoquan District abstained.

The mid-December meeting came after Prince William County staff recommended the denial of the site's rezoning request twice in recent months, and the planning commission voted to recommend denial of the project ahead of the final vote by the County Board of Supervisors.

QTS is aiming to develop around 11.3 million gross square feet (1.05 million sqm) of data center space, while Compass aims to develop up to 11.55 million sq ft (1.07 million sqm).

bit.ly/PWGatewayOK

Tract plans 46-building data center campus outside Richmond, Virginia

Data center park developer Tract is planning a large campus outside Richmond, Virginia.

Via Blenheim Associates, Tract is seeking to rezone around 1,212 acres in Hanover County from A-1, Agricultural District, to M-1(c), Limited Industrial District, to allow for the development of a data center technology park. In November, the Hanover County Planning Commission voted to defer whether it will vote to recommend the project until January.

Documents suggest up to 46 buildings may be developed on the site, located on the south line of Hickory Hill Road (State Route 646) at its intersection with Old Ridge Road (State Route 738).

Total campus capacity is unclear, but the company is seeking to develop up to eight on-site substations in the park along with multiple battery storage systems. A traffic analysis study assumes development would run until 2043, but Tract's own planned development timelines are unclear.

Tract is planning another large park in Reno, Nevada.

bit.ly/TractGainsTraction




Vertiv launches wooden data center modules

Vertiv has introduced a prefabricated data center module made out of wood.

Announced in November, the TimberMod variant of its SmartMod container series uses mass timber instead of steel for structural elements, including the casing. It is available in the US and EMEA.

Vertiv says the wooden box will have a lower carbon footprint than the equivalent steel versions of the product, because timber causes less resource depletion and has a carbon footprint up to three times lower than steel.

"Mass timber, if sourced from sustainably harvested wood, serves as a renewable construction material with the potential to minimize resource depletion and lower carbon footprint by up to three times compared to steel based on the reduction of CO2 emissions associated with the cradle-to-gate product lifecycle and the transport of materials and structural elements to the assembly site," said the Vertiv announcement.

The TimberMod product meets structural requirements for buildings, and is strong enough to resist seismic activity, wind forces, and structural demands. Vertiv also promises it "adds an aesthetic dimension to data center architecture."

bit.ly/ShiveryTimbers


Equinix to roll out support for direct-tochip liquid cooling at 100+ locations Equinix is rolling out support for liquid cooling across a large proportion of its data center footprint. The colocation firm this week announced plans to expand support for liquid cooling technologies – including direct-to-chip – to more than 100 of its International Business Exchange (IBX) data centers in more than 45 metros globally. Equinix already supports liquid-to-air cooling through in-rack heat exchangers. The company said the new announcement will enable more businesses to use cooling technologies for high-density hardware. The company said it was adopting a vendor-neutral approach so customers can choose their preferred hardware provider. Sites supporting direct-to-chip liquid cooling include London, Silicon Valley, Singapore, and Washington DC. The first of the facilities to host the new technology will reportedly be ready in Q1 2024.

"Liquid cooling is revolutionizing how data centers cool powerful, high-density hardware that supports emerging technologies, and Equinix is at the heart of that innovation," said Tiffany Osias, VP of global colocation, Equinix. "We have been helping businesses with significant liquid-cooled deployments across a range of deployment sizes and densities for years. Equinix has the experience and expertise to help organizations innovate data center capacity to support the complex, modern IT deployments that applications like AI require."

In February 2023, the colo giant said it had been testing ZutaCore's liquid cooling systems at its co-innovation center for a year, and in June 2022 said a rack full of operational two-phase cooling had, at that point, been stable for six months in its NY5 data center in Secaucus, New Jersey.

bit.ly/Liqiunix

Peter's factoid

Silicon Valley Power expects its data center load to almost double by 2035. Data centers are already the single largest load for the Californian utility, SVP said in a draft Integrated Resource Plan.

Elon Musk's Twitter/X denied tax break for AI hardware in Georgia

Elon Musk's X has been denied a sizeable tax break for IT equipment in Atlanta, Georgia.

The social network firm formerly known as Twitter was denied a $10.1 million tax break by Fulton County officials in December. The 10-year tax break would have been for a $700 million project to deploy IT equipment at a QTS facility at 1025 Jefferson Street. The company said the computer infrastructure would be used to develop and train artificial intelligence products for the X Platform, including Large Language Models and Semantic Search.

The company had said the savings were needed to prevent X from choosing to install the equipment in Portland, Oregon – but during the meeting said the desire to deploy quickly meant some infrastructure would be deployed at the facility regardless of the incentive.

X/Twitter has been present within QTS' Atlanta campus since 2011. It has shrunk its footprint elsewhere.

bit.ly/XDenied





Iceland's AI moment

Sebastian Moss, Editor-in-Chief

It’s cold, it’s renewable, and it’s cheap. But is it ready?

Iceland has long pitched itself as a perfect place for data centers, thanks to its cheap, clean power, and cold temperatures.

But, for years, the majority of the companies that answered the call were cryptominers, riling locals and doing little to shift workloads from the rest of the world.

On a visit to Iceland, DCD found the country is ready to make its case again, thanks to a new submarine cable, increasing pressure for sustainability, and - most of all - the surge in artificial intelligence demand.

Bringing data centers to Iceland

"Data centers are here to stay," Iceland's president Guðni Th. Jóhannesson told DCD. "When I was studying in England in the late '80s, there was no Internet. My mom sent me newspapers that arrived 14 days later, and I read everything - the obituaries, the advertisements, everything. This was my connection to Iceland. I'd make a phone call, twice a month, just to find out if everybody was still alive."

Things have changed dramatically since. "Wherever you go, you are connected," Jóhannesson said. "The world has shrunk."

This has meant that the people of Iceland are no longer disconnected from the wider world, but the digital age also comes with a decidedly physical footprint. "Last night, I attended a sporting event. I took loads of videos, listened to Spotify on the way back, downloaded necessary stuff, unnecessary stuff," Jóhannesson said. "But I didn't pay any attention to the energy, because it's just the cloud - I don't see smoke billowing up out of my laptop or phone."

As the data center sector expands rapidly, fueled partly by AI demand which now consumes power comparable to entire cities, "this might be a cause for concern," he argued. "It should make us aware that if we want to be sustainable, we have to be conscious of the energy we use and how we use that energy."

Iceland's president wants more connectivity, but he told us it shouldn't come at a cost. He said it should use Iceland's 100 percent renewable grid.





The crypto island

Most early Icelandic data centers had little to do with connectivity or services: They came to mine Bitcoin.

A 2018 report from KPMG found that 90 percent of the nation's data center capacity mined crypto, and warned that the figure was only set to rise. Mining has few demands except cheap power, so many crypto data centers forgo security and other features of a traditional facility. In 2018, this left them vulnerable to the 'Big Bitcoin Heist,' in which a gang stole 600 servers worth almost $2 million from facilities across the country.

The peak of the crypto craze soon passed, and those that survived now believe their businesses are more mature, their facilities more secure, and that they are ready for enterprises.

"Crypto has reduced substantially in Iceland," atNorth CEO Magnús Kristinsson told DCD. "I would be very surprised if it vanishes entirely, but we are now seeing AI contracts that are the same size as we could sign crypto contracts for seven, eight years ago."

"We want to be the net zero partner of choice and become the largest data center operator in the Nordics," he said. Acquired by Swiss investment manager Partners Group in 2021, atNorth itself recently bought HPC-as-a-service business Gompute in an effort to push deeper into the HPC and AI market.

The company acquired the data centers of Advania, one of which was hit by the Bitcoin bandits five years ago. Visiting atNorth's newer ICE03 facility, we noted its high fences, man traps, and copious CCTV cameras.

Going north

"We are not engaging in new crypto customers," Johann Thor Jonsson, atNorth's director of site selection (pictured, right), said as he showed us the site.

The site has 6,750 square meters (72,650 sq ft) of data center space, 12MW of power, and a PUE of 1.2. It joins the 3.2MW Reykjavik-based ICE01 and the 83MW ICE02 to the north of the capital.

That latter facility is "a huge campus," Jonsson said, "and we're going to double that in the next few years - so stay tuned."

At ICE03 in the northern Icelandic town of Akureyri, "we will triple our size easily and more - there's some new power capacity coming on stream. We will need a new transformer, but it's all in the works."

The ambient temperatures of the region lend themselves well to cooling. Reykjavik is on the south coast, where "the average temperature over a year is probably close to 2-3°C," Jonsson said. "We can get minus 10-12°C, or even lower up here [in Akureyri]. And, in the summertime, it can go up to 18-20°C. So the variance up north is much greater."

All atNorth's sites are slightly different across Iceland, Norway, Sweden, Finland, and Denmark. This reflects past acquisitions, different market approaches, and a wish to test out various approaches, but it can't last.

"From now on, we cannot allow ourselves to do that," Jonsson said. "We will be standardizing on gensets, cooling, racks, floor space, etc."

The company needs standardization to target larger clients and expand more rapidly. At the same time, it wants to shift from direct air cooling to also include liquid cooling.

If ICE03 at Akureyri is the standard, it would be a sustainable one. It has a building frame made out of glulam, wood laminations bonded together with durable, moisture-resistant adhesives. "Why timber and not steel? It's more sustainable; it's environmentally friendly. It is much more fire resistant - glulam will stand long after a steel building has collapsed," he added.

Inside, the company does use steel panels for the walls, sandwiching Icelandic rock wool, made from basalt melted with other minerals, then spun and tempered with binders. "Rock wool is environmentally friendly, highly fire resistant, highly noise insulating," Jonsson said. "All of these things that we're doing here are taken with the environment in mind."

Rival data center operator Borealis is also seeking to professionalize. It wants to move beyond just crypto, but it has no plans to stop targeting that market, especially as Bitcoin prices return to old heights.

Another victim of the heist, Borealis' newer facility in northerly Blönduós now has security. On our visit, we went past high fences and cameras, although no man trap.

"The bad weather becomes a natural resource, and we are able to cool and run data centers in a very efficient manner," CEO Bjorn Brynjulfsson said. "We design our facilities to take advantage of this. We are able to run very high-density workloads, AI and HPC being a very good fit to the conditions, allowing us to use pretty much all of the energy for running the IT load and not spending it on cooling."

This site currently has 50MW of capacity, alongside a Reykjavik data center with 10MW and a site in Fitjar to the south with 12MW. All have space to grow, Brynjulfsson said.

"We broke ground here in May 2018, started construction in June, and by mid-September we had the first building operational," Brynjulfsson said. "Part of the reason why we [were so fast] is that blockchain facilities are somewhat simpler than traditional data centers.





"But, since then, we've been building this campus out using top-notch infrastructure provided by Schneider [Electric] and ABB. We're putting a lot of energy targeted towards HPC customers." The company has partnered with British firm EcoCooling to work on newer high-density AI designs. The site reflects the transition. One building looked much like Yahoo's classic 'chicken coop' setup. "This one is essentially the same design, totally true," Brynjulfsson agreed, pointing to another building and adding: "This one is much more controlled."

A bit Bitcoin The crypto roots are still very much on display. Entering one hall, we were greeted by the classic mosquito-pitched scream of ASIC miners sucking in outside air through filters. Next to the car park, two standalone Antminers consume 1MW each, acting as minidata centers on the wider campus. The simpler design of a crypto site has its benefits. "It's 1.03 annualized PUE for the simple design, without UPS, and a single feed of electricity and direct free air cooling," Brynjulfsson said. "Whereas the Reykjavik DC, with a closed loop liquid cooling system, double feeds, and [lithium] UPS systems, it has a design PUE of 1.15." While it accepts that a stripped-back crypto design is not fit for enterprise purposes, Borealis differs from atNorth and others, arguing that sites do not need to be overbuilt. "In our view, people should stop building data centers in the ultra-reliable way as we did before, where everything is doubled, and everything has power generation at the site. If this campus was built that way, we would have 50MW of diesel-generating capacity at the site. We don't want that," Brynjulfsson said. "We foresee that maybe 20-50 percent max will be the backup generation. We would really like to put the reliability of the grid to play to provide good services here. Through workload distribution, we'll take your needs and distribute them geographically in multiple locations. If one goes down, we can still continue. We see this becoming the norm in the future." There are other benefits to moving

Photography by: Sebastian Moss beyond crypto. "Generally speaking, cryptomining has a bad reputation, and people say that renewable energy should not be used for mining," local mayor Pétur Arason told DCD as we walked through the town next to the Borealis data center. "That is, of course, affecting companies that run data centers. But if the data centers are also developing their businesses towards AI and HPC, that's not such an issue." A similar sentiment was shared by many that we talked to. “I think there were times where the general opinion, was 'oh, yeah, Iceland, that's the place where crypto goes,’” Tate Cantrell, CEO at Verne Global said. Verne arrived in Iceland before the Bitcoin boom, with a focus on HPC, which the company says led naturally to AI. COO Mike Allen said: “We've eschewed the whole Bitcoin thing, just because of this negative.” "Since 2016, we've had customers that were focused on the training of large language models," Cantrell said proudly. Of course, non-crypto data centers also get their fair share of bad publicity. Grid issues have caused the power-hungry

16 | DCD Magazine • datacenterdynamics.com




Grid issues have caused the power-hungry industry to come under the microscope the world over - perhaps most notably in Ireland, where officials claim that data centers accounted for almost a fifth of all electricity used in 2022.

But for Iceland, this problem is seen as another opportunity. "Ireland rolled out very aggressive tax reforms and incentives to bring data centers to Ireland," atNorth's Jonsson said. "They may have gone a little bit overboard - luckily, we learn from history."

AtNorth and other operators are pitching their country as a natural extension of Ireland, thanks to a new cable, Iris.

A Dublin suburb

"The Iris cable reduces the latency quite significantly - it goes from 34 milliseconds to Dublin down to 10.5ms [one way]," Thorvardur Sveinsson, CEO of state-owned submarine cable company Farice, told us. "It effectively makes Iceland a digital suburb of Dublin," he proclaimed, using a phrase we heard often.

Launched this year, Iris brings a capacity of 145Tbps to Iceland, traveling some 1,800km to Galway, Ireland. It joins the older Farice-1 cable to the Faroe Islands and Scotland (11Tbps) and the Danice cable to Denmark (40Tbps).

Iceland's small population is not enough to justify the cables Iceland has today, let alone those that Sveinsson has plans for: "The main reason for Iris was to increase the security of Iceland, but data centers are an important customer for us. They clearly help us maintain our network." Their demands are also increasing: "The impact of AI and the discussion of sustainability is really creating a lot of demand today."

The company is considering where to build its next cable. "You mentioned Norway, that might be a good idea," Sveinsson said to DCD. "Northern Ireland, clearly, is another interesting connection point. You only need to look at the map of where Belfast is, it makes sense."

For now, however, Farice is confident in its bet on landing at Galway, slap bang in the middle of Ireland's west coast. "We are the first cable in, so we are taking the risk," Sveinsson said. "Is the marine route up to Galway good, how are the backhauls performing, did the planning process work? We have answered these questions."

With that out of the way, he is hopeful several other cables from around the world will land at Galway, creating a connection hub a short hop from Iceland. "There is the Portugal Ireland Spain (PISCES) cable which would land in Galway," Sveinsson said.

There's also the Far North Fiber Express Route, which hopes to join Europe and Asia via the North West Passage, cutting within the Arctic Circle. That "could be a mega project in the submarine world costing more than $1bn," said Sveinsson. "And we are working with them to get them to land into Galway and help it develop into a cable exchange. They are also planning to have a branching unit, so if there's a case for it, we could branch into that, but the economies have to justify such an investment."

That cable would connect Iceland (via a branching unit or via Galway) to Japan, Alaska, and Western Europe - if it can be pulled off.

Sustainability, without the cost

Going green has always been synonymous with parting with a similarly-colored asset, but Icelandic operators argue that things are different on the Nordic island, where cool temperatures reduce electricity usage and the need for as much cooling infrastructure. At the same time, guaranteed low-cost renewable energy removes the need for complex PPAs, or the fear of sudden price swings.

Shearwater Geoservices, which helps map the marine environment, was looking to reduce costs and its own carbon footprint (although its major clients include fossil fuel companies). "Our main data center is in the UK, and then we have sites in the US, India, and Malaysia," the company's processing and imaging manager, Andrew Brunton, said. "We also use a burst model where, if we've got workloads requiring significant amounts of GPUs, we will go to the cloud."

As its sites aged, the company considered Iceland as a potential alternative option, and has rolled out a small trial deployment at atNorth's ICE02 facility. "By March next year, we're looking at an 84 percent cost saving versus the UK data center," Brunton said. "That's about £173,000 ($217k) a month, it's crazy." It should also mean a 91.9 percent carbon reduction, and a 31.5 percent energy usage saving, he said. The company plans to double its footprint in March.

That is not an apples-to-apples comparison: an older on-prem site will always be less efficient than a modern colocation facility. But Brunton said that the price savings were large enough that the company is considering buying more equipment so that it doesn't need to burst to the cloud.

"The other concern for us was latency," he said. "The claim from atNorth was 18.4ms; we're seeing 25, which is not noticeable."




The latency savings of Iris have meant "we don't have any complaints from users," he said. "There are about 100 people in the UK who are actively logging in every day to the Iceland data center, and they don't see any issues."

Second-tier cloud provider IBM also sees an opportunity in the sustainable data halls of Iceland. It is initially dipping its toe into the market with IBM Cloud Satellite, in partnership with Borealis. Rather than a full-blown cloud, it is a managed distributed cloud solution that can run on any data center. This instance will primarily focus on storage, IBM Denmark country manager Lars Lindegaard said.

"We're hopefully going to end up with a solution where, on the IBM Cloud, whenever you select to store data via cloud storage, you should be able to have Iceland as a selection where you can choose to store your data in an almost carbon neutral setting."

Halldór Már Sæmundsson, chief commercial officer at Borealis, added: "It can scale quite a bit, but we're not going to compete with AWS, Azure, etc., on burstable, accessible storage."

What about hyperscale?

Iceland has a major unanswered question. What does the hyperscale market have in mind for the country? Even one of the tech giants could reshape the local market, pick winners and losers, and consume capacity and demand.

"I know for a fact there's a lot of interest from the hyperscalers establishing in Iceland now," atNorth CEO Kristinsson claimed. "They have vast AI needs that they need to fulfill somewhere." Should they come, he believes that "they will come in a big scale." Thanks to Iris, he said, "they are contemplating moving the AI part of their work to different countries, and Iceland can serve that very well."

Verne CTO Cantrell concurred: "Iris gives us a lower latency connection to the Dublin Mecca of cloud infrastructure, and we just see Iceland becoming an even better location for what I call 'application-specific cloud deployments.'"

He added: "We've got a 16MW building behind us that's going up, and we built that to hyperscale requirements. Three distinct points of entry, dual meet-me rooms, 16kW a rack, slab floors, and so on - this is a hyperscale-ready facility, which will be ready to roll by the middle of next year. We're leaning forward on that perspective, and we will be ready for it."

Beyond the top cloud providers, there are others looking to buy at hyperscale levels. "All the hyperscalers are searching for space, yes, but so are these large AI service providers from the US - CoreWeave and the rest - which are looking like crazy for data centers," Kristinsson said. "It feels like we could build 1,000MW and someone would just buy it."

Power troubles

The problem is, even with its abundance of power sources, Iceland still suffers from the same grid constraints as the wider world. "We have more demand than we can deliver, but I think that's true everywhere in Europe," Haraldur Hallgrímsson, director of business development at state-owned power company Landsvirkjun, said.

Each operator we spoke to told the same story - they were expanding, but could expand far faster if the grid could support it. Each was doing their part to move things along. "You'll see a substation here at the site which was built specifically for us, and which we funded," Borealis' Brynjulfsson told us during our tour. Verne's Cantrell echoed: "We are investing in transmission, we've already broken ground on a 240MVA substation."

A quick fix to the power crunch is tantalizingly close. The majority of the power produced in Iceland goes to just three aluminum smelters. The electricity-intense process makes aluminum from alumina sourced in Guinea, Australia, and Brazil and shipped to Iceland. Capacity expanded massively in the 2000s, lured by cheap clean power, and the country now makes some two percent of the world's aluminum.

The data center sector thinks that could change if one of the smelters leaves. Rio Tinto runs the oldest smelter, built back in 1969, but has recently complained about Landsvirkjun, and had repeated labor disputes.

Locals told DCD that the company was ready to leave Iceland a few years back, paused the move during Covid, and then pushed it further when the Ukraine war sent European mainland energy prices skyrocketing.

If Rio Tinto, or one of the other two smelters, eventually leaves, it would immediately open up an industrial site with hundreds of megawatts of power already connected. And maybe one departure would set off a domino effect and cause the other two to depart.

Already, data centers are giving Landsvirkjun useful leverage in negotiations with the smelters, whose disproportionate power consumption has given them the ability to demand lower prices. A smelter's threats to leave are much weaker when a data center operator is waiting in the wings.

Over the longer term, Iceland is working on a 'Master Plan for Nature Protection and Energy Utilization' to decide how best to expand its grid, build out new plants, and increase resiliency. The effort, begun in 1999, has slowly wended its way through government, industry, and local communities; now in its fifth phase, it is not clear when it will be complete. "I'm not answering questions," President Jóhannesson said when we asked for an update.

For now, around 80 percent of the country's power comes from hydro, and 20 percent from geothermal - both renewable, and both stable energy sources.

The country also has a 2MW trial wind turbine deployment. Progress has been slow, and beset by environmental concerns, but it could unlock yet more power generation in the wind-swept nation. "I know there are three locations people are considering, and we're talking about 150MW each. We'll see, [conversations] literally began two months ago," Angeliki Kapatza, then a geothermal specialist at ON Power, told DCD as we toured the Hellisheidi geothermal power plant.

The basalt provides

Visiting Hellisheidi, you would be forgiven for thinking that the cloud had already come to Iceland. Great plumes of steam hover in the air, ominously looming over a vast building housing enormous turbines. "We are generating 303MW of electricity with seven turbines," Kapatza said, making it the largest geothermal power plant in the European Economic Area.

It doesn't just produce electricity. The turbines are driven by superheated steam from underground, which leaves hot water equal to 200MW of thermal energy. Landsvirkjun uses that water, and plans to double capacity in two years. "Hydro is the MVP for power, but geothermal covers 90 percent of space heating," said Kapatza.

To access the stores of underground heat, ON drills 2.5km-deep boreholes to reach waters of 250-350°C. "There are boreholes that can be really, really productive. They can produce, just from one borehole, 15MW. That's a lot of power from one drilling." Output can differ between holes, but once they start producing power "you don't have any fluctuations, it runs 24/7 stably for the grid," she said. This is useful, but made better when paired with hydro, which can be more easily ramped up and down with demand.

The site has met the limits of its electricity-producing capacity, while all seven Icelandic geothermal power plants together produce 755MW. But the island could produce a lot more. "The current utilization of Icelandic geothermal resources is only a small portion of its estimated potential," Kapatza said. More than 7,659MW of high-temperature geothermal systems have been evaluated, although many would be in hard-to-reach or environmentally protected regions.

Go deeper

The Icelandic Deep Drilling Project (IDDP) hopes to access some of the deeper sources of heat. "The idea is, 'why don't we drill the deepest we can go, and reach supercritical fluids?'" One deep hole could produce as much as 10 regular ones, Kapatza said.

"It sounds good, right? But the practicality of it, it was not as great," she said of the two attempts since the project began in 2000. "A lot of things tend to happen at those depths. The first one, they drilled down into magma. The second attempt went 4.6km, but they didn't have the right equipment - drill bits, casings, they just melted."

A third attempt is underway. "That's a very cool and exciting project that I hope ON is going to take its time on." Should it work, it would add another renewable energy source to Iceland's portfolio.

Renewable, that is, but maybe not green. "We have connected 'renewable' with 'green,' which is usually right, but not always," Kapatza said.

There is a dark side to Iceland's abundant geothermal power. When drilling deep into the earth, what comes out isn't just water and heat. "There's 0.4 percent that we do not want, which is CO2 and hydrogen sulfide," with a 65:25 split, respectively. The emissions are far, far lower than fossil fuels, with a CO2 footprint of 7.6 grams per kilowatt hour of production (gas is ~350g, and coal is ~950g). But it can still add up to a noticeable amount of emissions in a world where we can't afford any more.

Salvation in the rocks

To mitigate this, ON has a solution. Through subsidiary Carbfix, which Kapatza has since joined, the company reinjects CO2 fluid into disused boreholes, with a goal of being carbon neutral by 2030. "Once it's down there, we just have mineralization, and we store it forever as a rock," she said. "In nature, every day, CO2 combined with water, basalt, and time will mineralize. We figured out that it actually happens in two years."

At the Hellisheidi site alone, "we have the possibility of storing more than 200 billion tons of CO2," she said. "All of the world's annual CO2 emissions is a bit more than 36 billion tons. Theoretically, across Iceland, we could fit more than 40 years of our emissions. It's not going to happen, of course."

A future project would be to bring CO2 in from abroad, with the government seeking to take emissions from Northern Europe and store them in the country (this would reduce future emissions, but not remove historical ones). This effort could sequester three million tons of CO2 annually. The company is also working with Climeworks, a direct air capture company that hopes to pull CO2 from the atmosphere and give it to Carbfix to bury. "We have a facility here from them, Orca, and in the next two years they have a new project called Mammoth that's under construction that will capture 36,000 tons of CO2 at max," Kapatza said. "They also need so much hot water that they get from here - so much, actually, that we couldn't give it all to them."

The growth of such an industry could pose a challenge for data centers, putting them in the uncomfortable position of competing for power against companies literally fighting climate change.
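To put those figures in rough perspective, here is a minimal back-of-envelope sketch in Python. The 100MW IT load and 1.2 PUE are hypothetical inputs chosen purely for illustration; the intensity and storage numbers are simply the ones quoted above, and none of them have been independently verified.

```python
# Back-of-envelope check of the figures quoted above (illustrative only).

GEOTHERMAL_G_PER_KWH = 7.6   # Hellisheidi CO2 intensity quoted above, g/kWh
GAS_G_PER_KWH = 350          # approximate figure quoted for gas
COAL_G_PER_KWH = 950         # approximate figure quoted for coal


def annual_tonnes_co2(it_load_mw: float, grams_per_kwh: float, pue: float = 1.2) -> float:
    """CO2 for a data center running flat out for a year (hypothetical 1.2 PUE)."""
    kwh_per_year = it_load_mw * 1_000 * pue * 24 * 365
    return kwh_per_year * grams_per_kwh / 1e6  # grams -> tonnes


for label, intensity in [("geothermal", GEOTHERMAL_G_PER_KWH),
                         ("gas", GAS_G_PER_KWH),
                         ("coal", COAL_G_PER_KWH)]:
    print(f"100MW IT load on {label}: ~{annual_tonnes_co2(100, intensity):,.0f} tonnes CO2/year")

# Quoted Hellisheidi storage capacity versus quoted annual global emissions
print(f"Hellisheidi capacity vs world emissions: ~{200e9 / 36e9:.1f} years")
```

Run as written, the sketch also shows why the 40-year figure refers to Iceland as a whole: Hellisheidi's quoted capacity on its own would cover only around five to six years of current global emissions.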

"I think Iceland will continue to be a destination for anyone who has secure clean energy as a primary objective," Verne's Cantrell said. "There will be other industries that come to Iceland as well - there has to be a place to create synthetic fuels for airlines, and we're working with Landsvirkjun on potentially doing some grants for hydrogen - I would love to transition our diesel generators to green hydrogen fuel cells."

The great fire

The nation has around 30 active volcanic sites, and few can forget when the 2010 Eyjafjallajökull eruption closed air traffic in Europe. "Let's make no bones about it, Iceland is a geothermal active area, we're not going to shy away from that point," Cantrell said. "I will say that, in Iceland, site selection is super important."

He added: "It's one of the reasons that we focused our efforts on the NATO airbase where we are, it's in a geologically different section of the island. Grindavik is more of an active region, and is a separate strata from where we are located. The activity that we've seen, we saw in '21, we saw in '22, we'll see it again. But we haven't seen any interruption to services. It's really important that you have things like diverse power sources. You want to make sure that you have dual connections to the grid and fiber."

Soon after we left, the ground was torn asunder, with a 15km-long magma dike running alongside the coastal fishing town of Grindavik. Just as we go to publication, the Fagradalsfjall volcano has erupted, with the glowing red smoke visible from Reykjavik some 42km to the northeast. Flights continue, and locals are safe, but the situation is developing.

The coming wave

Verne remains more focused on the fallout from a different type of explosive growth - that of AI. "This is almost like wartime," Cantrell said. "A lot of them have just put their sustainability goals to the side and said 'we're just going to go get capacity wherever we can.' That will only last for a small amount of time before sustainability voices are heard again.

"This is where we will continue to play. We're in a location where you can be low cost, 100 percent sustainable, and you've still got the scale to meet the requirements of your growth."

This may set the stage for rapid AI growth in Iceland. Of course, that could leave Iceland's data center sector in a perilous situation once again. When asked if the AI money was at least safer than crypto, atNorth's CEO Kristinsson replied: "You could argue that, I'm not sure. To be honest, a lot of those guys - will they be around in 12 months? When H100 [GPUs] are not the newest thing on the block, what will happen?"

To hedge against risk, "you should do it through diversifying your portfolio," he said. "But when 70-80 percent of the demand is AI, it's difficult."


>> AWARDS 2023

This year's DCD Awards, hosted by Dara Ó Briain, were our biggest ever, with 235 entries. A record 60 independent judges determined who best exemplified the ideals of the data center industry. The shortlist spanned 19 countries. The winners were revealed in a night of suspense and drama in London on December 6.


The data center community applauded those who were recognized as the best. The event also raised more than £40,000 ($51k) for our charity partner, Chain of Hope, which provides lifesaving heart surgery for children in countries scarred by war or poverty. Who won? Turn the page and find out!


Category Winners

Following months of deliberations with an independent panel of expert judges, DCD Awards is proud to celebrate the industry's best data center projects and most talented people.

Asia Pacific Data Center Project of the Year


Winner: NTT Global Data Centers The 'Green Tech & Liquid Cooling Project' allowed for liquid and immersion cooling facilities to be deployed in an existing multi-tenant data center, with PUE reducing from more than 1.5 to less than 1.2.

Edge Data Center Project of the Year

Winner: NextDC NextDC's extreme Edge project in remote Western Australia meets the critical demand for pit-to-port ultra-low-latency infrastructure and safer, tech-enabled operations that were historically only found in major cities.

North American Data Center Project of the Year

Winner: QScale Powered by 142MW of renewable energy, QScale's high-density colocation facility is designed specifically to support HPC and AI. The OCP-Ready facility includes a unique waste-heat recovery design and can host power-intensive liquid-cooled racks of 200kW+.

Latin America Data Center Project of the Year


Winner: Telecom Argentina With over $76m investment in infrastructure and the evolution of over 64 buildings to Edge facilities, the transformation project reduced total cost of ownership and created robust low-latency connectivity.



European Data Center Project of the Year


Winner: Cyxtera & Oxford Quantum Circuits

This project took a big step towards moving quantum computing from a "lab" environment into a fully managed, industry-ready environment, with the security, interconnectivity, network bandwidth, and redundant infrastructure required.

Middle East & Africa Data Center Project of the Year

Winner: Khazna Data Centers Completed in just 19 months with no less than three million man-hours, the Abu Dhabi 6 data center boasts an impressive 31.8MW of IT power capacity. It has been awarded LEED Gold and Estidama Pearl 4 certifications for its construction, and highlights how sustainability and innovation combine to create a brighter digital future across the UAE.

Mission Critical Tech Innovation Award


Winner: Xtralis The Li-ion Tamer Gen 3 detects Li-ion battery failure before it progresses into a fire, thereby enabling automation systems to shut down an affected cell before further damage occurs.

Energy Impact Award


Winner: STACK Infrastructure The project provides heat to 5,000 households in the city, and Celsio calculates that it has reduced its own energy requirement by 17,600,000 kWh (17.6GWh) per year, saving nearly 14,000 tons of CO2 emissions each year.

Environmental Impact Award


Winner: EdgeCloudLink This groundbreaking clean-energy solution revolves around an off-grid, green hydrogen-powered modular data center, ensuring uninterrupted 24/7 operations with zero emissions.



Community Impact Award


Winner: Scala Data Centers Scala’s robust environmental and social management initiative is built around increasing trust and collaboration with local communities. The schemes have created 4,400+ jobs to support local labor and surrounding economies and Scala will have awarded 140+ student scholarships by the end of 2023.

Editor’s Choice Award

Winner: Deep Green Deep Green is establishing a network of metro-Edge data centers across Europe focusing on energy recovery and redistribution. The pilot project has saved the community swimming pool £22,000 a year in energy costs.

Data Center Construction Team of the Year

Winner: Khazna Data Centers Khazna's 31.8MW facility was completed in Abu Dhabi in just 19 months. The project combined over 3m man-hours without any lost time incidents to deliver against the highest standards of sustainability and efficiency.


Young Mission Critical Engineer of the Year

Winner: Alexander Doey, RED With a Master's degree in Mechanical Engineering, Alexander’s technical knowledge and natural leadership skills have propelled him quickly through the ranks at RED. In just three years, Alex has cemented himself as a mechanical design team leader within the Hyperscale team, thanks to his expertise in refrigerant-based design.


Data Center Operations Team of the Year

Winner: Salute Salute Mission Critical took charge of two Austin facilities within a mere two-week time span, impressing the judging panel and demonstrating their organizational prowess and dedication to excellence in the face of any challenge.



Outstanding Contribution to the Data Center Industry

Winner: Bill Angle, formerly of CS Technology

Every industry has its champions, thought leaders and pioneers who will be revered for their achievements for years to come. The previous winners of this Award are all distinguished by their extensive service to the data center industry, and have in some way changed the way in which the industry looks at key challenges or key opportunities. This year's winner, Bill Angle, has been working in the data center business since 1977, when he first developed a preventative maintenance program for critical infrastructure at Maryland National Bank. In the mid-eighties, he began his career as a consultant and later became a founding partner of CS Technology. In the last 40 years, Bill has been involved in the design of nearly 1,800 data center projects, many of them groundbreaking in terms of the evolution of the mission-critical engineering discipline, especially within the financial services arena. Bill holds a B.S. degree in Mechanical Engineering from the University of Maryland and an MBA in Finance from the University of Baltimore. He is now semi-retired but continues to work on special projects.

Data Center Woman of the Year

Winner: Cyre Mercedes Denny, Schneider Electric

This award celebrates a visionary who has not only pushed the boundaries of technology but also paved the way for inclusivity and innovation. This year's winner, Cyre Mercedes Denny, is the Senior Global Program Director at Schneider Electric, and is passionate about diversity & inclusion. Cyre has served on the Board of Women at Schneider Electric, where she established the first cross-state diversity and inclusion network. She is also an active member of HITEC, the largest community of Hispanic technology professionals. In her free time, Cyre mentors students in her hometown, the Bronx.


The social CEO

Cologix's leader Laura Ortman likes to be part of the action

Peter Judge Executive Editor

For Laura Ortman, running a colocation company is all about the team. Cologix has data centers across North America, and it's clear from her LinkedIn that she is a CEO who loves to get involved with her colleagues' lives, from sports events to graduations.

"I grew up as an athlete and I'm a big football fan," she says. "So, from a leadership standpoint, everything is very much team-focused for me. Part of our values and mantra here is 'Together, we win.'"

She's been with Cologix for five years, taking the CEO's seat in late 2022. Before that she was at Equinix, and spent a significant time at VMware from 2008, when that company was getting to grips with virtualizing the world and building the foundations of the cloud.

Cologix started in 2010 and has ridden the expansion of data centers driven by the cloud. It now operates in 11 markets, from Florida to Vancouver, with around 375 employees. They may not all be as extroverted as Ortman, but she reckons she knows them all.

Cologix was founded by Grant van Rooyen and Todd Coleman from Level 3 Communications, a network company (now part of Lumen), and its business model has been network-centric, network-neutral facilities with a carrier-dense ecosystem. The company supports 700 networks, 360 cloud service providers, and 30 clouds. "There's a lot of Level 3 alumni that have started data center companies," she observes.

The company started in van Rooyen and Coleman's home city of Denver, and still has its headquarters there: "There's very strong talent in Denver from a technology and data center perspective," she says.

The previous CEO, now chairman of the board, Bill Fathers, also worked at VMware. He ran successful service provider Savvis, and then led VMware's ultimately unsuccessful effort to become a cloud provider with vCloud Air.

Cologix supports enterprise, colocation, and hyperscale customers: "There's a value in providing customers with a choice, right? We've had great success in being able to support both network nodes and compute nodes."

The AI wave

All data center operators see the opportunity in the explosion of AI, but Ortman says it is "an additive to our solutions, not a replacement," saying it will "drive demand in our current markets."

With data centers in multiple locations, she believes Cologix is well-placed for the inference part of the AI wave: "We currently have AI-ready data centers that serve as the backbone for inference use cases, by providing computational power, low latency, scalability, security, reliability, and compliance."

AI needs capacity, she says, but also a "collaborative ecosystem." AI companies are having to grow rapidly to support their customers, as are the hyperscalers' AI efforts. "We believe that our cloud on-ramps, our network connectivity, and our being able to provide space and power, is going to be key for them."

It's often quite hard to get concrete views on what all this AI is being used for, but Ortman says that Cologix is looking at using it within the company. "We're looking at what can we do to drive efficiencies across Cologix," she says. "We started a ground-up group that looks at how we can implement AI."

The applications she mentions include using AI to help in French translation: "We've got a large presence in Montreal, Quebec." The company is also looking at AI from an operations standpoint, and the business's engineering group says it is helping with software development.

We suggest to her that AI has so far not provided really revolutionary changes in data center efficiencies: "We're looking across our data centers, but what's the level of impact? Do we have the right resources to do this from a skillset perspective?"

Some data center applications for AI may be limited by security concerns over the customer data that would be used: "There is some hesitation specifically around security and compliance and making sure that the data is protected. There's a lot of forums happening with CISOs across the world to understand how can you ensure that there's the right level of security and compliance."

That effort is going to take in multiple vendors: "We have great relationships with our peers and competitors," she says. Harnessing AI will require partnerships: "I think as an industry that we can come together."

Planning further out

The AI surge, alongside general growth in data centers, is a challenge: "We have to plan further out than we ever had to," she says. "When you look at supply chains and power availability, in the last several years, we have had to plan even more than we had to during the pandemic."

As well as expanding within the company's existing markets, Ortman is looking for new markets where customer demand is expected to grow, but wants to keep a balance between its digital Edge (small, local) and ScaleLogix (hyperscale) businesses. "It has to make sense as part of our strategy, we don't want to veer towards one or the other," she says. "We want to be able to provide that level of choice for our customers."

For locations, she says the "NFL city" markets (the 32 cities with a team in the National Football League) will "always be very important to our customers." Most of those are Tier 1 cities, but she is also "looking at Tier Two markets." Among those, "Columbus is a great example" that has "been great for both digital Edge, hyperscale, and now for AI."

She certainly agrees with the emerging view that Edge needs low latency, but that can be provided relatively easily with moderate-sized data centers in Tier 2 or smaller cities: "Edge means something different to a lot of different folks. For me, it's around the need for cloud service providers to have more Metro deployments and be physically closer to their end customers." To get closer to customers, she says, "I think we'll see that demand really pushing more to the Network Edge and aligning to cloud on-ramps."

Leading for diversity

How is it going, we ask, leading Cologix? She talks about hiring good people, developing them, and trusting them. "Trust is important," she says. "It's around hiring talented people and empowering them to make the best decisions for Cologix for our customers. We hire very well to the skill sets that we need, but we want to empower them to really make a difference." She says: "I'm ultimately the one that's held accountable, but we all do that across the company."

She's working hard on diversity as part of Cologix's ESG strategy, and previously was part of the diversity leaders' groups at Equinix and VMware: "This is a journey we started with gender diversity - that's been a large focus for Cologix. And we have more than 50 percent female representation on our leadership team."

Further down the organization, there's work going on. The company has a resource group for female employees called Cologix Women's Connection Network, and a leadership program aimed at helping high-performing, high-potential women to manage their career paths. At the entry level, "we've set goals for ourselves to hire 15 percent female employees, and we're on track to beat that." This means getting female technical and security staff, and then helping them to advance.

Fifteen percent might seem a small step, but it's important to set realistic goals, she says: "I'm not a big believer of setting a goal that is going to be very hard to achieve initially, but I think we can get there. We know, as an industry, we have to drive more gender diversity, especially across more of the technical types of roles in our data centers."

"This is just the beginning of our diversity goals," she continues, "and we're going to look at expanding that into other diverse groups, ensuring that all our employees feel included. A sense of belonging is important to us as a company."

Data centers and the world

Digital infrastructure now has such a large footprint that no organization can afford to ignore what some call the "social contract" - the way a power-hungry facility interacts with the community around it. In areas with a high concentration of facilities, like Northern Virginia, that relationship has become quite volatile, with protests and power crises.

Avoiding protests, she says, is a matter of being "very transparent around what we're doing and how we're helping the environment." As well as meeting ESG goals, operators need to "communicate what we're doing to help." On the basics of energy use, she says Cologix is 65 percent carbon-free, and uses 50 percent renewable energy.

"Page Haun, our chief marketing officer and ESG strategy officer, is on the board of the Data Center Coalition in Virginia," which is a good place to understand what is happening around power sources, says Ortman. "As an industry, we're constantly trying to find creative solutions for the power needs, and we'll need to continue to work collaboratively with the utility providers. I think DCC is a great example of sharing what they're doing and bringing in the input of data center providers, and working together."

Beyond the power use issue, she's very sure that when hyperscalers and large enterprise customers invest in a market, "it opens up - that is great for the economy."

Thinking local

But that's a bit abstract. How can you encourage people to like their local data center? "I think it's very market-specific," she responds, varying for data centers in "Montreal, versus Minneapolis or Columbus or Santa Clara. I think you have to kind of tailor it a little bit by the market-specific needs."

In future, as ESG evolves, she sees on-site people at facilities helping them to learn better how to partner with the local markets. "We all speak mostly the same language - outside of Montreal - but it's important to understand the culture and the needs of each market. Whether we're doing a marketing event, a customer event, or looking at land banking, we have to be sensitive to the cultural needs and the market needs."

To make that happen, she trusts her local staff: "I'm very proud of our Operational Directors in each of these markets. We do market reviews of the opportunities, the challenges, and the demands. What are we doing locally to help?"

Cologix has social events to help with the community, and local employee resource groups: "There's a lot of focus on helping the community as well as communicating what we're doing, as a data center provider in those communities. I think those markets appreciate that."

How have things changed?

With her years in the industry, has she seen a big change in how the industry treats women since she started? "Things are definitely better than 20 or 25 years ago," she says. "Back then, I wouldn't say that I had many mentors and sponsors that were women in leadership." She started out with a mentor who was "not a great mentor," but quickly learned how to have a good mentor-mentee relationship.

Spending nearly ten years at VMware was good, as the company had a positive record of female leadership, hiring women for technical and sales roles: "The men that I worked with were huge supporters of women in technology roles." Her VMware years "shaped the way that I look at the relationships that I build," she says.

And she means that about the relationships: "I love to meet with all the employees. When I go to each of the markets, I meet with customers and partners, but my focus is really getting to know the team." She sums up: "Life is short, right? We all work very hard, and we all work together many hours, so I genuinely enjoy getting to know our employees. It energizes me when I spend time with our team and with our customers and partners in the field."

That's something she learned early in her career, she says. "I used to work for managers that would be in their office and close the door all day and not spend any time with the employees. When I was at VMware, I didn't want to sit in an office. I was always a fan of being out in the cubicle world. I wanted to be with the engineers, and I wanted to be right with them learning. I learned very quickly that I did not want to be that type of leader."

Cologix has a "culture club," which was important during lockdown, and she's keeping the connection going: "We have employee engagement activities throughout the company and I join every single one if I can. I get bored if I miss an opportunity to be part of the Scavenger Hunt or Bingo Night, because it's important for me to get to know our employees."

But there are limits: "Not karaoke. The other things, I'm happy to do."


Sponsored by

The AI & Networking Supplement

INSIDE

The challenges of operating at AI scale Smart networking chips

Getting hotter & denser

Climbing the memory wall

> Broadcom turns to AI to improve AI

> What changing AI densities mean for the data center

> How CXL hopes to help tackle the memory bottleneck


Sponsored by

Cooling by Design Supplement

INSIDE

Seeking efficiency and simplicity Nature’s cool enough > Cooling data centers with natural resources

Air is always there

> Liquid cooling is coming, but air cooling will always have a role

Cleaner cooling water

> A smart way to make data centers use less water and power


AI & Networking 

Sponsored by

Contents

34. Getting hotter, getting denser - How AI is changing densities

38. Advertorial: The AI-ppliance of science - Using AI for network design, testing, and optimization

40. Making networking chips smarter - Broadcom turns to AI to improve AI

42. Going all optical - Marvell's CTO wants you to see the light

43. Climbing the memory wall - How Compute Express Link hopes to help tackle the memory bottleneck

44. The CPU's role in generative AI - As the hype cycle enters its second year, it's time to focus on efficiency


Building AI systems

It would be an understatement to say that this has been the year of AI. Data center designs have been thrown out the window, and existing sites have gone through refurbishments, as the sector has swung to embrace this high-density future.

Hot stuff

At the data center and rack level, more GPUs and high-end CPUs mean more power to each server and therefore more cooling. We speak to Cyxtera, Telehouse Europe, Schneider Electric, and more about how data centers are adapting to high-density racks. From different cooling approaches to different business models, high-density racks are rewriting the data center game. But many are still not ready to fully embrace the concept, and are looking to hedge their bets with hybrid cooling solutions that only support the lower end of the denser racks.

Breaking down the bottlenecks

Getting a facility to support AI servers is only half the battle. Handling the vast networking challenge of tens of thousands of nodes is a challenge too great for humans, or for set networking policies. Broadcom hopes that AI will be able to help, with the company rolling out a neural network onboard its switching silicon. It is also expanding cognitive routing, which should help reduce congestion by more intelligently routing data.

A touch of light

Another way to improve networking speeds would be to move into the optical domain. Marvell's CTO makes the pitch for going fully optical, removing the back and forth hops into the electrical world. This would also help the data center disaggregate, allowing storage, compute, and memory to scale at their own pace.

Don't forget the memory

Memory continues to be its own bottleneck, with Rambus' Steve Woo getting flashbacks to the '90s memory wall crisis. His company is among those pushing for companies to embrace the Compute Express Link (CXL) standard, which extends system RAM to external CXL-based memory. This could help unlock more memory, and again further disaggregate the data center.

The CPU stands strong

While the rest of the supplement is primarily focused on the challenges of a GPU-dominated data center, Ampere's chief product officer Jeff Wittich has a plea for AI developers - try the CPU. His company is among those noting that CPUs still have a huge role in AI, particularly for inference workloads. They also have the added benefit of still being useful should the AI bubble burst.


Getting hotter, getting denser

How AI is changing densities

Dan Swinhoe Senior Editor

With the arrival of generative AI, coupled with the release of increasingly powerful GPUs, 2023 has seen discussions around high-density workloads in data centers ratchet significantly up.

In recent years, high density has generally meant anything between the low double digits and up to 30kW; now, some companies are talking about densities in the triple-figure kilowatt range. Air-based cooling has its limitations, and these new high-density designs will have to utilize liquid cooling in some form. But are data center operators - and especially colocation providers - ready for that change?


Data center designs get denser - at least for hyperscalers

Schneider Electric predicts AI-based workloads could grow from 4.3GW to 13.5GW over the next five years or so, increasing from eight percent of current workloads to 15-20 percent. "It's growing faster, but it's not the majority of the workloads," Andrew Bradner, general manager of Schneider Electric's cooling business, said during a DCD tour of the company's cooling system factory in Conselve, Italy. "They [hyperscalers] are in an arms race now, but many of our colocation customers are trying to understand how they deploy at this scale."

After years of testing, hyperscalers are now deploying liquid cooling in various forms to accommodate increasingly dense chips designed for AI workloads. "We're seeing far more activity with large-scale planned deployments focused in the US with your typical Internet giants," Bradner tells DCD in a follow-up conversation. "Where we see scale starting first is purpose-built, self-built by those Internet giants."

In 2021, Microsoft partnered with Wiwynn to test two-phase immersion cooling in production deployment at a facility in Quincy, Washington. In November 2023, the company revealed a new custom rack design to house its custom AI accelerator chip known as Maia. The rack design, known as Ares, features a 'side-kick' hosting cooling infrastructure on the side of the system that circulates liquid to cold plates, and will require some data centers to deploy water-to-air CDUs.

In November 2023, Amazon said its new instances with Nvidia's GH200 Grace Hopper Superchips would be the first AI infrastructure on AWS to feature liquid cooling.

Most notably, Meta paused a large proportion of its global data center buildout in late 2022 as part of a companywide 'rescope' as it looked to adapt its designs for AI. The new-design facilities will be partially liquid-cooled, with Meta deploying direct-to-chip cooling for the GPUs, whilst sticking to air cooling for its traditional servers. The company told DCD that having the hybrid setup allows Meta to expand with the AI market, but not over-provision for something that is still unpredictable. It said it won't use immersion cooling for the foreseeable future, as it isn't "scalable and operationalized" to the social media company's requirements at this point.

On the colocation and wholesale side, the likes of Digital Realty, CyrusOne, and DataBank are all offering new high-density designs that can utilize air and liquid cooling.

Digital said its new high-density colocation service will be available in 28 markets across North America, EMEA, and Asia Pacific. The service will offer densities of up to 70kW per rack using "Air-Assisted Liquid Cooling (AALC) technologies" - later confirmed to DCD as "rear door heat exchangers on racks."

DataBank's Universal Data Hall Design (UDHD) offers a flexible design able to host a mix of high- and low-density workloads. The base design incorporates slab flooring with perimeter cooling - with the option to include raised flooring - while the company has said technologies like rear door heat exchangers or direct-to-chip cooling can be provisioned with "minimal effort."

2023 saw CyrusOne announce a new AI-specific built-to-suit data center design that can accommodate densities of up to 300kW per rack. The company has said its Intelliscale design will be able to use liquid-to-chip cooling technology, rear door heat exchanger cooling systems, and immersion cooling in a building footprint a quarter of the size of its usual builds.

Densities up, but retail colo yet to see the liquid boom

DCD spoke to a number of retail colocation providers about what they are seeing in terms of density demands versus liquid cooling. While most said that densities are creeping up, it is slow. What liquid interest there is is still direct-to-chip, and there is currently little demand for immersion cooling amongst their wider customer base.

Schneider's Bradner notes that supply chain constraints around the latest GPUs and AI chips mean the extreme high-density demands are yet to trickle out of hyperscale companies' own facilities and into their colocation deployments, let alone reach mass availability to the general market: "And until that happens, it's going to take time for it to be as prevalent around the world," he says. "That's not to say that liquid cooling isn't growing and people aren't stacking racks now to 30-40kW. We're seeing far more requests around the world for higher density. But what's hard to tell is whether it's planned density or its actual deployed density; I get a sense a lot of it is more planned density."

In the meantime, even without the latest and greatest AI hardware, the average densities of colocation facilities are still creeping up. In its Q3 2023 earnings call, Equinix noted it is seeing average densities creep up across its footprint. CEO Charles Meyers said that over the first three quarters of 2023 the company had been turning over cabinets with an average density of 4kW per cabinet, but adding new billable cabinets at an average of 5.7kW.

"The reality is we've been paddling hard against that increase in density when it comes to cabinet growth," he said. "We still see meaningful demand well below that 5.7[kW] and then you see some meaningfully above that. We might see deals that are 10, 15, 20, or more kilowatts per cabinet. We may even be looking at liquid cooling to support some of those very high-density requirements.

"It's an opportunity for us as we have this dynamic of space being freed up to the extent that we can match that up with power and cool it appropriately using liquid cooling or other means or traditional air-cooling means, then I think that's an opportunity to unlock more value from the platform."

Equinix is also deploying liquid cooling for some of its bare metal infrastructure. In February 2023, the colo giant said the company had been testing ZutaCore's liquid cooling systems at its co-innovation center for a year, and in June 2022 installed some of the two-phase tech in live servers for its metal infrastructure-as-a-service offering. A rack full of operational two-phase cooling had, at that point, been stable for six months in its NY5 data center in Secaucus, New Jersey. December 2023 saw the colo giant announce plans to expand support for liquid cooling technologies - including direct-to-chip - to more than 100 of its International Business Exchange (IBX) data centers over 45 metros globally.

Bradner notes many large companies are still in the learning phase on what large-scale deployments will look like: "I've spoken to large Internet customers saying they're looking at doing 40kW racks but aren't really sure how they're going to do it, and might just look at the square meter density and separate the racks more."

"We know that a lot of our colo customers are doing spot deployments of high-density compute," he says, "but there's so much segregating of small deployments into the existing infrastructure, which can support that, because you're not deploying at scale."

Cyxtera, a retail colocation firm recently acquired by Brookfield, is also seeing customer densities go up generally.

"If we looked at the UK five-plus years ago, densities were lucky to be hitting 4kW on average. Now we're seeing 10+ regularly for customers," says Charlie Bernard, director of growth strategy, EMEA, Cyxtera. "The environments are densifying and we're getting much more performance out of less hardware that has high thermal design points (TDPs), along with the influx of hyper-converged infrastructure (HCI) that is pushing up the amount of density you can achieve in a single cabinet."

Cyxtera says it has 19 sites that can support 70kW+ workloads with a combination of air cooling in cold aisle containment and liquid cooling. "AI and HPC-specific workloads are pushing limits, and we're seeing requests of 50kW+ pretty regularly now, we get a few a week, and we have implemented direct-to-chip in the US en masse for some of these requirements," says Bernard. "But even the traditional workloads we're seeing - private cloud, SAP clusters, etc. - it's steadily inclining up."

Right now, he says, liquid cooling has been the preserve of very large enterprises and service providers, the majority of which are doing direct-to-chip.

"I think the reasoning behind that is likely the lack of OEM support. The majority of customers aren't going to move forward with a solution they can't get fully supported and warranted by the service provider and OEM they're working with. We'll see a dramatic uptake once the OEMs are supporting and warranting the hardware and endorsing it to customers."

Hybrid cooling to become the norm?

While an entirely liquid-based estate might be easier to handle, it would be overkill in environments with a variety of hardware and workload requirements. This means hybrid deployments that mix air-based cooling with the different varieties of liquid cooling are likely to become the norm as densities creep up. Schneider's Bradner tells DCD that for 350W chips, dry air cooling is still viable, but once we reach 500W per chip, companies need to start thinking about water - though without compressors, via dry cooler or adiabatic systems. At 700W chips, operators are going to need mechanical cooling along with water, and free cooling is less likely to be feasible.


"Below 40kW a rack, you have a few more options," he says. "Some are more efficient than others, but you can use air-based cooling as a way to cool those loads. As you get into 50kW a rack, air-based cooling becomes less viable in terms of efficiency and sustainability. And then over 60kW per rack, we don't have a choice, you really need to go to liquid applications.

"It may not be the most sustainable way, but we're seeing a lot of people designing for air-based cooling at up to 40kW, and not making an architectural change to go to liquid."

And, while direct-to-chip might be the most prevalent liquid option today, it still has its limitations, meaning air is still required to take away the leftover heat. "When you talk about a 50kW load, and you're only doing 70 percent of that [via direct-to-chip], you have 15kW left. Well, that's almost like what today's cooling demand is in most data centers," says Bradner.

Spencer Lamb, CCO of UK colo firm Kao Data, said the company's average rack densities across current installations are circa 7-10kW per rack, but density demand is "increasing significantly."

"We're now deploying 40kW air-cooled AI racks in our data halls. We achieve this by running CFD analysis, ascertaining how and where our customers require the higher density racks within their deployments, thereby ensuring the cooling system isn't compromised."

Lamb said customers are seeking to deploy both direct-to-chip and immersion cooling, but the company's design blueprint allows it to deploy either without significant engineering, to complement the existing air-cooled system.

"From a data center engineering viewpoint, keeping these technologies separate is the ideal outcome, but is not at all appropriate for the customers seeking to deploy this technology," he says. "An end-user acquiring a direct-to-chip liquid cooling platform will more than likely be seeking an adjacent storage and network system which will be air-cooled, so some level of air cooling will be required, and this is a trend we see for the foreseeable future. Direct-to-chip cooling will still generate residual heat and require conditioned air. Moving forwards, data halls must be able to accommodate a hybrid approach to successfully house these systems."
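The rules of thumb quoted above can be collapsed into a short, illustrative sketch. The thresholds and the 70 percent direct-to-chip capture figure come straight from Bradner's quotes; the function names, the 1.2 exponent-free structure, and the idea of encoding this as code are our own simplification under stated assumptions, not any vendor's sizing tool.

```python
# A minimal sketch of the density rules of thumb quoted above. Illustrative only:
# real cooling selection depends on the site, the fluid loop, and the hardware.

def cooling_approach(rack_kw: float) -> str:
    """Map a planned rack density to the broad cooling approach described above."""
    if rack_kw < 40:
        return "air cooling still an option (efficiency varies)"
    if rack_kw <= 60:
        return "air becomes marginal - plan for direct-to-chip liquid"
    return "liquid cooling required"


def residual_air_load_kw(rack_kw: float, liquid_capture: float = 0.70) -> float:
    """Heat left for the air system when direct-to-chip captures ~70% of the load."""
    return rack_kw * (1 - liquid_capture)


for kw in (10, 40, 50, 70):
    print(f"{kw}kW rack: {cooling_approach(kw)}")

# The 50kW example quoted above: ~35kW to liquid, ~15kW still rejected to air
print(f"50kW rack with direct-to-chip: ~{residual_air_load_kw(50):.0f}kW handled by air")
```

The point of the residual calculation is the one Bradner makes: even a "liquid-cooled" rack can leave an air load comparable to an entire legacy rack.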

Are colos ready to offer liquid at scale?

While purpose-built liquid-cooled data centers do exist - Colovore in California is among the most well-known examples - most colo providers are facing the challenge of densifying existing facilities and retrofitting liquid cooling into them. For now, many colo providers will leverage direct-to-chip, because the number and scale of high-density deployments are low enough that they can leverage existing facility designs without too much change. But change is set to come.

"There's a challenge with retrofits before we even got into that realm of high density, in legacy sites where it's complicated just to get to 10-15kW a rack," says Schneider's Bradner. "With the question of what happens with even higher densities, they could accept the higher densities, but just less of it. But a lot of site power provisioning didn't contemplate that instead of 10-20MW for a whole building, they might need that just on one floor."

Cyxtera's Bernard notes that many operators are still exploring how to standardize, operationalize, and ultimately productize liquid cooling, all of which throws up multiple questions: "When you look at operationalizing it, we have to ask which of our data centers can support this; which can we tap into water loops, which have the excess capacity we can utilize for it? How are we going to handle the fluids in terms of fluid handling? Each data center is different; there's going to be a differential cost around tapping into pipework.

"One of the challenges right now is the lack of standards with the tanks; they all have slightly different connectors which you have to plug into from a liquid standpoint. Some have CDUs, some have a heat exchanger," he adds. "And once the tanks are in place, how do we take those multiple metrics and feeds that we get from the tanks in terms of temperature, for example, and put those data feeds into our BMS so we can monitor them? There's no one universal set standard between all the manufacturers; they all have their own unique APIs so far."

Telehouse, a global retail colo provider which operates a five-building campus in London, tells DCD it is yet to see major interest in liquid cooling, but is seeing that gradual increase in densities across its facilities.

"Most stuff is now in that 8-12kW category as standard," says Mark Pestridge, senior director of customer experience at Telehouse Europe. "We're also beginning to see some at 20kW. We don't need to use immersion or liquid cooling to do that. Some people still only want 2kW in a rack, but we're definitely seeing a move towards standard high density."

"We have had a couple of inquiries [around liquid]," says Pestridge, "but we think it's three to five years away." The company is still exploring the market and how different technologies could impact future designs of new buildings or potential retrofits of existing facilities. Paul Lewis, SVP of technical service at Telehouse Europe, notes that for interconnection-focused operators, questions still remain around how you mix traditional raised-floor colo space with liquid that will need a slab design, and then combine that with the meet-me rooms.

Cyxtera's Bernard notes that there may be more regulatory and compliance considerations to factor in, the fact that different cooling fluids will have different flashpoints, and that different customers may utilize different cooling fluids in different tanks. SLAs may have to change in future too, potentially forcing operators to ensure guaranteed flows of water or other fluids.

"Currently, we do temperature, we do humidity, we do power availability, but ultimately now we're going to have to do availability of water as well. That's not currently an outlined SLA across the board, but I've no doubt we're going to have measurements on us that we have to provide water to the tank with a certain consistency. So that's another question."

Bradner also notes redundancy thinking may have to change, especially if SLAs are now tied to water. "Depending on the designs, liquid systems are sometimes UPS-backed up. But now there's far more criticality because of what would happen to thermal events if you lost the pumps on a CDU because of loss of power.

"At lower densities, you had some sort of inertia if you had a chilled water system to keep things going until your gen-sets came on. Now, with those higher densities, that's a big problem, and how do you manage that and design for that?"


The AI-ppliance of science

Sameh Yamany, Viavi

Network design, testing and optimization is a highly technical and intellectually intensive task. But the latest AI, ML, and deep learning tools can help, says Viavi’s Sameh Yamany

Artificial intelligence is arguably one of the most abused phrases in IT today, often erroneously used, with the bar invariably set astonishingly low when a vendor has something to sell. Therefore, sifting through the hype to get to the truth is often not easy, yet research and development into true AI, not to mention machine learning, deep learning, and generative AI, is proceeding apace, with a slew of exciting new applications of all types expected to appear in the next five years.

A handful will almost certainly be as revolutionary as Google, the iPhone, or fiber-optic communications. Most, though, will target particular sectors and applications, enabling new advances to be made across the board. Indeed, following the advances in AI demonstrated by the release of ChatGPTv4, data centers are building out in anticipation of a boom in demand for IT optimized to deliver AI, and one of the key areas demanding early investment will be network capacity and service - where AI is already being deployed by companies like Viavi with new tools for network design, testing, and validation.

"AI is taking off now for two reasons," says Sameh Yamany, chief technology officer at Viavi Solutions. "First, the computing infrastructure has become powerful enough to be able to meaningfully do it. Second, we've generated so much data over the past 20 years that we can use deep learning to create a foundation of understanding for AI. As a result, there are more and more good applications of AI emerging every day."

ChatGPT might have been the first to grab the public and corporate imagination, but it certainly won't be the last. Away from the public eye, companies like Viavi have already amassed years of institutional experience and knowledge honing models and algorithms to either help solve complex problems or to assist knowledgeable engineers do more, better. In Viavi's case, that means designing, building, and testing more performant, responsive, and more reliable networks.

Data center network engineers, in particular, have only seen their roles become more complex, not just as data centers have gone hyperscale, but also as they've shifted compute nodes to the Edge. "The data center isn't just in the middle of everything today, it's also moving closer to the Edge [of the network]. It's become humanly impossible to test, anticipate, and remediate everything that's going on, and we've got a lot of jobs out there - highly technical ones - that don't have enough well-trained humans to do them," says Yamany.

Helping hands

Viavi's solutions help network engineers do a whole host of activities better and faster. First, it offers simulation tools that feature a number of built-in scenarios. These scenarios aren't limited to internal networks, but can even simulate highly complex situations, such as autonomous vehicles communicating between themselves, with drones, or with the Edge data center they're connected to.

"Or let's say you bring new networking technology into an existing data center. First, you have to design how and where it will fit in. So Viavi offers simulation tools that can take your existing environment or design, and enable you to simulate different types and levels of network traffic," says Yamany.

At the same time, the tools can test the network designs for scalability, to ensure that they can handle a reasonably foreseeable increase in workloads, additional devices, and more data traffic - after all, these are only going to continue increasing. Or, alternatively, they can help tweak the network for the purpose of traffic shaping or to maximize energy saving.



This is an increasingly important demand, especially with cloud-based AI applications using power-hungry GPUs to generate responses. "Energy efficiency is a big part of our AI technology, especially with data center operators forming an integral part of the network architecture of many different verticals, including the fast-changing automotive industry and the so-called Industry 4.0," says Yamany. "In the past, like Victorian engineers building a railway bridge, networks were wildly over-provisioned, just in case. That was fine when you could manage the power, but with densification energy has become critically important. "But there is a lot of data being generated by the network today - data that's largely ignored in day-to-day operation - that can help us make better decisions on energy efficiency, from resources in terms of fiber on the Ethernet side to resources being used on the compute side. "Because of the level of data Viavi is collecting to assist with monitoring and management, we're building applications and solutions using AI and ML focused purely on energy efficiency."

The digital twin can be generated by the AI based on the current installation, and enables network engineers to explore sophisticated ‘what if?’ scenarios to find out how the network might respond when new applications or workloads are introduced. “Let’s say you want to build a new app that will run on the network, say, remote surgery. This is going to require a certain configuration because it’s certainly going to need minimal latency. It’s therefore going to require some dedicated resources from the network. “But that could impact the real network and adversely affect existing traffic. You could trial that on the network itself using synthetic traffic to see how the network might be affected, but you’ll be making some big assumptions – how does it all change at 6pm on a Saturday evening when gaming traffic might be peaking?” asks Yamany. Instead, it’s possible to run simulations based on real network traffic throughout the week, month or year and test how various scenarios might affect the network and the performance you’re able to deliver – all without affecting the real network that people are relying on right now. “We’re collecting a lot of different data and parameters from the real network, in real time. I can create a digital twin of a network that behaves exactly like a real network and apply any case that I want to it. I can develop an app, test that app using the digital twin and save a lot of money in the process,” says Yamany.

In addition, design benchmarking and validation solutions can ensure that when the network is built and deployed it operates precisely as intended. Indeed, Viavi's tools can also provide ongoing AI-based monitoring post-implementation, not just identifying problems as they arise but also helping to pinpoint their root causes, and helping engineers to optimize network performance.

The straight and narrow

Of course, the use of machine learning and AI in real-world applications, with the AI constantly 'trained' in a bid to improve it, can be problematic. But Viavi engineers, who have already spent years working on and refining the algorithms the company deploys today, are well-versed in the potential shortcomings of AI.

For example, an AI trained to identify a cat, as distinct from a dog or any other four-legged creature with fur, can become corrupted over time as 'drift' in decision making increasingly blurs definitions and introduces biases. Soon, the AI is identifying dogs, pigs, and anteaters as cats because, to the AI, they all begin to look much the same as more and broader data is ingested.

Viavi's AI, therefore, is designed to maintain a 'gold standard' set of guard rails for network behavior to prevent such drift, with what it calls AI optimization. This effectively uses one AI model to check and moderate the outputs of the operational AI, keeping it on the straight and narrow, so to speak.

After all, with so much riding on the coming wave of AI applications – while governments hold summits on the issue – the last thing anyone needs is AI going rogue, especially when it is AI deployed to help networks better manage the AI applications soon to be deployed in data centers across the world.
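To illustrate the guard-rail idea in the simplest possible terms, the sketch below uses one frozen 'reference' model to check the decisions of a live model and flag drift when disagreement grows. It is a hedged, hypothetical Python illustration of the general pattern, not Viavi's implementation; the GuardRail class, thresholds, and toy models are invented.

```python
# Minimal sketch of one model acting as a guard rail for another.
# Illustrative only: 'reference_model' stands in for a frozen, validated baseline.
from collections import deque

class GuardRail:
    def __init__(self, reference_model, disagreement_limit=0.2, window=500):
        self.reference_model = reference_model     # frozen 'gold standard' model
        self.recent = deque(maxlen=window)         # rolling record of agreements
        self.disagreement_limit = disagreement_limit

    def check(self, features, operational_decision):
        """Compare the live model's decision with the reference; flag drift."""
        expected = self.reference_model(features)
        self.recent.append(expected == operational_decision)
        disagreement = 1.0 - (sum(self.recent) / len(self.recent))
        if disagreement > self.disagreement_limit:
            # Drift suspected: fall back to the reference decision and alert.
            return expected, "drift_alert"
        return operational_decision, "ok"

# Usage with toy stand-in models (hypothetical):
reference = lambda f: "throttle" if f["utilization"] > 0.9 else "allow"
rail = GuardRail(reference)
decision, status = rail.check({"utilization": 0.95}, operational_decision="allow")
print(decision, status)   # in this toy run the very first disagreement trips the alert
```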

>>To find out more about VIAVI network testing and other networking solutions, and how they can help your organization save both time and money, visit www.viavisolutions.com/hyperscale


Making networking chips smarter

Broadcom turns to AI to improve AI

Sebastian Moss Editor-in-Chief

If you're lucky enough to be able to get your hands on thousands of GPUs and a data center with enough power and cooling to support them, your headache has only just begun.

Artificial intelligence and machine learning workloads require these GPUs to be connected by a dense and adaptable network that also links CPUs, memory, and storage. The slightest bottleneck can ripple through a system, causing issues and slow performance across the entire training run. But with countless interconnected nodes, it's easy for traffic to pile up.

Chip giant Broadcom hopes that part of the solution can lie in AI and software itself. Its new Trident 5-X12 switching silicon will be the first to use the company's on-chip neural-network inference engine, NetGNT, aimed at improving networks in real time.

NetGNT (Networking General-purpose Neural-network Traffic-analyzer) is "general purpose," Robin Grinley, principal PLM in Broadcom's Core Switching Group, told DCD. "It doesn't have one specific function; it's meant for many different use cases."




The small neural network sits in parallel to the regular packet-processing pipeline, where the customer puts a number of static rules into the chip (drop policies, forwarding policies, IP ranges, etc.).

"In comparison, NetGNT has memory, which means it's looking for patterns," Grinley said. "These are patterns across space – different ports in the chip – or across time. So as flows come through the chip, and various events happen, it can look for higher-level patterns that are not really catchable by some static set of rules that you've programmed into these low-level tables."

A customer could train the neural network on previous DDoS attacks to help it identify a similar event in the future. "Now, the action could be local on the chip – it could be just: okay, when you see one of these DDoS flows starting up, disrupt the flow and drop the packet. In parallel, it can also do things like create a notification when you first identify this and send it up to the Network Operations Center."

AI and ML runs can sometimes suffer from an incast event, where the number of storage servers sending data to a client outstrips an Ethernet switch's ability to buffer packets, which can cause significant packet loss.
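The contrast between a static rule table and a learned pattern detector can be shown with a small, hedged Python sketch. The feature windows, thresholds, and ramp heuristic below are invented stand-ins – NetGNT itself is fixed-function hardware running a trained neural network – but the example shows why a signature "across space and time" can be caught while a single-threshold rule stays silent.

```python
# Sketch: why a learned pattern-matcher can catch what static rules miss.
# Hypothetical per-port counters; NetGNT runs as fixed-function on-chip hardware.
STATIC_RULES = {"max_pps_per_port": 1_000_000}   # a single-threshold rule

def static_rule_flags(port_pps: dict[int, int]) -> list[int]:
    """Flag only ports that individually exceed a fixed threshold."""
    return [p for p, pps in port_pps.items() if pps > STATIC_RULES["max_pps_per_port"]]

def pattern_flags(history: list[dict[int, int]]) -> list[int]:
    """Flag ports whose rate ramps steadily across time even while below the
    static threshold - a pattern across space and time, not a fixed rule."""
    flagged = []
    for port in history[-1]:
        series = [snapshot[port] for snapshot in history]
        ramping = all(b > a * 1.5 for a, b in zip(series, series[1:]))
        if ramping and series[-1] < STATIC_RULES["max_pps_per_port"]:
            flagged.append(port)
    return flagged

# Port 7 ramps toward an incast/DDoS-like pattern but never trips the static rule.
history = [{7: 50_000, 8: 400_000}, {7: 90_000, 8: 410_000}, {7: 200_000, 8: 395_000}]
print(static_rule_flags(history[-1]))  # [] - nothing exceeds the threshold
print(pattern_flags(history))          # [7] - the ramp itself is the signature
```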

"It can detect this – if there's an accumulation in the buffer due to an incast, it could read that signature and say, 'Okay, I can take very fast action to start back pressure on maybe some of these flows, or do something else,'" Grinley said. "In an AI/ML workload, it goes in phases, and you only have a few milliseconds between phases. You don't have time to involve software in the loop and try to make some decisions as to what to do."

With NetGNT running in parallel, "there's no software in the loop where the more complex the packet processing, the longer it'll take. Whether NetGNT is on or off, the latency for any packet through our chip is the same."

Given the unique requirements of different networks, it's important to note that NetGNT does not work out of the box. "The only thing that we provide here is the hardware model: How many neurons? How are they connected? What are the weights, et cetera." The rest, the customer has to train into the model.

"Somebody has to go look at huge amounts of packet trace data – here's my network when it's operating fine; this is the thing I want to track: incast, denial of service, some other security event, whatever it is," Grinley said. "Some network architect has to go through all of this massive packet trace data and tag that stuff. And then you shovel all of that training data into a supervised training algorithm and spit out the weights that go into our neural network engine."

This means that the accuracy of the system is somewhat dependent on the quality of the data, the length of the training run, and the skill of the person tagging and training the system. "They're probably going to have to hire some AI/ML experts who know how to run it, and then they'll go run it in the cloud or wherever," Grinley said.

It would also be up to the customer how often they re-train the system. "You can reload it while the chip is running," he added. "So they can reload it daily if they want, but the training time typically is probably on the order of a few days to a week."

Separate from NetGNT, Broadcom aims to help reduce bottlenecks with 'cognitive routing,' which was first rolled out with Tomahawk 5. "If you're using Trident 5 as a ToR [top-of-rack switch] and Tomahawk 5 as the fabric, these features operate together," Grinley said.

In older products, dynamic load balancing was confined to just the chip. Congestion would be spotted, and flows autonomously moved to a less loaded link. "Now, that works fine on your chip," Grinley said. "But if you go three or four hops down in the network, that new path that you're choosing may be congested somewhere else."

Now, the platform attempts to handle global load balancing, he said. "These chips, when they sense congestion, can send notifications back downstream as well as upstream, and let all of the chips operate with a global view of the congestion, and they can deal with it." This runs on embedded Arm cores on the chip, because "it's not something that you can wait for the host CPU" to handle.

As the system develops, and the compute on the chip improves, Grinley sees the various efforts converging. "NetGNT could go into cognitive routing version two, or some new load balancing scheme, or some neat telemetry scheme.

"Once you have this little inference engine sitting there, then you can hook it in for performance, for security, for network optimization. I think we're gonna figure out a lot more use cases as time goes on." 
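The tag-and-train workflow Grinley describes – label windows of packet-trace data, feed them to a supervised learner, export the weights – can be sketched generically. The example below uses scikit-learn as a stand-in; it is an illustration of the general approach, not Broadcom's toolchain, and the features, labels, and network shape are invented.

```python
# Generic sketch of the offline workflow described above: tag trace windows,
# train a small supervised model, and export weights for a fixed-size engine.
# Illustration with scikit-learn, not Broadcom's actual tooling.
import numpy as np
from sklearn.neural_network import MLPClassifier

# Features per window of packet-trace data (hypothetical: pps, bytes/pkt,
# active flows, buffer occupancy); labels tagged by a network architect.
X = np.array([[0.2, 0.9, 0.1, 0.1],   # normal
              [0.3, 0.8, 0.2, 0.2],   # normal
              [0.9, 0.2, 0.9, 0.8],   # incast
              [0.8, 0.3, 0.8, 0.9]])  # incast
y = np.array([0, 0, 1, 1])            # 0 = normal, 1 = incast signature

# The hardware fixes the architecture (neuron count, connectivity); training
# only produces the weights that get loaded onto the chip.
model = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
model.fit(X, y)

weights = [layer.shape for layer in model.coefs_]
print("weight matrices to load:", weights)        # [(4, 8), (8, 1)]
print("prediction for a new window:", model.predict([[0.85, 0.25, 0.9, 0.85]]))
```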


Going all optical

Sebastian Moss Editor-in-Chief

Marvell’s CTO wants data centers to embrace the light

One of the fascinating tidbits tucked away in the research paper announcing Google's Gemini large language model was that it was trained not just over multiple compute clusters, but over multiple data centers.

"TPUv4 accelerators are deployed in 'SuperPods' of 4,096 chips, each connected to a dedicated optical switch, which can dynamically reconfigure 4x4x4 chip cubes into arbitrary 3D torus topologies in around 10 seconds," the paper states.

That optical switch is Google's Mission Apollo, first profiled exclusively by DCD in May. Noam Mizrahi, CTO at chip company Marvell, sees Apollo as the first part of a much larger story: The move to a fully optical data center.

"As models get ever larger, the pressure moves to the interconnect of all of that, because each GPU or TPU today has terabits per second of bandwidth to talk to its peer GPUs/TPUs in a cluster, and your network is built for hundreds of gigabits," Mizrahi said.

"Those are the connectivity points that you have, which means that – as long as you stay within your organic box, like a DGX – you can communicate with those rates of terabits. But once you need to create clusters of 1,000s or so, you need to go through a much narrower port that becomes the bottleneck of the whole thing."

The second challenge is how to create a network that efficiently "brings together tens of thousands of nodes that would appear as a single one, across an entire data center or data centers," Mizrahi continued. "And I think the answer to all these things is just having significantly more optical types of connectivity to create the networks."

In traditional network topologies, signals jump back and forth between electrical and optical. Moves like Google's reduce the number of those hops, but are still at a facility level. "The problem starts even lower, within a few racks. If you connect them together, it already could take you into the optical domain," he said.

He hopes that systems will embrace optical as soon as possible: "Don't go back and forth between the digital and then the optical domain – just translate to optics and then run everything over optics, and then only on the other side move back," he said.

"So a GPU could have an optical port – that can be either an optical chiplet within the GPU, or pluggable – and it is connected into a network with an optical port. And then you have memory clusters, also with optics, and you have storage clusters, also with optics, and the network is all optics," Mizrahi said.
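A rough back-of-the-envelope illustration of the in-box versus scale-out gap Mizrahi describes is below. The figures are illustrative assumptions for a modern accelerator box and Ethernet/InfiniBand port, not vendor specifications.

```python
# Back-of-the-envelope illustration of the bandwidth mismatch described above.
# Figures are illustrative assumptions, not vendor specifications.
intra_box_gbps_per_gpu = 7_200   # "terabits per second" class in-box links (~900 GB/s)
network_port_gbps      = 400     # a typical scale-out port today

ratio = intra_box_gbps_per_gpu / network_port_gbps
print(f"In-box bandwidth is roughly {ratio:.0f}x the scale-out port")  # ~18x
```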


This would allow "memory to scale at its own pace, because now it's also a bottleneck to compute – a limit to how much you can connect (see next page). The storage will have to scale by itself, and the network, and then compute – everything independently."

It's a promising vision that has many proponents. But it's also one that has existed for some time, and has yet to lead to an all-optical revolution. The technology is still being developed, and what's out there is expensive, even by data center standards.

"It's a gradual thing, it will not happen in one day," Mizrahi admitted. "No data center will actually be completely redesigned right now in order to do that. They'll put some platform in, and then replace one portion of it. It will take time to evolve."

This also means it will take some time for the true benefit to be felt – as long as there are intermediary hops from optics to electrical, there will still be inefficiencies. "But at some point, we'll have to do something else than the current approach, because you'll hit a wall," Mizrahi said. "And with generative AI you hit the walls very fast. It's very different to anything that we've seen so far." 



Climbing the memory wall


How Compute Express Link hopes to help tackle the memory bottleneck

Sebastian Moss Editor-in-Chief

As AI models get larger and more demanding, memory continues to be a significant limiting factor.

For Steve Woo, who has spent the last 27 years at Rambus trying to advance memory systems and system architectures, the moment reminds him of the mid-90s. "It was clear that processors could do all these things to get faster, but memory was becoming a big bottleneck," Woo, fellow and distinguished inventor at Rambus Labs, said.

"There was all this talk about the memory wall, but we seemed to get through that. Some really interesting innovations eased things so that the processors could do so much more."

Processors have historically evolved faster than memory, "and so when the '90s bottlenecks were removed, then processors kept advancing, and then suddenly new bottlenecks come up."

Those bottlenecks have changed to reflect how the market has changed, particularly with the rise of cloud computing. "You're seeing these relatively new constraints: Power efficiency, TCO, being able to share resources, being flexible in your reconfigurability driving this next inflection. To me, it's just an expression of how computing has changed."

The technology's progression followed older rules on what a system needs, "and now we're seeing them run into challenges with the newer way people want to put data centers together," Woo said.

This moment was a long time coming, but the recent explosion of AI has thrown fuel on the fire. "The model sizes are getting bigger and bigger, and there's a direct correlation between things like training performance and memory bandwidth. Being able to support a really wide range of use cases means you have different footprints that you need to support, so having a dynamic, flexible, reconfigurable kind of infrastructure, that's a great goal to move towards."

One step towards this goal is the Compute Express Link (CXL) open standard, initially founded in 2019 by Alibaba Group, Cisco Systems, Dell EMC, Meta, Google, Hewlett Packard Enterprise, Huawei, Intel, and Microsoft. Others, including AMD, Nvidia, and Broadcom, have joined since – as has Rambus, which is now president of the CXL Consortium.

At its core, CXL is a technology aimed at providing high-speed, high-capacity CPU-to-device and CPU-to-memory connections built on top of the established PCIe infrastructure. It acts as a cache-coherent interconnect standard that extends system RAM to external CXL-based memory.

At this year's Supercomputing 2023, Rambus announced a CXL 3.0 memory controller "that allows you to disaggregate a memory interface away from a directly attached memory interface in a CPU," Rambus' Danny Moore said. "Previously, that CPU had DIMMs (Dual In-Line Memory Modules) connected directly to the CPU. Now, with CXL, we actually take that memory controller and we can move it off the CPU onto [the new CXL one]."

This allows for memory sharing and pooling, Moore said. "You don't need a copy of a piece of data over here and over there – you can have one copy of that piece of data, and both can access it."

Disaggregation also leads to total cost of ownership improvements. "No longer do you need to dedicate die space on a costly large CPU for that memory interface, which is very I/O intensive," Moore said.

"Plus, if you have a system and want two terabytes of memory, [currently] in order to directly attach that to the CPU, you might have to buy the most expensive DRAMs that are stacked the highest," he added. "Well, now that you've got some flexibility with your memory interfaces, you can change an eight-channel CPU into a 12-channel CPU. You no longer need to put 256GB onto each channel, but put some smaller amount. That allows you to buy more, but less costly, DRAM and reduce total cost per gigabyte."

However, adding the controller does mean an additional step that the data has to flow through, Moore admitted. "A lot of work goes into trying to minimize the impact of that step. One of the primary points of interest in the industry, when we talk about CXL, is that latency." But the wider benefits outweigh the impact of the step, Rambus and other CXL proponents believe. Now, it just has to convince the broader market.

CXL has evolved significantly from its 1.0 days, and is now at the v3.1 specification. "I think that is starting to be very close to feature complete," Moore said. "We've now standardized the fabric concept, which kind of completes the full disaggregation of the memory interface."

He added: "We've got a stack that's ready well ahead of the CPU launches. It will now be up to the hyperscalers to look at it and decide when we want to move to a fully disaggregated data center."

Achieving this will make memory less of a challenge, but will not solve everything. "I go to companies and say, 'what kind of bandwidth capacity do you need?'" Woo said. "On the AI side, they'll say, 'Well, you can't possibly give me what I need, just tell me what you can do, and I'll use it all.' There's this insatiable demand for just more."

The memory system will continue to be a limiter, but if CXL can loosen some of those limits, it will have a huge impact.

"When I've talked to other researchers at different companies doing AI, a lot of times they're experimenting with these algorithms," Woo said. "And the last thing you want to do when experimenting is work within some constraints. These large language models are really just the culmination of thousands and thousands of man-hours of experimentation. In order to enable the next big leap, you also have to provide the resources for that next wave of experimentation to happen." 
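To put Moore's two-terabyte example in numbers, the short sketch below works through the per-channel arithmetic. The per-gigabyte prices are made-up placeholders to show the shape of the trade-off, not real DRAM pricing.

```python
# Worked illustration of Moore's example: the same 2TB spread over more channels
# lets you use smaller, cheaper DIMMs. Prices here are made-up placeholders.
TOTAL_GB = 2048

def per_channel(total_gb: int, channels: int) -> int:
    return total_gb // channels

print(per_channel(TOTAL_GB, 8))    # 256 GB per channel - high-stack, premium DRAM
print(per_channel(TOTAL_GB, 12))   # ~170 GB per channel - smaller, cheaper modules

# Hypothetical price curve: cost per GB rises with module density.
price_per_gb = {170: 3.5, 256: 5.0}              # $ per GB, illustrative only
cost_8ch  = TOTAL_GB * price_per_gb[256]
cost_12ch = TOTAL_GB * price_per_gb[170]
print(f"8-channel: ${cost_8ch:,.0f} vs 12-channel: ${cost_12ch:,.0f}")
```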


The CPU's role in generative AI

As the hype cycle enters its second year, it’s time to focus on efficiency


Sebastian Moss Editor-in-Chief



As large language models and other generative AI systems remain the workload du jour, data centers have adapted to support deployments of tens of thousands of GPUs to train and inference them.

Nvidia remains the leader in the training race, with its high-end GPUs dominating the market. But, as the generative AI market matures, the size of the models and how they are inferenced could be set to change.

"We're in that part of the hype cycle where being able to say 'the model has hundreds of billions of parameters, took months to train, and required a city's worth of power to do it' is seen as actually a good thing right now," Ampere's chief product officer, Jeff Wittich, told DCD. "But we're missing the point of it, which is the efficiency angle. If that's what it took to do it, was that the right way to go about modeling?"

Wittich is one of a number of industry figures who believe the future will not consist purely of these mega-models, but also of countless smaller systems that are highly specialized. "If you have an AI that's helping people to write code, does it need to know the recipe for soufflé?"

That version of tomorrow would prove lucrative for Ampere, which develops high-performance Arm-based CPUs. "Even today, you could run a lot of LLM models on something that's more efficient," he said. "You could run them on CPUs, but people just aren't, because they went and built gigantic training clusters with GPUs, and then use them to train and inference the models."

Part of the problem is the speed the market is currently moving at, with generative AI still a nascent sector with everything to fight for. Nvidia GPUs – if you can get them – perform fantastically and have a deep software library to support rapid development. "It's just 'throw the highest-power stuff we can at it, to be the fastest and the biggest,'" Wittich said. "But that'll be the thing that'll come back to haunt us. It's so power hungry, it's so costly to do this, that when that starts to matter this could be the thing that dooms this, at least in the short term."

GPUs will still be at the heart of training, especially with the larger models, Wittich said, but he questions whether they are truly the most optimal chip for inference. "People are going and building the same stuff for the inferencing phase when they don't need to, because there is a more efficient solution for them to use," he said.

"We've been working with partner Wallaroo.AI on CPU-based inferencing, optimizing the models for it, and then scaling out – and they can get a couple of times more inferencing throughput at the same latency without consuming any more power."

Taking OpenAI's generative speech recognition model Whisper as an example, Ampere claims that its 128-core Altra CPU consumes 3.6 times less power per inference than Nvidia's A10 (of course, the more expensive and power-hungry A100 has better stats than the A10). High memory footprint inferencing will likely remain better on GPUs, but Wittich believes that the majority of models will be more suited to CPUs. The company's AI team has developed the AI-O software library to help companies shift code from GPUs to CPUs.

CPU developers are also slowly borrowing from GPUs. Ampere – as well as Intel, AMD, and others – has integrated ever more AI compute functions into its hardware. "When you look at the design of AmpereOne, we did specific things at the micro-architectural level that improve inference performance," Wittich said, pointing to the company's 2021 acquisition of AI company OnSpecta. "AI is one of these things where stuff that was very specialized years ago eventually becomes general purpose."
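As a flavor of the CPU-only inference discussed in the Whisper example above, the snippet below runs OpenAI's open-source Whisper model on a CPU using the publicly available openai-whisper package. It is a generic illustration, not Ampere's AI-O library or an optimized Altra build, and "meeting.wav" is a placeholder file.

```python
# Minimal CPU-only Whisper inference using the open-source openai-whisper package.
# Generic illustration - not Ampere's AI-O library or an optimized build.
import whisper

model = whisper.load_model("base", device="cpu")   # force CPU execution
result = model.transcribe("meeting.wav")           # placeholder audio file
print(result["text"])
```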

Of course, GPUs and CPUs are not the only game in town, with a number of chip providers developing dedicated inferencing chips that boast competitive inferencing performance and power consumption statistics. Here, Wittich counters with the other issue of industry bubbles: They often pop.

"A lot of the AI inferencing chips that are out there are really good at one type of network or one type of model," he said. "The more specialized you get, usually the better you get at it. But the problem is, you better have guessed correctly, and be pretty confident that the thing that you're really, really good at is the thing that is important a couple of years from now."

If generative AI takes a dramatic turn away from the current model architectures, or the entire industry collapses (or, perhaps, coalesces around just OpenAI), then you could be left holding the bag. When Bitcoin crashed in value, miners were left with thousands of highly specialized ASICs that were useless at any other task. Many of the chips were simply destroyed and sent to landfill. Ethereum miners, on the other hand, mostly relied on GPUs, and several providers, like CoreWeave, have successfully pivoted their business to the current AI wave.

CPUs are inherently general purpose, meaning that a company doesn't have to bet the farm on a specific business model. "We know that overall compute demand is going to grow over the next couple of years, whether it's inferencing, database demand, media workloads, or something else," Wittich said.

He added: "If something is used 80-90 percent of the time, that's what I want on every single one of our CPUs. If it's 20-30 percent of the time, I can create product variations that allow me to incorporate that when it's needed.

"You don't want a bunch of esoteric accelerators in the CPU that are always drawing power and always consuming area."

There are always trade-offs in design, however: "If a block is included, it is stealing area, power, and validation resources."

"You're safe regardless of what happens after you get out of the boom phase." 



Cooling by Design Supplement

Seeking efficiency and simplicity

Inside:

Nature's cool enough > Cooling data centers with natural resources

Air is always there > Liquid cooling is coming, but air cooling will always have a role

Cleaner cooling water > A smart way to make data centers use less water and power



The future of farming

Georgia Butler Reporter

Can data solve the food crunch?

Cows may soon be smarter than you. That is, they will be more like a smart device. Cow monitoring company SmaXtec has developed a way to put a small processor inside dairy cows.

This, it claims, can identify health issues before the cow's own body has acknowledged them, using a combination of on-board sensors and artificial intelligence.

"This technology goes back beyond ten years of experience," explained Stefan Scherer, CEO and co-founder of SmaXtec, whose background is in physics and computer science, not dairy farming. "Originally, it was just a research project."

As they researched, Scherer (along with co-founder Stefan Rosenkranz) realized that the data being collected could enable a more sustainable approach to dairy farming, and solve some of the impending problems facing the industry.

A ravenous planet

Much like the data center industry, the world of farming is not at the forefront of many minds. However, while the two sectors could hardly be more different – the great outdoors versus the virtual world – the rise of AgriTech has brought them together.

Demand is a little easier to predict in the agricultural sector than in the data center industry. The need for food production is directly related to population growth – more people equals more hungry mouths to feed – and, with a fundamentally limited set of resources, using what we have as sustainably as possible is essential.

Population growth has accelerated in the last century. In 1800 there were an estimated 0.9 billion people on the planet, but by 1900 that figure had reached 1.65 billion. Today, it's estimated that number is closer to 8.1 billion. Growth is now expected to slow, but the United Nations predicts that Earth's population will reach 11.2 billion people by 2100.

The Food and Agriculture Organization currently estimates that by 2050 we will need to produce around 60 percent more food to feed the population. But using our current techniques and methods – dramatically over-spraying crops, over-medicating livestock, and pillaging all nutrients from the soil – the toll on the planet would be too severe.


As a result, the next big farming revolution needs to happen, leaving behind the industrial revolution of tractors and mechanized farming and entering the technological and digital age – all of which is reliant on data centers in some form.

"SmaXtec is basically dedicated to making the dairy industry sustainable for the future," claimed Scherer. By "digitizing" livestock and using artificial intelligence (AI) in the cloud, Scherer said SmaXtec can improve the performance and profitability of a dairy farmer (i.e., get more milk out of each cow), reduce methane and other greenhouse gas emissions per liter of produced milk, and improve animal welfare.

The quantified cow

SmaXtec's solution requires cows to swallow what the company calls a "bolus" – a small device that consists of sensors to measure a cow's pH and temperature, an accelerometer (which measures acceleration and motion), and a small processor.

"It sits inside the cow and constantly measures very important body health parameters, including temperature, the amount of water intake, the drinking volume, the activity of the animal, and the contraction of the rumen in the dairy cow," Scherer said. Rumination is a process of regurgitation and re-digestion.

"You could almost envision this as a Fitbit for cows," he said, adding that by constantly measuring those parameters at a high density – short timeframes, with high robustness and high accuracy – SmaXtec can make assessments about potential diseases that are about to break out.
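To illustrate the kind of time-series check such a system might run, here is a hedged Python sketch that keeps a rolling baseline of a cow's temperature readings and flags sustained deviations. The thresholds, sampling rate, and logic are invented for illustration; SmaXtec's production system uses its own time-series machine learning models.

```python
# Illustrative rolling-baseline health check on bolus temperature readings.
# Thresholds and data are invented; this is not SmaXtec's algorithm.
from collections import deque
from statistics import mean, stdev

class HealthMonitor:
    def __init__(self, window=96):               # e.g. 24h of 15-minute samples
        self.readings = deque(maxlen=window)

    def add(self, temp_c: float) -> str:
        self.readings.append(temp_c)
        if len(self.readings) < 20:
            return "building baseline"
        baseline, spread = mean(self.readings), stdev(self.readings)
        # Flag a reading well above the animal's own recent baseline.
        if temp_c > baseline + 3 * max(spread, 0.1):
            return "alert: possible fever - notify farmer"
        return "ok"

monitor = HealthMonitor()
for temp in [38.6] * 30 + [39.9]:                 # stable readings, then a sharp rise
    status = monitor.add(temp)
print(status)                                     # alert fires on the final reading
```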


These sensors gather a "tremendous" amount of data that is processed inside the cow on an extremely small, custom-made processor, before being transferred to a base station on the farm via LoRaWAN – a low-power, wide-area networking protocol.

From the base station, the data is sent to the cloud via a SIM card. All of the artificial intelligence training and inference is done in the cloud – using a time-series machine learning algorithm – and farmers can check on the well-being of their cows via a platform on their smartphone.

LoRaWAN is more suited to the task than other methods of connectivity, as it is ideal for long-range, low-data-rate, and low-power applications. According to Scherer, the solution doesn't need the super-low latency it could get if the AI was done at the Edge. "With respect to the urgency of the disease or the progression speed of the disease, it's basically real-time," he said.

Ultimately, healthier cows produce more milk, and SmaXtec claims to be able to reduce the need for antibiotics at dairy farms by 70 percent. The solution can also reduce methane per liter of milk by around 15 percent – methane which, through the biogenic carbon cycle, converts into CO2 after around 12 years. The company predicts that savings will equal around 800kg of CO2 per cow, per year, based on an average cow in Austria. Scherer notes that the reduction of methane per liter is even greater in the US, as cows there are more productive.

However, even if the product can live up to its aims, it is still far from the sustainability savings of shifting to milk substitutes such as almonds and oats, which produce an order of magnitude fewer emissions.

Incentivizing innovation

In the farming industry, there is an overarching push to increase the use of technology in order to improve efficiency in the sector. Cost savings have led this effort, but sustainability is also playing a part.

The US Department of Agriculture (USDA) has an "Agriculture Innovation Agenda," which aims to help "increase US agricultural production by 40 percent" and reduce the "environmental footprint of US agriculture in half by 2050," in part by introducing the use of "cutting-edge technologies."


While an active initiative, there has been little update on progress since 2020, and investment in public agricultural research and development has fallen by a third in the past two decades. According to research published by the USDA Economic Research Service, "the United States is falling behind other major countries."

These advancements aren't limited to the US, but the nation has been one of the biggest backers of AgriTech globally. At COP26, the US teamed up with the UAE to invest $4bn into AgriTech, doubling that figure the following year as part of a shared Agriculture Innovation Mission for Climate. By 2025, the two nations plan to invest $8bn in nanotechnologies, biotechnologies, robotics, and AI, said Tom Vilsack, US Secretary of Agriculture, at AIM's first ministerial meeting in Dubai in 2022.

The UK has also launched several Environmental Land Management Schemes, including a 2021 program that rewards farmers for farming in a more environmentally friendly and sustainable way. While this incentive was not necessarily directly linked to the technological takeover of the farming sector, the end result is still to the benefit of the planet.

As in every industry, there is a certain hesitation and reluctance to welcome high-tech solutions with open arms. Some of that can be put down to fear of change, but often the main concern centers around technology taking jobs away from people.

"We always get asked this question," sighed Sarra Mander, CMO of farming robotics business Small Robot Company. "But in arable farms, to some extent, that shift already happened when tractors replaced horses. So it's not about that, really. What it would help with is work-life balance for the farmers who are there."

This is the argument that often gets made: AI won't take over, it will only supplement. In some instances this has proved true, but in many others it has led to job losses and a reduction in quality.



The robots take over

Small Robot Company is known for its Tom robot (presumably named after the company's CTO, Tom Walters). Tom – the robot – distantly recalls memories of Doctor Who's dog K9. The device wheels itself up and down fields, capturing images and mapping out the land. The data is then taken from Tom's SSD and uploaded to the cloud, where an AI identifies the different plants and weeds, and provides a customized fertilizer and herbicide plan for the crops.

"Essentially, we are reimagining farming with robotics and artificial intelligence, to farm at a more precise level – what we call per-plant farming – giving each plant what it needs, with no waste," explained Mander.

In the case of Tom, it won't replace a farmer, as someone will still need to head out onto the land to distribute the chemicals, but it will reduce the amount of chemicals needed, and enable them to be applied only where necessary. The chemicals themselves are expensive, so there is a financial incentive to stop wasting what Mander claimed is currently about 90 percent of them. The chemicals are also bad for the environment, particularly when synthetically made: damage is done both in the process of creating the fertilizer and herbicide in the first place, and then through harm to the soil, reducing long-term productivity.

As is, the amount of data gathered by Tom can be staggering. The device has eight cameras continually snapping images and, in the future, this could be layered with other kinds of sensor data, creating terabytes of data.

"One field we did in the early trials, I think that was something like 13 million wheat plants and half a million weeds," Mander said. "So the permutations of keeping that amount of data on every single farm and every single field that you're scanning multiple times in a season very quickly rack up the costs if you're keeping that all in a data center."

This is a common motivating factor for shifting AI processing to the cloud, which can easily scale. However, there is the potential for Small Robot Co to do some AI inference at the Edge, and thus remove even more of the human element of the process.

On-board processing

The company is currently engaging in early trials where processing will be conducted on the robot itself, enabling the immediate removal of weeds as the robot crosses them.

"The proof of concept with the weeding robot took the field-scale maps that we get using the survey bots, then used Edge AI to take the robot to a specific point in the field and target individual weeds," explained CTO Tom Walters.

"The single camera on the robot is able to know where, within its field of view, there are weeds that can then be zapped using the 'weed zapping arm,'" Walters said. "That was done using an Edge object detection model, running on Nvidia cores onboard the robot, to do the very fine-grained, last-centimeter geolocation of weeds."

The need to do the processing on the robot itself, instead of at an Edge deployment on site, comes from the issue of connectivity (or lack thereof). "The issue we have is that we want to provide services to farms at scale, and so it makes sense to be doing as much processing as fast as possible. To turn things around quickly, there's a huge amount of data that we have to get back from the fields to the data center and then to the cloud for processing. But we are doing this in the middle of nowhere, and there isn't always the connectivity," said Walters.

As a result, it makes more sense to do as much of the processing on the robots as possible, as this enables a much faster turnaround of services. But, as Walters points out, that data is a real asset, and keeping it long-term could enable the company to train new models and provide new insights. "We have all these nice sets of data with multiple variances, and it would be great to be able to go back to the raw data and see new things, train new models, and add new capabilities," he said.
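As a rough illustration of the pattern Walters describes – an object-detection model running on the robot's onboard GPU – the snippet below uses the off-the-shelf Ultralytics YOLO API. It is a generic sketch, not Small Robot Company's actual stack: the "weeds.pt" weights file and the camera index are placeholders.

```python
# Generic edge object-detection loop, illustrative of on-robot weed spotting.
# Not Small Robot Company's stack; 'weeds.pt' is a placeholder model file.
from ultralytics import YOLO

model = YOLO("weeds.pt")               # custom-trained detector (hypothetical weights)

# Run inference on frames from the robot's camera (index 0), on the onboard GPU.
for result in model.predict(source=0, stream=True, device=0):
    for box in result.boxes:
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2   # pixel center of the detected weed
        # A real system would convert pixel coordinates to field coordinates
        # and steer the 'weed zapping arm' to (cx, cy).
        print(f"weed detected at pixel ({cx:.0f}, {cy:.0f}), conf={float(box.conf):.2f}")
```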

The financial side of life

All of the hurdles ultimately come down to money, and the reluctance (or the inability) to spend it. Farmers could invest in on-site Edge data centers built for AI applications, such as LiquidStack's Micro or Mega Modular solutions, which are designed with liquid cooling and high-density computing in mind, but the CapEx may make this unlikely. Even when using the cloud, the costs ramp up as the data sets grow and more storage and processing power are needed.

Broadband instability in rural locations is not an issue that can be solved by one farmer's bank account – that requires government spending. Investment into telecoms and Internet infrastructure is currently focused on highly populated areas such as cities or major towns, not quiet hamlets or mountaintop settlements. And certainly not on farms where there may only be a handful of people for miles.

Acknowledging the importance of 5G, the US government's National Telecommunications and Information Administration said that 5G "will be a primary driver of our Nation's prosperity and security in the 21st Century." However, actually getting full coverage of the US is much harder in practice and, while moves are being made with satellite-provided 5G, this is still at a relatively early stage.

Despite these financial hurdles, there remains hope for the US farming sector, and for the AgriTech companies, that their solutions will help to increase food production while reducing carbon emissions. The US Farm Bill was due to expire by September 2023, but was granted a year-long extension "to make sure we do it right," opening the floor up for discussion. May 2023 saw a subcommittee hearing on rural broadband, while November 2023 saw a full committee hearing on 'Innovation in American Agriculture: Leveraging Technology and Artificial Intelligence.'

"American Agriculture has always been at the forefront of innovation, and artificial intelligence has the potential to revolutionize the way we grow, harvest, and distribute our crops," Senator Debbie Stabenow said during the November hearing. "In this rapidly evolving landscape, it is imperative that we strike a balance between harnessing the benefits that AI offers, while addressing concerns such as data privacy, workforce implications, and equitable access to technological innovations."

Stabenow then confessed that the statement had been written by an AI, to illustrate its power and potential. She added, in words either written by herself or an aide: "This just shows us how real this technology and its implications can be. It is opening new pathways to address the climate crisis, increase production, lower input costs, and automate planting and harvesting.

"Tractors that scan for weeds and apply targeted herbicides, harvesting machines that use AI to determine ripeness in real time and autonomously pick crops, and systems that integrate satellite and soil data to more efficiently apply fertilizer are not in the realm of science fiction. These technologies are being put to use on farms across the country today." 






Looking good, feeling safe – data center security by design

Dan Swinhoe Senior Editor

Balancing security and aesthetics in data center design

Data centers are capital-intensive developments, often costing hundreds of millions of dollars.

Many developers will say that means there's little left over to beautify buildings which, in any case, generally look to stay out of the public gaze. Data center naysayers will argue, however, that there must surely be room to spend a tiny percentage of that huge outlay on making them less dystopian in appearance, if nearby residents have to look at these buildings day in, day out, for the next 20-plus years.

As awareness of, and opposition to, data centers grows in many key markets, local communities and regulators are demanding developers make more of an effort to soften the rough edges of data centers, and make them look less like windowless prisons, in exchange for granting planning permission.

But can data center developers balance these increasing aesthetic demands with the security requirements customers need?

Aesthetic demands increase

While some data centers can be very aesthetically pleasing, these are often retrofits of historic buildings in major metros. 60 Hudson in New York City may be among the most iconic examples. Most older sites are merely retrofits of bland offices or warehouses, and many older purpose-built facilities show little aesthetic flair, being simply large grey windowless concrete boxes surrounded by heavy fencing.

This was accepted for years when data centers were few and far between but, as they increase in number across many markets, they have become a blight for some communities.

"When this business started, we used existing logistic [facilities] and old office buildings that we converted into data centers," says Digital Realty's chief data center technology and engineering officer, Lex Coors.


"Then we started to build two-floor, three-floor, four-floor, or five-floor data center blocks. And then we got pushback, with the authorities saying, 'You're just greying the whole area with blocks. Can you not make them look a little better?'"

This has led to the growing trend of adding glass facades to data centers to make them more office-like in appearance, though this is most common in areas where local planning rules require it. As data centers continue to proliferate, the demands for visually appealing facilities are increasing. Operators are choosing a variety of colors for their buildings, including green living walls (also known as vertical gardens), or even painting murals on the sides of buildings. Many are starting to add LEDs that change color depending on the time of year, to add a night-time aesthetic.

"It's a conversation and a cooperation with the local authorities. We listen to the issue that the local authority wants to solve, and we explain to them the issue with some of their requirements," says Coors. "But we will not compromise our security, regardless of aesthetics."

David Watkins, solutions director of European operator Virtus, notes that regulations have become tighter, but says operators still benefit from the security that obscurity can provide in many locations. "In an ideal world, we sit in the background and do the critical stuff that we do without drawing attention," he says. "Most people that come past our campus here in Stockley Park probably don't even know that they are data centers.

"But that's a key driver for us moving forward; to make sure we meet our commitments in terms of planning, but also to make our buildings look secure and professional, rather than secure and overbearing."


Mary Hart, practice leader, mission critical, at architecture and design firm HKS, notes that hiding in plain sight is taking a back seat to company promotion. "Branding is a big thing now. They're not necessarily trying to hide; they still want to convey a sense of security, but they are also conveying a brand and selling a product, especially the colos," she says. "You can see if they've spent money trying to make it a little bit better. But we also have some clients where their model is not to spend any more money than they have to, because it's all going into the infrastructure."

And as facilities get taller, the effort to make data centers look more like offices can be an opportunity to push back against the dirge of big grey boxes. "The big squat buildings are hard to upgrade aesthetically; it's always going to be a big, flat, warehouse-looking building," says Hart. "You can dress up the facade quite a bit, but it's hard to break up those really long linear facades into something that is more aesthetically attractive without spending a ton of money to do so."

What's in there?

Compliance and regulation will always play a key part in data center security. Fences, gates, guardhouses, and CCTV will always be required, as industry standards demand. But the more 'in your face' a company is about security, the more it risks becoming a lure.

"I've looked at sites in the past where the prison analogy is a good one; they've had barbed wire or spikes on the top of the fence, and it all looks very secure," says Virtus' Watkins. "But the downside of that is you're attracting attention. People will ask what's in there, because we do not publicize what we do."


The first thing to note when designing a data center is where it is going to be and what types of buildings it will be neighboring. "It really starts right at the beginning when we're looking at sites," says Watkins. "Even where we put them is a security consideration; we will do a very extensive due diligence exercise involving surveys, and local searches around what businesses are nearby."

It's important to make your facility blend in with what's around you. A glass-covered building in an office district makes sense. Likewise, a welcoming facade sits well in a residential area. But looking super secure in a residential spot, or too inviting in an industrial park, can be incongruous and attract interest. "It would be weird to see this exotic building landmark in an industrial area, because we would be the only ones, and that's what you don't want," says Digital's Coors.

DataBank's CISO Mark Houpt notes that, generally, his company tries to avoid making facilities look too much like warehouses, as that may invite opportunistic thieves. Instead, the company tries to make them more office-like. "Data centers can be large boxy facilities that look like warehouses," he says. "But that, in itself, can attract people, because they may think that there's material inside that they can break and enter and walk away with. In Plano, for example, the facility looks like a warehouse on the back end of it, but the front we've made look like an office building with big windows."

Telehouse South – Telehouse's latest facility in London – is housed in a former Thomson Reuters building along the Thames riverside in east London's Docklands.



The company is confident enough in its security procedures that it is actually opening up a public pedestrian path along the riverside at the back of its property. Paul Lewis, Telehouse SVP of technical services, told DCD on a recent tour of the facility that the company is deploying smart CCTV that can detect, count, and monitor people loitering at the edge of the facility, but such physical attacks are incredibly rare. Rather than targeting data, the threat the company is more likely to see is opportunistic intruders trying to steal equipment or sellable materials such as copper.

Virtus' Watkins also notes that increasingly smart CCTV cameras mean fewer are needed around the site as they can, like Telehouse's, detect people hanging around close to the campus, which also has a public waterside path on one side. Likewise, tremble detection on fences can send an alert when people try to climb them.

At Virtus' Stockley Park campus in West London, the company acquired four former distribution warehouses. The facilities are surrounded by anti-climb fences that don't look as imposing as a barbed or spike-topped fence.

"What we did was to try and blend into the area," he says. "The outer fence on a couple of the buildings is like a regular garden fence, only a meter and a bit high. So it's not going to keep anybody out if they're determined to come over the top.

"So what we've done is brought the main security fence, which is three meters high, to only about a meter away from the building, and it's colored to blend in with the landscape at that end."

He continues: "On one end of that plant gantry, we've put a living wall. As well as providing security, because it is a barrier, it's a good environmental feature that looks nicer than a big tall fence that's just stuck on there on the side of the building."

He said the company is looking at using more living walls as aesthetic security measures at other locations going forward.

In the Plano area of Dallas, Texas, DataBank's facility has a small garden feature waterfall out front that is actually a security mechanism. "It's right in front of the door, about 15 feet off of the door, so that a vehicle cannot ram that," says Houpt. "It makes it look like more of an aesthetic piece rather than a security one."

Using the likes of waterfalls or living walls as environmental security by design is known as crime prevention through environmental design (CPTED). The concept aims to deter and inconvenience criminals before they act; this can be through encouraging more people into an area, landscaping to control the flow of people and reduce opportunities for criminals, and preventing dark areas or easy hiding places. There is an ISO standard for CPTED – ISO 22341:2021 – which was introduced in 2021.

An example would be turning bioswales – channels designed to concentrate and convey stormwater run-off – into car stoppers. "I've worked on a lot of data center projects where we worked with the landscape to add things like a bioswale that could include a wheel stop – a concrete wall or trench – that would stop anyone who tried to drive through towards the front door," says HKS' Hart.

Other examples might include large planters or bike racks in front of facilities that are cemented into the ground – instead of simple bollards – to prevent buildings from being rammed.

"In one of our facilities in Kansas City, instead of using metal poles for holding up the fence, between poles we use nice decorative brick as the stanchions, hiding the steel pole structure that the fencing is attached to. It looks more like an office

building,” says DataBank’s Houpt. He notes that the company has also put electric vehicle charging stations up close to its front doors; the EV points are stout enough to act as protection themselves, but the stations result in more parked cars, which act as a further deterrent. “Another thing that we utilized to prevent people from ramming through is that our driveways snake and won't come in straight from the street, so there's no way for you to gain speed.” Likewise, one-way traffic spike units are put into the ground in a way that they can be raised when needed, rather than continually on show. DataBank also uses higher curbs than normal – 8” instead of 6” – as they are more difficult for a car to get over.

Neighborhood watch

Security is table stakes for data center operators and developers – paying extra to adhere to compliance regulations and customer demands is just seen as a cost of doing business. But increasingly, planning regulators are demanding more concessions around aesthetics, meaning more aesthetic facilities will become just another accepted cost going forward.

Where it isn't mandated, though, a friendly neighborhood data center can still bring security benefits. Happy residents make for much better neighbors than resentful ones.

"Being good neighbors with the people that are around us is a significant security function. If you offend or bother them, they're going to be more likely to cause problems, or not tell you if something's going on," says DataBank's Houpt. "We've had some of our neighbors tell us when people are sneaking along our fences, because we are good neighbors to the folks that are around us. If we're not, we run the risk of having complaints; of not being notified when there are security events going on; and of being accused of drawing in security risks."

Adding lights and security on the perimeter of a facility can help provide extra reassurance to locals, if done well. This can, in turn, make people feel safer, which can bring more people to an area and reduce opportunistic crime. "In certain areas, we provide an additional, at least, 'perception' of security. It starts with perception. If people feel safer, they will act as if it is safer and, eventually, things will become safer, because more people will be on the street when it's later in the evening," says Coors.

An example of an area becoming safer – though many would say gentrified – is east London, where Digital (previously Interxion) has operated a data center on the site of the Old Truman Brewery since around 2000.



"When I came there in 1999, people asked me if I was nuts – it was a very dangerous area," he says. Today, that part of east London is the hipster area of the city that rarely sleeps. "It is the same in La Courneuve. La Courneuve has not always been the safest area of Paris; with our investments there, we now see people walking in the park with their children."

In Minneapolis, there were a number of break-ins in a residential area near a DataBank data center. "We were able to share some of our CCTV with the police and with the neighborhood association, which allowed them to see some possible suspects. In Kansas City, we were able to share our CCTV after someone hit a residential fiber optic box. They were able to find their suspect and figure out who did the hit and run. We earned a good reputation there by working with people."

DataBank's Atlanta 1 facility is located on a college campus. The square has a video board that overlooks a common area where students can sit outside. "That has made it an attractive area, and what that does, from a security perspective, is attract the right kind of people. Without that board there would be just a dark area between two buildings," says Houpt. "Now people actually want to go there. And when something's going on that shouldn't be, people will tell our guards or call the campus security."

Visibility of automation

While security guards may not always be the friendliest people, they are definitely more relatable than robots. And, as data center operations become more automated, data center firms need to consider the optics of robots versus the potential security benefits they can bring.

The likes of Novva, Digital Realty, Scala, Oracle, NTT, and Fujitsu are deploying free-roaming robots internally at some of their data centers for various tasks, including observation and routine security checks. These machines can provide quick responses at all hours and flag potential problems to human operators on-site.

So far, only Novva has said it intends to bring its Boston Dynamics robot dogs outside for possible security patrolling and incident response. Switch previously planned to commercialize campus security robots, but quietly dropped the project.

Novva CEO Wes Swenson told DCD that the machines will only be for surveillance and won't be performing any kind of K9 heroics.


"It's not going to interfere. It's not going to attack, it's not going to defend, it's not going to do any of that stuff. And we don't really want it to. If it's something we need to call the police for, we would do that."

Likewise, Novva is one of the first data center firms to use drones for security purposes. The company has partnered with Nightingale Security to deploy the latter's Blackbird unmanned aerial vehicles (UAVs) at Novva's flagship campus in West Jordan, Utah.

While the drones offer 4K cameras, LiDAR, and infrared during their preprogrammed missions, operators will have to be careful about their use. "If your data center is in a residential, super metropolitan area, it'd be a little bit risky to be sending a drone up in tight quarters," Swenson told DCD. "People are going to get sensitive about it, and you're gonna get calls; they're just too dystopian, and people might feel like they're being spied on."

Digital Realty hasn't currently deployed any drones, but Coors can see the value in deploying them at a large campus in an industrial area away from local communities. At deployments in or near residential areas, he's less convinced.

"Drones create noise," notes Coors. "And so wherever we are, we are always constrained by noise permits. We do not like to add additional noise to a community, and we are not allowed to. Drones also create an insecure feeling for people who are walking on the streets. But, if people started accepting them and saying 'we believe that our world will be made safer by drones,' we would definitely look at this."

Virtus' Watkins agrees that drones will be difficult in many deployments, but is more positive on robots. "I think robotics is something that you might see more of. Part of our [human] security guards' daily duties is to do external perimeter walks," he says. "I see no reason why that might not become the remit of some sort of a robot." 

Securing in plain sight

For data centers in shared spaces, sometimes turning data halls into display features is a way to make them secure. Keeping compute in a secure but openly visible space means it's harder to do anything unnoticed. It may also help some engineers be more mindful about keeping the halls tidy and cabling neat.

"Some people keep data centers behind closed walls and keep them hidden and private. Others use them as features," says Nick Ewing, managing director at UK modular data center provider EfficiencyIT. "The best ones are the ones where the customers like to make a feature of the environment and use it as a bit of a display."

An example he cites is the Wellcome Sanger Institute in Cambridge, where they have four data center quadrants. Each quadrant is about 100 racks; they have man traps at either end of the data center corridor. But one end of the main quadrant is completely full of glass.

"They have an LED display, which is talking about how many cores of compute, how much storage they've got, how many genomic sequences they've sequenced that day," he says. "They've used it as a feature and used it to their advantage."

Elsewhere in the UK, a Tameside Council data center in an old Victorian bath house is also adopting a similar approach. Sudlows helped the council deploy a 150 sqm (1,600 sq ft) Tier III quality data hall in the middle of the renovated space.


“We went to quite a lot of trouble to make sure that the data center itself has got windows and glass walls with colored lights,” Tim Rainey, assistant director of Digital Tameside, a division of Tameside Metropolitan Borough Council, told DCD in 2021. “As you wander around the annex, in the middle is this data center which you can see into, with color-changing lights in the walls so that it looks like an attractive feature. It just fits in with the history of the building, but also fits in with the technology side of the building as well.” Likewise, one of the two data centers at UK football team Tottenham Hotspur’s new stadium in London is openly visible for fans to see. The facility, located on the North-West Corner of Level 1, powers most of the stadium’s on-site operations. In Ohio, the Cleveland Clinic deployed an IBM quantum computer in full view of people. Rather than locate the nine-foot sealed cube, made of half-inch thick borosilicate glass, in its existing data center on the edge of the city, the medical organization put the IBM System One on display at the Lerner Research Institute on its main campus in the city. Jerry Chow, IBM fellow and director of quantum infrastructure research, told DCD. “The Clinic wants the system to be visible to both researchers and patients as a symbol of their commitment to innovation for patient care, so it made sense to install the system on-site.”





Staffing levels: Are data centers at risk of unnecessary outages? Lessons from a Microsoft data center’s mistakes

Graham Jarvis Contributor

With increasing data center automation, it's only natural for clients to want assurance that their data will be available as close to 100 percent of the time as possible, and to ask whether enough data center staff are available to achieve a high level of uptime. They also want to know that, when a potential outage occurs, there are enough technicians on duty or available to restore services as soon as possible. Microsoft suffered an outage on 30 August 2023 in its Australia East region in Sydney, lasting 46 hours. The company says it began at 10:30 UTC that day. Customers experienced issues with accessing or using Azure, Microsoft 365, and Power Platform services. It was triggered by a utility power sag at 08:41 UTC and impacted one of the three Availability Zones of the region.


Microsoft explains: “This power sag tripped a subset of the cooling system chiller units offline and, while working to restore cooling, temperatures in the data center increased to levels above operational thresholds. We powered down a small subset of selected compute and storage scale units, both to lower temperatures and to prevent damage to hardware.” Despite this, the vast majority of services were recovered by 22:40 UTC, but they weren’t able to complete full mitigation until 20:00 UTC on 3 September 2023. Microsoft says this was because some services experienced a prolonged impact, “predominantly as a result of dependencies on recovering subsets of Storage, SQL Database, and/or Cosmos DB services.”

Voltage sag cause The utility voltage sag was caused by a lightning strike on electrical infrastructure some 18 miles from the impacted Availability Zone of the Australia East region. Microsoft said: “The voltage sag caused cooling system chillers for multiple


data centers to shut down. While some chillers automatically restarted, 13 failed to restart and required manual intervention. To do so, the onsite team accessed the data center rooftop facilities, where the chillers are located, and proceeded to sequentially restart chillers moving from one data center to the next. “By the time the team reached the final five chillers requiring a manual restart, the water inside the pump system for these chillers (chilled water loop) had reached temperatures that were too high to allow them to be restarted. In this scenario, the restart is inhibited by a self-protection mechanism that acts to prevent damage to the chiller that would occur by processing water at the elevated temperatures. The five chillers that could not be restarted supported cooling for the two adjacent data halls which were impacted in this incident.”

What was the impact? Microsoft says the two impacted data halls require at least four chillers to be operational. The cooling capacity before the voltage sag consisted of seven


chillers, with five of them in operation and two on standby. The company says that some networking, compute, and storage infrastructure began to shut down automatically as data hall temperatures increased. This temperature increase impacted service availability. However, the onsite data center team had to begin a remote shutdown of any remaining networking, compute, and storage infrastructure at 11:34 UTC to protect data durability, infrastructure health, and to address the thermal runaway.

Subsequently, the chilled water loop was permitted to return to a safe temperature, allowing the chillers to be restarted. It nevertheless led to a further infrastructure shutdown, and a further reduction in service availability for this Availability Zone. Yet the chillers were eventually and successfully brought back online at 12:12 UTC, and the data hall temperatures returned to operational thresholds by 13:30 UTC. This culminated in power being restored to the affected infrastructure, and a phased process to bring the infrastructure



back online began. Microsoft adds that this permitted its team to restore all power to infrastructure by 15:10 UTC, and once the power was restored all compute scale units were returned to operation. This allowed Azure services to recover. However, some services still experienced issues with coming back online. The lightning strike was only the initial cause of the issue - it wasn't to blame for the length and severity of the outage. Microsoft admitted that staffing levels were an issue.

Staffing review

Amongst the many mitigations, Microsoft says it increased its technician staffing levels at the data center "to be prepared to execute manual restart procedures of our chillers prior to the change to the Chiller Management System to prevent restart failures."

"You've got to run out onto the roof of the building to go and manually reset the chiller, and you're on the clock," explains Michael Hughes, VP of APAC datacenter operations at Microsoft. With chillers impacted and temperatures rising, staff had to scramble across the site to try to reset the chillers. They didn't quite get to the pod in time, leading to the thermal runaway.

The answer, in terms of optimization, is to recover cooling first at the highest-load data centers - those with the highest thermal load and the highest number of racks operating. Microsoft now claims that staffing levels at the time "would have been sufficient to prevent impact if a 'load based' chiller restart sequence had been followed, which we have since implemented." It has tweaked its system to focus on the highest load facilities.

The auto-restart should have happened, and Hughes argues that there shouldn't have had to be any manual intervention. This has now been fixed, he says. He believes that "you never want to deploy humans to fix problems if you get software to do it for you." This led to an update of the chiller management system to stop the incident from occurring again.
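The "load based" sequence Microsoft describes is, at heart, a sorting problem: recover cooling first wherever the thermal load is highest. A minimal sketch of the idea, with hall names and figures invented for the example:

```python
# Illustrative only: order manual chiller restarts by thermal load, so the
# halls closest to thermal runaway get cooling back first. Names and figures
# are invented for the example, not taken from the incident.
halls = [
    {"name": "Hall A", "thermal_load_kw": 1800, "chillers_down": 2},
    {"name": "Hall B", "thermal_load_kw": 3200, "chillers_down": 3},
    {"name": "Hall C", "thermal_load_kw": 900,  "chillers_down": 1},
]

# Highest thermal load first - the "load based" restart sequence.
restart_order = sorted(halls, key=lambda h: h["thermal_load_kw"], reverse=True)

for hall in restart_order:
    print(f"Restart {hall['chillers_down']} chiller(s) serving {hall['name']} "
          f"({hall['thermal_load_kw']} kW of load)")
```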

Industry issue and risk Ron Davis, VP of digital infrastructure operations at the Uptime Institute, adds that it’s important to point out that these issues and the risks associated with them exist beyond the Microsoft event.

“I have been involved in this sort of incident, when a power event occurred, and redundant equipment failed to rotate in, and the chilled water temperature quickly increased to a level that prohibited any associated chiller(s) from starting,” he says before adding: “This happens. And it can potentially happen to any organization. Data centers operations are critical. From a facilities standpoint, uptime and availability is a primary mission for data centers, to keep them up and running.” Then there is the issue of why the industry is experiencing a staffing shortage. He says the industry is maturing from an equipment, systems, and infrastructure perspective. Even remote monitoring and data center automation are getting better. Yet there is still a heavy reliance on the presence and activities of critical operating technicians - especially during an emergency response as outlined in the Microsoft case. Davis adds that during Tier assessments, “we weigh staffing and organization quite highly.”

Optimal staffing levels As for whether there were sufficient staff onsite during the Microsoft outage, and what should be the optimal number of staff present, John Booth, managing director of Carbon3IT and chair of the Energy Efficiency Group of the Data Centre Alliance, says it very much depends on the design and scale of the data center, as well as on the level of automation for monitoring and maintenance. Data centers are also often reliant on outsourced personnel for specific maintenance and emergency tasks and offer a four-hour response. Beyond this, he suggests there is a need for more information to determine whether seven staff were sufficient but admits that three members of staff are usually the norm for a night shift, “with perhaps more during the day depending on the rate of churn of equipment.” Davis adds that there is no reliable rule of thumb because each and every organization and site are different. However, there are generally accepted staff calculation techniques that can determine the right staffing levels for a particular data center site. As for the Microsoft incident, he’d need to formally do the calculations to decide whether three or seven technicians were sufficient. It’s otherwise just a guess. He adds: “I am sure Microsoft has gone through this; any well-developed

operating programs must perform these calculations. This is something we look for during our assessments: have they done the staff calculations that are necessary? Some of the factors to include in the calculations are shift presence requirements – what is the number of technicians required to be on-site at all times, in order to do system checks and perform emergency response? “Another key consideration is site equipment, systems, and infrastructure: what maintenance hours are required for associated planned, corrective, and other maintenance? Any staffing calculation considers all of these factors and more, including in-house resources and contractors as well.”
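As a rough illustration of the kind of calculation Davis describes (not a formula from Uptime or any operator), shift coverage and maintenance workload might be combined like this, with every figure invented:

```python
# Back-of-the-envelope staffing sketch; all inputs are invented examples.
import math

technicians_per_shift = 3         # minimum presence for checks and emergency response
shifts_per_day = 3                # 24/7 coverage in eight-hour shifts
coverage_factor = 1.6             # allowance for leave, training, and sickness

annual_maintenance_hours = 6000   # planned and corrective maintenance across the site
productive_hours_per_tech = 1600  # maintenance hours one technician can deliver a year

shift_headcount = math.ceil(technicians_per_shift * shifts_per_day * coverage_factor)
maintenance_headcount = math.ceil(annual_maintenance_hours / productive_hours_per_tech)

print(f"Shift presence requirement: {shift_headcount} technicians")
print(f"Maintenance workload:       {maintenance_headcount} technicians")
print(f"Indicative total:           {shift_headcount + maintenance_headcount}")
```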

Microsoft: Advocate of EOPs

"From what I know of Microsoft, they are a big advocate for emergency operating procedures and correlating operational drills," Davis says. "The properly scripted EOP, used during the performance of a well-developed operational drill, may have supported the staff in this effort, and/or perhaps identified the need for more staffing in the event of such an incident."

Microsoft had emergency operating procedures (EOPs) in place. It has learned from this incident and amended its EOPs. EOPs are where organizations need to start, and they should examine testing and drill scenarios. A data center's best protection is, says Davis, a significant EOP library, based on potential incidents that can occur. He believes that the Microsoft team did their best and suggests that they deserve all the support available, as these situations are very stressful. This support should come from all the training, tools, and documentation an organization can provide them.

As to whether staffing levels could contribute to outages, that's entirely possible, but that might not have been the sole cause in Microsoft's case, as Booth believes there was a basic design flaw. He thinks an electrical power sag should have triggered backup generators to provide power to all services to prevent the cooling systems from failing. There should therefore be an improved integrated systems test, which is where you test every system under a range of external emergency events. The test program should include the failure of the chillers and any applicable recovery procedures. The team running these tests will also need to be sufficiently staffed.




Greening software from the grassroots

Peter Judge Executive Editor

Smart coders can see the system is broken

Coders are going to have to get political if the industry doesn't start to take green software seriously.

The data center sector is tightly focused on reducing emissions by switching to renewable power, and also by redesigning facilities so that cooling systems use less energy. But what happens to the energy that reaches the server racks is rarely scrutinized. Despite this, everyone knows that servers are not used efficiently. Many idle servers keep running, or run code in the background that is not needed. Programs are not written efficiently: they cycle round and round repeated sets of instructions, they fire up cloud resources that then sit idle, they deliver features that are not needed, and different modules will carry out overly complex handshakes, within systems or across networks. Green software could massively reduce emissions by operating more efficiently. It also has the potential to drastically improve the efficiency of the facility, by interactively improving the operation of the entire system. “Self-aware” or “energy-aware” software could even alter its own operations. By referring to real-time figures for energy use and the carbon intensity of the electricity supply, programs could interrupt their own operation and shift themselves to more efficient servers, or to times and places where the electricity is less carbon-intense. Despite this, green software is still mostly overlooked.
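What might that "energy-aware" behavior look like in practice? A minimal sketch, assuming a deferrable batch job and some external feed of grid carbon intensity (the threshold and the hard-coded intensity figure below are invented placeholders, not a real service):

```python
# Illustrative sketch of carbon-aware scheduling: defer a deferrable job
# until the grid is cleaner. The intensity source and threshold are
# invented placeholders, not a real API.
import time

CARBON_THRESHOLD_G_PER_KWH = 200.0  # only run when intensity is below this


def current_grid_intensity() -> float:
    """Placeholder: a real implementation would query a grid-intensity service."""
    return 310.0  # gCO2 per kWh, hard-coded for the example


def run_when_clean(job, check_every_s: float = 900, max_waits: int = 4) -> None:
    """Wait for a cleaner grid, then run the job; the work is delayed, never dropped."""
    for _ in range(max_waits):
        if current_grid_intensity() <= CARBON_THRESHOLD_G_PER_KWH:
            break
        time.sleep(check_every_s)
    job()


if __name__ == "__main__":
    run_when_clean(lambda: print("running nightly batch job"),
                   check_every_s=0.1, max_waits=2)
```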

Like previous green revolutions, green software is at first being dismissed as impractical, or counter to economic realities. But green coders can see the potential. At a gathering in Berlin in November 2023, convened by the Sustainable Digital Infrastructure Alliance (SDIA), they called for transparency, for regulations, and for action. Max Schulze, founder of the SDIA, told programmers it is up to them: "We are the ones we've been waiting for," he said. "I would like to ask you to understand what you can do to connect with people to try it out and see what you can do."

Impact

How much impact does software have? "There is a scaling effect," said Schulze. "If you have a web application that has 10,000 active users, these 10,000 active users might generate hundreds of thousands of function calls. Those calls use energy. They need server capacity, they need storage capacity. I think we all know that there is an environmental cost."

There's more, he said: "Think about the backup systems, the monitoring systems, the logging, all these things that are going on for most applications."

The thing about software is that a program written by one person or a team will have a bigger impact depending on how much it is used - and savings in energy will also scale. A year or so back, SAP product engineer Detlef Thoms made a rough calculation that showed that saving one CPU-second on a transaction will only save 10 joules (10 watt-seconds). That's a tiny saving, but if 1.5 million people are using the software, and there are 20 transactions a day over 230 work days, that's 19MWh of savings over the year.
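Thoms' arithmetic is easy to check; the figures below are simply the ones quoted above:

```python
# Reproducing the back-of-the-envelope saving quoted above.
joules_saved_per_transaction = 10            # one CPU-second saved, roughly 10 J
users = 1_500_000
transactions_per_user_per_day = 20
working_days_per_year = 230

total_joules = (joules_saved_per_transaction * users *
                transactions_per_user_per_day * working_days_per_year)

kwh = total_joules / 3_600_000               # 3.6 million joules per kWh
print(f"About {kwh / 1000:.1f} MWh saved per year")   # roughly 19 MWh
```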

Software developers can feel powerless to change this and improve it, because they are under pressure to deliver software quickly, and not to finish it properly and make it efficient.

Software kills hardware

Beyond its direct impact, software is adding to the physical stream of electronic waste, said Anna Zagorski, research associate in Green IT at the German Environmental Agency (the UBA). "Bad software leads to hardware death," she told the SDIA meeting in Berlin, referring to the way new versions of software can make older hardware obsolete. Older versions of the software may still work but, without support, they become "abandonware." The hardware they run on won't support newer versions, and is discarded. Electronic devices are the fastest growing category of waste, and the KDE open source community lays the blame for this "tsunami of e-waste" on software: "What is the cause of all this e-waste, and why do digital devices that still work end up in landfills? Software engineering has an important


but often unseen role in driving our digital consumption patterns. Manufacturers regularly encourage consumers to purchase new devices, often unnecessarily; indeed, they may even enforce it through software design.” The KDE takes the single example of e-readers, on which people consume books. It estimates that the carbon, energy, and materials embodied in an e-reader will be roughly equivalent to thirty books. If you read thirty books on your e-reader, you have broken even environmentally, and every further book is a gain for the environment (as well as for your mind). But published lists show that e-reader models are discontinued in around 1.5 years, after which time the software is no longer supported. Voracious readers will reach 30 books in that time, maybe, but then will lose the environmental advantage of continued use, because they have to throw the device away (maybe to landfill, maybe to recycling). Even in routine use, bad software wastes energy. KDE compares an open source word processor with a commercial one (both unnamed), and notes that the proprietary product uses four times as much energy to perform a single benchmark task. It used more energy in operation, because of all the extra bloatware features added, and it carried on using energy for unknown purposes after the job finished. For software running on servers, and in the cloud, it is to be hoped that the underlying hypervisor and other systems software will be optimized, as it

is commissioned and run by the cloud provider who pays for the hardware. In the early days of computing, programmers used to code sparingly because resources were scarce, but now generous software infrastructure has spoilt them. Resources are easy to get - and therefore too easy to waste, said Atanas Atanasov, a senior software engineer at Intel: “When I started out, I’d order a server, and I had to wait for it to be delivered. A few years later, if I wanted a server it took two or three weeks to configure the virtual machine. Nowadays, I order a server and, if it takes longer than three minutes, I will choose another provider, because it's way too slow.” Application software running on these throwaway server instances will often not be optimized. Indeed, a lack of optimization at this level works in the cloud provider’s favor, as it ensures more chargeable resources are consumed.

Why is programming inefficient? Companies don’t make bloated software deliberately. Programmers with a long memory say it used to be different, one delegate said: “I'm old enough to remember when we coded a different way. The bandwidth was limited, the connectivity wasn't really reliable, the memory was small, and so on.” She went on: “I think we still have that knowledge. Maybe we could go back, helped by old people like me.”


Janne Kalliola, founder of Finnish software company Exove, agreed that developers could benefit from going "back to the good old '80s, when I started coding." When he noticed the decline, he thought at first "I'll make a fuss about it." But he got no response and decided, "I need to be a change agent."

Kalliola brought out Green Code, a free book on the subject. Exove, along with other software firms in the Code from Finland group, have created a "Carbon Neutral Software" label. It is based more on company-wide emissions than the minutiae of coding, which is perhaps an admission of the difficulty of changing those practices.

The underlying problem that leads to bad software is that it costs time and money to do it well. Programs take longer to code efficiently. These programs are moneymaking commercial products, so delaying a product to make it run better will add to the cost. That might lead to another organization getting to the market first.

"The industry has spent the last 20 years or so focusing on development productivity," said Max Körbächer, who founded open source consultants Liquid Reply. This leaves no time to consider the operational efficiency. "There's a certain aspect of laziness in the industry," he said, but it's the flip side of an industry that wants the developer to deliver faster. "The whole industry is always lusting for the next thing, and every single developer hates the sales guy that sells the next version



of the product and not the current one." This leads to "Frankenstein coding," in which routines that work are put together to build a product, without considering if they will work well together, or if all the elements of each component are really necessary in the new product.

To help with that, Körbächer has set up a resource to help those making open source software. He offers tools to measure the footprint of their releases: "We're the ones to give advice," he said, "to see how well they perform, and how they can get better."

Green software advocates are clear that, as with other environmental products, the only way to force a better attitude is to expose the environmental footprint of a product, and hope to create a commercial pressure towards efficiency. As with cars and refrigerators, customers need to be shown the impact of what they use.

Schulze said: "In order for resource-efficient software to exist, there needs to be a market for it. This means there must be transparency, with each application having to show their environmental impact very transparently to the users."

Use of greener software can be reinforced if public sector bodies lead the way with their procurement, said Zagorski: "Sustainable software has to get into the guidelines for public procurement strategies." Schulze agreed: "The market can be reinforced with public procurement. It can be reinforced with regulation. But we believe there needs to be a market for resource-efficient software."

Standards

The trouble is, you can only have such a market if you can see how efficient your software is. Green software proponents were very clear that the biggest problem with greening software is a lack of visibility. "Transparency is a big problem. We really don't have very good numbers on the environmental costs of software," said Schulze.

"We need to get much more accurate information on the design of software," said Zagorski. And what that needs is standards, on how to measure and report the footprint of software. Germany may well be the leader here, with projects to help make software greener, and a labeling scheme designed to give information about the impact of code.

UBA has been working on SoftAWERE, a project financed by the German Federal Ministry of Economic Affairs and Climate Action (BMWK) to develop tools that would help software developers produce energy-efficient, hardware-saving software. SoftAWERE also investigated the idea of labeling energy-efficient software - and making software transparent about the energy it is consuming. Zagorski, who worked on the project, presented its final deliverables to the SDIA event. She said that, while the SoftAWERE project may be complete, the work will continue. The next steps will be to make the tools more "productized" and usable by others.

The SoftAWERE team launched a Green Coding Wiki at the SDIA event, to share and improve best practices. But in parallel with SoftAWERE, the German Government is already backing a scheme to recognise eco-efficiency programs. Software is included in the Government's Blue Angel label program for environmentally friendly products.

It's worth mentioning that Blue Angel wasn't the first program to consider the environmental footprint of software. Hong Kong's Green Council released some criteria for green software back in 2010, which included such things as having manuals online instead of in hard copy, and using minimal packaging on physical deliverables. In Germany, the Blue Angel scheme added certification for desktop software products in 2020, the first time in the world that an ecolabel certification was applied to software.

Blue Angel has "Type 1" labels, which consider the whole-lifetime environmental footprint of a product. The Blue Angel program argues that software has a huge environmental impact beyond the systems on which it runs, in everything from desktop programs to data centers. The program awards certificates based on the efficiency with which the code operates - so it requires the software to allow energy use to be measured and reported. The software also has to give users autonomy to manually reduce the impact of the software. Finally, it has to have a low enough demand that it can run on hardware that is five years old.

"It is the software that determines the energy consumption and operating life of digital infrastructure," said a book on Blue Angel software compliance from the KDE - which was one of the first organizations whose software earned a Blue Angel label. "Blue Angel really defines what green software looks like," said Schulze. "You can look at it as the Holy Grail." As well as software, Blue Angel is looking at bigger infrastructure issues, including green data centers, where green software may eventually reside.

Software development

Anyone who has worked on software knows that major aspects of the eventual product are set in stone before a single line of code is written or pasted together. The requirements for the project will demand particular features, and require certain characteristics and performance. Schulze said green practices need to be built in early. "Among the disciplines



we need to learn, the first one is green requirements engineering," he said. "Before we build software, we should think 'do we actually need to build it? Can we somehow minimize the resources it uses?'"

Beyond that, programming should work in a resource-efficient way. While developers will spend time making sure their applications have enough memory and resources to run, they should also look to limit those demands. And, finally, those running the software will have to work in a way that uses resources in data centers sparingly.

Intel's Atanasov suggested that resource limitations can be enforced across a project during development and deployment, in something like Kubernetes, but it isn't straightforward, because we don't have systems that can automatically identify which resources are necessary in a specific application at a given time. "Such systems are being developed in the open source community, to provide resource limitations," he said. But they will have to be able to process resource requests and put them on the right hardware, to utilize resources better.

To help developers pick the greener option when developing code, Atanasov is in favor of schemes that could grade tools and subroutines according to their relative carbon footprint. "Like when you buy a fridge."

There may be problems with this, said Schulze: "No software application is the same. We differentiate software through features. We start with a very specific problem that we're solving with software, and then we slap on more features over time. And that makes it almost impossible to compare."

Also, in the long term, green software builders want to have software that is self-aware, and knows the carbon footprint of the servers where it is running. That's a problem, because there are many layers to the software, and its actual location may not be clear, and the server itself may not have direct knowledge of its footprint. "It's an interesting engineering problem," said Schulze, but he says useful results don't have to be 100 percent accurate. "We tested mathematically if we can get to the energy consumption of a CPU just using pure math. It turns out we can. It's not very precise or very good. But it's good enough. And a machine learning model can give an even more precise guess of the power consumption. Right now, if we can get a 60-70 percent accurate value, that's better than getting no value at all."
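A "good enough" estimate of the kind Schulze describes can be very simple. The sketch below assumes a linear power model between idle and peak draw; the wattages are invented, and a real deployment would calibrate them per server model or read hardware counters where available:

```python
# Very rough CPU power estimate from utilization. Idle and peak wattages are
# invented placeholders; calibrate per server model for anything serious.
def estimate_cpu_power_watts(utilization: float,
                             idle_w: float = 60.0,
                             peak_w: float = 250.0) -> float:
    """Linear interpolation between idle and peak power draw."""
    utilization = min(max(utilization, 0.0), 1.0)
    return idle_w + (peak_w - idle_w) * utilization


for u in (0.1, 0.5, 0.9):
    print(f"{u:.0%} utilization -> about {estimate_cpu_power_watts(u):.0f} W")
```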


Image problem

Green software pioneers can find it hard to get their ideas across, or make the changes they want.

Anita Schüttler, sustainability strategist at software firm Neuland, said the problems are sometimes hidden, by the many layers of software between the user and the hardware that uses resources. "It's like the issue with global supply chains. The bad stuff is happening somewhere far away, where no one can see it. Software is seen as something clean because no one sees what's going on there."

The problem, she said, is education: "The actual programmers don't know about it." Another issue is the sheer complexity of what is being done. Atanasov said that the high-performance computing field is aware of efficiency, but hasn't always got the headspace for it: "The amount of complexity which a regular software developer has to deal with takes away all the ability to focus on anything else." Even if a programmer wants to make greener code, they have to take care of tons of other things first, said Atanasov. "Knowledge is always a problem. This field has research and new tools and products coming to the market, you have to optimize the software and the infrastructure. Too much information can throw people." Software projects also have organizational complexity: "It doesn't matter how automated processes are, in a project like Kubernetes we have 1,000s of contributors. The software's tested every day, 7,000 times per commit. There's hundreds of clusters simultaneously running programs, and everything is on. No one sees it all." In major enterprises there are changes that promise to make things simpler, like DevOps, he said, but a continued evolution of software development practices can squeeze out sustainability, said Atanasov: "Companies struggle with containerization and other new technologies and trends. I'm pretty sure most team leaders nowadays don't think about sustainability at all."

Awareness can help because green software practices often save money as well as carbon, Körbächer points out. If developers are made aware of all the costs of unthinkingly spinning up extra resources, they could cut bills as well as carbon: "I think making them aware is the important part."

Future trends

A couple of things might happen to push green software to the fore, according to some of the SDIA attendees.

Körbächer thinks that one major source of the problem we have now is that "electricity is too cheap." If wasted cycles cost more, then developers would avoid them - and it's possible this may become more of a factor in future.

Another factor is that physical resources to make chips may become more restricted, he said. “There are estimates that there will be a lack of silicon in a couple of years,” (presumably not referring specifically to silicon, abundant all over the Earth, but to all the elements required to make processors). “That would be interesting: that actually changes dynamics.” Some coders point out that they are just a part of the picture. Exove’s Kalliola said individual developers are twigs and branches of a tree, but: “The bulk of the tree is the trunk. We don’t have any influence. Schüttler also looks at the bigger picture: “To become sustainable, you need green energy, and you need efficiency, but you also need sufficiency,” said Schüttler. “We have to do less and we're living in a system that promotes generating money.” She is uncomfortable about the emissions her software enables - something akin to its Scope 3 emissions: “In the end, you have conflicts because someone else tells you what software to write. "As programmers, we can make the software a little less bad, but it's still not good. Because what does the software do? It sells products.” Making software more efficient could just result in it being used more and selling more things, she said, unless there is some way to set hard limits: “I'm actually quite pessimistic about this whole thing because, in the end, any savings will get spent.” For some, the only way to set those limits will be tax-like regimes, such as carbon pricing and energy taxes. “If every photo you take cost you €10, you would take fewer photos,” said Kalliola. Likewise, if energy costs go up, people will use less: “The tax man is the good guy.” Schulze is more optimistic. He hoped that infrastructure could be understood and legislated for in the whole context of the society which uses it, and wants to see a growth of “digital sobriety,” or the idea of using enough of a resource, not too much. “I hope that over the next couple of years we can include the social aspect, and the economic aspects,” said Schulze. “It is not just environmentalism. Green software is a step towards sustainable software.” 





Navigating the AI frontier: Balancing data center evolution with AI revolution

Advertorial: Ensuring data center excellence amidst the AI surge, says Align

It's hard to escape the endless talk of artificial intelligence (AI) and machine learning (ML) right now. While many buzz phrases come and go, there's little doubt that AI puts us on the cusp of one of the most significant changes that society has ever faced – it will change the way we work and play alike, and most importantly, it's here to stay. What that means for the data center is a one-two punch – it vastly increases the capacity the industry will need to offer and introduces the challenge of supporting new AI-capable infrastructure in older facilities. Align has been working in the industry for over 36 years, offering complete lifecycle services from the data center to the desktop. Their data center services include strategy, design/build, migration, decommission, and refresh, while meeting the ever-present need for expansion and consolidation of existing facilities. Who better, then, to discuss the oncoming advances in AI in the data center industry? We sat down with some of Align's key stakeholders – Tom Weber, managing director for Data Center Solutions; Rodney Willis, managing director for business development, Data Center

Solutions; Simon Eventov, assistant director for Data Center Design & Build, and Tyler Miller, regional director of Data Center Sales, Texas. The first stop in our exploration is an analysis by Tom Weber of why the AI revolution, alongside other high-compute applications, is not just leading to exponential growth in data center construction, but also the consolidation of existing facilities to increase capacity. “In three to five years, we've gone from cabinets running at five, eight, or 10kW and now we commonly see 40kW and beyond, and it is driven by the gear and the more powerful application that is running on it. This may be elementary, but it's denser loads and smaller spaces, which create issues, because in the old days, we would have a megawatt covering 10,000 square feet, in the past few years, it's six megawatts and maybe 15,000 square feet.” “Now we're seeing companies running a 10,000 square foot space capable of running a megawatt and not even a tenth of the space is being used. So, we have an open hockey rink with 20 cabinets in a checkerboard configuration that are all running 50kW.” “One of the big challenges is fitting all that


power into a cabinet and how you get cooling to that cabinet. You're not just blowing cold air in the front anymore. The biggest thing we see from a physical layer is fitting more into a smaller space and there are pros and cons to that.” “Getting direct cooling at the cabinet level is a huge challenge. Tapping into existing cooling infrastructure often requires shutting down a portion of that infrastructure, which typically supports other critical customers that cannot afford any downtime. This is a real problem when trying to convert a portion of an existing facility to support these higher densities. Today, new builds and retrofits have started including cooling designs to better incorporate standard cooling technologies with the ability to provide direct cooling to areas that require it.” Simon Eventov added: "There will be specific segments, particularly in highcompute tasks, that will remain essential. A prime illustration of this is autonomous vehicles, where proximity is crucial. There will also be a vast volume of data, continuous learning, and outcomes that can be distributed across diverse locations.”



Edge of Reason? So, does that mean that the future lies in Edge? Weber thinks not. “It will always go back to a central data center. Take Ashburn, Virginia. When I first went there, 12 years ago, I was on a cow farm. Now you drive down the road, and there are data centers stacked up next to each other like apartment buildings in Brooklyn. Because of this dynamic, in many of our data center regional clusters, we have seen many areas simply running out of electrical capacity to support further growth. I have this vision of open land, in the middle of the country where there are solar panels and acres of wind farms, and that drives the data center. Otherwise, how do you get self-sustaining within the data center, outside of being in Iceland, and using geothermal?” Rodney Willis adds: “Or do we adopt micro-nuclear facilities faster? It’s already being talked about. If it can power an aircraft carrier or submarine, it can power a small data center. But that’s a long-term look, there’s the regulatory issues around that, and then societal issues. Will society accept small nuclear reactors close to their environments?” Powering the AI revolution is an important consideration – but is AI destined to go the way of 3D TVs and Google Glass? It certainly doesn’t look that way for what is shaping up to be a revolution, rather than a fad. “Why would you not try to utilize machine learning and artificial intelligence to help make decisions?” asks Willis. “The real key here is interpreting that data, letting the technology learn about data interpretations with some manual help, but then having that backup, for manual processes, AI just makes it more efficient.” Keeping the lights on is one thing – but anyone who has seen the 1983 film War Games might question the wisdom of putting all our technology in the hands of AI, which is why, as Willis explains, we can’t afford to press “immature” neural networks into service. “There is a reason we don't send a 10-yearold out to the workforce. He’s still got learning and training to do. We're basically parents of AI and machine learning. We're educating and teaching and monitoring and correcting where we need to go through that process before we feel comfortable. We have a responsibility for AI machine learning to make sure it goes down the path and delivers the outcome we want. Otherwise, what happens when that 10-yearold becomes a 30-year-old and locks you out of the house?”

Dark side of the data center Miscreant robot security aside, it certainly seems unlikely that we’re approaching an age of ‘dark’ data centers, operated purely by machines – as Weber puts it.

“Will AI go in and teach robots? A lot of the things that need to be done in a data center can be done by a machine. It takes a lot of knowledge to do it, but it's something that could be done by a robot – but robots need human supervisors, mechanics, and programmers.”

and you're trying to consider that this is going to be a 20-plus year facility. Right now, most companies are assigning three to five-year leases, maybe 10 at most, and then they're getting out and moving their data center to somebody who has the capability to deploy these higher-density solutions.”

People will forever remain integral to data centers, although this scenario presents a unique set of difficulties as data centers exclusively depend on GPUs and necessitate constant cooling, leading to significant side effects that must be mitigated.

In other words – AI is likely to become a sub-class of data centers, with more conventional data loads based in more conventional facilities, meaning that the AI switch is certainly not right for everyone – at least for right now.

“Have you been into a data center running pure AI?” asks Willis. “We completed one, a few weeks ago and the background noise was over 100db after the equipment was up and running. The sound pollution that's coming into this environment is incredible.”

“For the smaller facility, going all-in on AI is likely to lead to a high turnover of clients. A way to future-proof your building is to leave vast open spaces for newer mechanical and electrical gear and kit as things change and get more efficient, but then you lose that price per kW war with the guy next door who is lean and mean and doesn't leave a square inch unbuilt,” says Weber.

“Absolutely!” adds Weber. “There is a big market for headphones with microphones that have channels, just so you can talk to project teams. It is mind-boggling how loud these GPUs, all crammed into small spaces, can be. Working for long periods within these environments is extremely stressful. I think the headcount will go down, due to operational predictability provided by AI, and will allow fewer and shorter periods of being inside these spaces.” Overall, then, while AI is likely to make humans more efficient, it’s not likely to replace them anytime soon. For a start, there’s never been a better time to get into the construction industry, where data centers are concerned – and that’s something that will remain very much human-led. “From a building standpoint, AI will require more data center infrastructure, but AI is not going to put up walls, and floors and install the critical infrastructure,” says Weber. “They may tell you the best way to do it, but they're not going to do it for you. So, from a construction standpoint, AI doesn't affect that, it could even make more jobs available. Where I see the labor force changing is in the clerical aspect, like a legal assistant sorting through previous court papers and historical documents – that type of role is going to be history, it’s already starting.”

Is it time to embrace AI? Before your company rushes out to embrace AI, consider that there are many challenges to retrofitting your existing facility with AIcapable technologies, Miller explains: “If you look at industry verticals and the AI marketplace, there's a limited number of companies that are going to be deploying this type of technology – utilizing the power that it takes or putting in the infrastructure and the capital investment. “How do you future-proof? That's a challenge. Unless you're building a greenfield

All of this means that for most small and medium-sized operations, it’s likely to be business as usual for now – the challenge isn’t how to embrace AI, it’s about how to make the most of the space you have to cope with evergrowing workloads. Weber tells us: “I don't think every enterprise and every data center is going to go to 50Kw cabinets, it's going to be the select, major players that are adopting AI. If one of the smaller players in the AI world gets really good at it, somebody's going to buy them and they're going to become part of the major players just like every other technology we've seen, where it gets swallowed up by the companies with the deep pockets, so it becomes consolidated that way. “We see that being a big problem when it comes to densifying existing rooms or data centers. We have to shut down to move, to deliver more power and cooling to this condensed area – whereas if it was built on day one, it wouldn't be a problem – but even that has its pros and cons, as making such considerations will increase the cost of your initial build.” The message is that these are pioneer days for AI – and while businesses should feel empowered to embrace AI infrastructure, you want to be sure that you are doing so with partners that can aid in making informed decisions. Technology is moving fast, and you don’t want to get left behind, as Eventov puts it: The car that you're going to be driving five years from now is probably going to look completely different from the car that you're driving today. You’ve still got to make a regular car with a nice GPS for the next five years before you get to that self-driving electric vehicle. I think it's a similar thing for IT. Businesses are going to have to keep running and changing and making their existing infrastructure work while we figure out the next frontier. 



Embracing a new European market

Georgia Butler Reporter

FLAP-D faces new competitors

There's a crunch coming.

In Europe, it is becoming increasingly clear that the most popular data center markets are running up against power and land limitations. Frankfurt, London, Amsterdam, Paris, and Dublin (FLAP-D) won’t be able to support all of the continent’s data center capacity. Where will the overspill go? Before we address that, it’s first important

to reiterate that FLAP-D will remain the top locations, for a while. For 2023 alone, CBRE predicted 480MW of take-up across the European hotspots and 524MW being developed. JLL also made record-breaking predictions in May 2023 (though smaller than those of CBRE), suggesting that 432MW of capacity will be developed across the key markets. “There’s still an insatiable demand for these [FLAP-D] locations,” said Ben Stirk, co-head of


data center advisory at Knight Frank, during a DCD panel. “The cloud has generally gone to the most mature markets because they're either capitals, or financial hubs, or just big metros.” Beyond these key locations, there are several adjacent or ‘secondary’ markets which are seeing significant growth. JLL’s latest EMEA data center report drew attention to the expected Madrid and Berlin sites, and throughout the DCD panel, participants noted the value of Manchester in the UK, Warsaw in



Poland, and Barcelona in Spain - though all of these have their own constraints. Data center demand is increasing across all verticals - retail and wholesale colocation, cloud, AI-dedicated data centers, and Edge facilities. While they are all data centers, their “needs” from a location vary wildly. Ash Evans, Google’s EMEA lead of energy and location strategy, theorizes that we are at a point of major divergence, where data center sub-asset classes are becoming even more distinct. “I think we're seeing the emergence of a new asset class in terms of AI and ML data centers in particular, which are on another scale again, and arguably have a different set of specifications. Are they less reliable? Do they even need to have the same reliability? Do they have the same latency requirement?” said Evans. “This is then where we have the divergence of two different machine learning style data centers.“ Evans compares the two categories of data center between the model training centers - which are “gigawatt plus scale,” and the inference data centers which are smaller, but still a “significant step up” from the cloud data centers of today. The former will likely be less latency-constrained, while the latter will still be bound by its strictures. What impact this new asset class will have on FLAP-D, is unclear. As we all know, Edge is about the super low latency, close to the end- user model. As a result, these facilities do not necessarily conform to the FLAP-D model in Europe anyway. The whole


connectivity FLAP-D locations, data center developers in this realm need to evolve their approach to site selection. “I think we will start to see more and more concepts whereby large-scale renewable developments are sited or colocated next to hyperscale AI or machine learning data centers,” said Evans. This is logical for a multitude of reasons: It keeps the technology green, and the impact off the local grid. It also does not need to be directly “close” to a population - training workloads can be done in isolation, and latency is not a key concern. Land surrounding the FLAP-D areas is also extremely expensive, and paying for a premium that isn’t actually necessary makes little to no sense. From the real estate point of view, Stirk points out that the logical place in terms of resources is in the Nordics. “[The Nordics] are the obvious place for these big training centers. They have cheap land, cheap renewable power, and free air cooling, all of which provide a great opportunity for these massive deployments,” said Stirk. This is an argument that has been taken to heart by the likes of AQ Compute. DCD previously interviewed the company’s Norway CEO Andreas Myr, which has a business model where they develop AI and HPC data center sites in remote areas, the first of which will be pulling energy from Norway’s hydroelectric dams. Similarly, atNorth is currently working on a 60MW hyperscale facility in Kouvala, Finland. In addition to the added benefits of free air cooling, the region's has a developed heat market,


and district heating systems provide an ideal opportunity for heat reuse. The company is also pitching a similar approach in Iceland (see cover feature, page 14). Stirk, however, raises the issue that alongside the benefits of these locations, less mature markets pose a risk from their lack of workers. “It’s all very well having a data center in the Arctic Circle, but actually, who lives there?” he said. “Looking at places like Dublin, one of the reasons that hyperscalers went there originally was because of the access to skills and talent in that area. There is a big tech pool that has grown, and it has a similar effect in places like London and Frankfurt.” For the time being, developers and major contractors are bringing workers with them, and Arup’s George Demetriou, director of missioncritical business areas, said. “They [major contractors] take their teams and settle them near the project, and then build the data centers. That’s a model we’ve seen in FLAP-D, and we see that now happening elsewhere. Whether that is sustainable is to be seen but, for now, it is working,” he said. “With the amount of data centers required in these areas, one project can be finished and the same team can easily stay on and develop the next one.” In the long term, these talent pools will need to expand. Those nomadic teams could work with local contractors, upskilling them


and spreading the talent pool wider to new locations, but it is also the responsibility of the data center operators to make waves. “While Google has a large ground-up talent pool development scheme, we can still do better at attracting talent and engaging earlier,” conceded Evans. “A great example of a sister industry that’s been successful at this is the oil and gas industry. They attract talent straight out of universities and colleges and provide very clear career paths and progressions. As of today, I certainly haven’t seen a lot of that [in the data center industry].” While the AI boom of the end of 2022 has continued on through to 2023, Demetriou argues that Europe hasn’t really seen the true impact of it yet. “We haven’t seen it in our region as much yet,” said Demetriou. “We have heard a lot about ‘it's going to happen, it's going to come, it's going to be huge,’ and we’ve seen that in the US, and we believe it is going to follow in EMEA and then the Asia Pacific region.” We can look to the US as both a ‘model’ of sorts of what is to come. Aashna Puri, director of business development and sustainability for CyrusOne, puts the significant size of deployments in the US down to the AI spring - noting that in the last quarter alone it has reached close to two gigawatts. Puri’s question, however, is whether GDPR or data localization is going to mean that the learning and training of AI in the US has to be replicated in Europe also, or whether we can


simply connect to that in the US, and then process in Europe. With the AI era of data centers still in its infancy, it is very hard to draw definitive conclusions. Nations are still debating the regulations surrounding the technology and, while to outsiders it can feel like its progress is spinning wildly out of control, AI will be limited by the ability of the data center industry to figure out the best solution to these problems. Many operators, for the time being, will retrofit their existing facilities or developments in order to accommodate AI workloads, instead of starting from the ground up - and that won’t necessarily spell the end for FLAP-D. This seems to be the approach taken by Data4, which told DCD that it believes the scale of its existing campuses - a 180MW site in Frankfurt and two in Paris (120MW and 250MW) - will be ideal for accommodating the needs of AI. Similarly, at the recent AI Safety Summit hosted at Bletchley Park in the UK, Microsoft committed to investing £2.5 billion ($3.16bn) in building artificial intelligence infrastructure in the UK. While details are sparse, some of this will go to its existing data centers in Cardiff and London, with “potential expansion into northern England.” The future of the European data center market is likely to play out as it always has: with caution, but steady growth. If the AI data center takes over then, perhaps, we will all be meeting to discuss which letter will be added to FLAP-D. 


Huawei FusionPower6000 3.0 Full-link convergence, Prefabricated in factory, TTM shortened by 75%, Footprint saved by 30%+ Full-link high efficiency, up to 98.4% @ S-ECO mode AI fault prediction and preventive maintenance


DCD Magazine #51

AWS doubles down on generative AI in Las Vegas

>>CONTENTS

Christine Horton Contributor

After falling behind rivals, the world’s largest cloud company lays all its cards on the table

Credit: AWS

72 | DCD Magazine • datacenterdynamics.com



Head in the clouds 

>>CONTENTS

A

mazon Web Services (AWS) is going all in on generative AI. At the hyperscaler’s flagship event, AWS re:Invent in Las Vegas, gen AI dominated the conversation and was front and center in a raft of new product launches, services, and partnerships.

“AWS re:Invent was a very AI-flavored event,” Neil Ward-Dutton, VP of AI, automation & analytics Europe at IDC, told DCD. “I was surprised by the degree to which AI announcements pervaded the event.”

The announcements from AWS come at a time when all the hyperscalers are attempting to outdo one another with their latest gen AI offerings. Microsoft announced its partnership with OpenAI around a year ago, followed by Google’s AI announcements of its Bard LLM [large language model] and later Gemini. Many felt that AWS was trailing behind those rivals, with re:Invent seen as the company’s opportunity to try to push back against that narrative.

Three layers of gen AI stack “Amazon’s been innovating with AI for decades, [it’s] used to optimize things like our companywide supply chain, to create better retail search capabilities, to deliver new experiences like Amazon Alexa, or our ‘just walk out’ shopping technology,” AWS CEO Adam Selipsky told attendees in his keynote speech at re:Invent. “Gen AI is the next step in AI and it’s going to reinvent every application we interact with at work and at home.” AWS’ strategy is based on delivering technology at three layers of the generative AI stack, and its latest launches reflect its efforts to cover all bases.

At the top layer, AWS offers gen AI applications and services like Amazon CodeWhisperer, an AI-powered coding companion that recommends code snippets directly in the code editor, which the company says accelerates developer productivity.

At the bottom layer, AWS offers compute instances built on Nvidia GPUs, as well as its own custom silicon chips, Trainium for training and Inferentia for inference.

In between, at the middle layer of the stack, AWS provides a choice of foundation models from different providers. Customers can customize those models and integrate them with the rest of their AWS workloads through its latest service, Amazon Bedrock. A managed service, Bedrock makes LLMs and other foundation models (FMs) from AI companies – including AI21, Anthropic, Cohere, Meta, and Stability AI – available through a single API. Then, at re:Invent, Selipsky announced new capabilities for Bedrock to help customers customize models, enable gen AI applications to execute multistep tasks, and build safeguards into their applications. It also revealed Amazon Q, a new type of generative AI assistant for work, also underpinned by Bedrock. It is not alone in this approach: while Microsoft has pushed OpenAI most aggressively, and Google its own models, they all offer access to other models. AWS’ third-party bet appears to be Anthropic, with Amazon this September making a $4 billion investment in the AI startup.
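To give a concrete sense of what “a single API” means in practice, the sketch below shows roughly how a developer might call a third-party foundation model through Bedrock using the AWS SDK for Python. The model identifier and request payload are illustrative rather than lifted from AWS documentation – each provider defines its own schema – so treat this as a minimal sketch of the pattern rather than a definitive implementation.

```python
# Minimal sketch: calling a third-party foundation model through Amazon Bedrock.
# The model ID and payload shape shown here are illustrative; schemas vary by provider.
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Assumed prompt format for an Anthropic model on Bedrock at the time of writing.
body = json.dumps({
    "prompt": "\n\nHuman: Summarize our data center incident reports.\n\nAssistant:",
    "max_tokens_to_sample": 256,
})

response = bedrock.invoke_model(
    modelId="anthropic.claude-v2",   # illustrative choice of foundation model
    contentType="application/json",
    accept="application/json",
    body=body,
)
print(json.loads(response["body"].read()))
```

In principle, swapping providers is a matter of changing the model ID and payload – which is the portability argument AWS is making with Bedrock.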

Anthropic will use AWS Trainium and Inferentia chips to build, train, and deploy its future foundation models. The two companies will also collaborate in the development of future Trainium and Inferentia technology. However, it is not clear how much of that is a PR move to counter OpenAI’s collaboration with Microsoft’s chip team - Anthropic will likely still continue to primarily use GPUs, at least for training. Anthropic also said it plans to run most of its workloads on AWS and the hyperscaler will become Anthropic’s primary cloud provider for mission critical workloads, including safety research and future foundation model development. This is despite Google acquiring a 10 percent stake in the startup back in 2022 and becoming its cloud of choice at the time. It invested again this October, after the company became an AWS-first business. “This will well position AWS to compete with their peers,” Sid Nag, VP, cloud services and technologies at Gartner, told DCD.

Gen AI use cases Chris Casey, director & general manager, industry technology partnerships at AWS, told us that the company is focused on helping partners “move from the hype cycle to the substance cycle as fast as possible with their customers.” This involves helping to familiarize partners on the different large language models and tailor solutions based on customer use cases. Gen AI use cases dominated discussion during the event as customers move from

Credit: AWS

Issue 51 • December 2023 | 73


DCD Magazine #51

>>CONTENTS

proof of concept to demonstrating a return on investment. Casey pointed to two types of use cases. “There are the true gen AI-focused applications and use cases that partners might be building. Think of a generative AI user interface that makes it easier for employees to search for internal data and content,” he said. “But building generative AI capabilities into existing software applications is also a big focus.” Casey also pointed out there are customers thinking about gen AI use cases, but where their data is stored is “a big differentiator” for them. At re:Invent, AWS announced the general availability of Amazon Simple Storage Service (Amazon S3) Express One Zone. AWS is pitching the new storage class as the lowest-latency cloud object storage available, offering data access speeds up to 10 times faster and request costs up to 50 percent lower than Amazon S3 Standard, accessible from any AWS Availability Zone within an AWS Region. “The broad adoption we’ve seen of things like S3, where there was a lot of customer data stored, AWS already makes that meaningfully more efficient and faster for them to then build generative AI applications on top of, because the data is already sitting in a database,” said Casey. “That seems to be a really common thread amongst customers and their partners that are building solutions and choosing which infrastructure provider they might want to build those solutions on.”
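Part of the pitch is that the read path does not change for application developers: fetching an object looks the same whether the data sits in a standard S3 bucket or an Express One Zone directory bucket. The snippet below is a minimal sketch of that point; the bucket and key names are made up, and the directory-bucket naming style is an assumption for illustration.

```python
# Minimal sketch: reading training data from S3 with the standard GetObject call.
# Bucket and key names are hypothetical; only the bucket (and its placement) changes
# when moving data into an S3 Express One Zone directory bucket.
import boto3

s3 = boto3.client("s3", region_name="us-east-1")

obj = s3.get_object(
    Bucket="training-corpus--use1-az4--x-s3",  # assumed directory-bucket naming style
    Key="shards/shard-00001.jsonl",
)
payload = obj["Body"].read()
print(f"fetched {len(payload)} bytes")
```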

Nvidia partnership gives a boost for AWS Developments in gen AI and LLMs are driving demand for deployment of high-performance GPU-based servers in data centers, primarily using Nvidia hardware. It was here that AWS played its biggest hand: a broad partnership with Nvidia, which Selipsky said would allow it “to deliver the most advanced infrastructure for generative AI workloads with GPUs.” AWS will be the first to bring the Nvidia GH200 Grace Hopper Superchip to the cloud. It will also provide the cloud infrastructure for Project Ceiba, an Nvidia initiative to build the world’s fastest AI supercomputer that will be used by Nvidia’s own teams. “Driven by a common mission to deliver cost-effective, state-of-the-art generative AI to every customer, Nvidia and AWS are collaborating across the entire computing stack, spanning AI infrastructure, acceleration libraries, foundation models, to generative AI services,” Nvidia CEO and star of the show Jensen Huang said.

Credit: Nvidia

The Nvidia announcement at re:Invent was really about one thing, said IDC’s Ward-Dutton. That was “AWS being able to claim that [it] will be [the] first cloud provider to market with support for Nvidia’s new range of super high-end GPU-based systems – and therefore, that it is the ‘place to be’ for any organization wanting to be at the cutting edge of experimenting, researching or innovating with generative AI.” Gartner’s Nag, too, noted that AWS partnering with the leader in GPUs today “will give a significant edge to AWS as a public cloud provider.” However, rival cloud providers have all made significant GH200 orders, and will likely roll them out on their own platforms as soon as possible. The company does have its own exclusive hardware, of course. AWS unveiled the next generation of its AWS Graviton processors and AWS Trainium accelerators. “AWS has historically built their own silicon such as Graviton and they announced an upgrade to Graviton as well,” said Nag. “We are going to continue to see this silicon battle across the hyperscalers from an organic capability perspective.”

74 | DCD Magazine • datacenterdynamics.com

Trainium has historically performed below Google’s TPU family, and is expected to be bested by Microsoft’s upcoming Maia chip. But it has proven to be cost-effective for some workloads.

Fighting over what, exactly? While the likes of AWS, Microsoft, and Google are naturally preoccupied by the gen AI race, this likely isn’t of great importance to most customers and end users. “A bunch of people are very preoccupied by a ‘gen AI race,’ and by which vendor might be winning in that race,” said Ward-Dutton. “Right now, though, I’m not sure the idea of a gen AI race really makes sense. Certainly not if we think of which vendor has the biggest AI model, or the fastest platform. That’s interesting to researchers exploring the frontiers of generative AI, but I don’t think it’s relevant for businesses wanting to adopt the technology. “When we ask business leaders and technology practitioners what concerns them or holds them back regarding investment in generative AI technology, they talk about safety, security, privacy, managed cost and quality ‘baked in’ to generative AI use cases. This is what matters to businesses.” 



DCD Magazine #51

>>CONTENTS

Longer coherence: How the quantum computing industry is maturing Academics with screwdrivers are making way for operations engineers and SLAs

Dan Swinhoe Senior Editor

Q

uantum computing theory dates back to the 1980s, but it's really only in the last five to ten years or so that we’ve seen it advance to the point where it could realistically become a commercial enterprise. Most quantum computing companies have been academic-led science ventures; companies founded by PhDs leading teams of PhDs. But, as the industry matures and companies look towards a future of manufacturing and operating quantum computers at production-scale, the employee demographics are changing. While R&D will always remain a core part of every technology company, making quantum computers viable out in the real world means these startups are thinking about how to build, maintain, and operate SLA-bound systems in production environments. This new phase in the industry requires companies to change mindset, technology, and staff.

Culture shocks and changing faces At quantum computing firm Atom Computing, around 40 of the company’s 70 employees have PhDs, many joining straight out of academia. This kind of academic-heavy employee demographic is commonplace across the quantum industry. “I'd venture that over half of our company doesn't have experience working at a company previously,” says Rob Hayes, CEO of Atom. “So there’s an interesting bridge between the academic culture versus the Silicon Valley tech startup; those are two different worlds and trying to bridge people from one world to the other is challenging. And it's something you have to focus and work on openly and actively.” Maturing from small startups into large companies with demanding customers and shareholders is a well-trodden path for hundreds of technology companies in Silicon Valley and across the world.

Credit: Rigetti

76 | DCD Magazine • datacenterdynamics.com

And quantum computers are getting there: the likes of IonQ, Rigetti, and D-Wave are already listed in the Nasdaq and New York Stock Exchange – although the latter two companies have had to deal


This feature ceases to exist 

>>CONTENTS

at various times with the prospect of being de-listed due to low stock prices.

Most of the quantum companies DCD spoke to for this piece are undergoing a transition from pure R&D mode to a more operational and engineering phase.

“When I first joined four years ago, the company was entirely PhDs,” says Peter Chapman, IonQ CEO. “We're now in the middle of a cultural change from an academic organization and moving to an engineering organization. We've stopped hiring PhDs; most of the people we're hiring nowadays are software, mechanical, and hardware engineers. And the next phase is to a customer-focused product company.” Chapman points to the hirings of the likes of Pat Tan and Dean Kassmann – previously at Amazon’s hardware-focused Lab126 and rocket firm Blue Origin, respectively – as evidence of the company moving to a more product and engineering-focused workforce. 2023 also saw Chris Monroe, IonQ co-founder and chief scientist, leave the company to return to academia at North Carolina’s Duke University. During the earnings call announcing Monroe’s departure, Chapman said: “Chris would be the first one to tell you that the physics behind what IonQ is doing is now solved. It's [now] largely an engineering problem.” Atom’s Hayes notes a lot of the engineering work that the company is doing to get ready for cloud services and applications is software-based, meaning the company is looking for software engineers. “We are mostly looking for people that have worked at cloud service providers or large software companies and have an interest in either learning or already some foundational knowledge of the underlying physics and science,” he says. “But we're kind of fortunate that those people self-select and find us. We have a pretty high number of software engineers who have physics undergrads and an extreme interest in quantum mechanics, even though by trade and experience they're software engineers.”

Operationalizing quantum computers On-premise quantum computers are currently rarities largely reserved for national computing labs and academic institutions. Most quantum processing unit (QPU) providers offer access to their systems via their own web portals and through public cloud providers. But today’s systems are rarely expected (or contracted) to run with the five-9s resiliency and redundancy we might expect from tried and tested silicon hardware. “Right now, quantum systems are more like supercomputers and they're managed with a queue; they're probably not online 24 hours, users enter jobs into a queue and get answers back as the queue executes,” says Atom’s Hayes.

“We are approaching how we get closer to 24/7 and how we build in redundancy and failover so that if one system has come offline for maintenance, there's another one available at all times. How do we build a system architecturally and engineering-wise, where we can do hot swaps or upgrades or changes with as little downtime as possible?”

Other providers are going through similar teething phases of how to make their systems – which are currently sensitive, temperamental, and complicated – enterprise-ready for the data centers of the world. “I already have a firm SLA with the cloud guys around the amount of time that we do jobs on a daily basis, and the timeframes to be able to do that,” says Chapman. “We are moving that SLA to 24/7 and being able to do that without having an operator present. It's not perfect, but it’s getting better. In three or four years from now, you'll only need an on-call when a component dies.”
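That queue-based model is already visible to users of today’s cloud quantum services. As a rough illustration – not specific to Atom, IonQ, or any one provider’s stack – submitting a circuit through a service such as Amazon Braket returns a task handle that sits in a queue until the device runs it; the device ARN below points at AWS’s managed simulator, and a QPU ARN would be queued the same way.

```python
# Minimal sketch of the queued-job model described above, using the Amazon Braket SDK.
# The device ARN here is AWS's SV1 simulator; a QPU ARN would be enqueued identically.
from braket.aws import AwsDevice
from braket.circuits import Circuit

bell = Circuit().h(0).cnot(0, 1)        # two-qubit Bell-state circuit
device = AwsDevice("arn:aws:braket:::device/quantum-simulator/amazon/sv1")

task = device.run(bell, shots=1000)     # job enters the device's queue
print(task.state())                     # e.g. QUEUED / RUNNING / COMPLETED
print(task.result().measurement_counts) # blocks until the queued job finishes
```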

Rigetti CTO David Rivas says his company is also working towards higher uptimes. “The systems themselves are becoming more and more lights out every quarter,” he says, “as we outfit them for that kind of remote operation and ensure that the production facilities can be outfitted for that kind of operation.”

Physicists with screwdrivers Manufacturing and repair of these systems is also maturing, since the first PhD-built generations of quantum computers. These will never be mass-produced, but the industry needs to move away from one-off artisanal machines to a more production line-like approach. “A lot of the hardware does get built with the assistance of electronics engineers, mechanical engineers,” says Atom’s Hayes, but much is still built by experimental physicists. IonQ’s Chapman adds: “In our first-generation systems, you needed a physicist with a screwdriver to tune the machine to be able to run your application. But every generation of hardware puts more under software control. “Everywhere a screwdriver could be turned, there's now a stepper motor under software control, and the operating system is now doing the tuning.” Simon Phillips, CTO of the UK’s Oxford Quantum Circuits, says OQC is focused on how it hires staff and works with partners to roll out QPUs into colocation data centers. “If we’re going to get to the point that we put 10 QPUs in 10 locations around the world, how do we do that without having an army of 100 quantum engineers on each installation?”

“And the first part of that starts with having a separate deployment team and a site reliability engineering team that can then run the SLA on that machine.” He adds: “Not all problems are quantum problems. It can't just be quantum engineers; it's not scalable if it's the same people doing everything.” “It's about training and understanding where the first and second lines of support sit, having a cascading system, and utilizing any smart hands so we can train people who already exist in data centers.”

Supply chain matures While the quantum startups are undergoing their own maturing process, their suppliers are also being forced to learn about the needs of commercial operators and what it means to deploy in a production data center. For years, the supply chain – including for the dilution refrigerators that keep many quantum computers supercooled – has dealt with largely self-reliant academic customers in lab spaces.

Credit: Atom Computing

Richard Moulds, general manager of Amazon Braket at AWS, told DCD the dilution refrigerator market is a “cottage industry” with few suppliers. One of the main fridge suppliers is Oxford Instruments, an Oxford University spin-out from the late 1950s that released the first commercial dilution unit back in 1966. The other large incumbent, Blufors, was spun out of what is now the Low Temperature Laboratory at Aalto University in Finland 15 years ago. Prior to the quantum computing rush, the biggest change in recent years was the introduction of pulse tube technology. Instead of a cryostat inserted into a bath of liquid helium-4, quantum computers could now use a closed-loop system (aka a dry fridge/cryostat). This meant the systems could become smaller, more efficient, more software-controlled - and more user-friendly.

Issue 51 • December 2023 | 77


DCD Magazine #51

>>CONTENTS

“With the wet dilution fridge (or wet cryostat), you need two-floor rooms for ceiling height. You need technicians to top up helium and run liquefiers, you need to buy helium to keep topping up,” says Harriet van der Vliet, product segment manager, quantum technologies, Oxford Instruments. “It was quite a manual process and it would take maybe a week just to pre-cool and that would not even be getting to base temperature.”

For years, the fridges were the preserve of academics doing materials science; they were more likely to win a Nobel prize than be part of a computing contract. “Historically, it's been a lab product. Our customers were ultra-low temperature (ULT) experts; if anything went wrong, they would fix it themselves,” says van der Vliet. “Now our customers have moved from being simply academics to being commercial players who need user-friendly systems that are push button.” While the company declined to break out numbers, Oxford said it has seen a “noticeable” change in the customer demographic towards commercial quantum computing customers in recent years, but also a change in buying trends. QPU companies are more likely to buy multiple fridges at once, rather than a single unit every few years for an academic research lab. “The commercial part is growing for sure,” adds David Gunnarsson, CTO at Blufors. The company has expanded factory capacity to almost double production capabilities to meet growing demand. “There have been more and more attempts to create revenue on quantum computing technology. They are buying our systems to actually deploy or have an application that they think they can create money from. We welcome discussion with data centers so they can understand our technology from the cryogenics perspective.” And while the industry is working towards minimizing form factors as much as possible, for the foreseeable future the industry has settled on essentially brute force supercooling with bigger fridges. Both companies have released new dilution fridges designed for quantum computers. Smaller fridge (and lower qubit-count) systems may be able to fit into racks, but most larger qubit-count supercooled systems require a much larger footprint than traditional racks. Blufors’ largest Kide system can cool around 1,000 qubits: the system is just under three meters in height and 2.5 meters in diameter, and the floor beneath it needs to be able to take about 7,000 kilograms of weight. “It has changed the way we do our product,” says Gunnarsson. “They were lab tools before; uptime wasn’t discussed much before. Now we are doing a lot of changes to our product line to ensure that you can be more certain about what the uptime of your system will be.” Part of the uptime challenge suppliers face around fridges – an area where Gunnarsson notes there is still something of a mismatch – is in the warm-up/cool-down cycle of the machines. While previously the wet bath systems could take a week to get to the required temperatures, the new dry systems might only take a day or two each way. That is important, because cooling down and warming up cycles are effectively downtime; a dirty word when talking about service availability. “The speed with which you can get to temperature is almost as important as the size of the chip that you can actually chill,” says AWS’ Moulds. “Today, if you want to change the device's physical silicon, you have got to warm this device up and then chill it back down again, that's a four-day cycle. That's a problem; it means machines are offline for a long time for relatively minor changes.” While this might not be an issue for in-operation machines – Rigetti CTO Rivas says its machines can be in service for months at a time, while Oxford Instruments says an OQC system was in operation non-stop for more than a year – the long warm-up/cool-down cycle is a barrier to rapid testing. “From a production perspective, the systems remain cold for a relatively long time,” says Rivas. “But we're constantly running chips through test systems as we innovate and grow capacity, and 48 hours to cool a chip down is a long time in an overall development cycle.” Oxford Instruments and Blufors might be the incumbents, but there are a growing number of new players entering the fridge space, some specifically focusing on quantum computing.

“The market has grown for dilution fridges, so there are lots more startups in the space as well making different cooling systems,” says van der Vliet. “There are many more players, but the market is growing. “I think it's really healthy that there's loads of players in the field, particularly new players who are doing things a little bit differently to how we've always done it.” The incumbents are well-placed to continue their lead in the market, but QPU operators are hopeful that competition will result in better products.

Credit: IonQ

78 | DCD Magazine • datacenterdynamics.com

“There will be genuine intellectual property that will emerge in this area and you'll definitely start to see custom designs and proprietary systems that can maintain temperature in the face of increasing power.” Atom’s Hayes notes that, for laser-based quantum systems, the lasers themselves are probably the largest constraint in the supply chain. Like the dilution fridges, these are still largely scientific technologies made by a handful of suppliers. “We need relatively high-powered lasers that need to be very quiet and very precise,” he says. “Ours are off the shelf, but they're semi-custom and manufacturer builds to order. That means that there's long lead times; in some cases up to a year.” He adds that many of the photonic integrated circuits are still relatively small - the size of nickels and dimes - but hopes they can shrink down to semiconductor size in future to help reduce the footprint of what are often room-sized devices.

Commercial noise: the death of quantum collaboration For now, the quantum industry is still enjoying what might be the autumn of its happy-go-lucky academic days. The next phase may well lead to quantum supremacy and a new era in high-performance computing, but it will likely lead to a less open industry. “I think it’s nice that the industry is still sort of in that mode,” says AWS’ Moulds. “The industry is still taking a relatively open approach to the development. We're not yet in the mode of everybody working in their secret bunkers, building secret machines. But history shows that once there's a clear opportunity, there's a risk of the shutters coming down, and it becoming a more cut-throat industry.” “In the end, that's good for customers; it drives down costs and drives up reliability and performance. But it might feel a little bit brutal for some of the academics that are in the industry now.” 


If you observe it 

>>CONTENTS

When data centers met quantum

Dan Swinhoe Senior Editor

What happens when quantum computers arrive in data centers?

W

e are now reaching the point where quantum computers are being deployed into data centers, leading to a new way of learning. Data center operators are having to come to terms with hosting dilution fridges with new cooling requirements, while quantum computing and dilution fridge providers are having to learn how data centers operate and where they can fit in.

Quantum computers come to colocation facilities In what DCD understands is a world-first, UK quantum computing firm Oxford Quantum Circuits (OQC) has deployed six of its QPU systems in two colocation data centers. “We wanted to avoid just sitting in a lab in stealth for two years, and pop out and expect people to have already made applications for it,” says Simon Phillips, OQC CTO. “What we need to do is allow access.” The company currently operates a lab in Reading, hosting its eight-qubit Lucy system, which is accessible online through AWS’s Braket service. Phillips tells DCD the decision to deploy into colo sites was made in the wake of Storm Arwen in 2021.

“We had a power brownout, and that meant that one part of the computing stack could not restart properly,” he explains, leading to the system warming up. While the system was undamaged and re-cooled, it made the company realize it needed to consider uptime more seriously. “We're trying to demonstrate to the world that computers are ready to start using, and it's not good enough for it to go off. But this has already been solved; there are these data centers around that have all this infrastructure.” At the same time, a proof of concept project was at risk of stalling because the customer said it couldn’t have live data leaving its environments into a lab facility.

“Suddenly the penny dropped that the data center was full of our customers,” says Phillips. “The mission became clear that to actually access real-world customer data outside of research projects, we have to connect directly to people's infrastructure in the lowest latency way.”

The company has deployed three 32-qubit systems in each of Cyxtera’s LHR3 facility in Reading, UK, and Equinix’s TY11 facility in Tokyo, Japan.

In terms of each deployment, one of the OQC systems is live and customer-facing; the other two are live but held for upgrades, testing, etc. Each of the three QPUs and its accompanying cooling system requires around 15-20kW (mostly going towards the six 9kW water-cooled helium compressors), but on a footprint described by Phillips as around 14x10 600mm floor tiles. OQC first reached out to Cyxtera in March 2022; the first fridge was installed in January 2023, and the final fridge in March. Facility work continued until April, after which testing began. The systems are now live. The Cyxtera deployment is in an enclave room in a former tape library from the company’s CenturyLink days.

“That room was good because we could achieve a watts-per-square-foot that made sense for us,” says Charlie Bernard, Cyxtera’s EMEA director for growth strategy. “We could get enough power in there, there was enough air cooling to make it work, and there is plenty of pipe work under the floor.”

While the system was taller than the room ceiling height – 3.4m with an additional 50cm of overhead clearance required for pipework – the single-story facility has a large ceiling void so Cyxtera had extra wiggle room to work with.

Modifications to the ceiling and fire alarm system had to be made to allow for sufficient overhead clearance. Though in a raised floor environment, the frame of the multi-ton system was attached directly to the slab and mechanically isolated from the raised floor. Each dilution fridge requires up to 40 liters of liquid nitrogen to be pumped into the fridges each week. For the chilled water-cooled compressors, Cyxtera had to tap into the building’s water pipework. To accommodate liquid helium and hydrogen, the company also installed oxygen detectors in the room with light indicators and alarms to show if there’s a potential leak or gas is boiled off. “We're taking them on that journey of

Credit: IonQ

Issue 51 • December 2023 | 79


DCD Magazine #51

>>CONTENTS

what it means to deploy into a data center, because they’ve been working their design in something of a silo,” says Bernard. “I think it's been eye-opening for them, coming to see our facilities and learning about things like weight limitations, door sizes, noise, interference, and so on.

“There's been that knowledge share of us to them, and also there's been us reaping the benefits of learning the practices and procedures of handling liquid gases and moving that throughout the building.”

Phillips adds of his company’s learnings: “There was a maturity aspect around language and where the demarcations of responsibilities are. In a lab, everything's everyone's responsibility. Now there's a line at which something like chilled water is their responsibility and a line at which it is our responsibility.” The Cyxtera site also doubles as OQC’s network operations center, with a number of the company’s staff on-site daily. In Tokyo, the OQC deployment is in a cage in the data hall, with that deployment designed to be remotely managed, with Oxford Instruments’ in-country service team retained to deal with any issues. “It's the same team for them that do MRI scanners,” says Phillips. “They tour the hospitals, checking all the cryogenics in MRI systems, so it's nothing new to them.” An interesting component of the Tokyo deployment that the company had to adjust to was the seismic requirements. Cryogenic systems have a liquid nitrogen container that normally sits free-standing on the floor, and the company had to ensure regulatory compliance around how it was bolted to the floor. Phillips said he expects future deployments to go “a lot smoother” as the company continues to learn and develop its operations. “We've got the playbook on how we install a superconducting computer into a data center anywhere in the world,” he says. “Every time we go to a new location, we work out where the differences are. To us, they all seem pretty standard; data center people will say they're all different, but we're finding they're very similar compared to labs.” Prior to these deployments, it wasn’t known how different types of interference – sound, vibration, and electromagnetic – would impact deployments in data halls. Likewise, there were potential issues around how the system’s high-frequency microwave controls could impact nearby IT hardware. Phillips says so far he hasn’t seen any problems around that; when asked if that is solely because of design considerations or perhaps the problem had been overstated, he says likely a mix of both. “We have done a lot of work around shielding, but we’re yet to see anything that’s creating a problem.”

In terms of operations, the handling of the liquid hydrogen and nitrogen within the facility is managed by the system owner, rather than the colo provider. But data center operators will benefit from understanding how to handle the liquids and potentially manage cryogenic cooling systems.

“There is some learning with how to run all of the systems,” notes Oxford Instruments’ Harriet van der Vliet. “How to close a system properly so that you don't have a leak, how to pump out a system, how to leak check a system. But we're very good at providing training to our users so that they're able to run their systems themselves.”

“That's something that we've had to work on; going from our users being ULT experts who don't need training because they could theoretically just build their own systems, to users who are not experts in cryogenics and do require some training.” Both Cyxtera and OQC say they are now better prepared for future deployments. “Our first install had a very primitive layout. They look very similar to a lab-based thing. But there are a number of projects underway already to make that easier, faster, more suitable, because we're now asking the questions that nobody's ever asked before,” says OQC’s Phillips. “The way the cooling works, or the way the air handling works, no one's ever asked questions before.” Cyxtera’s Bernard notes the company hopes to replicate this project elsewhere in future: “This was the first deployment, and it's been a great learning curve for all of us. We're in a much better position now.” OQC said it aims to first expand into more markets, then build out existing deployments with updated form factors as demand increases. “There are a number of data hubs in the world – clusters of financial companies in a certain location or material sciences companies in another – that we can target getting them access to a QPU in that location, and then we'll kind of build out and scale from there,” says Phillips. “I think that the solution does lie within colos as a priority. “The biggest barrier to entry for using quantum computers was actually the integration into existing digital infrastructure, not qubit count. The fact that we're able to have a chat about data centers and quantum computers means it's all moved in the right direction.”

Supply chains learn about data centers Fridge supplier Oxford Instruments has installed a Rigetti quantum computer at its main site in Tubney Wood, Oxfordshire. The system – Rigetti’s first outside the US and still one of the few the company has shipped outside its own facilities – is connected

80 | DCD Magazine • datacenterdynamics.com

to Rigetti’s cloud for its own and its customers’ use. Harriet van der Vliet, product segment manager, quantum technologies, Oxford Instruments, tells DCD that the company did not have an on-site data center prior to the installation, and still doesn’t have what we’d consider a traditional white space data hall. “Our factory is not the equivalent of a colocation facility. We provided the fridge and the facility as part of Innovate UK funding so that Rigetti and its partners Phasecraft, Standard Chartered, and others could work on their applications for quantum computing,” she says. “We had to learn all about having a commercial system with the correct connections to the cloud, backup generators, backup chillers, which you don't necessarily think about in a lab,” she says. “If the power goes out in a university, you just say, ‘okay, I'll warm up and then I'll cool down the next day.’ But if that happens in a data center or system that's connected to the cloud for customers, that's absolutely not okay.” When asked if the company would look to host more quantum computers in future or this was a one-off, van der Vliet said that there “is always the opportunity to do lots more” in the space. “We learned about building in redundancy, such as backup generators, chillers, etc. so that the system had good uptime for the customer. It was great working with Rigetti to host their first quantum computer in Europe.” Beyond its own facility, Oxford Instruments was involved in a number of QPU deployments in live data centers – including OQC’s deployment in Cyxtera and Equinix facilities, and the CESGA supercomputing center in Spain. “We have now installed various systems into data centers,” says van der Vliet. “We've had to install into different areas, which we're not used to, and be prepared for security having to get people particular badges to get them in and out of data centers, which is something we've not been used to.” “We had to change how we install systems physically, and getting into the data centers has been different. Some of the data centers might have certain areas which are just ready to wheel a 19-inch rack through and our systems are obviously quite a bit bigger than that.” “But it’s not really that different [from academic settings]; we have had these issues and difficulties within installs for many years,” she adds. “It's just about getting that experience. We now have that second and third installation, and both installations were much better than the first.” 


Issue 51 • December 2023 | 81


DCD Magazine #51

Defining net-zero: You can’t manage what you don’t measure

Dr. Maria-Anna Chatzopoulou Principal mechanical engineer at Cundall

Advertorial: How we look to achieve net-zero in today’s resource-stressed climate

66 | DCD Magazine • datacenterdynamics.com

>>CONTENTS


CUNDALL | Advertorial 

>>CONTENTS

A

s the move towards achieving net zero carbon gains traction, the data center industry is facing potentially conflicting demands – on the one hand needing to deal with the rising demand for data processing capacity, while on the other, needing to limit the growth in energy and water consumption, to reduce the stresses being placed upon national infrastructure.

At the same time, the industry is striving to develop appropriate solutions to achieve net-zero carbon and to limit, and ultimately reverse, the environmental impact of the sector. However, at present, much of the construction industry is wrestling with a clear definition of what net-zero carbon is, and has yet to agree a unified methodology for how it should be measured.

Much work needs to be done to provide clear direction on what is to be included in the net-zero target. Operational carbon has been the focus, but embodied carbon is more difficult to quantify. It needs a broader understanding of the entire supply chain's carbon emissions, requiring consistent data collection and monitoring to be able to create benchmarks and track them. Everyone has a role to play in achieving true net-zero carbon (vendors, owners, consultants, contractors, and authorities). Over this series of articles, Cundall will take you through some changes they are seeing in their work as designers, pushing the boundaries a little further, to deliver the energy-efficient, resilient digital infrastructure the global community needs, in the net-zero carbon era.

How do we get to net-zero carbon in a resource-stressed environment? Data centers can play a key part in promoting a circular economy and moving towards net-zero carbon by making efficient use of resources, including electricity, water, and land, and by maximizing reuse across multiple dimensions. Heat from the racks can be recovered and re-used in local district heating networks, fulfilling the heating demands of the local community. However, enabling waste heat recovery in a data center facility requires some inventive thinking, to ensure that the facility’s resiliency and business continuity are not compromised. As the designers of the largest data center waste heat recovery system in Europe, we will examine the challenges and opportunities of such systems, in our upcoming articles. The land-use of the data center building can be reduced by the adoption of high-density racks and/or by moving towards high-rise data center buildings.

Cooling of high-density racks, with power densities that may exceed 50 kW per rack, goes beyond the capabilities of conventional air-cooling systems. Liquid-to-chip cooling is a suitable technology for addressing such high loads.

The Facility Water Systems (FWS) that support liquid cooling often operate at elevated temperatures (>40°C), relative to the systems that support air cooling. This makes them ideally suited to integration with waste heat recovery and district heating networks in the urban environment.

The potential to exploit free cooling also increases, which may in turn reduce or eliminate the need for evaporative or mechanical, refrigerant-based cooling. This can greatly reduce the energy and water consumption of the facility. In this series of articles, we will share our experience in designing liquid-cooled facilities and highlight the benefits and challenges.

Location, location, location The development of high-rise data centers is often driven by several factors: primarily the scarcity and cost of suitable development land, particularly in urban areas where the location of the facility may have been selected due to its proximity to a target group of users. An example of this might be data centers clustered around financial centers. Other factors may also apply, such as the desire to exploit ‘trapped’ capacity in locations where power is available, but only to a site of limited size, such as on a partially built-out DC campus. The spatial constraints force the developer to build upwards, rather than outwards. In these circumstances, maximizing utilization of the available power supply, and plot footprint is key. High-rise data centers may therefore benefit from increased rack power density, where this is paired with the right cooling technology. However, further challenges arise from the need to ensure efficient operation: how can waste heat be effectively dissipated away from the building, and how can heat rejection equipment, mechanical and electrical plant be accommodated in a building, where both the roof space and plot size are limited? We will draw upon our experience of designing highrise data centers across the EMEA and APAC regions, to provide some answers to these questions.

Be water aware With water becoming a scarcer commodity, water usage effectiveness (WUE) is growing in importance. In projects across multiple regions, lack of an adequate water supply is driving the adoption of waterless heat rejection systems, while PUE still needs to be optimized. This is forcing designers to re-evaluate other parameters, such as the thermal operating envelope within the technical space. By designing for higher operating temperatures, the period when free cooling is viable may be extended. The incorporation of waste heat recovery may also be beneficial. However, in many cases, the move away from adiabatic cooling is forcing increased reliance upon mechanical, refrigerant-based cooling solutions, which come with their own set of problems. The use of zero ODP and low GWP refrigerants is a fundamental requirement for this transition. But while HFOs were initially seen as being the best ‘drop in’ alternatives to HFCs, from both an environmental and energy efficiency perspective, recent research has revealed that they are not as benign as first thought; when some commonly used HFO refrigerants decompose in the atmosphere, they form trifluoroacetate (TFA) – a salt of trifluoroacetic acid. TFA is harmful to aquatic life, even in low concentrations. It is also virtually nondegradable in the environment and cannot easily be removed from drinking water supplies using any current water purification techniques. Other research has proposed that one of the breakdown products of a widely used HFO refrigerant is R-23 – which itself, is a highly potent greenhouse gas. Would the use of natural refrigerants be able to overcome these problems? Is the industry ready for such a transition? We will be discussing this topic as part of this article series. Within Cundall, we have many years of experience in designing efficient data centers. However, the more we learn, the more we come to appreciate how much work is still to be done. True net-zero carbon means engineeringout carbon emissions through design, to the greatest extent possible, whether those are the upfront emissions generated from the manufacture, transportation and installation of materials and equipment, or operational energy and water-related emissions. The aim of the upcoming series of articles is to explore the areas where we, as designers, can have the most impact. 
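As a point of reference for the trade-off described above, the two headline metrics are commonly defined (per The Green Grid) as annual ratios:

```latex
% Standard efficiency metrics referenced above, computed over a year of operation.
\mathrm{PUE} = \frac{E_{\mathrm{total\ facility}}\;[\mathrm{kWh}]}{E_{\mathrm{IT}}\;[\mathrm{kWh}]}
\qquad
\mathrm{WUE} = \frac{V_{\mathrm{site\ water}}\;[\mathrm{liters}]}{E_{\mathrm{IT}}\;[\mathrm{kWh}]}
```

Moving away from adiabatic (evaporative) heat rejection pushes site WUE towards zero, but if mechanical, refrigerant-based cooling then runs for more hours, PUE – and with it energy-related emissions – can rise: this is the balance the article describes.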

Issue 51 • December 2023 | 67


DCD Magazine #51

>>CONTENTS

Rebuilding Ukraine’s telecoms infrastructure amid war Plans to modernize the country’s networks have been delayed, but 5G could be on its way next year

Paul Lipscombe Telecoms Editor

Photos by Lifecell Ukraine

"

W

e’ve had more than 10 percent of our sites completely or partially destroyed,” explains Sasha Ananyev, head of the network operation department at Vodafone Ukraine. That’s roughly 1,400 base stations, and does not include base stations located in Donetsk and Luhansk, areas that have been occupied by Russia since 2014. Since Russia's unprovoked invasion of Ukraine began in February of last year, CSIS (Center for Strategic and International Studies) estimates that Ukraine's ICT infrastructure has been hit by more than $2 billion worth of damage. The war has led to the destruction of more than 4,000 base stations across all operators, plus 60,000 kilometers of fiber optic lines, while 12.2 percent of households have lost access to mobile services. Almost two years on, the country is still stuck rebuilding its telecoms infrastructure. Vodafone Ukraine estimates that the war has cost the company in the region of 2 billion hryvnia ($54.3 million) as its telecoms infrastructure has been significantly damaged. Some sites have even been damaged again after being fixed, says the operator.

84 | DCD Magazine • datacenterdynamics.com

Cutting the lines It’s not an uncommon tactic for invaders to attack telecommunications infrastructure during war, using it to quash communications and suppress locals from accessing the wider world. This happened from the outset of the invasion. “We could tell by the way our antennas had been shut down that it was deliberately targeted,” says Ananyev, who claims that Russia has stolen millions of dollars worth of its telecoms equipment, and bragged about it on social media. “It was deliberately shut down. The electric supply stations were also destroyed as the occupiers were afraid they could be used by our army.” His view was shared by another Ukrainian telecoms provider, Lifecell, which also reported around 10 percent of its base stations being destroyed at its peak, at around 900 sites. “The first thing the occupiers did in any territory they entered was attack the telecoms infrastructure to deprive Ukrainians of the opportunity to receive news from the Ukrainian mass media, and transmit information about the movement of Russian troops to the Armed Forces of Ukraine,” says


Ukraine's telecoms 

>>CONTENTS

Konstantin Sotnikov, manager of the regional operation division, Lifecell Ukraine. Others have noted that it also stops those left in occupied territory from sharing evidence of war crimes.

Following the occupation of Kherson, connectivity disappeared several times before all operators were shut down in the region at the end of May 2022, Sotnikov says.

“For some time, one Lifecell mobile base station worked in the city, thanks to which the people of Kherson could keep in touch with their relatives and read Ukrainian news. But, later, the occupiers turned it off as well.”

Impact of the sites going down Damage can vary, with some sites only partially damaged, while others were fully destroyed as a result of heavy shelling. Vodafone estimates that it can cost between $60,000 and $70,000 if a site is completely damaged. When it is still salvageable, the company believes it spends around $40,000 on average to repair telecoms infrastructure while using its own funds to reconnect these sites. Lifecell’s Sotnikov explains that to deal with the impact of the war, the operator has had to optimize the network operation in different regions to best maximize 2G connection in more than 400 communities and 3/4G high-speed mobile Internet access services in more than 100 communities in the regions where Ukrainians from combat zones were evacuated.

“To enable the mobile network to meet the growing demand for data services and avoid congestion, Lifecell turned on LTE coverage in the 2,100MHz band with the support of state authorities,” he says.

“But that was not all that needed to be done. The three biggest Ukrainian mobile operators (Lifecell, Vodafone, and Kyivstar) launched national roaming in Ukraine. It means that subscribers can switch to the network of the other operators if it is not possible to use the signal of their mobile operator.”

Even now, the national roaming initiative is still in place. Sotnikov claims that Lifecell hosts up to 300,000 subscribers of other mobile operators on its network daily.

Carrying out repairs safely When a cell site goes down due to damage, it’s not as simple as just getting an engineer deployed to the site, as it would be in normal circumstances. During war, the operators have other factors to contend with. Keeping employees safe is prioritized, says Vodafone’s Ananyev.

“So when the base station disappears from our servicing map, it just disappears. When there are war actions in that region, we can't do anything. But when the territory is liberated, and we know it from our government or our army, our engineers go in there to fix it.”

He notes that the army needs to de-mine the area before any repair work can go ahead and then, depending on the damage caused to the site, the frame of work is determined.

In November of last year in Kherson, Ananyev says his team was able to reconnect the first base station in a week after the Ukrainian forces liberated the city.

The telco says it attempts to restore telecom sites remotely as much as it possibly can, but will send its engineers to the region where the coverage is down if it is not occupied. In preparation for the sites going down, Lifecell says that it has repair teams assigned in each region.

“Since the beginning of the full-scale invasion, 90 repair crews throughout Ukraine have been making two to three trips every day to return the network to Ukrainians, sometimes under fire and at risk to the lives of our engineers,” says Sotnikov.

“Often the repair work is complicated and slowed down due to problems with energy supply in the de-occupied territories. We need generators and fuel, which is quite problematic to find in the newly liberated areas, as well as people who will deliver this fuel and regularly refuel the generator.

“The military and residents help us with this. Although the territories have been liberated by the Armed Forces, they are still heavily mined and periodically come under enemy fire.”

Sotnikov told DCD that all of its engineers are provided with bulletproof vests, which are mandatory. He explains that in Vovchansk, in the Kharkiv region, a team of optical line engineers was hit by an airstrike while carrying out emergency recovery work in the region. “All the repairmen received a concussion, bruises, trauma, and shrapnel injuries,” he says. “The base station and the tower itself, which were only recently restored after the occupation, were completely destroyed.” It’s “impossible to continue repair work in the city,” as it is under constant shelling, he says.

Keeping the workforce supported From a personnel perspective, both Lifecell and Vodafone have noted that some of their employees have been called up to fight in the war. Fortunately, neither operator has lost members of its workforce to the war.

However, the war has forced many of the population abroad, meaning mobile subscribers have been lost, explained Vodafone. The UN estimates that more than 6.2 million people have fled the country, mostly women and children.

“We have several millions of subscribers that we've lost due to the war, some of them surely relocated to the west of Ukraine, but many of them relocated abroad in the European Union and all over the world.” Support for Ukraine’s mobile operators has also come in the form of Elon Musk’s SpaceX subsidiary Starlink. Musk has provided thousands of Starlink terminals to Ukrainians on the frontline, primarily supported by the US government and other governments. Sotnikov says the terminals have been crucial in providing instant communication. “Starlink solutions allow us to restore the network quickly, especially in those areas where the transport network is seriously damaged and needs time to repair. “The satellite as a component of the transport network is a revolutionary solution for the Ukrainian market, which during

Issue 51 • December 2023 | 85


DCD Magazine #51

>>CONTENTS

the war allows operators to quickly restore communication in the liberated territories because the usage of Starlink does not depend on the usual Internet infrastructure.” However, Starlink hasn’t always been reliable, with Ukrainian government officials reporting “catastrophic” outages last October, while it’s also grappled with “signal jamming” issues, according to Musk. Ukrainian forces have also had to deal with Musk’s occasional interference, with the CEO reportedly turning off the service in certain areas.

Supply chain issues Unsurprisingly, the war has made resupplies a challenge. “At the beginning of the war, the enemy deprived us of the opportunity to quickly import necessary equipment and machinery for the repair of base stations from abroad,” says Sotnikov, due to the closure of airports and seaports. Sotnikov also says that, because Lifecell buys its equipment with a foreign currency (USD), the exchange rate increased, further ramping up costs. On top of that, the shortage of fuel only intensified issues. “We need the fuel for the work of the mobile network repair team and both for refueling the diesel generators to keep the network running during power outages. During terrorist attacks on energy infrastructure, companies’ fuel costs increased significantly.”

Falling behind in the 5G race While many operators across Europe have focused on launching 5G networks, the same can’t be said for Ukraine. Speaking to DCD, Stanislav Prybytko, director general of the directorate for mobile broadband for Ukraine's Ministry of Digital Transformation, explains that the war with Russia has stalled the country's 5G goals. "Our government adopted a plan in 2020, to launch 5G technology in 2022. But due to the full-scale invasion, we changed this," he says. "It's currently not safe to carry out scientific research, so we need to wait until the end of the war, and only then will it be practical to start. But still, we don't want to waste this time. We are in the process of negotiations with our military services to launch a 5G pilot next year." Ananyev is confident that the operator will be able to make progress with its 5G development next year. “If it were not for the war, we would have already introduced the 5G network, because before the war we already held some trials,” he says. “I suppose that we would already have the network in the main cities of Ukraine by now.”

Defiance Support for the development of its 5G network is also coming from Latvia. During the 5G Techritory event in October, which was held in Riga, Latvia, Ukraine signed an MoU with the Latvian government, which will see the latter provide support to rebuild Ukraine's telecoms infrastructure. “For us, the difference between five base stations or 50 base stations is not so big, as we were used to trying to connect up to 500 base stations at the height of this,” Ananyev laughs. “We recently had severe weather conditions in the south of Ukraine, and their 400 base stations were blacked out here, and it was just peanuts for us to reconnect them all because of our experience.”

86 | DCD Magazine • datacenterdynamics.com

As for Lifecell, there have been clear improvements in reducing the number of base stations offline, with the figure currently down to approximately 6.7 percent from above 10 percent. “Since the beginning of the war, we have built more than 800 new base stations throughout Ukraine,” says Sotnikov. “Thanks to the construction of new and modernization of existing base stations, 4G communication and mobile Internet have been improved for 542 settlements.” While the destruction of mobile sites is still a real threat, Lifecell and Vodafone continue to operate in dangerous conditions, to keep their people and forces connected in the face of war, and deliver the latest technology. The Ministry of Digital Transformation of Ukraine's Prybytko is confident that this moment will pass, and Ukraine will stave off Russia’s invasion, retaining its independence. “Our estimation on 5G’s full-scale launch will be eight months after our victory,” he says. 

The Ukraine war has severely impacted millions in the country. Consider donating today to help those suffering.


When it comes to powering your UK data centre, there are more options available than you might think. Connect your future with us.

Let’s start a conversation matrixgroup.co.uk enquiries@matrixgroup.co.uk 03303 209 899



NEW

BC 2 - 500 UPS Battery Cabinet Powered by Nickel-Zinc Smallest Industry Footprint & 50% Increased Current Capacity WATCH VIDEO


London calling for power 


Powering London

As we weather a storm of multiple energy crises, how can a major capital city like London provide power for the future?

The relationship between data centers and power capacity is increasingly tumultuous. This is all the more notable in data center hotspots, such as Northern Virginia, Frankfurt, and Dublin. In the UK, London has faced a similar crunch. The city has been plagued with power capacity concerns, with data centers often getting the brunt of the blame after house building in the west ground to a halt.

Georgia Butler Reporter

But the problem of power in the capital city is not one of simple causes and easy fixes. “You only need something to go slightly awry, and you will see a significant bump in the price of power,” says Wayne Mitchell, director of Dallington Energy consultancy. “On the back of the Gaza and Israel situation, energy prices jumped 44 percent in the space of eight days [in the UK]. Similarly, on the other side of the world, workers at the Chevron liquified natural gas plants in Western Australia went on strike, and UK gas prices jumped 20 percent that day.”

Two years ago, Russia declared all-out war on Ukraine, and the impact is still being felt in London’s energy systems, and across Europe. With Russia as a major gas exporter for much of Europe, and the political chasm that the first day of invasion caused, many nations were suddenly in a position where energy prices skyrocketed, and supply was instantaneously very fragile.

The UK has, to an extent, found solutions to this energy crisis - but the costs are still felt by the populace, and the headlines remain. With fragility comes insecurity, and the media has locked onto this in the last year - either warning the UK population of potential risks, or cashing in on drama by fearmongering, depending on your point of view.

As power-hungry monsters, data centers are often at the forefront of conversations that warn of potential winter blackouts and power provisioning issues. Data centers are held to blame for reports that they have taken power that could have gone to houses.

Winter blues

This winter and last, we have wondered if our power system will make it through the dark season. Of course, London is not alone. The whole country gets cold, and shares one national transmission grid.

In 2022, several media outlets reported that the UK could expect to see blackouts during particularly cold periods (we didn’t), as a result of a carefully edited quote from National Grid’s CEO John Pettigrew, who was talking about a truly worst-case scenario in which everything went wrong, all together, simultaneously. 2022 was more precarious than this winter is set to look (though the energy system remains ever-tempestuous and easily aggravated), but according to Mitchell, countrywide blackouts were never really a serious concern.


“Blackouts are the very last thing that is going to happen when you look at the long hierarchy of actions laid out by the Electricity Supply Emergency Code (ESEC),” he explains. The ESEC is a backstop piece of legislation that allows the grid operator to take emergency actions in those worst-case scenarios. Regarding the media frenzy John Pettigrew inadvertently kicked off, Mitchell points out that the ESEC only works with extensive risk assessments. “As a grid operation, absolutely they should always be doing scenario planning, and doing really awful scenario planning. He was talking about the worst possible case, and people latched onto it. “The word ‘blackout’ is really quite alarming if you don’t understand the network, and ultimately ten or twenty different outlets ended up putting the story on the front page.” Blackouts are alarming to residents, but to data centers they can cause concerns on a par with a “security incident.” Dallington Energy and Kao Data teamed up on a whitepaper this year to dispel some of the anxieties in advance, or as Mitchell puts it, to “introduce some rational thinking.” The ultimate message was that winter is nothing new, and the grid and our data centers will trot along quite comfortably.

The London Hot Spot

As is common worldwide, data centers in the UK seem to center around one particular location. In this case, it is London: specifically, west London. This is for a variety of reasons. There are transatlantic cables that run from Cornwall to London through the M4 corridor, creating a well-connected digital highway that data center operators have taken advantage of, creating the well-known clusters - with Slough, the Docklands, Park Royal, City, and Isle of Dogs among them. Beyond that, there is also the proximity to


the capital. Some data centers can be spread further away from major urban locations, but some industry sectors, such as finance, rely on high-speed digital functions, so being close is a key advantage. The way the grid works, however, means that this can cause some problems. Power goes on a long and arduous journey before it reaches the end user. Originating at the power plant, it enters the National Grid’s transmission network - the high voltage network covering the entire nation. This then connects to the distribution network - the more local and lower voltage network, where electricity providers portion it off to their customers. Controlling all of this is the Electricity System Operator. “The Electricity System Operator (ESO) has a National Control Centre, down in the southeast, which is basically a control room - it looks a bit like the NASA headquarters,” explains Paul Lowbridge from National Grid. In that control room, the ESO has the entire electricity system mapped out, and can see what is being generated and where it's being used. The ESO’s job, then, is to make sure it runs smoothly. “Part of it is ‘how does it all come together



and make sure it's all balanced?’” Lowbridge says. “But then, within that, there will be particular constraints. In certain parts of the country, although balanced overall, they might need to make some adjustments about how power is flowing in and out of a certain region, or city, and then the distribution networks will have their eye on particular cities or locations.”

One such constraint might be the problem of having more than 100 data centers in one city’s boundaries. “There was no thought or foresight in developing data centers around London,” Spencer Lamb, chief commercial officer of local data center operator Kao Data, says. “Because no one had any appreciation at the time - and rightly so - that they would eventually be consuming hundreds of megawatts of power infrastructure as they are today.”

Data centers have not only ballooned in quantity and size but also in density. If the AI boom follows current industry predictions, data centers are only going to get more power-hungry. In other words, this problem is just going to get worse. The logical thing to do would be, where possible, to build data centers away from London. “I think we need to locate data centers in locations that have the opportunity to get

private energy production to them, bypassing the distribution networks and therefore not paying that premium, but also, in itself, it will enable development investment in renewables,” Lamb argues. This would take some of the stress off the power distribution networks in London, which are struggling with the weight of the Internet on their shoulders.

Home sweet home

In 2022, the Greater London Authority (GLA) published a document exploring the “West London Electrical Capacity Constraints,” which revealed that major London-based applicants (over circa 1MVA) to the Scottish and Southern Electricity Networks distributed network would have to “wait several years” to receive new electricity connections. This included housing developments.

Housing is, according to the GLA’s website, “the greatest challenge facing London today.” The website goes on to say: “In recent decades, London has excelled at creating jobs and opportunities. But at the same time, we have failed to build the homes we need. Now a generation of Londoners cannot afford their rent and many are forced to live in overcrowded or unsuitable conditions.”

It is unsurprising, therefore, that in the wake of the Capacity Constraints document, an onslaught of news articles circulated, blaming data centers for a “ban” on housing developments. The GLA document describes data centers as using “large quantities of electricity, the equivalent of towns or small cities,” while broadsheet The Telegraph nicknamed data centers “Energy Vampires.”

A spokesperson for the Mayor of London told DCD via email in July 2022: “The Mayor is very concerned that electricity capacity constraints in three West London boroughs are creating a significant challenge for developers securing timely connections to the electricity network, which could affect the delivery of thousands of much-needed homes.”

In an updated response from the GLA, the infrastructure policy lead Louise McGough told DCD that the issue remains a “vital” concern for London, but that in the immediate term, the plan is to “connect developments if they require under 1MVA of electricity annually (where they are not also facing distribution-level constraints), allowing many housing developments to progress that would otherwise have been stalled.

“While there remain challenges for schemes requiring higher electricity capacity, the solutions being pursued by the utility providers are encouraging and are a step in the right direction.”

DCD reached out to several housing associations but they did not provide comment. Lobby group techUK in 2022 said some of its data center operator members were also finding it hard to get grid connections due to the first come first served queue system. TechUK was in conversation with the GLA, attempting to find solutions to this problem, according to Luisa Cardani, head of the group’s data centers program. She told us that the local authority has heeded many of techUK’s suggestions. One of the key recommendations made by techUK was to rethink the queue-style setup. “The connection queue was first come first served, but following our recommendation, this has been changed, meaning they can get rid of what we call ‘zombie’ projects,” explains Cardani. “Some [companies] - and we're not speaking on behalf of techUK members here - just hedge their bets. Some developers put numerous bids in for potential future projects and that would make it very difficult to then prioritize because it isn’t obvious how quickly they need that power.” Shortly after the publication of the Capacity Constraints document, a freedom of information request was filed with the GLA requesting data/reports on the connection




requests. The GLA declined to share, stating that the data is “commercially sensitive.” For the time being, it seems the companies behind those zombie projects will remain unidentified. There are currently 400GW of connection requests in the queue, of which Ofgem reckons 60 to 70 percent will fail to materialize or connect. Some 57 percent of users (queue dwellers) have submitted multiple Modification Applications (i.e., they get to the front of the queue, but aren’t ready yet, so ask for more time). In fact, according to Ofgem: “As it stands, if all connections in the current queue were to take place, this would enable more generation capacity than is anticipated to be needed to achieve a Net Zero power system by 2035, even under the ESO’s most demanding Future Energy Scenarios (FES) modeling scenario.” Ofgem published its revised queue management system on November 13, 2023. The new system is set to kick off on November 27 and is hoped to speed up connections for viable projects and enable stalled or speculative projects to be “forced out” of the queue. National Grid will have the power to enforce strict milestones for connection agreements, and the first terminations are expected to happen in 2024. Also in November 2023, the UK government


shared plans to invest £960 million ($1.1bn) in green industries and overhaul the country's power network. The “connections action plan” is hoped to cut connection delays from years to just six months, and free up around 100GW of capacity, while the funding is going to manufacturing capacity in net zero sectors.

This is a semi-long-term solution, but in the meantime, National Grid has been working on short-term boosts to capacity in the region. “There are potentially some quite technical options we [National Grid] can explore that would effectively allow those distribution networks to have more capacity available for some of the smaller connectors that at the moment are facing constraints,” explains Lowbridge.

“We think there are potential options to be explored around being able to release some more capacity on their networks by looking at some of the engineering assumptions we use at the boundaries between transmission and distribution.”

He continues: “There's a point where the transmission network links to the distribution network, and then the distribution network makes capacity available to the local network.

"There are two things going on. One


is across all of the distribution networks throughout the country, we've been coordinating and looking at a solution called ‘technical limits,’ which is when you look at the engineering assumptions at that boundary. We think there are different formulas and models that can be used that would then release a certain amount of capacity for the distribution network in addition.” Lowbridge estimates that this could free up around 30GW, a proportion of which could then be allotted to the distribution network operators (DNOs) in London. “Then the second point is, again, still in that kind of relatively tactical space. We've been working directly with the GLA, SSE, and UK Power Networks to look at potentially increasing some of the technical boundaries that we use between the DNO and transmission in that area. We think that there is potential that it could make more capacity available for projects connecting at lower voltage levels,” he says. “It's slightly early days to see exactly how quickly that can have an impact. But we expect over the coming months for that piece of work to start to show what can be made available, and we think there's potential there for that piece of work to help release some capacity in the London area.” 



Powering the Future of Edge Data Centers Where cutting-edge reliability meets unmatched efficiency and sustainability In a data-driven world, the edge is where business happens and downtime is not an option. Generac understands that your edge data center is the heartbeat of your operation, providing critical, real-time processes to stay ahead.

Our set of Energy Management solutions include:

Behind the Meter

Demand Response

Energy as a Service

Leasing & Financing

Maintenance & Warranties

Microgrid Integration

Our comprehensive energy management solutions encompass a diverse portfolio of power products, each designed to meet the unique requirements of edge computing. Beyond standby power, we offer systems that provide intelligent, seamless integration with your data center's infrastructure, ensuring not just power, but also operational excellence and continuity. Unlock the full potential of your operations with Generac's reliable and innovative energy solutions. Visit our website at www.generac.com/industrial to begin the journey of designing an energy management strategy tailored to your unique needs.

Up to

1000 kWh

Business Connectivity

Energy Storage

Up to

2 MW

Standby Generators


Fiber for all 


Checking in on Openreach’s UK fiber buildout

Halfway to its aim of 25 million fiber connections, how is Openreach doing?

Paul Lipscombe Telecoms Editor

In late December, Openreach announced that it had reached the halfway point for its fiber broadband rollout in the UK.

12.5 million homes are now directly connected to the independent BT subsidiary’s fiber broadband network. The telco is aiming to cover 25 million by 2026, and is delivering its “full fiber” broadband to an average of 6,000 premises a week. DCD recently spent a day at an Openreach training center, to get a better grasp of what goes into deploying and maintaining fiber networks. “We are building at a furious rate, further and faster than any other network provider,” says Openreach MD for corporate affairs Catherine Colloms. “In terms of Openreach and its position in the UK, we want to be the national full fiber provider.” Its network is used by more than 650 Internet service providers, including Sky, BT, TalkTalk, and many more.

Fiber goes further

Britain built out its copper landline network over a period of some hundred years, beginning in 1881. When data arrived, broadband services pushed the limits of what copper lines could do. More was needed. Fiber optic cables, first made practical in 1965, were the answer.

They first arrived in the home broadband network in 2011 and, since then, the rollout has been quicker than copper. But the first fiber links were fed to roadside cabinets, delivering up to 100Mbps over the final few hundred yards of copper. Realistically, however, most in the country do not live that close to the cabinets, causing speeds to plummet dramatically. More recently, fiber to the home (FTTH) has appeared, allowing gigabit speeds for some 15 percent of the population.

Fast as this sounds, the UK’s fiber rollout is terribly slow in comparison with other countries. The country has languished near the bottom of the international leagues for broadband speeds, with the regulator Ofcom quoting a current average of 69.5Mbps.

Openreach says this will change. A single strand of a fiber cable can provide enough capacity to serve up to 32 individual properties with gigabit speeds, and the





infrastructure firm says that it has 29,000 field engineers who can splice around 30,000 fiber optic cables a day, as it adds fiber across the nation.

Given the topology of the UK, laying that fiber cable isn’t always as simple as just digging up a road. In some instances, especially in remote rural areas, there’s very little knowledge of the surrounding areas and where/how best to install a fiber cable. To speed up the process, Openreach has turned to DJI drones to help overcome obstacles such as trees, rivers, and valleys. It’s also seen as a safer method for workers, especially in hazardous areas. Openreach’s DroneOps division currently has 22 trained pilot crews across the country.

“I think we’re one of the only companies using drones to build our fiber networks,” an engineer at Openreach said. The company has been using drones for around five years. “With the drone, we’ll fly over potential obstacles, and we’ll attach the cable that will be attached to be erected on the poles and then pull it back. This will save our engineers having to climb onto buildings, etc,” they said.

What about the copper?

As Openreach continues to deploy its fiber network, the aging copper infrastructure is becoming increasingly redundant. In September, Openreach confirmed it had stopped selling new copper voice telephone lines on its national network after more than a century, and plans to retire the copper-based public switched telephone network (PSTN) by the end of 2025.

“We all knew at Openreach that full fiber would be the future,” Colloms says, noting its reliability and speed enable it to offer a significant upgrade on copper networks. “Fiber doesn’t suffer from some of the problems that occur with copper. It has lower latency and offers greater reliability. Unlike copper, which does not particularly like the British environment or water, nor certain things it interacts with, including revving motorbikes.”

That copper will eventually all be pulled out and recycled. Doing so will have another green benefit: At present, the whole of BT uses one percent of the UK’s total electricity. Some of that will be powering Openreach’s network, and fiber will use significantly less power.

Protecting the copper

Copper has another negative: It’s valuable. Thieves often rip up lines and steal the metal to re-sell for scrap. It’s led to multiple outages, and in some cases left small towns and villages cut off. The issue is even worse in the winter months. Openreach told DCD earlier this year that it has stepped up security at its sites to prevent copper theft.

“Openreach’s network is protected by an alarm system which notifies any malicious cut or theft,” says Richard Ginnaw, senior security manager, Openreach. “Our dedicated security team investigates all malicious network incidents. They work closely with the police and law enforcement agencies to arrest and prosecute anyone causing malicious damage or stealing from the network.”

Ginnaw says that security enhancements are installed in the network infrastructure in areas where the risk is high, while cables are also protected with a forensic marking solution, and all street cabinets are locked. “The security team conducts patrols where there is a known risk to the network and will deploy covert devices to monitor and protect the network from any attacks. The team also works closely with Crimestoppers in gathering information on offenders who target the network,” he added.

But, with fiber made out of glass, there’s little incentive to steal it, and so this issue will become redundant in the future.

Photography by Sebastian Moss

Cashing in

Full fiber connectivity is tipped to add £72 billion ($91.5bn) to the British economy, according to the Centre for Economics and Business Research. Openreach’s £15bn ($19bn) fiber buildout hopes to get the country there. But, with half of its build still to go, only time will tell if the company - and the country - can reap the rewards.


xIntegra is the latest step in the evolution of Eaton’s Data Centre power management capabilities and digital expertise, an integrated systems-engineered approach that shifts from the traditional mix-n-match design and procurement of individual elements to a group of intelligent components, acting together as a complete system along the power train. Through designed-in optimisation at both component and system level, xIntegra ensures system level performance and integrity at every lifecycle stage within your data centre – from design to implementation, operation to retirement.

To discover more visit www.eaton.com/xintegra





Pick your UPS flavor

A deep-dive into the different types of UPS systems

In the last issue of the magazine (#50), we explored how data centers use uninterruptible power supplies (UPSs) to power through grid outages.

This time, we will explore the two main types of uninterruptible power supply (static and rotary UPS systems), when each type is best deployed, and the conditions these systems work best in.

The design of a power system inside a data center is entirely dependent on the continuity requirements of the IT load. If the customer can live with shutting down all ICT services for either maintenance or failure, then the overall system design can be quite simplistic and low-cost. However, for the majority of data centers, a shutdown of ICT services is essentially not an option. In such cases the power must have two different power paths to each server - this is known as concurrent maintainability. The stronger the continuity requirement for the IT load, the more complex and costly the power system design will be. For the IT load to survive a worst-case scenario, another level of resilience is needed - otherwise known as fault tolerance.

As we have seen in the previous feature, the IEC Standard classifies UPS systems through three metrics - Input dependency, Output voltage waveform, and Dynamic output performance - ensuring common naming conventions between manufacturers. When looking at these systems through a data center lens, things tend to become simpler: static UPS and rotary UPS.
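For readers who want to see how those three metrics combine in practice, here is a minimal sketch that assumes the IEC 62040-3 style of classification code (for example, "VFI-SS-111"); the code strings and wording below are illustrative assumptions, not text from this article.

```python
# Illustrative sketch only: decodes an IEC 62040-3-style UPS classification code.
# The code format (e.g. "VFI-SS-111") is assumed here, not quoted from the article.

INPUT_DEPENDENCY = {
    "VFI": "Voltage and Frequency Independent - output unaffected by mains variations (double conversion)",
    "VI":  "Voltage Independent - output voltage regulated, frequency follows the mains",
    "VFD": "Voltage and Frequency Dependent - output follows the mains (standby/offline)",
}

WAVEFORM = {"S": "sinusoidal", "X": "non-sinusoidal under non-linear load", "Y": "non-sinusoidal"}


def describe_ups(code: str) -> str:
    """Turn a classification such as 'VFI-SS-111' into a readable description."""
    dependency, waveform, dynamics = code.split("-")
    return (
        f"{code}: {INPUT_DEPENDENCY[dependency]}; "
        f"waveform {WAVEFORM[waveform[0]]} in normal mode, "
        f"{WAVEFORM[waveform[1]]} in stored-energy mode; "
        f"dynamic output performance class {dynamics}"
    )


if __name__ == "__main__":
    # A data center double-conversion static UPS is typically marketed as VFI-SS-111.
    print(describe_ups("VFI-SS-111"))
```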

The static UPS

The static UPS is called “static” because, along its power path, there are no primary moving parts. A typical static UPS used in the data center space usually consists of a rectifier, an inverter, and a set of batteries. A typical representation of a static UPS can be seen in figure 1.

Within a data center, the most prevalent type of static UPS is a double conversion topology, in which the incoming utility electrical current is converted from AC to DC by the rectifier. This keeps the batteries charged, and is then converted back into a clean AC sine wave. This sine wave is cleaned due to the constant and continuous electrical current output from the set of batteries.


Vlad-Gabriel Anghel DCD>Academy


Don't interrupt 


Regardless of how the internal topology is set up, there will be a point where the DC current interfaces with the batteries - otherwise put, it keeps the batteries charged. When the input power cuts off (utility failure), the IT load runs off the set of charged batteries - giving enough time for the on-site generator to start and warm up.

A three-phase static UPS usually has a runtime of anywhere between five and 30 minutes. This runtime is calculated based on the size and the criticality requirement of the load and the capacity of the batteries. The capacity of the energy storage system is generally calculated to allow for enough time for the generator to start while still maintaining the availability of the IT load. Should the generator fail to come online, the UPS will be configured to properly shut down the load instead of just cutting power to it.

Static UPS systems are characterized by their scalability and reliability. With fewer moving parts by design, the mean time between failures for each component is higher than with rotary systems. Further, the static UPS system is scalable, with modules added or removed to account for changes in load. That is not to say that they are without disadvantages, mostly arising from the energy storage system’s reliance on batteries. Because the whole IT load is kept alive by the batteries, the overall system footprint is relatively large. Maintenance and replacement is an ongoing task with multiple sets of batteries. The system can also be quite heavy, posing more installation and servicing challenges.
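As a rough illustration of that sizing logic, the sketch below estimates battery autonomy from installed energy and IT load, and checks it against an assumed generator start window. All figures, efficiency values, and derating factors are assumptions for illustration only; real sizing must account for battery ageing, temperature, and discharge-rate effects.

```python
# Rough battery-runtime estimate for a double-conversion static UPS.
# Every number below is an illustrative assumption, not a figure from the article.

def runtime_minutes(battery_kwh: float, it_load_kw: float,
                    inverter_efficiency: float = 0.95,
                    usable_fraction: float = 0.8) -> float:
    """Minutes of autonomy for a given usable battery energy and IT load."""
    usable_kwh = battery_kwh * usable_fraction   # don't count on 100% of nameplate energy
    drawn_kw = it_load_kw / inverter_efficiency  # battery must also cover conversion losses
    return usable_kwh / drawn_kw * 60


if __name__ == "__main__":
    battery_kwh = 200.0      # installed battery energy (assumed)
    it_load_kw = 500.0       # critical IT load (assumed)
    generator_start_min = 2  # assumed time for the standby generator to start and accept load

    autonomy = runtime_minutes(battery_kwh, it_load_kw)
    print(f"Estimated autonomy: {autonomy:.1f} minutes")
    print("Covers generator start window" if autonomy > generator_start_min
          else "Batteries undersized for this load")
```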

The rotary UPS

In a rotary UPS, the output sine wave is the result of rotating generation - like a dynamo or a flywheel - as opposed to the static UPS, in which the output sine wave is generated directly from the energy storage system.

In the rotary UPS represented in figure 2, a motor is fed directly from the utility, which in turn provides mechanical power to the generator component to effectively generate a clean AC sine wave that is fed directly to the IT load. When the UPS detects that utility voltage and frequency parameters are out of the required limits, the rectifier and the inverter provide controlled power to the motor, which is coupled to the generator feeding the IT load. In case of a full blackout, a set of batteries provides power to the motor, which will provide sufficient runtime for the standby generator (outside of the UPS) to ramp up to full speed and provide power to the facility.
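That behavior can be thought of as a simple supervisory state machine, sketched below. The voltage and frequency windows, timings, and function names are illustrative assumptions, not values from the article or any standard.

```python
# Simplified supervisory logic for the rotary UPS described above.
# Thresholds and timings are illustrative assumptions, not vendor or standards values.

from dataclasses import dataclass

@dataclass
class GridSample:
    voltage_pu: float   # measured voltage, per-unit of nominal
    frequency_hz: float


def select_mode(sample: GridSample, battery_seconds_left: float,
                standby_generator_ready: bool) -> str:
    """Pick the power path feeding the motor-generator set."""
    voltage_ok = 0.9 <= sample.voltage_pu <= 1.1        # assumed acceptance window
    frequency_ok = 49.5 <= sample.frequency_hz <= 50.5  # assumed 50Hz grid

    if voltage_ok and frequency_ok:
        return "UTILITY"            # utility drives the motor directly
    if standby_generator_ready:
        return "STANDBY_GENERATOR"  # genset has ramped up and takes over
    if battery_seconds_left > 0:
        return "BATTERY"            # rectifier/inverter drive the motor from the batteries
    return "CONTROLLED_SHUTDOWN"    # ride-through exhausted


if __name__ == "__main__":
    print(select_mode(GridSample(1.0, 50.0), battery_seconds_left=120, standby_generator_ready=False))
    print(select_mode(GridSample(0.0, 0.0), battery_seconds_left=120, standby_generator_ready=False))
    print(select_mode(GridSample(0.0, 0.0), battery_seconds_left=0, standby_generator_ready=True))
```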

Unlike static UPSs, rotary UPS systems are not modular and, therefore, need to be oversized to accommodate any future increases in the load. They also require additional ventilation as motor-generators will fill work areas with fumes during runtime. In the majority of cases they are placed in a special outbuilding or in a custom built room. Both static and rotary UPS systems are considered highly reliable. Because they don’t rely on large battery capacities to power the whole IT load and instead have a motor, rotary UPSs can have a longer lifespan if properly and frequently maintained. Due to their nature, rotary UPS systems are better suited for a centralized power architecture approach to the data center whereas static UPS systems can lend themselves to a more distributed solution. Rotary UPS systems are also ideal for environments where multiple short inrush currents of power are expected such as satellite stations or broadcast stations where power hungry amplifiers turn on or off randomly. Within the data center space, rotary UPS systems represent a niche in the market and are mostly found within Europe. Static UPS systems dominate the data center space, but as load availability requirements climb a mixture of UPS types can be found in data centers with loads in the megawatts. As with everything within the data center space, there really is no one-size-fits-all approach and this is especially true when designing its power train. Each type of system comes with its own advantages and disadvantages, and careful consideration must be given to selecting one. In the next part of this series we will dig deeper into the most common types of actual UPS systems currently deployed in data centers. 




Credit Nvidia



A deep dive 


Microfluidics: Cooling inside the chip

Peter Judge Executive Editor

If you think immersion tanks are the end game for liquid cooling, think again. DCD hears from the engineers who want coolant to flow inside your chips

We all know that liquid cooling is the future for data centers. Air simply can’t handle the power densities that are arriving in data halls, so dense fluids with a high heat capacity are flowing in to take over. As the heat density of IT equipment increases, liquids have inched ever closer to it. But how close can the liquids get?

Running a water-circulating system through the rear doors of data center cabinets has become well-accepted. Next, systems have been circulating water to cold plates on particularly hot components, such as GPUs or CPUs. Beyond that, immersion systems have sunk whole racks into tanks of dielectric fluid, so the cooling liquid can contact every part of the system. Major vendors now offer servers optimized for immersion.

But there is a further step. What if the fluid could be brought closer to the source of that heat - the transistors within the silicon chips themselves? What if coolants flowed inside processors?

Husam Alissa, a director of systems technology at Microsoft, sees this as an exciting future option: “In microfluidics, sometimes referred to as embedded cooling, 3D heterogeneous, or integrated

cooling, we bring the cooling to the inside of the silicon, super close to the active cores that are running the job.” This is more than just a better cooling system, he says: “When you get into microfluidics, you're not only solving a thermal problem anymore.” Chips with their own cooling system could solve the problem at the source, in the hardware itself.

Birth of microfluidics

In 1981, researchers David Tuckerman and R F Pease of Stanford suggested that heat could be removed more effectively with tiny “microchannels” etched into a heatsink using similar techniques to those used in silicon foundries. The small channels have a greater surface area and remove heat more effectively. The heatsink could be made an integral part of VLSI chips, they suggested, and their demonstration proved a microchannel heatsink could support a then-impressive heat flux of 800W per sq cm.

From then on, the idea has persisted in universities but only tangentially affected real-life silicon in data centers. In 2002, Stanford professors Ken Goodson, Tom Kenny, and Juan Santiago set up Cooligy, a startup

with an impressive design of “active microchannels” in a heatsink built directly onto the chip, along with a clever silent solid-state electrokinetic pump to circulate the water. Cooligy’s ideas have been absorbed by parts of the mainstream. The company was bought by Emerson Network Power in 2005. Its technology, and some of its staff, still circulate in Emerson’s new incarnation, Vertiv. The idea of integrating cooling and processing became more practical as silicon fabrication developed and went into three dimensions. Starting in the 1980s, manufacturers experimented with building multiple components on top of each other on a silicon die. Making channels in the upper stories of a multi-layer silicon chip is potentially a quick win for cooling, as it can start simply by implementing tiny grooves similar to the fins seen on heatsinks. But the idea didn’t get much traction, as silicon vendors wanted to use 3D techniques to stack active components. That approach is now accepted for highdensity memory, and patents suggest that Nvidia may be intending to stack GPUs. In the microprocessor industry, cooling and processing were seen as separate disciplines. Chips had to be designed to dissipate their heat, but this was done by



relatively unsophisticated means, using thermal materials to siphon the heat to the big copper heatsink on the surface. The heatsink could be improved by etching smaller channels, but it was a separate item, and heat had to cross a barrier of adhesive to get there.

But some researchers could see the possibilities. In 2020, Tiwei Wei, of the Interuniversity Microelectronics Centre and KU Leuven in Belgium, integrated cooling and electronics in a single chip.


Wei, whose work was published in Nature in 2020, did not think the idea would catch on in microprocessors, saying that micro cooling channels would be more useful in power electronics, where large-sized chips made from semiconductors like gallium nitride (GaN) actually manage and convert electricity within the circuits. That possibly explains why Emerson/Vertiv wanted to get hold of Cooligy, but Wei didn’t see the tech going further: “This type of embedded cooling solution is not meant for modern processors and chips like the CPU,” he told IEEE Spectrum.

Digging into the chips

Already, by that time, researchers had been working on etching microfluidic channels into the surface of silicon chips for some years. A team at Georgia Tech working with Intel in 2015 may have been the first to make FPGA chips with an integrated microfluidic cooling layer, on top of the silicon, “a few hundred microns [micrometers] away from where the transistors are operating.”

“We have eliminated the heat sink atop the silicon die by moving liquid cooling just a few hundred microns away from the transistor,” team leader Georgia Tech Professor Muhannad Bakir said in Georgia Tech’s press release. “We believe that reliably integrating microfluidic cooling directly on the silicon will be a disruptive technology for a new generation of electronics.”

In 2020, researchers at the École Polytechnique Fédérale de Lausanne in Switzerland took this further, actually running fluid in tunnels underneath the heat-generating transistors. Professor Elison Matioli saw the opportunity to bring things even closer together: “We design the electronics and the cooling together from the beginning,” he said in 2020, when his team’s paper was published in Nature.

Matioli’s team had managed to engineer a 3D network of microfluidic cooling channels within the chip itself, right under the active part of each transistor device, just a few micrometers away from where the heat is produced. This approach could improve cooling performance by a factor of 50, he said.

Matioli etched micrometer-wide slits in a gallium nitride layer on a silicon substrate, and then widened the slits in the silicon substrate to form channels that would be big enough to pump a liquid coolant through. After that, the tiny openings in the gallium nitride layer were sealed with copper, and a regular silicon device was created on top. “We only have microchannels on the tiny region of wafer that’s in contact with each transistor,” he said at the time. “That makes the technique efficient.”

Matioli managed to make power-hungry devices like a 12kV AC-to-DC rectifier circuit which needed no external heatsink. The microchannels took fluid right to the hotspots and handled incredible power densities of 1.7kW per sq cm. That is 17MW per sqm, multiple times the heat flux in today’s GPUs.

On to standard silicon

Meanwhile, work continues to add microfluidics into standard silicon, by creating microfluidic structures on the back of existing microprocessors. In 2021, a Microsoft-led team, including Husam Alissa, used “micropin” fins etched directly on the backside of a standard off-the-shelf Intel Core i7-8700K CPU.

“We actually took an off-the-shelf desktop-class processor, and removed the case,” he says. Without the heat spreader cover and the thermal interface material (TIM), the silicon die of the chip was exposed.

“When that die was exposed, we applied etching methods to carve out the channels that we want to see,” he continues. The back of the die was etched away selectively, to a depth of 200 microns, leaving a stubble-field pattern of rods 100 microns thick - the “micropins” that form the basis of the integral direct-to-chip cooling system.

That’s a delicate task, warns Alissa: “You have to consider how deep you are etching, so you are not impacting the active areas of the silicon.”


Finally, the back of the CPU die was sealed in a 3D-printed manifold, which delivered coolant to flow amongst the micropins. The chip was then overclocked to dissipate 215W of power - more than double its thermal design power (TDP), the energy it is designed to handle safely without overheating. Surprisingly, the chip was able to perform at this level using only room-temperature water delivered through the manifold.

The experiment showed a 44 percent reduction in junction-to-inlet thermal resistance, and used one-thirtieth of the volume of coolant per Watt that would have been needed by a conventional cold plate. The performance was evaluated with standard benchmark programs. This was the first time microfluidic channels had been created directly on a standard consumer CPU, and it achieved the highest power density yet with microfluidic cooling on an active CMOS device. The results show the potential to run data centers more efficiently without the need for energy-intensive refrigeration systems, the group reported in IEEE Xplore.

All that would be needed would be for the chip maker to mass-produce processors with etched micropins, and sell them packaged with a manifold attached in place of the usual heatspreader cap. If foundries like TSMC could provide their chips with built-in liquid cooling, that would change the dynamics of adoption.

It would also allow the technology to push boundaries further, says Alissa. “With cold plates, you might get water at 40°C (104°F) but with microfluidics


Drilling down into it 


you could probably have 80°C (176°F) and higher coming out of these chips, because the coolant is so close to the active cores,” he says. “This obviously enhances the efficiency and heat recovery benefits, paired with lower requirements for flow rate.”
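To see why a hotter outlet means less pumping, here is a minimal sketch of the underlying energy balance (Q = ṁ·cp·ΔT). The heat load and temperatures are illustrative assumptions, not measurements from the experiments described above.

```python
# Energy-balance sketch: coolant flow needed to remove a given heat load.
# Q = m_dot * cp * deltaT, rearranged for mass flow. Numbers are illustrative only.

CP_WATER = 4186.0  # J/(kg*K), specific heat capacity of water


def flow_rate_kg_per_s(heat_w: float, inlet_c: float, outlet_c: float) -> float:
    """Mass flow of water required to carry away heat_w with the given temperature rise."""
    delta_t = outlet_c - inlet_c
    return heat_w / (CP_WATER * delta_t)


if __name__ == "__main__":
    heat = 700.0  # W, roughly a high-end GPU's TDP (assumed for illustration)

    # Cold plate: modest temperature rise, e.g. 35C in, 45C out
    print(f"Cold plate (10K rise): {flow_rate_kg_per_s(heat, 35, 45)*1000:.1f} g/s")

    # Microfluidics: coolant close to the cores can leave much hotter, e.g. 40C in, 80C out
    print(f"Microfluidic (40K rise): {flow_rate_kg_per_s(heat, 40, 80)*1000:.1f} g/s")
```

With the same heat load, quadrupling the temperature rise cuts the required flow to a quarter, which is the "lower requirements for flow rate" benefit Alissa describes, and the hotter outlet water is also more useful for heat recovery.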

The future of microfluidics

“There are two main flavors of microfluidics,” says Alissa. The lighter touch option, which he says could be deployed “in a couple of years,” is the approach his team showed - to etch channels in commercial chips: “Go buy chips, do the etching, and you’re done.” A more fully developed version of this approach would be for the foundries to do the etching before the chip reaches the consumer - because not everyone wants to lever the back off a processor and attack it with acid.

Beyond that, there is what Alissa calls the “heavier touch” approach. In this, you “intercept early at the foundry and start building 3D structures.” By this, he means porous chips which stack components on top of each other with coolant channels in the layers between. That’s a development based on the approach used by Matioli in Lausanne. As Alissa says, “That promises more but, obviously, it's more work.”

Alissa has a goal: “The North Star we want to get to is where we're able to jointly optimize this chip for cooling and electrically at the same time, by stacking multiple dies on top of each other, with [microchannel] etching in between.”

If cooling allows multiple components to be stacked in one deep silicon die, connected by “through chip vias” (TCVs) - copper connections through the silicon die - these tower chips could need lower energy and work much faster, as the components are closer together: “Overall, you're gaining on performance, you're getting on cooling, and also on latency because of the proximity.”

There’s another benefit. If microfluidics allows chips to go to a higher thermal design point (TDP), this could remove one of the hurdles currently facing silicon designers.

The difficulty of removing heat means that today's largest chips cannot use all their transistors at once, or they will overheat. Chips have areas of “dark silicon” (see box), and applying microfluidics could allow designers to light those up, boosting chip performance.

But don’t expect microfluidics to solve everything. Back in 2012, Professor Nikos Hardavellas predicted the next problem: “Even if exotic cooling technologies were employed, such as liquid cooling coupled with microfluidics, power delivery to the chip would likely impose a new constraint.”

Once we work out how to get more heat off the chip, we will have to develop ways to deliver a large amount of power that can provide signal integrity at the low voltages required by the transistors. Are we ready for that one?

This feature originally appeared in our Future of Cooling Supplement. Read the rest for free today.

DARK SILICON

Current and future generations of chips have a fundamental problem. Performance has always increased, as more transistors are packed into a single processor. But now, there are so many that they cannot all be used at once, without the chip overheating.

Processor makers publish a thermal design power (TDP) for each chip, which is the amount of energy it can handle and dissipate safely - and will assume there is a good heatsink on the chip. TDPs have grown very high. For example, the H100 SXM5 Nvidia GPU has a 700W TDP, which is massive compared with standard CPUs like Intel Xeons, which consume around 130W.

But how much can you do with this power? Currently, transistors fabricated at 4nm consume a tiny 10 attojoules (10 x 10⁻¹⁸ Joules) each, so if one of them switched at 1.8GHz, it would consume 18 nanowatts (18 x 10⁻⁹ W). That is tiny, but today’s processors have colossal numbers of transistors. Jon

Summers of the Swedish research institute RISE calculates that the Nvidia H100 GPU, which has 80 billion transistors, would generate 1,440W - more than double the TDP Nvidia publishes for it. “With a TDP of 700W, it must mean that 51 percent of the chip is dark silicon,” Summers told an audience at DCD Connect London in November 2023.

Continued miniaturization won’t fix the situation. Smaller transistors have a lower switching energy, so more can be lit up within the TDP envelope, but the number of transistors is also going up. Summers says that Intel plans to have a trillion transistors on a chip by 2030, each using around 1aJ per switch. If the clock speed has gone up to 4GHz, the chip is 1,000 sq mm, and the heat flux limit has risen to 2.4MW per sqm, then that means 40 percent of the transistors must remain dark.

Now, TDPs are based on a maximum heat flow (or flux) that can be removed from a chip. The Nvidia H100 has an area of 814 sq mm, so the heat flux is

860kW per sqm. That is comparable to the levels that are seen in nuclear fusion demonstrations, and Summers expects Intel to push on to 2.4MW per sqm. The issue of dark silicon has been known about for a long time: In 2012, Professor Nikos Hardavellas of Northwestern University, said in the magazine of the Advanced Computing Association, Usenix: “Short of a technological miracle, we head toward an era of “dark silicon,” able to build dense devices we cannot afford to power. Without the ability to use more transistors or run them faster, performance improvements are likely to stagnate unless we change course.” There have been plenty of approaches to the problem, most notably, increasing the use of specialized cores within chips, that are only used when needed. But maybe a way to reduce dark silicon would be if fluids could flow inside the chip itself, where they can remove more energy, and allow more heat flux. 
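The arithmetic in this box can be reproduced in a few lines. This is a minimal sketch of the back-of-envelope numbers quoted above (switching energy × transistor count × clock rate versus TDP and heat flux), not a model of real chip power.

```python
# Reproduces the back-of-envelope dark-silicon numbers quoted in this box.
# Switching power = energy per switch * switching rate; "dark" fraction = what the TDP cannot cover.

def dark_fraction(transistors: float, joules_per_switch: float,
                  clock_hz: float, tdp_w: float) -> float:
    full_power_w = transistors * joules_per_switch * clock_hz
    return max(0.0, 1.0 - tdp_w / full_power_w)


if __name__ == "__main__":
    # Nvidia H100: 80 billion transistors, ~10 aJ per switch, 1.8GHz, 700W TDP
    h100_full = 80e9 * 10e-18 * 1.8e9
    print(f"H100 if fully lit: {h100_full:.0f} W")                              # ~1,440 W
    print(f"H100 dark silicon: {dark_fraction(80e9, 10e-18, 1.8e9, 700):.0%}")  # ~51%

    # Hypothetical 2030 chip: 1 trillion transistors, 1 aJ, 4GHz, 1,000 sq mm at 2.4 MW per sqm
    tdp_2030 = 2.4e6 * 1000e-6                                                  # 2,400 W budget
    print(f"2030 chip dark silicon: {dark_fraction(1e12, 1e-18, 4e9, tdp_2030):.0%}")  # ~40%

    # H100 heat flux from its 814 sq mm die
    print(f"H100 heat flux: {700 / 814e-6 / 1000:.0f} kW per sqm")              # ~860 kW per sqm
```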





 The GPU glut

The next great business model

Prepare for the GPU glut.

That might sound like a strange statement in the age of AI, where Nvidia’s best and most expensive™ are impossible to buy, and yet shipping in greater quantities than ever. But we’re in the boom phase. Generative AI is the only thing anyone can talk about, and the data center sector has gamely stepped up to meet the demand for high-density, liquid-cooled facilities with unprecedented power consumption.

This rapid pivot is both impressive and fully reasonable - new data center millionaires will be minted (alongside new AI billionaires) as the riches of the latest hype wave are shared across the industry.

And then, as is the way of things, comes the fall. There are multiple ways that this could play out.

One is that generative AI cannot live up to the hype - at some point, investors will expect them to make actual money and not just derivative art and hallucinated sentences. It’s not entirely clear current approaches can ever pull off all their proponents’ wild claims, nor that they won’t steadily decline in quality as the Internet fills with generative AI data that is fed back into future models.

Another possibility is that it does live up to the hype, but one company is exponentially better than the rest. Should that happen, the market will coalesce around the victor, and the GPU-rich assets of the bankrupt businesses will be left unused.

Other variants of the future are that models become significantly more efficient to train - “I think we're at the end of the era where it's going to be these giant, giant models,” OpenAI CEO Sam Altman said earlier this year. “We'll make them better in other ways.”

Or, perhaps, a more LLM-specific chip will prove far better at training or inferencing, rendering expensive GPUs unnecessary.

Each of these options leaves us with millions of high-end GPUs on the market, with little use.

For those who have missed out on the generative AI wave, it may be worth thinking about what could be built with those highly-capable chips. When Bitcoin crashed, ASICs were pulped by the thousand as they had little other use.

GPUs have broader applications, initially being used in video games, then simulation, and now also AI. Riding the next wave could lie in predicting what can be built with them when their price suddenly comes crashing down.  - Sebastian Moss, Editor-in-Chief


GLOBAL REACH Local Knowledge Delivering Mission-Critical Solutions Globally

for hyperscalers, colocation, and edge service providers. Our programmatic approach fully integrates core infrastructure requirements, from design/planning through installation to Day 2 support services, along with helping our customers maneuver and account for advanced technologies like AI.

Comprehensive Data Center Solutions Having a global data-center strategy as a Div 27 Telecom Data Center Provider means: • Anticipating (demand) requirements • Consistent methodologies and mechanisms in place for hiring, retention, and redeployment • Investing in long-term, program-level training • Reallocating resources to the next project for full-circle, repeatable, predictable services

Rapid Deployment | Scalability | On-time and On-budget Delivery Global Reach with a Single Vendor and Standard

BLACKBOX.COM

| 855.324.9909

| CONTACT@BLACKBOX.COM


The Business of Data Centers

A new training experience created by

