The flexible workstation
If one thing has become clear in 2023, it’s that flexible working is here to stay. AEC firms that resorted to ‘sticking plaster’ hacks to get them through the pandemic, are now looking for more robust solutions to support staff working from home.
Centralising workstations and data is key but there are many ways to skin this particular cat — public cloud, private cloud, or on-premise Virtual Desktop Infrastructure (VDI). The humble desktop workstation is becoming as much at home in the data centre as it is on the desk.
In the UK, there are plenty of specialist firms that will happily replace your office workstation resource with one in a dedicated server room or the cloud. Many are laser-focused on the AEC sector, bringing expertise in BIM-centric workflows and data management as well as the remote workstations themselves.
Inevidesk goes hard on price with its custom ‘pods’ (tinyurl.com/inevidesk-AEC). CreativeITC is addressing sustainability (see page WS44), Scan is applying its knowledge of desktops to the cloud (see page WS23), while IMSCAD has its eggs in many different remoting technology baskets (see page WS42).
Then, of course, there are the major public cloud service providers. Amazon Web Services (AWS), Google Cloud Platform (GCP) and Microsoft Azure give firms on-demand access to a wide variety of GPU-accelerated virtual workstations anywhere in the world. And in true public cloud fashion, everything is elastic, so firms can upscale and downscale as needs change. This can be done directly through the cloud provider or via multi-cloud platforms like Frame or Workspot.
Performance can vary dramatically between VMs, which is something we explore in our in-depth report on page WS30. If you don’t know your g4dn.xlarge from your NC16asT4v3 and everything in between, this is an essential read.
Public cloud has many benefits, particularly when it comes to global availability and IT management, but for performance alone, it’s impossible to compete with the desktop workstation. With desktops, instead of giving each user a slice of a multi-core CPU or GPU, they get a dedicated resource, often with a CPU optimised for frequency rather than number of cores.
Firms including HP and Lenovo have cottoned on to this and are now building rack mount and remote management capabilities directly into their personal workstations, blurring the boundaries between desktop and datacentre. You also get the simplicity of a 1:1 connection, so you don’t need to get involved with the complexity and cost of virtualisation.
The Lenovo ThinkStation P7 and PX, for example, harness the power of Intel
‘Sapphire Rapids’ CPUs and Nvidia RTX Ada Generation GPUs to handle some of the most demanding AEC workflows (see page WS14). Meanwhile, with the HP Z2 Mini G9 you get incredible rack density in a workstation optimised for CAD (see page WS42), all managed through the HP Anyware remoting solution.
Finally, cloud doesn’t always have to play second fiddle to desktop in terms of performance. UK firm Armari, through its ‘Ripper Rentals’ cloud workstation service, has the most powerful Intel Xeon W-3400 and AMD Ryzen Threadripper Pro workstations we’ve ever used. With custom liquid cooling they push Intel’s and AMD’s flagship workstation processors to their absolute limits, delivering up to 19% more performance than standard air-cooled desktops (see page WS11).
When it comes to workstations, there’s no one-size-fits-all approach. Some firms go all in on cloud or VDI, others use a variety of desktop, mobile and virtual, wherever they make sense. Centralised workstations can offer massive benefits, delivering performance wherever work may take you, but then you’re always reliant on good connectivity. And, as a recent rail journey from London to Sheffield reminded me, you can still struggle to download a simple email attachment at times.
Intel Xeon ‘Sapphire Rapids’ vs AMD Ryzen Threadripper Pro for rendering, simulation, reality modelling, CAD and beyond
Intel has launched its long-awaited ‘Sapphire Rapids’ workstation processors, but do they have enough to surpass AMD’s Ryzen Threadripper Pro? Greg Corke puts these high-end CPUs through their paces
Ten years ago, it would have been unthinkable that Intel today would be playing catchup with AMD in workstation processors. But, the overwhelming success of AMD Ryzen Threadripper Pro, coupled with Intel’s failure to launch a true workstation-class processor since 2019, has led us to this precise situation. Intel desperately needs its new ‘Sapphire Rapids’ Xeon processors — specifically the Intel Xeon W-2400 and W-3400 — to be a success.
The chip giant certainly has its work cut out here. With Threadripper Pro, AMD delivered the holy grail of workstation processors, combining vast numbers of cores (up to 64) with high turbo frequencies and high memory bandwidth to deliver impressive performance wherever your workflows may take you. Whether it’s single threaded CAD, multithreaded rendering or memory intensive simulation, Threadripper Pro can handle pretty much anything you throw at it.
Not surprisingly, Intel has followed a similar tack for its new ‘Sapphire Rapids’ workstation processors — up to 56-cores, up to 4.8 GHz turbo and 8-channel DDR5 memory. It also follows AMD in terms of architecture. Like Threadripper Pro, ‘Sapphire Rapids’ processors feature a ‘chiplet’ design where several smaller chips are packaged together as one. This is in contrast to traditional monolithic designs, where all cores sit on a single, larger chip, which is more prone to manufacturing defects and therefore suffers lower yields and higher costs.
Intel has a much wider workstation-focused product range than AMD, with a total of fifteen models across its Intel Xeon W-2400 and W-3400 series (see chart on page WS6). In contrast, there are only six “Zen 3” Ryzen Threadripper Pro 5000 WX-Series models, sporting 12, 16, 24, 32 or 64 cores. All have 8-channel DDR4 3200 memory.
Intel Xeon W-2400 / W-3400
Intel differentiates its Xeon W-2400 and Xeon W-3400 processor families in two main ways: by number of cores and by memory channels.
The Xeon W-2400 Series is classified as a ‘mainstream’ workstation processor with eight models ranging from 6 to 24 cores and 4-channel DDR5 4800 memory.
Meanwhile, the Intel Xeon W-3400 Series is for ‘experts’ with seven models ranging from 12 to 56 cores and 8-channel DDR5 4400/4800 memory.
The new processors are composed entirely of ‘Golden Cove’ cores — they do not have the hybrid Performance Core (P-Core) / Efficiency Core (E-Core) architecture pioneered by 12th Gen and 13th Gen Intel Core processors.
‘Golden Cove’ is not Intel’s latest CPU architecture. It formed the foundation for the P-Cores in 12th Gen Intel Core.
Beyond the cores, there are some other significant differences between the two processor families. Compared to the Intel Xeon W-2400, the Intel Xeon W-3400 has more memory capacity (4 TB vs 2 TB), more PCIe lanes (112 vs 64) (so it can support more add-in GPUs), more Intel Smart Cache (L3), and a higher max base power (350W vs 225W).
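As a rough illustration of why those extra PCIe lanes translate into more add-in GPUs, here’s a back-of-the-envelope lane budget (a minimal sketch: the 16-lane reservation for storage and networking is an illustrative assumption, not a real platform figure, and motherboard designs vary):

```python
# Back-of-the-envelope PCIe lane budget (illustrative assumptions only).
LANES_PER_GPU = 16  # a full-bandwidth x16 slot per pro GPU

def max_gpus(total_lanes: int, reserved: int = 16) -> int:
    """GPUs that fit at x16 each, after reserving lanes for SSDs and NICs."""
    return (total_lanes - reserved) // LANES_PER_GPU

print(max_gpus(112))  # Xeon W-3400: (112 - 16) // 16 = 6
print(max_gpus(64))   # Xeon W-2400: (64 - 16) // 16 = 3
```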
As a first for Xeon processors, certain models — those with an X suffix — are unlocked so the processor can be overclocked. A range of tuning features are available through the Intel Extreme Tuning Utility (Intel XTU).
While it’s highly unlikely that major OEMs will ever go down the overclocking route, this level of control could leave the gates open for specialist workstation manufacturers to differentiate themselves by squeezing more performance out of the platform. This might be one for the future, however. Currently, there are no off-the-shelf All-in-One (AIO) water coolers that we know of for these power-hungry processors, although UK firm Armari has developed a custom liquid cooling solution for its Intel Xeon W-3400 rack workstation (see box out on page WS11).
Among the Intel Xeon W-2400 Series, the processors that stand out are the Xeon w7-2495X and w7-2475X which combine high core counts with the highest boost frequencies. The lower-end models may be suited to certain Finite Element Analysis (FEA) or other simulation tools that benefit from higher memory bandwidth but can’t necessarily take advantage of large numbers of cores. They can also provide a platform for multi-GPU workflows, such as GPU rendering.
There’s a similar pattern with the Intel Xeon W-3400 Series, with the higher end models featuring the largest number of cores and highest boost frequencies. The range tops out with the 56-core Intel
Xeon w9-3495X, with a base frequency of 1.9 GHz and a Turbo Boost Max 3.0 frequency of 4.80 GHz.
The lower-end CPUs in the family, such as the Intel Xeon w5-3425, could offer similar potential benefits for engineering simulation, plus support for even more GPUs. You can see the full specs in the tables below.
Meanwhile, Xeon W-2400 and Xeon W-3400 support the latest technologies, including PCIe Gen 5, DDR5 4400/4800 memory (which offers more memory bandwidth than Threadripper Pro’s DDR4 3200) and Intel WiFi 6E.
While the majority of workstations focus on the single socket, high core count Intel Xeon W-2400 and Xeon W-3400 Series, ‘Sapphire Rapids’ does not spell the end for dual processor workstations.
4th Gen Intel Xeon Scalable processors, which are primarily designed for servers, have already made their way into workstations from HP and Lenovo. The top-end model, the Intel Xeon Platinum 8490H, offers 60-cores per processor, which gives you a whopping 120 cores in a dual socket workstation. However, among the major OEMs, you’ll only see this chip in the Lenovo ThinkStation PX (read our review on page WS14) and, at $17,000 per processor, the market is somewhat limited. The HP Z8 G5 also comes with 4th Gen Intel Xeon Scalable processors, but only those models with up to 32-cores.
Test setup
For our testing we focused on the top end workstation processors from Intel and AMD — the 56-core Intel Xeon w9-3495X and 64-core AMD Ryzen Threadripper Pro 5995WX. We also tested the dual socket 60-core Intel Xeon Platinum 8490H.
You’ll find details of our test machines below. However, it should be noted that both Lenovo workstations were preproduction units, so they may be slightly different to the final shipping machines. Performance, for example, may increase with BIOS updates, so our test results should not be treated as gospel.
Lenovo ThinkStation P7
• Intel Xeon w9-3495X CPU (56-cores) (1.9 GHz base, 4.80 GHz Turbo Boost 3.0)
• 256 GB (8 x 32 GB) DDR5 4,800MHz memory
• 4 x Nvidia RTX A4000 GPU (16 GB)
• 2 TB Samsung PM9A1 SSD
• Microsoft Windows 11 Pro for workstations
• (read our review on page WS14)
Lenovo ThinkStation PX
• 2 x Intel Xeon Platinum 8490H CPUs (60-cores) (1.9 GHz base, 3.5 GHz Max Turbo)
• 256 GB (16 x 16 GB) DDR5 4,800MHz memory
• Nvidia RTX 6000 Ada Generation GPU (48 GB)
• 2 TB Samsung PM9A1 SSD
• Microsoft Windows 11 Pro for workstations
• (read our review on page WS14)
Scan 3XS GWP-ME A1128T
• AMD Ryzen Threadripper Pro 5995WX processor (64-cores) (2.7 GHz base, 4.5 GHz boost)
• 256 GB (8 x 32GB) Samsung ECC Registered DDR4 3200MHz memory
• Nvidia RTX 6000 Ada Generation GPU (48 GB)
• 2TB Samsung 990 Pro NVMe PCIe 4.0 SSD
• Microsoft Windows 11 Pro
• (read our review on page WS22)
Power hungry
To put it bluntly, Intel’s ‘Sapphire Rapids’ processors are very power hungry. Both the Intel Xeon w9-3495X and Intel Xeon Platinum 8490H processors have a base power of 350W. But this is only part of the story.
When rendering in Cinebench, for example, we observed 530W at the socket with the ThinkStation P7 and 1,000W at the socket with the ThinkStation PX. Even when rendering with a single core, the Lenovo ThinkStation P7 drew a substantial 305W.
That’s not to say that the Threadripper Pro 5995WX is that much better. With a default TDP of 280W, the Scan 3XS GWP-ME A1128T workstation still drew 474W at the socket when rendering in
Cinebench with all 64-cores. Finally, it’s important to note that all our tests were done with the ‘ultimate performance’ Windows power plan and power draw may be different with future BIOS updates.
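To put those wall-socket figures in context, here’s a quick cost sketch (the £0.30 per kWh tariff is an illustrative assumption, not a quoted rate, and real render times vary):

```python
# Rough electricity cost of a long render at the power draws we measured.
TARIFF_GBP_PER_KWH = 0.30  # illustrative assumption

def render_cost(watts: float, hours: float) -> float:
    """Cost in GBP of a render at a constant wall-socket draw."""
    return watts / 1000 * hours * TARIFF_GBP_PER_KWH

for name, watts in [("ThinkStation P7 (w9-3495X)", 530),
                    ("ThinkStation PX (2x 8490H)", 1000),
                    ("Scan 3XS (5995WX)", 474)]:
    print(f"{name}: £{render_cost(watts, 10):.2f} for a 10-hour render")
# £1.59, £3.00 and £1.42 respectively
```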
On test
We tested all three workstations with a range of real-world applications used in AEC and product development. We also compared performance figures from Intel’s and AMD’s ‘consumer’ processors, including 12th Gen Intel Core (Core i9-12900K), 13th Gen Intel Core (Core i9-13900K), and ‘Zen 4’ AMD Ryzen 7000 Series (AMD Ryzen 7950X), although we did not have data for all our benchmarks.
Computer Aided Design
CAD isn’t a key target workflow for Intel ‘Sapphire Rapids’ or AMD Ryzen Threadripper Pro. In fact, architects, engineers and designers that only use bread-and-butter design tools like Solidworks, Inventor and Revit will almost certainly be better served by 12th or 13th Gen Intel Core processors or AMD Ryzen 7000 (read our comparison article at www.tinyurl.com/13thGenRyzen7000).
Intel’s and AMD’s entry-level CPU families generally have fewer cores and less memory bandwidth, but higher clock speeds and higher Instructions Per Clock (IPC), which are important for these largely single threaded applications.
But these days, CAD is often just one of many tools used by architects, engineers
and designers, some of which do benefit from having more cores or higher memory bandwidth. So, it’s important to understand how ‘Sapphire Rapids’ performs in CAD.
We used Solidworks 2022 as our yardstick, a mechanical CAD application that is largely single threaded or lightly threaded, so only uses a few CPU cores.
As expected, the Intel Core i9-12900K, Intel Core i9-13900K and AMD Ryzen 7950X had a clear lead. Despite having fewer cores, their higher turbo frequencies and (apart from the Core i9-12900K) better IPC meant Intel and AMD’s high-end workstation processors simply couldn’t keep up.
The Xeon w9-3495X did show a small but significant lead over the Threadripper Pro 5995WX in the rebuild, convert and simulate tests. But the Xeon w9-3495X didn’t have things all its own way, lagging behind in the mass properties and boolean operations tests.
To get an idea of pure single threaded performance, albeit through a synthetic rendering test, we also used the Cinebench ST benchmark. Here the Xeon w9-3495X had a clear lead of 22% over the Threadripper Pro 5995WX. Interestingly, despite its significantly lower turbo frequency, the Intel Xeon Platinum 8490H wasn’t that far behind the AMD processor.
Reality modelling
Reality modelling is becoming much more prevalent in the AEC sector. Agisoft Metashape 1.73 is a photogrammetry tool that generates a mesh from multiple hi-res photos. It is multi-threaded, but uses multiple CPU cores in fits and starts. It also uses some GPU processing, but to a much lesser extent.
We tested using a benchmark from specialist US workstation manufacturer Puget Systems. The Threadripper Pro 5995WX just about edged out the Xeon w9-3495X in the smaller Rock model test but was 13% faster in the larger school map test. Interestingly, the Xeon Platinum 8490H was way off the pace. We wonder if the software spreads the load across both CPUs but is not optimised for this. It’s hard to explain this by the lower frequency alone.
Point cloud processing software, Leica Cyclone Register 360, assigns threads according to the amount of system memory. On a machine with 64 GB it will run on five threads and on one with 128 GB or more it will run on six.
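Based purely on the behaviour described above, the heuristic looks something like this (a hypothetical reconstruction from the two data points we know; the software’s actual rule may differ):

```python
# Hypothetical sketch of Cyclone Register 360's thread assignment, based only
# on the two configurations described above.
def register360_threads(system_ram_gb: int) -> int:
    if system_ram_gb >= 128:
        return 6
    if system_ram_gb >= 64:
        return 5
    raise ValueError("behaviour below 64 GB not covered by our observations")

print(register360_threads(64))   # 5 threads
print(register360_threads(256))  # 6 threads
```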
Chart: Impact of memory channels on performance (testing with Intel Xeon w9-3495X)
The Threadripper Pro 5995WX was 10% faster than the Xeon w9-3495X when registering our 99 GB dataset. Both CPUs lagged behind AMD’s and Intel’s consumer processors. Even though those test machines only had 64 GB of memory, and therefore ran on just five threads, their higher frequencies and IPC gave them the lead.
Rendering
Ray trace rendering is highly scalable. Roughly speaking, doubling the number of CPU cores halves the render time (if frequencies are maintained).
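That rule of thumb is essentially Amdahl’s law with a very high parallel fraction. A minimal sketch of the maths (the 98% parallel fraction is an assumption for illustration, not a measured figure):

```python
# Amdahl's law: speedup from N cores when fraction p of the work is parallel.
def speedup(cores: int, p: float = 0.98) -> float:
    return 1 / ((1 - p) + p / cores)

for n in (8, 16, 32, 64):
    print(n, round(speedup(n), 1))
# 8 -> 7.0, 16 -> 12.3, 32 -> 19.8, 64 -> 28.3: doubling the cores roughly
# halves render time early on, with diminishing returns at very high counts
```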
The Threadripper Pro 5995WX significantly outperformed the Xeon w9-3495X in KeyShot and V-Ray, two of the most popular tools for design visualisation, and in Cinebench R23, the benchmark for Cinema4D. The Threadripper Pro 5995WX was 35% faster in KeyShot, 27% faster in V-Ray and 20% faster in Cinebench. This is a considerable lead.

But the advantage that AMD’s top-end workstation processor holds over the Xeon w9-3495X is not just down to it having eight more cores. The relative energy efficiency of both processors and, therefore, the all-core frequencies they can maintain, has a major impact on performance.
In Cinebench, for example, the Threadripper Pro 5995WX maintained 3.05 GHz on all 64-cores while the Xeon w9-3495X went down to 2.54 GHz. The Xeon w9-3495X’s relationship between power, frequency and threads can be seen in more detail in the charts to the left.
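A crude way to sanity-check those numbers is to multiply cores by sustained all-core frequency (a naive estimate that deliberately ignores IPC, which is why it overstates AMD’s measured 20-35% lead; Golden Cove does more work per clock than Zen 3):

```python
# Naive multi-threaded throughput estimate: cores x sustained all-core GHz.
chips = {
    "Threadripper Pro 5995WX": (64, 3.05),  # GHz measured in Cinebench
    "Xeon w9-3495X": (56, 2.54),
}
score = {name: cores * ghz for name, (cores, ghz) in chips.items()}
ratio = score["Threadripper Pro 5995WX"] / score["Xeon w9-3495X"]
print(score)            # 195.2 vs 142.24 core-GHz
print(f"{ratio:.2f}x")  # ~1.37x naive, vs the 1.20-1.35x we actually measured
```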
Meanwhile, the dual Intel Xeon Platinum 8490H beat both single socket processors considerably. But with 120 cores and 240 threads to play with this came as little surprise.
Engineering simulation
Engineering simulation includes Finite Element Analysis (FEA) and Computational Fluid Dynamics (CFD). FEA can help predict how a product reacts to real-world forces or temperatures. CFD can be used to optimise aerodynamics in cars or predict the impact of wind on buildings. Both types of software are extremely demanding computationally.
There are many different types of ‘solvers’ used in FEA and CFD and each behaves differently, as do different datasets.
In general, CFD scales very well and studies should solve much quicker with more CPU cores. Importantly, CFD can also benefit greatly from memory bandwidth, as each CPU core can be fed data quicker. This is one area in which ‘Sapphire Rapids’ can outperform Threadripper Pro. Both have 8-channel memory, but ‘Sapphire Rapids’ uses faster DDR5 4,800MHz whereas Threadripper Pro uses DDR4 3,200MHz.

For our testing we used three select workloads from the SPECworkstation 3.1 benchmark: two CFD benchmarks (Rodinia, which represents compressible flow, and WPCcfd, which models combustion and turbulence) and one FEA benchmark (CalculiX, which models a jet engine turbine’s internal temperature).
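To see why memory bandwidth matters so much for CFD, it helps to work out the theoretical bandwidth available per core (peak = channels x transfer rate x 8 bytes per transfer; a simplification that ignores caches and real-world efficiency):

```python
# Theoretical peak memory bandwidth per core (channels x MT/s x 8 bytes).
def peak_gbps(channels: int, mts: int) -> float:
    return channels * mts * 8 / 1000  # GB/s

xeon = peak_gbps(8, 4800)          # 307.2 GB/s shared by 56 cores
threadripper = peak_gbps(8, 3200)  # 204.8 GB/s shared by 64 cores
print(round(xeon / 56, 1), round(threadripper / 64, 1))  # 5.5 vs 3.2 GB/s per core
```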
In Rodinia, the Xeon w9-3495X outperformed the Threadripper Pro 5995WX by a whopping 101%. In WPCcfd, the lead was smaller but, at 13%, still significant. The performance of both processors was dwarfed by the dual Intel Xeon Platinum 8490H.
Both Intel processors fared much worse in the CalculiX (FEA) test, where the Threadripper Pro 5995WX took a substantial lead.
Memory bandwidth
In addition to cores, memory bandwidth is one of the main differentiators between workstation processors and their consumer counterparts.
This is governed largely by the number of memory channels each processor supports, but also by the type of memory.
Memory channels act as pathways
between the system memory and the CPU. The more channels a CPU has, the faster data can be delivered.
13th Gen Intel Core and the AMD Ryzen 7000 Series have two memory channels, while the Intel Xeon W-2400 Series has four, and Intel Xeon W-3400 Series, 4th Generation Intel Xeon Scalable and Threadripper Pro 5000 Series all have eight. To get the full memory bandwidth, all memory channels must be populated with memory modules, as was the case with all our test machines.
As mentioned earlier, ‘Sapphire Rapids’ Xeons have an advantage over the AMD Ryzen Threadripper 5000 Series as they support faster memory – DDR5 4,800MHz compared to DDR4 3,200MHz.
A quick run through the SiSoft Sandra benchmark shows the comparative memory bandwidth one can expect. The Threadripper Pro 5995WX recorded 139.27 GB/sec, while the Intel Xeon w9-3495X pulled 184.64 GB/sec and the dual Intel Xeon Platinum 8490H went up to 325.6 GB/sec. These figures help explain why Sapphire Rapids does so well in our memory intensive CFD benchmarks.
To see how memory bandwidth impacts performance in different workflows, we tested the Xeon w9-3495X with a variety of different memory configurations, from 1-channel with a single 32 GB DIMM, all the way up to 8-channels with 8 x 32 GB DIMMs. Interestingly, even with 6-channels, the Xeon w9-3495X edged out the Threadripper Pro 5995WX in memory bandwidth, delivering 141.21 GB/sec in SiSoft Sandra.
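Those SiSoft Sandra figures line up reasonably well with the theoretical peaks (channels x transfer rate x 8 bytes), with real-world efficiency in the 50-70% range:

```python
# Measured SiSoft Sandra bandwidth vs theoretical peak, per configuration.
def peak_gbps(channels: int, mts: int, sockets: int = 1) -> float:
    return sockets * channels * mts * 8 / 1000  # GB/s

configs = [  # (label, theoretical peak, measured GB/s)
    ("5995WX, 8ch DDR4-3200",   peak_gbps(8, 3200),            139.27),
    ("w9-3495X, 8ch DDR5-4800", peak_gbps(8, 4800),            184.64),
    ("w9-3495X, 6ch DDR5-4800", peak_gbps(6, 4800),            141.21),
    ("2x 8490H, 8ch DDR5-4800", peak_gbps(8, 4800, sockets=2), 325.60),
]
for label, peak, measured in configs:
    print(f"{label}: {measured:.1f} of {peak:.1f} GB/s ({measured / peak:.0%})")
# 68%, 60%, 61% and 53% of theoretical peak respectively
```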
As most of our benchmarks fit into 32 GB of memory, the fact that we reduced the capacity should have minimal impact on results, although it can’t be ignored altogether. The exception is our Leica Cyclone Register 360 test, which adjusts the number of cores used in relation to system memory. This is why performance drops off massively with 32 GB.

As you can see from the charts on page WS9, memory bandwidth has a massive impact on performance in the WPCcfd benchmark. Interestingly, even with 6-channels filled, the Intel Xeon w9-3495X outperforms the AMD Ryzen Threadripper Pro 5995WX.

Another workflow massively influenced by memory bandwidth is recompiling shaders in Unreal Engine 4.26, which uses all available cores. However, what the Threadripper Pro 5995WX loses in GB/sec it makes up for in cores and all-core frequency, as it still managed to beat the Xeon w9-3495X in our automotive benchmark.

Performance in CAD (Solidworks), ray trace rendering (V-Ray) and reality modelling (Leica Cyclone Register 360 and Agisoft MetaShape Professional 1.73) appears to be virtually unaffected by memory bandwidth. There are a couple of caveats in Solidworks. In the simulation test, performance dropped a little when going from 4-channels to 1-channel. In boolean operations, 1-channel memory actually delivered marginally better results.

Conclusion
The importance to Intel of ‘Sapphire Rapids’ Xeon W-2400 and Xeon W-3400 being a success cannot be overstated. For the last few years AMD has had little in the way of competition in workflows that benefit from many cores or high memory bandwidth. Intel will certainly have felt the impact of Threadripper Pro.

From our tests, however, Sapphire Rapids is not going to be the Threadripper Pro 5000 WX-Series killer we thought it might be, at least in the broader AEC sector.

In ray trace rendering, the 64-core Threadripper Pro 5995WX still has a considerable lead over the 56-core Xeon w9-3495X. And while Intel may win out at certain price points, simply because it has so many different models across its Xeon W-2400 and W-3400 families, we certainly don’t expect viz specialists to move to ‘Sapphire Rapids’ en masse. Plus, as you move down the range, it will face more competition from 13th Gen Intel Core.

But ‘Sapphire Rapids’ does have some big plusses. In single threaded workflows it appears to have a lead over Threadripper Pro, which could make a real difference in some CAD/BIM applications. Better single threaded performance should also boost 3D frame rates in CPU-limited applications.

We found the biggest potential benefit of ‘Sapphire Rapids’ to come from engineering simulation, specifically CFD. Our tests show that ‘Sapphire Rapids’ can deliver a massive performance boost, largely thanks to its superior memory bandwidth. While solvers and datasets vary, serious users of tools from Ansys, Altair and others should certainly explore what the Xeon W-3400 and 4th Gen Intel Xeon Scalable processors can do for them. Extremely complex simulations can take hours, even days, to run. Cutting this time in half could deliver monumental benefits to a project.

All of this is exciting, but one can’t help but keep one eye on the future. AMD is expected to launch its next generation ‘Zen 4’ Threadripper Pro CPUs later this year. And, if rumours of 96-cores and 12-channel memory (DDR5) become a reality, then any lead Intel might have could be short lived.

Overclocking: pump up the power
Intel’s single socket ‘Sapphire Rapids’ workstation processors can be overclocked. This requires more power to be pumped into the CPU, which, of course, means more heat and therefore liquid cooling. While none of the major OEMs get involved with this, UK firm Armari is an expert.

For the Intel Xeon w9-3495X, Armari has developed a custom water-cooling solution for its 2UR56SR Node, a rack workstation available through its ‘Ripper Rentals’ cloud workstation service. It allows the CPU to sustain up to 500W on all-core boost — a full 150W above its default TDP.

Armari also has a similar offering for the Threadripper Pro 5995WX, the 2UR64TP-RW Node. We put both machines through their paces in Cinebench R23. The Intel Xeon w9-3495X machine hit 2.88 GHz on all cores, 0.3 GHz faster than the air-cooled Lenovo ThinkStation P7. This delivered a score of 69,811, equating to a significant 19% performance uplift. The Threadripper Pro machine scored 76,117, corresponding to an 8% performance uplift.

Armari also offers an overclocked desktop Threadripper Pro workstation, which we reviewed in the January/February 2023 workstation special report. ■ www.armari.com
Review: Lenovo ThinkStation PX and P7
These are two of the best designed and manufactured workstations we’ve ever seen, built for desktop or datacentre, but AEC firms will need to look closely at which workflows will benefit from the new ‘Sapphire Rapids’ Intel Xeon processors inside
Price £POA
www.lenovo.com/workstations
Lenovo has played its workstation hand extremely well over the last few years. In 2020, while HP and Dell continued to rely on ageing Intel ‘Cascade Lake’ processors to power their high-end workstations, Lenovo embraced AMD Ryzen Threadripper Pro and the ThinkStation P620 was born.
The processor’s 64-cores gave Lenovo a significant performance advantage in a range of multi-threaded workflows, from simulation to ray trace rendering. Intel had nothing that came remotely close, but now with its new ‘Sapphire Rapids’ workstation processors, this is about to change.
And Lenovo is certainly going big with ‘Sapphire Rapids’. Its new workstations, the ThinkStation PX (pronounced P10), P7 and P5 arrived with considerable fanfare in March 2023. The striking black and red design is the result of a collaboration with legendary automaker Aston Martin. The workstation’s front grill and side panel’s flush handle are classic Aston Martin.
The flagship ThinkStation PX is the most expandable of the new machines, featuring dual 4th Gen Intel Xeon Scalable processors (up to 2 x 60-cores), up to 2 TB of DDR5 4,800MHz memory, and up to four dual-slot GPUs, including the Nvidia RTX 6000 Ada Generation. The machine is designed to handle the most demanding multi-threaded or multi-GPU workflows such as Computational Fluid Dynamics (CFD), ray trace rendering and video editing.
The ThinkStation P7 comes with a choice of workstation-specific Intel Xeon W-3400 Series processors (up to 56-cores), and up to 1 TB of DDR5 4,800MHz memory. The single socket machine will likely hit the price/performance sweet spot for many visualisation and simulation workflows, especially those that want the combination of high clock speeds for single threaded operations and 56-cores. It can also support
up to three dual-slot GPUs.
The ThinkStation P5 features Intel Xeon W-2400 Series CPUs (up to 24 cores) and up to two dual-slot GPUs. Lenovo calls the P5 an ‘industry workhorse’ and it looks well suited to a wide range of workflows from CAD and visualisation to simulation and reality modelling, although we expect it will face stiff competition from Lenovo’s Intel Core-based workstations.
Rack optimised
The ThinkStation PX and P7 were built from the ground up to be ‘rack optimised’ and offer several features to transform these desktop machines into what Lenovo describes as ‘hybrid cloud workstations’, with remote management capabilities like those found in rack servers.
This includes an optional Baseboard Management Controller (BMC) card that gives IT managers ‘full remote management’. According to Lenovo, it will enable them to monitor the workstation, cycle on and off, perform BIOS or firmware updates and re-image the machine if necessary. In addition to data centre deployments, this could be of interest to IT managers supporting those working from home.
The machines also feature enhanced on-board diagnostics, with a small LCD display on the front that shows a QR code in the event of a system error – even out of band failure states when a machine won’t turn on. The user simply snaps the code with their smartphone camera and is taken directly to the relevant page on the Lenovo service website.
Lenovo ThinkStation PX design
As Lenovo’s flagship ‘Sapphire Rapids’ desktop workstation, it’s hardly surprising that the ThinkStation PX has the most impressive chassis. The build quality is superb, arguably the best we’ve seen in any workstation. The solid metal chassis has handles built into all four corners. It feels incredibly strong. And it certainly needs to be. Our test machine was heavy enough with a single GPU, single PSU and no Hard Disk Drives (HDDs). Carrying a ThinkStation PX around is a two-person job. Lifting it into a rack could be an Olympic sport.
The ThinkStation PX is primarily a desktop workstation, but it’s also been built from the ground up for the datacentre with a rack optimised ‘5U’ design. Bolt holes are hidden under a removable top cover, making it easy to deploy in a standard 19-inch rack with the optional sliding rack rail kit.
For resiliency and redundancy, the machine comes with an optional second
rear hot-swappable 1,850W power supply unit (PSU), so should one PSU fail, the machine will carry on working. There’s also a rear accessible power button and lockable front access hot swap storage, which includes options for both 3.5-inch Hard Disk Drives (HDDs) and Solid State Drives (SSDs). Up to two SSDs can also be mounted on the motherboard but will be hidden under a GPU in multi GPU configs.
Alongside the front drive bays, you’ll find the power button, headphone jack, LCD diagnostics display, two USB Type A and two USB Type C ports, which light up when powered on. This is a big plus to stop you scrabbling around in the dark.
There are plenty more ports at the rear – 6 x USB Type A and 1 x USB Type C, along with two RJ45 Ethernet ports (1GbE and 10GbE). There’s also an optional Intel AX210 WiFi PCIe adapter with antennas built into the top of the chassis.
Inside, the system is essentially split into two distinct sections with the motherboard offset from the side. Above the motherboard you’ll find CPU, memory and GPUs. Beneath the motherboard is storage and power supply units (PSUs).
The beauty of this design is that the components that generate the most heat enjoy uninterrupted airflow from front to back. And considering that a fully-specced ThinkStation PX can house up to two 350W Intel Xeon Platinum CPUs, up to four 300W Nvidia RTX 6000 Ada Generation GPUs and up to 2TB of DDR5 memory spread across 16 DIMMs, it certainly needs all the help it can get.
To optimise thermals, Lenovo uses a tri-channel cooling system. Fresh air is drawn in through the ‘3D Hexperf’ front grill, the design of which was inspired by Aston Martin’s iconic DBS grand tourer. But it’s not just for looks. The spacing and shape of the rigid plastic grille, which has rounded spikes that protrude at the front, is optimised for maximum airflow.
The engineering star of the show is the redesigned ABS plastic air baffle, which acts as a wall of separation between the tri-channel cooling system’s three distinct zones. Each zone is fed by its own fans — the idea being that you don’t get any pre-heated air from the CPUs going into the GPUs and vice versa. The baffle also separates the CPUs and brings a different channel of fresh air to each, as well as to the memory DIMMs.
Despite the close attention to thermal engineering, the ThinkStation PX is not a silent machine. Fan noise was quite noticeable when rendering or solving Computational Fluid Dynamics (CFD) problems using both Intel Xeon Platinum 8490H processors. But this is hardly
surprising, as it drew 1,000W at the socket. Still, compared to a rack mounted server it’s an oasis of calm.
The ThinkStation PX scores very highly on serviceability with tool-free access on everything bar the CPUs. It’s not only one of the most beautifully engineered workstations we’ve ever seen; it also feels like everything has been manufactured to very low tolerances. This starts with the side panel which can be removed easily with a simple press and pull of the stylish flush handle. The panel effortlessly clicks back into place, which can’t be said of many desktop workstations.
All serviceable components are signposted with red touch points, from the replaceable fans with blind mate connectors and the PSU(s) at the rear, to brackets that hold the GPUs in place and levers to ease out the hard drive caddies. Aston Martin’s Cathal Loughnane reckons you don’t need a user manual. We wouldn’t go that far, but it’s certainly intuitive.
Lenovo ThinkStation P7 design
From the outside the ThinkStation P7 looks like a slimmed down version of the PX. It’s the same height, but not as deep or wide (4U, for racks). This means there are no front accessible drive bays, and all interior components are located on one side of the motherboard – CPU and memory in the middle, GPUs either side and PSU and HDD caddies at the bottom.
An air baffle channels cool air directly over the CPU, while both 4 DIMM memory banks have their own cooling fan units which clip off.
As the front CPU fans only need to cool a single Intel Xeon W-3400 Series processor, they are much smaller than those used on the ThinkStation PX. And, it seems, they don’t have to work as hard. When rendering in KeyShot with the single Intel Xeon w9-3495X processor, the machine was remarkably quiet, even though it drew 530W at the socket. And it can do this for hours on end. In KeyShot 2023, when rendering a multi-frame animation on all 56-cores, fan noise remained constant and the CPU maintained a steady 2.85 GHz.
The P7 follows the same design ethos as the PX with red touch points throughout. You don’t get quite the same level of serviceability, however. Once you clip out the cooling fans, for example, you must still disconnect the cables from the motherboard.
Elsewhere the chassis shares many of the same features as the PX – rear power button, built in WiFi, dual Ethernet, etc.
ThinkStation P7 / PX in action
Lenovo lent us a ThinkStation P7 and ThinkStation PX. These are pre-production units, so they may be slightly different to the final shipping workstations. Performance, for example, may increase with BIOS updates, so our benchmark results should not be treated as gospel.
The core specs can be seen below.
Lenovo ThinkStation P7
• Intel Xeon w9-3495X CPU
• 256 GB (8 x 32 GB) DDR5 4,800MHz memory
• 4 x Nvidia RTX A4000 GPU (16 GB)
• 2 TB Samsung PM9A1 SSD
• Microsoft Windows 11 Pro for workstations
Lenovo ThinkStation PX
• 2 x Intel Xeon Platinum 8490H CPUs
• 256 GB (16 x 16 GB) DDR5 4,800MHz memory
• Nvidia RTX 6000 Ada Gen GPU (48 GB)
• 2 TB Samsung PM9A1 SSD
• Microsoft Windows 11 Pro for workstations
CPU workflows
The ThinkStation P7 is built around the new workstation-specific Intel Xeon W-3400 Series processors, supporting up to 56-cores in a single socket. While it can’t match the ThinkStation PX for
number of cores, the Intel Xeon W-3400 boasts higher Turbo clock speeds, so will outperform the ThinkStation PX in general system operations and in workflows that can’t take advantage of more than 56-cores.
CAD is a classic single threaded application, and in Solidworks 2022 the ThinkStation P7 had a clear lead over the PX in everything but rendering. This lead also extended to reality modelling in MetaShape Pro (photogrammetry) and Leica Cyclone Register 360 (point cloud processing).
But in such single threaded or lightly threaded workflows, the ThinkStation P7 can’t hit the same heights as Lenovo’s mainstream workstations. The Lenovo ThinkStation P360 Ultra with 12th Gen Intel Core i9-12900K outperformed the ThinkStation P7 by a considerable margin. And this lead should grow even bigger with the P360 Ultra’s successor, the ThinkStation P3 Ultra, which features 13th Gen Intel Core processors.
But CAD users — at least those who only use CAD — are not really the intended audience for Lenovo’s ‘Sapphire Rapids’ workstations. The real beneficiaries will be those that have workflows that either benefit from a) lots of cores, such as ray trace rendering or simulation, b) from high memory bandwidth, such as Computational Fluid Dynamics (CFD), or c) just use colossal datasets that need lots of memory.
Of course, these are also workflows that are ideal for the AMD Ryzen Threadripper Pro 5000WX Series, the processor at the heart of the Lenovo ThinkStation P620.
While we don’t have benchmark figures for that specific machine, we do have them for another 64-core AMD Ryzen Threadripper Pro 5995WX-based workstation, the Scan 3XS GWP-ME A1128T workstation.
We found the Scan workstation (64-core Threadripper Pro 5995WX) outperformed the ThinkStation P7
(56-core Xeon w9-3495X) in all of our rendering benchmarks – V-Ray, KeyShot, Blender and Cinebench. Here the additional eight cores and higher all-core frequencies appear to make a big difference. In Cinebench for example, Scan’s Threadripper Pro 5995WX maintained a 3.05 GHz Turbo, while Lenovo’s Xeon w9-3495X peaked at 2.54 GHz. Of course, frequencies cannot be compared directly, as both processors deliver different Instructions Per Clock (IPC).
There was a different story with Computational Fluid Dynamics (CFD), testing the WPCcfd and rodiniaCFD workloads in the SPECworkstation 3.1 benchmark. The ThinkStation P7 had a small lead with WPCcfd and a substantial lead with rodiniaCFD. Here, we think Sapphire Rapids’ superior memory bandwidth gives it an advantage as it is able to feed its cores much quicker. While both AMD and Intel processors feature 8-channel memory, Intel has DDR5 4,800MHz which is much faster.
As one might expect, with 120 cores to play with, the ThinkStation PX had quite a considerable lead in both our rendering and CFD benchmarks.
We explore ‘Sapphire Rapids vs Threadripper Pro’ in more detail in the article on page WS4.
GPU workflows
Of course, the ThinkStation P7 and PX offer much more than just ‘Sapphire Rapids’ processors. They can also host multiple high-performance Nvidia pro GPUs, up to the Nvidia RTX 6000 Ada Generation (read our review on page WS24).
The main difference between the two machines is that the PX can support four double height GPUs or eight single height GPUs, whereas the ThinkStation P7 can support three double height or six single height.
Our ThinkStation PX came loaded with a single Nvidia RTX 6000 Ada Generation GPU. This is an incredibly powerful GPU for pro viz workflows with 48 GB of memory to handle colossal datasets. We got incredibly smooth graphics in our real-time viz tests with very high frame rates at 4K resolution in Enscape (118 FPS) and in Unreal Engine with the Audi Car Configurator model (64.5 FPS / 39.4 FPS with Ray tracing disabled / enabled).
Not surprisingly, it also delivered incredible scores in our GPU ray tracing benchmarks (KeyShot, V-Ray and Blender). To provide some context of what this might mean for day-to-day workflows,
in Solidworks Visualize with the 3ds Stellar rendering engine it finished a 4K resolution 1,000 pass render in 81 seconds and a 100-pass render with denoising in a mere 8 seconds. In KeyShot, with denoising enabled, it rendered our bike scene at 8K resolution with 128 samples in 24 secs.
The ThinkStation P7 was configured rather differently, with four Nvidia RTX A4000 GPUs, each with 16 GB of memory. The obvious use case for this setup is virtualisation, where the ThinkStation P7 could be carved up into four Virtual Machines (VMs), each with its own dedicated GPU.

The four GPUs could also be put to work in a single workstation, and we found enough collective power there to edge out a single Nvidia RTX 6000 Ada Generation in V-Ray, even though the Nvidia RTX A4000 is built on Nvidia’s older ‘Ampere’ architecture. With the Nvidia RTX 4000 Ada Generation GPU, which should launch later this year, we would expect a considerable performance uplift, probably more memory per GPU, and four GPUs to still cost less than a single Nvidia RTX 6000 Ada.

Of course, in a single workstation setup there are two big downsides to spreading all that GPU power across multiple boards: a) you’ll mostly only be able to harness the power of one of those GPUs for real-time visualisation, and b) the size of datasets will be limited by the memory capacity of a single board.

Conclusion
Lenovo has done an incredible job with its ‘Sapphire Rapids’ workstations. The aesthetic design, functional design and build quality of the ThinkStation P7 and (in particular) the ThinkStation PX are simply incredible. Partnerships with leading brands often feel very surface level, but the one with Aston Martin seems to have added real value.

The big question for many AEC firms is whether ‘Sapphire Rapids’ is the right workstation platform for them, or whether they might be better off with AMD Ryzen Threadripper Pro, available in the Lenovo ThinkStation P620.

Much of this depends on workflows. Our tests show that the ThinkStation P7 with the 56-core Intel Xeon w9-3495X wins out in single threaded software, such as CAD, and in workflows that are typically heavily bottlenecked by memory bandwidth, such as CFD. But the 64-core Threadripper Pro 5995WX offers significantly better performance for rendering, thanks in part to its additional eight cores.

Meanwhile, the ThinkStation PX with its dual Intel Xeon Platinum 8490H processors sits top of the tree in all our highly multi-threaded tests, but at $17,000 per processor, we feel the market for this level of performance will be quite limited. Plus, you must take a substantial hit in single threaded workflows.

Of course, ‘Sapphire Rapids’ for Lenovo’s workstations is not just about these top-end processors. For the ThinkStation P7, Lenovo offers a total of seven Intel Xeon W-3400 processors, ranging from 12 to 56 cores, compared to five for the Threadripper Pro 5000 WX-Series, so customers may find sweet spots where Intel wins out on price/performance.

The options for the ThinkStation PX feel more limited, with lower core count Intel Xeon Scalable processors competing with higher core count Intel Xeon W-3400 Series processors in the ThinkStation P7. Such configs may become more attractive when customers want to load up the workstation with four double height GPUs and don’t necessarily need tonnes of CPU performance.

Finally, it’s important to state that the ThinkStation P7 and PX are much more than just desktop workstations. By making them easily rack mountable, and offering server grade remote management and serviceability, they also give AEC firms the flexibility to support staff wherever they need to work.

Importantly, Lenovo’s ‘hybrid cloud workstation’ approach means AEC firms can manage the transition to hybrid working at their own pace, without having to jump in with both feet when investing in a centralised datacentre workstation resource.
Intel Xeon ‘Sapphire Rapids’ workstation round up
Our top picks of single socket Intel Xeon W-2400 / W-3400 Series and 4th Gen Intel Xeon Scalable workstations — desktop and rack
HP Z4, Z6, Z8 & Z8 Fury G5
HP has four ‘Sapphire Rapids’ workstations, the most of all the major vendors. Like Lenovo, it is also looking to blur the boundaries between desktop and datacentre, introducing several features more commonly found in servers, with a view to minimising downtime and enhancing system management. This includes hot swappable M.2 SSDs, redundant PSUs and the HP Anyware Remote System Controller to help IT managers better manage workstation fleets – desktop, rack and hybrid.

The HP Z4 G5 features the Intel Xeon W-2400 Series (up to 24-cores) and up to two dual slot GPUs. The HP Z6 G5 features the Intel Xeon W-3400 Series from 12 to 36 cores (not including the flagship 56-core model) and three dual slot GPUs. The HP Z8 G5 features ‘Sapphire Rapids’ 4th Gen Intel Xeon Scalable processors, but only up to 32 cores, and two dual slot GPUs. The HP Z8 Fury G5 supports the whole range of Intel Xeon W-3400 Series CPUs (up to 56-cores) and up to four dual slot GPUs.
■ www.hp.com/zworkstations
Boxx Apexx W3 / W4 & Raxx W3
Boxx’s ‘Sapphire Rapids’ workstation family is split neatly into three machines: the desktop Xeon W-2400 Apexx W3, the desktop Xeon W-3400 Apexx W4 (which can also be rack mounted), and the dedicated 3U rack, the Raxx W3, which features liquid cooled Xeon W-3400 processors.
The Apexx W3 can host two double slot GPUs, up to the Nvidia RTX 6000 Ada Generation, while the Apexx W4 and Raxx W3 can have four.
■ boxx.com
Scan 3XS Render Pro X6
Scan uses its expertise as a custom workstation manufacturer to offer something different to most others. Its Intel Xeon W-3400 Series-based 3XS Render Pro X6 makes the GPU the star of the show. It packs six Nvidia GeForce RTX 4090s into a Corsair 1000D chassis to create a massively powerful desktop workstation for GPU rendering.
The Nvidia GeForce RTX 4090 is a triple slot card out of the box, but Scan has stripped it down to single slot with a custom water cooler to keep thermals under control.
Scan also offers a more standard ‘Sapphire Rapids’ workstation. The 3XS Custom GWP 4677 features Intel Xeon W-2400 Series CPUs and dual GPUs up to the Nvidia RTX 6000 Ada Generation. ■ www.scan.co.uk/3xs
Dell Precision 5860 Tower, 7960 Tower & 7960 Rack
For its ‘Sapphire Rapids’ desktops, Dell has taken a different approach to HP and Lenovo. It has focused exclusively on Intel’s single socket workstation processors, the Xeon W-2400 (Precision 5860 Tower) and W-3400 (Precision 7960 Tower), ignoring the server-focused 4th Gen Intel Xeon Scalable altogether. However, with these two machines, Dell should have most bases covered in AEC, especially with the Precision 7960 Tower supporting four GPUs on top of its 56-cores.

4th Gen Intel Xeon Scalable still gets a look-in with the datacentre-focused dual socket Precision 7960 Rack. While on paper this 2U machine offers much greater user density compared to HP and Lenovo’s 4U / 5U offerings, it can only support up to two GPUs (whereas the Lenovo ThinkStation PX can support up to four), so those with more demanding GPU-centric workflows may lose out or have to use two machines instead of one, which will likely add to costs.
■ www.dell.com/precision
Workstation Specialists WS IXW-W7900 & WS IXW-W7901
UK firm Workstation Specialists offers two ‘Sapphire Rapids’ desktop workstations which differ largely by the number of GPUs they can support. The WS IXW-W7901 offers up to four double slot cards up to the Nvidia RTX 6000 Ada Generation, while the WS IXW-W7900 offers three.
Interestingly, on paper the WS IXW-W7901 should be able to do this with both Intel Xeon W-2400 and W-3400 Series processor options. This is in contrast to most workstation manufacturers who only offer four double slot GPUs with the more expensive Intel Xeon W-3400 Series processors.
So if your workflows are CPU light and GPU heavy, then configuring the WS IXW-W7901 with the entry-level Intel Xeon w3-2423, for example, could give you the power you need for GPU rendering without spending money on a high core count, high memory bandwidth CPU you don’t need. ■ www.workstationspecialist.com
Scan 3XS GWP-ME A1128T
With an Nvidia RTX 6000 Ada Generation professional GPU and 64-core Threadripper Pro CPU, this monster desktop workstation packs a serious punch for the most demanding design viz workflows
For its latest high-performance workstation, Scan has combined two of the most powerful workstation-class processors out there — the 64-core AMD Ryzen Threadripper Pro 5995WX CPU and the Nvidia RTX 6000 Ada Generation GPU.
Coupled with 256 GB of DDR4 memory and an ultra-fast 8TB SSD RAID 0 array, this machine will likely be the envy of most design viz artists.
Given that the Nvidia RTX 6000 Ada Generation is fresh off the production line (read our review on page WS24), it is arguably the silicon star of this workstation. With 48 GB of GDDR6 memory, the ultra-high-end GPU is well equipped to handle the most demanding viz datasets, both in real time 3D and ray tracing / path tracing.
It absolutely obliterated many of the benchmark records set by its predecessor, the Nvidia RTX A6000 (48 GB) (read our
review at www.tinyurl.com/AECRTXA6000). The biggest gains were seen in GPU ray tracing, where the third generation RT cores really come into their own, outperforming the ‘Ampere’ generation GPU by a factor of 1.93, 2.05 and 2.19 respectively in the V-Ray, KeyShot and Blender benchmarks. This is a phenomenal generation-on-generation increase.
It’s no slouch in real time 3D either. In Unreal Engine 4.26 we saw frame rates with our Audi Car Configurator model increase by 1.50 and 1.41 times respectively with ray tracing enabled and disabled. The performance increase rose to 1.63 in arch viz tool Enscape 3.1, and also 1.63 in high-end automotive viz software Autodesk VRED Professional 2023.
To boost performance further, the Scan 3XS GWP-ME A1128T can take a second RTX 6000 Ada Generation GPU, but at £7,149 (Ex VAT) per card, you’ll need seriously deep pockets. This should cut ray trace render times significantly (by up to half), but you won’t get a 96 GB pool of memory to play with like you would with two Nvidia RTX A6000s, as Nvidia has dropped support for NVLink. Don’t expect an automatic improvement in 3D frame rates either, as most real time viz tools are not multi-GPU aware.

With so much processing power available through the GPU, it’s easy to forget there’s also a monster Threadripper Pro 5995WX CPU at your disposal. Rendering is an obvious beneficiary of the 64-core CPU, but that’s also a job that the RTX 6000 Ada Generation does exceptionally well.
Viz users often have well defined rendering pipelines that focus on either CPU or GPU and not necessarily both. That’s not always the case, of course. While V-Ray has entirely different render engines for GPU and CPU and users tend to stick to one, Solidworks Visualize can use both concurrently, and KeyShot allows you to easily swap between GPU and CPU as and when required. This could be to help free up compute resources in order to focus on other workflows, such as real time 3D, video editing or video encoding. Unreal Engine also has different compute-intensive processes that run on CPU and GPU.
CPU rendering also has the benefit of being able to work with incredibly
large datasets and with 256 GB of system memory (8 x 32 GB Samsung ECC Registered DDR4 3200MHz) the Scan workstation is certainly well equipped.
With a default TDP of 280W, the Threadripper Pro 5995WX is one of the more challenging CPUs to cool. Scan uses a 360mm Corsair H150i Elite Capellix RGB hydrocooler mounted in the Fractal Design Meshify 2 case and has replaced the fans with more efficient Noctua models.
This gives enough thermal headroom to increase all core frequencies above the base 2.70 GHz, peaking at 3.05 GHz in both Cinebench and KeyShot 2023.
It’s not the best Threadripper Pro implementation we’ve seen. The Armari Magnetar M64TPRW1300G3 (read our review in AEC Magazine’s January /February 2023 Workstation Special Report), with its custom All-in-One (AIO) cooler, manages to hit 3.38 GHz in Cinebench and 3.45 GHz in KeyShot, outperforming the Scan machine by a factor of 1.05 in Cinebench and KeyShot and even more in V-Ray (1.1).
Both processors pump out some serious heat and that’s hardly surprising considering how much power they draw. When rendering in Cinebench (CPU) we recorded 474W at the socket, 550W with V-Ray GPU, and a whopping 740W when using both processors in Solidworks Visualize. The machine was fairly noisy when CPU rendering, less so when GPU rendering.
The chassis is Scan’s trademark Fractal Design Meshify 2 with a 3XS custom front panel. It’s a little on the large side (542 x 240 x 474 mm), but it is solid and well-built and has a ready supply of ports. Up front and top, there are two USB 3.2 Type A and one USB 3.2 Type C, with plenty more at the rear (eight USB Type A and two USB 3.2 Type C). For networking, there are two super-fast 10GbE NICs and WiFi 6 built in.
The Scan 3XS GWP-ME A1128T has some other tricks up its sleeve. While the 2TB Samsung 990 Pro SSD system drive is standard fare for workstations these days, the project drive certainly is not.
The ultra-fast 8TB RAID 0 array is built using four 2TB Samsung 990 Pro NVMe PCIe 4.0 SSDs mounted on an ASUS Hyper M.2 PCIe add-in card, and delivers phenomenal sequential read / write speeds. In CrystalDiskMark we recorded 24.6 GB/s read and 24.8 GB/s write, compared to 7.4 GB/s and 6.8 GB/s on a single 2TB Samsung 990 Pro.
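Those figures are close to what ideal RAID 0 striping predicts, i.e. roughly N times a single drive (a simplification that only holds for sequential transfers, and the add-in card’s PCIe 4.0 x16 link itself tops out at around 32 GB/s):

```python
# Ideal RAID 0 sequential scaling vs what CrystalDiskMark measured.
single = {"read": 7.4, "write": 6.8}      # GB/s, one Samsung 990 Pro
measured = {"read": 24.6, "write": 24.8}  # GB/s, 4-drive RAID 0 array
for op in ("read", "write"):
    ideal = 4 * single[op]  # perfect striping across four drives
    print(f"{op}: {measured[op]} GB/s vs {ideal:.1f} GB/s ideal "
          f"({measured[op] / ideal:.0%} scaling efficiency)")
# read: 24.6 vs 29.6 (83%); write: 24.8 vs 27.2 (91%)
```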
This all sounds great on paper, but the reality is there are only certain workflows that will benefit from such fast storage and only in certain conditions. This includes engineering simulation (with gigantic datasets that don’t fit entirely into system memory), or video editing (with colossal, super high-resolution files). There may be more, and we’d love to learn what they are.
We did see a small benefit over a single SSD when copying files. A zipped file containing 90 GB of point cloud scan data delivered the biggest speed up, with
Scan Cloud workstations
Scan is best known for its desktop machines, but the Bolton-based firm also has a dedicated cloud workstation division that offers systems with Nvidia virtual GPUs (vGPUs) hosted in iomart datacentres in the UK.
Customers have a choice of pre-configured vGPU instances, available to rent on a monthly subscription. Alternatively, customers can go down to a granular level, selecting different vCPU, RAM, vGPU and storage options through an online configurator — in much the same way one would spec out a desktop workstation.
Customers get real time
feedback on the monthly rental price, then add to the basket when happy. While this shopping basket approach is a great way to understand the costs of components most new customers will likely call Scan’s Cloud workstation division for advice. Here they can help size vGPU instances based on the applications used and types of models created. Building a relationship in this way can also get you a free ‘Proof of Concept’ trial.
Each vGPU instance comes pre-loaded with Windows 10. However, the OS is unlicensed
the RAID 0 array finishing 35% faster. The same uncompressed dataset (7,414 scans) was 25% faster, a 3ds max dataset (60 large scene files and 4,400 smaller materials, totalling 4.6 GB) was 24% faster and a Revit dataset (68 files, totalling 4.6 GB) was 11% faster.
Of course, the downside of RAID 0 is that it introduces multiple points of failure: should any one drive fail, all data on the array is lost. It makes regular backups more important than ever.
The verdict
The Scan 3XS GWP-ME A1128T is a serious workstation for design viz professionals, with buckets of processing power for all different workflows, from real-time to ray trace rendering, video editing and beyond. But it also comes with a serious price tag.
If £16,666 (Ex VAT) seems a lot more than you’re used to paying for a machine of this type, that’s because it probably is. The price of a Threadripper Pro CPU has increased significantly, and the Nvidia RTX 6000 Ada Generation costs considerably more than its predecessor did at launch.
But that’s the current reality of super high-end workstation hardware. Both AMD (CPU) and Nvidia (GPU) have had little in the way of competition in recent times. But with Intel’s long-awaited ‘Sapphire Rapids’ Xeon W-3400 Series CPUs (see page WS4) and AMD’s Radeon Pro W7800 and W7900 GPUs (see page WS28) out now this could change.
Scan Cloud workstations
Scan is best known for its desktop machines, but the Bolton-based firm also has a dedicated cloud workstation division that offers systems with Nvidia virtual GPUs (vGPUs) hosted in iomart datacentres in the UK.
Customers have a choice of pre-configured vGPU instances, available to rent on a monthly subscription. Alternatively, customers can go down to a granular level, selecting different vCPU, RAM, vGPU and storage options through an online configurator — in much the same way one would spec out a desktop workstation.
Customers get real time feedback on the monthly rental price, then add to the basket when happy. While this shopping basket approach is a great way to understand the costs of components, most new customers will likely call Scan’s Cloud workstation division for advice. Here they can help size vGPU instances based on the applications used and types of models created. Building a relationship in this way can also get you a free ‘Proof of Concept’ trial.
Each vGPU instance comes pre-loaded with Windows 10. However, the OS is unlicensed — the idea being that customers can save money by using their own Windows 10 corporate licences. Ubuntu is also available.
Scan also points out that there is no charge for uploading or downloading data, which is not the case with hyperscale public cloud providers.
■ www.scan.co.uk/business/scan-cloud
Nvidia RTX 6000 Ada Generation
One can’t deny the breathtaking performance of Nvidia’s new flagship workstation GPU and its potential to completely transform viz workflows, but some will find the price offputting
Price £7,150 + VAT
www.nvidia.com | www.pny.com
New GPU architectures are delivered from the top down. And the new Nvidia RTX 6000 Ada Generation is very much at the top of the stack. With a price tag of £7,150 (Ex VAT), this 48 GB professional GPU is reserved for those that take design visualisation, simulation or AI extremely seriously.
The first thing to get out of the way is the name of this new workstation-class GPU. It is built on Nvidia’s Ada Lovelace architecture, named after the English mathematician credited with being the first computer programmer.
Recently, Nvidia has used a single letter prefix for its pro GPUs — P for Pascal, T for Turing, A for Ampere, and so on.
As ‘A’ was already taken, Nvidia initially referred to the Ada Lovelace GPU as the Nvidia RTX 6000, but soon after tagged ‘Ada Generation’ on to the end, presumably to avoid confusion with 2018’s Turing-based Nvidia Quadro RTX 6000.
We don’t know why Nvidia didn’t use an ‘L’ prefix, as it has done for its ‘Ada Lovelace’ datacentre GPUs (the Nvidia L4 and L40), but this is where we are now. The Nvidia RTX 6000 Ada Generation might be a bit of a mouthful, but at least it has a clear identity.
The workstation card
The Nvidia RTX 6000 Ada Generation is a dual slot, full height, PCIe 4.0 workstation GPU with four DisplayPort 1.4a connectors. It looks virtually identical to its predecessor, the Nvidia RTX A6000, and has a minimal angular black and gold design. The radial type fan blows hot air directly out of the rear of the workstation via the grille on the bracket.
The total board power is 300W, which is less than that of its triple slot ‘Ada Lovelace’ consumer counterparts — the Nvidia GeForce RTX 4090 and GeForce RTX 4080. Power is delivered via a single PCIe CEM5 16-pin connector.
The card has a phenomenal number of processors: 142 third-gen RT Cores for ray tracing (delivering 211 TFLOPS), 568 fourth-gen Tensor Cores for AI compute (delivering 1,457 TFLOPS), and 18,176 next-gen CUDA cores for general purpose operations, boasting 91 TFLOPS of single precision performance. This is a significant jump up from the Nvidia RTX A6000 it replaces, which delivers RT Core performance of 76 TFLOPS, Tensor performance of 310 TFLOPS, and single precision performance of 39 TFLOPS.
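For context, here is a quick calculation of the on-paper generational uplift from those quoted figures (our arithmetic, not Nvidia’s):

```python
# On-paper generational ratios from the TFLOPS figures quoted above.
rtx_6000_ada = {"RT Core": 211, "Tensor": 1457, "FP32": 91}  # TFLOPS
rtx_a6000    = {"RT Core": 76,  "Tensor": 310,  "FP32": 39}  # TFLOPS

for metric, ada_tflops in rtx_6000_ada.items():
    ratio = ada_tflops / rtx_a6000[metric]
    print(f"{metric}: {ratio:.1f}x")  # RT ~2.8x, Tensor ~4.7x, FP32 ~2.3x
```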
The Nvidia RTX 6000 Ada comes with 48 GB of GDDR6 memory, which should be plenty for most viz-centric workflows. However, unlike its predecessor, the RTX 6000 Ada Generation does not support NVLink, so two GPUs cannot be bridged together with an adapter to create a 96 GB memory pool. While this shouldn’t matter to most users, it could be a barrier for those working with exceptionally high poly count models / high resolution textures.
It could also limit its use in engineering simulation, including Computational Fluid Dynamics (CFD).
48 GB is still double that on offer in the top-end consumer Nvidia GeForce RTX 40-Series. The RTX 6000 Ada also differentiates itself from Nvidia’s consumer cards in several other ways, including pro drivers, pro software certifications, support for Error Correction Code (ECC) memory, and some niche features for pro viz, including stereo and Frame Lock for viz clusters.
It also supports Nvidia virtual GPU (vGPU) software, which allows a workstation to be repurposed into multiple GPU-accelerated virtual workstation instances. With workstation vendors, especially Lenovo and HP, actively making their new ‘Sapphire Rapids’ desktop workstations rack friendly, this feature is likely to be more important than ever before.
Finally, it boasts 3x the video encoding performance of the Nvidia RTX A6000, for streaming multiple simultaneous XR sessions using Nvidia CloudXR.
Optimised for visualisation
The Nvidia RTX 6000 Ada offers all the generational improvements you’d expect from a new GPU architecture, but there are also significant changes in the way the GPU carries out calculations to increase performance in viz-centric workflows.
Deep Learning Super Sampling 3 (DLSS) and Shader Execution Reordering (SER) are the two technologies that stand out.
Nvidia DLSS has been around for several years and with the new Nvidia RTX 6000 Ada, it is now on its third generation. It uses the GPU’s AI Tensor cores to boost performance.
With Nvidia’s previous generation ‘Ampere’ GPUs, DLSS 2 took a low-resolution current frame and the high-resolution previous frame to predict, on a pixel-by-pixel basis, what a high-resolution current frame would look like.
With DLSS 3, the Tensor cores generate entirely new frames rather than just pixels. It processes the new frame, and the prior frame, to discover how the scene is changing, then generates entirely new frames without having to process the graphics pipeline.
So far, we’ve only seen DLSS 3 implemented in Nvidia Omniverse, but we expect others to follow. Enscape and Autodesk VRED, for example, both support DLSS 2.
As a background to Shader Execution Reordering (SER), Nvidia explains that GPUs are most efficient when processing similar work at the same time. However, with ray tracing, rays bounce in different directions and intersect surfaces of various types. This can lead to different threads processing different shaders or accessing
memory that is hard to coalesce or cache.
With SER, the Nvidia RTX 6000 Ada Generation can dynamically reorganise its workload, so similar shaders are processed together. According to Nvidia, SER can give a two to three times speed up for ray tracing and a frame rate increase of up to 25%. But these are probably extremes. For offline path tracing in Unreal Engine 5.1, for example, Nvidia quotes speed improvements of 40% or more.
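Conceptually, SER works something like the simplified sketch below — sorting pending ray hits by the shader they will invoke, so that threads doing similar work run together. This is purely illustrative: the real reordering happens in dedicated hardware on the GPU, and the ray IDs and shader names here are invented.

```python
# Much-simplified illustration of the idea behind Shader Execution Reordering:
# group ray hits by the shader they will invoke, so similar work is processed
# together and memory access becomes more coherent.
from itertools import groupby

# Hypothetical ray hits in arrival (bounce) order: (ray_id, shader_name)
hits = [(0, "glass"), (1, "metal"), (2, "glass"), (3, "diffuse"),
        (4, "metal"), (5, "diffuse"), (6, "glass")]

# Reorder so identical shaders are adjacent, then shade in coherent batches
hits.sort(key=lambda h: h[1])
for shader, batch in groupby(hits, key=lambda h: h[1]):
    ray_ids = [ray_id for ray_id, _ in batch]
    print(f"shader '{shader}': shading rays {ray_ids} as one coherent batch")
```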
Engineering simulation
In AEC, while visualisation is the primary use case for the Nvidia RTX 6000 Ada, the GPU can also be used for engineering simulation.
At launch, Nvidia highlighted the use of Ansys software, including Ansys Discovery and Ansys Fluent for Computational Fluid Dynamics (CFD).
Compared to the RTX A6000, the RTX 6000 Ada not only has more cores and faster cores, but significantly larger L2 cache (96 MB vs 6 MB) and increased
memory bandwidth (960 GB/s vs 768 GB/s). According to Ansys, this results in ‘impressive performance gains’ for the broad Ansys application portfolio.
However, the Nvidia RTX 6000 Ada is not suited to all simulation tools. Some simulation solvers require double precision, and with relatively poor FP64 performance (1,423 GFLOPS, or 1/64 of its FP32 performance), the RTX 6000 Ada is unlikely to perform that well in those that do. In fact, for double precision solvers, even 2016’s Nvidia Quadro GP100 boasts better FP64 performance, at 5.17 TFLOPS.
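That 1/64 ratio is easy to sanity check against the single precision figure quoted earlier:

```python
# Sanity check: FP64 on the RTX 6000 Ada runs at 1/64 of the FP32 rate.
fp32_tflops = 91
fp64_gflops = fp32_tflops / 64 * 1000
print(f"{fp64_gflops:.0f} GFLOPS")  # ~1,422 GFLOPS, in line with the 1,423 quoted
```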
Testing the Nvidia RTX 6000 Ada
AEC Magazine put the Nvidia RTX 6000 Ada Generation through a series of real-world application benchmarks, both GPU rendering and real time visualisation.
The GPU is simply overkill for current generation CAD and BIM software, so we didn’t do any testing in that regard. However, it’s important to note that it
will still be certified for the likes of Revit and Archicad, which is useful if you plan to use those types of core applications alongside viz focused tools like Enscape, V-Ray, Twinmotion and Lumion.
The full spec of our AMD Ryzen Threadripper Pro test machine, the Scan 3XS GWP-ME A1128T, can be seen below. You can read a full review on page WS22.
Scan 3XS GWP-ME A1128T
• AMD Ryzen Threadripper Pro 5995WX processor (2.7 GHz, 4.5 GHz boost) (64-cores, 128 threads)
• 256 GB (8 x 32 GB) Samsung ECC Registered DDR4 3200MHz memory
• 2TB Samsung 990 Pro PCIe 4.0 SSD
• 8TB RAID 0 array (4 x 2TB Samsung 990 Pro NVMe PCIe 4.0 SSDs)
• Asus Pro WS WRX80E Sage SE WiFi motherboard
• Corsair H150i Elite Capellix RGB cooler with Noctua fans
• 1,500W Corsair HXi, 80PLUS Platinum PSU
• Microsoft Windows 11 Pro
For comparison, we also tested the Nvidia RTX A6000 GPU inside the same machine. Nvidia’s 528.24 pro driver was used for both GPUs.
Real time 3D
Real time 3D visualisation with applications that use the OpenGL, DirectX or Vulkan graphics APIs continues to be a very important part of architectural visualisation. Key applications include Twinmotion, Lumion, Enscape and Unreal Engine.
We recorded frame rates (Frames Per Second) within Enscape, Unreal Engine, Nvidia Omniverse, and Autodesk VRED Professional, a pro viz application
commonly used in automotive design.
We only tested at 4K (3,840 x 2,160) resolution. At FHD (1,920 x 1,080), it’s a given that the Nvidia RTX 6000 Ada Generation can deliver more than enough performance.
In Enscape, we tested with five different models. Overall, our experience was incredibly smooth, even with the large RTX-enabled Enscape 3.0 building complex, which uses 11 GB of GPU memory.
However, our preferred benchmark model, an urban scene from Enscape 3.1, was a little unresponsive, sometimes taking a few seconds to react to mouse or keyboard movements. We don’t know why this was, but it could be because it includes custom assets and textures and there is a conflict of some sort. Once it got going, however, we recorded a phenomenal 124.95 FPS, 63% faster than the Nvidia RTX A6000.
In Unreal Engine 4.26, the generation-on-generation gains were smaller. The biggest increase came when ray tracing was enabled on our Audi test model, with the RTX 6000 Ada Generation delivering 1.5 times the FPS of the RTX A6000.
In Autodesk VRED Professional 2023, performance increases ranged from 1.63x to 1.89x. The biggest gain came when anti-aliasing was disabled.
In Nvidia Omniverse Create 2022.3.3, we tested with the Brownstone building sample model. In RTX – Real-Time mode with DLSS enabled, the RTX 6000 Ada was a whopping 3.62 times faster than the RTX A6000. However, there’s a case of comparing apples with pears here, as the RTX 6000 Ada uses DLSS 3 while the RTX A6000 uses DLSS 2 (see earlier). In saying that, we saw no visual difference between the two. In RTX – Interactive (path tracing) mode, the RTX 6000 Ada was 2.16 times faster.
Ray trace rendering
We tested with a range of GPU-accelerated ray-trace renderers, including V-Ray GPU, KeyShot, Solidworks Visualize, Nvidia Omniverse and Blender. With the V-Ray, KeyShot and Blender benchmarks, the RTX 6000 Ada Generation shot ahead, outperforming the RTX A6000 by a factor of 1.93, 2.05 and 2.17 respectively. We saw similar gains in Nvidia Omniverse, with the RTX 6000 Ada taking less than half the time of the RTX A6000 to render the Brownstone building scene.
However, in some of our real-world application tests, the gains were nowhere near as large. In Solidworks Visualize 2023, rendering the 1969 Camaro test scene with Nvidia Iray and the new 3DS Stellar Physically Correct global illumination engine showed the RTX 6000 Ada to be between 32% and 40% faster than the RTX A6000.
In KeyShot 2023, the RTX 6000 Ada Generation rendered our motorbike model between 33% and 41% quicker in a range of stills and turntable animations. With KeyShot’s sample drone scene, the gain dropped to 23%-25%.
Conclusion
The Nvidia RTX 6000 Ada Generation is a phenomenally powerful GPU. In several visualisation workflows, it delivered more than double the performance of the previous generation Nvidia RTX A6000. And when Nvidia has full control over both software and hardware, and the GPU’s fourth-gen Tensor cores kick in with DLSS 3, the boost in real-time performance (with seemingly no impact on end user experience) is simply breathtaking.
But these are the extremes of what you can expect from this super high-end workstation GPU. In some of our real-world tests the performance increases were as low as 23%. But even this can save hours in the working day. When rendering out a 600-frame 4K animation in KeyShot, for example, render times dropped from 6 hours 26 mins to 4 hours 50 mins.
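Even that ‘worst case’ adds up over a working day, as a quick calculation on the KeyShot figures shows:

```python
# Time saved on the 600-frame 4K KeyShot animation quoted above.
a6000_mins   = 6 * 60 + 26  # 6 hrs 26 mins on the Nvidia RTX A6000
ada6000_mins = 4 * 60 + 50  # 4 hrs 50 mins on the RTX 6000 Ada Generation

saved = a6000_mins - ada6000_mins
print(f"Saved {saved} mins ({saved / a6000_mins:.0%}) per animation render")
```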
There are some downsides to the new GPU. First, there is no NVLink, which may come as a disappointment to those really pushing the boundaries of complexity. Second, it is quite power hungry, drawing 300W at peak. And finally, of course, there’s the price.
£7,150 (Ex VAT) is an incredible amount to pay for a single graphics card and considerably more than Nvidia previously charged for its top end workstation GPUs. The Nvidia RTX A6000, for example, only cost £3,730 (Ex VAT) in February 2021.
While some will consider £7,150 to be a price worth paying for the transformative effect it could have on their workflows, others may seek better value elsewhere.
The new AMD Radeon Pro W7900 (48 GB) is one such option (see page WS28), although it does not currently offer the same breadth of software support.
More than ever, perhaps Nvidia is facing its biggest competition from itself. The consumer-focused ‘Ada Generation’ GeForce RTX 4090 comes in around £1,500, but you miss out on some pro features, superior build quality and access to 48 GB of memory, double that of the 4090. And for some viz artists that’s a big deal. 48 GB allows you to work with more complex datasets and render them at higher resolutions. In simulation, engineers can increase the fidelity of the solver for more accurate results.
The additional memory will not just offer potential benefits for single app workflows. These days many viz artists use multiple apps at the same time — rendering in V-Ray while working on real-time experiences in Unreal Engine, for example. And the RTX 6000 Ada Generation is much more likely to allow them to do this without having to compromise their workflow or output.
Nvidia RTX 4000 SFF Ada Generation
Nvidia is having a staggered roll out of its RTX Ada Lovelace GPUs for desktop workstations. Following on from the ultra-high-end Nvidia RTX 6000 Ada Generation, which launched at the tail end of 2022, Nvidia has now released a second desktop GPU designed specifically for compact workstations with form factors like those of the HP Z2 SFF, HP Z2 Mini, Lenovo ThinkStation P360 Ultra, Dell Precision 3460 SFF and Dell Precision 3260 Compact. The GPU will likely be of interest to users of CAD or BIM software who want to extend their workflows into visualisation, VR and simulation with tools like Enscape, Twinmotion, Lumion, V-Ray, Omniverse and more.
The Nvidia RTX 4000 SFF Ada Generation features 20 GB of graphics memory, a substantial step up from its predecessor, the Nvidia RTX A2000 (12 GB), and is said to offer a 2x performance improvement. It also offers greater memory bandwidth, so it can transfer data to and from its memory more quickly. According to Nvidia, this results in improved graphics, compute and rendering performance.
The Nvidia RTX 4000 SFF is a low-profile graphics card, which takes up two slots on the motherboard. It has four Mini DisplayPort 1.4a connectors.
The GPU is designed to operate with PCIe slot power alone and has a max power consumption of 70W. This is significantly lower than previous ‘4000’ class GPUs, which typically draw up to 140W. However, Nvidia confirmed that, in the future, it will also launch a standard Nvidia RTX 4000 GPU. Both GPUs will share the same silicon, but the Nvidia RTX 4000 SFF will be clocked lower.
Nvidia RTX 4000 SFF specs include 6,144 CUDA parallel processing cores, 192 Nvidia Tensor Cores and 48 Nvidia RT Cores. Nvidia quotes 19.2 TFLOPS single precision performance, 44.3 TFLOPS RT Core performance, and 306.8 TFLOPS Tensor Core performance. This is significantly more than the Nvidia RTX A2000 (8.0 TFLOPS, 15.6 TFLOPS and 63.9 TFLOPS respectively). And while single precision and RT Core performance are very close to those of the 140W Nvidia RTX A4000, Tensor performance for AI operations has doubled.
The Nvidia RTX 4000 SFF GPU costs around $1,250, and will be available from workstation manufacturers later this year.
Preview: AMD Radeon Pro W7800 / W7900
AMD’s new RDNA 3 pro graphics cards target price / performance to take the fight to Nvidia and its RTX 6000 Ada Generation, writes Greg Corke
Price $2,499 (W7800) / $3,999 (W7900)
www.amd.com/radeonpro
It’s been a while coming, but AMD has finally delivered its new professional graphics cards, built on the same RDNA 3 architecture as its consumer Radeon RX 7900 Series, which launched in December 2022. We say ‘delivered’, but AMD is not quite there yet. Our review cards missed the deadline for this Workstation Special Report by a matter of days.
The new boards — the Radeon Pro W7900 and Radeon Pro W7800 — target workflows including visualisation, real-time 3D, ray trace rendering, photogrammetry, VR, simulation, video editing, compositing and more. But with pro driver optimisations and certifications they can also be used for CAD and BIM.
The AMD Radeon Pro W7900 is a triple (2.5) slot GPU with 48 GB of GDDR6 memory, 61 TFLOPs of peak single precision performance and a total board power of 295W. It costs $3,999.
The AMD Radeon Pro W7800 is a dual slot GPU with 32 GB of GDDR6 memory, 45 TFLOPs of peak single precision performance and a total board power of 260W. It costs $2,499.
Both GPUs comprise multiple unified RDNA 3 compute units, each with 64 dual issue stream processors, two AI accelerators and a second gen ray tracing (RT) accelerator. According to AMD, RDNA 3 offers up to 50% more raytracing performance per compute unit than the previous generation.
Optimised ray tracing
For ray tracing, the new GPUs are compatible with Unreal Engine, Unity,
Lumion, Enscape, Solidworks Visualize, D5 Render, Maxon Redshift, plus other applications that support DirectX Raytracing (DXR), Vulkan ray tracing, or AMD Radeon ProRender, including Acca Edificius, Autodesk Inventor, Rhino, Autodesk Maya, and Blender (Cycles X).
This list should grow. AMD is also working with other software developers to help convert their existing Nvidia CUDA applications to run on the new GPUs and other AMD hardware. This is being done through AMD’s open-source toolset HIP (Heterogeneous-Compute Interface for Portability), which includes a ray tracing library, HIP RT, so developers can take advantage of the dedicated ray accelerators in AMD’s GPUs.
The new GPUs will go up against the 48 GB Nvidia RTX 6000 Ada Generation, with AMD focusing squarely on price / performance.
In SPECviewperf 2020 GeoMean, for example, AMD claims the Radeon Pro W7900 is within 7% of the performance of the Nvidia RTX 6000 Ada Generation but offers more than double the price / performance, as it costs less than half as much ($3,999 vs $8,615).
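AMD’s ‘more than double’ claim stacks up arithmetically, as this rough check shows (performance is normalised to AMD’s own SPECviewperf figures; this is our calculation, not AMD’s):

```python
# Rough price/performance check on AMD's claim, using the figures it quotes.
w7900_perf,   w7900_price   = 0.93, 3999  # within 7% of the RTX 6000 Ada
rtx6000_perf, rtx6000_price = 1.00, 8615

advantage = (w7900_perf / w7900_price) / (rtx6000_perf / rtx6000_price)
print(f"Radeon Pro W7900 price/performance advantage: {advantage:.1f}x")  # ~2.0x
```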
AMD also highlights support for DisplayPort 2.1,
the latest version of the digital display standard which offers three times the data rate of DisplayPort 1.4. According to AMD, this means its new GPUs are future proofed for next gen displays in terms of refresh rate, pixel resolution and colour bit-depth, while pointing out that the Nvidia RTX 6000 Ada Generation supports DisplayPort 1.4.
Both the AMD Radeon Pro W7800 and W7900 feature three DisplayPort 2.1 and one Mini DisplayPort 2.1 connectors, a change from the previous generation Radeon Pro W6800 with six Mini DisplayPort 1.4.
Memory boost
With 48 GB, the Radeon Pro W7900 also marks a step up in terms of memory, with 50% more than its predecessor, the Radeon Pro W6800, putting it on a par with the Nvidia RTX 6000 Ada Generation.
Memory is becoming increasingly important for viz workflows, not just to support extremely complex high-polygon datasets, but for multi-tasking as well, as product designer Dr. Adi Pandzic explains: “Large format renders require more horsepower, especially when doing 4K raytraced animations using [Solidworks] Visualize. The Radeon Pro W7900 allows me to easily keep working on the model [in Solidworks CAD] while rendering in Visualize.”
Rich Hurrey, president and founder of Kitestring, shares similar experiences: “The increased memory that the new AMD RDNA 3 GPUs offer allows us to have multiple instances of Maya, Modo, and Unreal Engine open at the same time. All of this means that production work gets done faster.”
Memory also differentiates the new GPUs from AMD’s consumer focused RDNA 3 GPU, the AMD Radeon RX 7900 XTX which has 24 GB. Both pro GPUs also come with AMD Software: Pro Edition. This offers pro software certifications for ‘performance and stability’, and pro features such as ViewPort Boost, which dynamically adjusts viewport resolution to boost performance, remote workstation support and more.
Look out for a review soon.
Cloud workstations for CAD, BIM and visualisation
How the major public cloud providers stack up
Using Frame, the Desktop-as-a-Service (DaaS) solution, we test 23 GPU-accelerated ‘instances’ from Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure, in terms of raw performance and end user experience
If you’ve ever looked at public cloud workstations and been confused, you’re not alone. Between Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure, there are hundreds of different instance types to choose from. They also have obscure names like g4dn.xlarge or NC16asT4v3, which look like you need a code to decipher.
Things get even more confusing when you dial down into the specs. Whereas desktop workstations for sale tend to feature the latest and greatest, cloud workstations offer a variety of modern and legacy CPU and GPU architectures that span several years. Some of the GCP instances, for example, offer Intel ‘Skylake’ CPUs that date back to 2016!
Gaining a better understanding of cloud workstations through their specs is only the first hurdle. The big question for design, engineering, and architecture firms is how each virtual machine (VM) performs with CAD, Building Information Modelling (BIM), or design visualisation
software. There is very little information in the public domain, and certainly none that compares performance and price of multiple VMs from multiple providers using real world applications and datasets, and also captures the end user experience.
So, with the help of Ruben Spruijt from Frame, the hybrid and multi-cloud Desktop-as-a-Service (DaaS) solution, and independent IT consultant Dr. Bernhard Tritsch, getting answers to these questions is exactly what we set out to achieve in this in-depth AEC Magazine article.
There are two main aspects to testing cloud workstation VMs:
1. The workstation system performance.
2. The real end user experience.
The ‘system performance’ is what one might expect if your monitor, keyboard, and mouse were plugged directly into the cloud workstation. It tests the workstation as a unit – and the contribution of CPU, GPU and memory to performance. For this we use many of the same real world application benchmarks we use to test desktop and mobile workstations in the magazine: Autodesk Revit for BIM, Autodesk Inventor for CAD, Autodesk VRED Professional, Unreal Engine and Enscape for real-time visualisation, and KeyShot and V-Ray for CPU and GPU rendering.
But with cloud workstations, ‘system performance’ is only one part of the story. The DaaS remote display protocol and its streaming capabilities at different resolutions and network conditions – or what happens between the cloud workstation in the datacentre and the client device – also play a critical role in the end user experience. This includes latency, which is largely governed by the distance between the public cloud
datacentre and the end user, bandwidth, utilisation, packet loss, and jitter.
For end user experience testing we used EUC Score (www.eucscore.com), a dedicated tool developed by Dr. Bernhard Tritsch that captures, measures, and quantifies perceived end-user experience in virtual applications and desktop environments, including Frame. More on this later.
The cloud workstations
We tested a total of 23 different public cloud workstation instances from AWS, GCP, and Microsoft Azure.
Workstation testing with real-world applications is very time intensive, so we hand-picked VMs that cover most bases in terms of CPU, memory, and GPU resources.
VMs from Microsoft Azure feature Microsoft Windows 10 22H2, while AWS and GCP use Microsoft Windows Server 2019. Both operating systems support most 3D applications, although Windows 10 has slightly better compatibility.
For consistency, all instances were orchestrated and accessed through the Frame DaaS platform using Frame Remoting Protocol 8 (FRP8) to connect the
end user’s browser to VMs in any of the three public clouds.
The testing was conducted at 30 Frames Per Second (FPS) in both FHD (1,920 x 1,080) and 4K (3,840 x 2,160) resolutions. Networking scenarios included high bandwidth (100 Mbps) with low latency (~10ms Round Trip Time (RTT)) and low bandwidth (4, 8, or 16 Mbps) with higher latency (50-100ms RTT), using controlled network emulation.
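Expressed as a simple configuration, the test matrix looked something like this (the values are from the article; the structure is our own shorthand):

```python
# The cloud workstation test matrix described above.
test_matrix = {
    "fps": 30,
    "resolutions": ["1920x1080 (FHD)", "3840x2160 (4K)"],
    "network_scenarios": [
        {"bandwidth_mbps": 100, "rtt_ms": 10},         # high bandwidth, low latency
        {"bandwidth_mbps": 4,   "rtt_ms": (50, 100)},  # constrained, emulated
        {"bandwidth_mbps": 8,   "rtt_ms": (50, 100)},
        {"bandwidth_mbps": 16,  "rtt_ms": (50, 100)},
    ],
}

for scenario in test_matrix["network_scenarios"]:
    print(scenario)
```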
CPU (Central Processing Unit)
Most of the VMs feature AMD EPYC CPUs as these tend to offer better performance per core and more cores than Intel Xeon CPUs, so the public cloud providers can get more users on each of their servers to help bring down costs.
Different generations of EPYC processors are available. 3rd Gen AMD EPYC ‘Milan’ processors, for example, not only run at higher frequencies than 2nd Gen AMD EPYC ‘Rome’ processors but deliver more instructions per clock (IPC). N.B. IPC is a measure of the number of instructions a CPU can execute in a single clock cycle while the clock speed of a
CPU (frequency, measured in GHz) is the number of clock cycles it can complete in one second. At time of testing, none of the cloud providers offered the new 4th Gen AMD EPYC ‘Genoa’ or ‘Sapphire Rapids’ Intel Xeon processors.
Here it is important to explain a little bit about how CPUs are virtualised in cloud workstations. A vCPU is a virtual CPU created and assigned to a VM and is different to a physical core or thread. A vCPU is an abstracted CPU core delivered by the virtualisation layer of the hypervisor on the cloud infrastructure as a service (IaaS) platform. It means physical CPU resources can be overcommitted, which allows the cloud workstation provider to assign more vCPUs than there are physical cores or threads. As a result, if everyone sharing resources from the same CPU decided to invoke a highly multi-threaded process such as ray trace rendering all at the same time, they might not get the maximum theoretical performance out of their VM.
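A hypothetical example makes the overcommitment point concrete. The ratio below is entirely made up — providers don’t publish real figures — but it shows why worst-case contention matters:

```python
# Hypothetical vCPU overcommitment on a shared cloud host.
physical_cores   = 64                  # cores on the host CPU
hardware_threads = physical_cores * 2  # with simultaneous multithreading
overcommit_ratio = 2.0                 # vCPUs sold per hardware thread (assumed)

provisioned_vcpus = int(hardware_threads * overcommit_ratio)
print(f"{provisioned_vcpus} vCPUs provisioned against {hardware_threads} threads")
# If every tenant kicked off a fully multi-threaded render at once, each vCPU
# could get as little as 1 / overcommit_ratio of a thread's performance.
```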
It should also be noted that a processor can go into ‘turbo boost’ mode, which allows it to run above its base clock speed to increase performance, typically when
thermal conditions allow. However, with cloud workstations, this information isn’t exposed, so the end user does not know when or if this is happening.
One should not directly compare the number of vCPUs assigned to a VM with the number of physical cores in a desktop workstation. An eight-core processor in a desktop workstation, for example, not only offers eight physical cores plus hyper-threading for a total of 16 threads, but the user of that desktop workstation also has dedicated access to that entire CPU and all its resources.
GPU (Graphics Processing Unit)
In terms of graphics, most of the public cloud instance types offer Nvidia GPUs. There are three Nvidia GPU architectures represented in this article - the oldest of which is ‘Maxwell’ (Nvidia M60), which dates back to 2015, followed by ‘Turing’ (Nvidia T4), and ‘Ampere’ (Nvidia A10). Only the Nvidia T4 and Nvidia A10 have hardware ray tracing built in, which makes them fully compatible with visualisation tools that support this physics-based rendering technique, such as KeyShot, V-Ray, Enscape, and Unreal Engine.
At time of testing, none of the major public cloud providers offered Nvidia GPUs based on the new ‘Ada Lovelace’ architecture. However, GCP has since announced new ‘G2’ VMs with the ‘Ada Lovelace’ Nvidia L4 Tensor Core GPU.
Most VMs offer dedicated access to one or more GPUs, although Microsoft Azure has some VMs where the Nvidia A10 is virtualised, and users get a slice of the larger physical GPU, both in terms of processing and frame buffer memory.
AMD GPUs are also represented. Microsoft Azure has some instances where users get a slice of an AMD Radeon Instinct MI25 GPU. AWS offers dedicated access to the newer AMD Radeon Pro V520. Both AMD GPUs are relatively lowpowered and do not have hardware ray tracing built in, so should only really be considered for CAD and BIM workflows.
Storage
Storage performance can vary greatly between VMs and cloud providers. In general, CAD/BIM isn’t that sensitive to read/write performance, and neither are our benchmarks, although data and back-end services in general need to be close to the VM for best application performance.
In Azure, the standard SSDs are significantly slower than the premium SSDs, so could have an impact in workflows that are I/O intensive, such as simulation (CFD), point cloud processing or video editing. GCP offers particularly fast storage with the Zonal SSD PD, which, according to Frame, is up to three times faster than the Azure Premium SSD solution. Frame also explains that AWS with Elastic Block Storage (EBS) has ‘very solid performance’ and a good performance / price ratio using EBS GP3.
Cloud workstation regions
All three cloud providers have many regions (datacentres) around the world and most instance types are available in most regions. However, some of the newest instance types, such as those from Microsoft Azure with new AMD EPYC ‘Milan’ CPUs, currently have limited regional availability.
For testing, we chose regions in Europe. While the location of the region should have little bearing on our cloud workstation ‘system performance’ testing, which was largely carried out by AEC Magazine on instances in the UK (AWS) and The Netherlands (Azure/GCP), it could have a small impact on end user experience testing, which was all done by Ruben Spruijt from Frame from a single location in The Netherlands.
In general, one should always try to run virtual desktops and applications in a datacentre that is closest to the end user, resulting in low network latency and packet loss. However, firms also need to consider data management. For CAD and BIM-centric workflows in
particular, it is important that all data is stored in the same datacentre as the cloud workstations, or deltas are synced between a few select datacentres using global file system technologies from companies like Panzura or Nasuni.
Pricing
For our testing and analysis purposes, we used ‘on-demand’ hourly pricing for the selected VMs, averaging list prices across all regions.
A Windows Client/Server OS licence is included in the rate, but storage costs are not. It should be noted that prices in the table below are just a guideline. Some companies may get preferential pricing from a single vendor or large discounts through multi-year contracts.
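To turn an hourly rate into something budget-ready, a simple calculation helps. The usage pattern below is an assumption for illustration, using the $0.82 per hour Azure NV6adsA10_v5 rate mentioned later in this article:

```python
# Illustrative monthly cost from an on-demand hourly rate.
hourly_rate = 0.82  # $/hour (Azure NV6adsA10_v5, as quoted in this article)

eight_hour_weekdays = hourly_rate * 8 * 21   # assumed: 8h/day, 21 working days
always_on           = hourly_rate * 24 * 30  # left running around the clock

print(f"8h weekdays: ${eight_hour_weekdays:,.2f}/month")
print(f"Always on:   ${always_on:,.2f}/month")
```

The gap between the two figures is a reminder that de-provisioning idle VMs is where much of the cloud’s cost advantage lives.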
Performance testing
Our testing revolved around three key workflows commonly used by architects and designers: CAD / BIM, real-time visualisation, and ray trace rendering.
CAD/BIM
While the users and workflows for CAD and Building Information Modelling (BIM) are different, both types of software behave in similar ways. Most CAD and BIM applications are largely single threaded, so processor frequency and IPC should be prioritised over the number of cores (although some select operations are multi-threaded, such as rendering and simulation). All tests were carried out at FHD and 4K resolution.
Autodesk Revit 2021: Revit is the number one ‘BIM authoring tool’ used by architects. For testing, we used the RFO v3 2021 benchmark, which measures three largely single-threaded CPU processes – update (updating a model from a previous version), model creation (simulating modelling workflows), and export (exporting raster and vector files) – plus render (CPU rendering), which is extremely multi-threaded. There’s also a graphics test.
What is Frame? The Desktop-as-a-Service (DaaS) solution
Frame is a browser-first, hybrid and multi-cloud, Desktop-as-a-Service (DaaS) solution.
Frame utilises its own proprietary remoting protocol, based on WebRTC/H.264, which is well-suited to handling graphics-intensive workloads such as 3D CAD.
With Frame, firms can deliver their Windows ‘office productivity’, videoconferencing, and high-performance 3D graphics applications to users on any device with just a web browser – no client or plug-in required.
The Frame protocol delivers audio and video streams from the VM, and keyboard / mouse events from the end user’s device. It supports up to 4K resolution, up to 60 Frames Per Second (FPS), and up to four monitors, as well as peripherals including the 3Dconnexion SpaceMouse, which is popular with CAD users.
Frame provides firms with flexibility as the platform supports deployments natively in AWS, Microsoft Azure, and GCP as well as on-premise on Nutanix hyperconverged infrastructure (HCI). Over 100 public cloud regions and 70 instance types are supported today, including a wide range of GPU-accelerated instances (Nvidia and AMD).
Everything is handled through a single management console and, in true cloud fashion, it’s elastic, so firms can automatically provision and de-provision capacity on-demand.
■ https://fra.me
All RFO benchmarks are measured in seconds, so smaller is better.
Autodesk Inventor 2023: Inventor is one of the leading mechanical CAD (MCAD) applications. For testing, we used the InvMark for Inventor benchmark by Cadac Group and TFI (https://invmark.cadac.com), which comprises several different sub tests that are either single threaded, only use a few threads concurrently, or use lots of threads, but only in short bursts. Rendering is the only test that can make use of all CPU cores. The benchmark also summarises performance by collating all single-threaded tests into a single result and all multi-threaded tests into a single result. All benchmarks are given a score, where bigger is better.
Ray-trace rendering
The tools for physically-based rendering, a process that simulates how light behaves in the real world to deliver photorealistic output, have changed a lot in recent years. The compute intensive process was traditionally carried out by CPUs, but there are now more and more tools that use GPUs instead. GPUs tend to be faster, and more modern GPUs feature dedicated processors for ray tracing and AI (for ‘denoising’) to accelerate renders even more. CPUs still have the edge in terms of being able to handle larger datasets and some CPU renderers also offer better quality output. For ray trace rendering, it’s all about the time it takes to render. Higher resolution renders use more memory. For GPU rendering, 8 GB should be an absolute minimum with 16 GB or more needed for larger datasets.
Chaos Group V-Ray: V-Ray is one of the most popular physically-based rendering tools, especially in architectural visualisation. We put the VMs through their paces using the V-Ray 5 benchmark (www.chaosgroup.com/vray/benchmark) using V-Ray GPU (Nvidia RTX) and V-Ray CPU. The software is not compatible with AMD GPUs. Bigger scores are better.
Luxion KeyShot: this CPU rendering stalwart, popular with product designers, is a relative newcomer to the world of GPU rendering. But it’s one of the slickest implementations we’ve seen, allowing users to switch between CPU and GPU rendering at the click of a button. Like V-Ray, it is currently only compatible with Nvidia GPUs and benefits from hardware ray tracing. For testing, we used the KeyShot 11 CPU and GPU benchmark, part of the free KeyShot Viewer (www.keyshot.com/viewer). Bigger scores are better.
Real-time visualisation
The role of real-time visualisation in design-centric workflows continues to grow, especially among architects, where tools like Enscape, Twinmotion and Lumion are used alongside Revit, Archicad, SketchUp and others. The GPU requirements for real-time visualisation are much higher than they are for CAD/BIM.
Performance is typically measured in frames per second (FPS), where anything above 20 FPS is considered OK. Anything less and it can be hard to position models quickly and accurately on screen.
There’s a big benefit to working at higher resolutions. 4K reveals much more detail, but places much bigger demands on the GPU – not just in terms of graphics processing, but GPU memory as well. 8 GB should be an absolute minimum with 16 GB or more needed for larger datasets, especially at 4K resolution.
Real time visualisation relies on graphics APIs for rasterisation, a rendering method for 3D software that takes vector data and turns it into pixels (a raster image).
Some of the more modern APIs like Vulkan and DirectX 12 include real-time ray tracing. This isn’t necessarily at the same quality level as dedicated ray trace renderers like V-Ray and KeyShot, but it’s much faster. For our testing we used three relatively heavy datasets, but don’t take our FPS scores as gospel. Other datasets will be less or more demanding.
Enscape 3.1: Enscape is a real-time visualisation and VR tool for architects that uses the Vulkan graphics API and delivers very high-quality graphics in the viewport. It supports ray tracing on modern Nvidia and AMD GPUs. For our tests we focused on rasterisation only, measuring real-time performance in terms of FPS using the Enscape 3.1 sample project.
Autodesk VRED Professional 2023: VRED is an automotive-focused 3D visualisation and virtual prototyping tool. It uses OpenGL and delivers very highquality visuals in the viewport. It offers several levels of real-time anti-aliasing (AA), which is important for automotive styling, as it smooths the edges of body panels. However, AA calculations use
a lot of GPU resources, both in terms of processing and memory. We tested our automotive model with AA set to ‘off’, ‘medium’, and ‘ultra-high’, recording FPS.
Unreal Engine 4.26: Over the past few years Unreal Engine has established itself as a very prominent tool for design viz, especially in architecture and automotive. It was one of the first applications to use GPU-accelerated real-time ray tracing, which it does through Microsoft DirectX Raytracing (DXR).
For benchmarking we used the Automotive Configurator from Epic Games, which features an Audi A5 convertible. The scene was tested with DXR enabled and disabled (DirectX 12 rasterisation).
Benchmark findings
Processor frequency (GHz) is very important for performance in CAD and BIM software. However, as mentioned earlier, you can’t directly compare different processor types by frequency alone.
For example, in Revit 2021 and Inventor 2023 the 2.45 GHz AMD EPYC 7V12 – Rome (Azure NV8as_v4) performs better than the 2.6 GHz Intel Xeon E5-2690v3 – Haswell (Azure NV6_v3 & Azure NV12_v3) because it has a more modern CPU architecture and can execute more Instructions Per Clock (IPC).
The 3.2 GHz AMD EPYC 74F3 – Milan processor offers the best of both worlds – high frequency and high IPC, thanks to AMD’s Zen 3 architecture. It makes the Azure NVadsA10 v5-series (Azure NV6adsA10_v5 / NV12adsA10_v5 / NV36adsA10_v5) the fastest cloud workstations for CPU-centric CAD/BIM workflows, topping our table in all the single-threaded or lightly-threaded Revit and Inventor tests.
Taking a closer look at the results from the Azure NVadsA10 v5-series, the entry-level NV6adsA10_v5 VM lagged a little behind the other two in some Revit and Inventor tests. This is not just down to having fewer vCPUs – 6 versus 12 (NV12adsA10_v5) and 36 (NV36adsA10_v5). It was also slower in some single-threaded operations. We imagine there may be a little bit of competition between
the CAD software, Windows, and the graphics card driver (remember 6 vCPUs is not the same as 6 physical CPU cores, so there may not be enough vCPUs to run everything at the same time). There could also possibly be some contention from other VMs on the same server.
Despite this, the 6 vCPU Azure NV6adsA10_v5 instance with 55 GB of memory still looks like a good choice for some CAD and BIM workflows, especially considering its $0.82 per hour price tag.
We use the word ‘some’ here, as unfortunately it can be held back by its GPU. The Nvidia A10-4Q virtual GPU only has 4 GB of VRAM, which is less than most of the other VMs on test. This appears to limit the size of models or the resolutions one can work with.
For example, while the Revit RFO v3 2021 benchmark ran fine at FHD resolution, it crashed at 4K, reporting a ‘video driver error’. We presume this crash was caused by the GPU running out of memory, as the benchmark ran fine on the Azure NV12adsA10_v5, with the 8 GB Nvidia A10-8Q virtual GPU. Here, it used up to 7 GB at peak. This might seem a lot of GPU memory for a CAD/BIM application, and it certainly is. Even Revit’s Basic and Advanced sample projects both use 3.5 GB at 4K resolution in Revit 2021. But this high GPU memory usage looks to have been addressed in more recent versions of the software. In Revit 2023, for example, the Basic sample project only uses 1.3 GB and the Advanced sample project only uses 1.2 GB.
Interestingly, this same ‘video driver error’ does not occur when running the Revit RFO v3 2021 benchmark on a desktop workstation with a 4 GB Nvidia T1000 GPU, or with Azure NV8as v4, which also has a 4 GB vGPU (1/4 of an AMD Radeon Instinct MI25). As a result, we guess it might be a specific issue with the Nvidia virtual GPU driver and how that handles shared memory for “overflow” frame buffer data when dedicated graphics memory runs out.
AWS G4ad.2xlarge looks to be another good option for CAD/BIM workflows, standing out for its price/performance. The VM’s AMD Radeon Pro V520 GPU delivers good performance at FHD resolution but slows down a little at 4K, more so in Revit, than in Inventor. It includes 8 GB of GPU memory which should be plenty to load up the most demanding CAD/BIM datasets.
However, with only 32 GB of system memory, those working with the largest Revit models may need more.
As CAD/BIM is largely single threaded, there is an argument for using a 4 vCPU VM for entry-level workflows. AWS G4ad.xlarge, for example, is very cost effective at $0.58 per hour and comes with a dedicated AMD Radeon Pro V520 GPU. However, with only 16 GB of RAM it will only handle smaller models and with only 4 vCPUs expect even more competition between the CAD software, Windows and graphics card driver.
It’s important to note that throwing more graphics power at CAD or BIM software won’t necessarily increase 3D performance. This can be especially true at FHD resolution when 3D performance is often bottlenecked by the frequency of the CPU. For example, AWS G4ad.2xlarge and AWS G5.2xl both feature the same AMD EPYC 7R32 – Rome processor and have 8 vCPU. However, AWS G4ad.2xlarge features AMD Radeon Pro
GPU is essential, and while there are many workflows that don’t need plenty of vCPU, those serious about design visualisation often need both.
It’s easy to rule out certain VMs for real-time visualisation. Some simply don’t have sufficient graphics power to deliver anywhere near the desired 20 FPS in our tests. Others may have enough performance for FHD resolution or for workflows where real-time ray tracing is not required.
For entry-level workflows at FHD resolution, consider the Azure NV12adsA10_v5. Its Nvidia A10-8Q GPU has 8 GB of frame buffer memory, which should still be enough for small to medium sized datasets displayed at FHD resolution. The Azure NV6_v3 and Azure NV12_v3 (both Nvidia M60) should also perform OK in similar workflows, but these VMs will soon be end of life. None of these VMs are suitable for GPU ray tracing.
For resolutions approaching 4K, consider VMs with the 16 GB Nvidia T4 (Azure NC4asT4_v3, Azure NC8asT4_v3, Azure NC16asT4_v3, AWS G4dn.xlarge, AWS
Professional, Unreal and Enscape only able to use one of the four GPUs.
Finally, it’s certainly worth checking out GCP’s new G2 VMs with ‘Ada Lovelace’ Nvidia L4 GPUs, which entered general availability on May 9 2023. While the Nvidia L4 is nowhere near as powerful as the Nvidia L40, it should still perform well in a range of GPU visualisation workflows, and with 24 GB of GPU memory it can handle large datasets. Frame will be testing this instance in the coming weeks.
As mentioned earlier, 3D performance for real time viz is heavily dependent on the size of your datasets. Those that work with smaller, less complex product / mechanical design assemblies or smaller / less realistic building models may find they do just fine with lower spec VMs. Conversely if you intend to visualise a city scale development or highly detailed aerospace assembly then it’s unlikely that any of the cloud workstation VMs will have enough power to cope. And this is one reason why some AEC firms that have invested in cloud workstations for CAD/BIM and BIM-centric viz workflows prefer to keep high-end desktops for their
Interestingly, even though the AMD EPYC 74F3 – Milan processor in the Azure NV36adsA10_v5 has 12 fewer vCPUs than the Intel Xeon 8259 – Cascade Lake in the AWS G4dn.12xlarge, it delivers better performance in some CPU rendering benchmarks due to its superior IPC. However, it also comes with a colossal 440 GB of system memory, so you may be paying for resources you simply won’t use. Of course, these high-end VMs are very expensive. Those with fewer vCPUs can also do a job, but you’ll need to wait longer for renders. Alternatively, work at lower resolutions to prep a scene and offload production renders and animations to a cloud render farm.
Desktop workstation comparisons
It’s impossible to talk about cloud workstations without drawing comparisons with desktop workstations, so we’ve included results from a selection of machines we’ve reviewed over the last six months. Some of the results are quite revealing, though not that surprising (to us at least).
In short, desktop workstations can significantly outperform cloud workstations in all different workflows. There are several reasons for this:
2. Users of desktop workstations have access to a dedicated CPU, whereas users of cloud workstations are allocated part of a CPU, and those CPUs tend to have more cores, so they run at lower frequencies.
3. Desktop workstation CPUs have much higher ‘Turbo’ potential than cloud workstation CPUs. This can make a particularly big difference in single threaded CAD applications, where the fastest desktop processors can hit frequencies of well over 5.0 GHz.
Of course, to compare cloud workstations to desktop workstations on performance alone would be missing the point entirely. Cloud workstations offer AEC firms many benefits. These include global availability, simplifying and accelerating onboarding / offboarding, the ability to scale resources up and down on-demand, centralised desktop and IT management, built-in security with no data on the end-user PC, lower CapEx costs, data resiliency, data centralisation, easier disaster recovery (DR) capability and the built-in ability to work from anywhere, to name but a few.
End user experience testing – the EUC Score Sync Player interface (Figure 1)
● 1 Cloud and instance type (e.g. Azure NC8asT4_v3)
● 2 Latency and network bandwidth
● 3 Click for detailed information about the VM specs, connection and endpoint
● 4 Click to maximise the viewport
● 5 Viewport playback (examine for compression / responsiveness to mouse movements, etc.)
● 6 Task Manager showing resources used at the endpoint (not the cloud workstation)
● 7 Timeline (play back in real time or scrub up and down, as necessary)
● a Actual CPU utilisation
● b Quantisation Priority (QP) (level of compression being applied to the video stream)
● c Performance in the viewport (Frames Per Second)
● d Round trip network latency
● e Actual network bandwidth used
● f Actual GPU utilisation of Nvidia GPUs with select applications
● g GPU memory utilisation
● h GPU utilisation for encoding the video stream (H.264)
EUC Score captures, measures, and quantifies perceived end-user experience in remote desktops and applications. By capturing the real user experience in a high-quality video on the client device of a 3D application in use, it shows what the end user is really experiencing and puts it in the context of a whole variety of telemetry data. This could be memory, GPU or CPU utilisation, remoting protocol statistics or network insights such as bandwidth, network latency or the amount of compression being applied to the video stream. The big benefit of the EUC Score Sync Player is that it brings telemetry data and the captured real user experience video together in a single environment.
When armed with this information, IT architects and directors can get a much better understanding of the impact of different VMs / network conditions on
end user experience, and size everything accordingly. In addition, if a user complains about their experience, it can help identify what’s wrong. After all, there’s no point in giving someone a more powerful VM, if it’s the network that’s causing the problem or the remoting protocol can’t deliver the best user experience.
For EUC testing, we selected a handful of different VMs from our list of 23. We tested our 3D apps at FHD and 4K resolution using a special hardware device that simulates different network conditions.
The results are best absorbed by watching the captured videos and telemetry data, which can all be seen on the Frame website (https://ux.fra.me).
EUC Score Sync Player is able to display eight different types of telemetry data at the same time, so that’s why there are different views of the telemetry data. The generic ‘Frame’ recordings are a good starting point, but you can also dig down into more detail in ‘CPU’ and ‘GPU’.
When watching the recordings, here are some things to look out for. Round trip latency is important and when this is high (anything over 100ms) it can take a while for the VM to respond to mouse and keyboard input and for the stream to come back. Any delay can make the system feel laggy, and hard to position 3D models quickly and accurately on screen. And, if you keep overshooting, it can have a massive impact on modelling productivity.
In low-bandwidth, higher latency
conditions (anything below 8 Mbps), the video stream might need to be heavily compressed. As this compression is ‘lossy’ and not ‘lossless’, it can cause visual compression artefacts, which is not ideal for precise CAD work. In saying that, the Frame Remoting Protocol 8 (FRP8) Quality of Service engine does a great job, and the image resolves to full quality once you stop moving the 3D model around. Compression might be more apparent at 4K resolution than at FHD, as there are four times as many pixels, meaning much more data to send.
Frame, like most graphics-optimised remoting protocols, will automatically adapt to network conditions to maintain interactivity. EUC Score not only gives you a visual reference to this compression by recording the user experience, but it also quantifies the amount of compression being applied by FRP8 to the video stream through a metric called Quantisation Priority (QP). The lower the number, the fewer visual compression artefacts you will see. The lowest value is 12, which to the end user appears visually lossless. The highest is 50, which is super blurry.
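As a rough guide to reading the QP figures in the recordings, the bands below interpolate between the two endpoints given above (12 = visually lossless, 50 = super blurry); the intermediate labels are our own shorthand, not part of EUC Score:

```python
# Rough interpretation of Quantisation Priority (QP) values.
def describe_qp(qp: int) -> str:
    if qp <= 12:
        return "visually lossless"
    elif qp <= 25:
        return "mild compression artefacts"  # assumed band
    elif qp <= 40:
        return "noticeable artefacts"        # assumed band
    return "heavily compressed / blurry"

for qp in (12, 20, 35, 50):
    print(qp, "->", describe_qp(qp))
```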
Visual compression should not be confused with Revit’s ‘simplify display during view navigation’ feature that suspends certain details and graphics effects to maintain 3D performance. In the EUC Score player you can see this in action with textures and shadows temporarily disappearing when the model is moving. In other CAD tools this is known as Level of Detail (LoD).
The recordings can also give some valuable insight into how much each application uses the GPU. Enscape and Unreal Engine, for example, utilise 100% of GPU resources so you can be certain that a more powerful GPU would boost 3D performance (in Unreal Engine, EUC Score records this with a special Nvidia GPU usage counter).
Meanwhile, GPU utilisation in Revit and Inventor is lower, so if your graphics performance is poor or you want to turn off LoD you may be better off with a CPU with a higher frequency or better IPC than a more powerful GPU.
To help find your way around the EUC Score interface, see Figure 1. In Figures 2 and 3, we show the impact of network bandwidth on visual compression artefacts. This could be the case at a firm that does not have sufficient bandwidth to support tens, hundreds, or thousands of cloud workstation users – or at home, when the kids come back from school and all start streaming Netflix.
Conclusion
If you’re an AEC firm looking into public cloud workstations for CAD, BIM or design visualisation, we hope this article has given you a good starting point for your own internal testing, something we’d always strongly recommend.
There is no one size fits all for cloud workstations and some of the instances we’ve tested make no sense for certain workflows, especially at 4K resolution. This isn’t just about applications. While it’s important to understand the demands of different tools, dataset complexity and size can also have a massive impact on performance, especially with 3D graphics at 4K resolution. What’s good for one firm,
certainly might not be good for another. Also be aware that some of the public cloud VMs are much older than others. If you consider that firms typically upgrade their desktop workstations every 3 to 5 years, a few are positively ancient. The great news about the cloud is that you can change VMs whenever you like. New machines come online, and prices change, as do your applications and workflows. But unlike desktops, you’re not stuck with a purchasing decision.
While AEC firms will always be under pressure to drive down costs, performance is essential. A slow workstation can have a massive negative impact on productivity and morale, even worse if it crashes. Make sure you test, test and test again, using data from your own real-world projects.
For more details, insights or advice, feel free to contact Ruben Spruijt (ruben@fra.me) or Bernhard Tritsch (btritsch@bennytritsch.com).
Reimagining the desktop workstation
Greg Corke caught up with Adam Jull, CEO of IMSCAD, to explore the rise of the desktop workstation as a dedicated remote resource
IMSCAD is one of the pioneers of Virtual Desktop Infrastructure (VDI) and cloud workstation solutions for graphics-intensive applications, including CAD. The company offers a range of solutions for on-premise, public and private cloud, using a variety of technologies for graphics virtualisation including Citrix, VMware, Nvidia vGPU and more.
Recently, the company added HP Anyware into the mix. One aspect of this move was to provide IMSCAD customers with a secure, high-performance remote solution that works with desktop workstations, rather than dedicated rack mounted servers. The idea is that rather than getting involved in the complexities of virtualisation, users can get a dedicated one-to-one connection to a high-performance desktop workstation. We caught up with IMSCAD CEO, Adam Jull to explore what this means for design, engineering, and architecture firms.
Greg Corke: The industry has been talking about VDI and cloud workstations for the past ten years but, it seems, they’ve never fulfilled their potential. What trends are you seeing at IMSCAD?
Adam Jull: Since Nvidia launched GRID GPUs many moons ago, the market has seen many variants of how you can virtualise your workstations with VDI or run them in the Public Cloud, but to date the number of firms running their desktops this way is still low. At IMSCAD, we have more enquiries for these types of solutions from the US, where we have over 50% of our customers.
In the UK and across Europe I would guess uptake of hosted or VDI solutions is no more than 10%, which is, of course, surprising, especially after Covid and the move to more flexible working.
The reasons are down to complexity and cost, and the desire of users to have the best possible performance. Running from the public cloud can bring issues with latency and lower CPU clock speeds. On-premise VDI can also be tricky, although this is still the most common approach taken by our customers. We have delivered hundreds of these on-prem VDI solutions. As long as you engage your users, this approach will deliver the required performance.
Traditional workstations [configured for remote working] just work. OK, let me rephrase that - they work well 95% of the time.
The feeling five to six years ago was that VDI and cloud would take over the traditional workstation market, but this has not been the case. Applications are getting bigger, and workflows more demanding, with VR, AI, visualisation and digital twins. I also have to say the OEMs have done a great job improving and evolving their range of hardware options for specific workflows.
The other factor is cost. The public cloud is great for some things but, for GPU-based desktops, it is still very expensive. If you run a private cloud, with your own physical servers hosted in a datacentre, costs can be reduced. The most cost-effective way is still on-premise, absolutely no question. Firms have their own reasons and ideas for choosing between these deployment options, but the key skill IMSCAD brings is our experience with ISVs like Autodesk and knowing how to fully optimise the environment to give users the best possible experience.
Greg Corke: Recently, we’ve seen a big shift in the workstation market with both HP and Lenovo launching ‘Sapphire Rapids’ desktop workstations that are also purpose built for racks. These are all 4U or 5U. But if you look at 12th or 13th Gen Intel Core, you can find some very powerful micro workstations that can also be rack mounted with custom kits. What is it about these machines that customers find attractive?
Adam Jull: I like the HP Z2 Mini and Lenovo ThinkStation P360 Ultra
[recently replaced by the Lenovo ThinkStation P3 Ultra] a lot. Small, but powerful, mountable in a datacentre cabinet or in an office server room. Effectively you get the power of a workstation in these small form factors and, just as importantly, you can then add a remoting software solution, of which there are a few good options.
You can run each as a bare metal 1:1 machine, giving the full resource to a single user, who can connect from whatever device they like.
If you then co-locate these machines in a datacentre environment, you effectively create your own private cloud with remoting capabilities. This removes the need for a VPN and improves the user's remote access, and with the control features you can run the workstations in a similar way to a VDI server farm.
I would call this 'one step before VDI'. It is simpler, and ultimately you are not moving too far away from your traditional on-premise approach. Bare metal, 1:1 with the user, highly resourced and capable of handling all applications, with no virtual layer and much less complexity. Really neat, in my humble opinion.
Greg Corke: What kind of density can you get from the HP Z2 Mini G9? How many units in a standard rack and how does this compare to a traditional VDI using virtualised servers?
Adam Jull: In a typical 42U rack you can get up to 36 HP Z2 Mini G9 workstations. It's difficult to compare that directly to a traditional VDI server deployment, but coming at it from a dedicated GPU standpoint probably works best in this scenario.
So, with many high-end design applications and visualisation tools requiring more and more GPU resource for optimal performance, and users' workflows becoming ever more demanding, most users are typically looking for around 8 GB or more of GPU memory.
If we take an HP Z2 Mini G9 generously resourced with the Nvidia RTX A2000 12GB GPU, we can provide 36 users per rack. Compare that to a server, remembering the Z2 Mini is 1:1, so our comparison should be based on a published virtual desktop. Most datacentre GPUs tend to top out at 48 GB, so you can see where I'm going with this.
If we take a server with 2 x Nvidia L40 48 GB cards on board and give each user a 12 GB slice, you are only getting eight users per server. To get to the same density as our racked Z2 Minis you are going to need at least five servers! So, as you can imagine, there is going to be a substantial difference in cost.
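For readers who want to sanity check that comparison, here is a back-of-envelope sketch. The 12 GB-per-user profile is our assumption, inferred from the A2000 12GB mentioned above; real vGPU sizing, licensing and failover headroom would change the numbers.

```python
# Back-of-envelope rack density comparison, under assumed sizing:
# one user per HP Z2 Mini G9 (36 per 42U rack), versus a VDI server
# with 2 x Nvidia L40 48GB cards carved into 12 GB vGPU profiles.
import math

z2_minis_per_rack = 36                 # 1:1, so 36 users per rack
users_per_l40 = 48 // 12               # four 12 GB profiles per card
users_per_server = 2 * users_per_l40   # eight users per dual-L40 server

servers_needed = math.ceil(z2_minis_per_rack / users_per_server)
print(f"Users per VDI server: {users_per_server}")                  # 8
print(f"Servers to match one rack of Z2 Minis: {servers_needed}")   # 5, before any failover headroom
```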
‘‘ The feeling five to six years ago was that VDI and cloud would take over the traditional workstation market, but this has not been the case ’’
Greg Corke: One of the big advantages of using traditional servers for VDI is that they are built from the ground up for the datacentre and have remote management built in. Desktop workstation manufacturers are now addressing this with optional system controller cards. HP’s new Anyware Remote System Controller is even available as an external USB box to give the HP Z2 Mini similar capabilities. Do these system controllers give desktop workstations the exact same capabilities as dedicated rack servers, or are there things still missing?
Adam Jull: Things that you might take as a ‘given’ or a standard function with server virtualisation have sometimes been a touch tricky to achieve with a [traditional] workstation fleet. Features that would have previously required administrators to use third party tools are now being introduced and catered for. The HP Anyware Integrated Remote System Controller does go a long way in closing that gap as systems administrators will now have the ability to remotely manage their workstation fleet from a single pane. It also provides features such as power management, hardware alerts and diagnostics along with the ability to image or reimage the operating systems. So, a big step in the right direction.
Greg Corke: The larger rack friendly desktop workstations come with 'server-grade' features like hot-swappable redundant PSUs, rear power buttons, and front-access hot-swap storage. But with the smaller machines you still need to get inside. Does this make management and servicing of these machines harder?
Adam Jull: The Z2 Mini can be racked in much the same way as a larger workstation and, in fact, HP has put a lot of thought into the design. It’s worth noting that although the Z2 Mini has a smaller form factor, the Rail Rack Kit slides out from the rack providing excellent clearance and access to cables and ports etc. The Rail Rack Kit also features captive fasteners, providing quick and easy tool-free access to the workstations. So, in terms of servicing and management, the Z2 Mini itself offers tool-less access and slide out components, allowing for simple and easy swap out capabilities which could be for maintenance or expansion. Ultimately, these machines are extremely reliable.
Greg Corke: With a centralised desktop workstation solution, do firms tend to give users their own dedicated machine, which they remote into wherever they work, or is there a shared pool? Are firms using the cloud to handle peaks?
Adam Jull: With HP Z workstations, users get bare metal resource, with remote access provided by HP Anyware's PCoIP, one of the best-in-class remoting protocols.
Much like other remoting solutions, HP Anyware provides desktop administrators the ability to grant users or groups access as they see fit, whether that’s dedicating a user to a specific machine or having a bunch of machines available on more of a round robin or random assignment. And just touching on that management piece it’s worth highlighting that HP provides an end-to-end product for the centralised desktop deployment model.
Typical components would be HP Anyware Manager, which gives administrators a management plane to configure, manage, broker and monitor remote workstation connections. There's also the HP Anyware Connector, which provides security gateway services and user authentication for remote connections to assigned desktops. And then, of course, there are the HP Anyware Agents, software installed on the remote workstation that securely encodes the desktop and streams pixels only to the PCoIP Client.
The PCoIP Client is also required and completes the circuit, if you will. It's installed on the user's end device, allows them to connect to the remote workstation, and decodes the stream of PCoIP pixels from the workstation's PCoIP Agent. Most firms using this type of solution are pretty savvy. They know their workforce will flex up or down, and typically we see firms overprovision rather than burst into the cloud.
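To summarise the moving parts Jull describes, the sketch below captures the chain from management plane to end device as a simple data structure. This is our own illustrative summary of the components named above, not HP's configuration format or API.

```python
# Not HP's configuration format - just an illustrative summary of the
# HP Anyware components described in the interview and the direction of
# the pixel stream, useful when planning which piece sits where.

PIPELINE = [
    ("Anyware Manager",   "admin plane",      "configure, manage, broker, monitor connections"),
    ("Anyware Connector", "security gateway", "authenticates users, gateways remote connections"),
    ("Anyware Agent",     "on workstation",   "encodes the desktop, streams pixels only (PCoIP)"),
    ("PCoIP Client",      "on end device",    "decodes the PCoIP pixel stream for the user"),
]

for name, where, role in PIPELINE:
    print(f"{name:<18} [{where:<16}] {role}")
```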
■ www.imscadservices.com
Staying ahead of the ESG curve
Could a lack of sustainable IT investment mean AEC firms run the risk of being excluded from tenders? asks Keith Ali, MD at Creative ITC
The AEC sector has a pivotal role to play as we transition to a lower carbon future. Buildings account for 40% of global energy consumption and a third of greenhouse gas (GHG) emissions. Moreover, emissions from cement and concrete production have doubled over the last 20 years and currently make up 8% of total CO2 generation globally.
There are positive signs that the industry is moving to tackle these issues. Architects Declare is among many groups calling for real change and pushing for greater impetus. Carbon offsetting is increasingly frowned upon as a prime example of the tendency towards greenwashing, an empty promise that will not contribute to achieving net zero targets.
There’s rising pressure on AEC firms to provide incontestable evidence of the benefits of their environmental, social and governance (ESG) policies. Last year, two ESG disclosure laws became mandatory in the UK and many other countries are following suit.
For AEC firms, these steps serve as a warning shot for what’s to come. ESG requirements in public tenders are growing more stringent and there’s expectation of more widespread stipulations to come, with the potential to directly hit the bottom lines of noncompliant firms. Exclusion from AEC tenders looms, unless they can substantiate their sustainability claims. The message is clear - what’s needed is fundamental behavioural change.
Transformation from the inside out
While the renewed focus on climate change is starting to shift mindsets when it comes to commissioning, designing and constructing, working practices within AEC organisations often lag behind. Legacy IT solutions have plagued the industry for years, hampering operational efficiency. It’s less well recognised that outdated IT infrastructure is also a massive generator of CO2. Enterprise technology accounts for about 1% of worldwide GHG emissions – that’s equal to the total amount generated by the UK and equivalent to half of all emissions from aviation or shipping globally.
Greener IT is not just about reassessing infrastructure to drive carbon reduction; it's also about transforming working practices, operations and solution efficiency.
Unfortunately, many AEC companies don't practice what they preach. It's still commonplace for AEC professionals to work on bids for smart, eco-friendly projects using power-hungry CAD workstations that don't maximise the use of renewable energy sources. Once the project kicks off, that behaviour is multiplied many times over as multidisciplinary teams are drafted in. To put that into perspective, according to research from Dell, 60 such workstations running for 12 hours a day produce around 48,000 kg of CO2eq over a year. That's the same amount as driving 170,000 miles in a family car.
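As a rough plausibility check on the scale of that figure, here is a short sketch. Every input is an assumption of ours (typical workstation draw, duty cycle and grid carbon intensity), not a number from the Dell study.

```python
# Rough sanity check only: assumed average draw per CAD workstation,
# assumed annual duty cycle, assumed grid carbon intensity.
workstations = 60
draw_kw = 0.5            # assumed average draw per workstation, kW
hours_per_day = 12
days = 365               # figure read as an annual total
grid_kg_per_kwh = 0.4    # assumed grid carbon intensity, kg CO2e/kWh

kwh = workstations * draw_kw * hours_per_day * days
print(f"Energy: {kwh:,.0f} kWh -> ~{kwh * grid_kg_per_kwh:,.0f} kg CO2eq")
# ~131,400 kWh -> ~52,560 kg CO2eq, the same order as the quoted 48,000 kg
```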
IT teams are pivotal to change
Fortunately, IT teams have a number of practical sustainability initiatives they can implement to deliver tangible change.
One of the ways to address ESG challenges is adopting virtualisation via Desktop-as-a-Service. Some VDI solutions are hosted in datacentres operating on 100% clean and renewable energy. Our AEC-focused VDIPOD solution, for example, offers firms a path to net zero with metrics and an auditable trail to simplify ESG reporting.
AEC companies deploying the service use 81.7% less energy at source than traditional CAD workstations, on a combined 89% renewable power model (one VDI server supporting 60 laptops/thin clients), and reduce CO2eq by up to 43% (calculations based on standard Dell high-end graphics workstations, Supermicro VDI servers and Dell XPS laptops).
Since migrating over 400 employees to VDIPOD, one multi-award-winning international architecture and design studio has realised a three-fold increase in renewable power use and a 90% reduction in kilowatt hours (kWh) per person. With more users onboarding in Asia Pacific, the US and Canada, those benefits will only increase.
Transitioning to an Infrastructure-as-a-Service (IaaS) model is another change IT teams can implement to reduce energy costs and environmental impact. Crucially, when AEC firms move to a fully managed IaaS solution, the responsibility for infrastructure management moves to the service provider, along with power consumption and carbon footprint. By migrating data, applications and IT services to the cloud, AEC firms no longer need to
maintain on-premise technology, thereby reducing energy consumption, cooling costs and waste from decommissioned equipment. Cloud providers can also make intelligent use of virtual machines and containers to reduce the number of servers needed at data centres and improve sustainability and ESG scores.
Global engineering company SNC-Lavalin, for example, is working to reduce its number of data centres worldwide from 16 down to three.
“One of the big benefits we’ve seen already in our carbon footprint is that we’ve reduced storage by 69%. We’ve reduced the electricity by 53% and the floorspace by 45%,” said Steve Capper, Group CIO of SNC-Lavalin.
Creating future value
Technology improvements are an often-overlooked strategy that AEC firms can deploy to make a tangible difference in the race towards net zero. A new breed of industry-specialist MSPs is emerging with the expertise to help AEC firms unlock the greatest value from transitioning to future-proof IT infrastructures, adopting new technologies and improving their working practices. There's a clear case for AEC firms to opt for sustainable IT solutions, which can dramatically reduce their carbon footprint and positively impact environmental scorecards.
In addition to contributing to global environmental goals, AEC companies adopting best practices will reap a number of financial and operational rewards. With increasingly stringent ESG requirements, they'll have greater opportunity to bid for tenders and face fewer regulatory interventions. Sustainable companies outperform their industry peers on profitability and EBITDA, and enjoy top-line growth, increased productivity and reduced costs. Publicly owned top performers are also more likely to see higher equity returns, less downside risk, lower loan and credit default swap spreads, and higher credit ratings.
There is a clear connection between ESG aims and business value. With such a pivotal role to play on a global stage, time is running out for the AEC industry to drive fundamental behavioural change. As environmental, social and governance concerns grow ever more urgent, business and IT leaders should keep this link front of mind and make smart choices now. Future success will surely depend on it.
■ www.creative-itc.com