RADAR
2015 Volume 1 Number 1
mil-embedded.com/topics/radar
• Space Fence Radar System • Radars Enable Detection in Coastal Zones • Signal Processing for Radar & EW Systems • Integrating Video with Radar • Defense Apps with Intel Core i7’s Integrated GPU • Software-Defined Radio Handbook
Sponsored by: Curtiss-Wright Defense Solutions; Sealevel Systems, Inc.; Annapolis Micro Systems; Mercury Systems; GE Intelligent Platforms; and Pentek, Inc.
RADAR CONTENTS
2015 Volume 1 Number 1
Space Fence radar system to identify, track space junk By Sally Cole, Senior Editor
Small radars enable detection in coastal zones By Sally Cole, Senior Editor
No such thing as too much signal processing for radar & EW systems By John McHale, Editorial Director
The necessity & difficulty of integrating video with radar By Christopher Fadeley, Tech Source
Tackling defense apps by harnessing the Intel Core i7’s integrated GPU By Marc Couture, Curtiss-Wright Defense Solutions
Software-defined radio handbook, 11th Edition By Rodger H. Hosking, Pentek, Inc. (Sponsored)
© 2015 OpenSystems Media, © Military Embedded Systems. All registered brands and trademarks within the Radar E-mag are the property of their respective owners.
Pictured is the Space Fence interface. Photo courtesy of Lockheed Martin.
Space Fence radar system to identify, track space junk
Space Fence, a fully digital S-band radar system, is being built for the U.S. Air Force by Lockheed Martin to better identify and track space junk within low-Earth orbit – debris hurtling along at speeds of 17,500 mph – to help protect military satellites and other space assets by preventing conjunctions and the creation of still more debris.
By Sally Cole, Senior Editor
Since 1961, the U.S. Air Force has maintained a highly accurate catalog of objects within low-Earth orbit that are larger than the size of a softball, sharing responsibility with NASA for providing situational awareness of this environment to avoid potential conjunctions – a.k.a. “collisions” – with satellites and spacecraft.
Now the Air Force’s Space Surveillance Network, which tracks more than 20,000 pieces of debris, will soon be enhanced by “Space Fence,” a fully digital S-band radar system being built by Lockheed Martin that is capable of tracking debris as small as 10 centimeters. Clearly, as space junk continues to proliferate at a rapid pace, it’s critical to know not only where all of this debris is – so that it can be avoided during critical space launches and missions, or so that satellites costing hundreds of millions of dollars can be maneuvered out of harm’s way – but also to identify what exactly it is.

At the heart of the problem is that space junk begets more space junk. In 2007, for example, China’s anti-satellite test, which used a missile to destroy an old weather satellite, contributed 3,000 more pieces of space junk to be tracked. And in 2009, when a defunct Russian satellite collided with and destroyed a functioning U.S. Iridium commercial satellite, the crash added more than 2,000 pieces of trackable debris (see Figure 1). With every space launch there is a risk of creating more space junk via rocket engines and other jettisoned material if the launch is not planned and carried out with foresight.

“An astronaut’s glove from a spacewalk, dead satellites, and other equipment that used to be operational are currently among the items in orbit,” explains Steve Bruce, vice president for advanced systems at Lockheed Martin Mission Systems and Training.

Military satellites are not the only space assets that need to be protected from space junk. We now rely on a variety of satellites for nearly every aspect of our interconnected lives, including GPS, banking, and telecommunications.
Space Fence design
Lockheed Martin’s Space Fence design combines a scalable solid-state S-band radar with a higher frequency – and thus shorter wavelength – capable of detecting much smaller objects than the Air Force’s existing Space Surveillance Network sensors can.
Figure 1 | Pictured is a radar-generated graphical image of space junk in low-Earth orbit. Image courtesy of Lockheed Martin.
“Space Fence will become part of the Space Surveillance Network, which is a whole series of radars and optical sensors on the ground and in space charged with discovering where objects are and then tracking them in low-Earth orbit,” Bruce explains. “It will have more capability than any of the systems currently in inventory, provide more sensitivity, and have the ability to track more objects.”
What else differentiates Space Fence from earlier radar systems? One key aspect is that its radar is classified as fully digital. “With older radar systems, the antenna was made out of a dish to act as a big reflector, or involved a phased array using electronics at the element level to create an array of transmitters and receivers,” Bruce says. “In the past, those were typically analog radars, in which case an analog circuit combined the data from these transmit/receive modules to form beams in space that the antennas could steer.”
Space Fence is also an active electronically scanned array (AESA), but a digital one. Two key pieces make it digital. “First, it uses distributed waveform generators. In the transmit aperture, line-replaceable units (LRUs) take digital commands – each creates the transmit waveform right on the digital LRU and sends it through power amplifiers to create a transmit signal,” Bruce continues. “Next, on the receive side, a series of low-noise amplifiers receive the radio frequency energy, which is then digitized to allow us to form numerous simultaneous receive beams. This makes our radar extremely flexible from an energy management perspective.”

The reason Lockheed Martin decided to separate the transmit and receive antennas was primarily to improve overall antenna efficiency. “This allowed us to build a very efficient transmit antenna by using gallium nitride (GaN) technology. It also allowed us to build a larger receive antenna, which needs to be a bit larger to make more accurate measurements about where the objects are in space,” he notes.

Part of Space Fence’s new capability, for example, involves optical sensors that can be used to identify debris.
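Bruce’s description boils down to a simple principle: once each element’s output is digitized, beams are no longer fixed by analog hardware – any number of them can be formed in parallel by applying different phase-weight sets to the same samples. A minimal NumPy sketch of that idea (element count, spacing, and carrier frequency are illustrative assumptions, not Space Fence parameters):

```python
import numpy as np

c = 3.0e8                     # speed of light, m/s
fc = 3.0e9                    # assumed S-band carrier, Hz (not a Space Fence spec)
lam = c / fc                  # wavelength, m
n_elem = 64                   # assumed element count
d = lam / 2                   # half-wavelength element spacing
pos = np.arange(n_elem) * d   # element positions along a linear array

def weights(look_deg):
    """Per-element phase weights that steer a receive beam to look_deg."""
    k = 2 * np.pi / lam
    return np.exp(-1j * k * pos * np.sin(np.radians(look_deg)))

# Simulated digitized element samples: a plane wave arriving from 12 degrees
k = 2 * np.pi / lam
samples = np.exp(1j * k * pos * np.sin(np.radians(12.0)))

# The digital-array payoff: many simultaneous beams from ONE set of samples,
# each beam just a different weight vector applied in software.
for look in (0.0, 6.0, 12.0, 18.0):
    power = abs(np.sum(weights(look) * samples) / n_elem) ** 2
    print(f"beam steered to {look:4.1f} deg: normalized power {power:.3f}")
```

The beam steered to 12 degrees comes out at full power while the others fall off, and adding another beam costs only another multiply-and-sum over data already in memory.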
In the array, Lockheed Martin uses a combination of custom technology and specific miniaturized microwave ICs. “There are specific electronics in the aperture, but also custom electronics in the aperture. In the back end, our signal processing uses standard server technology,” Bruce says. “Once we digitize and form digital beams, all the processing uses commercial off-the-shelf (COTS) server technology,” he adds. “So we have a very large server farm that does most of the signal, data, and mission processing. If you look at our compute engines that do orbital mechanics…it really isn’t much different from a typical server farm.” (See Figure 2.)

Another big advantage of Space Fence’s digital radar is that it can be software defined.
Figure 2 | The Space Fence screen shows how operators can track space junk and other debris in orbit.

“This allows us to change the way energy is managed in space, the way waveforms are transmitted or received – it’s all fully programmable. The fact that it can be fully programmed to do many different things in the future adds tremendous flexibility to the system,” Bruce explains. “This is a great asset not only for the U.S., but the entire world.”
Project timeline
As for the timeline, the Space Fence project began in June 2014 and will undergo a critical review in March 2015; “in less than five years, Lockheed Martin will build the radar, get all of the electronics into it, test it, and then turn it over to the U.S. Air Force for operation by the end of 2018,” Bruce says. The radar facilities for Space Fence are currently being constructed on Kwajalein Atoll, an island in the Pacific Ocean. In terms of scale, once built, the installation viewed from above would look much like a couple of football stadiums.
Once Space Fence is up and running, it will allow the military to determine what types of objects are in low-Earth orbit and to track them. “They’ll use it to manage our U.S. space assets as well as to inform the world about potential conjunctions,” he adds. “It’s in every country’s best interest to know what’s going on so they can maneuver their satellites out of the way to avoid creating more space debris.” The catalog of space junk, which currently stands at about 20,000 objects being tracked by the U.S., is expected to grow quite substantially. “No one really knows what’s up there; it could be upward of 200,000 objects that we’ll be able to find and track,” Bruce notes. “But what’s most important is being able to figure out where these objects will be in the future so we can predict and avoid conjunctions.”
Small radars enable detection in coastal zones Q&A with Mark Radford, Blighter Surveillance Systems
By Sally Cole, Senior Editor
For force protection in tight areas such as coastlines and coastal zones, military leaders are leveraging small, compact surveillance radar systems. Mark Radford, CEO of Blighter Surveillance Systems, discusses this trend and talks about the technology behind small radars with Senior Editor Sally Cole. Edited excerpts follow.

MIL-EMBEDDED: How are small radars being leveraged for military applications?

RADFORD: In recent years, a number of smaller ‘compact surveillance radars’ (CSRs) were launched for military force protection and gap-filling applications. Such CSRs have reduced the antenna size below 0.5 meters but, due to the inevitable inaccuracy in the angular measurement, can only offer useful performance out to limited ranges of around 1 kilometer. An angular error of ±2 degrees at 500 meters is just ±17 meters, but at 5 kilometers it is a whopping ±175 meters.

MIL-EMBEDDED: How much smaller will these radar systems go in the future?

RADFORD: The limiting factor for radar system size is primarily the size of the antenna required to achieve useful accuracy in the angular measurement of detected targets. The laws of physics dictate that a radar antenna that is, say, 0.5 meters across will detect targets to an accuracy of approximately 1 to 2 degrees – perhaps 0.5 to 1 degree through the use of additional tracking software.

MIL-EMBEDDED: What are the biggest technical challenges involved in designing small radar systems?
Pictured: a 3-D Doppler view of sea clutter detection with a separate target, from a Blighter radar.
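Radford’s error figures above are straight small-angle geometry: cross-range error ≈ range × tan(angular error). A quick check of the numbers he cites:

```python
import math

def cross_range_error(range_m, ang_err_deg):
    """Cross-range position error produced by an angular measurement error."""
    return range_m * math.tan(math.radians(ang_err_deg))

print(f"{cross_range_error(500, 2.0):.1f} m")    # ~17.5 m, matching the ±17 m quoted
print(f"{cross_range_error(5000, 2.0):.1f} m")   # ~174.6 m, matching the ±175 m quoted
```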
RADFORD: The key challenge is being able to amass all of the design disciplines necessary to design a compact radar system. Such systems require a synergy of radar system design, RF and microwave, waveform generation, digital signal processing, radar data processing, power supply, and mechanical design skills such as thermal analysis. Unless every part of the radar system is designed with a deep understanding of the other system components, compromises will inevitably occur. The smaller the radar system, the greater the concentration of compromises.

MIL-EMBEDDED: What types of form factors and processing are involved? Custom design or not?

RADFORD: When designing Blighter radar, the hardware engineers took a modular approach that allowed both off-the-shelf and custom-designed modules to be used. For instance, the processor, synthesizer, and FPGA processing modules are all designed in-house and have a fully customized form factor. Modules such as the compass and GPS are ‘bought-in,’ off-the-shelf.

MIL-EMBEDDED: Do the Blighter radar products use many off-the-shelf solutions?

RADFORD: Our radar doesn’t use many embedded, off-the-shelf systems because there is insufficient control of all of the parameters that affect performance within the system. For example, all clocks used within the design need to be phase-locked to one another to avoid spurious internal emissions that could reduce radar sensitivity or create false targets.

MIL-EMBEDDED: How do you handle the cooling of electronics in the radar design?

RADFORD: Blighter radars neither use nor require any active cooling and don’t need to be placed in or operated from inside any sort of protective radome enclosure. Instead, the complete radar unit – including the antennas, signal processing, and plot extractor – is typically just mounted at the top of tall, fixed surveillance towers or on extendable pump-up vehicle masts. As such, much of the outer case of the radar is exposed to heavy solar loading during daylight hours, so it was important to keep the radar’s power consumption and dissipation to a minimum (see Figure 1).

Figure 1 | Pictured is the Blighter B400 radar.

From the outset, our radars were designed with the goal of consuming and dissipating only very small amounts of power compared to both traditional mechanically scanned radars and active electronically scanned array (AESA) radars. Power consumption is kept low by using much more efficient frequency modulated continuous wave (FMCW) radar processing technology instead of the traditional ‘pulse’ radar technology used since the invention of radar. Whereas pulse radars generally emit many kilowatts of radar energy, Blighter emits 4 watts in its highest power version and just 1 watt in its standard power version. So while a pulse radar can consume – and dissipate – between 500 watts and 2 kilowatts, ours consume 40 to 100 watts, roughly equivalent to the power consumption of a standard household incandescent light bulb.

Our radars also use low-power passive electronically scanned array (PESA) technology on the transmit and receive channels to electronically steer the radar beam in azimuth. Adoption of such digital beam forming allows the radars to detect both small and slow-moving targets that mechanically scanned radars would be unable to detect. PESA technology is also more power-efficient than competing AESA technology. Implementing an AESA-based e-scan radar would necessitate the use of many tens of active transmit and receive radar modules/elements, which would each be power-hungry and costly to manufacture.

MIL-EMBEDDED: How do these radars enable detection of objects or people within coastlines and coastal zones?

RADFORD: Our Blighter B400 e-scan radar’s algorithms – including a sea wave clutter filter and non-moving target detector – and its FMCW transmission technology, combined with sensitive Doppler signal processing for target detection, enable it to detect the small and uncooperative targets that traditional coastal surveillance radars, such as vessel traffic systems and maritime radars, are simply not designed to detect. These features enable protection of complex coastlines from intruders such as smugglers, pirates, illegals, and terrorists at ranges of as far away as 16 kilometers. It can detect and locate small targets day and night, in almost all weather conditions, including rough seas, heavy rain, or dense fog.

In terms of size, the radar is about the equivalent of a large briefcase and transmits 4 watts of power while consuming 100 watts. This allows operation via solar panels and easy installation in difficult-to-reach areas such as rocky or inaccessible coastal regions. The radar’s low data-bandwidth requirement also allows remote operation over narrowband wireless links or satellite communication systems.

Mark Radford has worked in the radar industry since 1985, initially as a designer of high-performance signal processing solutions for naval radar systems, and later as a system designer and development manager. Since joining Blighter Surveillance Systems in 2000, he has been involved in various radar development projects, including the specification, design, and development of the electronic-scanning frequency modulated continuous wave (FMCW) Doppler surveillance radar.

Blighter Surveillance Systems +44-1799-533200 www.blighter.com
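The FMCW approach Radford describes recovers range from the beat frequency between the transmitted sweep and the received echo, which is what lets the radar work with watts instead of kilowatts. A minimal sketch of that relationship (sweep bandwidth and duration are assumed for illustration and are not Blighter figures):

```python
# FMCW: a target at range R delays the echo by t = 2R/c; mixing the echo with
# the outgoing linear sweep yields a beat frequency f_b = S * t, where S is
# the sweep slope in Hz/s. Inverting gives R = c * f_b / (2 * S).
c = 3.0e8            # speed of light, m/s
bandwidth = 50e6     # assumed sweep bandwidth, Hz
t_sweep = 1e-3       # assumed sweep duration, s
slope = bandwidth / t_sweep   # sweep slope S, Hz/s

def range_from_beat(f_beat_hz):
    return c * f_beat_hz / (2 * slope)

print(f"{range_from_beat(10e3):.0f} m")   # 10 kHz beat -> 30 m
print(f"{range_from_beat(1e6):.0f} m")    # 1 MHz beat  -> 3000 m
```

Because the information is carried in the frequency of a continuous low-power tone rather than the energy of a short pulse, detection becomes a job for the signal processor rather than the power amplifier.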
No such thing as too much signal processing for radar & EW systems By John McHale, Editorial Director
Signal processing fuels radar and electronic warfare systems as each application has an unquenchable thirst for more and more bandwidth and performance. FPGA-based VPX computing systems are more often than not the tools that enable system success.
Signal processing technology continues to evolve in both radar and electronic warfare (EW) applications as embedded computing designers work to meet the insatiable demand their military customers have for more bandwidth capability. The nature of their demands is the same year to year, but there will never be a point at which designers have enough signal processing to meet their needs.
Signal processing requirements from radar and EW programs are about the same each year, but with new twists, says Rodger Hosking, vice president and co-founder of Pentek in Upper Saddle River, N.J. “In general, military radar and EW designers want faster data converters with higher sample rates in and out. They also want higher-resolution converters for improved dynamic range and better signal detection among large and small signals. All of these demands are consistent with the need for wider bandwidth.

“People are constantly asking for faster speeds and more resolution because all the signals basically occupy the same spectrum, and wider bandwidth is needed to accommodate higher-definition video, transferring data for network traffic, etc.,” Hosking says. “The waveforms also get more complicated every year as military system designers want better algorithms to get more information, to improve signal-to-noise performance, and to extract signals from noisy environments.”

“While there is much commonality between signal processing solutions for radar and EW, there are differences as well, particularly when it comes to latency in EW, which is one of the biggest drivers in that field,” says Peter Thompson, senior business development manager for high-performance embedded computing at GE Intelligent Platforms in Huntsville, Ala. “For EW there is also a need for increased A-D and D-A performance and resolution. Getting data in and out as quickly as possible is paramount.”

Engineers at Lockheed Martin in Syracuse, N.Y., used a Wildfire FPGA board from Annapolis Micro Systems and the company’s CoreFire FPGA development tool to program the boards that perform digitization and digital filtering inside the upgrade of 29 U.S. Air Force AN/FPS-117 long-range surveillance radars.
Application trends: EW
Each application has unique processing challenges; for EW systems, designers are pressured to keep latency low.
“For general EW signal processing designs there are two things we are seeing – wider bandwidth for the A-D converters and higher resolution, as much as eight, nine, or 10 bits,” says Noah Donaldson, vice president of product development at Annapolis Micro Systems in Annapolis, Md. “For EW applications the main thing our customers want is low latency. You want it to be as low as possible – less than a few nanoseconds. It is difficult to achieve, and you have to design a number of things just right. The A-D converter and D-A converter have to be properly designed along with the firmware; if not, then often you have to redesign everything. To get minimum latency the hardware and firmware must be designed in sync.”

Typically, server-class signal processing solutions are being leveraged for advanced electronic countermeasures in EW applications, says Shaun McQuaid at Mercury Systems in Chelmsford, Mass. “Advanced deep signal analysis is also starting to be talked about by our customers. The number of EW threats continues to grow, and the only way to combat them is with a broad-based, multithreaded signal processing approach that handles cognitive challenges with Xeon server-class systems that have GPUs on the back end and FPGAs handling the front end. The FPGA is key to tackling low-latency challenges in EW systems,” he continues.

“FPGAs are enabling much of the gains in signal processing performance on the front end,” Thompson notes. “It is a case of horses for courses. Huge bandwidths of data come in through FPGAs on the front end, while on the back end parallel processing through GPUs is key.” The IPN 251 is GE’s latest 6U OpenVPX platform and combines an Intel quad-core processor and an NVIDIA GPU in a design popular with radar users, he adds (Figure 1).
Figure 1 | The IPN 251 is GE Intelligent Platforms’ latest 6U OpenVPX platform and combines an Intel quad-core processor and an NVIDIA GPU.
Application trends: Radar
The insatiable need for data is what drives radar system signal processing requirements. “The military does not have a ‘good enough’ point in these applications,” Hosking says. “They want to be able to detect targets as far away as possible.”
Driving down the cost of ESAs By Sally Cole, Senior Editor
Decreasing the cost of electronically scanned arrays (ESAs) is a big concern for the U.S. military and, as part of the U.S. Defense Advanced Research Projects Agency’s (DARPA) Arrays at Commercial Timescales (ACT) program, Rockwell Collins engineers are exploring ways to make it happen. ESAs, which are used primarily on tactical aircraft to detect and jam signals, justify $1 million price tags for radars. “But this same technology would be useful in communications and other applications if we could drop the cost by a factor of about 1,000,” says John Borghese, vice president of Rockwell Collins’ Advanced Technology Center in Cedar Rapids, Iowa. “It’s the key challenge we’re working on as part of DARPA’s ACT program.”

The company is seeing “demand in the defense radar space from defense customers and Department of Defense (DoD) contractors working in complementary roles in terms of developing low-cost array technology,” points out Lee Paulsen, principal electrical engineer for Rockwell Collins. This trend is expected to proliferate as the costs of these arrays decrease. “Expect to see them in automobiles in the near future to detect children or objects in front of or behind the vehicle,” notes Borghese. “And at least one cellphone company is actively exploring using ESAs in their phones.” As standards discussions continue for 5G technologies, ESAs will be involved. “To get access to open bandwidth, we’ll need to move into more sophisticated millimeter-wave-type frequencies with available spectrum. It’s one of the means to achieve the 10x or 100x improvements in bandwidth for 5G,” Paulsen says.

So how is Rockwell Collins targeting lower-cost antennas? With a few different approaches. “Sense-and-avoid radar is one of the tools in the toolbox for making UAVs safe for national airspace, or search-and-rescue radars. This market will be well regulated and slow to change,” explains Paulsen. “It has lots of volume, with thousands of systems per year. In this case, our approach is to integrate very-low-profile phased array systems and get rid of interconnect costs by soldering components together instead. We can collapse multiple different layers of the phased array subsystem into a single laminated PCB, which will have a radiating layer, a beam-forming layer, a control layer, and a power distribution layer – all as separate planes.”

Another aspect involves exploring lower-cost material sets. “Instead of some of the expensive high-linearity gallium nitride (GaN) components, silicon-based solutions or chipsets can be used in the military,” Paulsen adds. From his perspective, there’s a huge advantage to “being vertically integrated in these spaces and able to keep the overhead for each of the subcomponents relatively shared. Moving in that direction will enable us to create lower-cost phased arrays for some of these emerging radars.”

Then there is the DARPA ACT program, which is trying to solve two problems simultaneously. “We’re working on developing an array architecture that can be used for numerous platforms, a ‘one-size-fits-most’ approach, which will help drive volume,” explains Paulsen. “Everyone wants the ability to update the architecture regularly, so we’re leveraging next-gen FPGA technology to help coalesce multiple layers of existing phased arrays – essentially pushing them into a common module.” To do this, they’re targeting making the most expensive pieces of technology within the phased array software-definable in a power-efficient way.

One of Paulsen’s visions for ACT is to end up with software-defined arrays with only power and digital interfaces. “All of the RF interfaces and signal processing – even adaptive beam forming – will be done inside the array box,” he elaborates. “Similarly, for radars, you can imagine moving all of the processing inside the same box. Moving these things out onto the face of the array frees up space inside UAVs and electronics bays in aircraft.”

Rockwell Collins’ overall goal for ACT is to “show that we can develop the technology to build these things…that it’s physically possible and the math works and holds up,” says Borghese. “The reality is that it brings substantial differentiating capabilities to the warfighter and the DoD in general,” Paulsen adds. “We’re talking about moving into spaces where communications systems will need directional antennas to avoid being shot down, because we’re facing or posturing to face more sophisticated adversaries than in the past.”

Within the context of sequestration, pursuing ‘cost-neutral ESAs’ is necessary. To do this, they’re focusing on removing expensive, mechanically clunky dish antennas, for instance, and replacing waveguide slot antennas with ESAs at no net price increase, so that when technology upgrades or refreshes roll around for a platform, they should be affordable.
Upgrades vs. new programs
While recent budget constraints have curtailed many big-ticket programs in the U.S. Department of Defense, funding for radar and electronic warfare has remained relatively flat – but the question remains: Is the funding going toward more new systems or toward technology refreshes/upgrades?

“Most times it is hard to call something an upgrade when you are replacing old technology with a system, such as VPX, that is significantly superior to what was there before,” says Rodger Hosking, vice president and co-founder of Pentek in Upper Saddle River, N.J. “The aircraft or radar platform may be the same, but the processing solution being designed in is totally new. For radar systems that are 20 years old, a new system with the same footprint can deliver 10 times the functionality. Taking advantage of today’s technology makes it almost not fair to compare them to systems 10 or even five years old. Functional performance is moving so quickly it’s wonderful.”

“We see a mixture of both new radar designs and upgrades to existing radars at the platform level,” says Peter Thompson, senior business development manager for high-performance embedded computing at GE Intelligent Platforms in Huntsville, Ala. “However, it is true that radar processing upgrades don’t look much different from new designs, because the improvement in performance over the older system is so significant.

“Development time and cost are the biggest drivers in upgrade designs, which is why they trend toward modular systems that enable system integrators to find the best product and vendor,” Thompson continues. “It also means they can upgrade platforms piecemeal. Instead of changing an entire processing chain, they can leave the rest of it as is, defining interfaces between subsystems.”
“For radar systems, larger fast Fourier transforms (FFTs) are needed for more precise Doppler processing, where the radar systems track the speed and direction of targets,” Hosking continues. “Improving resolution at a given range yields more accurate target information, including the mass of an aircraft and detection of unique structures and reflection characteristics of a jet. This rich set of information in the received signal can absolutely identify a target. It’s akin to looking at fingerprints from 10 feet away vs. examining them under a microscope.”

Pentek offers the 5973 Flexor Virtex-7 VPX-based FMC for radar applications. It features the emerging VITA 66.4 optical backplane interface, delivering systems that interconnect between modules in a chassis, and between chassis, via high-speed optical links, with the ability to connect system elements as far as 10 kilometers apart, Hosking says (Figure 2).
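Hosking’s point about larger FFTs buying finer Doppler resolution can be illustrated with a toy pulse-Doppler pipeline: collect a burst of echoes, then FFT the slow-time samples in each range bin; doubling the number of pulses halves the width of each Doppler bin. A schematic NumPy sketch with invented parameters:

```python
import numpy as np

n_pulses, n_range = 128, 512   # FFT length across pulses sets Doppler resolution
prf = 10e3                     # assumed pulse repetition frequency, Hz

# Fake slow-time data: a target in range bin 200 with a ~1.2 kHz Doppler shift
t = np.arange(n_pulses) / prf
echoes = (np.random.randn(n_pulses, n_range) * 0.1).astype(complex)
echoes[:, 200] += np.exp(2j * np.pi * 1.2e3 * t)

# Doppler processing: FFT down each range bin's slow-time samples
rd_map = np.fft.fftshift(np.fft.fft(echoes, axis=0), axes=0)
dopp_bins = np.fft.fftshift(np.fft.fftfreq(n_pulses, d=1 / prf))

pulse_idx, range_idx = np.unravel_index(np.abs(rd_map).argmax(), rd_map.shape)
print(f"peak: range bin {range_idx}, Doppler ~{dopp_bins[pulse_idx]:.0f} Hz")
print(f"Doppler resolution: {prf / n_pulses:.1f} Hz per bin")  # finer with more pulses
```

The resolution line is the whole argument in one number: at a fixed PRF, only a longer burst – and therefore a larger FFT and more compute – narrows the Doppler bins enough to separate closely spaced targets.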
Figure 2 | The 5973 Flexor Virtex-7 VPX-based FMC from Pentek features the emerging VITA 66.4 optical backplane interface.

“On the radar side you need agility in frequency and waveforms, and the capability to defeat countermeasures,” Thompson says. “Bigger and faster memories are also required, with higher and higher resolutions to track more and more objects on the ground. There is a constant push to do more and to reduce SWaP at the same time. For radar we also see a lot of demand for open systems architectures now. We don’t see radar program managers wanting to be locked into a specific architecture and vendor for hardware and software. This is not necessarily true in EW, where they want to do whatever it takes to meet their latency goals.”

“For radar systems, people want more channels, faster A-D converters to cover wider bandwidths, and then they are looking for A-D converters with the highest resolution,” Donaldson says. “For example, with phased array radar, dozens or hundreds of thousands of sensors are needed – multiple A-D converters for each sensor.”

“A number of our radar customers are also focusing on enabling more raw processing performance onboard aircraft platforms for real-time exploitation of data before it has to be sent to the ground via weak data links,” McQuaid says. “In the radar space the appetite for more processing performance will only continue to grow.”

For signal processing applications Mercury offers the HDS6603, a blade with more than one teraFLOP (TFLOP) of general processing power in a single OpenVPX slot, McQuaid says. It uses two 1.8 GHz Intel Xeon E5-2600 v3 processors with 12 cores each to provide a total of 1.38 TFLOPS.
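That 1.38-TFLOPS figure is easy to sanity-check. Haswell-generation Xeon cores can execute up to 32 single-precision floating-point operations per cycle (two AVX2 FMA units × 8 lanes × 2 operations per FMA – a standard microarchitecture figure, not one from the article):

```python
sockets, cores_per_socket, clock_ghz = 2, 12, 1.8
flops_per_cycle = 32   # 2 AVX2 FMA units x 8 SP lanes x 2 ops/FMA (Haswell)

tflops = sockets * cores_per_socket * clock_ghz * flops_per_cycle / 1000
print(f"{tflops:.2f} TFLOPS")   # 1.38 TFLOPS, matching the HDS6603 spec
```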
VPX
VPX (VITA 46)-based signal processing systems are dominating the embedded niche for radar and EW systems, as the form factor is easily the best for leveraging switched fabrics and enabling supercomputing functionality for these applications. “The data from radar and EW applications all feeds into what the DoD is calling the tactical cloud of big data, which enables U.S. warfighters to have decision superiority on the battlefield,” McQuaid continues. “So in a way OpenVPX is fueling the tactical cloud, as it enables the supercomputing capability and sheer processing power needed to generate the data the cloud needs to exist. VPX is able to deliver the kind of architecture and ruggedization necessary for this tactical environment.”

“We’re seeing a lot more requests for VPX in radar and EW systems – more than CompactPCI and more than AMC or MicroTCA,” Donaldson says. “Customers are also increasingly looking for 3U VPX due to reduced size and weight requirements in many systems. The 3U form factor also makes it easier to swap out cards and boards for easy upgrades and maintenance. However, those who do not have the space constraints and need greater density still go with 6U VPX designs.”

Annapolis offers the WILDSTAR 7 conduction-cooled OpenVPX 6U product, which carries as many as two Xilinx Virtex-7 FPGAs per board (VX690T or VX980T) and as much as 8 GB of DDR3 DRAM for 51.2 GBps of DRAM bandwidth, or up to 64 MB of QDRII+ SRAM for 32 GBps of SRAM bandwidth. It may also be air-cooled.
The necessity & difficulty of integrating video with radar
By Christopher Fadeley, Tech Source
Radar – essential to virtually all critical military environments and a mainstay on almost all vessels – has become more difficult to integrate with live video feeds from various camera interfaces. This situation arises primarily from the complexity of properly routing and managing the various feeds. Designers must overcome this complexity, however, because live video is vital in complementing the radar data; not all information can be relayed with radar alone.
Radar has been a powerful tool in the military environment, but it still can’t tell the full story in a combat situation. It is now standard in aircraft, ground vehicles, submarines, and sea craft, but video has been more difficult to integrate. Video is a necessary addition, however, even if it is to simply supplement the radar data and better interpret targets of interest. With an integrated system, the radar data can be directly correlated with the live video feed to pinpoint the target of interest and then analyze and track that target. Without this live camera data, full analysis and verification is difficult. Lack of live video unfortunately increases the possibility of false positives and delayed response times.
Difficulties of raw video
Modern digital radar data is smaller than live video feeds and is typically sent in a standard format (like ASTERIX CAT 240) over Ethernet as user datagram protocol (UDP) packets. This data is then retrieved onboard the vessel – typically via a rugged system like a VME, VPX, or CompactPCI embedded system, because most military applications have strict environmental requirements. The data can then be displayed on this local embedded system or sent off-vessel to other networked computers at various other interested locations, like control/ground stations.

Raw live camera video doesn’t have the luxury of this simple setup, however, due to the size of the resulting data and the inability to effectively send this raw data over Ethernet. Standard Gigabit Ethernet maxes out at 1 Gbps, while 1080p30 raw video runs at 1.49 Gbps. Even at a reduced resolution, frame rate, and chroma subsampling, raw video will still take up most if not all of the Ethernet bandwidth, essentially clogging the network. Therefore, the only way to get raw video to all interested parties is to hardwire it directly to each party.

Hardwiring creates its own complexities, however: While serial digital interface (SDI) is typically used in modern environments because of its digital format and its ability to be transmitted longer distances on a single coax or over fiber, the video still has to be hardwired to each individual interested party. This setup involves not only having to run the actual wire to all parties (in turn limiting the number allowed to view), but also splitting the single camera feed into multiple wires. And this assumes only a single camera feed. Typically, cameras are mounted at all critical angles of the vessel, with each of these feeds needing to be split and routed to every interested party. An additional switch is necessary to allow each party to select which feed to display at any given time. Needless to say, this approach becomes a mess of wires very quickly, all while limiting further expansion.

Figure 1 | Streaming compressed video in a naval-ship environment can be accomplished by posting six SDI cameras in various locations, each with an embedded VME SBC running an SDI encoder card.
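The 1.49 Gbps figure above is plain pixel arithmetic, and two lines of Python make the network problem concrete:

```python
width, height, fps, bits_per_pixel = 1920, 1080, 30, 24  # 1080p30, 8-bit RGB/4:4:4

raw_bps = width * height * fps * bits_per_pixel
print(f"raw 1080p30: {raw_bps / 1e9:.2f} Gbps")   # ~1.49 Gbps
print(f"fits in 1 GbE? {raw_bps < 1e9}")          # False -- hence compression
```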
Back to Ethernet
As mentioned previously, sending raw video is unworkable due to its large data size. Fortunately, there is a way to send video over Ethernet: by compressing and streaming it. Thanks to the current compression standard, H.264, previous issues of latency and quality are no longer as relevant. H.264 is highly configurable and can produce near-1:1 quality; hardware-accelerated encoders ensure that this encoding is done in a timely manner.
Standards such as RTP and MPEG-2 TS enable the live video to be encapsulated with other data (like audio and metadata) and then sequenced correctly at the other end with minimal overhead. Such a UDP multicast system is also stateless, meaning that any number of computers on the network can access and display the stream without any extra overhead. The system then becomes easy to scale for future additional viewers and allows the stream to be sent off remotely.

How to compress

Since streaming compressed video is the easiest way to distribute live camera feeds throughout a large environment, the question becomes: How do we properly compress the video in a naval environment? Hardware acceleration – necessary on both the encoder and decoder sides of the pipeline – ensures consistently stable performance and is more efficient than software codecs, which require heavy CPU usage. Efficiency is key here since encoding inherently adds latency, but with a properly selected hardware encoder this should not be an issue; most encoders stay around 50-150 ms of latency.

An example environment for this would be a naval vessel outfitted with six SDI
cameras in various locations (Figure 1). Each of these cameras has an embedded VME single-board computer (SBC) close by running an SDI encoder card like the Tech Source VC100x or VC102x. This card takes in the SDI video and encodes it to H.264 before passing it off to the host PC. This host PC can then show the stream locally and forward the stream to all other interested parties.
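The one-to-many distribution described here rests on ordinary UDP multicast: the encoder host sends each compressed chunk to a multicast group once, and any number of viewers subscribe. A stripped-down sketch of just the transport layer (the group address, port, and encoded-frame source are placeholders; a real system would packetize via RTP or MPEG-2 TS as noted above):

```python
import socket

GROUP, PORT = "239.1.2.3", 5004   # placeholder multicast group and port

def send_stream(encoded_chunks):
    """Send already-encoded H.264 chunks to the multicast group once;
    every subscribed viewer receives them -- the sender tracks no state."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 8)
    for chunk in encoded_chunks:
        sock.sendto(chunk, (GROUP, PORT))

def receive_stream():
    """Any viewer on the network can join the group and pull the stream."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PORT))
    mreq = socket.inet_aton(GROUP) + socket.inet_aton("0.0.0.0")
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    while True:
        data, _ = sock.recvfrom(65536)
        yield data   # hand each packet off to an H.264 decoder
```

Because the sender keeps no per-viewer state, adding a tenth or a hundredth display station costs the network nothing beyond the one multicast copy already on the wire.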
Added value
Once the video is compressed and sent out over Ethernet, the viewing parties must capture the Ethernet packets and decode the video before displaying it. Other beneficial features are added if the selected radar suite itself can handle the decoding of the video. Applications such as Cambridge Pixel’s SPx software already support decoding and displaying these H.264 feeds; furthermore, they can directly integrate and correlate the video feed with the radar data and enable the user to set up the H.264 video stream parameters on the Tech Source Condor VC100x or VC102x.

For example, when a target of interest is identified in the radar view, the software can control the nearest camera via a Pelco-D or similar interface to show the target of interest on video (“slew to cue”). The position of the camera is then controlled by the radar’s observation of the target. This allows the user to instantly analyze what has been detected on radar and determine whether it is truly a target of interest or simply noise. Going further, the user may then want to process the camera video to track the target from the real-time video signal (video tracking). Optimum detection and assessment arise from integrating radar, which provides long-range, 360-degree coverage, with cameras that support visual assessment and identification.

Furthermore, applications can typically add various other beneficial features to the video feeds – like overlays and target/motion tracking – by using a dedicated GPU to analyze the video feed. Using a dedicated GPU will not only ensure performance of the radar applications, but can also speed up the decoding of the H.264 video feed to further reduce its latency. Some systems have onboard graphics available, but only a dedicated GPU can ensure strong performance. With the data size now much more manageable, the video and/or radar display can also be recorded to a local storage device for future viewing and analysis.
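At its core, the slew-to-cue handoff is a coordinate conversion: the radar reports a target in range/bearing from its own position, and the software computes the pan/tilt a nearby camera needs to point at it. A simplified flat-earth sketch (positions are invented, and the actual Pelco-D byte framing is omitted):

```python
import math

def slew_to_cue(radar_xy, target_range_m, target_bearing_deg, camera_xyz):
    """Convert a radar plot into the pan/tilt a camera should be driven to."""
    # Target position from the radar's range/bearing (bearing from north)
    b = math.radians(target_bearing_deg)
    tx = radar_xy[0] + target_range_m * math.sin(b)
    ty = radar_xy[1] + target_range_m * math.cos(b)

    # Pan/tilt from the camera's own location (camera_xyz includes mast height)
    dx, dy = tx - camera_xyz[0], ty - camera_xyz[1]
    pan = math.degrees(math.atan2(dx, dy)) % 360
    tilt = -math.degrees(math.atan2(camera_xyz[2], math.hypot(dx, dy)))
    return pan, tilt

# Radar at the origin, target 1500 m away at bearing 045, camera on a 20 m mast
pan, tilt = slew_to_cue((0, 0), 1500, 45.0, (100, -50, 20))
print(f"pan {pan:.1f} deg, tilt {tilt:.1f} deg")  # sent via Pelco-D or similar
```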
Integration is worth it
Radar will always be a necessary part of the naval surveillance environment, but video can be a vital supplement to further reduce potential false positives in target acquisition. Video can be difficult to implement, but compressing and distributing the video over an already-existing Ethernet network saves unnecessary cabling and allows the video to be accessed at any location. All this data working together provides the user with true situational awareness.

Christopher Fadeley is a software engineer and project manager at Tech Source. He holds a B.Eng. in computer engineering: digital arts & sciences from the University of Florida. Fadeley specializes in military embedded systems software and video encoding/streaming solutions.

Tech Source www.techsource.com chrisf@techsource.com
Tackling defense apps by harnessing the Intel Core i7’s integrated GPU
By Marc Couture, Curtiss-Wright Defense Solutions

The defense electronics market has embraced commercial off-the-shelf (COTS) technology. Riding waves of innovation driven by huge, worldwide markets, defense system designers are using multicore processors, high-capacity memory chips, wide-bandwidth fabrics, and open software – all developed for commercial markets. These designers must constantly evaluate new commercial innovations and determine if, and how, they can be employed to craft a superior defense solution. The following will discuss a recent innovation – the integrated, on-chip graphics processing units (GPUs) within a new generation of Intel x86 architecture chips. It will cover the basic concepts of how these integrated GPUs function, look at what they do well and not so well, and then present some defense electronics use cases where they offer significant advantages.
Evaluating the design value of an integrated GPU
Early in the design cycle, system engineers must make decisions about what types of processing elements will best meet their requirements. There are lots of choices – Power Architecture, ARM, x86 architecture, FPGAs, GPUs – all with strengths and weaknesses. Now there is a new variant on the list of processor choices: GPUs integrated inside Core i7 x86 silicon. Understanding that variant, its strengths, and its weaknesses is one part of making good design decisions for a new generation of programs.

The primary functions of a GPU
A discussion of the value of an integrated GPU needs to begin with an understanding of how GPUs function as a general class of processors. At a high level, GPUs lend themselves to two broad classes of processing, both of which are needed by defense applications:

1. Rendering images for display
2. Accelerated floating-point math operations

How does a GPU make that happen? What is involved in rendering an image? Rendering images was the original intent of the GPU chip and remains its core function in commercial applications to this day. An image on a flat-screen monitor can be derived via a 2D or 3D computer model. Projecting 3D computer images in motion onto a 2D surface, so as to be convincing to the human eye, is a commonly used but highly compute-intensive application. A moving image, for instance, will command a tremendous number of calculations per frame as the GPU renders simple polygons to create more advanced objects, maps textures to simulate surfaces, and then rotates these shapes within dynamically changing coordinate systems.

Creation of complex objects usually involves the utilization of very simple shapes, such as triangles, that serve as fundamental graphic building blocks. These objects are then immersed in a scene that defines the overall lighting, surface textures, and point of view. One of the most popular techniques for immersing the objects within a given scene involves first object shading and then rasterization, which geometrically projects the 3D objects and the scene onto a 2D image plane, i.e., the monitor (Figure 1).

Figure 1 | Greater shading capability enhances image quality.

In contrast to these computer-generated images, image signal processing, as well as computer vision, receives a continuous stream of imaging data from a video camera or electro-optic focal plane array. Two-dimensional signal processing techniques are then applied for edge detection and motion-blur correction, in addition to a myriad of other application-specific algorithms. The enhanced image is then rendered to a display, where the human operator will hopefully have an easier time extracting useful information.
Image signal processing and computer vision perform different operations, but they are equally intensive in terms of computational throughput and use math algorithms highly similar to those used in image rendering.

Why is the GPU good at these operations? Given the continuously streaming nature of rendering video frames, GPUs are by nature optimized for stream processing. To manage streams of data, they first boost throughput by parallelizing their compute engines via a massive pipeline architecture. Then they add high-bandwidth, high-capacity memory, which is central to this model given the relentless flow of high-resolution imagery. In recent years, this streaming capability has been further enhanced by the incorporation of “programmable shaders.” These shaders perform highly sophisticated shading effects (e.g., volumetric lighting, normal mapping, and chroma keying); they are optimized at the silicon level to execute the algorithms associated with this type of processing. Powerful new GPUs include large numbers of these shaders for accelerated rendering of highly complex, lifelike images.

How and why does a GPU accelerate floating-point math? With the advent of programmable shaders that are inherently adept at vector and matrix math, using GPUs to work on problems outside of rendering graphics has become a reality. Vector and matrix data manipulation are fundamental to linear algebra, filtering, and transforms, all components of digital signal processing. Initially, the data formats were integer-based; however, the need for high-dynamic-range imaging drove the development of floating-point data formats. This set the course for GPUs to be used as floating-point math accelerators in a multitude of commercial- and defense-related high-performance compute applications. The term general-purpose GPU, or GPGPU, was then coined.
Why are GPUs really good at floating-point math for streaming applications? The sheer number of shaders in GPGPUs, coupled with a high-throughput streaming pipeline architecture, allows for scalable, massively parallel processing that can far outpace the floating-point performance of the SIMD units within standard CPUs. Discrete GPGPUs are often coupled with high-throughput, high-capacity GDDR5, a type of synchronous graphics RAM based on DDR3 SDRAM. This is not only required for buffering high frame rates of high-resolution graphics data, but is also highly advantageous for buffering the ingest and egress of wideband sensor data (Figure 2).
Figure 2 | GPUs scale with hundreds of shader cores.

Putting GPUs into systems
Until very recently, GPUs have existed only as unique chips. As discrete pieces of silicon, GPUs have traditionally relied on external host CPUs, typically Intel-based, as they themselves are not governed by an operating system. The host CPU directs the instructions to be executed and also identifies the input source and output destination for data streams. The physical implementation of GPUs has seen them on full-length PCI cards placed in Intel-based workstations, in addition to being incorporated directly onto CPU motherboards. Mobile PCI Express Module (MXM) is a modular format for GPUs that, in addition to high-end consumer products, has been used in high-performance embedded computing (HPEC) applications. One example is a single MXM module on a 3U OpenVPX carrier board; another instantiates two MXM modules on a 6U OpenVPX carrier. These carriers are placed in adjunct slots alongside Intel SBC or Intel/FPGA DSP boards, with a multilane PCIe superhighway providing the interconnect for streaming data ingest/egress. Rendered graphics destined for displays are output directly from the GPUs over display signal ports.
Recently, ARM cores that do run operating systems are finding their way into discrete GPUs. And, as will be discussed later, there are other architectural approaches that also provide new ways to integrate GPUs into systems.
Comparing a GPU to other math accelerators

How is a GPU like an FPGA? How is it different? FPGAs have architectures that scale by the number of logic cells, a single cell comprising D flip-flops, LUTs, and other elements. FPGAs scale these logic cells from a few hundred to thousands as the physical die footprint grows. GPUs scale shader processors in a similar fashion, from a few hundred in the smaller packages to a couple thousand. This range of sizes sees both FPGA and GPU technologies producing devices that are easily over 100 W thermal design power (TDP), which often excludes the largest high performers from designs destined for applications with physical constraints on size, weight, and power. Neither FPGAs nor GPUs are well suited to the cognitive or decision-making parts of an application; dynamic decision-making functions are usually performed by general-purpose CPU cores. However, both FPGAs and GPUs excel at those parts of a high-bandwidth streaming application where the dataset and associated algorithms can be parallelized. The big difference between FPGAs and GPUs to date is that GPUs have a powerful floating-point capability, while FPGAs lack dedicated floating-point processing, although this may change in the not-so-distant future. FPGAs do have an advantage in that their I/O is user-definable, whereas GPUs depend solely on PCI Express for data transport and display ports destined for monitors.

How is a GPU like an SIMD unit? How is it different? The CPU cores within a device such as the Intel Core i7 (4th generation “Haswell”) each have a single instruction, multiple data (SIMD) unit labeled Advanced Vector Extensions (AVX) 2. The AVX2, like GPU shaders, is a floating-point engine. And, in fact, shaders are SIMD-like, with the same instruction operating in one cycle on multiple shaders. A big point of differentiation lies in the fact that GPUs generally process elements independently, with little to no provision for static or shared data. Each GPU processor can only read from an input, perform an operation on it, and write it to the output. In contrast, the AVX2 SIMD units contain advanced register and memory manipulation functionalities, in addition to support for context switching, making for compute elements capable of cognitive applications. This makes them suitable for more general-purpose processing, such as scientific and financial applications that benefit from parallelism and high throughput. Another major difference is the huge disparity in density, as GPUs have far more shaders than a Core i7 has SIMD units. For instance, the NVIDIA GTX 970M has 1,280 shaders, as opposed to 4 SIMD units in the Core i7; this is partially offset by the fact that an individual AVX2 unit benchmarks much higher than an individual GPU shader core, at roughly 75 GFLOPS versus roughly 2 GFLOPS.
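Multiplying out the per-unit figures in that comparison shows why GPUs win on raw parallel throughput even though each shader is far weaker than an AVX2 unit (rough numbers from the article):

```python
gpu_shaders, gflops_per_shader = 1280, 2    # NVIDIA GTX 970M, ~2 GFLOPS/shader
cpu_simd_units, gflops_per_simd = 4, 75     # Core i7, ~75 GFLOPS/AVX2 unit

print(f"GPU aggregate: {gpu_shaders * gflops_per_shader} GFLOPS")   # 2560
print(f"CPU aggregate: {cpu_simd_units * gflops_per_simd} GFLOPS")  # 300
```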
Programming a GPU
The pros and cons of CUDA and OpenCL
Within the software domain, the most popular and ubiquitous API for drawing and rendering graphics is undoubtedly OpenGL, maintained by a not-for-profit consortium called the Khronos Group. However, when it comes to accelerating math there are two main choices: CUDA, from the GPU silicon manufacturer NVIDIA, and something a bit newer, also from the Khronos Group, called OpenCL.

Compute Unified Device Architecture (CUDA) strictly targets NVIDIA GPUs for use as GPGPUs. CUDA is based on C and is predicated on a parallel computing programming model that looks to gain massive throughput by executing many threads concurrently. The big advantage of CUDA is that it exposes optimized, intrinsic functionality buried within the NVIDIA GPU silicon. One example enables thread usage of shared memory regions, a previously cited omission of GPUs. In general, reviews from CUDA developers within the high-performance computing (HPC) community have been very favorable. The big disadvantage of CUDA is that it is proprietary and closed to silicon outside of NVIDIA GPUs.

Open Computing Language (OpenCL) is also based on C and provides parallel computing constructs. The main differentiation lies in the fact that OpenCL targets and executes across a whole spectrum of heterogeneous platforms consisting of not just GPUs, but also CPUs, DSPs, and even FPGAs. The big advantage of OpenCL’s open nature is portability and the ability to bridge heterogeneous computing elements within the same system, or even the same silicon. The disadvantage can sometimes be the level of abstraction from the underlying silicon, and optimization mileage will vary depending on the OpenCL implementation by the associated silicon vendor.
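OpenCL’s portability is easiest to see from the host side: the same kernel source runs on whatever OpenCL device the platform exposes – GPU, CPU, or, with some toolchains, FPGA. A minimal vector-add sketch using the pyopencl bindings (illustrative; an equivalent program can be written in plain OpenCL C):

```python
import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()   # picks whatever OpenCL device is available
queue = cl.CommandQueue(ctx)

a = np.random.rand(4096).astype(np.float32)
b = np.random.rand(4096).astype(np.float32)

mf = cl.mem_flags
a_d = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
b_d = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
out_d = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

# The kernel is plain OpenCL C; the identical source runs on GPU, CPU, DSP...
prog = cl.Program(ctx, """
    __kernel void vadd(__global const float *a,
                       __global const float *b,
                       __global float *out) {
        int i = get_global_id(0);
        out[i] = a[i] + b[i];
    }
""").build()

prog.vadd(queue, a.shape, None, a_d, b_d, out_d)
out = np.empty_like(a)
cl.enqueue_copy(queue, out, out_d)
assert np.allclose(out, a + b)
```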
The various chip manufacturers are making great progress, and the OpenCL community is growing at a much faster rate due to the variety of target silicon that is available.

There are GPU security concerns
Intel x86 and Power Architecture devices are general-purpose processors (GPPs) well armored against security attacks, but the neighboring discrete GPU/GPGPUs in a system are relatively unprotected and vulnerable. Discrete GPUs lack the integrated crypto capabilities and special security features, like secure boot, found in GPPs.
How GPUs evolved
Core count has increased
As is the case with other processing architectures, GPUs have seen and continue to see a steady increase in core count within a fixed power footprint, driven by lithography advances that increase die density. This creates an ever-greater performance-to-power (i.e., FLOPS-per-watt) ratio in all architectures, with GPUs currently in the lead. Another key point, previously mentioned, is that CPU cores (e.g., ARM) are finding their way onto GPU dies. This provides a new level of autonomy to GPUs and also allows them to assume functions within a system beyond that of a mindless giga/teraFLOPS monster.

Intel and GPUs can work together
For a number of years Intel Corp. has conducted a dual roadmap targeting two distinct macro markets. Xeon server-class chips are effectively large homogeneous pools of x86 cores. These chips can be physically linked so that cache coherency scales across multiple devices, making them ideal for cloud computing, HPC servers, and applications that are extremely dynamic or multimodal. Alternatively, Intel’s Core i7/i5/i3 mobile-class devices are geared toward portable devices such as laptops and ultrabooks, are therefore more environmentally rugged, and now have the requirement to sometimes render their own graphics. Intel has kept both of these product lines on a strict time cadence called “Tick-Tock,” in which Intel first creates a new microarchitecture in one generation and then shrinks the die geometry in the following generation. This results in the latest Core i7 mobile-class device having four x86 cores, each with the latest AVX2 SIMD vector unit. Additionally, with the evolving microarchitecture and shrinking lithography, a tightly coupled GPU now resides on the same Core i7 die, and it will continue to advance in efficiency and density with the Tick-Tock march of future generations (Figure 3).
Figure 3 | The Intel “Tick-Tock” development model.
Historically, Intel CPUs have been coupled on motherboards with discrete GPUs from NVIDIA and AMD. The integration of the GPU by Intel onto the mobile-class die was partly spurred by marketplace contentions, but there are also definitive technical advantages to subsuming the GPU, such as removing PCB complexities and reducing thermal dissipation. The aerospace and defense industry is a direct beneficiary of this development, as it now has a very powerful heterogeneous CPU+GPU core in which the GPU can be used to drive monitors or process extreme sensor feeds.

Intel’s integrated GPUs
The Intel Core i7 Gen 4 (Haswell) contains four x86 cores, each with its own dedicated AVX2 SIMD unit and each with an exclusive L1/L2 cache. There is a larger L3 cache (4-8 MB) that is shared among all four cores and the internal graphics processor (integrated GPU). This “sharing” is accomplished via a high-speed ring bus (>300 GBps) that serves as the data transport mechanism. Additionally, some Core i7 chip variants have additional embedded DRAM (eDRAM) that effectively functions as a Level 4 cache (a victim cache to L3, up to 128 MB). Outside of external DDR3 memory, the method for getting into and out of the Core i7 is through lots of PCI Express Gen 3 lanes (Figure 4).
Figure 4 | Core i7 architecture diagram: four x86 cores, each with a slice of shared last-level cache (LLC), connected by the ring bus to the processor graphics, eDRAM, and a system agent containing the display engine, DMI, PCI Express, and the integrated memory controller (IMC) with its DDR interface.
There are several different sizes available for the embedded GPU within a Core i7; one of the more popular is the GT2 (a.k.a. HD Graphics 4600), which has 20 shader processors (Intel calls them execution units, or EUs). These EUs produce >350 GFLOPS of single-precision floating-point processing. Additionally, the GT2 supports three display ports for rendering graphics (Figure 5).
Figure 5 | A silicon view of the Core i7 architecture.
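That >350 GFLOPS figure can be sanity-checked with peak-rate arithmetic. In the sketch below, the 16 FLOPs per EU per clock (two 4-wide FMA pipes, two operations per FMA) and the roughly 1.1 GHz graphics clock are our assumptions, not numbers from the article:

    # Peak single-precision GFLOPS estimate for the GT2 integrated GPU.
    eus = 20                 # execution units in GT2 / HD Graphics 4600
    flops_per_eu_clock = 16  # assumed: 2 FMA pipes x 4 lanes x 2 ops per FMA
    clock_ghz = 1.1          # assumed graphics clock

    print(f"GT2 peak: {eus * flops_per_eu_clock * clock_ghz:.0f} GFLOPS")  # ~352

Under these assumptions the peak works out to roughly 352 GFLOPS, consistent with the >350 GFLOPS cited above.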
Defense electronics has advantages
Some applications will not be candidates for tossing aside a discrete NVIDIA or AMD GPU in favor of the embedded Intel mobile-class GPUs. These include applications that require TFLOPS of processing and/or gigabytes of external GDDR5 DRAM, which is what MXM modules with the larger discrete GPUs provide. However, other applications can benefit greatly from the onboard GPU in the Core i7, where the GFLOPS performance (excluding the CPU AVX2 units) meets the need and the memory capacity is sufficient.
• The first obvious advantage for defense electronics is the removal of an entire adjunct GPU board (or more), which frees an entire FRU slot in a deployed chassis, saving size, weight, power, and overall cost for the entire system.
• Another key advantage of the onboard GPU is the extremely low latency made possible by its proximity to the CPU cores. With all the CPU cores and the GPU interconnected by a lightning-fast ring bus and passing data at the caching level, latency benchmarks improve greatly compared with moving data between an Intel CPU on one board and a discrete GPU on a second board interconnected via PCI Express.
• A third significant advantage is the increased security surrounding the Intel devices as opposed to the large discrete GPUs, which currently do not have adequate provisions for protection. Intel processors include capabilities for hardware-based secure key generation and boot integrity protection.
Example use cases offer advantages
Graphics processing units continue to gain popularity in the aerospace and defense industry. They are utilized in both modes described previously: as image renderers (GPUs) or as math accelerators (GPGPUs). Both discrete GPUs and integrated GPUs are growing in performance and memory capacity; which type makes the most sense is a question answered by the requirements of the program. We are seeing GPUs used to render displays for 360-degree situational awareness on modern military platforms, enabling the pilot or driver to effectively see through the ceiling, walls, and floor of a vehicle. Even more popular is the use of GPUs to capture image sensor data (e.g., via gigapixel cameras) followed by real-time image manipulation, optimization, and display. This latter application is becoming particularly popular with unmanned aerial vehicles (UAVs) carrying multiple camera types (i.e., electro-optic and infrared) whose outputs need to be orthorectified and stitched together, just for starters.
Sensor data other than standard imagery is also a target for GPUs. STAP and SAR radar, which are hungry for FLOPS, are seeing pulse compression and Doppler processing occur in GPUs. SIGINT applications requiring high-throughput, wideband frequency-domain analysis are also being targeted by GPUs.
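As a flavor of the pulse-compression math being offloaded to GPUs, here is a minimal matched-filter sketch in Python/NumPy. The chirp parameters are invented for illustration; a GPU implementation would run the same FFT arithmetic through OpenCL or CUDA.

    import numpy as np

    # Pulse compression as a matched filter: correlate the received signal
    # with the transmitted chirp via FFTs, concentrating the pulse energy
    # into a narrow range bin.
    fs, pulse_len, bw = 100e6, 10e-6, 20e6        # sample rate, pulse width, chirp BW
    t = np.arange(int(fs * pulse_len)) / fs
    chirp = np.exp(1j * np.pi * (bw / pulse_len) * t**2)  # linear FM reference

    echo = np.zeros(4096, dtype=complex)
    echo[1200:1200 + len(chirp)] = 0.5 * chirp    # target echo at sample 1200
    echo += 0.05 * (np.random.randn(4096) + 1j * np.random.randn(4096))  # noise

    # Matched filter: multiply by the conjugate reference spectrum, inverse FFT.
    n = len(echo)
    compressed = np.fft.ifft(np.fft.fft(echo) * np.conj(np.fft.fft(chirp, n)))
    print("peak at sample", np.abs(compressed).argmax())   # ~1200 (target range bin)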
The embedded GPUs are becoming particularly attractive for some applications given their proximity to the CPU cores on the same die. Applications with stringent latency requirements, in addition to processing throughput, can benefit greatly from a heterogeneous processor such as the Intel Core i7, given the tightly coupled ring bus infrastructure between the embedded GPU and the AVX2-enabled x86 cores. Electronic warfare (EW) applications, such as cognitive EW, that were dismissive of discrete GPUs due to high latency figures are now considering the integrated device, as it gives them the
throughput of the GPU, the cognitive capabilities of the CPUs, and the low latency of the ring-bus-interconnected caching. OpenCL, as previously discussed, represents the software enabler that gives developers the access they need to realize the benefits of this heterogeneous device.
The CHAMP-AV9 is a dual integrated GPU module
A prime example of a design implementation incorporating embedded GPUs is Curtiss-Wright's CHAMP-AV9 Intel Core i7 multiprocessor 6U OpenVPX DSP board. This module brings two Core i7 processors onto a single PCB and is designed to withstand the severe environments the aerospace and defense industry imposes upon it. The combined GFLOPS metric for just the AVX2 SIMD units on the two Haswell Core i7s equates to 614 GFLOPS. However, enlisting the embedded GT2 GPUs as GPGPU math accelerators brings the overall total to 1,318 GFLOPS of processing power. Alternatively, if the GPUs are leveraged to render displays, the CHAMP-AV9 provides a total of six individual display ports, three per Core i7 device. One real consideration with the Core i7 computational powerhouse is the importance of providing ample memory throughput and capacity, as well as plentiful high-speed I/O. Rarely are GPUs compute bound; they are much more likely to be memory bound or I/O bound. This is why the CHAMP-AV9 employs up to 32 GB of DDR3 to tackle memory capacity, in addition to routing all genres of high-speed I/O on and off the module (Figure 6).
Figure 6 | The CHAMP-AV9 6U OpenVPX DSP module.
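For the curious, the 614 and 1,318 GFLOPS figures can be reproduced with the same peak-rate arithmetic used earlier for the GT2; the 2.4 GHz core clock below is our assumption, not a published specification.

    # Rough peak-GFLOPS accounting for the dual Core i7 CHAMP-AV9.
    cores, avx2_flops_per_clock, cpu_ghz = 4, 32, 2.4  # 2 FMA units x 8 SP lanes x 2 ops
    gpu_gflops = 20 * 16 * 1.1                         # GT2 estimate from earlier sketch

    avx2_per_chip = cores * avx2_flops_per_clock * cpu_ghz          # ~307 GFLOPS
    print(f"AVX2, two chips: {2 * avx2_per_chip:,.0f} GFLOPS")      # ~614
    print(f"AVX2 + two GT2s: {2 * (avx2_per_chip + gpu_gflops):,.0f} GFLOPS")  # ~1,318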
Signal integrity over an extended temperature range is of particular importance, as the CHAMP-AV9 routes 1,600 MHz memory lanes, PCIe Gen 3, DisplayPort, and 40 GbE/InfiniBand in a very tight 6U OpenVPX footprint. The CHAMP-AV9 employs Curtiss-Wright Fabric40 technology to address this issue and provide high signal integrity at the extreme signaling rates used by high-bandwidth fabrics.
Summary
System designers must regularly evaluate the many processing elements available on the market, selecting the architecture that best meets the needs of a specific program. The GPUs integrated within Intel's Core i7 processors are a recently available processing variant with significant advantages for many types of aerospace and defense applications: SWaP savings, high performance, low latency, and enhanced security are all characteristics of these integrated GPUs. Curtiss-Wright will continue to choose the best and brightest processing elements for the aerospace and defense market; expect to see new solutions utilizing the latest discrete GPU technologies. Additionally, the 5th-generation Intel Core i7 "tick" isn't far off in the future.
Marc Couture is a senior product manager at Curtiss-Wright Defense Solutions.
Curtiss-Wright Defense Solutions
www.cwcdefense.com
ds@curtisswright.com
Software-defined radio handbook, 11th edition By Rodger H. Hosking, Pentek, Inc.
Software-Defined Radio (SDR) has revolutionized electronic systems for a variety of applications including communications, data acquisition, and signal processing.
Early roles for FPGAs
As true programmable gate functions became available in the 1970s, they were used extensively by hardware engineers to replace control logic, registers, gates, and state machines that would otherwise have required many discrete, dedicated ICs. Often these programmable logic devices were one-time, factory-programmed parts that were soldered down and never changed after the design went into production. These devices were mostly the domain of hardware engineers, and the software tools were tailored to meet their needs: tools that accepted Boolean equations or even schematics to help generate the interconnect pattern for the growing number of gates. Then programmable logic vendors started offering predefined logic blocks for flip-flops, registers, and counters that gave the engineer a leg up on popular hardware functions.
Nevertheless, the hardware engineer was still intimately involved with testing and evaluating the design using the same skills needed for testing discrete logic designs. He had to worry about propagation delays, loading, clocking, and synchronizing: all tricky problems that usually had to be solved the hard way, with oscilloscopes or logic analyzers.
FPGAs are a new device technology
It's virtually impossible to keep up to date on FPGA technology, since new advancements are made every day. The hottest features are processor cores inside the chip, computation clocks of 500 MHz and above, and lower core voltages to keep power and heat down. Several years ago, dedicated hardware multipliers started appearing, and now you'll find literally thousands of them on-chip as part of the DSP initiative launched by virtually all FPGA vendors. High memory densities coupled with very flexible memory structures meet a wide
range of data flow strategies. Logic slices with the equivalent of over ten million gates result from steadily shrinking silicon geometries. BGA and flip-chip packages provide plenty of I/O pins to support onboard gigabit serial transceivers and other user-configurable system interfaces.
Figure 1 | FPGAs bridge the SDR application space. (The figure plots process intensity against flexibility: ASICs sit at the high-intensity end with functions such as A/D conversion, DDC, and filtering; DSPs sit at the high-flexibility end with demodulation, decoding, FFT analysis, and decisions; FPGAs bridge the middle.)
New announcements seem to be coming out every day from chip vendors like Xilinx and Altera in a never-ending game of outperforming the competition.
To support such powerful devices, new design tools are appearing that now open up FPGAs to both hardware and software engineers. Instead of just accepting logic equations and schematics, these new tools accept entire block diagrams as well as VHDL and Verilog definitions.
Choosing the best FPGA vendor often hinges heavily on the quality of the design tools available to support the parts. Excellent simulation and modeling tools help to quickly analyze worst-case propagation delays and suggest alternate routing strategies to minimize them within the part. This minimizes some of the tricky timing work for hardware engineers and can save hours of tedious troubleshooting during design verification and production testing.
A new industry of third-party IP core vendors now offers thousands of application-specific algorithms. These are ready to drop into the FPGA design process to help beat the time-to-market crunch and to minimize risk.
Using FPGAs for SDR
Like ASICs, all the logic elements in FPGAs can execute in parallel. This includes the hardware multipliers, and you can now get over 3,500 of them on a single FPGA.
This is in sharp contrast to programmable DSPs, which normally have just a handful of multipliers that must be operated sequentially. FPGA memory can now be configured with the design tool to implement just the right structure for tasks that include dual-port RAM, FIFOs, shift registers, and other popular memory types. These memories can be distributed along the signal path or interspersed with the multipliers and math blocks, so that the whole signal-processing task operates in parallel in a systolic, pipelined fashion.
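To make the parallelism contrast concrete, here is a small Python model (ours, purely behavioral, not a hardware description) of a transposed-form systolic FIR in which one multiplier per tap fires on every clock, the way FPGA fabric implements it; a programmable DSP would iterate those same multiplies sequentially.

    import numpy as np

    # Model of a transposed-form systolic FIR: on every clock, all tap
    # multipliers fire at once (one hardware multiplier per tap), and the
    # partial sums advance one register down the adder chain.
    def systolic_fir(x, taps):
        taps = np.asarray(taps, dtype=float)
        acc = np.zeros(len(taps))          # pipeline registers, one per tap
        out = []
        for sample in x:                   # one loop iteration == one clock
            products = taps * sample       # all multipliers in parallel
            acc = products + np.concatenate((acc[1:], [0.0]))  # shift partial sums
            out.append(acc[0])
        return np.array(out)

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    taps = [0.25, 0.5, 0.25]
    print(systolic_fir(x, taps))
    print(np.convolve(x, taps)[:len(x)])   # same result as direct convolution

The output matches np.convolve; the point is the dataflow, in which each loop iteration models one clock during which every tap computes simultaneously.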
Figure 2 | Typical Pentek products with factory-installed SDR IP cores.

Model | Input: Chan / Freq (max) / Bits | DDC: Chan / Decim. Range / Out Bits | DUC: Chan / Interpol. Range | Output: Chan / Bits | Beamformer
71621 | 3 / 200 MHz / 16 | 3 / 2–64K / 16 or 24 | 1 / 2–512K | 2 / 16 | Yes
71641 | 1 or 2 / 3.6 or 1.8 GHz / 12 | 1 or 2 / 4, 8 or 16 / 16 | None | None / – | No
71741 | 1 or 2 / 3.6 or 1.8 GHz / 12 | 1 or 2 / 4, 8 or 16 / 16 | None | None / – | No
71651 | 2 / 500 or 400 MHz / 12 or 14 | 2 / 2–128K / 16 or 24 | 1 / 2–512K | 2 / 16 | Yes
71751 | 2 / 500 or 400 MHz / 12 or 14 | 2 / 2–128K / 16 or 24 | 1 / 2–512K | 2 / 16 | Yes
71661 | 4 / 200 MHz / 16 | 4 / 2–64K / 16 or 24 | None | None / – | Yes
71662 | 4 / 200 MHz / 16 | 32 / 16–8K / 24 | None | None / – | No
71663 | 4 / 200 MHz / 16 | 1100 / – / – | None | None / – | No
71671 | None | None | 4 / 2–1024K | 4 / 16 | No
71771 | None | None | 4 / 2–1024K | 4 / 16 | No

Again, this is dramatically different from the sequential execution and data fetches from external memory of a programmable DSP. As we said, FPGAs now have specialized serial and parallel interfaces to match the requirements of high-speed peripherals and buses. As a result, FPGAs have significantly invaded the application task space, as shown by the center bubble in Figure 1. They offer the advantages of parallel hardware to handle some of the high process-intensity functions like DDCs, and the benefit of programmability to accommodate some of the decoding and analysis functions of DSPs. These advantages may come at the expense of increased power dissipation and increased product cost. However, these considerations are often secondary to the performance and capabilities of these remarkable devices.
Figure 2 shows the salient characteristics of some of Pentek's SDR products with factory-installed IP cores. All these products are available off-the-shelf and are covered in Pentek datasheets and catalogs. Figure 2 lists the number of input channels, the maximum sampling frequency of their A/Ds, and the number of bits. This is followed by DDC characteristics such as the number of DDC channels and the decimation range. Other information specific to each core is included, as well as an indication of the models that include a DUC, an interpolation filter, and an output D/A. As shown in the chart, many of these models include features that are critical for beamforming and direction-finding applications. All the models shown here are XMC modules; as with all Pentek SDR products, they are also available in PCI Express, OpenVPX, AMC, and CompactPCI formats.
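Since the DDC is the workhorse core in most of these products, a minimal NumPy sketch of what one does (mix, lowpass, decimate) may help; the parameters below are illustrative only and not tied to any model in Figure 2.

    import numpy as np

    # Sketch of a digital downconverter: mix the band of interest to baseband
    # with a complex local oscillator, lowpass filter, then decimate.
    fs, f_if, decim = 200e6, 70e6, 16
    t = np.arange(200_000) / fs
    signal = np.cos(2 * np.pi * f_if * t)                # IF input from the A/D

    lo = np.exp(-2j * np.pi * f_if * t)                  # complex NCO at 70 MHz
    mixed = signal * lo                                  # band now centered at 0 Hz

    taps = np.sinc(np.arange(-64, 65) / decim) / decim   # crude lowpass prototype
    filtered = np.convolve(mixed, taps, mode="same")
    baseband = filtered[::decim]                         # output rate fs/16 = 12.5 MSPS
    print(len(t), "->", len(baseband), "samples")
    print(round(abs(baseband[2000]), 2))                 # ~0.5: the IF tone is now at DC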
Figure 3 | FPGA resource comparison.

Resource | Virtex-II Pro (VP) | Virtex-4 (FX, LX, SX) | Virtex-5 (LX, SX) | Virtex-6 (LX, SX) | Virtex-7 (VX)
Logic Cells | 53K–74K | 55K–110K | 52K–155K | 128K–314K | 326K–693K
Slices* | 23K–33K | 24K–49K | 8K–24K | 20K–49K | 51K–108K
CLB Flip-Flops | 46K–66K | 48K–98K | 32K–96K | 160K–392K | 408K–864K
Block RAM (kb) | 4,176–5,904 | 4,176–6,768 | 4,752–8,784 | 9,504–25,344 | 27,000–52,920
DSP Hard IP | 18x18 Multiplier | DSP48 | DSP48E | DSP48E1 | DSP48E1
DSP Slices | 232–328 | 96–512 | 128–640 | 480–1,344 | 1,120–3,600
Serial Gbit Transceivers | N/A | 0–20 | 12–16 | 20–24 | 28–80
PCI Express Support | N/A | N/A | N/A | Gen 2 x8 | Gen 2 x8, Gen 3 x8
User I/O | 852–996 | 576–960 | 480–680 | 600–720 | 700–1,000
*Virtex-II Pro and Virtex-4 slices represent 2.25 logic cells; Virtex-5, Virtex-6, and Virtex-7 slices represent 6.4 logic cells.

Figure 3 compares the available resources in the five Xilinx FPGA families used in most Pentek products: Virtex-II Pro (VP), Virtex-4 (FX, LX, and SX), Virtex-5 (LX and SX), Virtex-6 (LX and SX), and Virtex-7 (VX). The Virtex-II family includes hardware multipliers that support digital filters, averagers, demodulators, and FFTs, a major benefit for software radio signal processing. The Virtex-II Pro family dramatically increased the number of hardware multipliers and also added embedded PowerPC microcontrollers. The Virtex-4 family is offered as three subfamilies that dramatically boost clock speeds and reduce power dissipation over previous generations. The Virtex-4 LX family delivers maximum logic and I/O pins, while the SX family boasts 512 DSP slices for maximum DSP performance. The FX family is a generous mix of all resources and is the only one to offer RocketIO, PowerPC cores, and the newly added gigabit Ethernet ports.
The Virtex-5 family LX devices offer maximum logic resources, gigabit serial transceivers, and Ethernet media access controllers, while the SX devices push DSP capabilities with all of the same extras as the LX. The Virtex-5 devices offer lower power dissipation, faster clock speeds, and enhanced logic slices. They also improve the clocking features to handle faster memory and gigabit interfaces, and they support faster single-ended and differential parallel I/O buses to handle faster peripheral devices. The Virtex-6 and Virtex-7 devices offer still higher density, more processing power, lower power consumption, and updated interface features to match the latest-technology I/O requirements, including PCI Express: Virtex-6 supports PCIe 2.0 and Virtex-7 supports PCIe 3.0. The ample DSP slices are responsible for the majority of the processing power of the Virtex-6 and Virtex-7 families. Increases in operating speed (from 500 MHz in V-4, to 550 MHz in V-5, to 600 MHz in V-6, to 900 MHz in V-7) and continuously
increasing density allow more DSP slices to be included in the same-size package. As shown in the chart, Virtex-6 tops out at an impressive 1,344 DSP slices, while Virtex-7 tops out at an even more impressive 3,600 DSP slices. Before we look at SDR and its various implementations in embedded systems, we'll review a theorem fundamental to sampled-data systems such as those encountered in software-defined radios. Nyquist's Theorem: "Any signal can be represented by discrete samples if the sampling frequency is at least twice the bandwidth of the signal." Notice that we highlighted the word bandwidth rather than frequency. In what follows, we'll attempt to show the implications of this theorem and the correct interpretation of sampling frequency, also known as sampling rate.
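Stated symbolically (the notation is ours; the handbook gives the theorem in words):

$$ f_s \ge 2B $$

where \(B\) is the occupied bandwidth of the signal. For a baseband signal, \(B\) equals the highest frequency \(f_{\max}\), so the familiar \(f_s \ge 2f_{\max}\) applies; for a bandpass signal, \(f_s\) may be far less than \(2f_{\max}\), which is the basis of the undersampling technique described below.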
A simple technique to visualize sampling
To visualize what happens in sampling, imagine that you are using transparent "fan-fold" computer paper. Use the horizontal edge of the paper as the frequency axis and scale it so that the paper folds line up with integer multiples of one-half the sampling frequency ƒs. Each sheet of paper now represents what we will call a "Nyquist zone," as shown in Figure 4.

Figure 4 | A simple technique to visualize sampling. (The frequency axis runs from 0 through ƒs/2, ƒs, 3ƒs/2, 2ƒs, and so on; each sheet spans one Nyquist zone, Zone 1 through Zone 7.)

The basics of sampling
Use the vertical axis of the fan-fold paper for signal energy and plot the frequency spectrum of the signal to be sampled, as shown in Figure 5. To see the effects of sampling, collapse the transparent fan-fold paper into a stack.

Figure 5 | Sampling basics. (Signal energy plotted against frequency across Nyquist zones 1 through 7.)

The resulting spectrum can be seen by holding the transparent stack up to a light and looking through it. You can see that signals on all of the sheets or zones are "folded," or "aliased," on top of one another, and they can no longer be separated. Once this folding or aliasing occurs during sampling, the resulting sampled data is corrupted and can never be recovered. The term "aliasing" is appropriate because, after sampling, a signal from one of the higher zones now appears to be at a different frequency.

Figure 6 | Sampling basics: folded signals fall on top of each other.
Sampling a baseband signal
A baseband signal has frequency components that start at ƒ = 0 and extend up to some maximum frequency. To prevent data destruction when sampling a baseband signal, make sure that all the signal energy falls ONLY in the first Nyquist zone, as shown in Figure 7. There are two ways to do this:
1. Insert a lowpass filter to eliminate all signals above ƒs/2, or
2. Increase the sampling frequency so all signals present fall below ƒs/2.
Note that ƒs/2 is also known as the "folding frequency."

Figure 7 | Baseband sampling: all signal energy falls in Zone 1, with no signal energy above ƒs/2.

Sampling bandpass signals
Let's consider bandpass signals like the IF frequency of a communications receiver, which might have a 70 MHz center frequency and 10 MHz bandwidth. In this case, the IF signal contains signal energy from 65 to 75 MHz. If we follow the baseband sampling rules above, we must sample this signal at twice the highest signal frequency, meaning a sample rate of at least 150 MHz. However, by taking advantage of a technique called "undersampling," we can use a much lower sampling rate.

An advantage to undersampling
Undersampling allows us to use aliasing to our advantage, provided we follow the strict rules of the Nyquist theorem. In our previous IF signal example, suppose we try a sampling rate of 40 MHz. Figure 8 shows a fan-fold paper plot with ƒs = 40 MHz. You can see that Zone 4 extends from 60 MHz to 80 MHz, nicely containing the entire IF signal band of 65 to 75 MHz.

Figure 8 | Fan-fold plot with ƒs = 40 MHz: Zone 4 (60 to 80 MHz) contains the entire 65 to 75 MHz IF band.

Now when you collapse the fan-fold sheets, as shown in Figure 9, you can see that the IF signal is preserved after sampling because there is no signal energy in any other zone. Also note that the odd zones fold with the lower frequency at the left (normal spectrum) and the even zones fold with the lower frequency at the right (reversed spectrum). In this case, the signals from Zone 4 are frequency-reversed. This is usually very easy to accommodate in the following stages of SDR systems.

Figure 9 | Folded signals still fall on top of each other, but now there is energy in only one sheet.

The major rule to follow for successful undersampling is to make sure all of the energy falls entirely in one Nyquist zone. There are two ways to do this:
1. Insert a bandpass filter to eliminate all signals outside the one Nyquist zone, or
2. Increase the sampling frequency so all signals fall entirely within one Nyquist zone.
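The zone arithmetic in this example is easy to check in a few lines of Python; the helper below (our code, using the zone and folding definitions from the text) reproduces both the Zone 4 placement and the spectral reversal.

    # Nyquist-zone bookkeeping for the undersampling example: the 65-75 MHz
    # IF band sampled at fs = 40 MHz lands entirely in Zone 4 and aliases,
    # spectrally reversed, into 5-15 MHz at baseband.
    def zone_and_alias(f_hz, fs_hz):
        zone = int(f_hz // (fs_hz / 2)) + 1          # Nyquist zone (1-based)
        f_mod = f_hz % fs_hz
        alias = fs_hz - f_mod if f_mod > fs_hz / 2 else f_mod
        return zone, alias

    fs = 40e6
    for f in (65e6, 70e6, 75e6):
        zone, alias = zone_and_alias(f, fs)
        print(f"{f/1e6:.0f} MHz -> zone {zone}, aliases to {alias/1e6:.0f} MHz")
    # 65 MHz -> 15 MHz, 70 MHz -> 10 MHz, 75 MHz -> 5 MHz: the band edges
    # swap places, confirming the even-zone spectral reversal described above.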
Undersampling allows a lower sample rate even though signal frequencies are high, PROVIDED all of the signal energy falls within one Nyquist zone. To repeat the Nyquist theorem: the sampling frequency must be at least twice the signal bandwidth – not the signal frequency.

Summary
Baseband sampling requires the sample frequency to be at least twice the signal bandwidth. This is the same as saying that all of the signals fall within the first Nyquist zone. In real life, a good rule of thumb is to use the 80 percent relationship: Bandwidth = 0.8 × ƒs/2 = 0.4 × ƒs.

Continue reading the Software-Defined Radio handbook, 11th edition, at www.pentek.com/go/emagswrhb.

Rodger H. Hosking is Vice President and Co-Founder of Pentek, Inc.

www.pentek.com
@Pentekinc | LinkedIn | Facebook | Google+ | YouTube
COMMUNICATE WITH CONFIDENCE
COTS and Full Custom Military Solutions
The modern battlefield demands ruggedized equipment that delivers a combat advantage. Sealevel computing and communications solutions are proven effective even in the harshest environments. Choose from an array of COTS products or partner with us to create a custom solution ideally suited to your particular application.
Watch Video
SYNCHRONOUS COMMUNICATIONS
POWERFUL SOLID-STATE PROCESSING
The Relio R2 Sync Server solid-state computer offers designers robust synchronous serial communications, small size, powerful processing, and long product life. Perfect for radar, satellite, and other military applications, the R2 Sync Server's four synchronous ports are individually configurable for RS-232, RS-422, RS-485, RS-530, RS-530A, or V.35. The system operates over a wide 0-50°C temperature range and is compliant with MIL-STD-810 specifications for shock and vibration.
sealevel-mil.com • 864.843.4343
Photo Credit: US Air Force
RADAR SPONSORS
© 2015 OpenSystems Media, © Military Embedded Systems. All registered brands and trademarks within the Radar E-mag are the property of their respective owners.