2018 RESOURCE GUIDE | DEVELOPMENT KIT SELECTOR
http://embedded-computing.com/designs/iot_dev_kits/
Powered by Digi-Key
CONNECT TECH INC. – Rudi Embedded System with NVIDIA® Jetson™ TX2, TX2i or TX1 (PG 47)
EMPOWERING INNOVATION TOGETHER. SPONSORED BY MICROCHIP | molex
MOUSER ELECTRONICS PG 3 Mouser's Generation Robot Series: Human 2.0
Cyberdyne is developing robotic systems that enable humans to overcome everyday challenges. http://bit.ly/MouserGenerationRobot
www.windriver.com/automated-to-autonomous
Think Fast. Making fast, intelligent decisions with the ability to always adapt and constantly learn is the fusion of AI and machine learning. Mouser delivers the electronic components and resources you need to answer your AI and machine learning requirements. Freeing you to quickly think of new designs.
mouser.com/generation-robot
SPECIAL SECTION
Brain and Neuronal Control of Robotic Limbs and Exoskeletons
Empowering Innovation Together
Executive Editor: Deborah S. Ray
Contributing Authors: John Blyler, Traci Browne, Bruce Byfield, Jeremy S. Cook, Jon Gabay, Steven Keeping, V. R. Rao
Technical Contributor: Paul Golata
Production & Design: Robert Harper
Editorial Contributor: Stacy Ryan
With Special Thanks: Kevin Hess, Sr. VP, Marketing; Russell Rasor, VP, Supplier Marketing; Jack Johnston, Director, Marketing Communication; Jennifer Krajcirovic, Director, Creative Design; Raymond Yin, Director, Technical Content
The following articles, written by V. R. Rao, focus on three areas of robotics: brain and neuronal control of robotic limbs and exoskeletons. Mechanical and electrical engineering advancements have indeed helped users of prosthetic and robotic devices regain function, but they pose many challenges for the users. In this special section, we highlight the progress and the engineering needed to bring brain and neuronal control of robotic devices to their full potential. These cutting-edge and future advances aim to overcome current challenges and promise to help users gain functionality and control prostheses intuitively and naturally:
• Brain-Control Interfaces for Robotic Orthotics and Prosthetics
• Sensor Technology for Prostheses
• Biopotentials and Challenges with Biomedical Signal Conditioning
FEATURED EBOOK: HUMAN 2.0 – Robotics Prosthetics and Exoskeletons; Self-healing, Biodegradable E-Skin; Brain and Neuronal Control of Robotics
SPONSORED BY
Authorized Distributor
AD LIST
PAGE  ADVERTISER
9   ACCES I/O Products – PCI Express Mini Card, mPCIe Embedded I/O solutions
1   Connect Tech – Rudi Embedded System
1   Digikey – Development Kit Selector
51  electronica 2018 & Semicon Europa – November 13-16, 2018. Connecting everything – smart, safe & secure
39  embedded world – Exhibition & Conference ... it's a smarter world
3   Mouser Electronics – Generation Robot Series: Human 2.0
1   Mouser Electronics – Empowering innovation together
5   Vector – VME/VXS/cPCI Chassis, Backplanes & Accessories
52  Virtium – Balance is everything
1   Wind River – Accelerating the evolution of critical infrastructure from automated to autonomous
26  Wind River – AI pushing us to the edge

EMBEDDED COMPUTING BRAND DIRECTOR Rich Nass rich.nass@opensysmedia.com
EDITOR-IN-CHIEF Brandon Lewis brandon.lewis@opensysmedia.com
TECHNOLOGY EDITOR Curt Schwaderer curt.schwaderer@opensysmedia.com
ASSOCIATE TECHNOLOGY EDITOR Laura Dolan laura.dolan@opensysmedia.com
ASSISTANT MANAGING EDITOR Lisa Daigle lisa.daigle@opensysmedia.com
DIRECTOR OF E-CAST LEAD GENERATION AND AUDIENCE ENGAGEMENT Joy Gilmore joy.gilmore@opensysmedia.com
ONLINE EVENTS SPECIALIST Sam Vukobratovich sam.vukobratovich@opensysmedia.com
CREATIVE DIRECTOR Stephanie Sweet stephanie.sweet@opensysmedia.com
SENIOR WEB DEVELOPER Aaron Ganschow aaron.ganschow@opensysmedia.com
WEB DEVELOPER Paul Nelson paul.nelson@opensysmedia.com
CONTRIBUTING DESIGNER Joann Toth joann.toth@opensysmedia.com
EMAIL MARKETING SPECIALIST Drew Kaufman drew.kaufman@opensysmedia.com

SALES/MARKETING
SALES MANAGER Tom Varcie tom.varcie@opensysmedia.com (586) 415-6500
MARKETING MANAGER Eric Henry eric.henry@opensysmedia.com (541) 760-5361
STRATEGIC ACCOUNT MANAGER Rebecca Barker rebecca.barker@opensysmedia.com (281) 724-8021
STRATEGIC ACCOUNT MANAGER Bill Barron bill.barron@opensysmedia.com (516) 376-9838
STRATEGIC ACCOUNT MANAGER Kathleen Wackowski kathleen.wackowski@opensysmedia.com (978) 888-7367
SOUTHERN CAL REGIONAL SALES MANAGER Len Pettek len.pettek@opensysmedia.com (805) 231-9582
SOUTHWEST REGIONAL SALES MANAGER Barbara Quinlan barbara.quinlan@opensysmedia.com (480) 236-8818
ASIA-PACIFIC SALES ACCOUNT MANAGER Helen Lai helen@twoway-com.com
BUSINESS DEVELOPMENT EUROPE Rory Dear rory.dear@opensysmedia.com +44 (0)7921337498

SOCIAL
Facebook.com/Embedded.Computing.Design
@Industrial_ai
LinkedIn.com/in/EmbeddedComputing
youtube.com/user/VideoOpenSystems

EVENTS
electronica 2018/Semicon Europa – Munich, Germany – November 13-16, 2018 – electronica.de/index.html
DVCon U.S. – San Jose, CA – February 25-28, 2019 – dvcon.org
WWW.OPENSYSMEDIA.COM
PRESIDENT Patrick Hopper patrick.hopper@opensysmedia.com
EXECUTIVE VICE PRESIDENT John McHale john.mchale@opensysmedia.com
EXECUTIVE VICE PRESIDENT Rich Nass rich.nass@opensysmedia.com
CHIEF FINANCIAL OFFICER Rosemary Kristoff rosemary.kristoff@opensysmedia.com
GROUP EDITORIAL DIRECTOR John McHale john.mchale@opensysmedia.com
VITA EDITORIAL DIRECTOR Jerry Gipper jerry.gipper@opensysmedia.com
TECHNOLOGY EDITOR Mariana Iriarte mariana.iriarte@opensysmedia.com
SENIOR EDITOR Sally Cole sally.cole@opensysmedia.com
CREATIVE PROJECTS Chris Rassiccia chris.rassiccia@opensysmedia.com
PROJECT MANAGER Kristine Jennings kristine.jennings@opensysmedia.com
FINANCIAL ASSISTANT Emily Verhoeks emily.verhoeks@opensysmedia.com
SUBSCRIPTION MANAGER subscriptions@opensysmedia.com
CORPORATE OFFICE 1505 N. Hayden Rd. #105 • Scottsdale, AZ 85257 • Tel: (480) 967-5581
REPRINTS WRIGHT'S MEDIA REPRINT COORDINATOR Wyndell Hamilton whamilton@wrightsmedia.com (281) 419-5725
Since 1947 MADE IN THE USA VME / VXS / cPCI ® Chassis, Backplanes & Accessories
Chassis and Rack Accessories
Custom Front Panels
MIL-I-46058-C Conformal Coating available for all VECTOR backplanes
Hi-speed VITA ANSI/VITA 1.1-1997 monolithic or J1 backplanes (Hi-current VITA 1.7 compliant) with Electronic Bus-Grant (EBG), surface-mount devices, fully tested and certified. MADE in USA, ships in 2-3 days.
CONTENTS
2018 | Volume 1 | Number 1

FEATURES
10  Neural network optimization with sparse computing and Facebook Glow By Brandon Lewis, Editor-in-Chief
12  Filling the data scientist gap By Seth DeLand, The MathWorks, Inc.
16  Riding the deep learning wave: Simulations enabling software engineers to generate data needed to train neural networks By Peter McGuinness, Highwai
18  The Internet of learning Things By Semir Haddad, Renesas
20  Getting back up: Coupling AI and memory class storage saves in a big way By Bill Gervasi, Nantero
24  Adaptive acceleration holds the key to bringing AI from the cloud to the edge By Dale Hitt, Xilinx
28  Micropower intelligence for edge devices By Narayan Srinivasa and Gopal Raghavan, Eta Compute
32  From logistics regression to self-driving cars: Chances and challenges for machine learning in highly automated driving By Sorin Mihai Grigorescu, Markus Glaab, and Andre Roßbach, Elektrobit Automotive
40  2018 RESOURCE GUIDE

COLUMNS
7  Machine learning starts with the algorithms By Rich Nass, Brand Director
8  AI challenges & opportunities in the automotive industry By Curt Schwaderer, Technology Editor

WEB EXTRAS
Disruptive technology switches sides By Tommy Mullane, S3 Semiconductors https://bit.ly/2QcH8MM
Educators and industry foster tomorrow's engineers through student competitions By Lauren Tabolinsky, MathWorks https://bit.ly/2DMdPzt
8 sensor protocols for your next IoT project By Sreedevi Vamanan, Embitel https://bit.ly/2N5jkIQ

COVER
The dawn of artificial intelligence is upon us, as engineers race to develop new categories of processors and development tools that support emerging neural networks. Industrial AI & Machine Learning addresses these and other important considerations for the embedded neural network design engineer.

Published by: OpenSystems Media
opsy.st/ECDLinkedIn  @Industrial_ai

2018 OpenSystems Media® © 2018 Embedded Computing Design © 2018 Industrial AI and Machine Learning. All registered brands and trademarks within Embedded Computing Design and Industrial AI and Machine Learning magazines are the property of their respective owners.
Machine learning starts with the algorithms By Rich Nass, Brand Director
Rich.Nass@opensysmedia.com
There are lots of different ways to look at machine learning, which is the ability for a computing device to make decisions based on actions and conditions. Some look at it from the very starting point: the initial software and algorithms that are run on the hardware to make the whole process work. Some areas that are currently taking advantage of machine learning include big data, like for SEO and other analytics. There's also a lot of talk (albeit with less action) in the industrial-automation space for predictive maintenance. For example, we put systems at the edge to learn what normal behavior looks like, then monitor performance and raise a flag if something abnormal is observed.

It's fair to say that the real key to accurate and useful machine learning consists of assembling the right combination of algorithms, compilers, and hardware architecture. If you don't have the right components in any of those three areas, machine learning won't work as it should. For example, if you don't start with an algorithm that can be parallelized, you won't get very far. Similarly, if your hardware doesn't support parallelism to handle the intense computations, that's a nonstarter. And the compiler, which sits in the middle, must provide the right bridge.

A lot of the do's and don'ts are still being worked out, as machine learning can be a very inexact science. Therefore, lots of people are trying to develop the tools that address these issues. The real-time nature of the majority of machine-learning applications compounds the problem, making this task significantly more difficult. According to Randy Allen, Director of Advanced Research for Mentor Graphics, "Machine-learning problems are going to boil down to a matrix multiplication. This consists of two phases, training and using. In the training, you generate a sequence of large matrix multiplications that are continuously repeated."

That's why the combination of the three aspects mentioned earlier (algorithms, compilers, and hardware architecture) is so significant. If there's even a slight error somewhere in the sequence, it will be magnified over time, resulting in a large error. This situation is simply unacceptable in machine-learning applications. To ensure that information is returned in real time, your choices may be to either reduce the required precision or increase the amount of processing power thrown at the problem. In general, neither of these options is a good one.

Going forward, we'll see more application-specific, rather than general-purpose, models. Vision is a good example of that, where the hardware-software combination can be tuned to handle the vision algorithms. Also, we'll see changes in what computations are handled at the edge rather than in the cloud. "It's always the software that's the big issue here," says Allen. "Lots of people are coming up with hardware that takes lots of different approaches. That hardware is only useful if the programmer can get at it. And that's where the compilers and algorithms come in. If you don't have the right set of tools to go and utilize it, it doesn't matter how good the hardware is."

Mentor's formula is to optimize things so you can work in a noncloud environment by optimizing performance at the edge. This situation can be achieved with what it calls "data-driven hardware," which definitely doesn't mean just throwing more processing power at the problem. Allen adds, "We use an entirely different set of algorithms to optimize machine learning. And that's not something the hardware guys typically consider when they're developing an interface to the software. That's where we can assist." IAI
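Allen's point that training reduces to a long sequence of large matrix multiplications can be made concrete with a minimal NumPy sketch. This is purely illustrative – it is not Mentor code, and the layer size, data, and learning rate are invented:

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((256, 512))        # invented input batch
Y = rng.standard_normal((256, 10))         # invented targets
W = rng.standard_normal((512, 10)) * 0.01  # one dense layer of weights

for step in range(1000):
    P = X @ W                        # forward pass: a large matrix multiply
    grad = X.T @ (P - Y) / len(X)    # gradient: another large matrix multiply
    W -= 0.01 * grad                 # update, then repeat the same multiplies

print("final loss:", float(np.mean((X @ W - Y) ** 2)))

The same two multiplications recur every training step, which is why parallelizable algorithms and hardware that can sustain dense matrix math matter so much.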
AI challenges & opportunities in the automotive industry By Curt Schwaderer, Technology Editor
Curt.Schwaderer@opensysmedia.com
The automotive industry has seen significant advances in driver assist capabilities. In May 2018, a wireless technology company called Metawave announced a $10 million additional seed investment from strategic investors including Hyundai, Toyota, Denso, and Infineon. The Metawave announcement included an all-in-one radar sensor with integrated artificial intelligence (AI) edge processing to operate seamlessly with the sensor fusion module for self-driving vehicles, which includes camera, lidar, and radar technologies. Since that time, Metawave has announced that it is incorporating AI to further advance autonomous driving. I caught up with Dr. Maha Achour, CEO of Metawave, and Dr. Matt Harrison, Metawave's first AI engineer, to discuss AI in general and the application of AI to autonomous driving.

What kinds of signs should stakeholders look for when their environment would benefit from AI?
"It breaks down to how you define AI," Dr. Harrison began. "A lot of it is data collection and monitoring. But that's not complete machine learning. Neural networks and advanced machine learning require a very large amount of high-quality data. If, as a stakeholder, you're hearing the analysts are being flooded with more data than they can analyze, you're ready for AI."

Harrison also mentioned the value of envisioning the use case to ensure the ability to deploy the AI algorithm in a meaningful way. Prior to applying AI, sit down and map the problem to be solved to an established machine-learning algorithm as a starting point. Harrison said as people come up the learning curve, starting with completely new approaches is dangerous.

"The barrier to entry is lower than you'd expect," Harrison said. "For example, TensorFlow is an open source machine-learning framework comprised of a C library accessed using Python. Deep-learning application typically boils down to designing the right computational graph, defining operations on input to output, then using the series of operations to design the forward graph. The machine-learning framework can then be used for the back computation. This will result in output that can be used for training the AI."
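As a concrete illustration of the workflow Harrison describes – build a forward graph of operations and let the framework derive the backward computation – here is a minimal TensorFlow sketch. It is my own toy example (invented data and layer sizes), not Metawave code, and it uses the TensorFlow 2 API rather than the graph-session style that was current when this article appeared:

import tensorflow as tf

x = tf.random.normal((128, 4))   # invented inputs
y = tf.random.normal((128, 1))   # invented targets

# Forward graph: a chain of operations from input to output
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),
])
opt = tf.keras.optimizers.SGD(learning_rate=0.01)

for _ in range(200):
    with tf.GradientTape() as tape:                 # records the forward ops
        loss = tf.reduce_mean(tf.square(model(x) - y))
    grads = tape.gradient(loss, model.trainable_variables)   # the "back computation"
    opt.apply_gradients(zip(grads, model.trainable_variables))

print("final loss:", float(loss))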
Once you determine you need AI, how do you get started? Resources, foundational things you should know, etc.
I was surprised when Dr. Harrison mentioned all of the state-of-the-art AI algorithms that are published and the tools that are available. If you're an experienced programmer, you can sit down with the tutorials for training. Online courses are also available. You should have a broad general knowledge of neural networks and information on the general architecture you're looking to implement. The trick is experienced AI engineers that can develop the algorithms for the use case and provide the right input to the learning engine in order for the AI to be trained properly. Harrison mentions that it's invaluable to have someone who has done these things before in order to avoid "garbage in, garbage out" scenarios that are common
occurrences with less experienced AI engineers. This is why starting with an established solution and someone experienced in AI can be valuable until more depth and experience can be gained.

Does AI need to be designed in from the start or is it something that, when you know where the AI comes in, you can design in later?
Harrison posited that you can take an established system, then design an AI add-on. Most applications don't need to incorporate AI from the beginning. However, if the data streams you need for AI aren't accessible in an easy way, then refactoring may be needed.

"The engine needs to see the information required for the AI to operate," Harrison said. "If you have long records or full, accurate data, you can use it post facto for training. There is an element of trying to replace something that once required a human – image object recognition, for example. There are other tasks that are not easy for humans but can be easy for a machine-learning system. A good example of this is the use of principal component analysis. The basic idea is if your data exists in a high dimensional space, I'm going to condense that large set of numbers into a smaller set of numbers that is still representative of the data set. This is commonly done as a preprocessing step to machine-learning algorithms. These boiled-down things mean something to the AI, but wouldn't be able to be interpreted for a human."
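Harrison's principal component analysis example – condensing a high-dimensional record into a handful of numbers that still represent the data – looks roughly like this in code. The sketch below uses scikit-learn with fabricated data purely for illustration; it is not Metawave's pipeline:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 64))             # fabricated 64-dimensional records
y = (X[:, 0] + 0.1 * rng.standard_normal(500) > 0).astype(int)

# Condense 64 numbers per record into 8 that retain most of the variance,
# then hand the condensed features to a simple learner.
clf = make_pipeline(PCA(n_components=8), LogisticRegression())
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))

The eight PCA components "mean something" to the model even though, as Harrison notes, a human could not interpret them directly.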
How has Metawave applied these concepts to autonomous driving?
Dr. Achour had some fascinating insights: "The auto industry is trending from driver assistance to semidriver to fully autonomous. This is in response to the next generation's general disinterest with driving. This trend impacts car companies and their business model. They won't be selling cars but will be selling
miles. Of course, safety becomes a major focus with this model. The winners will be the high-quality, high-safety environments. High safety requires driving focus, great vision, clear hearing, and fast reaction. Right now, everything is driven by the camera. It's the best sensor with the highest resolution. But cameras are limited to anywhere between 70 to 100 meters. Image capture is important, but you must also be able to classify the objects for autonomous driving in order to understand those objects' behavior."

Dr. Achour went on to describe environmental challenges. "Being able to operate in low light, bright light, and bad weather is also important. Lidar is not capable of operating in bad weather conditions. We also found that operating in Florida, where you have a lot of mosquitoes, can even be a problem. Dirty roads are also a challenge. This is where radar excels. With its 3.8 mm wavelength, sub-one-degree resolution can be reached at long ranges with specific power allocated. However, the problem with today's radar is that it can only detect the object, distance, and speed. But this isn't sufficient for an autonomous driver. You need to know what the object is – a bike? A person walking? An animal? Object classification becomes critical."
Dr. Achour described the challenges of training the radar. A number of aspects fall under the AI engine for autonomous driving, like melding computer vision with a sequence of radar scans to pass a list of objects, categories, and velocities onto the driver AI. Radar has a stream of data that adds critical information to solve the autonomous-driver AI challenge. One example is radial velocity: You'll get reflections from roads and signs and everything running at a different velocity in reference to the ground. This is where you need to do a whole new class of AI operations on data that lidar cannot provide. Another key is building into the AI the ability to shape and steer the beam in a similar way to how drivers' eyes continually scan and also use peripheral vision. This capability is another key to the Metawave combined sensor module, Dr. Achour said.

Final thoughts
The key to effective AI integration is combining the available, established algorithms and tools with enough core AI experience to know how to effectively design and train the AI. Additional factors involve space, power consumption, and AI decision latencies, all of which must be considered when building the AI system. The machine-learning and training phase is typically best done in parallel, leveraging high-powered cloud environments, due to the sheer amount of data required to train the AI. Finally, understanding the Internet of Things (IoT) application – from deployment through the computational resources available – helps to deliver the desired capability.

AI is evolving and advancing. But Dr. Harrison emphasized that this is not Hollywood, where some sentient computer will be taking over the world soon. "AI is different than what many people think – it's a machine that's crunching computation for the purposes of feeding a decision-making engine. These environments are not going to take over the world anytime soon." IAI
Industrial AI & Machine Learning RESOURCE GUIDE 2018
9
DEEP LEARNING FRAMEWORKS, ALGORITHMS & SOFTWARE
Neural network optimization with sparse computing and Facebook Glow
By Brandon Lewis, Editor-in-Chief, Embedded Computing Design
"The Mostly Complete Chart of Neural Networks, Explained," published in August of 2017 (https://bit.ly/2viD8EP), identified 27 different types of neural networks. The number of neural network types has increased since then as engineers and data scientists seek out optimized implementations of artificial intelligence (AI) for their use cases.
Processor architects have been scrambling to deliver novel compute platforms capable of executing these workloads, but efforts are also being made in software to improve the efficiency of neural networks. One such technique is pruning, or the removal of duplicate neurons and redundancies from a neural network to make it smaller and faster.
FIGURE 1
The graph on the left densifies neurons with active connections into yellow cubes, while the one on the right depicts sparse, random connections distributed throughout a neural network.
Pruning in general is an important enabler for deep learning on embedded systems, as it lowers the amount of computation required to achieve the same level of accuracy. Pruning techniques can also be taken a step further to minimize specific inefficiencies of neural networks, such as sparsity.
Sparsing through neural networks
In a neural network graph, sparsity refers to neurons that contain a zero value or no connection to adjacent neurons (Figure 1). This is an inherent characteristic of the hidden matrices that exist between the input and output layers of a deep neural network, as a given input typically activates fewer and fewer neurons as data passes through the graph. The fewer connections in the neural network matrix, the higher the sparsity.

"As you go through these layers and multiply a 1 with a 0 you get a 0, so the sparsity in the activation of neurons increases as you progress through layers," says Lazaar Louis, Senior Director and Head of Marketing and Business Development for Tensilica products at Cadence Design Systems. "On average, current neural networks exhibit 50 percent sparsity in activation from input to output."

Once a neural network has been pruned for increased sparsity, compute technologies such as Cadence's Tensilica DNA 100 Processor IP can take advantage by performing multiply-accumulate (MAC) operations only against nonzero values (Figure 2). This is possible thanks to an integrated sparse compute engine, which includes a direct memory access (DMA) subsystem that reads values before passing executables along to the processing unit.

Rather than using processor and memory resources to compute the zero values of a sparse neural network, pruning the network to induce sparsity can be turned into a computational advantage: Pruning for sparsity can be achieved by forcing near-zero values to absolute zero, which, along with model retraining, can increase sparsity to 70 percent or so.

Overall, this operation provides higher MAC utilization for the DNA 100 IP, which results in a performance increase of 4.7 times over alternative solutions with similar array sizes.
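The pruning step described above – pushing near-zero weights to exactly zero so that zero operands can simply be skipped – is easy to sketch. The NumPy example below is illustrative only; the threshold and matrix sizes are arbitrary, and a production flow (including Cadence's) retrains the network after pruning to recover accuracy:

import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256)) * 0.1   # toy weight matrix
x = rng.standard_normal(256)                # toy activation vector

# Prune: force near-zero weights to absolute zero
threshold = 0.05
W_pruned = np.where(np.abs(W) < threshold, 0.0, W)
sparsity = 1.0 - np.count_nonzero(W_pruned) / W_pruned.size
print(f"weight sparsity after pruning: {sparsity:.0%}")

# A sparse MAC accumulates only the products whose weight is nonzero,
# which is what a sparse compute engine does in hardware.
def sparse_mac(weights_row, activations):
    acc = 0.0
    for w, a in zip(weights_row, activations):
        if w != 0.0:          # skip the zero operands entirely
            acc += w * a
    return acc

print(np.isclose(sparse_mac(W_pruned[0], x), W_pruned[0] @ x))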
FIGURE 2
The Cadence Design Systems DNA 100 Processor IP incorporates a sparse compute engine that only executes multiply-accumulate (MAC) operations with nonzero values, thereby increasing throughput.

Glowing compilation
Like any embedded processor, the DNA 100 leverages a compiler to help it interpret the sparsity of neural network graphs. The Tensilica Neural Network Compiler takes floating-point outputs from deep learning development frameworks such as Caffe, TensorFlow, TensorFlow Lite, and the Android Neural Network (ANN) API, and quantizes them into integers and machine code optimized for Cadence IP. The compiler also assists with pushing near-zero weights to zero and, where possible, fuses multiple neural network layers into individual operations. These capabilities are critical to improving neural network throughput on devices like the DNA 100 while also maintaining accuracy within 1 percent of the original floating-point model.

Despite the advantages of the Tensilica Neural Network Compiler, however, engineers are already challenged by a growing number of neural network types, deep learning development frameworks, and AI processor architectures. As this trend continues, developers will seek compilers that enable them to use the widest selection of neural network types and tools on the most diverse range of processor targets. Vendors like Cadence, on the other hand, will require solutions that allow them to support the evolution of technologies further up the stack.

In response to this market need, Facebook has developed Glow, a graph-lowering machine learning compiler for heterogeneous hardware architectures that's based heavily on the LLVM compiler infrastructure. The objective of Glow is to accept computation graphs from frameworks like PyTorch and generate highly optimized code for multiple hardware targets using math-related optimizations. The tool does so by lowering neural network dataflow graphs into an intermediate representation and then applying a two-phase process (Figure 3).

The first phase of Glow's intermediate representation allows the compiler to perform domain-specific improvements and optimize high-level constructs based on the contents of the neural network dataflow graph. At this stage, the compiler is target-independent. In the second phase of the Glow representation, the compiler optimizes instruction
scheduling and memory allocation before generating hardware-specific code.

FIGURE 3
Facebook's Glow compiler is a hardware-agnostic compiler that uses a two-phase process to optimize neural networks for embedded compute accelerators.

Because of its incremental lowering phase and the fact that it supports a large number of input operators, the Glow compiler is able to take advantage of specialized hardware features without implementing all of the operators on every supported hardware target. This not only reduces the amount of memory space required, but also makes it extensible for new compute architectures that only focus on a few linear algebraic primitives. Esperanto Technologies, Intel, Marvell, Qualcomm, and Cadence have already committed to Glow in future silicon solutions.

"Facebook Glow allows us to quickly optimize for technologies that are yet to come," Louis says. "Let's say there's a new network that's introduced. Either somebody will contribute that, or we will do it. They want to enable an open source community, so people can come in and contribute and accelerate things that are generic.

"They've also introduced the capability in Glow to plug various accelerators in, so we can make it fit our architecture," Louis continues. "We see Glow as the underlying engine in our compiler [moving forward]."

Embedded neural networking: Less is more
Pruning neural networks is quickly becoming a common practice for neural network developers as they attempt to improve performance without sacrificing accuracy. Meanwhile, Facebook Glow is addressing processor fragmentation before it can deter AI adoption. Using the one right tool extremely well usually leads to success; for neural networks, using the Glow compiler and sparse computing techniques could be what gets the job done. IAI
DEEP LEARNING FRAMEWORKS, ALGORITHMS & SOFTWARE
Filling the data scientist gap By Seth DeLand, The MathWorks, Inc.
A shortage of data scientists means that companies are struggling to fill this void – but this isn’t new information in the data science space. Companies are looking for data scientists who have computer science skills, knowledge of statistics, and domain expertise relevant to their specific business problems. These types of candidates are proving elusive, but companies may find success by focusing on those that possess the domain expertise skill. This third skill – domain expertise about the business – is often overlooked. Domain expertise is required to make judgment calls during the development of an analytic model. It enables one to distinguish between correlation and causation, between signal and noise, between an anomaly worth further investigation and “oh yeah, that happens sometimes.” Domain knowledge is hard to teach: It requires on-the-job experience, mentorship, and time to develop. This type of expertise is often found in engineering and research departments that have built cultures around understanding the products they design and build. These teams are intimately familiar with the systems they work on. They often use statistical methods and technical computing tools as part of their design processes, making the jump to the machine-learning algorithms and big data tools of the data analytics world manageable.
With data science emerging across industries as an important differentiator, these engineers with domain knowledge need flexible and scalable environments that put the tools of the data scientist at their fingertips. Depending on the problem, they might need traditional analysis techniques such as statistics and optimization, data-specific techniques such as signal processing and image processing, or newer capabilities such as machine-learning algorithms. The cost of learning a new tool for each technique would be high, so having these tools together in one environment becomes very important.

Staying current and flexible
A natural question to ask is, how can newer techniques like machine learning be made accessible to engineers with domain expertise? Let's dive a little deeper into the technology to come up with an approach.

The goal of machine learning is to identify the underlying trends and structure in data by fitting a statistical model to that data. When working with a new data set, it's hard to know which model is going to work best; there are dozens of popular models to choose from (and thousands of less-popular choices). Trying and comparing several different model types can be very time-consuming using "bleeding-edge" machine-learning algorithms. Each of these algorithms will have an interface that is specific to the algorithm and preferences of the researcher who developed it. Significant amounts of time will be required to try many different models and compare approaches.
DOMAIN KNOWLEDGE IS HARD TO TEACH: IT REQUIRES ON-THE-JOB EXPERIENCE, MENTORSHIP, AND TIME TO DEVELOP.

Diving into data analytics technologies
The tsunami of data provides businesses an opportunity to optimize processes and provide differentiated products. A new set of algorithms and infrastructure has emerged that allows businesses to use key data analytics techniques such as big data or machine learning to capitalize on these opportunities. Additionally, this new infrastructure behind big data or machine learning leads to a host of different technologies that support the iterative process of building a data analytics algorithm. It's this beginning stage of the iterative process of building the algorithm that can set a business up for success. This iterative process involves trying several strategies like finding other sources of data and different machine-learning approaches and feature transformations.

One solution is an environment that makes it easy for engineers to try the most-trusted machine-learning algorithms and that encourages best practices such as preventing over-fitting. For example, the process engineers at a large semiconductor manufacturing company were considering new ways to ensure alignment between the layers on a wafer. They came across machine learning as a possible way to predict overlay between layers but, as process engineers, they didn't have experience with this newer technique. Working through different machine-learning examples in MATLAB, they were able to identify a suitable machine-learning algorithm, train it on historical data, and integrate it into a prototype overlay controller. The flexible MATLAB environment allowed these process engineers to apply their domain expertise to build a model that can identify systematic and random errors that might otherwise go undetected.

According to Gartner, engineers with the domain expertise "can bridge the gap between mainstream self-service analytics by business users and the advanced analytics techniques of data scientists. They are now able to perform sophisticated analysis that would previously have required more expertise, enabling them to deliver advanced analytics without having the skills that characterize data scientists."

As technology continues to evolve, organizations must quickly ingest, analyze, verify, and visualize a tsunami of data to deliver timely insights to capitalize on business opportunities. Instead of spending time and money searching for those elusive data scientists, companies can stay competitive by enabling their engineers to do data science with a flexible tool environment like MATLAB that enables engineers and scientists to become data scientists – opening up access to the data for more people.
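The workflow above is built around MATLAB, but the underlying idea – one environment in which several trusted model types can be tried, compared, and cross-validated to guard against over-fitting – is tool-agnostic. As a hedged illustration only, here is what that comparison loop might look like in Python with scikit-learn (synthetic data, arbitrary model choices):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for a real engineering data set
X, y = make_classification(n_samples=600, n_features=20, random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "support vector machine": SVC(),
}

# One consistent interface for every algorithm; cross-validation
# discourages over-fitting to a single train/test split.
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name:>22}: {scores.mean():.3f} +/- {scores.std():.3f}")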
Given the potentially unlimited number of combinations to try, it is crucial to iterate quickly. Domain experts are well suited to iterate quickly, as they can use their knowledge and intuition to avoid approaches that are unlikely to give strong results. The faster an engineer with domain knowledge can apply their knowledge with the tools that enable quick iterations, the faster the business can gain a competitive advantage. But before diving into the technologies that support this activity, let's first walk through an example of this iterative process and some questions to ask along the way.

Iterating on data sets
A prosthetics company knows that it could build smarter prosthetics if it knew what activity its customer would be doing (standing, sitting, walking, etc.). So, the first question it asks is: What data could we use to determine this? The engineers at the company know that most of their customers have smartphones, so they would like to use the data from the smartphone's sensors to determine their activity.

Engineers at the company begin by logging data from the accelerometer. They apply a machine-learning algorithm directly to the data, but find the results aren't as good as they hoped. The iterative process begins, with the engineers then asking: Are there additional ways we could prepare the data for machine learning that might give better results? The company's engineers apply signal processing techniques to extract frequency content from the sensor data and try the machine-learning techniques again. The results are better but not quite there yet, so they ask: Are there other sources of data we could use to improve our predictions? They decide to also log gyroscope data from the smartphones and combine this with the accelerometer data. Training their machine-learning models again, they are now happy with the results, and move to production.
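A rough sketch of that iteration, using Python with invented signals and labels (the engineers in the example worked in MATLAB): extract frequency content from accelerometer windows, train a classifier, then append gyroscope features and retrain.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_windows, n_samples = 400, 128
accel = rng.standard_normal((n_windows, n_samples))   # invented accelerometer windows
gyro = rng.standard_normal((n_windows, n_samples))    # invented gyroscope windows
activity = rng.integers(0, 3, n_windows)              # e.g., standing / sitting / walking

def freq_features(signal):
    # Signal processing step: keep the magnitude of the first few FFT bins
    return np.abs(np.fft.rfft(signal, axis=1))[:, :10]

# Iteration 1: accelerometer frequency features only
X1 = freq_features(accel)
print("accel only  :", cross_val_score(RandomForestClassifier(), X1, activity, cv=5).mean())

# Iteration 2: add gyroscope features and retrain
X2 = np.hstack([X1, freq_features(gyro)])
print("accel + gyro:", cross_val_score(RandomForestClassifier(), X2, activity, cv=5).mean())

# With real sensor data the second iteration is where the accuracy gain would
# appear; the random data here only demonstrates the workflow itself.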
Other questions an engineer in the iterative process might ask include:
›› What data is available?
›› Are there other data sources?
›› What types of processes could be used to extract high-level information from the data?
›› Where is the model going to run in production?
›› Are certain types of misclassification costlier than others?
›› How can we experiment quickly to validate ideas and answer the above questions?

Now that you've seen an example of the iterative process and questions to ask, what about the technologies behind this process?

Iterating on big data
As more and more data gets generated, systems need to evolve to process it all. In this "big data" space, two large
projects have reshaped the landscape: Hadoop and Spark. Both projects are part of the Apache Software Foundation. Together, they have made it easier and cheaper to store and analyze large amounts of data. These technologies can greatly impact an engineer's work. For engineers accustomed to working with data in files on desktop machines, on network drives, or in traditional databases, these new tools require a different way of accessing the data before analysis can even be considered. In many cases, artificial data silos and inefficiencies can be created, such as when someone needs to be contacted to pull data out of the big data system each time a new analysis is performed.

Another challenge engineers face when working with big data is the need to change their computational approach. When data is small enough to fit in memory, the standard workflow is to load the data in and perform computation; the computation would typically be fast because the data is already in memory. But with big data, there are often disc reads/writes, as well as data transfers across networks, which slow down computations. When engineers are designing a new algorithm, they need to be able to iterate quickly over many designs. The result is a new workflow that involves grabbing a sample of the data and working with that locally, enabling quick iterations and easy usage of helpful development tools such as debuggers. Once the algorithm has been vetted on the sample, it is then run against the full data set in the big data system.

The solution for these challenges is a system that lets engineers use a familiar environment to write code that runs both on the data sample locally and on the full data
set in the big data system. Tools such as MATLAB establish connections to big data systems such as Hadoop. Data samples can be downloaded, and algorithms prototyped locally. New computational models that utilize a deferred evaluation framework are used to run the algorithm on the full data set in a performance-optimized manner. For the iterative analysis that is common to engineering and data science workflows, this deferred evaluation model is key to reducing the time it takes for an analysis to complete on a full data set, which can often be on the order of minutes or hours.

Big data technologies have been a key enabler in the growth of data science. With large amounts of data collected, new algorithms were needed to reason on this data, which has led to a boom in the use of machine learning.

OpenSystems Media works with industry leaders to develop and publish content that educates our readers.
Automotive Functional Safety: No Hiding Place – By Blackberry and QNX
The occurrence of hardware and software errors in such high-reliability applications as autonomous driving can negatively affect the system's safety. However, hardware diagnostics on its own is not enough to detect every error in today's high-performance hardware. Learn how real-time software checking paired with hardware diagnostics can be an efficient and complete means of verifying system operation in safety-critical end uses.
www.embedded-computing.com/white-paper-library/automotive-functional-safety-no-hiding-place
Check out our white papers at www.embedded-computing.com/white-paper-library

Machine learning
Machine learning is used to identify the underlying trends and structures in data. Machine learning is split up into unsupervised learning and supervised learning. In unsupervised learning, we try to uncover relationships in data, such as groups of data points that are all similar. For example, we may want to look at driving data to see if there are distinct modes that people operate their cars in. From cluster analysis, we may discover different trends such as city versus highway driving or, more interestingly, different styles of drivers (e.g., aggressive drivers).
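To make the unsupervised case concrete, here is a minimal clustering sketch in Python with scikit-learn. The per-trip features are fabricated purely for illustration; the article itself does not prescribe a particular tool for this step:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Fabricated per-trip features: mean speed, speed variance, hard-braking rate
trips = np.vstack([
    rng.normal([30, 4.0, 0.20], [5, 1.0, 0.10], size=(100, 3)),   # city-like trips
    rng.normal([65, 2.0, 0.05], [5, 1.0, 0.05], size=(100, 3)),   # highway-like trips
])

X = StandardScaler().fit_transform(trips)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print("trips per cluster:", np.bincount(labels))

Inspecting the cluster centers is where domain knowledge comes in – deciding whether a cluster corresponds to city driving, highway driving, or an aggressive driving style.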
In supervised learning, we are given input and output data, and the goal is to train a model that, given new inputs, can predict the new outputs. Supervised learning is commonly used in applications such as predictive maintenance, fraud detection, and facial recognition in images.

Each of the areas in machine learning – unsupervised learning and supervised learning – has dozens of algorithms that are popular (and hundreds that are less popular). However, it's hard to know which one of these algorithms will be best for the particular problem you are working on. Often, the best thing to do is to just try them out and compare results. This can be quite the challenge in some environments, as researchers build algorithms with different interfaces depending on their problem and preferences. Mature machine-learning tools have a consistent interface for the various algorithms and make it easy to quickly try different approaches. This is critical for domain experts performing data science because it enables them to identify "quick wins" where machine learning provides improvement over traditional methods. This approach also prevents them from spending days or weeks tuning a machine-learning model to a data set that is not well-suited for machine learning. Tools such as MATLAB address this problem by providing point-and-click apps that train and compare multiple machine-learning models.

Iterate faster
Combined, big data and machine learning are poised to bring new solutions to longstanding business problems. The underlying technology, in the hands of domain experts who are intimately familiar with these business problems, can yield significant results. For example, engineers at Baker Hughes used machine-learning techniques to predict when pumps on their gas and oil extraction trucks would fail. They collected nearly a terabyte of data from these trucks, then used signal processing techniques to identify relevant frequency content. Domain knowledge was crucial here, as they needed to be aware of other systems on the truck that might show up in sensor readings, but that weren't helpful at predicting pump failures. They applied machine-learning techniques that can distinguish a healthy pump from an unhealthy one. The resulting system is projected to reduce overall costs by $10 million. Throughout the process, their knowledge of the systems on the pumping trucks enabled them to dig into the data and iterate quickly.

Platforms for developing analytics offer ways to package machine-learning algorithms to run in different production environments. Look for a tool that provides integration paths and application servers for use with common IT systems and also targets embedded devices. For example, MATLAB provides deployment paths for integrating analytics with programming languages commonly used in IT systems (e.g., Java and .NET), as well as converting analytics to standalone C code that can be run on embedded devices. Both deployment options are accessed through point-and-click interfaces, making them appealing for engineers with domain knowledge. By automating the process of converting the analytic to run in production systems, these tools significantly reduce the time for design iterations.

Technologies that enable domain experts to apply machine-learning and other data analytics techniques to their work are here to stay. They provide exciting opportunities for engineering teams to innovate – in both their design workflows and the products they create. It does not appear that the shortage of data scientists will be addressed anytime soon. Domain experts will play a crucial role in filling this gap. Their knowledge of the business and the products it produces positions them well to find innovative ways to apply data analytics technologies. IAI
Seth DeLand is an application manager at MathWorks for data analytics. Before that, he was product manager for optimization products. Prior to joining MathWorks, Seth earned his BS and MS in mechanical engineering from Michigan Technological University.
The MathWorks, Inc. www.mathworks.com
@MATLAB
www.linkedin.com/company/the-mathworks_2
www.facebook.com/MATLAB
https://plus.google.com/+matlab
DEEP LEARNING FRAMEWORKS, ALGORITHMS & SOFTWARE
Riding the deep learning wave:
Simulations enabling software engineers to generate data needed to train neural networks
By Peter McGuinness, Highwai
The vision of software solutions evolving from exposure to data has some compelling aspects. Training by example offers the possibility of a true mass manufacturing technique for software.
About six years ago, there was a major shock in a somewhat obscure corner of the computing world, when a team from the University of Toronto won the Imagenet Challenge using a convolutional neural network that was trained, rather than designed, to recognize images. That team, and others, went on to not only beat out the very best detection algorithms but also to outperform humans in many image-classification tasks. Now, only a few years later, it seems that deep neural networks are inescapable. Even in 2012, machine learning was not new; in fact, pretty much all classification software up to that point used some training. But they all depended to some degree on human-designed feature-extraction algorithms. What made this new solution – later dubbed AlexNet after the lead researcher – different was that it had no such humandesigned algorithms and achieved its results purely from supervised learning. The impact of this revelation on the entire field of computing has already been huge in areas very far removed from image classification. Moreover, the changes it has brought are predicted to be even more profound in the future as researchers learn how to apply deep learning techniques to more and more problems in an ever-growing number of fields. Enthusiasm for deep learning has even led some commentators to predict the end of classical software authoring
that depends on designed algorithms, to be replaced by networks trained on vast quantities of data. This vision of software solutions evolving from exposure to data has some compelling aspects: training by example offers the possibility of a true mass manufacturing technique for software. Currently, software manufacturing is in a preindustrial phase where every application is custom-designed, rather like coach-built automobiles. With a standard algorithmic platform (the network) and automated training environments, deep learning could do for software what Henry Ford did for automobile manufacturing.

Whether or not you agree with this vision, the key feature of deep learning is that it depends on the availability of data, and therefore domain-specific expertise becomes less important than ownership of the relevant data. As expressed by deep learning pioneer Andrew Ng: "It's not the person with the best algorithm that wins, it's the person with the most data." This is the central problem now faced by companies wanting to transition to the new paradigm: Where do they get the data?

The role of behavioral data
For companies that depend on online behavioral data, the answer is obvious; the recording, tracking, and reselling of all our browsing habits is now so ubiquitous that the overhead of it all dominates our online experiences. For companies that deal more closely with the real world, the solutions are less convenient. Waymo, the best-known name in autonomous vehicles, has addressed this problem by deploying fleets of instrumented cars to map out localities and record real-world camera, radar, and other data that it then feeds into its perception software. Other players in that space have followed suit in a smaller way but even Waymo, with millions of miles driven and vast amounts of data available to it, finds it inadequate for the task.

To begin with, not all data is equal: To be useful it must be accurately and thoroughly annotated. That remains an expensive, error-prone business, even today. After several years of efforts to automate the process, Amazon's Mechanical Turk is still the go-to method of annotating data. As well as being annotated, data must also be relevant, and that is a major problem when relevance is determined by how uncommon, dangerous, or outright illegal any given occurrence is. Reliable, relevant ground-truth data is so hard to come by that Waymo has taken to building its own mock cities out in the desert where it can simulate the behaviors it needs under controlled conditions.

In a world where Hollywood can produce CGI scenes that are utterly convincing, it must be possible to use that sort of capability to create training data for real-world
scenarios and – of course – it is. The industry has been moving in this direction for a few years, with one team of researchers developing a method for annotating sequences from the game Grand Theft Auto. In addition, Udacity has an open source project for a self-driving car simulator as part of its self-driving car nanodegree. Like the Udacity example, most of the available simulators are aimed at implementing a verification loop for testing trained perception stacks rather than producing data primarily intended for the training itself. Those data simulators that do exist are closely held by the auto companies and their startup competitors, demonstrating the fundamental value of the data they produce. So: Is it true that synthetic data can be successfully used to train neural networks? How much and what sort of data are needed to do the job?

What is KITTI?
Palo Alto-based Highwai has published the results of its pilot study that uses the KITTI data set as a jumping-off point to examine the gains that are possible with a completely synthetic data set used to augment the annotated images available from KITTI. [The KITTI data set is a project of Karlsruhe Institute of Technology and Toyota Technological Institute at Chicago.] (Figure 1.)

The training images were produced using Highwai's Real World Simulator and include a number of sequences taken from downtown urban and residential suburban scenes populated with a variety of vehicles, pedestrians, and bicycles. The purpose was object detection and classification rather than tracking, so capture frame rate was set low to enable the capture of a wide variety of images while keeping data set size moderate. Images were captured over a range of conditions, including camera height and field of view, lighting and shadowing variations due to time of day, and atmospheric effects such as fog and haze. Although Highwai's tools support LIDAR, only visible-light camera data was captured in this instance. Annotations included categories such as "pedestrian," "car," and "bicycle," and a screen-space 2D bounding box was the only type of annotation used.
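For readers unfamiliar with this kind of label, a category-plus-2D-bounding-box annotation can be represented very simply. The record layout below is purely illustrative – it is not Highwai's or KITTI's actual schema, and every value is invented:

# One synthetic frame: object categories plus screen-space 2D boxes
# given as [x_min, y_min, x_max, y_max] in pixels (all values invented).
frame_annotation = {
    "image": "synthetic_000123.png",
    "conditions": {"time_of_day": "dusk", "fog": 0.3, "camera_height_m": 1.4},
    "objects": [
        {"category": "car",        "bbox": [412, 180, 520, 260]},
        {"category": "pedestrian", "bbox": [130, 150, 165, 250]},
        {"category": "bicycle",    "bbox": [600, 170, 660, 240]},
    ],
}

# Mixing synthetic frames into a real training set is then just concatenation.
real_frames = []                        # annotated real-world frames would go here
synthetic_frames = [frame_annotation]
training_set = real_frames + synthetic_frames
print(len(training_set), "frames,",
      sum(len(f["objects"]) for f in training_set), "annotated objects")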
FIGURE 1
The KITTI data set was used as a starting point to look at the gains possible with a synthetic data set augmenting annotated images.
The data was prepared for training using Highwai's Data Augmentation Toolkit to add camera sensor and image compression noise, to add "distractor" objects to the images, and to desensitize the training to color. At the end of this process, the total size of the synthetic data set was 54,443 objects in 5,000 images. (This compares to 37,164 objects and 7,000 images in the original KITTI data set.) The total time taken to produce the data, augment it, and add it to the training data set was under two hours.

The base network used was a Faster RCNN Inception Resnet pretrained as an object detector on the Common Objects in Context (COCO) data set, with supplementary retraining done twice; first using only the KITTI data set to produce a baseline and then with the KITTI and Highwai synthetic data sets combined. Testing done on the KITTI reference test data set, which contains only real-world images, showed significant gains in performance between the KITTI-only and KITTI plus synthetic training. The addition of the synthetic data increased recognitions by 5.9 percent overall, with detection of cars and pedestrians improving significantly more – a result that is unsurprising since the Highwai synthetic data set concentrated on those object types.

The question of how much training data are needed has no good answer, but Highwai points to highly targeted data curation as essential to keeping this within reasonable bounds. A good example is a data set they created for an undisclosed object-detection project where the total amount of image and annotation data actually used for training came to about 15 GB. An initial total of approximately 12,000 images containing around 120,000 annotated objects was auto-curated down from an original set of 30,000 images and 500,000 annotated objects.

Results like these are important to independent software makers as well as to system integrators and OEMs. Sure, they can use Amazon's services to help train networks, but if the value lies in the data, then commercial viability demands that they are able to create IP in that area – they must be able to create their own training data using their own domain expertise to specify, refine, and curate the data sets. This result means that the emergence of a tools industry aimed at the production of such IP is a significant step and one that will be welcomed. We can expect to see rapid development of expertise in the use of synthetic training data and equally rapid development in the tools to produce it. IAI

Peter McGuinness is CEO and co-founder of Palo Alto, CA-based Highwai, a privately held company that develops and markets simulators to train neural networks for very high accuracy object recognition. Readers may reach the author at peter@highwai.com.
Highwai
www.highwai.com
EMBEDDED NEURAL NETWORK PROCESSING
The Internet of learning Things
By Semir Haddad, Renesas
Like ants or honeybees, large groups of robots – like Harvard's demonstrated 1,024-robot swarm – could achieve remarkable colony-level feats like transporting large objects or autonomously building human-scale structures. Photo courtesy Self-Organizing System Research Group, Harvard University.
When did artificial intelligence (AI) become popular again? It can be dated back to March 2016, when AlphaGo, the neural network-powered AI by Google, beat the Korean champion Lee Sedol at the game of Go, which had the reputation of being so complex that no machine could ever solve its puzzle. We are now in full hype mode about AI, with even the popular press embracing the hopes and fears generated around intelligent machines. Very soon, your barber will talk to you about AI. This will be the end of the cycle. This hype reminds me of another hype that we are all familiar with: the Internet of Things, or IoT. As a matter of fact, when you plot the two popularity curves (Figure 1) for the search terms IoT and artificial intelligence on Google Trends, they follow a very similar pattern.

While several IoT startups have already emerged and died, the IoT is still here. It is now incubating in the slow-growth mode that precedes the final blossom of a new technology. And this makes me think about another likely and not-so-distant event: the collision between IoT and AI.

Things that think
Most of the AI applications focused initially on internet services from Facebook, Google, Amazon, and the like, with algorithms running on servers equipped with multicore CPUs and GPUs operating at GHz frequencies with terabytes of memory. Then AI reached high-performance consumer devices, appearing in smartphones, semi-autonomous cars, gaming consoles, TVs, and smart speakers. While technically "at the edge" and often classified as "IoT," these devices are peculiar in the IoT species because they still require lots of power, performance, and memory and are connected to the internet with high-bandwidth links.

Most of the billion-plus devices foreseen by IoT luminaries are much more constrained, however. Bringing AI into these devices at the very edge of the network would have a transformative impact but would require striking the right balance between resources, connectivity, cost, and power. Some may argue: With such small devices, what can be the utility of an AI at the edge? To picture it, let's compare these devices with what can be achieved with animal brains.

Wikipedia gives an overview of the average number of neurons for several animals, including the number of synapses (neuron connections) for a few of them, and we can extrapolate synapses per neuron for the rest. If we try to emulate an animal brain's processing power with a deep neural network, as shown in Table 1, a rough approximation is to allow one byte for each neuron (output) and one byte for each synapse (weight).

Ant power!
Microcontrollers (MCUs) have a memory range of 1 to 4 megabytes, so AI in an MCU should be about as useful as a jellyfish or a snail. Not much intelligence, would you say? Well, let's look at it twice: Snails and jellyfish can feed themselves, move, reproduce, and hide when they feel threatened. They manage complex interactions with their environment, recognize patterns, and control their body to create motion. This level of intelligence is enough for simple devices like thermostats, vacuum cleaners, and doorbells. The idea here is to have devices that make microdecisions based on the sensing of their environment.
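A minimal sketch of that back-of-the-envelope rule – one byte per neuron plus one byte per synapse – is shown below. The neuron and synapse counts are rough, illustrative figures, not values taken from the article's Table 1.

```c
#include <stdio.h>

/* Rough memory estimate for emulating an animal brain with a deep
 * neural network: one byte per neuron (output) plus one byte per
 * synapse (weight), as described in the text. All counts below are
 * approximate, illustrative figures. */
struct brain {
    const char *animal;
    double neurons;
    double synapses;
};

int main(void)
{
    const struct brain animals[] = {
        { "jellyfish", 5.6e3, 2.8e6  },   /* assumed ~500 synapses/neuron  */
        { "snail",     1.1e4, 5.5e6  },
        { "honeybee",  9.6e5, 1.0e9  },   /* roughly gigabyte-class memory */
        { "mouse",     7.1e7, 1.0e11 },
    };

    for (int i = 0; i < 4; i++) {
        double bytes = animals[i].neurons + animals[i].synapses;
        printf("%-10s ~%.1f MB\n", animals[i].animal, bytes / 1e6);
    }
    return 0;
}
```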
A device with a snail brain could at least recognize some micropatterns, in the same way a snail identifies what to eat and what to fear. Imagine a door lock identifying an attempt to tamper with it, or a washing machine defining its program by the color of your clothes. Animals, even ones with lower intelligence, are incredibly skilled at one thing that seems to evade computers: interaction. Animals interact instinctively with each other through sight, smell, and touch. A snail brain in a device could certainly bring a sensitive user interface, one that's more intuitive and natural.

With microprocessors that have external memory in the gigabyte range, the level of an ant or a honeybee can be reached. The typical use case for insect-level intelligence is the swarm: The idea is that with a myriad of simple robots with enough intelligence to exhibit swarm behavior, complex tasks can be performed, in the same way an ant colony functions. If you are wondering about the utility of the insect brain, consider that there are 10 quintillion insects in the world and only 7.6 billion humans; the answer is that it can be very useful. The applications are vast, with opportunities in farming, smart cities, environment, security, rescue, and defense. Researchers at Harvard several years ago demonstrated a 1,024-robot swarm, which is the largest to date. Like ants or honeybees, the 1,024-robot swarm can achieve remarkable colony-level feats like transporting large objects or autonomously building human-scale structures.

Enabling endpoint intelligence
In feature-rich embedded devices – from smartphones to security cameras and cars – machine vision has been driving AI adoption. With concrete use cases, a large potential market, and an efficient algorithm, CNN [convolutional neural network] machine vision allows for hardware acceleration. Now almost every device capable of image acquisition and processing integrates a CNN AI accelerator.

Renesas
www.renesas.com
@RenesasAmerica
FIGURE 1 Google Trends: IoT (red) and Artificial Intelligence (blue)
TABLE 1 Summary of the memory needed to emulate animal brains
Enabling AI in smaller IoT devices at the endpoint of the network is not that simple, as there are multiple unproven use cases. Because of the difficulty in defining proper hardware acceleration, the route that many hardware companies are taking is to enable AI with tools and software on general-purpose controllers, and to monitor the evolution. To this end, Renesas Electronics was one of the first companies to release an embedded AI solution (e-AI) that enables users to translate a trained network from TensorFlow or Caffe to code that is usable by its MCUs. The tools offered include an e-AI translator that converts the neural network into C code usable by the MCU tools, and an e-AI checker that predicts the performance of the translated network. Renesas has identified several use cases – in predictive maintenance, for instance – but the intention is to let users and communities discover and innovate. This is why the tutorial is available for the Renesas community "gadget Renesas" boards and why Renesas Electronics America has been promoting an e-AI design contest on the GR-PEACH board, using the RZ/A1H embedded MPU.

Billions and counting
With embedded AI, we are in the typical "business model definition" stage, wherein the solution has been invented and is used by early adopters, but is experiencing a gap before it achieves mainstream acceptance. Startups and innovative companies engaging in this research will have to use all their wits to cross this chasm. With new developments every day, it is certain that many pivots will be necessary before companies find a profitable use case and a growth engine to achieve success. If they do, they will surely be in for a big success. Remember, there are 10 quintillion (10,000,000,000 billion) insects on the planet. How many billions of "insect-smart" devices will we build? What could we create with that level of intelligent devices? IAI

Semir Haddad is Director of Strategy Planning & Strategic Business Development at Renesas Electronics America. He has over 20 years of experience in the semiconductor industry and more than 17 years in product management with microcontrollers, microprocessors, and embedded software. Mr. Haddad earned his master's degree in Electronic Engineering from the École Supérieure d'Électricité (France) and an MBA from ESSEC Business School (France).
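To make "translating a trained network into MCU-friendly C" concrete, here is a minimal, generic sketch of a single fully connected layer with 8-bit quantized weights and a ReLU activation – the kind of routine such a translator could emit. It is illustrative only and does not reflect the actual code generated by the e-AI tools; the layer sizes, weight values, and quantization scheme are assumptions.

```c
#include <stdint.h>
#include <stdio.h>

#define IN_DIM  4   /* assumed input size  */
#define OUT_DIM 3   /* assumed output size */

/* 8-bit quantized weights and 32-bit biases, as a translator might
 * embed them in a const table placed in MCU flash. Values are dummies. */
static const int8_t  weights[OUT_DIM][IN_DIM] = {
    { 12, -3,  7,  1 },
    { -8, 20, -5,  4 },
    {  6,  2, -9, 15 },
};
static const int32_t bias[OUT_DIM] = { 100, -50, 30 };

/* One dense layer: 8-bit inputs, 32-bit accumulators, ReLU, then a
 * right shift standing in for requantization back to 8 bits. */
static void dense_relu_q8(const int8_t *in, int8_t *out)
{
    for (int o = 0; o < OUT_DIM; o++) {
        int32_t acc = bias[o];
        for (int i = 0; i < IN_DIM; i++)
            acc += (int32_t)weights[o][i] * in[i];
        if (acc < 0) acc = 0;          /* ReLU */
        acc >>= 7;                     /* assumed requantization shift */
        out[o] = (acc > 127) ? 127 : (int8_t)acc;
    }
}

int main(void)
{
    const int8_t sample[IN_DIM] = { 5, -2, 9, 3 };
    int8_t result[OUT_DIM];
    dense_relu_q8(sample, result);
    for (int o = 0; o < OUT_DIM; o++)
        printf("out[%d] = %d\n", o, result[o]);
    return 0;
}
```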
www.linkedin.com/company/renesas/
www.facebook.com/RenesasElectronicsAmerica
YouTube: www.youtube.com/channel/UCK-EIRMobKKZwybhgTh8wEg
Google+: https://plus.google.com/+renesas
EMBEDDED NEURAL NETWORK PROCESSING
Getting back up: Coupling AI and memory class storage saves in a big way
By Bill Gervasi, Nantero
The hockey stick in adoption of artificial intelligence (AI) applications is well under way; custom architectures including CPUs, GPUs, FPGAs, and neural network processors are claiming their place in the cloud. One of the weaknesses of these massively parallel architectures for AI is sensitivity to data loss on power failure, resulting in high costs for data centers. Memory class storage provides a high-performance persistent memory for AI applications, enabling a new level of uptime, lower power, and higher profits for data centers.

The rapid rise in the adoption of artificial intelligence (AI) as well as deep-learning algorithms is changing the profile of modern data centers. Applications including voice recognition, augmented reality, medical data mining, and thousands of others are capturing more and more data cycles as the cloud becomes the universal resource for information. The simplistic term AI as used here implies the collection of many types of applications, some with wildly different technical requirements.

The cloud is a major source of revenue for most businesses as well, and uptime has become a dominant requirement. Losses from system downtime are pegged at an average of $22,000 per minute; an average of 14.1 hours of downtime annually means an average annual loss of $18 million. Companies wholly dependent on the internet could easily see multiples of that overhead cost. Clearly, there is a rapid return on investment for a solution that minimizes downtime and speeds recovery from power failures. Introducing memory class storage, defined below, into the data centers – and especially into AI processor subsystems – is a key performance enhancement that increases the value of a data center.

Common AI applications
AI is not new, although mass deployment of AI is a recent trend. While there is no one architecture for an AI processing center, a number of common trends have emerged in the deployment of AI. Generally speaking, AI uses a large array of relatively simple processing units operating in parallel on a very wide data set, with each processing element operating on a subset of the data. To help visualize how this would work, imagine a JPEG picture-processing algorithm with a small processing unit running the discrete cosine transform (DCT) algorithm for an 8x8 array of pixels. Since typical raw pictures are far greater than 8x8 pixels, an AI engine can include many processing units operating in parallel, each running the same DCT algorithm, with one master controller that feeds 8x8 parts of the overall picture to the processing units. The master controller feeds the results from each processing unit back into an output cache for the results, such as the data for an encoded JPEG file. These parallel processing elements may also choose to share data between them, for example to manage the edges between adjacent input data blocks. One approach to this solution is the recurrent neural network (RNN) as shown in Figure 1.

FIGURE 1 Recurrent neural network processing.

Relating the use of AI to the cost of power loss, the simple RNN application shown in Figure 1 shows the inherent overhead when the input data (in yellow) and the output data (in orange) are stored in dynamic memory. On data loss, the input data must be reloaded and the RNN algorithms rerun to restore the output data.

Deep learning is another variation of AI that exhibits greater sensitivity to data throughput. Deep-learning algorithms load a seed pattern into the AI memory that represents a precalculated estimate of expected possible answers. Examples of deep learning include voice recognition, where the seed data may represent generic voice patterns. However, deep learning can modify the seed pattern once input data starts streaming past the processing units. With voice recognition, this use could include tuning the recognition engine for a specific user's accent. Deep convolutional networks are one mechanism for adding recognition, context, and modified seeds to enhance the quality and accuracy of the algorithm.

One challenge for deep-learning algorithms is to avoid retraining. Imagine the reaction of the customer who had to retrain a voice-recognition device every time they moved the device to a different room. For deep learning, the modified seed models need to be checkpointed periodically so that the learned data may be restored after power failure. In Figure 2, the green circles represent these models; if this data is lost, any updates to the model must be relearned. The pooling layers, in pink double circles, also contain information to accelerate the processing and must be saved or reconstructed.

Common AI architectures
As stated previously, there are trends in the industry regarding hardware implementations of AI architectures. These trends are 1) feeding large amounts of data into a wide array of processing elements; 2) each processor operating on a relatively simple algorithm to 3) produce output data that is recombined into a desired result; and 4) for deep learning, these intermediate results can modify an initial input data set that needs to be extracted periodically.
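The master-controller/worker pattern sketched in the JPEG example under "Common AI applications" can be illustrated in a few dozen lines of C. The image dimensions are arbitrary, the workers run sequentially here rather than in parallel hardware, and the naive DCT simply stands in for whatever kernel the processing elements actually execute.

```c
#include <math.h>
#include <stdio.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

#define N 8     /* block size handled by each worker        */
#define W 32    /* assumed image width  (multiple of 8)     */
#define H 16    /* assumed image height (multiple of 8)     */

/* "Worker": naive 2-D DCT-II on one 8x8 block, the job each
 * processing element in the array would run. */
static void dct8x8(const double in[N][N], double out[N][N])
{
    for (int u = 0; u < N; u++) {
        for (int v = 0; v < N; v++) {
            double cu = (u == 0) ? sqrt(1.0 / N) : sqrt(2.0 / N);
            double cv = (v == 0) ? sqrt(1.0 / N) : sqrt(2.0 / N);
            double sum = 0.0;
            for (int x = 0; x < N; x++)
                for (int y = 0; y < N; y++)
                    sum += in[x][y] *
                           cos((2 * x + 1) * u * M_PI / (2.0 * N)) *
                           cos((2 * y + 1) * v * M_PI / (2.0 * N));
            out[u][v] = cu * cv * sum;
        }
    }
}

int main(void)
{
    static double image[H][W];      /* stand-in input picture */
    static double coeffs[H][W];     /* output cache           */

    /* Fill the image with a simple gradient so the demo has data. */
    for (int r = 0; r < H; r++)
        for (int c = 0; c < W; c++)
            image[r][c] = (double)(r + c);

    /* "Master controller": walk the picture in 8x8 tiles and hand
     * each tile to a worker, collecting results in the output cache. */
    for (int r = 0; r < H; r += N) {
        for (int c = 0; c < W; c += N) {
            double blk[N][N], res[N][N];
            for (int i = 0; i < N; i++)
                for (int j = 0; j < N; j++)
                    blk[i][j] = image[r + i][c + j];
            dct8x8(blk, res);
            for (int i = 0; i < N; i++)
                for (int j = 0; j < N; j++)
                    coeffs[r + i][c + j] = res[i][j];
        }
    }
    printf("DC coefficient of first block: %.1f\n", coeffs[0][0]);
    return 0;
}
```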
A fairly typical AI device is shown in Figure 3, wherein an array of 15 processing elements is tied through an internal memory crossbar to a large data resource (e.g., 32 GB) using high-bandwidth memory (HBM). HBM provides a 1024-bit-wide interface running at 2.4 Gb/s per data pin; with four internal HBM buses, the processing elements can be fed at a combined throughput of roughly 1.2 TB/s. In addition, each AI device provides a number of high-speed serial I/O interfaces (e.g., 1 Tb/s) to expand the architecture to large numbers of processors. These serial buses are interconnected in a somewhat application-specific pattern, often a hypercube or toroid arrangement, so that data can be distributed through the array for processing in parallel by many AI devices. Figure 4 is an example of a toroid connection scheme to route data through the array.

FIGURE 2 Deep convolutional network processing.

FIGURE 3 Example of an AI processor.

At 1 Tb/s, these serial links are reasonably fast, but the number of links is still limited and data must make several hops through the array to reach the destination. Even at fairly high bus efficiency, it can take multiple seconds to fill the HBM content of each AI device. One solution would be to increase the number of serial input feeds from the supporting hardware, but at 26 GB/s for a full DDR4 interface, it takes four channels of DDR4 to feed one input pipe. The external support hardware quickly becomes the bottleneck.

Checkpointing the contents of modified models in deep-learning applications also takes place over these serial buses. Each AI device needs to communicate its contents to the support hardware for commitment to nonvolatile resources, typically NVMe or SSD units. This checkpointing consumes valuable bandwidth on the communications links, time that could be better spent on data processing.

The cost of power failure
The bottleneck caused by the serial links highlights the problem posed by power failure. Once the models are lost, refilling a large array of AI devices can take a very long time. Reloading the applications into the execution units takes time as well; then restarting the algorithms and restoring lost work in progress can also take a significant amount of time. Meanwhile, the system is unavailable for end-user data processing. For deep-learning environments, data loss may not be acceptable, so complex mechanisms must be put in place to periodically checkpoint learned modifications of seeded data models.

Depending on the system architecture, the speed of reloading will be limited by the nonvolatile storage interface. Even with 10 GB/s NVMe backup, restoration of a large AI array could take many minutes. If this bottleneck defines the granularity of saved deep-learning models, the likelihood of permanently lost data is assured. To avoid this data loss, uninterruptible backup systems are typically deployed; however, the power requirements for these architectures tend to be quite high, and the UPS units are relatively expensive.
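A quick back-of-the-envelope calculation shows why. The 32 GB of HBM per device and the 10 GB/s NVMe figure come from the text above; the assumption of 100 devices sharing a single restore path is mine, for illustration only.

```c
#include <stdio.h>

int main(void)
{
    /* 32 GB HBM per device and 10 GB/s NVMe are taken from the text;
     * the device count and single shared restore path are assumptions. */
    const double hbm_per_device_gb = 32.0;
    const double nvme_gb_per_s     = 10.0;
    const int    devices           = 100;

    double total_gb = hbm_per_device_gb * devices;
    double seconds  = total_gb / nvme_gb_per_s;

    printf("State to restore : %.0f GB\n", total_gb);
    printf("Restore time     : %.0f s (~%.1f minutes)\n",
           seconds, seconds / 60.0);
    return 0;
}
```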
FIGURE 4 Toroid connection of multiple AI devices.
Memory class storage in AI
Nantero NRAM is a nonvolatile memory technology that operates at full DRAM speed and with unlimited write endurance, hence the expression "memory class storage." NRAM uses electrostatic forces to switch arrays of carbon nanotubes for the storage of 1s and 0s. Coupled with an HBM interface, NRAM can solve the power-fail sensitivity problem of AI architectures.

The NRAM HBM drops into the AI device exactly like a DRAM HBM. It uses the same signals and timing, making integration exceptionally simple. Unlike storage class memories, which are much slower than DRAM, have very limited endurance, and therefore still require support DRAM to achieve the desired performance, NRAM completely replaces all DRAM in the AI device with no additional support required. Since NRAM is inherently nonvolatile, the AI controllers can also exploit its unique features; one example is to turn off refresh and gain an additional 15 percent performance at the same clock frequency (Figure 5).

On power failure, the system sends a single signal to all AI devices warning of impending power failure. Each AI device responds by completing calculations in process and storing the results into the nonvolatile NRAM, then shutting down. When power is restored, the execution unit code is restored from the NRAM to the local memory (such as SRAM) attached to each execution unit, temporary results are restored, and execution resumes in microseconds instead of minutes.
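The save-on-warning, resume-on-power-up flow can be pictured with a small host-side simulation. Here the "NRAM" is just a static buffer and the power-fail warning is an ordinary function call; this is a conceptual sketch of the sequence described above, not Nantero firmware or a real driver.

```c
#include <stdio.h>
#include <string.h>

/* Work-in-progress state of one execution unit. */
struct unit_state {
    int    step;         /* how far the algorithm has progressed */
    double partial_sum;  /* intermediate result                  */
};

/* Stand-in for the nonvolatile NRAM region; in this simulation it is
 * ordinary memory assumed to survive between "power cycles". */
static struct unit_state nram_checkpoint;

/* Early-warning power-fail signal: finish the calculation in flight
 * and commit state to "NRAM". */
static void on_power_fail(const struct unit_state *live)
{
    memcpy(&nram_checkpoint, live, sizeof nram_checkpoint);
}

/* Power-up: pull state back out of "NRAM" and resume where we left off. */
static void on_power_restore(struct unit_state *live)
{
    memcpy(live, &nram_checkpoint, sizeof *live);
}

int main(void)
{
    struct unit_state s = { 0, 0.0 };

    /* Do some work, then simulate a power failure mid-run. */
    for (s.step = 0; s.step < 1000; s.step++)
        s.partial_sum += 0.5 * s.step;
    on_power_fail(&s);

    /* "Power comes back": restore and continue. */
    struct unit_state resumed = { 0, 0.0 };
    on_power_restore(&resumed);
    printf("resumed at step %d, partial sum %.1f\n",
           resumed.step, resumed.partial_sum);
    return 0;
}
```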
FIGURE 5 HBM NRAM architecture.
Since data is never lost, and the external serial links are not needed – nor are external backup mechanisms such as NVMe or SSD – the incorporation of NRAM HBM into an AI architecture eliminates the need for expensive battery-backup systems.

Conclusion
NRAM HBM reimagines the computing infrastructure for AI and deep-learning applications. By providing inherent data persistence at all times, AI servers need not take the long delays associated with reloading models and other data. Checkpointing of modified model data is automatic and does not consume bandwidth on the interconnects between processing elements and the support computing system. Incorporating NRAM HBM into an AI architecture eliminates the cost, complexity, and reliability concerns of battery-backup systems; reduces power in the data center; and increases performance. Backup and restore procedures turn mostly into NOPs … just turn back on and go! IAI

Bill Gervasi is principal systems architect at Nantero, Inc. He has been working with memory devices and subsystems since 1 Kb DRAM and EPROM were the leading edge of technology. He has been a JEDEC chairman since 1996 and is responsible for key introductions including DDR SDRAM, the integrated Registering Clock Driver and RDIMM architecture, and the formation of the JEDEC committee on SSDs. He is actively involved in the definition of NVDIMM protocols.
Nantero Inc.
www.nantero.com
TWITTER @Nantero
www.linkedin.com/company/nantero/
EMBEDDED NEURAL NETWORK PROCESSING
Adaptive acceleration holds the key to bringing AI from the cloud to the edge
By Dale Hitt, Xilinx
MPSoC devices are available with ISO 26262 ASIL-C safety specs, needed for autonomous-driving applications.
Emerging applications for artificial intelligence (AI) will depend on system-on-chip (SoC) devices with configurable acceleration to satisfy increasingly tough performance and efficiency demands. Applications such as smart security, robotics, and autonomous driving rely increasingly on embedded AI to improve performance and deliver new user experiences, which means that inference engines hosted on traditional compute platforms can struggle to meet real-world demands within tightening constraints on power, latency, and physical size. They suffer from rigidly defined inferencing precision, bus widths, and memory that cannot be easily adapted to optimize for best speed, efficiency, and silicon area. An adaptable compute platform is needed to meet the demands placed on embedded AI running state-of-the-art convolutional neural networks (CNNs).

Looking further ahead, the flexibility to adapt to more advanced neural networks is a prime concern. CNNs that are popular today are being superseded by new state-of-the-art architectures at an accelerating pace. Traditional SoCs must be designed using knowledge of current neural network architectures but typically target deployment about three years in the future from the time development starts. New types of neural networks, such as RNNs or capsule networks, are likely to render traditional SoCs inefficient and incapable of delivering the performance required to remain competitive.

If embedded AI is to satisfy end-user expectations and – perhaps more importantly – keep pace as demands continue to evolve in the foreseeable future, a more flexible and adaptive compute platform is needed. This goal could be achieved by taking advantage of user-configurable multiprocessor system-on-chip (MPSoC) devices that integrate the main application processor with a scalable programmable logic fabric containing a configurable memory architecture and signal processing suitable for variable-precision inferencing.

Inferencing precision
In conventional SoCs, performance-defining features such as the memory structure and compute precision are fixed. The minimum is often eight bits, defined by the core CPU, although the optimum precision for any given algorithm may be lower. An MPSoC, in contrast, allows programmable logic to be optimized right down to the transistor level, giving freedom to vary the inferencing precision down to as little as one bit if necessary. These devices also contain many thousands of configurable DSP slices to handle multiply-accumulate (MAC) computations efficiently. The freedom to optimize the inferencing precision so exactly yields compute efficiency in accordance with a square law: A single-bit operation executed in a 1-bit core ultimately imposes only 1/64th of the logic needed to complete the same operation in an 8-bit core. Moreover, the MPSoC allows the inferencing precision to be optimized differently for each layer of the neural network to deliver the required performance with the maximum possible efficiency.

Memory architecture
Along with improving compute efficiency by varying inferencing precision, configuring both the bandwidth and structure of programmable on-chip memories can further enhance the performance and efficiency of embedded AI. A customized MPSoC can have more than four times the on-chip memory and six times the memory-interface bandwidth of a conventional compute platform running the same inference engine. The configurability of the memory allows users to reduce bottlenecks and optimize utilization of the chip's resources. In addition, a typical subsystem has only limited cache integrated on-chip and must interact frequently with off-chip storage, which adds to latency and power consumption. In an MPSoC, most memory exchanges can occur on-chip, which is not only faster but also saves over 99 percent of the power consumed by off-chip memory interactions.

Silicon area
Solution size is also becoming an increasingly important consideration, especially for mobile AI on board drones, robots, or autonomous/self-driving vehicles. The inference engine implemented in the FPGA [field-programmable gate array] fabric of an MPSoC can occupy as little as one-eighth of the silicon area of a conventional SoC, allowing developers to build more powerful engines within smaller devices. Moreover, MPSoC device families can offer designers a variety of choices to implement the inference engine in the most power-, cost-, and size-efficient option capable of meeting system performance requirements. Also available: automotive-qualified parts with hardware functional-safety features certified according to the industry-standard ISO 26262 ASIL-C safety specification, which is very important for autonomous-driving applications (see photo at left). An example is Xilinx's Automotive XA Zynq UltraScale+ family, which contains a 64-bit quad-core ARM Cortex-A53 and dual-core ARM Cortex-R5 based processing system alongside the scalable programmable logic fabric. This configuration enables users to consolidate control processing, machine-learning algorithms, and safety circuits with fault tolerance in a single chip.

FIGURE 1 AI is increasingly becoming embedded in many types of equipment, including medical devices.
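The square-law relationship between precision and logic cost quoted under "Inferencing precision" can be made concrete with a few lines of arithmetic. The per-layer bit widths below are arbitrary examples, not figures from the article.

```c
#include <stdio.h>

/* Relative logic cost of a multiply at a given bit width, normalized
 * to an 8-bit multiply, using the square-law rule of thumb cited in
 * the text: cost ~ bits^2. */
static double relative_mac_cost(int bits)
{
    return (double)(bits * bits) / (8 * 8);
}

int main(void)
{
    /* Arbitrary example: a different precision chosen per network layer. */
    const int layer_bits[] = { 8, 4, 2, 1 };
    const int layers = (int)(sizeof layer_bits / sizeof layer_bits[0]);

    for (int i = 0; i < layers; i++)
        printf("layer %d: %d-bit MACs -> %.4f of the 8-bit logic cost\n",
               i, layer_bits[i], relative_mac_cost(layer_bits[i]));
    return 0;   /* 1-bit case prints 0.0156, i.e., 1/64th */
}
```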
Xilinx
www.xilinx.com
TWITTER @XilinxInc
Today, an embedded inference engine can be implemented in a single MPSoC device and consume as little as 2 watts, which is a suitable power budget for applications such as mobile robotics or autonomous driving. Conventional compute platforms cannot run real-time CNN applications at these power levels even now and are unlikely to be able to satisfy the increasingly stringent demands for faster response and more sophisticated functionality within more challenging power constraints in the future. Platforms based on programmable MPSoCs can provide greater compute performance, increased efficiency, and size/weight advantages at power levels above 15 watts as well.

The advantages of such a configurable, multiparallel compute architecture would be of academic interest only if developers were unable to apply them easily in their own projects. Success depends on the availability of suitable tools to help developers optimize the implementation of their target inference engine. To meet this need, Xilinx continues to extend its ecosystem of development tools and machine-learning software stacks; it also works with specialist partners to simplify and accelerate implementation of applications such as computer vision and video surveillance.

Flexibility for the future
Leveraging the SoC's configurability to create an optimal platform for the application at hand also gives AI developers the flexibility to keep pace with the rapid evolution of neural network architectures. The potential for the industry to migrate to new types of neural networks represents a significant risk for platform developers, but the reconfigurable MPSoC gives developers the flexibility to respond to changes in the way neural networks are constructed by reconfiguring to build the most efficient processing engine using any contemporary state-of-the-art strategy.

Increasingly, AI is now being embedded in equipment such as industrial controls, medical devices (Figure 1), security systems, robotics, and autonomous vehicles. Adaptive acceleration, leveraging the programmable logic fabric in MPSoC devices, holds the key to delivering the responsive and advanced functionality required to remain competitive. IAI

Dale Hitt is Director of Strategic Marketing Development at Xilinx.
www.linkedin.com/company/xilinx/
www.facebook.com/XilinxInc
YouTube: www.youtube.com/user/cevadsp
ADVERTORIAL
EXECUTIVE SPEAKOUT
AI PUSHING US TO THE EDGE
By Mychal McCabe
While autonomous cars generate the majority of headlines, drones, collaborative robotics or "cobotics," and transportation systems ranging from rail to hyperloop will be built from the ground up to capitalize on the current wave of artificial intelligence (AI) and machine learning (ML) innovation to realize the vision of remarkably more efficient, more intelligent infrastructure. This autonomy will drive demand for significantly more compute at the edge, where requirements, constraints, and economics are fundamentally different. It will also present new and unprecedented levels of risk.

AI will have a significant impact on the type of computing workloads that need to be run on edge devices. Traditionally, embedded system design has begun with custom hardware, possibly encompassing customized silicon processors (SoCs), on which software is layered – a "bottom-up" approach. For AI and machine learning implementations, the process is turned on its head: A defined problem statement will determine the best type of learning algorithm to use (for example, an object classification problem may require a different approach from voice recognition), from which the best hardware platform will be selected to run the learning framework most efficiently. This approach may involve selecting CPUs with specific instruction sets or accelerators, or using GPUs or FPGAs alongside traditional processors. In these environments, the software often defines the required hardware platform.

Many embedded systems are designed to automate specific tasks. In industrial systems, for example, a programmable logic controller (PLC) is used to automate manufacturing processes such as chemical reactions, assembly lines, or robotic devices. Generally, these devices perform with a high degree of accuracy, repeatability, and reliability, though they need to be individually programmed to do so and often have little scope for performing outside of their initial design parameters. However, in order to drive productivity increases and affect larger business outcomes, learning systems will increasingly be used, spanning a range of control devices at the cell, plant, or system level. Similar system-level approaches are emerging in autonomous-driving applications, where information from multiple subsystems needs to be merged and processed in a central unit running ML algorithms for object classification, pathfinding, and actuation.

Embedded AI, ML face constraints
Embedded AI and ML differ from other types of AI mainly because they are constrained by a number of factors, including:

1. Response time: If an AI-enabled system needs to react to an input, then the response time becomes a discriminant factor in choosing the right hardware or resizing the problem.
2. Processing power: It's not always feasible to deploy a large, power-hungry, expensive system.
3. Electrical power requirements: If the problem that needs solving is compute-intensive, then available electrical power can be a limiting factor, especially in highly mobile systems such as drones, robots, etc.
4. Safety: ML/AI algorithms output statistical results or classifications with less than 100% certainty.

Security should also be top of mind as AI finds its way into more systems. The Future of Humanity Institute recently released a report titled "The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation." The report focuses on three macro changes to the threat landscape: expansion of existing threats, introduction of new threats, and a change in the character of existing threats. Fundamental to this discussion is the notion that AI is a "dual-use" technology: "AI systems and the knowledge of how to design them can be put toward both civilian and military uses, and more broadly, toward beneficial and harmful ends."

The popular imagination is concerned with AI at large in the world, or a world overrun with indifferent or malevolent autonomous systems. Such systems tend to be viewed as either science projects or science fiction, but a look across multiple market segments suggests that we're entering the early majority phase of the technology adoption lifecycle for systems moving from automatic to autonomous. Consider that DARPA's first Autonomous Vehicle Challenge took place in 2004, that Google and Amazon have been working on their autonomous drone fleets since 2012, and that IBM began lobbying the FDA to allow its Watson system to assess cancer screening scans in 2013.

AI: Look around
AI in particular is emerging as a mainstream capability for everything from marketing automation to smart factories, but that ubiquity doesn't mean that it is well understood. Consider Facebook's unplugging of an AI project in which two computers began to communicate with one another in a language that wasn't understood by the humans assigned to the project: Mainstream media described this story as a "creepy preview of our potential future." Only it didn't really happen, at least not in the way suggested by the headlines.
More likely than an incomprehensible, creepily pervasive, and indifferent AI entity at large in the world are actual threats like data poisoning, adversarial examples, and the ability to exploit the goal orientation of autonomous systems. As the Future of Humanity Institute points out, "these are distinct from traditional software vulnerabilities (e.g., buffer overflows) and demonstrate that while AI systems can exceed human performance in many ways, they can also fail in ways that a human never would."

The clear trend from systems characterized by automatic operation to those characterized by autonomous operation – in multiple critical infrastructure sectors – will usher in the arrival of AI in operational technologies (OT) including but not limited to autonomous vehicles, control and process domains, and other systems with safety-critical requirements. System architectures and certification approaches must evolve with these requirements in mind.

What kind of power for AI/ML?
As we move one year deeper into the transition from automated to autonomous, the increased demand for intelligence in edge devices is accelerating. With the need to integrate or fuse data from diverse sensors, with video growing from HD to 4K and 8K on the horizon, and with even relatively lightweight applications of ML and AI coming to systems ranging from collaborative robotics to connected vehicles, the edge is going to require more compute power. Edge devices tend to be resource-constrained and are likely under the severe cost pressure that comes with a bill of materials; in other words, brute-force applications of raw compute power won't meet the requirements of the market. Cloud computing will also fail to meet the requirements of automatic and autonomous systems that require immediate response times; today's network connectivity simply cannot deliver that.
Consolidating edge compute workloads, even in complex heterogeneous systems with multiple levels of safety criticality throughout, will be essential to the economics of digital transformation and the near horizon of a software-defined autonomous world. Separating workloads that could result in more than one output, based on the response of an AI-capable system, from other workloads with fixed or deterministic outcomes should be an essential consideration for those architecting such systems. Consolidating and separating workloads with multiple levels of safety criticality and performance criteria is an area where Wind River has deep expertise across multiple industries.

The ability to understand expected and desired outcomes at the system level and identify deltas in real time will be critical. Simulation and digital twin technologies – including Wind River Simics – have a role to play in setting such behavioral baselines and monitoring against them through time. And operating systems like VxWorks and Wind River Linux can serve as a landing zone for AI.

Wind River is proud to be working with the ecosystem of innovators driving the automated-to-autonomous trend that is bringing AI to the forefront and ensuring that the software-defined autonomous world of the future is a safe, secure reality.

Mychal McCabe is VP of Marketing at Wind River.

Wind River
www.windriver.com
Twitter: @WindRiver
Facebook: @WindRiverSystems
LinkedIn: www.linkedin.com/company/wind-river/
EMBEDDED NEURAL NETWORK PROCESSING
Micropower intelligence for edge devices
By Narayan Srinivasa and Gopal Raghavan, Eta Compute
The world is moving toward a smart and distributed computing model of interacting devices. Intelligence in these devices will be driven by machine-learning (ML) algorithms. Yet extending machine learning to the edge is not without its challenges. Neuromorphic, or brain-inspired, computing – which can include handwriting recognition and continuous speech recognition – will enable a wide range of intelligent applications that address these challenges.

The world is moving toward a smart and distributed computing model wherein millions of smart devices directly interact and communicate with their world and with each other to enable a faster, more responsive, and intelligent future. Such a situation would enable a new breed of edge devices and systems ranging from wearables, smart bulbs, and smart locks to smart cars and buildings, with minimal need for any central coordinating or processing entity such as the cloud. At the heart of this smart world lie machine-learning algorithms that run on processors operating directly on the data collected by these devices, learning to make intelligent inferences and to act upon their world in real time and efficiently under dynamically changing circumstances.

The model for enabling intelligence in edge devices is based on the ability to sense-learn-infer-act during their interaction with their environment and other devices (Figure 1). We believe that solutions that empower these intelligent edge devices to be both agile (i.e., fast response) and efficient (i.e., from a power perspective) will dominate this dynamically changing and distributed world of networked sensors and objects. But like any new technology, there are several challenges that need to be addressed in delivering low-power intelligence at the edge. We review some key challenges and discuss how Eta Compute is working to address them.

FIGURE 1 An intelligent edge model with the ability to learn and infer and rapidly interact with its environment will enable smart devices of the future.

A key challenge is to be able to process data intelligently with very limited power resources on these edge devices. Eta Compute has been developing a foundational technology called DIAL [delay insensitive asynchronous logic] to transform the operation of traditional microprocessors and digital signal processors from a synchronous to an asynchronous mode (i.e., without any clock). The basic principle, unlike traditional logic (Figure 2), is to run the processor in an event-driven fashion, waking up to process on demand and sleeping when not in use, via a handshaking protocol. Furthermore, this processor can be automatically controlled to operate at the lowest frequency demanded by a task and, at that frequency, to scale the operating voltage to the smallest possible value needed to run the task. An important innovation of DIAL is that there is no area penalty in the circuit design, while also offering a formally verifiable methodology to verify proper circuit function. These important features have enabled Eta Compute to deliver microprocessor technology at the lowest power levels in the industry and a scaling of ~10 μW/MHz with frequency of operation. Voltage scaling also enables a seamless shift between power-efficient (i.e., low-power) and performance-efficient (i.e., high-throughput) computing tasks.

Another important challenge is the ability to support ML models that can learn directly on the device with very limited memory resources. This capability offers the desired privacy and security for many applications while also ensuring agile interaction with the environment. The approach to address this problem today is to train ML models in the cloud using deep learning with tools such as Google's TensorFlow and then convert these trained models into an inference model that operates at the edge. We are exploring a new approach based on principles of brain computing, or neuromorphic computing [1], to learn directly from data streams, and to do so with a limited number of training examples and with limited memory requirements to store the learned knowledge.

The basic idea here is to represent data/signals (Figure 3a) explicitly by incorporating time in the form of action potentials, or spikes, and then to combine the asynchronous mode of chip operation (as described above) with the asynchronous mode of learning using spikes. Spiking neural network (SNN) computations offer a very sparse representation of data and are very energy efficient to implement using DIAL, because computing only happens when there are spike events. Furthermore, learning is enabled using only local learning rules with sparse connectivity and is thus not as parameter-intensive as a traditional ML model, saving on the memory required to store the model and the time needed to train it [2]. The last but equally important aspect of SNNs is the exploitation of structural constraints (Figure 3b) to encode memory of event sequences for rapid learning without the need for multiple data presentations during training. Combining these aspects of the model with a processor powered by DIAL results in an edge device that can interchangeably learn and infer, enabling an agile sense-learn-infer-act model.
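To give a flavor of why spike-based computation is so cheap when nothing is happening, here is a minimal leaky integrate-and-fire (LIF) neuron update in C. It is a textbook-style illustration of event-driven spiking computation, not Eta Compute's SNN implementation; all constants and the input spike train are arbitrary.

```c
#include <stdio.h>

/* Minimal leaky integrate-and-fire neuron. The membrane potential only
 * needs updating when an input spike event arrives, which is what makes
 * event-driven (asynchronous) hardware attractive for SNNs. */
struct lif_neuron {
    double v;          /* membrane potential         */
    double leak;       /* decay factor per time step */
    double threshold;  /* firing threshold           */
};

/* Process one input spike of weight w after dt_steps idle steps;
 * returns 1 if the neuron fires an output spike. */
static int lif_on_spike(struct lif_neuron *n, double w, int dt_steps)
{
    for (int i = 0; i < dt_steps; i++)   /* leak over the idle interval */
        n->v *= n->leak;

    n->v += w;                           /* integrate the incoming spike */
    if (n->v >= n->threshold) {
        n->v = 0.0;                      /* reset after firing           */
        return 1;
    }
    return 0;
}

int main(void)
{
    struct lif_neuron n = { 0.0, 0.95, 1.0 };

    /* A short, arbitrary input spike train: (weight, gap in steps). */
    const double weights[] = { 0.4, 0.5, 0.3, 0.6 };
    const int    gaps[]    = { 0,   2,   5,   1   };

    for (int i = 0; i < 4; i++) {
        int fired = lif_on_spike(&n, weights[i], gaps[i]);
        printf("event %d: v = %.3f%s\n", i, n.v, fired ? "  -> spike!" : "");
    }
    return 0;
}
```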
FIGURE 2 DIAL technology enables voltage scaling to allow seamless ways to address both power and performance constraints in applications.
FIGURE 3 The core ideas in SNN models. (a) Sparse and discrete event-based representation of signals is very efficient from both an information-theoretic and energy-efficiency perspective. (b) Recurrent structural constraints and balanced spiking activity between excitatory (green) and inhibitory (red) neurons.
FIGURE 4 Our chip implementation of continuous spoken digit and handwritten digit recognition is highly power- and memory-efficient while being comparable in accuracy to traditional ML models. (a) Continuous spoken digit recognition. (b) Handwritten digit recognition.
Eta Compute is developing real-world applications for pattern recognition using this technology; we discuss two examples here. The first is continuous recognition of spoken digits using data from the speech command dataset [3]. This dataset is composed of audio snippets of single digits from 2,300 different speakers, for a total of 60K utterances. Our SNN was trained using a single pass of each training sample while achieving an accuracy of 95.2 percent on the test set, comparable to other ML models. Our SNN model is orders of magnitude more efficient from a model-efficiency point of view (as measured by the number of training samples needed and the number of network parameters to learn) while performing with comparable accuracy.

The SNN was also able to generalize to robustly identify digits in a continuous mode, thanks to an intrinsic short-term memory that allows robust detection of the beginning of the 10 digit utterances even when spectrograms overlap with other digits. This model was ported onto our asynchronous Eta Core chip. The audio was captured from a microphone and digitized using our low-power on-chip ADC; the digitized signal was converted into spectrograms and then encoded into spikes using the DSP. The ARM M3 performed the SNN computations (Figure 4a). The total memory for the model was 36 KB. The total power consumed from data to decision (i.e., including I/O, DSP, and M3) was 2 mW, with an inference rate of 6-8 words/second.

The same principles were applied to a handwritten digit recognition problem based on training data from the MNIST benchmark [4]. The binary images were directly converted into spikes by the DSP, while SNN learning was performed on the M3 (Figure 4b). The chip achieved an accuracy of 98.3 percent on the MNIST test set and required 64 KB of memory. The solution required 1 mW of power from data to decision with a throughput of 8 images/second.

These results are suggestive of the potential for realizing intelligent, agile, and efficient edge devices to drive a rapidly expanding IoT market, with over 25 billion devices coming into use by 2020 as predicted by the Gartner group [5]. The codevelopment of other infrastructure – such as standards for interoperability of devices and 5G wireless technology – could, for example, enable new fitness trackers to robustly detect user state, such as falling asleep, and then automatically switch off the lights. The future appears to be marching toward a smart and distributed computing model powered by intelligent and efficient edge devices. IAI

References
1. C. A. Mead, "Neuromorphic Electronic Systems," Proc. of IEEE, vol. 78, no. 10, pp. 1629-1636, 1990.
2. N. Srinivasa and Y. K. Cho, "Unsupervised discrimination of patterns in spiking neural networks with excitatory and inhibitory synaptic plasticity," Front. in Comp. Neuroscience, doi:10.3389/fncom.2014.00159, 2014.
3. P. Warden, "Launching the Speech Commands Dataset," https://ai.googleblog.com/2017/08/launching-speech-commands-dataset.html.
4. Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proc. of IEEE, vol. 86, pp. 2278-2324, 1998.
5. "Gartner Says 4.9 Billion Connected 'Things' Will Be in Use in 2015," https://www.gartner.com/newsroom/id/2905717.
Narayan Srinivasa, Ph.D., CTO of Eta Compute, is an expert in machine learning, neuromorphic computing, and their applications to solve real-world problems. Prior to joining Eta Compute, Dr. Srinivasa was the Chief Scientist and Senior Principal Engineer at Intel leading the neuromorphic computing group. Dr. Srinivasa earned his bachelor's degree in technology from the Indian Institute of Technology, Varanasi, and his doctorate degree from the University of Florida in Gainesville, and was a Beckman Postdoctoral Fellow at the University of Illinois at Urbana-Champaign.

Gopal Raghavan, Ph.D., CEO and cofounder of Eta Compute, is an expert in engineering design and innovation for advanced technologies that solve the world's toughest challenges. Prior to cofounding Eta Compute, Dr. Raghavan was the CTO of the Cadence Design Systems IP Division. Before Cadence, he co-founded Inphi Corporation, a company that set new standards for ultra-high bandwidth optical solutions. Dr. Raghavan earned his bachelor's degree in technology from the Indian Institute of Technology, Kanpur, and obtained his master's degree and doctorate in electrical engineering from Stanford University.
Eta Compute
www.etacompute.com
AI FOR AUTONOMOUS DRIVE
From logistic regression to self-driving cars: Chances and challenges for machine learning in highly automated driving
By Sorin Mihai Grigorescu, Markus Glaab, and Andre Roßbach, Elektrobit Automotive
Machine learning has the potential to reshape the future of the automotive software and system landscape, despite the challenges that remain.
Machine learning has been one of the hottest topics in research and industry over the last couple of years. Renewed attention has resulted from the latest advancements in computational performance and algorithms compared to the advent of machine learning decades ago. Recent impressive results in artificial intelligence have been facilitated by machine learning, in particular deep-learning solutions. Applications include natural language processing (NLP), personal assistance, the victory of the game-playing program AlphaGo over a human being, and the achievement of human-level behavior in learning to play Atari games.

Considering that machine learning and deep learning enable such impressive results when tackling extremely complex problems, it is obvious that researchers and engineers have considered also applying them to highly automated driving (HAD) scenarios in self-driving cars. The first promising results have been achieved in this area with NVIDIA's Davenet, Comma.ai, Google Car, and Tesla. Machine-learning and deep-learning approaches have resulted in initial prototypes, but the industrialization of such functionalities poses additional challenges with regard to, for example, essential functional safety considerations.
This article aims to contribute to ongoing discussions about the role of machine learning in the automotive industry and to highlight the importance of this topic in the context of self-driving cars. In particular, it aims to increase understanding of the capabilities and limitations of machine-learning technologies.

Machine learning and highly automated driving
It is a complex and nontrivial task to develop the highly automated driving functionalities that lead to self-driving cars. Engineers typically tackle such challenges using the principle of divide and conquer. This is for a good reason: A decomposed system with clearly defined interfaces can be tested and verified much more thoroughly than a single black box. Our approach to highly automated driving is EB robinos, depicted in Figure 1. EB robinos is a functional software architecture with open interfaces and software modules that permits developers to manage the complexity of autonomous driving. The EB robinos reference architecture integrates components following the "Sense, Plan, Act" decomposition paradigm. Moreover, it makes use of machine-learning technology within its software modules in order to cope with highly unstructured real-world driving environments. The subsections below contain selected examples of the technologies that are integrated within EB robinos.

FIGURE 1 Open EB robinos reference architecture.

In contrast, end-to-end deep-learning approaches also exist, which span everything from sense to act (Bojarski et al., 2016). However, with respect to the handling and training of corner cases and rare events, and with regard to the exponential amount of training data necessary, a decomposition approach (i.e., semantic abstraction) is considered more reasonable (Shalev-Shwartz et al., 2016). Nevertheless, even if the decomposition approach is followed, a decision is required about which parts are better tackled in isolation and which in combination with others. It is also necessary to determine whether a machine-learning approach is expected to outperform a traditionally engineered algorithm for the task accomplished by a particular block. Not least, this decision may be influenced by functional safety considerations, a crucial element of autonomous driving. Traditional software components are written on the basis of concrete requirements and are tested accordingly. The main issues in the testing and validation of machine-learning systems are their "black box" nature and the stochastic behavior of the learning methods: It is basically impossible to predict how the system learns its structure. The criteria and theoretical background given above can provide guidance for informed decisions.

Elektrobit is currently researching and developing use cases in which machine-learning approaches are considered to be promising. Two such use cases are presented next. The first deals with the generation of artificial training samples for machine-learning algorithms and their deployment for traffic sign recognition. The second use case describes our approach to self-learning cars. Both examples make use of current cutting-edge deep-learning technology.

FIGURE 2 Block diagram of the artificial sample generation algorithm for machine-learning-based recognition systems.
Use case 1: Artificial sample generation and traffic sign recognition
This project proposes a speed limit and end-of-restriction traffic sign (TS) recognition system in the context of enhancing OpenStreetMap (OSM) data used in entry navigation systems. The aim is to run the algorithm on a standard smartphone that can be mounted on the windshield of a car. The system detects traffic signs along with their GPS positions and uploads the collected data to backend servers via the mobile data connection of the phone.

The approach is divided mainly into two stages: detection and recognition. Detection is achieved through a boosting classifier, while recognition is performed through a probabilistic Bayesian inference framework that fuses information delivered by a collection of visual probabilistic filters. The color image obtained is passed to the detector in 24-bit RGB format. The detection process is carried out by evaluating the response of a cascade classifier calculated through a detection window. This detection window is shifted across the image at different scales. The probable traffic sign regions of interest (RoI) are collected as a set of object hypotheses. The classification cascade is trained with extended local binary patterns (eLBP) from the point of view of feature extraction. Each element in the hypotheses vector is classified into a traffic sign by a support vector machine (SVM) learning algorithm.

Traffic sign recognition methods rely on manually labeled traffic signs, which are used to train both the detection and the recognition classifiers. The labeling process is tedious and prone to error due to the variety of traffic sign templates used in different countries. Specific training data for each country is required for the traffic sign recognition method to perform well. It is time-consuming to create enough manually labeled traffic signs because position, illumination, and weather conditions have to be taken into account. Elektrobit, therefore, has created an algorithm that generates training data automatically from a single artificial template image to overcome the challenge of manually annotating large numbers of training samples. Figure 2 shows the structure of the algorithm. This approach provides a method for generating artificial data that is used in the training stages of machine-learning algorithms. The method uses a reduced data set of real and generic traffic sign image templates for each country to output a collection of images.
The features of these images are artificially defined by a sequence of image template deformation algorithms. The artificial images thus obtained are evaluated against a reduced set of real-world images using kernel principal component analysis (KPCA). The artificial data set is suitable for the training of machine-learning systems – in this particular case for traffic sign recognition – when the characteristics of the generated images correspond to those of the real images.

Elektrobit replaced the boosting and SVM classifiers with a deep region-based detection and recognition convolutional neural network to improve the precision of the original traffic sign recognition system. The network is deployed using Caffe (Jia et al. 2014), which is a deep neural network library developed by Berkeley and supported by NVIDIA. Caffe is a pure C++/CUDA library with Python and Matlab interfaces. In addition to its core deep-learning functionalities, Caffe also provides reference deep-learning models that can be used directly in machine-learning applications. Figure 3 shows the Caffe net structure used for traffic sign detection and recognition. The different colored blocks represent convolution (red), pooling (yellow), activation (green), and fully connected network layers (purple).

FIGURE 3 Deep region-based detection and recognition convolutional neural network in Caffe.
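A heavily simplified flavor of such a template deformation chain is sketched below: a tiny grayscale "template" is shifted, brightness-scaled, and corrupted with pseudo-random noise to spawn synthetic variants. Real pipelines operate on full-resolution color templates with perspective and blur models; the sizes, parameters, and deformations here are illustrative only and are not Elektrobit's algorithm.

```c
#include <stdio.h>
#include <stdlib.h>

#define SZ 8   /* toy template size; real templates are far larger */

/* Generate one synthetic variant of a grayscale template by applying a
 * pixel shift, a brightness gain, and additive noise. */
static void deform(const unsigned char in[SZ][SZ], unsigned char out[SZ][SZ],
                   int dx, int dy, double gain, int noise_amp)
{
    for (int y = 0; y < SZ; y++) {
        for (int x = 0; x < SZ; x++) {
            int sx = x - dx, sy = y - dy;
            int v = (sx >= 0 && sx < SZ && sy >= 0 && sy < SZ) ? in[sy][sx] : 0;
            v = (int)(v * gain) + (rand() % (2 * noise_amp + 1)) - noise_amp;
            if (v < 0)   v = 0;
            if (v > 255) v = 255;
            out[y][x] = (unsigned char)v;
        }
    }
}

int main(void)
{
    unsigned char tmpl[SZ][SZ], variant[SZ][SZ];

    /* Dummy template: a bright square on a dark background. */
    for (int y = 0; y < SZ; y++)
        for (int x = 0; x < SZ; x++)
            tmpl[y][x] = (x > 1 && x < 6 && y > 1 && y < 6) ? 200 : 20;

    srand(42);
    for (int i = 0; i < 3; i++) {           /* three synthetic samples */
        deform(tmpl, variant, i - 1, 1 - i, 0.8 + 0.2 * i, 10);
        printf("variant %d, top-left pixel: %d\n", i, variant[0][0]);
    }
    return 0;
}
```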
THE ARTIFICIAL DATA SET IS SUITABLE FOR THE TRAINING OF MACHINE-LEARNING SYSTEMS – IN THIS PARTICULAR CASE FOR TRAFFIC SIGN RECOGNITION – WHEN THE CHARACTERISTICS OF THE GENERATED IMAGES CORRESPOND TO THOSE OF THE REAL IMAGES. We constructed the deep reinforcement learning system, shown in Figure 4, in order to experiment safely with autonomous driving learning. This system uses the TORCS open-source race simulator (Wymann et al. 2014), which is widely used in the scientific community as a highly portable multi-platform car-racing simulator. It runs on Linux (all architectures, 32- and 64-bit, little and big endian), FreeBSD, OpenSolaris, MacOSX, and Windows (32- and 64-bit). It features many different cars, tracks, and opponents
Use case 2: Learning how to drive The revolution in deep learning has recently increased attention on another paradigm, which is referred to as reinforcement learning (RL). In RL, an agent by itself learns how to perform certain tasks by means of a reward system. The methodology is in the category of semi supervised learning because the design of the reward system requires domainspecific knowledge. There is no required labeling for the input data, in contrast with supervised learning. This recent interest in RL is due mainly to the seminal work of the Deep Mind team. This team managed to combine RL with a deep neural network capable of learning the action value function (Mnih et al. 2016). Their system was able to learn to play several Atari games at human-level capacity.
FIGURE 3 Deep region-based detection and recognition convolutional neural network in Caffe.
FIGURE 4 Deep reinforcement learning architecture for learning how to drive in a simulator.
We can collect images for object detection, as well as critical driving indicators, from the game engine. These indicators include the speed of the car, the relative position of the ego car to the center line of the road, and the distances to the cars in front. The goal of the algorithm is to self-learn driving commands by interacting with the virtual environment. A deep reinforcement learning paradigm was used for this purpose, in which a deep convolutional neural network (DNN) is trained by reinforcing actions a that produce a positive reward signal r(s', a). The state s is represented by the current game image as seen in the simulator window. There are four possible actions: accelerate, decelerate, turn left, and turn right.
FIGURE 5 A Caffe-based deep convolutional neural network structure used for deep reinforcement learning.
The DNN computes a so-called Q-function, which predicts the optimal action a to be executed for a specific state s. In other words, the DNN calculates a Q-value for each state-action pair. The action with the highest Q-value is executed, which moves the simulator environment to the next state, s'. In this state, the executed action is evaluated by means of the reward signal r(s', a). For example, if the car was able to accelerate without a collision, the action that made this possible is reinforced in the DNN; otherwise, it is discouraged. The reinforcement is performed by retraining the DNN with the state-reward signals. Figure 5 shows the Caffe implementation of the deep reinforcement learning algorithm.
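The interaction loop described above can be sketched as a standard Q-learning update. This is a simplified, framework-agnostic illustration, not the Caffe implementation from Figure 5: `q_net` and `env` stand in for the trained network and a simulator wrapper, and their `predict`, `fit`, and `step` methods, along with the discount factor and exploration rate, are assumptions.

```python
import random
import numpy as np

ACTIONS = ["accelerate", "decelerate", "turn_left", "turn_right"]
GAMMA, EPSILON = 0.99, 0.1   # discount factor and exploration rate (assumed)

def choose_action(q_net, state):
    """Epsilon-greedy action selection over the predicted Q-values."""
    if random.random() < EPSILON:
        return random.randrange(len(ACTIONS))
    return int(np.argmax(q_net.predict(state)))   # highest Q-value wins

def train_step(q_net, env, state):
    """One interaction step: act, observe the reward, build the Q-target, retrain."""
    a = choose_action(q_net, state)
    next_state, reward, done = env.step(ACTIONS[a])   # e.g. a TORCS wrapper
    target = q_net.predict(state).copy()
    # Bellman target: immediate reward plus discounted best future Q-value.
    target[a] = reward if done else reward + GAMMA * np.max(q_net.predict(next_state))
    q_net.fit(state, target)        # reinforce (or discourage) the chosen action
    return next_state, done
```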
Types of machine-learning algorithms

Although deep neural networks are among the most frequently used solutions to complex machine-learning challenges, various other types of machine-learning algorithms are available. Table 1 classifies them according to their nature (continuous or discrete) and training type (supervised or unsupervised).

Machine-learning estimators can be classified roughly according to their output value or training methodology. An algorithm is called a regression estimator if it estimates a continuous value function y ∈ R (i.e., a continuous output). It is called a classifier when its output is a discrete variable y ∈ {0, 1, …, q}. The traffic sign detection and recognition system described earlier is an implementation of this type of algorithm.

Anomaly detection is one special application of unsupervised learning. The goal here is to identify outliers or anomalies in the data set. Outliers are defined as feature vectors whose properties differ from those of the feature vectors commonly encountered in the application; in other words, they occupy a different region of the feature space.

Table 1 also lists some popular machine-learning algorithms. These are briefly explained below.

›› Linear regression is a regression method used to fit a line, a plane, or a hyperplane to a data set. The fitted model is a linear function that can be used to make predictions on the real value function y.
›› Logistic regression is the discrete counterpart of linear regression, in which the predicted real value given by the mapping function is converted to a probability that denotes membership of the input data point in a certain class.
›› Naïve Bayes classifiers are a set of machine-learning methods built on Bayes' theorem under the assumption that each feature is independent of the others.
›› Support vector machines (SVM) calculate the separation between classes using so-called margins. The margins are computed to be as wide as possible in order to separate the classes as clearly as possible.
TABLE 1 Types of machine-learning algorithms.
›› Ensemble methods, such as decision trees, random forests, or AdaBoost, combine a set of base classifiers, sometimes called "weak" learners, with the purpose of obtaining a "strong" classifier.
›› Neural networks are machine-learning algorithms in which the regression or classification problem is solved by a set of interconnected units called neurons. In essence, a neural network tries to mimic the function of the human brain.
›› k-means clustering is a method for grouping together features that have common properties, i.e., that are close to each other in the feature space. k-means iteratively groups common features into spherical clusters based on a given number of clusters.
›› Mean-shift is also a data-clustering technique, but one that is more general and more robust with respect to outliers. As opposed to k-means, mean-shift requires only one tuning parameter (the search window size) and does not assume a spherical prior shape for the data clusters.
›› Principal components analysis (PCA) is a data dimensionality reduction technique that transforms a set of possibly correlated features into a set of linearly uncorrelated variables named principal components. The principal components are arranged in order of variance: the first component has the highest variance, the second the next highest, and so on.

Deep learning has revolutionized learning systems and their capabilities, but it is not necessarily the most suitable approach for every task. For several other types of application it may be more appropriate to use traditional pattern-recognition methods such as logistic regression, naïve Bayes, or k-means clustering. Criteria for selecting the right machine-learning algorithm are therefore necessary; they are described below.

The complexity of the problem is a straightforward selection criterion: it must match the complexity of the method, and it can be translated into the number of parameters the algorithm has to learn. A deep neural network might be required to learn millions of parameters to achieve, at best, results similar to those of logistic regression. Figure 6 shows an approximate ordering of machine-learning algorithms according to their complexity; the mathematics behind each algorithm is the basis for this empirical ranking.
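To make the parameter-count argument concrete, the following sketch (illustrative only; the toy data set and the two-hidden-layer network size are arbitrary assumptions) compares a logistic regression model with a small neural network using scikit-learn:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Toy data: 5,000 samples with 20 features and 2 classes (purely illustrative).
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

log_reg = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
mlp = MLPClassifier(hidden_layer_sizes=(128, 128), max_iter=1000,
                    random_state=0).fit(X_tr, y_tr)

# Number of learned parameters for each model.
n_lr = log_reg.coef_.size + log_reg.intercept_.size
n_mlp = sum(w.size for w in mlp.coefs_) + sum(b.size for b in mlp.intercepts_)

print(f"logistic regression: {n_lr} parameters, accuracy {log_reg.score(X_te, y_te):.3f}")
print(f"neural network:      {n_mlp} parameters, accuracy {mlp.score(X_te, y_te):.3f}")
```

On simple, low-dimensional data like this, the far larger network typically offers little or no accuracy gain over the roughly 20-parameter linear model, which is exactly the point the complexity criterion makes.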
The bias-variance tradeoff is another important aspect when choosing and building a machine-learning system. Bias is the error produced by erroneous assumptions made by the learning method and is directly related to underfitting: high-bias algorithms fail to find the relevant relationships between the input features and the target labels. Variance, in contrast, is a measure of the sensitivity of the method to the random noise present in the input data. A high-variance system can overfit, modeling the random noise instead of the actual structure of the input features. In practice, a tradeoff between bias and variance must be found, because reducing one typically increases the other.

Another criterion that should be taken into account is the number of tuning parameters that a data engineer needs to adjust when training a classifier.

Finally, the nature of the input data also needs to be considered. Linear separation of the data in the feature space is unusual in the real world, but linearity can arguably be assumed for some applications; an example is the classification of car and non-car objects based on their size and speed. This assumption is crucial when choosing a machine-learning approach, since a linear classifier is faster and more effective on linearly separable data than a non-linear classifier.
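The car/non-car example above can be played through with a linear classifier in a few lines. This is purely illustrative: the size and speed distributions used to generate the toy data are invented, not measurements.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Toy features: object length in meters and speed in km/h (invented distributions).
cars     = np.column_stack([rng.normal(4.5, 0.5, 500), rng.normal(60, 20, 500)])
non_cars = np.column_stack([rng.normal(0.8, 0.3, 500), rng.normal(5, 3, 500)])
X = np.vstack([cars, non_cars])
y = np.array([1] * 500 + [0] * 500)   # 1 = car, 0 = non-car

# A linear SVM separates the two groups with a single hyperplane.
clf = LinearSVC(dual=False).fit(X, y)
print("training accuracy:", clf.score(X, y))
print("hyperplane weights:", clf.coef_, "bias:", clf.intercept_)
```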
FIGURE 6 Classification of machine-learning algorithms based on their complexity.
Functional safety considerations

Functional safety is part of the overall safety of a system. The international standard ISO 26262, "Road vehicles – Functional safety," governs the development of electrical and electronic (E/E) systems in road vehicles: a system is made safe by various activities and technical solutions. These so-called safety measures are reflected in the process activities that specify requirements, create the architecture and design, and perform verification and validation.

The avoidance of systematic failures is one aspect of ISO 26262. In traditionally engineered systems, systematic failures have been human failures: requirements and test cases that are incomplete, significant aspects of the design that are forgotten, or verification that fails to discover issues. The same is true when using machine learning; furthermore, the task to be learned and the corresponding test cases are still described by humans, so systematic failures can occur here as well. The development of machine-learning models therefore requires the application of best practice or of an appropriate standard process. This alone is not enough, however: because parts of the development of system elements will in future be accomplished by machine-learning algorithms, safety measures are also required to control systematic failures in the algorithms themselves. Systematic failures can be eliminated only if both a sound development process and such safety measures can be guaranteed.
OpenSystems Media E-cast
Embedded Servers Move to the Edge
Sponsored by Davra, Intel, and Kontron
The emergence of AI and other compute- and bandwidth-intensive applications is creating an urgent need for micro servers that can live at the network edge. In this webinar, learn how to deploy high-performance analytics right where they're needed – whether on a factory floor, on an isolated wind farm, or in a mobile medical lab. http://ecast.opensystemsmedia.com/813
More attention has been given recently to safety in the context of machine learning, due to its increased use in autonomous driving systems. Amodei et al. (2016) discuss research problems related to accident risk and possible approaches to solving them.

In traditional software systems, the code has to meet specific requirements that are later checked by standardized tests. In machine learning, the computer can be thought of as taking over the task of "programming" the modules by means of the learning method, where "programming" means learning the parameters or weights of the algorithm. The learning procedure is very often stochastic, which means that no hard requirements can be defined; the machine-learning component is therefore a "black box." As a result, it is difficult or even impossible to interpret the learned content, due to its high dimensionality and the enormous number of parameters.

Environmental sensors and the related processing play a decisive role that goes beyond the requirements of functional safety, especially in the case of highly automated driving. The safety of the intended functionality (SOTIF) is concerned with the methods and measures used to ensure that safety-critical aspects of the intended functionality perform properly, taking sensors and processing algorithms into account. This problem has to be clarified for traditionally engineered systems and machine-learning systems alike, and it is still the subject of ongoing discussion.

One approach to assessing such algorithms is analysis within a virtual simulator: a theoretically unlimited number of driving situations can be learned and evaluated in a simulated environment before the machine-learning system is deployed in a real-world car. Lives are at stake now that machine learning has progressed from gaming and simulation into real-world automotive applications.
Elektrobit | www.elektrobit.com | @EB_automotive
As discussed, functional safety issues are becoming important as a result, and this also affects the scientific community. One consequence is research into approaches for benchmarking different machine-learning and AI algorithms in simulation. OpenAI Gym (Brockman et al. 2016) is one such tool for developing and comparing reinforcement learning algorithms.

What's next

The application of machine-learning-based functionality to highly automated driving has been motivated by recent achievements. Initial prototypes have indeed produced promising results and have indicated the advantages of this approach to the related complex problems. A significant number of challenges remain, however. First of all, the right neural network type must be selected for the given task; this selection is related to the applied learning methodology, the necessary preprocessing, and the quantity of training data. There is still discussion about the best way to decompose the overall driving task into smaller subtasks. Deep-learning technologies enable end-to-end approaches that need no decomposition at all, but this is currently considered less appropriate with regard to verification and validation. The machine-learning community needs to develop enhanced approaches, not least in order to address functional safety requirements, which are the foundation for successful industrialization of the related functionality.

Elektrobit is convinced that machine learning has the potential to reshape the future automotive software and system landscape, despite the challenges that remain. For this reason, two types of investigation have been undertaken. The first is the application of machine-learning-based approaches as a solution to (selected subsets of) highly automated driving scenarios, such as the use cases described above; the EB robinos reference architecture, as well as the partnership with NVIDIA, contributes to the development environment. In the second, Elektrobit uses its expertise in functional safety and the industrialization of automotive software to bring these ideas and the products of its partners and customers to life.

Associate Professor Sorin Mihai Grigorescu has been affiliated since June 2010 with the Department of Automation at Transylvania University of Brasov, where he leads the Robust Vision and Control Laboratory. Since June 2013, he has also been affiliated with Elektrobit Automotive Romania, where he is the team manager of the Navigation Department. Sorin won the EB innovation award 2013 for his work on machine-learning algorithms used to build next-generation adaptable software for cars.

Markus Glaab is an expert in automotive software architectures at EB Automotive Consulting. In 2010, he received his M.Sc. degree in Computer Science from the University of Applied Sciences Darmstadt, Germany. Markus has been with EB since 2016 and works on future automotive E/E architectures and the integration of related technologies such as machine learning.

André Roßbach is a senior expert for functional safety at EB Automotive Consulting. In 2003, he received his degree in Business Information Technology from the University of Applied Sciences Hof. André has been with EB since 2004 and has developed software for medical systems, navigation software, and driver assistance systems; his current focus is on functional safety, agile development, and machine learning.
www.linkedin.com/company/elektrobit-eb-automotive | www.facebook.com/EBAutomotiveSoftware | www.youtube.com/user/EBAutomotiveSoftware
Your e-code for free admission: embedded-world.de/voucher
Nürnberg, Germany
February 26 – 28, 2019
TODAY, TOMORROW, AND BEYOND Your one-stop resource for the entire spectrum of embedded systems: discover more than 1,000 companies and get inspired by the latest trends and product developments, by renowned speakers and exciting shows. Keep up to date:
embedded-world.de Media partners
Exhibition organizer: NürnbergMesse GmbH, T +49 9 11 86 06-49 12, F +49 9 11 86 06-49 13, visitorservice@nuernbergmesse.de
Conference organizer: WEKA FACHMEDIEN GmbH, T +49 89 2 55 56-13 49, F +49 89 2 55 56-03 49, info@embedded-world.eu
Industrial AI & Machine Learning Resource Guide 2018
RESOURCE GUIDE PROFILE INDEX
APPLICATIONS: COMPUTER/MACHINE VISION
Socionext – 41

APPLICATIONS: INDUSTRIAL AUTOMATION/CONTROL
Acces I/O – 42
ADL Embedded Solutions – 42
Vector Electronics – 43

APPLICATIONS: SECURITY
ADL Embedded Solutions – 45
Microchip – 44

HARDWARE MODULES/SYSTEMS FOR MACHINE LEARNING
Advantech – 45
Congatec – 46
Connect Tech – 47
Intermas – 49
Virtium – 48
WinSystems – 49, 50

NEURAL NETWORK PROCESSORS: IP/ACCELERATORS
Socionext – 50
Applications: Computer/Machine Vision

Helping Customers Develop ASICs Right The First Time
Socionext offers an ideal match of capabilities to meet customers' needs with our state-of-the-art process technology, advanced packaging solutions, extensive and differentiated IPs, proven design methodologies, and a full, turn-key ecosystem. Socionext America, Inc. (SNA) is the US branch of Socionext Inc. headquartered in Sunnyvale, California. We are ranked as one of the world's leading fabless ASIC suppliers and specialize in a wide range of standard and customizable SoC solutions in imaging, networking, computing and other dynamic applications. Socionext is a premier SoC supplier dedicated to providing our customers with quality semiconductor products backed by our best-in-class customer support.
Socionext America Inc socionextus.com/ai
Capabilities and Products
• ASIC
  • Range of Business Models
    – From turnkey development to COT
    – Range of process nodes to 7nm
  • Leading IPs
    – 56Gbps and 112Gbps SerDes
    – AI engine
    – ARM processors
    – DDR, HMB, HBM, GDDR memory controllers
    – SSD technology
    – ADC/DAC for optical communications
    – ADC/DAC for automotive applications such as LIDAR
  • Leaders in factory automation
  • In-house packaging
www.embedded-computing.com/ai-machine-learning/p374790
sna_inquiry@us.socionext.com https://twitter.com/SocionextUS
+1 (408) 737-5400 | Toll free +1 (844) 680-3453
www.linkedin.com/company/socionext-us
Applications: Industrial Automation/Control
mPCIe-ICM Family PCI Express Mini Cards The mPCIe-ICM Series isolated serial communication cards measure just 30 x 51 mm and feature a selection of 4 or 2 ports of isolated RS232/422/485 serial communications. 1.5kV isolation is provided port-to-computer and 500V isolation port-to-port on ALL signals at the I/O connectors. The mPCIe-ICM cards have been designed for use in harsh and rugged environments such as military and defense along with applications such as health and medical, point of sale systems, kiosk design, retail, hospitality, automation, and gaming. The RS232 ports provided by the card are 100% compatible with every other industry-standard serial COM device, supporting TX, RX, RTS, and CTS. The card provides ±15kV ESD protection on all signal pins to protect against costly damage to sensitive electronic devices due to electrostatic discharge. In addition, they provide Tru-Iso™ port-to-port and port-to-PC isolation. The serial ports on the device are accessed using a low-profile, latching, 5-pin Hirose connector. Optional breakout cables are available, and bring each port connection to a panel-mountable DB9-M with an industry compatible RS232 pin-out. The mPCIe-ICM cards were designed using type 16C950 UARTS and use 128-byte transmit/receive FIFO buffers to decrease CPU loading and protect against lost data in multitasking systems. New systems can continue to interface with legacy serial peripherals, yet benefit from the use of the high performance PCI Express bus. The cards are fully software compatible with current PCI 16550 type UART applications and allow for users to maintain backward compatibility.
ACCES I/O Products, Inc. www.accesio.com
FEATURES
• PCI Express Mini Card (mPCIe) type F1, with latching I/O connectors
• 4- or 2-port mPCIe RS232/422/485 serial communication cards
• Tru-Iso™ 1500V isolation port-to-computer and 500V isolation port-to-port on ALL signals
• High-performance 16C950-class UARTs with 128-byte FIFO for each TX and RX
• Industrial operating temperature (-40°C to +85°C) and RoHS standard
• Supports data communication rates as high as 3Mbps – 12MHz with custom crystal
• Custom baud rates easily configured
• ±15kV ESD protection on all signal pins
• 9-bit data mode fully supported
• Supports CTS and RTS handshaking
www.embedded-computing.com/ai-machine-learning/p372557
contactus@accesio.com linkedin.com/company/acces-i-o-products-inc.
858-550-9559 twitter.com/accesio
Applications: Industrial Automation/Control
USB3-104-HUB – Rugged, Industrial Grade, 4-Port USB 3.1 Hub Designed for the harshest environments, this small industrial/military grade 4-port USB 3.1 hub features extended temperature operation (-40°C to +85°C), locking USB and power connections, and an industrial steel enclosure for shock and vibration mitigation. The OEM version (board only) is PC/104-sized and can easily be installed in new or existing PC/104-based systems as well. The USB3-104-HUB makes it easy to add USB-based I/O to your embedded system or to connect peripherals such as external hard drives, keyboards, GPS, wireless, and more. Real-world markets include Industrial Automation, Security, Embedded OEM, Laboratory, Kiosk, Military/Mission Critical, Government, and Transportation/Automotive. This versatile four-port hub can be bus powered or self (externally) powered. You may choose from two power inputs (power jack and terminal block) to provide a full 900mA source at 5V on each of the downstream ports. Additionally, a wide-input power option exists to accept from 7VDC to 28VDC. All type A and type B USB connections feature a locking, high-retention design.
ACCES I/O Products, Inc. www.accesio.com
FEATURES
• Rugged, industrialized, four-port USB 3.1 hub
• USB 3.1 Gen 1 with data transfers up to 5Gbps (USB 2.0 and 1.1 compatible)
• Extended temperature (-40°C to +85°C) for industrial/military grade applications
• Locking upstream, downstream, and power connectors prevent accidental disconnects
• SuperSpeed (5Gbps), Hi-Speed (480Mbps), Full-Speed (12Mbps), and Low-Speed (1.5Mbps) transfers supported
• Supports bus-powered and self-powered modes, accessible via DC power input jack or screw terminals
• LED for power, and per-port RGB LEDs to indicate overcurrent fault, High-Speed, and SuperSpeed
• Wide-input external power option accepts from 7-28VDC
• OEM version (board only) features PC/104 module size and mounting compatibility
www.embedded-computing.com/ai-machine-learning/p374114
contactus@accesio.com
linkedin.com/company/acces-i-o-products-inc.
858-550-9559 twitter.com/accesio
Applications: Industrial Automation/Control
ADLE3800SEC Intel® E3800 Series Edge-Connect SBC Measuring just 75mm x 75mm, the ADLE3800SEC is an embedded SBC specially optimized for Size, Weight, and Power (SWAP) applications. Based on the E3800 series Intel Atom™ SoC, this tiny board delivers maximum performance in the smallest possible size. It features a quad-core processor with up to 2MB onboard cache, and an integrated Intel HD Graphics engine with support for DirectX 11, Open GL 4.0, and full HD video playback.
About Edge-Connect Architecture: Via the backside board-edge connector, additional I/O is easily accessible using standard and customer-specific breakout boards. Easy expansion helps reduce cabling, integration time, and system size while increasing quality and overall MTBF. Easily connect to sensors, cameras, and storage with a full range of onboard I/O: 2x Gigabit LAN, 1x USB 3.0, 1x USB 2.0, 2x PCIe, and SATA. The Intel HD Graphics engine supports video output in either HDMI or DisplayPort format. An onboard M.2 socket allows users to install the fastest solid state storage solutions on the market. Extended temperature ratings and hard-mounted Edge-Connect design make the ADLE3800SEC ideal for industrial embedded applications.
Applications: UAV/UUV unmanned systems, industrial control systems, government and defense, video surveillance, small-scale robotics, remote datalogging, man-wearable computing.
ADL Embedded Solutions, Inc. www.adl-usa.com
FEATURES
• Small Size (75mm x 75mm)
• 4GB soldered DRAM (DDR3-1333 MHz)
• Low-power Atom® processor (8W TDP)
• Quad-Core/Dual-Core Versions Available
• M.2 Storage Socket Onboard
• Expansion Connector
• Extended Temperature Available
www.embedded-computing.com/ai-machine-learning/p374785
sales@adl-usa.com 858-490-0597 x115 www.linkedin.com/company/adl-embedded-solutions
@ADLEmbedded
A FINE TECHNOLOGY GROUP
cPCI, PXI, VME, Custom Packaging Solutions VME and VME64x, CompactPCI, or PXI chassis are available in many configurations from 1U to 12U, 2 to 21 slots, with many power options up to 1,200 watts. Dual hot-swap is available in AC or DC versions. We have in-house design, manufacturing capabilities, and in-process controls. All Vector chassis and backplanes are manufactured in the USA and are available with custom modifications and the shortest lead times in the industry. Series 2370 chassis offer the lowest profile per slot. Cards are inserted horizontally from the front, and 80mm rear I/O backplane slot configuration is also available. Chassis are available from 1U, 2 slots up to 7U, 12 slots for VME, CompactPCI, or PXI. All chassis are IEEE 1101.10/11 compliant with hot-swap, plug-in AC or DC power options.
Our Series 400 enclosures feature side-filtered air intake and rear exhaust for up to 21 vertical cards. Options include hot-swap, plug-in AC or DC power, and system voltage/temperature monitor. Embedded power supplies are available up to 1,200 watts.

Series 790 is MIL-STD-461D/E compliant and certified, economical, and lighter weight than most enclosures available today. It is available in 3U, 4U, and 5U models up to 7 horizontal slots.

All Vector chassis are available for custom modification in the shortest time frame. Many factory paint colors are available and can be specified with Federal Standard or RAL numbers.

FEATURES
• Made in the USA
• Most rack accessories ship from stock
• Card sizes from 3U x 160mm to 9U x 400mm
• System monitoring option (CMM)
• AC or DC power input
• Power options up to 1,200 watts
• Modified 'standards' and customization are our specialty
For more detailed product information, please visit www.vectorelect.com or call 1-800-423-5659 and discuss your application with a Vector representative.
Made in the USA Since 1947
www.embedded-computing.com/ai-machine-learning/p371649
Vector Electronics & Technology, Inc. www.vectorelect.com
inquire@vectorelect.com 800-423-5659
Applications: Security
ATECC608A CryptoAuthentication™ Device
Combined with the Google Cloud IoT Core service, the ATECC608A CryptoAuthentication™ device provides secure and trusted storage for the root of trust. The IoT hardware private key used for authentication to Google Cloud Platform is protected in the ATECC608A against side-channel attacks and physical tampering. In addition, the ATECC608A offers secure storage for firmware updates and secure boot credentials, enhancing current IoT hardware designs.

Leverage 20 years of in-manufacturing security expertise by choosing Microchip. Provisioning happens at Microchip's secure facilities using Hardware Secure Module (HSM) networks in the ATECC608A. During production, the ATECC608A will generate the private keys inside the device within Microchip factories, avoiding any exposure in the IoT device life cycle.

FEATURES
• Cryptographic co-processor with secure hardware-based key storage
• Protected storage for up to 16 keys, certificates, or data
• ECDH: FIPS SP800-56A Elliptic Curve Diffie-Hellman
• NIST standard P256 elliptic curve support
• SHA-256 & HMAC hash including off-chip context save/restore
• AES-128: encrypt/decrypt, Galois field multiply for GCM
www.embedded-computing.com/ai-machine-learning/p374761
Microchip Technology Inc.
www.microchip.com/ATECC608aGCPiotCore
Contact Microchip: www.microchip.com/distributors/SalesHome.aspx 480-792-7200 @MicrochipTech www.linkedin.com/company/microchip-technology
Hardware Modules/Systems for Machine Learning

SKY-6100 1U Rackmount Dual Intel® Xeon® Scalable GPU server, supporting 1 x PCIe x16 double-deck FH/FL card or 5 x PCIe x16 single-deck cards
Advantech's industrial 1U to 4U GPU server solutions (SKY-Series) feature the latest in multi-core computing technology. By offloading the CPU to the GPU, the GPU's massively parallel architecture can be leveraged to perform multiple tasks simultaneously. These servers deliver high-performance computing and are ideal for performance-intensive applications that involve visualization, parallelization/acceleration, and virtualization computing. Our industrial GPU server series solutions also accelerate visual computing in applications such as automated optical inspection (AOI), surveillance, video transcoding, cloud gaming, and medical imaging.
Advantech Corp.
www.advantech.com
FEATURES
• Processor: Dual Intel Xeon Scalable Processor
• Memory: DDR4 2666 MHz ECC-REG type up to 512 GB
• Remote Management: IPMI function support
• Expansion: Supporting 1 x PCIe x16 double-deck FH/FL card + 1 x PCIe x16 single-deck FH/HL card, or 5 x PCIe x16 single-deck HH/HL cards
• PSU: 1200W 1+1 redundant power supply with 80 PLUS Platinum level certification
www.embedded-computing.com/ai-machine-learning/p374735
skyserver@advantech.com
www.linkedin.com/company/advantech/
1-949-420-2500
Applications: Security
ADLEPC-1520 Compact, Modular, Fanless Industrial PC The ADLEPC-1520 is a rugged, compact industrial-grade chassis constructed from 6063 aluminum, with thick-walled design and a fanless, conduction-cooled CPU for wider temperature operation. It features a durable anodized finish, flush-mounted screws, and flexible mounting options. At only 2.3" x 3.4" x 3.7", it is ideal for a variety of industrial applications and environments – whether on the factory floor or in the external environments.
Modular I/O Expansion: The heart of the ADLEPC-1520 is a compact (75mm x 75mm) Intel E3800-series Atom CPU (ADL75S-E38XX) with a stackable expansion connector using the Edge-Connect form factor. This results in a very small 3.4" x 3.7" overall footprint while providing application-specific features. The expansion connector features a number of interfaces including 2x PCIe, 2x USB, 1x SATA, SM-Bus, and 1x DisplayPort, with standard expansion boards available.
Applications: Industrial IoT (IIoT) network and cloud computing, cyber security edge devices for networks, ICS and SCADA threat security, secure networking (routing, traffic monitoring, and gateways), intelligent machinery and equipment controllers, unmanned or autonomous vehicle mission/payload computing, oil and gas IPC.
ADL Embedded Solutions, Inc. www.adl-usa.com
FEATURES
• Compact footprint: 2.3" x 3.4" x 3.7"
• Intel® E3800-Series Atom processors
• Stackable extension connector
• Modular I/O expansion and fanless design
• Operating temperature: -20°C to +60°C (extended temperature option available)
• Edge-Connect form factor
• Microsoft Azure Certified for IoT
www.embedded-computing.com/ai-machine-learning/p374741
sales@adl-usa.com 858-490-0597 x115 www.linkedin.com/company/adl-embedded-solutions
@ADLEmbedded
Hardware Modules/Systems for Machine Learning
Server-on-Modules With the launch of the COM Express Type 7 specification, the PICMG has defined a highly flexible new module standard characterized by high-speed network connectivity with up to four 10 GbE interfaces as well as an increased number of up to 32 PCIe lanes for customization. This is a perfect basis for bringing the embedded server-class Intel® Xeon® D SoC as well as the new Intel® Atom™ processors (code name Denverton) to the industrial fields. Developers with high-performance demands for industrial automation, storage and networking applications, modular server and base station designs for telecom carriers, as well as cloud, edge and fog servers for IoT and Industry 4.0 applications are best served with modules based on the Intel Xeon D1500 processor family, such as the conga-B7XD COM Express Type 7 Server-on-Modules from congatec. They are available with ten different server processors soldered on the module for highest robustness, ranging from the Intel® Xeon® processor D1577 to the Intel® Pentium® processor D1519 for the industrial temperature range (-40°C to +85°C). These modules offer up to 16 cores for 32 threads and a maximum turbo frequency of up to 2.70 GHz, delivered in a low thermal envelope of only 45 Watt thermal design power (TDP) and below. For applications that are power restricted and/or do not need the high performance per core that the Intel Xeon D1500 processors provide, the new conga-B7AC modules with Intel® Atom™ C3000 processors raise the bar for embedded edge computing through 10 GbE bandwidth support. With a power consumption of only 11 to 31 Watt TDP, the new low-power multi-core Server-on-Modules feature up to 16 cores. Compared to the Intel Xeon modules, they do not support hyper threading or turbo boost. Both new congatec COM Express Type 7 Server-on-Modules impress by a full range of server features on a very small form factor including multiple 10 Gigabit Ethernet interfaces, 32 PCIe lanes and up to 48 gigabytes of DDR4 ECC RAM. The long-term available Server-on-Modules come application-ready, offering a standardized footprint, carrier board interfaces and a cooling concept, which significantly simplifies system designs – accelerating the launch of new, robust server technology. Future performance upgrades are remarkably simple and cost-efficient, as only the Server-on-Module needs to be exchanged for new processor architecture. Product Link: www.congatec.com/us/products/com-express-type7.html
congatec
www.congatec.us
FEATURES
• High scalability from 16-core Intel® Xeon® processor technology with 45 W TDP to low-power quad-core Intel® Atom™ processors with a TDP as low as 11.5 W
• All Server-on-Modules support the commercial temperature range (0°C to 60°C); selected SKUs even support the industrial temperature range (-40°C to +85°C)
• conga-B7AC with Intel Atom technology (code named Denverton) offers 4x 10 Gigabit Ethernet ports; conga-B7XD with Intel Xeon technology supports 2x 10 GbE
• Supports up to 48 gigabytes of fast and energy-efficient 2400 DDR4 (ECC or non-ECC)
• NC-SI (Network Controller Sideband Interface) support to connect a Baseboard Management Controller, allowing out-of-band remote manageability
• Up to 32 PCIe lanes for flexible server extensions such as NVMe flash storage and/or GPGPUs
• Comprehensive set of standard interfaces with 2x SATA Gen3 (6 Gbps), 6x USB 3.0/2.0, LPC, SPI, I2C bus, and 2x legacy UART
• OS support for Linux and Microsoft Windows variants
www.embedded-computing.com/ai-machine-learning/p374527
sales-us@congatec.com www.linkedin.com/company/congatec-ag
858-457-2600 twitter.com/congatecAG
V7G GPU System from Connect Tech The V7G GPU System from Connect Tech combines Intel® Xeon® D (Server Class) and Intel® Atom™ C3000 x86 processors with high-end NVIDIA® Quadro® and Tesla® Graphics Processing Units (GPU) all into a small form factor embedded system. Choose from highest-end, highest-performance models or from low-powered models all ideal for high-end encode/decode video applications or GPGPU CUDA® processing, Deep Learning and Artificial Intelligence applications. This embedded system exposes all of the latest generation interconnect including: 10GbE and Gigabit Ethernet, USB 3.0 and 2.0, HDMI, SATA III, GPIO, I2C, M.2, miniPCIe. The black aluminum enclosure has two mounting options: Half-rack rail mount or Standalone mounting brackets. https://bit.ly/2P37VdY
FEATURES
• High-End GPUs with Intel® Xeon® D Server Class and Intel® Atom™ C3000 x86 Processors
• 4 independent display outputs or GPGPU processing system using CUDA® cores
• Black Aluminum Enclosure
• Mounting Options: Half-rack & Standalone
• Outer Dimensions without mounting rails installed: 228.6mm x 88.9mm x 181.0mm
www.embedded-computing.com/ai-machine-learning/p374724
NVIDIA® Jetson TX2/TX2i/TX1 Solutions Connect Tech is the largest NVIDIA® Jetson™ Ecosystem partner, providing small form factor solutions for the Jetson TX2, TX2i or TX1. We are solving real world applications for deep learning at the edge. Connect Tech’s line of NVIDIA Jetson-integrated embedded systems are ideal for Machine Vision and deep learning applications. The Cogswell Vision System allows up to 5 Gigabit Ethernet cameras to be connected, 4 of which can be powered by on board Power over Ethernet. Rudi Embedded System holds a lot of power in a small package and is pre-integrated with the NVIDIA Jetson TX2, TX2i or TX1, ideal for deployable computer vision and deep learning applications. Finally, Rosie Embedded System is pre-integrated with NVIDIA Jetson TX2, TX2i or TX1. Housed in a rugged compact enclosure with optional mounting brackets, Rosie is designed to MIL-STD 810g and DO-160G for shock and vibration. https://bit.ly/2MuxRlZ
FEATURES
• Small form factor embedded systems pre-integrated with TX2, TX2i or TX1
• Featuring revolutionary NVIDIA Pascal™ or Maxwell™ architecture with 256 CUDA cores delivering over 1 TeraFLOP of performance with a 64-bit ARM A57 CPU
• Fanless and cable free
• Ideal for machine vision and deep learning applications
www.embedded-computing.com/ai-machine-learning/p374752
Connect Tech Inc. connecttech.com
www.embedded-computing.com/ai-machine-learning
sales@connecttech.com
www.linkedin.com/company/connect-tech-inc
1-800-426-8979 @ConnectTechInc
Hardware Modules/Systems for Machine Learning
Solid State Storage and Memory
Solid State Storage and Memory for Industrial IoT Virtium manufactures solid state storage and memory for the world’s top industrial embedded OEM customers. Our mission is to develop the most reliable storage and memory solutions with the greatest performance, consistency and longest product availability.
SSD Storage Includes: M.2, 2.5", 1.8", Slim SATA, mSATA, CFast, eUSB, Key, PATA CF and SD. Classes include: MLC (1X), iMLC (7X) and SLC (30X) – where X = number of entire drive-writes-per-day for the 3/5-year warranty period.
Industry Solutions include: Communications, Networking, Energy, Transportation, Industrial Automation, Medical, Smart Cities and Video/Signage.
All SSD’s include Virtium‘s Intelligent Storage Platform – which features:
Features
• Broad product portfolio from latest technology to legacy designs
• Twenty years refined U.S. production and 100% testing
• A+ quality – backed by verified yield, on-time delivery and field-defects-per-million reports
• Extreme durability, iTemp -40°C to 85°C
• Intelligent, secure IIoT edge storage solutions
• Longest product life cycles with cross-reference support for end-of-life competitive products
• Leading innovator in small-form-factor, high-capacity, high-density, high-reliability designs
• Worldwide sales, FAE support and industry distribution
vtView®: Monitor/maintain SSDs, estimate SSD life, predict maintenance, over-the-air updates, make SSD quals faster and easier – includes open source API.
vtGuard®: Power-loss protection, power management and I-Temp support.
vtSecure™: Military-grade secure erase, optional TCG Opal 2.0 for SATA and PCIe, and optional keypad or other authentication for external USB devices.
vtEdge™: Data filtering, pruning, and analytics functions, road-map for PCIe, USB and Ethernet, and data storage optimization.
Memory Products Include: All DDR, DIMM, SODIMM, Mini-DIMM, Standard and VLP/ULP. Features server-grade, monolithic components, best-in-class designs, and conformal coating/under-filled heat sink options.
www.embedded-computing.com/ai-machine-learning/p374789
Virtium
www.virtium.com
sales@virtium.com www.linkedin.com/company/virtium
949-888-2444 @virtium
Intermas – Subrack FLEXIBLE Intermas develops, manufactures, and markets components and modules for the packaging of electronics: Cabinets, housings, subracks, cassettes, and an extensive range of accessories for the 19" rack systems. The electronic enclosure systems are used in the fields of PCI, VME/VME64x, cPCI, IEEE, and communication applications with state-of-the-art EMI- and RFI-shielded protection. Intermas offers wiring connectors and cable interface housings in accordance with IEC 60 603-2/DIN 41 612, bus bars, 19" cross flow fans, power supplies, and euroboard covers. Intermas has an extensive product range of more than 10,000 separate components and more than 30 years of experience.
Go to www.Intermas-US.com for our new catalog.
Intermas US LLC
www.Intermas-US.com
FEATURES
• 19" subracks and housings with flexible internal layout in various 3U and 6U sizes
• EMI- and RFI-shielded protection using stable stainless steel contact springs, ensuring permanent and reliable bonding
• CompactPCI modules with integrated bus board and power supply
• InterRail® product line to meet tough physical demands, vibration-proof for railway engineering, traffic engineering, and power station engineering
• Connectors and wiring accessories
• Customizations available
www.embedded-computing.com/ai-machine-learning/p369515
intermas@intermas-us.com 800-811-0236
Hardware Modules/Systems for Machine Learning
ITX-P-3800 Pico-ITX Intel® E3800 Single Board Computer with Dual Ethernet
WinSystems' ITX-P-3800 series packs an impressive feature set into a small form factor Pico-ITX design. The Intel E3800 processor family delivers robust CPU and graphics performance. This SBC is a perfect choice for applications requiring low power and Intel performance in a small form factor package. The ITX-P-3800 series is packed with I/O features often lacking from larger SBCs, including dual 10/100/1000 Ethernet controllers based on the Intel i211 family with Wake-on-LAN and PXE capabilities, 4x USB 2.0 and 1x USB 3.1 Gen 1 enhanced host ports, and 4x RS-232 serial ports. Expansion options include 1x full-size and 1x half-size PCIe Mini Card slots along with the 5x USB ports. The full-size PCIe Mini Card slot supports PCIe x1, mSATA, and USB interfaces, while the half-size slot supports mSATA and USB interfaces. The ITX-P-3800 is a very compact, PC-compatible SBC and a perfect fit for applications in UAV, energy, medical diagnostics, and industrial control.
We Specialize in Customized Embedded Solutions
WinSystems, Inc.
www.winsystems.com
FEATURES
• Pico-ITX Form Factor (102 x 73 mm)
• Intel Atom™ (formerly Bay Trail-I) E3800 Series Processor (Dual or Quad Core)
• Up to 4 GB DDR3L Onboard System RAM
• Dual Gigabit Ethernet • 4x USB 2.0 • 1x USB 3.1 Gen 1
• 4x Serial Ports • 4x Digital Inputs • 4x Digital Outputs
• Audio with amplifier
• -20°C to +70°C Operating Temperature Range
• Intel Low Power Gen7 Graphics Engine
• Full-HD and 3D Graphics acceleration
• VGA and Dual Channel LVDS/eDP Outputs
www.embedded-computing.com/ai-machine-learning/374715
info@winsystems.com www.linkedin.com/company/winsystems-inc-/
+1 817.274.7553
Hardware Modules/Systems for Machine Learning
ITX-P-3800 Pico-ITX Intel® E3800 Single Board Computer with Dual Ethernet
WinSystems' ITX-P-3800 series packs an impressive feature set into a small form factor Pico-ITX design. The Intel E3800 processor family delivers robust CPU and graphics performance. This SBC is a perfect choice for applications requiring low power and Intel performance in a small form factor package. The ITX-P-3800 series is packed with I/O features often lacking from larger SBCs, including dual 10/100/1000 Ethernet controllers based on the Intel i211 family with Wake-on-LAN and PXE capabilities, 4x USB 2.0 and 1x USB 3.1 Gen 1 enhanced host ports, and 4x RS-232 serial ports. Expansion options include 1x full-size and 1x half-size PCIe Mini Card slots along with the 5x USB ports. The full-size PCIe Mini Card slot supports PCIe x1, mSATA, and USB interfaces, while the half-size slot supports mSATA and USB interfaces. The ITX-P-3800 is a very compact, PC-compatible SBC and a perfect fit for applications in UAV, energy, medical diagnostics, and industrial control.
FEATURES
• Pico-ITX Form Factor (102 x 73 mm)
• Intel Atom™ (formerly Bay Trail-I) E3800 Series Processor (Dual or Quad Core)
• Up to 4 GB DDR3L Onboard System RAM
• Dual Gigabit Ethernet • 4x USB 2.0 • 1x USB 3.1 Gen 1
• 4x Serial Ports • 4x Digital Inputs • 4x Digital Outputs
• Audio with amplifier
• -20°C to +70°C Operating Temperature Range
• Intel Low Power Gen7 Graphics Engine
• Full-HD and 3D Graphics acceleration
• VGA and Dual Channel LVDS/eDP Outputs
We Specialize in Customized Embedded Solutions
WinSystems, Inc.
www.embedded-computing.com/ai-machine-learning/374715
info@winsystems.com
www.linkedin.com/company/winsystems-inc-/
www.winsystems.com
+1 817.274.7553
Neural Network Processors: IP/Accelerators
Helping Customers Develop ASICs Right The First Time Socionext offers an ideal match of capabilities to meet customers’ needs with our state-of-the-art process technology, advanced packaging solutions, extensive and differentiated IPs, proven design methodologies, and a full, turn-key ecosystem. Socionext America, Inc. (SNA) is the US branch of Socionext Inc. headquartered in Sunnyvale, California. We are ranked as one of the world’s leading fabless ASIC suppliers and specialize in a wide range of standard and customizable SoC solutions in imaging, networking, computing and other dynamic applications. Socionext is a premier SoC supplier dedicated to providing our customers with quality semiconductor products backed by our best-in-class customer support.
Socionext America Inc socionextus.com/ai
Capabilities and Products
• ASIC
  • Range of Business Models
    – From turnkey development to COT
    – Range of process nodes to 7nm
  • Leading IPs
    – 56Gbps and 112Gbps SerDes
    – AI engine
    – ARM processors
    – DDR, HMB, HBM, GDDR memory controllers
    – SSD technology
    – ADC/DAC for optical communications
    – ADC/DAC for automotive applications such as LIDAR
  • Leaders in factory automation
  • In-house packaging
sna_inquiry@us.socionext.com https://twitter.com/SocionextUS
www.embedded-computing.com/ai-machine-learning/p374790
+1 (408) 737-5400 | Toll free +1 (844) 680-3453 www.linkedin.com/company/socionext-us
Connecting Global Competence
November 13–16, 2018
Connecting everything – smart, safe & secure
Trade fair
• 17 halls
• Full range of technologies, products and solutions
Conferences & forums
• 4 conferences
• 16 forums
• New TechTalk for engineers and developers
Talent meets Industry
• electronica Experience with live demonstrations
• e-ffwd: the start-up platform powered by Elektor
• electronica Careers
co-located event
ACCELERATING THE EVOLUTION OF CRITICAL INFRASTRUCTURE FROM AUTOMATED TO AUTONOMOUS For nearly 40 years, Wind River software has enabled digital transformation across critical infrastructure sectors. Learn how we're helping a wide range of industries accelerate their evolution from automated to autonomous systems and ensuring the software-defined world of the future is a safe, secure reality. www.windriver.com/automated-to-autonomous www.windriver.com
Balance is Everything
We make superior solid state storage and memory for industrial IoT ecosystems, with the optimum balance of quality, data integrity and cost-efficiency.
• Twenty years refined U.S. production and 100% testing - unlike offshore competition
• A+ quality: 98.8% yield, 99.7% on-time delivery and 86 field-defects-per-million*
• Extreme durability, longer life-cycles and intelligent, secure edge solutions
Visit our website to learn more and let's keep the balance - together.
Familiar Done Differently ®
Solid State Storage and Memory
*QA marks averaged through entire year of 2017. Copyright 2018, Virtium LLC. Top image copyright: 123RF/Orla
www.virtium.com