Embedded AI & Machine Learning Resource Guide
2020 | Volume 1 | Number 1
www.embedded-computing.com/machine-learning

AI Compilers and the Race to the Bottom (page 3)
Smart Application of AI and ML in Data Analysis (page 5)
2020 Resource Guide for Embedded AI (starts on page 26)

Development Kit Selector: http://embedded-computing.com/designs/iot_dev_kits/
AD LIST
Page | Advertiser
8  | ACCES I/O Products, Inc. – PCI Express mini card/mPCIe embedded I/O solutions
1  | Digi-Key Corporation – Development Kit Selector
18 | Lauterbach, Inc. – Debugger for RH850 from the automotive specialists
19 | Lauterbach, Inc. – Multicore Debugging and Real-Time Trace for Arm Cortex-A/-R/-M
23 | Tadiran – IIoT devices run longer on Tadiran batteries
13 | Technologic Systems – TS-7100, NXP i.MX 6UL 696 MHz ARM CPU with FPU
32 | Vector Electronics – VME/VXS/CPCI Chassis, Backplanes & Accessories
EMBEDDED COMPUTING BRAND DIRECTOR Rich Nass rich.nass@opensysmedia.com
EDITOR-IN-CHIEF Brandon Lewis brandon.lewis@opensysmedia.com
ASSOCIATE EDITOR Perry Cohen perry.cohen@opensysmedia.com
ASSISTANT EDITOR Tiera Oliver tiera.oliver@opensysmedia.com
TECHNOLOGY EDITOR Curt Schwaderer curt.schwaderer@opensysmedia.com
ONLINE EVENTS MANAGER Josh Steiger josh.steiger@opensysmedia.com
MARKETING COORDINATOR Katelyn Albani katelyn.albani@opensysmedia.com
CREATIVE DIRECTOR Stephanie Sweet stephanie.sweet@opensysmedia.com
SENIOR WEB DEVELOPER Aaron Ganschow aaron.ganschow@opensysmedia.com
WEB DEVELOPER Paul Nelson paul.nelson@opensysmedia.com
CONTRIBUTING DESIGNER Joann Toth joann.toth@opensysmedia.com
EMAIL MARKETING SPECIALIST Drew Kaufman drew.kaufman@opensysmedia.com

SALES/MARKETING
DIRECTOR OF SALES AND MARKETING Tom Varcie tom.varcie@opensysmedia.com (734) 748-9660
MARKETING MANAGER Eric Henry eric.henry@opensysmedia.com (541) 760-5361
STRATEGIC ACCOUNT MANAGER Rebecca Barker rebecca.barker@opensysmedia.com (281) 724-8021
STRATEGIC ACCOUNT MANAGER Kathleen Wackowski kathleen.wackowski@opensysmedia.com (978) 888-7367
SOUTHERN CAL REGIONAL SALES MANAGER Len Pettek len.pettek@opensysmedia.com (805) 231-9582
STRATEGIC ACCOUNT MANAGER Bill Barron bill.barron@opensysmedia.com (516) 376-9838
ASSISTANT DIRECTOR OF PRODUCT MARKETING/SALES Barbara Quinlan barbara.quinlan@opensysmedia.com (480) 236-8818
STRATEGIC ACCOUNT MANAGER Glen Sundin glen.sundin@opensysmedia.com (973) 723-9672
INSIDE SALES Amy Russell amy.russell@opensysmedia.com
TAIWAN SALES ACCOUNT MANAGER Patty Wu patty.wu@opensysmedia.com
CHINA SALES ACCOUNT MANAGER Judy Wang judywang2000@vip.126.com
EUROPEAN MARKETING SPECIALIST Steven Jameson steven.jameson@opensysmedia.com +44 (0)7708976338

PROFILES
Applications: Industrial Automation/Control
26 ACCES I/O Products, Inc.
27 ADL Embedded Solutions
27 Advantech Embedded Group
28 Vector Electronics
Applications: Medical
29 SMART Embedded Computing
Edge AI
29 Eurotech
Hardware Modules/Systems for Machine Learning
30 congatec
31 Virtium LLC

WWW.OPENSYSMEDIA.COM

SOCIAL
Facebook.com/Embedded.Computing.Design
@Embedded_ai
LinkedIn.com/in/EmbeddedComputing
youtube.com/user/VideoOpenSystems

PRESIDENT Patrick Hopper patrick.hopper@opensysmedia.com
EXECUTIVE VICE PRESIDENT John McHale john.mchale@opensysmedia.com
EXECUTIVE VICE PRESIDENT Rich Nass rich.nass@opensysmedia.com
GROUP EDITORIAL DIRECTOR John McHale john.mchale@opensysmedia.com
VITA EDITORIAL DIRECTOR Jerry Gipper jerry.gipper@opensysmedia.com
ASSOCIATE EDITOR Emma Helfrich emma.helfrich@opensysmedia.com
SENIOR EDITOR Sally Cole sally.cole@opensysmedia.com
CREATIVE PROJECTS Chris Rassiccia chris.rassiccia@opensysmedia.com
PROJECT MANAGER Kristine Jennings kristine.jennings@opensysmedia.com
FINANCIAL ASSISTANT Emily Verhoeks emily.verhoeks@opensysmedia.com
FINANCE Rosemary Kristoff rosemary.kristoff@opensysmedia.com
SUBSCRIPTION MANAGER subscriptions@opensysmedia.com
CORPORATE OFFICE 1505 N. Hayden Rd. #105, Scottsdale, AZ 85257, Tel: (480) 967-5581
REPRINTS: WRIGHT'S MEDIA REPRINT COORDINATOR Wyndell Hamilton whamilton@wrightsmedia.com (281) 419-5725
AI Compilers and the Race to the Bottom
By Brandon Lewis, Editor-in-Chief (brandon.lewis@opensysmedia.com)
Creating intelligence requires a lot of data. And all of that data needs technologies that can support it.
In the case of artificial intelligence (AI), these technologies include large amounts of direct-access, high-speed memory; parallel computing architectures that are capable of processing different parts of the same dataset simultaneously; and, somewhat surprisingly, lower-precision computing than many other applications require.

An almost endless supply of this technology mix is available in the data center, so AI development tools were designed for the data center infrastructure behind applications like internet queries, voice search, and online facial recognition. But as AI technology advances, so does the desire to leverage it in all sorts of use cases – including those that run on small, resource-constrained, MCU-based platforms at the edge. So instead of focusing solely on high-end hardware accelerators running cloud-based recommendation systems, for example, tools like compilers must also be able to optimize AI data and algorithms for smaller-footprint devices.

Facebook's open source machine learning compiler, Glow, is an example of this tooling evolution. It "lowers" neural network graphs using a two-phase intermediate representation (IR), which generates machine code that is specially tuned to the features and memory of a variety of embedded and server-class hardware targets. It also performs ahead-of-time (AOT) compilation, which minimizes runtime overhead to save disk space, memory, startup time, and so on.

"We have this really high-performance runtime, but a lot of projects don't care because they aren't in the data center," explained Jordan Fix, a research scientist at Facebook. "They need to do AOT compilation, shrink as much as they can, use quantization and parallelization, and not have a lot of dependencies.

"AOT compilation isn't as important in the data center, but we can hook LLVM back ends into Glow and target x86, Arm, RISC-V, and specialized architectures," Fix continued. "The way Glow works is you have a couple levels of IR that use high-level optimizations and quantizations to limit memory. At that point the compiler back end can accept the instruction-based IR and optimize and compile it down however it wants."

AI Compilers' Race to the Bottom
Of course, Glow is not the only neural network compiler available. Google's Multi-Level Intermediate Representation (MLIR) is a compiler infrastructure that focuses on tensor processors and has been absorbed by LLVM.
FIGURE 1: Comparing RAM and flash usage. Glow's ahead-of-time (AOT) compiler delivers massive RAM and flash memory savings compared to just-in-time (JIT) compilers like that of TensorFlow Lite. (Bar chart: RAM and flash, in kbytes, for TensorFlow Lite versus Glow + CMSIS-NN on the i.MX RT1060 and i.MX RT685 w/ HiFi4.)
Microsoft's Embedded Learning Library (ELL) is another cross-compiling toolchain for resource-constrained AI devices. However, Glow is more mature than either, having been open sourced in 2018, and it is more performant than many existing AI compiler options. In performance tests on their recently released i.MX crossover MCUs, NXP systems engineers used TensorFlow Lite and Glow to compile a CIFAR-10 model (32 x 32 input images) and ran it on RT1060, RT1170, and RT685 devices. The Glow-compiled workloads exhibited at least a 3x frames-per-second performance improvement, while Figure 1 gives you an idea of just how efficient AOT compilation is compared to the just-in-time (JIT) compilation used in the TensorFlow/TensorFlow Lite frameworks.

The AI technology market is changing rapidly, which makes it difficult for development organizations to commit to any technology. This may be one of the most compelling aspects of Glow, and it isn't even directly related to technology: as an open source project with more than 130 active contributors, large organizations like Facebook, Intel, and others continue making commits to the Glow mainline because they now depend on its common infrastructure for access to instructions, operators, kernels, etc. And then, obviously, there's the inherent value of open source.

"We regularly see contributions from external users that we care about, like a more generic parallelization framework, and we have a lot of machine learning models that they are running," Fix said. "So maybe it allows them to get support for operators without us having to do anything. 'I think you were working on this specific computer vision model' or, 'I think this was an operator that you were talking about.' They just review it and port it and land it.

"We can all benefit from each other's work in a traditional open source framework," he added.
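For readers who want to see what the model-shrinking step looks like on the TensorFlow Lite side of that comparison, the sketch below runs post-training quantization through the TFLite converter. The tiny stand-in network and output file name are illustrative assumptions, not NXP's actual CIFAR-10 model.

```python
import tensorflow as tf

# Stand-in CIFAR-10-shaped model; NXP's actual network is not reproduced here.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # post-training quantization
tflite_model = converter.convert()

with open("cifar10_quant.tflite", "wb") as f:
    f.write(tflite_model)  # flatbuffer consumed by the TF Lite (Micro) runtime
print(f"{len(tflite_model) / 1024:.1f} KiB after quantization")
```

Glow performs the analogous quantization and memory optimization at its IR level before emitting machine code, which is where the additional AOT savings in Figure 1 come from.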
CONTENTS
2020 | Volume 1 | Number 1

FEATURES
5   Smart Application of AI and ML in Data Analysis, by Maria Thomas, GreyCampus
6   IC Design: Trajectories from the Past and into the Future, by Asem Elshimi, Silicon Labs
10  Architecture Exploration of AI/ML Applications and Processors, by Deepak Shankar, Mirabilis Design, Inc.
14  Embedded AI Algorithms – Going from Big Data to Smart Data, by Dzianis Lukashevich, Analog Devices
20  Open Standards for Accelerating Embedded Vision and Inferencing: An Industry Overview, by Neil Trevett, The Khronos Group
24  A Step by Step Guide to Voice-Enabled Device Testing, by Keyur Shah & Dhaval Patel, eInfochips
26  2020 Resource Guide

COLUMNS
3   AI Compilers and the Race to the Bottom, by Brandon Lewis, Editor-in-Chief

WEB EXTRAS
Secure Flash for Machine Learning on Edge Devices, by Zhi Feng, Cypress: https://bit.ly/3oiyeiB
Minimizing Algorithm Footprint and Training at the AI Network Edge, by Yasser Khan, ONE Tech Inc.: https://bit.ly/37C4Z4f

COVER
Artificial intelligence (AI) has become viable in even small, resource-constrained embedded and IoT systems. The 2020 Embedded AI & Machine Learning Resource Guide highlights how the technology paradigm is being put to use across the engineering spectrum, from IC design to industrial predictive analytics and maintenance to voice-enabled device testing. The issue also reveals enabling products and solutions, beginning on page 26.

Published by OpenSystems Media®
© 2020 Embedded Computing Design, © 2020 Embedded AI and Machine Learning. All registered brands and trademarks within Embedded Computing Design and Embedded AI and Machine Learning magazines are the property of their respective owners.

To unsubscribe, email your name, address, and subscription number as it appears on the label to: subscriptions@opensysmedia.com
APPLYING AI
Smart Application of AI and ML in Data Analysis By Maria Thomas, GreyCampus
Previously, companies would collect data, discover information, and run analytics, which could then be applied to decision-making processes. But at present, businesses are using data analytics as a means of staying agile and operating faster. To achieve this competitive edge using such enormous amounts of data, businesses must gather, organize, and interpret the correct data to improve their business processes and decision making.

Artificial intelligence and machine learning in data analytics make it possible to connect data to insights on consumers, expand business, and optimize the quality and speed of logistics. Before we look into how these technologies benefit an organization, let's understand the various types of analytics.

1. Descriptive Analytics: Descriptive analytics can summarize unprocessed data and transform it into a form that can be easily understood by people. It can explain, in detail, an incident that has happened in the past. This type of analytics is useful for extracting a pattern, if any, from previous occurrences or drawing ideas from data so that more reliable approaches can be built for the future.
2. Prescriptive Analytics: This kind of analytics describes a step-by-step process in a circumstance. It is a new type of analytics that utilizes a mixture of machine learning, business practices, and computational modeling to suggest the most suitable plan of action for any predefined result.
3. Predictive Analytics: Any company that is seeking success must have a vision. Predictive analytics helps such companies determine trends and practices based on events. Whether it is predicting the possibility of an occurrence in the future or evaluating the exact moment it will occur, these can all be forecast with the help of predictive analytics. It uses multiple machine learning and analytical modeling methods to interpret past data and predict the future.

Organizations with large amounts of data can generate analytics. And before generating analytics, data scientists should be certain that the predictive analytics satisfies their organizational goals and is suitable for the big data environment.

Developing Predictive Abilities with the Help of Artificial Intelligence and Machine Learning
Since the data is huge and the right set of tools is required to gather and extract the correct information, machine learning and AI algorithms are used to reveal new statistical patterns that build the foundation of predictive analytics. For instance, machine learning algorithms such as recurrent neural networks (RNNs) can identify hidden patterns in unorganized datasets and unveil new information. A neural network is a system of software and hardware modeled after the human nervous system that estimates functions based on enormous volumes of hidden data. Neural networks are defined by three elements: the architecture, the activity rule, and the learning rule. They are adaptive and transform themselves as they learn from prior information, as the sketch below illustrates.
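As a toy illustration of that idea, the following Python sketch trains a small recurrent (LSTM) network to predict the next value of a time series. The architecture and the synthetic data are assumptions for demonstration only; nothing here comes from GreyCampus.

```python
import numpy as np
import tensorflow as tf

# Synthetic "business metric" series: 1,000 windows of 30 steps each,
# with the task of predicting the value that follows each window.
rng = np.random.default_rng(0)
series = rng.normal(size=(1000, 30, 1)).cumsum(axis=1)   # random-walk style data
X = series
y = series[:, -1, 0] + rng.normal(scale=0.1, size=1000)  # next value, plus noise

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, input_shape=(30, 1)),  # recurrent layer learns temporal patterns
    tf.keras.layers.Dense(1),                       # regression head: forecast one value
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=64, verbose=0)

print(model.predict(X[:1]))  # forecast for one 30-step window
```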
There are many other ways in which AI and ML benefit a business. These methods can help organizations enhance their business operations, drive customer engagement, and optimize customers' experiences.

Importance of Data Analytics for Businesses
The rising value of data analytics for a company has transformed the world in the real sense, but the average person remains uninformed of the influence of data analytics on industry. A few ways data analytics has changed industry include:

1. Business Knowledge: Business knowledge can be understood, and it can determine how a company can run in the coming years. It can even determine the types of markets available to a company and its services.
2. Cost Reduction: AI and ML can bring huge cost benefits if linked to large data stores.
3. Improving Efficiency: Most of the data obtained by companies is examined internally. With the progression of technology, it has become very convenient to collect data that helps companies understand the performance of employees as well as the company itself.

As these technologies evolve day by day, many APIs come into existence. The ability of AI and ML algorithms to predict, recognize voices and faces, process images, and more has made it possible to move further. Artificial intelligence and machine learning help a business manage data and use it to discover new possibilities. This leads to more intelligent and innovative business strategies, higher earnings, productive operations, and satisfied clients. The intention is to distribute the prospects of a company in a more dependable way and to apply it with analytics. EAI

Maria Thomas is Content Marketing Manager and Product Specialist at GreyCampus, with eight years of experience in professional certification courses like PMP (Project Management Professional), CISSP, AIML, and Agile & Scrum Master certifications.

GreyCampus | www.greycampus.com | @greycampus | www.linkedin.com/company/greycampus/ | www.youtube.com/user/Greycampus
AI LOGIC & MEMORY
IC Design: Trajectories from the Past and into the Future By Asem Elshimi, Silicon Labs
What is the future of IC design jobs? Will AI replace IC designers? These are questions that have been repeatedly asked at many technical conferences I have attended lately. At the base of every effective AI algorithm is an electronic chip that is designed by a team of intelligent IC designers. Ironically, IC designers could be designing the very technology that may potentially replace their jobs. This, if true, is an intimidating prospect. Yet, by examining the past and present of IC design, we can recognize that IC design today is not what it was in the past. By conjecture, we can also realize IC design jobs are unlikely to go away. However, they will absolutely shift and become more involved.

Automation, Inefficiency, and Demand
To begin with, there are three patterns in the industry to recognize: automation, inefficiency, and demand. Firstly, present-day IC designers are working with a huge amount of automation technology. The simulation capacity of today's machines is inherited directly from yesterday's competitive IC solutions. That is to say, IC design jobs of the past are now automated. The trend toward automation enabled IC designers to move upward in the hierarchy of synthesis. We are now able to make remarkably complex solutions because we are utilizing the innovative automation technology available to us.

Secondly, there are many IC inefficiencies that remain unsolved. For instance, in RF design, communications are not full duplex yet and antennas are still off chip. If this technology evolved, communications would be double the speed and RF modules would be smaller. Technical issues like these still require continual innovation and focus.

Thirdly, the combination of the automation and inefficiency patterns creates a huge demand for clever IC designers to address unresolved problems. We already know how to design a 20 dBm PA, so why not let machines do this task for us? Automating processes leaves more time and resources for design teams to troubleshoot new problems. The demand for designers will remain steady until there are no remaining RF and IC inefficiencies.
Furthermore, as automation picks up, IC designers in the field will have more fun brainstorming solutions as labor-intensive aspects of the job are eliminated. By nature, IC design jobs are constantly being redefined as breakthroughs occur, and this pivoting will carry into the future as well.

Learning from our Past
In order to understand the evolution of design, I interviewed two senior IC designers at Silicon Labs, Dr. John Khoury and Jeff L. Sonntag. Dr. Khoury served as an engineering professor at Columbia University and held previous engineering and management positions at Bell Labs and Multilink Technology. He earned engineering degrees at Columbia and MIT. Jeff Sonntag has engineering degrees from Cornell University and Carnegie Mellon University. Prior to Silicon Labs, he spent 18 years at Bell Labs, where he was recognized as a Bell Labs Fellow for his contributions. We touched on their earliest memories of IC design and discussed their vision of the future of the industry using only recollection and opinions instead of relying on accompanying documentation.

One of the first insights Dr. Khoury shared was that the first circuit he designed was a differential Op-Amp and a 20th-order switched capacitor filter. The entire analog chip was roughly 30 mm2 and on the order of 5,000 transistors. Comparatively, today's chips carry tens of Op-Amps and filters, and the transistor count can be in the hundreds of millions and above. We have certainly moved toward more complex designs in only a matter of decades.

My colleague also described the technology available to designers when he first started his career: the layout was done on a Tektronix graphic terminal with a green screen, which is essentially a large storage scope. The terminal did not have colors, but designers worked with one layer of metal and one poly used for routing. Today, designers have access to numerous layers for routing to create devices. Using a colorless monitor in layout nowadays sounds like a project suicide attempt!

I heard similar sentiments from Sonntag, who elaborated on his experience with the Tektronix monitor, explaining how the monitor consumed about 1 kW, was about 4' tall, had a green screen with persistent graphics, and had two thumbwheels for X-Y input. The thumbwheel arrangement worked very well, and the monitors were driven by a 140 kbps cable, which was taped to the floor, across the hall from a computer center room. He also told me designers used to type their netlists in a text editor. They would start with a whiteboard design, number the nodes, and then type in a netlist for simulation.

Silicon Labs | www.silabs.com | @SiliconLabs | www.linkedin.com/company/siliconlabs/ | www.youtube.com/user/ViralSilabs
FIGURE 1: Comparison of circuit complexity between the 1980s and today.

Item                                  | 1984                             | Today
CMOS technology                       | 1750 nm                          | 7 nm
VDD                                   | 5.0 V                            | 1.0 V
Interconnect stack                    | 1 metal + 1 poly                 | < 8 metal
# masks                               | 9                                | > 40
Passband frequencies                  | < 200 kHz                        | < 100 GHz
Circuit topologies                    | Complex                          | Simplified (many inverter-based)
Design automation                     | Optimization of known topologies | Analogous to digital synthesis techniques
Design engineer/Layout designer       | 2.5                              | 0.3
Worldwide mixed-signal CMOS engineers | ~ 100*                           | > 20,000
FIGURE 2: Exponential growth in IC complexity is only enabled by more automation in the design process. (Transistors per CPU, 1970–2020, log scale from 1,000 to 100,000,000,000. Source: https://en.wikipedia.org/wiki/Transistor_count)
Currently, we can only design SoC ICs because we have graphical user interfaces that allow us to place and examine thousands of transistors in the analog domain without worrying about node numbers or netlist typos. In this sense, we are leveraging automation to create more complex and advanced systems (Figure 1).

Another old and interesting IC tradition from the mid-1980s was at the chip level. Since designers didn't have layout-versus-schematic (LVS) software, they would have to craft together paper printouts of a reasonable scale to create nets. This paper simulation consisted of two or three printouts, about 3' wide, taped together on the floor or multiple tables to cover the width of the respective chip. From a top-level netlist, they would verify connectivity to each pin of each block, net by net, and then mark the "light up" of each net with colored pencils. While this may seem like an elegant solution given the technology at the time, it was unnecessarily repetitive and tiresome (Figure 2).

Interestingly enough, automation and AI are not new to IC design and are rooted in the past. As we have seen, IC designers have always counted on automation to make processes less task-intensive, but now the systems are more intricate than they've ever been. However, there are certain processes that are still solely done by human minds, like analog design. Analog design is the intuitive process of sizing devices and achieving analog functions using specific device configurations. While computers help with math and estimate operating points of circuits, human minds are better at intuitive, intelligent design. Now, the question is whether recent developments in AI make machines intelligent enough to replace IC designers.
From here, we can speculate a few outcomes. First, automation of such complex design principles should be viewed as a good thing, and machines should be viewed as an extension of the human mind. As computers get smarter, humans have more time to focus on novel concepts and reach new milestones. Furthermore, automating the design process has been happening for decades already. In the 1980s, one of my long-time designer colleagues explained how a few engineers at Bell Labs successfully automated design of Op-Amps and switched capacitor filters. They essentially used known topologies and then used optimization methods to choose component sizes. As an IC designer, I find this automation comforting as I enjoy the critical thinking behind choosing a topology more than the laborious aspects of optimizing components.
"MACHINES MADE BY HUMANS WILL ERR AS MUCH AS HUMANS SIMPLY DUE TO THE FACT THAT THEY ARE MADE BY US. THIS IS WHY A HUMAN OBSERVER WILL ALWAYS BE NEEDED AT THE TOP OF ANY AUTOMATED PROCESS."

Human vs. Machine: How Reliable are Machines?
The stories I learned about the 1980s from my colleagues demonstrate that human intelligence in IC design is truly needed. For example, one of the stories described how an in-house group of math specialists at Bell Labs undertook a project on applying constraints and minimizing the number of simulation trials required to do an optimization. To do this, they were handed a band-gap design with the goal of minimizing the percentage change in output voltage across PVT and mismatch, while keeping PSRR and power below some bound. They decided to simplify the optimized parameter to mV change instead of percentage change. After several weeks, they returned with a presentation and were proud that they had reduced the variability by orders of magnitude, which initially seemed too good to be true. Surprisingly, their tools had discovered a resistor that could be sized down to near zero to reduce the output voltage to just a few mV, with a very small variability (measured in mV) across PVT!

Should we ascribe the outcome of this incident to machines or humans? Machines made by humans will err as much as humans simply due to the fact that they are made by us. This is why a human observer will always be needed at the top of any automated process.
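The failure mode in that anecdote (an optimizer faithfully exploiting a mis-specified objective) is easy to reproduce. The toy model below is an invented stand-in for the band-gap problem, not the Bell Labs tool: because the absolute mV spread scales with the output voltage itself, minimizing it simply drives the resistor, and the output, toward zero.

```python
from scipy.optimize import minimize_scalar

def output_mv(r_ohms):
    """Toy band-gap stand-in: nominal output in mV as a function of one resistor."""
    return 1200.0 * r_ohms / (r_ohms + 10e3)

def spread_mv(r_ohms):
    """Absolute spread across PVT corners, assumed here to be 2% of the output."""
    return 0.02 * output_mv(r_ohms)

# The simplified objective from the story: minimize the absolute mV spread.
res = minimize_scalar(spread_mv, bounds=(1.0, 100e3), method="bounded")
print(f"r = {res.x:.1f} ohms, spread = {spread_mv(res.x):.4f} mV, "
      f"output = {output_mv(res.x):.1f} mV")
# The optimizer pushes r to the lower bound: the spread is tiny only because
# the output itself collapsed to a few mV, which is useless as a reference.
# Minimizing the *relative* spread, spread_mv(r) / output_mv(r), removes the
# loophole (in this toy model it is constant, so there is nothing to exploit).
```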
FIGURE 3: ICs get more complex; designers go back to creative tasks and drop laborious processes to AI. (Projected trend, 2010–2030: transistors per CPU rising alongside the shift from labor to creativity.)
For instance, modern-day computing speed is exponentially higher, which allows us to run every possible variation of a chip to confirm whether it will work or not. While speed is great, incorporating a real human in this process guarantees a higher probability that a problem will be detected – especially in an analog system. Thus, I truly believe we cannot rely solely on machines to detect problems, since problems can perpetuate in the machine. We must accept that humans are imperfect, and machines will be imperfect by nature.

The stories I picked from the past highlight the contrast between humans and machines. However, similar situations continue to occur as researchers and entrepreneurs experiment more with automation of design processes. Compared to the past, though, I speculate the evolution of future automation will focus on tougher problems that might not even be on our horizon yet.

Originally, engineers built simpler systems but took on all the burden of analysis. Coming up with new designs and building analytical models requires enormous mental effort. Without access to simulators and other automation processes, engineering capacity was limited to thousands of transistors. Luckily, as technology advances, we can test ideas in a matter of hours without establishing analytical models. For example, there is more simulation bandwidth to verify the functionality of billion-transistor SoCs. On the other hand, as complexity increases and chips become more massive, the validation and verification cycles also get longer and, consequently, more laborious. As one of my colleagues likes to say, we spend 10 percent of our time coming up with clever ideas, but 90 percent of our time simply verifying said ideas.

Evolution: What Can We Expect in the Job Market?
Looking at the trajectory from past to present establishes an extrapolation toward the future with an exciting possibility. More automation opens up time for creative and critical thinking. Perhaps very soon, machines will be able to replicate designs, or even suggest new design ideas. Machines will also take part in the tedious verification process. This will afford IC designers the luxury of working on "fun and exciting problems." While these visions are ideal, we shouldn't be too eager for AI to completely take over in the near future. As a first step, AI could start automating floor planning, and then layout. Then, AI could slowly crawl toward the core design functions, creating more mental space for designers to focus on new problems (Figure 3).
In terms of IC design jobs, teamwork and coordination will become more relevant. Design teams are growing to cope with the humongous scale of modern ICs. Communication between engineers and engineering functions might occupy more bandwidth. If we want to build the massive systems that we only dream about today, future IC designers will need to be collaborative decision makers. Addressing collaboration will be an educational and cultural issue as well. Instead of just technical skills, STEM students need to focus on soft and interpersonal skills like communication, creativity, imagination, and teamwork. The human mind is capable of powerful things when it is stimulated in all areas and working with other great minds.

A young engineering student once asked me if she should turn the page and shift from hardware engineering to software development. A common concern among young engineers is whether their field of study is needed in the market or not. My response to this dilemma is to always encourage students to learn about the market themselves and see how to position themselves for success. I can't speak to which job is better or which job pays more, since passion determines career satisfaction. This journey is very personal. Speaking from my experience only, I can assure students who are curious about IC design that the demand for their skillset won't go away anytime soon, as there are many problems to solve. The difference is that the types of problems in the future may be radically different due to the disruption of AI and automation. This is anyone's game to predict. However, I am very hopeful that we are entering a future where humans and machines can harmoniously work together. EAI

Asem Elshimi is an RFIC design engineer for IoT wireless solutions at Silicon Labs. He joined Silicon Labs in July 2018. Elshimi specializes in the areas of RF circuit design and electromagnetic structure design. He holds an M.S. in Electrical and Computer Engineering from the University of California, Davis.
AI LOGIC & MEMORY
Architecture Exploration of AI/ML Applications and Processors By Deepak Shankar, Mirabilis Design, Inc.
Architecture exploration of AI applications is complex and involves multiple fields. To start with, we can target a single problem such as memory access, or we can look at the full processor or system. Most designs start with memory access. There are many options – SRAM versus DRAM, local versus distributed storage, in-memory compute, and caching versus discarding the back-propagation coefficients. The second evaluation sector is the bus or network topology. The virtual prototype can have a network-on-chip, a TileLink or AMBA AXI bus for the processor internals, PCIe or Ethernet to connect the multi-processor boards and chassis, and Wi-Fi/5G/Internet routers to access the data center. The third study using the virtual prototype is the compute, which can be modeled as processor cores, multi-processors, accelerators, FPGAs, multiply-accumulate units, and analog processing. The last piece is the interface to sensors, networks, math operations, DMA, custom logic, arbiters, schedulers, and control functions. Moreover, the architecture exploration of AI processors and systems is challenging because it consists of executing data-intensive graphs as efficiently as possible in hardware.

At Mirabilis, we use VisualSim for the architectural exploration of AI applications. Users of VisualSim assemble a virtual prototype very quickly in a graphical discrete-event simulation platform with a large library of AI hardware and software modeling components. The prototype can be used to conduct timing, throughput, power consumption, and quality-of-service tradeoffs. Over 20 templates of AI processors and embedded systems are provided to accelerate the development of new AI applications. Reports generated for AI systems include response times, throughput, buffer occupancy, average power, energy consumption, and resource efficiency.

ADAS Model Construction
To begin with, let us consider an advanced driver assistance system (ADAS) application, a form of AI deployment. ADAS applications co-exist with a number of applications on both an electronic control unit (ECU) and on the vehicle network, and have dependencies on vehicle sensors and actuators. Early architecture trade-offs can test and evaluate hypotheses to quickly identify bottlenecks and optimize processor specifications to timing, throughput, power, and functional requirements. Figure 1 shows the implementation of this logical ADAS architecture mapped to a physical architecture.

Processor Model Construction
Designers of AI processors and systems conduct experiments by application type, training versus inference, cost point, power consumption, and size limitations.
Mirabilis Design | www.MirabilisDesign.com | @VisualSim | www.linkedin.com/company/mirabilis-design-inc-/ | @Mirabilis-Design-Inc
FIGURE 1: System model of the automotive ADAS system mapped to the ECU network. (VisualSim block diagram: seven sensors and six CAN nodes on a CAN bus segment, an ECU, and ADAS, engine, brake, EPS, body, and other functions, with a power table, routing-table database, traffic generators, and latency measurements. Key parameters include a 500 ms simulation time and 1 GHz processor, memory, and bus speeds.)
For example, designers can assign child networks to pipeline stages; trade off deep neural networks (DNNs) versus conventional machine learning algorithms; measure algorithm performance on GPUs, TPUs, AI processors, FPGAs, and conventional processors; evaluate the benefits of melding compute and memory on a chip; compute the power impact of analog techniques that resemble human brain functions; and build SoCs with a partial set of functions targeted at a single application.
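Before committing to a full virtual prototype, a back-of-envelope tally often frames these experiments. The sketch below is illustrative only (the layer sizes and the bandwidth and compute budgets are invented, and it is not VisualSim output): it counts weight memory and multiply-accumulate operations to hint at whether a topology will be compute-bound or memory-bound.

```python
def dense_layer_cost(n_in, n_out, bytes_per_weight=1):
    """Weight bytes and MAC count for one fully connected layer (int8 assumed)."""
    weights = n_in * n_out * bytes_per_weight
    macs = n_in * n_out          # one multiply-accumulate per weight per inference
    return weights, macs

layers = [(1024, 512), (512, 512), (512, 10)]   # hypothetical 3-layer topology
total_w = total_m = 0
for n_in, n_out in layers:
    w, m = dense_layer_cost(n_in, n_out)
    total_w += w
    total_m += m

print(f"weights: {total_w / 1024:.0f} KiB, MACs/inference: {total_m:,}")

# Assumed budgets: 1 GMAC/s of compute, 100 MB/s of memory bandwidth.
compute_s = total_m / 1e9       # time if the design is compute-bound
memory_s = total_w / 100e6      # time if every weight is streamed from memory
print("memory-bound" if memory_s > compute_s else "compute-bound")
```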
FIGURE 2: Top view of a VisualSim model of the AI hardware architecture. (A PCIe bus connects host and external-memory ports to the TPU model: Weight FIFO, Unified Buffer, Matrix Multiply Unit, Accumulators, and Activation, plus a processor latency probe. Parameters: 256 x 256 array, DDR external memory, unified on-chip buffer, 700 MHz TPU clock.)
The schedule from PowerPoint to first prototype for new AI processors is extremely short, and the first production sample cannot have any bottlenecks or bugs; hence, modeling becomes mandatory. Figure 2 shows an architectural model of a Google tensor processor. The processor receives requests from a host computer via a PCIe interface. MM, TG2, TG3, and TG4 are different request streams from independent hosts. The weights are stored in an off-chip DDR3 and called up into the Weight FIFO. The arriving requests are stored and updated in the Unified Local Buffer and sent to the Matrix Multiply Unit for processing.
FIGURE 3: Statistics for the architecture exploration tradeoff analysis. (Left: task latency versus simulation time for the MM, TG2, TG3, and TG4 request streams. Right: cycle-accurate timing diagram of the DRAM banks and the bidirectional DRAM bus.)
When the request has been processed through the AI pipeline, it is returned to the Unified Buffer to respond back to the host.

Processor Model Analysis
In Figure 3, you can view the latency and the back-propagation weights management in the off-chip DDR3. You will see that TG3 and TG4 were able to maintain a low latency until 200 µs and 350 µs, respectively, while MM and TG2 started to buffer early in the simulation. Because there is considerable buffering and the latency increases for this set of traffic profiles, the current TPU configuration is inadequate for the processing workloads. The higher priority of TG3 and TG4 helped sustain operations for a longer period.
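The behavior described above (higher-priority streams staying responsive while the others buffer) can be reproduced with a toy discrete-event model. The sketch below uses the open source simpy library rather than VisualSim, and every arrival rate and service time is an invented assumption: four request streams contend for one matrix-multiply unit, with TG3 and TG4 given higher priority.

```python
import simpy

ARRIVAL_US = 10   # one request per stream every 10 us (assumed)
SERVICE_US = 4    # time a request occupies the matrix-multiply unit (assumed)
latencies = {}

def handle(env, name, priority, mmu):
    """One request: queue for the MMU, then hold it for the service time."""
    t0 = env.now
    with mmu.request(priority=priority) as req:  # lower number = served first
        yield req
        yield env.timeout(SERVICE_US)
    latencies.setdefault(name, []).append(env.now - t0)

def stream(env, name, priority, mmu):
    """Open-loop traffic source, standing in for the MM/TG host streams."""
    while True:
        yield env.timeout(ARRIVAL_US)
        env.process(handle(env, name, priority, mmu))

env = simpy.Environment()
mmu = simpy.PriorityResource(env, capacity=1)
for name, prio in [("MM", 2), ("TG2", 2), ("TG3", 1), ("TG4", 1)]:
    env.process(stream(env, name, prio, mmu))
env.run(until=5_000)

# The offered load (4 x 4 us of service every 10 us) exceeds capacity, so the
# low-priority streams buffer and their latency grows, while TG3/TG4 stay low.
for name, vals in sorted(latencies.items()):
    print(f"{name}: mean latency {sum(vals) / len(vals):.1f} us")
```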
Automotive Design Construction
Today's automotive designs incorporate a number of safety and autonomous driving features that require a significant amount of machine learning and inferencing. The available time schedule determines whether the processing is done at the ECU or sent to a data center. For example, a braking decision can be made locally, while changing the air-conditioning temperature can be sent for remote processing. Both require some amount of artificial intelligence based on the input sensors and cameras. Figure 4 represents a portion of an ADAS network that connects to a high-performance NVIDIA Drive PX platform that contains multiple Arm cores and a GPU. In this model, the focus is on understanding the internal behavior of the SoC. The application is MPEG video capture, processing, and rendering that is triggered by vehicle camera sensors.

Automotive Design Analysis
Figure 5 shows AMBA bus and DDR3 memory statistics. You can see the distribution of the workload across multiple masters. The application pipeline can be evaluated for bottlenecks, identifying the highest cycle-time tasks, the memory usage profile, and the latency of each individual task. The use cases and traffic patterns are applied to the architecture model, which is assembled as a combination of hardware, RTOS, and networks. A periodic traffic profile is used to model the radars, lidars, and cameras.
FIGURE 4: VisualSim model of an autonomous driving and E/E architecture. (Automotive ADAS application using the NVIDIA Drive PX: CAN nodes on a 1 Mbps CAN bus, a serial switch, a multi-core Arm cluster with an Arm9 instruction-set model, and a GPU running an MPEG task pipeline of decode, pre-processing, Y conversion, histogram, post-processing, and render. Parameters include an 800 MHz processor, 8 KB I-cache, 16 KB D-cache, and 2000 MHz memory and bus speeds.)
FIGURE 5: Bus and memory activity report. (AXI bus display at t = 40 µs: per-master byte counts and bandwidth, e.g., AXI_Top_Master_1 writes 22,272 bytes (556.8 MBps), AXI_Top_Master_3 reads 12,224 bytes (305.6 MBps), AXI_Top_Master_5 reads 4,928 bytes (123.2 MBps), and AXI_Top_Master_7 reads 2,496 bytes (62.4 MBps), along with slave-side traffic and DDR memory controller totals.)
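As a quick sanity check on how the report's bandwidth figures are produced, throughput is simply bytes transferred divided by the 40 µs display window; the byte counts below are read directly from Figure 5.

```python
# Bandwidth = bytes / window. Values taken from the Figure 5 report.
window_s = 40e-6
for master, data_bytes in [("AXI_Top_Master_1 write", 22272),
                           ("AXI_Top_Master_3 read", 12224),
                           ("AXI_Top_Master_5 read", 4928),
                           ("AXI_Top_Master_7 read", 2496)]:
    print(master, f"{data_bytes / window_s / 1e6:.1f} MBps")
# 22272 bytes / 40 us = 556.8 MBps, matching the report exactly.
```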
The use case and traffic can be varied for the input rates, data sizes, processing time, priority, dependency, prerequisites, back-propagation loops, coefficients, task graph, and memory accesses, and can be simulated on the system model by varying the attributes. As a result, a variety of statistics and plots can be generated by the system model, including cache hit ratio, pipeline utilization, number of requests rejected, watts per instruction or task, throughput, buffer occupancy, and state diagrams.

Figure 6 shows the power consumption of both the system and the silicon. In addition to the heat dissipated, the battery charge consumption rate, and the battery lifecycle change, the model can capture dynamic power change. The model plots the state activity of each device, the associated instant spikes, and the average power of the system. Getting early feedback on the power consumption helps the thermal and mechanical teams design the casing and cooling methods. This early power information can also be used to examine power consumption and performance tradeoffs with a given architecture.
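A minimal sketch of that state-based power bookkeeping follows. The states, per-state power draws, and residencies are assumptions for illustration, not values from the VisualSim model.

```python
# Average power is the residency-weighted sum of per-state power draws.
states = {      # watts per device state (assumed)
    "active": 2.5,
    "standby": 0.6,
    "off": 0.05,
}
residency = {   # fraction of simulation time in each state (assumed)
    "active": 0.30,
    "standby": 0.50,
    "off": 0.20,
}

avg_power = sum(states[s] * residency[s] for s in states)
window_s = 0.1                            # 100 ms simulation window
energy_mj = avg_power * window_s * 1e3    # joules -> millijoules
print(f"average power: {avg_power:.2f} W, energy: {energy_mj:.1f} mJ")
```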
FIGURE 6: Measuring the power consumption in real time for an AI processor. (Power_Plot: instantaneous and average power, in watts, versus simulation time.)
Further Exploration Scenarios
Aside from autonomous driving, the use case outlined here could just as easily have been chatbots, search, large data manipulation, image recognition, or disease detection. The following additional examples highlight the use of AI architecture modeling and analysis:

1. An AI processor for deep learning and inferencing tasks defines a network-on-chip backbone comprising 32 cores, 32 accelerators, four HBM 2.0s, eight DDR5s, multiple DMAs, and full cache coherency. This model uses variations of RISC-V, Arm, and a proprietary core. By using an architecture model and retraining the network routing, 40 Gbps links were achieved while maintaining a low router frequency.
2. The memory required for a 32-layer deep neural network was reduced from 40 GB to less than 7 GB without changing the data throughput or response times. The model is set up with a functional flow diagram that accounts for the behavior of memory accesses for both processing and the back-propagation. For different data sizes and task graphs, the model determined the amount of data discarding and the off-chip DRAM sizing and SSD storage options. The task graph was varied with an arbitrary number of graphs and several inputs and outputs.
3. General-purpose SoCs using Arm processors and the AXI bus for low-cost AI processing were able to achieve the lowest power per watt while maximizing memory bandwidth. The multiply-accumulate functions were offloaded to vector instructions, encryption to an IP core, and the custom algorithms to
accelerators. The model was constructed with the explicit purpose of evaluating different cache-memory hierarchies to increase the hit ratio and bus topologies to reduce latency.
4. The throughput and power consumption of an analog-to-digital AI processor were accurately analyzed by modeling non-linear control tasks in a discrete-event simulator as a series of linear functions to accelerate simulation time. In this case, functionality was tested to check behavior and measure true power savings. EAI

Deepak Shankar is Founder of Mirabilis Design, Inc.
AI INFERENCING & ALGORITHMS
Embedded AI Algorithms – Going from Big Data to Smart Data By Dzianis Lukashevich, Analog Devices
Industry 4.0 applications generate a huge volume of complex data – big data. The increasing number of sensors and, in general, available data sources require the virtual view of machines, systems, and processes to be ever more detailed. This naturally increases the potential for generating added value along the entire value chain. At the same time, however, the question of how exactly this potential can be extracted keeps arising – after all, the systems and architectures for data processing are becoming more and more complex, and the number of sensors and actuators is constantly increasing. Only with relevant, high-quality, and useful data – smart data – can the associated economic potential be unfolded.

Challenges
Collecting all possible data and storing it in the cloud in the hope that it will later be evaluated, analyzed, and structured is still a widespread, but not particularly effective, approach. The potential for generating added value from the data remains unused, and finding a solution later on becomes more complex and costly. A better alternative is to make conceptual considerations early on to determine which information is relevant to the application and where in the data flow the information can be extracted. Figuratively speaking, this means refining the data – making smart data out of big data for the entire processing chain. At the application level, a decision can already be made regarding which AI algorithms have a high probability of success for the individual processing steps. This depends on boundary conditions such as the available data, the application type, the available sensor modalities, and background information about the lower-level physical processes.

For the individual processing steps, correct handling and interpretation of the data are extremely important for real added value to be generated from the sensor signals. Depending on the application, it may be difficult to interpret the discrete sensor data correctly and extract the desired information. Often the temporal behavior plays a role and has a direct effect on the desired information. In addition, the dependencies between multiple sensors must frequently be taken into account. For complex tasks, simple threshold values and manually determined logic are no longer sufficient or do not allow for automated adaptation to changing environmental conditions.

Embedded, Edge, or Cloud AI Implementation?
The overall data processing chain, with all the algorithms needed in each individual step, must be implemented in such a way that the highest possible added value can be generated. Implementation usually occurs at all levels – from the small sensor with limited computing resources through gateways and edge computers to large cloud computers. It is clear here that the algorithms should not be implemented at only one level. Rather, in most cases, it is more advantageous to implement the algorithms as close as possible to the sensor. Through this, the data are compressed and refined at an early stage, and the communication and storage costs are reduced. In addition, through early extraction of the essential information from the data, development of global algorithms at the higher levels is less complex. In most cases, algorithms from the streaming analytics area are also useful for avoiding unnecessary storage of data and, thus, high data transfer and storage costs. These algorithms use each data point only once: the complete information is extracted directly, and the data do not need to be stored.
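A classic building block for this one-pass style of processing is Welford's online algorithm for mean and variance. The sketch below is a generic illustration (the simulated sensor feed is an assumption, not code from the iCOMOX): each sample updates the running statistics exactly once and is then discarded.

```python
import random

class OnlineStats:
    """Welford's algorithm: O(1) memory regardless of stream length."""
    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, x):
        # Each data point is seen exactly once and never stored.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

stats = OnlineStats()
for _ in range(100_000):                    # stand-in for a vibration sensor feed
    stats.update(random.gauss(0.0, 0.25))
print(stats.mean, stats.variance)           # ~0.0 and ~0.0625, no samples buffered
```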
Analog Devices | www.analog.com | @adi_news | www.linkedin.com/company/analog-devices/ | @AnalogDevicesInc
FIGURE 1: Block diagram of the iCOMOX open platform.
Embedded Platform for Condition-Based Monitoring
AI technology as a whole is evolving quickly. Many companies are developing IP for turning big data into smart data and making their algorithms and designs available to the general market in the form of platforms. These platforms bring together hardware, software, and tools to enable OEMs to implement and deploy AI without having to develop the underlying AI infrastructure themselves.

For example, consider the Arm Cortex-M4F processor-based open embedded platform iCOMOX from Shiratech Solutions, Arrow, and Analog Devices. iCOMOX is a power-saving, integrated microcontroller system with integrated power management, as well as analog and digital sensors and peripheral devices for data acquisition, processing, control, and connectivity. The platform is well suited for local data processing and early refinement of data with state-of-the-art smart AI algorithms.
For wireless communications, the iCOMOX provides a solution with high reliability and robustness as well as extremely low power consumption. The SmartMesh IP network is composed of a highly scalable, self-forming/optimizing multihop mesh of wireless nodes that collect and relay data. A network manager monitors and manages the network performance and security and exchanges data with a host application. The intelligent routing of the SmartMesh IP network determines an optimum path for each individual packet in consideration of the connection quality, the schedule for each packet transaction, and the number of multihops in the communication link. Especially for wireless, battery-operated condition monitoring systems, embedded AI can help extract the full added value. Local conversion of sensor data to smart data by the AI algorithms results in a lower data flow and consequently less power consumption than is the case with direct transmission of raw sensor data to the edge or the cloud.
iCOMOX stands for intelligent condition monitoring box, and it can be used for entry into the world of structural health and machine condition monitoring based on vibration, magnetic fields, sound, and temperature analysis. On request, the platform can be supplemented with additional sensor modalities – for example, gyroscopes from Analog Devices for precise measurement of rotational speeds, even in environments with high shock and vibration loads (see Figure 1).
Range of Applications
Embedded AI has a wide range of applications in the field of monitoring machines, systems, structures, and processes – extending from the detection of anomalies to complex fault diagnostics and the immediate initiation of fault elimination. For example, through its integrated microphone, accelerometer, magnetic field sensor, and temperature sensor, a platform like iCOMOX enables monitoring of vibrations and noises, as well as other operating conditions, in diverse industrial machines and systems. Process states, bearing or rotor and stator damage, failure of the control electronics, etc., and even unknown changes in system behavior (for example, due to damage to the electronics) can be detected by AI. If behavior models are available for certain damages, these damages can even be predicted. Through this, maintenance measures can be taken at an early stage, and unnecessary damage-based failure can thus be avoided. If no predictive model exists, the embedded platform can also help subject matter experts successively learn the behavior of a machine and over time derive a comprehensive model of the machine for predictive maintenance. In addition, a platform can be used to optimize complex manufacturing processes to achieve a higher yield or better product quality.
The AI methods implemented in an embedded AI platform can deliver a better estimate of the current situation through so-called multisensor data fusion. In this way, various operating and fault conditions can be classified with better granularity and higher probability. Through smart signal processing closer to the edge, big data becomes smart data, making it necessary for only the data relevant to the application case to be sent to the edge or the cloud.
Embedded AI Algorithms for Smart Sensors
With data processing by AI algorithms, automated analysis is possible even for complex sensor data. Through this, the desired information and, thus, added value are automatically derived from the data along the data processing chain. Selection of an algorithm often depends on existing knowledge about the application. If extensive domain knowledge is available, AI plays a more supporting role and the algorithms
used are quite rudimentary. If no expert knowledge exists, the algorithms can be much more complex. In many cases, it is the application that defines the hardware and, through this, the limitations for the algorithms. For the model building, which is always a part of an AI algorithm, there are basically two different approaches: data-driven approaches and model-based approaches.

Anomaly Detection Using Data-Driven Approaches
If only data, but no background information that could be described in the form of mathematical equations, are available, then so-called data-driven approaches must be chosen. These algorithms extract the desired information (smart data) directly from the sensor data (big data). They encompass the full range of machine learning methods, including linear regression, neural networks, random forests, and hidden Markov models. A typical algorithm pipeline for data-driven approaches that can be implemented on embedded platforms such as the iCOMOX is composed of three components (see Figure 2): 1) data preprocessing, 2) feature extraction and dimensionality reduction, and 3) the actual machine learning algorithm.

During data preprocessing, the data are processed in such a way that the downstream algorithms, especially the machine learning algorithms, converge to an optimum solution within the shortest possible computational time. Missing data must be replaced using simple interpolation methods, in consideration of the time dependence and the interdependence between different sensor data. Furthermore, the data are modified by prewhitening algorithms in such a way that they appear to be mutually independent; as a result, there are no more linear dependencies within time series or between sensors. Principal component analysis (PCA), independent component analysis (ICA), and so-called whitening filters are typical algorithms for prewhitening.

During feature extraction, characteristics, also known as features, are derived from the preprocessed data. This part of the processing chain strongly depends on the actual application. Due to the limited computing power of embedded platforms, it is not yet possible here to implement computationally intensive, fully automated algorithms that evaluate the various features and use specific optimization criteria to find the best features – genetic algorithms would be included among these. Rather, for embedded platforms such as the iCOMOX that have low power consumption, the method used for extracting features must be specified manually for each individual application. The possible methods include transforming the data into the frequency domain (fast Fourier transformation), applying a logarithm to the raw sensor data, normalizing the accelerometer or gyroscope data, finding the largest eigenvectors in PCA, or performing other calculations on the raw sensor data. Different algorithms for feature extraction can also be selected for different sensors. A large feature vector containing all the relevant features from all of the sensors is obtained as a result.
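A hedged sketch of that feature-extraction stage follows: it transforms vibration windows into the frequency domain, applies a logarithm to the magnitudes, and reduces dimensionality with PCA, as described above. The window size and the synthetic data are illustrative assumptions, not iCOMOX firmware.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
windows = rng.normal(size=(200, 1024))           # 200 windows of accelerometer samples

spectra = np.abs(np.fft.rfft(windows, axis=1))   # frequency-domain representation
features = np.log1p(spectra)                     # logarithm compresses the magnitudes

pca = PCA(n_components=8)                        # keep the 8 strongest directions
reduced = pca.fit_transform(features)
print(reduced.shape)                             # (200, 8) feature vectors for the ML stage
```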
FIGURE 2
Data-driven approaches for embedded platforms.
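The front end of this pipeline is straightforward to prototype on a host machine before committing to firmware. The sketch below, in Python with NumPy, shows one plausible reading of the preprocessing and feature extraction stages; the window length, band count, and feature choices are illustrative assumptions, not the iCOMOX defaults.

```python
import numpy as np

def extract_features(window):
    """Convert one window of raw vibration samples into a compact
    feature vector: preprocessing, FFT-based features, and a simple
    per-band reduction, following the pipeline described above."""
    # Preprocessing: remove the mean so the spectrum is not dominated by DC
    x = window - np.mean(window)

    # Feature extraction: log-magnitude spectrum (fast Fourier transformation
    # plus a logarithm, two of the methods named in the text)
    log_spec = np.log1p(np.abs(np.fft.rfft(x)))

    # Simple dimensionality reduction: keep only the peak of each of
    # 8 frequency bands instead of every FFT bin
    band_peaks = np.array([b.max() for b in np.array_split(log_spec, 8)])

    # A couple of time-domain statistics round out the feature vector
    return np.concatenate([band_peaks, [x.std(), np.abs(x).mean()]])

# A 256-sample window of 1 kHz accelerometer data, as in the motor example
features = extract_features(np.random.randn(256))
```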
If the dimensionality of this vector exceeds a certain size, it must be reduced through dimensionality reduction algorithms. The minimum and/or maximum values within a certain window can simply be taken, or more complex algorithms such as the previously mentioned PCA or self-organizing maps (SOM) can be used for this purpose.

Only after the complete preprocessing of the data and the extraction of the features relevant to the respective application can the machine learning algorithms be optimally employed to extract the desired information right on the embedded platform. As was the case for feature extraction, the selection of the machine learning algorithm strongly depends on the concrete application. Fully automated selection of the optimum learning algorithm – for example, via genetic algorithms – is likewise not possible due to the limited computing power. However, even somewhat more complex neural networks, including the training phase, can be implemented on embedded platforms such as the iCOMOX. The decisive factor here is the limited available memory. For this reason, the machine learning algorithms, as well as all the other algorithms in the pipeline, must be modified in such a way that the sensor data are processed directly: each data point is used only once, all of the relevant information is extracted immediately, and the memory-intensive collection of large amounts of data – with its associated data transfer and storage costs – is eliminated. This type of processing is known as streaming analytics.

This algorithm pipeline was implemented on the iCOMOX and evaluated for anomaly detection in two different applications: condition-based monitoring of AC motors and trajectory monitoring of industrial robots. The algorithms were basically the same for both applications; only the parameterization differed, in that the time interval under consideration was short for motor monitoring and long for trajectory monitoring. Owing to the hardware limitations, different values were also derived for the remaining algorithm parameters. The accelerometer and gyroscope data, each sampled at 1 kHz, were used as input data. For the motor condition monitoring, the microphone data were also used, so as to include acoustic peculiarities and thereby improve the anomaly detection accuracy. The results of the local calculation on the embedded platform are shown in Figure 3 and Figure 4. In both examples, the accelerometer and gyroscope data, the locally derived features, and the locally calculated anomaly indicator are presented. The indicator increases sharply for new signal behavior and is much lower on reoccurrence; that is, the newly detected signal has been incorporated into the model by the learning algorithm.

FIGURE 3
Vibration monitoring in AC motors on an embedded platform.

FIGURE 4
Trajectory monitoring in industrial robots on an embedded platform.
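The one-pass character of streaming analytics, and the indicator behavior seen in Figures 3 and 4, can be made concrete with a short sketch. This is a deliberately simple stand-in for the article's unspecified learning algorithm: a running mean and variance, updated once per feature vector, with the normalized distance from the model serving as the anomaly indicator.

```python
import numpy as np

class StreamingAnomalyDetector:
    """One-pass anomaly indicator: each feature vector is seen exactly
    once, is scored against the current model, and then updates it.
    No raw data is stored, in the spirit of streaming analytics."""
    def __init__(self, n_features, alpha=0.01):
        self.mean = np.zeros(n_features)
        self.var = np.ones(n_features)
        self.alpha = alpha  # adaptation rate of the model

    def update(self, x):
        # Indicator: mean squared z-score of x under the current model;
        # novel signal behavior scores high
        z = (x - self.mean) / np.sqrt(self.var + 1e-9)
        indicator = float(np.mean(z ** 2))
        # Absorb the new observation, so the indicator is much lower
        # when the same behavior reoccurs
        self.mean += self.alpha * (x - self.mean)
        self.var += self.alpha * ((x - self.mean) ** 2 - self.var)
        return indicator

detector = StreamingAnomalyDetector(n_features=10)
scores = [detector.update(v) for v in np.random.randn(1000, 10)]
```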
Dynamic Pose Estimation Using Model-Based Approaches
Another, fundamentally different, approach is modeling by means of formulas and explicit relationships between the sensor data and the desired information. These approaches require physical background information or system behavior in the form of a mathematical description. These so-called model-based approaches combine the sensor data with this background information to yield a more precise result for the desired information. The best-known examples are the Kalman filter (KF) for linear systems and the unscented Kalman filter (UKF), the extended Kalman filter (EKF), and the particle filter (PF) for nonlinear systems. The selection of the filter strongly depends on the respective application.

A typical algorithm pipeline for model-based approaches that can be implemented on embedded platforms such as the iCOMOX is composed of three components (see Figure 5): 1) outlier detection, 2) a prediction step, and 3) a filtering step.

During outlier detection, sensor data far removed from the current estimate of the system condition are either down-weighted or excluded entirely from further processing. This makes the data processing more robust.

In the prediction step, the current system condition is updated over time. This is done with the help of a probabilistic system model that describes a prediction of the future system condition. The probabilistic system model is often derived from a deterministic system equation that describes the dependence of the future system condition on the current condition as well as on other input parameters and disturbances. In the industrial robot example considered here, this would be the dynamic equation of the individual articulated arms, which allow only certain directions of motion at any point in time.

In the filtering step, the predicted system condition is then combined with a given measurement, and the condition estimate is thereby updated. A measurement equation, the counterpart of the system equation, describes the relationship between the system condition and the measurement in a formula. For the pose estimation considered here, this is the relationship between the accelerometer and gyroscope data and the precise position of the sensor in space.

FIGURE 5
Model-based approaches for embedded platforms.
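To make the prediction and filtering steps concrete, here is a minimal linear Kalman filter in Python with NumPy. A real pose estimator fusing accelerometer and gyroscope data is nonlinear and would use an EKF or UKF as noted above; the state, matrices, and noise levels here are illustrative assumptions, not the implementation described in the article.

```python
import numpy as np

F = np.array([[1.0, 0.01], [0.0, 1.0]])  # system equation: constant velocity, dt = 10 ms
H = np.array([[1.0, 0.0]])               # measurement equation: we observe position only
Q = 1e-4 * np.eye(2)                     # process noise (model uncertainty)
R = np.array([[1e-2]])                   # measurement noise

x = np.zeros(2)   # state estimate: [position, velocity]
P = np.eye(2)     # state covariance

def kf_step(x, P, z):
    # Prediction step: propagate the state with the system model
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Filtering step: correct the prediction with the measurement z
    y = z - H @ x_pred                    # innovation
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
    return x_pred + K @ y, (np.eye(2) - K @ H) @ P_pred

for z in np.sin(np.linspace(0, 1, 100))[:, None]:  # simulated position readings
    x, P = kf_step(x, P, z)
```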
A combination of the data-driven and model-based approaches is both conceivable and advantageous for certain applications. The parameters of the underlying models for the model-based approaches can, for example, be determined through the data-driven approaches or dynamically adapted to the respective environmental conditions. In addition, the system condition from the model-based approach can become part of a feature vector for the data-driven approaches. All of this, however, strongly depends on the respective application.

The model-based pipeline was implemented on the iCOMOX and evaluated for precise dynamic pose estimation of an industrial robot end effector. Accelerometer and gyroscope data with a sampling rate of 200 Hz each were used as input data. The iCOMOX was attached to the end effector of the industrial robot, and its pose – consisting of position and orientation – was determined. The results are shown in Figure 6. Direct calculation reacts very quickly but produces a large amount of noise with numerous outliers. An IIR filter, as is commonly used in practice, yields a very smooth signal but follows the true pose poorly. In contrast, the algorithms presented here yield a very smooth signal whose estimated pose follows the motion of the end effector precisely and dynamically.

FIGURE 6
Precise dynamic angle estimation on an embedded platform. The implemented algorithm showed much better performance when compared to the direct calculation and IIR filtering.

Conclusion
Ideally, through local data analysis, the AI algorithms should be able to decide for themselves which sensors are relevant for a given application and which algorithm suits it best. That would mean smart scalability of the platform. At present, it is still the subject matter expert who must find the best algorithm for each application, even though the AI algorithms used here can already be scaled with minimal implementation effort across various machine condition and structural health monitoring applications.

The embedded AI should also judge the quality of the data and, if it is inadequate, find and apply the optimal settings for the sensors and the entire signal processing chain. If several different sensor modalities are fused, the weaknesses and disadvantages of certain sensors and methods can be compensated for by the AI algorithm, increasing data quality and system reliability. If the AI algorithm classifies a sensor as not, or only marginally, relevant to the application, its data flow can be throttled accordingly.

The open embedded platform iCOMOX from Shiratech Solutions, Arrow, and Analog Devices is available through Arrow and contains a free software development kit and numerous example projects for hardware and software, accelerating prototype creation, facilitating development, and helping realize original ideas. A robust and reliable wireless mesh network of smart sensors for condition-based monitoring can be created using multisensor data fusion and embedded AI. With it, big data is locally turned into smart data. EAI

Dzianis Lukashevich is the director of platforms and solutions at Analog Devices. His focus is on megatrends, emerging technologies, complete solutions, and new business models shaping the future of industries and transforming ADI business in the broad market. Dzianis joined ADI Sales and Marketing in Munich, Germany, in 2012. He received his Ph.D. in electrical engineering from Munich University of Technology in 2005 and an M.B.A. from Warwick Business School in 2016. He can be reached at dzianis.lukashevich@analog.com.
Open Standards for Accelerating Embedded Vision and Inferencing: An Industry Overview
By Neil Trevett, The Khronos Group
The ever-advancing field of machine learning has created new opportunities for deploying devices and applications that leverage neural network inferencing with never-before-seen levels of vision-based functionality and accuracy. But the rapidly evolving field has given rise to a confusing landscape of processors, accelerators, and libraries. Many interoperating pieces need to work together to train a neural network and deploy it successfully on an embedded, accelerated inferencing platform. Effective neural network training typically takes large data sets, uses floating-point precision, and is run on powerful GPU-accelerated desktop machines or in the cloud. Once training is complete, the trained neural network is ingested into an inferencing runtime engine optimized for fast tensor operations, or into a machine learning compiler that transforms the neural network description into executable code. Whether an engine or a compiler is used, the final step is to accelerate the inferencing code on one of a diverse range of accelerator architectures, from GPUs through to dedicated tensor processors.
So, how can industry open standards help streamline this confusing matrix of AI training and inferencing?

Creating Embedded Machine Learning Applications
The Khronos Group's area of expertise is creating open, royalty-free API standards that enable software applications, libraries, and engines to harness the power of silicon acceleration for demanding use cases such as 3D graphics, parallel computation, vision processing, and inferencing. As processor frequency scaling gives way to parallel programming as the most effective way to deliver the needed performance at acceptable levels of cost and power, Khronos standards are being used increasingly in the field of vision and inferencing acceleration (Figure 1).

Every industry needs open standards to reduce costs and time to market through increased interoperability between ecosystem elements. Open standards and proprietary technologies have complex and interdependent relationships. Proprietary APIs and interfaces are often the Darwinian testing ground and can remain dominant in the hands of a smart market leader – and that is as it should be. Strong open standards result from a wider industry need for a proven technology and can provide healthy, motivating competition. In the long view, an open standard that is not controlled by, or dependent on, any single company can often be the thread of continuity for the industry's forward progress as technologies, platforms, and market positions swirl and evolve.

Broadly, these standards can be divided into two groups: high-level and low-level. The high-level APIs focus on ease of programming with effective performance portability across multiple hardware architectures. In contrast, low-level APIs provide direct, explicit access to hardware resources for maximum flexibility and control. It is important that each project understand which level of API will best suit its development needs. Often, the high-level APIs use lower-level APIs in their implementation.

Let's take a look at some of these Khronos standards in more detail.
FIGURE 1
Khronos standards used in accelerating vision and inferencing applications and engines. Higher-level APIs offer streamlined programming and performance portability: graph-based vision and inferencing acceleration, and single-source C++ programming with compute acceleration. Lower-level APIs offer direct hardware control: GPU rendering plus compute acceleration, heterogeneous compute acceleration, a trained neural network exchange format for import, and an intermediate representation (IR) supporting parallel execution and graphics – all targeting GPUs, CPUs, FPGAs, DSPs, and custom hardware.
FIGURE 2
SYCL splits a standard C++ application into CPU and OpenCL-accelerated code. C++ templates and lambda functions separate host and accelerated device code: host code passes through the system's CPU compiler, while accelerated code is passed into device OpenCL compilers targeting CPUs, GPUs, FPGAs, DSPs, AI/tensor hardware, and custom hardware. C++ template libraries such as SYCL-BLAS, SYCL-DNN, SYCL-Eigen, and SYCL Parallel STL sit between application code and the SYCL compiler for OpenCL; kernel fusion can give better performance on complex apps and libraries than hand-coding, and complex ML frameworks can be directly compiled and accelerated.
SYCL – C++ Single-Source Heterogeneous Programming
SYCL (pronounced 'sickle') uses C++ template libraries to dispatch selected parts of a standard ISO C++ application to offload processors. SYCL enables complex C++ machine learning frameworks and libraries to be straightforwardly compiled and accelerated to performance levels that, in many cases, outperform hand-tuned code. As shown in Figure 2, by default SYCL is implemented over the lower-level OpenCL standard API: code for acceleration is fed into OpenCL, and the remaining host code passes through the system's default CPU compiler. There are an increasing number of SYCL implementations, some of which use proprietary back ends, such as NVIDIA's CUDA, for accelerated code. Significantly, Intel's new oneAPI Initiative contains a parallel C++ compiler called DPC++ that is a conformant SYCL implementation over OpenCL.

Neural Network Exchange Format (NNEF)
There are dozens of neural network training frameworks in use today – including Torch, Caffe, TensorFlow, Theano, Chainer, Caffe2, PyTorch, and MXNet – and all use proprietary formats to describe their trained networks. There are also dozens, maybe even hundreds, of embedded inferencing processors hitting the market. Forcing that many hardware vendors to understand and import so many formats is a classic fragmentation problem that can be solved with an open standard. The NNEF file format is targeted at providing an effective bridge between the worlds of network training and inferencing silicon – where Khronos' proven multi-company
governance model gives the hardware community a strong voice in how the format evolves to meet the needs of companies developing processor toolchains and frameworks, often in safety-critical markets. NNEF is not the industry's only neural network exchange format: ONNX is an open source project founded by Facebook and Microsoft and is a widely adopted format primarily focused on the interchange of networks between training frameworks. NNEF and ONNX are complementary, as ONNX tracks rapid changes in training innovations and the machine learning research community, while NNEF is targeted at embedded inferencing hardware vendors that need a format with a more considered roadmap evolution. Khronos has initiated a growing open source tools ecosystem around NNEF, including importers and exporters for key frameworks and a model zoo to enable hardware developers to test their inferencing solutions.

OpenVX – Portable Accelerated Vision Processing
OpenVX (VX stands for 'vision acceleration') streamlines the development of vision and inferencing software by providing a graph-level abstraction that lets a programmer construct the required functionality by connecting a set of functions, or 'Nodes'. This high level of abstraction enables silicon vendors to very effectively optimize their OpenVX drivers for efficient execution on almost any processor architecture. Over time, OpenVX has added inferencing functionality alongside the original vision Nodes – neural networks are just another graph, after all! There is growing synergy between OpenVX and NNEF through the direct import of NNEF-trained networks into OpenVX graphs. OpenVX 1.3, released in October 2019, enables carefully selected subsets of the specification targeted at vertical market segments, such as inferencing, to be implemented and tested as officially conformant. OpenVX also has a deep integration with OpenCL that enables a programmer to add their
own custom-accelerated Nodes for use within an OpenVX graph – providing a unique combination of easy programmability and customizability.

OpenCL – Heterogeneous Parallel Programming
OpenCL is a low-level standard for cross-platform, parallel programming of the diverse heterogeneous processors found in PCs, servers, mobile devices, and embedded devices. OpenCL provides C- and C++-based languages for constructing kernel programs that can be compiled and executed in parallel across any processors in a system with an OpenCL compiler, giving the programmer explicit control over which kernels are executed on which processors. The OpenCL runtime coordinates the discovery of accelerator devices, compiles kernels for selected devices, executes the kernels with sophisticated levels of synchronization, and gathers the results, as illustrated in Figure 3.
FIGURE 3
OpenCL enables C or C++ kernel programs to be compiled and executed in parallel across any combination of heterogeneous processors. Kernel code is compiled for the target devices (GPUs, CPUs, DSPs, FPGAs, AI hardware), and the runtime API on the CPU host loads and executes the kernels across those devices.
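The host-side flow Figure 3 describes – discover a device, compile a kernel, execute, gather results – fits in a short script. This sketch uses pyopencl, a Python binding for the OpenCL host API; the ReLU kernel is an illustrative stand-in for the kind of tensor operation an inferencing engine might dispatch.

```python
import numpy as np
import pyopencl as cl

# Device discovery and a command queue for kernel execution
ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

# A kernel program in OpenCL C: one work-item per tensor element
program = cl.Program(ctx, """
__kernel void relu(__global const float *x, __global float *y) {
    int i = get_global_id(0);
    y[i] = fmax(x[i], 0.0f);
}
""").build()  # compiled for whichever device the context selected

x = np.random.randn(4096).astype(np.float32)
mf = cl.mem_flags
x_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=x)
y_buf = cl.Buffer(ctx, mf.WRITE_ONLY, x.nbytes)

# Execute across 4096 work-items, then gather the result back to the host
program.relu(queue, x.shape, None, x_buf, y_buf)
y = np.empty_like(x)
cl.enqueue_copy(queue, y, y_buf)
assert (y >= 0).all()
```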
OpenCL is used pervasively throughout the industry as the lowest 'close-to-metal' execution layer for compute, vision, and machine learning libraries, engines, and compilers. OpenCL was originally designed for execution on high-end PC and supercomputer hardware but, in an evolution similar to OpenVX's, the processors needing OpenCL are getting smaller, with less precision, as they target edge vision and inferencing. The OpenCL working group is working to define functionality tailored to embedded processors and to enable vendors to ship selected functionality targeted at key power- and cost-sensitive use cases with full conformance. EAI

Neil Trevett is Vice President of Developer Ecosystems at NVIDIA, where he helps enable applications to take advantage of advanced GPU and silicon acceleration. Neil is also the elected President of the Khronos Group.
VOICE PROCESSING
A Step by Step Guide to Voice-Enabled Device Testing
By Keyur Shah & Dhaval Patel, eInfochips
It has been said that devices cannot do everything that humans can do. However, the devices we use in our daily lives have been evolving over the last couple of decades, and we have seen significant changes in them in terms of functionality, connectivity, and size. One of the biggest challenges has been the size of the device, as ever more effort goes into achieving smaller form factors. A few years ago, a new challenge emerged: devices cannot communicate like humans. This led to standalone devices being transformed into connected devices with added voice-enabled operation.
The Major Challenge of Voice-Enabled Device Testing
Day-by-day usage of voice-enabled devices around the world is increasing rapidly. With these devices deployed worldwide and supporting more than 1,000 languages – across different accents, genders, and age-dependent voice modulations – verifying voice-enabled devices is hard. It is almost impossible to test these devices against so many combinations and permutations in a short span of time. So, let us see how we can automate the testing of voice-enabled devices.

Automating Voice-Enabled Device Testing
To avoid manual testing effort, we need to design an automation solution that can test these devices in different languages. The easiest option is to work with frameworks that help develop automation scripts for such voice-integrated devices. As of now, no open-source framework on the market provides all of the features required to test integration with voice-enabled devices. The challenges are how to give a command to the device in different languages, how to read the response from the device, and how to test the expected output.

›› To give a command to a device without manual effort, one needs to identify a command in text format.
›› Convert the text into audio format.
›› Play the audio so that the voice-enabled device can listen and process it.
›› Wait for a response from the device, record it in an audio file, and, as the last step, convert this audio into text and match it against the expected result.

Each device testing procedure will have custom requirements; hence, the framework has to be modular. We need to design a modular and scalable framework in which each step of this solution can be implemented with open-source or paid libraries available on the market. We have designed four modules in the framework below:

›› Multi-Language Text Module: To convert text from one language to another
›› Text-Audio Module: To convert text to .mp3
›› Audio-Text Module: To convert .wav to text
›› Audio Module:
• To play an .mp3 file using an audio output device
• To read audio data using a mic
• To save audio data to a .wav file

The detailed solution is outlined below in six simple steps.

›› Step 1. Prepare the Device Command in English – Use the Multi-Language Text Module to convert the device command into a language that can be understood by the device. It uses translation services provided by Google, with which you can translate text from any source language to any desired language.
›› Step 2. Create an Audio File for the Translated Text – Use the Text-Audio Module to convert text to audio. The generated audio can be played on the audio output device. This module uses the Google text-to-speech service in the backend.
›› Step 3. Play Audio – Use the Audio Module to play the .mp3 file on an audio output device. This step requires the audio output device and the voice-enabled device to be in proximity, so that when the audio is played, the device can capture the audio and process the command.
›› Step 4. Record Audio – This step captures the response from the voice-enabled device. Use the Audio Module to capture the recording data from the mic. You
need to define the duration and have the module return audio sample data. Once sample data is available, it needs to be saved as a .wav audio file. To achieve this, the save_audio_to_file method can be used. This method takes sample audio data and writes it to a .wav file, which can later be played on an audio device or converted to text.
›› Step 5. Convert Captured Audio to Text – Use the Audio-Text Module to convert the .wav file to text content. This is achieved using a speech recognizer. You should specify the input .wav file and the audio content language. To convert audio to text, third-party libraries provided by various vendors can be used.
›› Step 6. Translate the resulting text to English and verify it against the expected result in English.

Using the above four modules, one can implement voice automation for integrated voice-based devices; the sketch below shows how the six steps map onto a few lines of scripting.
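A minimal sketch of such a script follows, in Python. The package choices are assumptions for illustration, not the authors' exact stack: gTTS for Google text-to-speech, playsound for playback, and SpeechRecognition (with its Google recognizer) for capturing and transcribing the device's reply. Step 1's translation is omitted here; the command is assumed to already be in the target language.

```python
from gtts import gTTS                # Step 2: text -> .mp3 via Google TTS
from playsound import playsound     # Step 3: play audio on the default output
import speech_recognition as sr     # Steps 4-5: mic capture + speech-to-text

def ask_device(command_text, lang="en", listen_seconds=5):
    # Step 2: create an audio file for the command text
    gTTS(text=command_text, lang=lang).save("command.mp3")

    # Step 3: play it within earshot of the voice-enabled device
    playsound("command.mp3")

    # Step 4: record the device's spoken response from the microphone
    recognizer = sr.Recognizer()
    with sr.Microphone() as mic:
        audio = recognizer.listen(mic, phrase_time_limit=listen_seconds)

    # Step 5: convert the captured audio to text
    return recognizer.recognize_google(audio, language=lang)

# Step 6: verify the response against the expected result
response = ask_device("Alexa, turn on the light")
assert "turn" in response.lower() and "on" in response.lower()
```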
Real-World Testing of a Home Automation Product
Home automation systems consist of various devices that can be operated over the web using REST services. Security cameras, lights, thermostats, sensors, and doorbells are a few examples. The end user – the homeowner – can, for instance, turn a light on or off remotely using internet portals. Some systems provide integration with third-party partners like Alexa, Google, etc., whose devices can listen to the human voice and perform the action requested by the user. So, taking a light as the home automation product and Alexa as the third-party partner to the home automation system provider, we want to test whether the light can be turned on and off through Alexa. To automate this end-to-end scenario, we perform the steps below using the previously discussed automation framework.

FIGURE 1
This diagram illustrates the test case for a voice-enabled home automation system.

1. Prepare the Alexa command to turn on the light in English: "Alexa, turn on the light."
2. Convert the command to an .mp3 file.
3. Play the .mp3 file near the Alexa device using the speaker attached to the automation machine.
4. Record the response from Alexa in a .wav file.
5. Convert the audio file to text, which could be "OK, turning on the light" or "The light is turned on."
6. Verify the converted text against the expected result set.
7. In verification, we can also verify the actual IoT light status using:
   a. REST APIs to fetch the light status from the home security system.
   b. Web automation of security portals to verify the light status.
   c. If the light status is stored in a cloud database, a query to fetch the status from the database.
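For step 7a, the status check can be a single REST call. Everything specific below – the endpoint, the token, and the JSON field – is hypothetical, since each home automation vendor exposes its own API.

```python
import requests

BASE_URL = "https://api.example-home.com/v1"  # hypothetical vendor endpoint

def light_is_on(device_id: str, token: str) -> bool:
    # Fetch the device record and read its reported state
    resp = requests.get(
        f"{BASE_URL}/devices/{device_id}",
        headers={"Authorization": f"Bearer {token}"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json().get("state") == "on"  # assumed response field

# After the voice command, confirm the light actually changed state
assert light_is_on("living-room-light", "TEST_TOKEN")
```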
Conclusion
Using the above steps, one can not only perform system integration or end-to-end testing with a single voice-enabled device, but can also test the system by combining multiple clients or devices. Users can perform an action on one device or product using Alexa and verify its status using Google or a portal, or vice versa. For example, a user could ask Google to "turn on the light" and then fetch the light status using Alexa or customer portals.

eInfochips is a preferred partner for product companies requiring comprehensive test coverage from devices to applications. eInfochips offers extensive cost and effort savings through test automation, software development engineer in test (SDET) services, shift-left testing, and DevOps. EAI

Dhaval Patel has more than 15 years of IT experience in software design, development, and automation testing, creating products for IoT-enabled platforms and the home security and automation domains.

Keyur Shah is a senior tech lead with more than 11 years of experience in eInfochips' Product Engineering Services business unit. He has experience across the full product lifecycle using technologies like Java, Qt C++, JavaScript, Python, Git, Jenkins, and Jira.
Embedded AI & Machine Learning Resource Guide
Applications: Industrial Automation/Control
USB3-104-HUB – Rugged, Industrial Grade, 4-Port USB 3.1 Hub
Designed for the harshest environments, this small industrial/military grade 4-port USB 3.1 hub features extended temperature operation (-40°C to +85°C), locking USB and power connections, and an industrial steel enclosure for shock and vibration mitigation. The OEM version (board only) is PC/104-sized and can easily be installed in new or existing PC/104-based systems as well. The USB3-104-HUB makes it easy to add USB-based I/O to your embedded system or to connect peripherals such as external hard drives, keyboards, GPS, wireless, and more.
Real-world markets include Industrial Automation, Security, Embedded OEM, Laboratory, Kiosk, Military/Mission Critical, Government, and Transportation/Automotive.
This versatile four-port hub can be bus powered or self (externally) powered. You may choose from two power inputs (power jack and terminal block) to provide a full 900mA source at 5V on each of the downstream ports. Additionally, a wide-input power option exists to accept from 7VDC to 28VDC. All type A and type B USB connections feature a locking, high-retention design.
ACCES I/O Products, Inc. www.accesio.com
FEATURES
• Rugged, industrialized, four-port USB 3.1 hub
• USB 3.1 Gen 1 with data transfers up to 5Gbps (USB 2.0 and 1.1 compatible)
• Extended temperature (-40°C to +85°C) for industrial/military grade applications
• Locking upstream, downstream, and power connectors prevent accidental disconnects
• SuperSpeed (5Gbps), Hi-speed (480Mbps), Full-speed (12Mbps), and Low-speed (1.5Mbps) transfers supported
• Supports bus-powered and self-powered modes, accessible via DC power input jack or screw terminals
• LED for power, and per-port RGB LEDs to indicate overcurrent fault, High-Speed, and SuperSpeed
• Wide input external power option accepts from 7-28VDC
• OEM version (board only) features PC/104 module size and mounting compatibility
contactus@accesio.com
linkedin.com/company/acces-i-o-products-inc.
858-550-9559 twitter.com/accesio
mPCIe-ICM Family PCI Express Mini Cards
The mPCIe-ICM Series isolated serial communication cards measure just 30 x 51 mm and feature a selection of 4 or 2 ports of isolated RS232/422/485 serial communications. 1.5kV isolation is provided port-to-computer and 500V isolation port-to-port on ALL signals at the I/O connectors. The mPCIe-ICM cards have been designed for use in harsh and rugged environments such as military and defense, along with applications such as health and medical, point-of-sale systems, kiosk design, retail, hospitality, automation, and gaming.
The RS232 ports provided by the card are 100% compatible with every other industry-standard serial COM device, supporting TX, RX, RTS, and CTS. The card provides ±15kV ESD protection on all signal pins to protect against costly damage to sensitive electronic devices due to electrostatic discharge. In addition, the cards provide Tru-Iso™ port-to-port and port-to-PC isolation. The serial ports on the device are accessed using a low-profile, latching, 5-pin Hirose connector. Optional breakout cables are available and bring each port connection to a panel-mountable DB9-M with an industry-compatible RS232 pin-out.
The mPCIe-ICM cards were designed using type 16C950 UARTs and use 128-byte transmit/receive FIFO buffers to decrease CPU loading and protect against lost data in multitasking systems. New systems can continue to interface with legacy serial peripherals, yet benefit from the use of the high-performance PCI Express bus. The cards are fully software compatible with current PCI 16550-type UART applications and allow users to maintain backward compatibility.
ACCES I/O Products, Inc. www.accesio.com
FEATURES
• PCI Express Mini Card (mPCIe) type F1, with latching I/O connectors
• 4- or 2-port mPCIe RS232/422/485 serial communication cards
• Tru-Iso™ 1500V isolation port-to-computer and 500V isolation port-to-port on ALL signals
• High performance 16C950 class UARTs with 128-byte FIFO for each TX and RX
• Industrial operating temperature (-40°C to +85°C) and RoHS standard
• Supports data communication rates as high as 3Mbps – 12MHz with custom crystal
• Custom baud rates easily configured
• ±15kV ESD protection on all signal pins
• 9-bit data mode fully supported
• Supports CTS and RTS handshaking
contactus@accesio.com linkedin.com/company/acces-i-o-products-inc.
858-550-9559 twitter.com/accesio
ADLEPC-1700 Compact Industrial PC
The newly released ADLEPC-1700 is a rugged, compact, industrial-grade computer with highly expandable, customer-configurable I/O options. It is constructed from 6063 aluminum with a thick-walled design and a fanless, conduction-cooled CPU for industrial temperature operation. Its compact size and light weight make it ideal for deployment into existing industrial infrastructure such as alternative energy, smart power grid, and Industry 4.0 plants and refineries for AI-driven applications.
Custom I/O and Power
The ADLEPC-1700 is highly customizable and can easily be adapted to particular customer needs, including Wi-Fi, CAN, RS232/422/485, MIL-COTS power, MIL-STD-1553, ARINC, and much more.
FEATURES
• Small, compact footprint
• Strong, customer-customizable I/O options
• Intel® E3900-Series Atom processors
• Wide temperature operation
• Up to 15-year availability
• Onboard and mPCIe expansion features available
• Custom options: company logos, paint, and designs available
APPLICATIONS
• Industrial IoT (IIoT) Network and Cloud Computing
• Cyber Security Edge Devices for ICS and SCADA threat security
• Secure Networking (Secure Routing, Traffic Monitoring, and Gateways)
• Intelligent Machinery and Equipment Controllers
• Unmanned or Autonomous Vehicle Mission/Payload Computing
• Traffic Engineering
• Wind Turbine Datalogging and Collision Avoidance
• Oil and Gas IPC Controller

ADL Embedded Solutions – Smarter By Design. CONTACT US FOR MORE INFORMATION
ADL Embedded Solutions, Inc. www.adl-usa.com
sales@adl-usa.com
www.linkedin.com/company/adl-embedded-solutions
855-727-4200
@ADLEmbedded
ARK-1551 Intel® 8th Generation Core™ i5/Celeron Slim Fanless Computer
ARK-1551 is a compact, high-performance fanless computer featuring an Intel® Core™ i5-8365UE/Celeron® 4305UE system-on-chip (SoC). It supports multiple I/O to facilitate easy integration in harsh, space-limited environments. This embedded computer complies with global certifications (CE/FCC/UL/CB/BSMI/CCC), accepts 12~24V power (-10%/+20% input), and features a vibration/shock-resistant design capable of functioning across broad temperature ranges (-20 ~ 55 °C/-4 ~ 131 °F). When paired with Advantech's edge AI VEGA modules, this fanless computer can function as an AI inference system that satisfies demands for versatile AI image recognition. Because of its compact size, excellent computing power, durability, and ability to function in harsh environments, ARK-1551 is ideal for applications in factory/machine automation control cabinets, industrial equipment computers, and semi-outdoor signage.
FEATURES
• 8th Gen Intel® Core™ i5/Celeron slim fanless computer
• Dual DDR4 2400MHz SO-DIMM memory, up to 32GB
• 1 x swappable 2.5" SATA HDD drive bay and 1 x mSATA slot; supports Intel software RAID 0/1
• 4 x USB 3.1 Gen 2, 2 x Intel GbE LAN, 4 x RS-232/422/485, 8-bit GPIO
• 4K2K HDMI and VGA dual independent displays
• 1 x full-size mPCIe with SIM holder and 1 x M.2 2230 E-Key
Data sheet: https://advdownload.advantech.com/productfile/PIS/ARK-1551/file/ARK-1551_DS(073120)20200731110950.pdf Specifications: https://www.advantech.com/products/1-2jkbyz/ark-1551/mod_47d30ee7-28b6-41bc-83a1-a7ca416e68cd
Advantech Corporation
https://advantechusa.com/automation/
eisystems@advantech.com www.linkedin.com/company/advantechusa/
twitter.com/Advantech_USA 949-420-2500
A FINE TECHNOLOGY GROUP
cPCI, PXI, VME, Custom Packaging Solutions
VME and VME64x, CompactPCI, or PXI chassis are available in many configurations from 1U to 12U, 2 to 21 slots, with many power options up to 1,200 watts. Dual hot-swap is available in AC or DC versions. We have in-house design, manufacturing capabilities, and in-process controls. All Vector chassis and backplanes are manufactured in the USA and are available with custom modifications and the shortest lead times in the industry.
Series 2370 chassis offer the lowest profile per slot. Cards are inserted horizontally from the front, and an 80mm rear I/O backplane slot configuration is also available. Chassis are available from 1U, 2 slots up to 7U, 12 slots for VME, CompactPCI, or PXI. All chassis are IEEE 1101.10/11 compliant with hot-swap, plug-in AC or DC power options.
Our Series 400 enclosures feature side-filtered air intake and rear exhaust for up to 21 vertical cards. Options include hot-swap, plug-in AC or DC power, and a system voltage/temperature monitor. Embedded power supplies are available up to 1,200 watts.

Series 790 is MIL-STD-461D/E compliant and certified, economical, and lighter weight than most enclosures available today. It is available in 3U, 4U, and 5U models up to 7 horizontal slots.

FEATURES
• Made in the USA
• Most rack accessories ship from stock
• Modified 'standards' and customization are our specialty
• Card sizes from 3U x 160mm to 9U x 400mm
• System monitoring option (CMM)
• AC or DC power input
• Power options up to 1,200 watts
VISIT OUR NEW WEBSITE! WWW.VECTORELECT.COM
QUALITY SYSTEMS PACKAGING AND PROTOTYPE PRODUCTS
For more detailed product information, please visit www.vectorelect.com or call 1-800-423-5659 and discuss your application with a Vector representative.
Vector Electronics & Technology, Inc. www.vectorelect.com
Made in the USA Since 1947
inquire@vectorelect.com
800-423-5659
Applications: Medical
Video Analytics Solution
FEATURES
• Advanced AI analytics software
• Integrates with existing security cameras
• Provides actionable intelligence in health, retail, manufacturing, and logistics
• AMD EPYC CPU
• Up to 2TB DDR4 memory
• Up to two M.2 storage devices
• Up to three PCI Express (PCIe) slots, tested with NVIDIA GPU cards
The SMART Edge Server (SE1700) combines with hosted AI visual analysis software to create a video analytics platform that can be used in a wide range of industries including retail, manufacturing, transportation, logistics, public safety, and healthcare. A single 1U server can process up to eight cameras and, in a retail implementation, offers brick-and-mortar retailers the potential for the same level of shopper insights as the most sophisticated e-commerce sites. Retailers can understand their customers, refine their product offerings, optimize store layouts, and streamline operations – all without storing personally identifiable information.

The AI video analytics software powered by the SE1700 SMART Edge Server can also be applied to infection screening in workplaces, buildings, and public spaces. It can integrate into existing security systems to widen their use with functions such as mask wearing, hand sanitization, social distancing, cleaning routines, and contact tracing. Using thermal cameras, it can also provide highly accurate non-contact elevated-body-temperature readings for viral infection screening.

In manufacturing and industry, this video analytics solution can simplify inventory management, enable predictive maintenance, detect product faults before the product has left the factory, and help further optimize manufacturing processes.
www.smartembedded.com/ec/solutions/retail-video-analytics
SMART Embedded Computing www.smartembedded.com
news@smartembedded.com
602-438-5720
www.linkedin.com/company/smart-embedded-computing/
Edge AI
DynaCOR 40-35 – Rugged Data Logger for Edge AI
The DynaCOR 40-35 is a high-performance rugged data logger for edge deployment. Featuring up to 123TB of NVMe storage, 80 Gb/s of sustained write speed, and 100Gb/s interfaces, it is designed to support sensor fusion requirements in data-intensive applications that demand high-speed raw data transfer from a variety of sensors: cameras, RADAR, LIDAR, etc.
The DynaCOR 40-35 is available in pre-configured and custom variants, which allows it to meet the requirements of industrial use cases. Eurotech Professional Services are available to customers needing application-specific peripherals and/or dedicated interfaces such as CAN bus, Camera Link, or MIPI CSI-2 for ultra-high-speed cameras and RADARs. Eurotech Professional Services also provide a complete path to obtaining additional application-specific certifications.
The DynaCOR family has a proven track record of reliable operation in harsh environmental conditions such as those found in manufacturing facilities and mobile installations. The DynaCOR 40-35 is part of the Eurotech family of HPEC products, so it can naturally be integrated with other products such as the DynaCOR 50-35 and the high-performance switches DynaNET 100G-01 and DynaNET 10G-01 to build high-performance computing, data logging, and networking architectures for unmatched performance in edge applications where space is at a premium.
An advanced and original direct hot-water cooling system keeps all the internal components at an ideal temperature regardless of the environmental conditions. This allows much better performance and density compared to traditional fan-based cooling systems, and permits deployment in recesses without air flow/exchange or in applications where noise is not acceptable. Moreover, the DynaCOR 40-35 features a wide-range power supply (9-58 VDC), and a system management and monitoring unit allows safe boot, operation, and system shutdown. The system is designed and certified to endure wide temperature ranges, strong vibration and humidity levels, and unstable power supplies like those found in industrial applications.
FEATURES
• High-performance data logging for Edge AI
• 123TB NVMe storage
• 80 Gb/s sustained write speed
• Liquid cooled
• Rugged for heavy-duty applications
• Customizable
www.eurotech.com/en/products/high-performance-embedded-computing-hpec/hpec-systems/dynacor-40-35
Eurotech
www.eurotech.com
welcome@eurotech.com www.linkedin.com/company/eurotech
0433 485 411 @eurotechfan
Hardware Modules/Systems for Machine Learning
Faster innovation and time to market
With three optimized form factors for designers to choose from, congatec delivers a simpler, efficient way to harness the benefits of 8th Gen Intel® Core™ U-series processors for IoT. These products draw from congatec's deep expertise in embedded and industrial design to offer an enriched feature set, along with long product availability, hardware and software customization, and value-added design support. As a result, OEMs and ODMs can build high-performing solutions with less development time and cost.

Performance at the edge
Specially designed for embedded use conditions in which space and power are limited, 8th Gen Intel Core U-series processors provide high performance for edge devices with up to four cores. This enables a wide range of designs at 15W TDP, configurable down to 12.5W.

congatec products based on these processors deliver high-quality visual, audio, and compute capabilities with integrated graphics and high-definition media support:
• Ensure exceptional graphics performance while helping lower BOM costs with integrated Gen 9.5 Intel® Graphics with up to 24 execution units.
• Deliver on rising expectations for video performance with 4K/UHD content support, plus accelerated 4K hardware media codecs. Designs can support up to three displays.
• Develop media and video applications with the Intel® Media SDK, which provides tools and an API enabling hardware acceleration for fast video transcoding, image processing, and media workflows.
• Create better audio experiences with enhanced speech and audio quality from microphones, voice activation and wake from standby, and enhanced playback with Intel® Smart Sound Technology and Intel's programmable quad-core audio DSP, designed for low power consumption.

congatec products based on 8th Gen Intel Core U-series processors also help bring artificial intelligence (AI) to more places. With high processing and integrated graphics performance, combined with the optimized Intel® Distribution of OpenVINO™ toolkit, these processors improve inference capabilities like facial recognition, license plate recognition, people counting, and fast and accurate anomaly detection on manufacturing lines.

FEATURES
The new conga-TC370 COM Express Type 6 modules, the conga-JC370 embedded 3.5-inch SBCs, and the conga-IC370 Thin Mini-ITX motherboards all feature:
• The latest Intel® Core™ i7, Core™ i5, Core™ i3, and Celeron embedded processors, with long-term availability of 10+ years.
• Memory designed to match the demands of consolidating multi-OS applications on a single platform: two DDR4 SO-DIMM sockets with up to 2400 MT/s are available, for a total of up to 64GB.
• Native USB 3.1 Gen2 support with transfer rates of 10 Gbps, making it possible to transfer even uncompressed UHD video from a USB camera or any other vision sensor.
• Support for a total of 3 independent 60Hz UHD displays with up to 4096x2304 pixels, as well as 1x Gigabit Ethernet (with TSN support).
• All this and many more interfaces at an economical 15W TDP that is scalable from 10W (800 MHz) to 25W (up to 4.6 GHz in Turbo Boost mode).
www.congatec.us
congatec
www.congatec.us
sales-us@congatec.com www.linkedin.com/company/congatec
858-457-2600 twitter.com/congatecAG
Solid State Storage and Memory
Industrial-Grade Solid State Storage and Memory
Virtium manufactures solid state storage and memory for the world's top industrial embedded OEM customers. Our mission is to develop the most reliable storage and memory solutions with the greatest performance, consistency, and longest product availability.
Industry Solutions include: Communications, Networking, Energy, Transportation, Industrial Automation, Medical, Smart Cities, and Video/Signage.
StorFly® SSD Storage includes: M.2, 2.5", 1.8", Slim SATA, mSATA, CFast, eUSB, Key, PATA CF, and SD.
Classes include: MLC (1X), pSLC (7X) and SLC (30X) – where X = number of entire drive-writes-per-day for the 3/5-year warranty period.

Memory Products include: All DDR, DIMM, SODIMM, Mini-DIMM, Standard and VLP/ULP. Features server-grade, monolithic components, best-in-class designs, and conformal coating/under-filled heat sink options.

New! XR (Extra-Rugged) Product Line of SSDs and Memory: StorFly-XR SSDs enable multi-level protection in remote, extreme conditions that involve frequent shock and vibration, contaminating materials, and/or extreme temperatures. Primary applications are battlefield technology, manned and unmanned aircraft, command and control, reconnaissance, satellite communications, and space programs. They are also ideal for transportation and energy applications, and are currently available in 2.5" and Slim-SATA formats. XR SSDs include custom ruggedization of key components, such as ultra-rugged connectors and screw-down mounting, and, when ordered with added BGA under-fill, can deliver unprecedented durability beyond that of standard MIL-810-compliant solutions. XR-DIMM memory modules have the same extra-rugged features as the SSDs and include heatsink options and 30μ" gold connectors. They also meet US RTCA DO-160G standards.

Features
• 22 years refined U.S. production and 100% testing
• Broad product portfolio from latest technology to legacy designs
• A+ quality – backed by verified yield, on-time delivery and field-defects-per-million reports
• Extreme durability, iTemp -40º to 85º C
• Industrial SSD Software for security, maximum life and qualification
• Longest product life cycles with cross-reference support for end-of-life competitive products
• Leading innovator in small-form-factor, high-capacity, high-density, high-reliability designs
• Worldwide Sales, FAE support and industry distribution

Virtium
www.virtium.com
sales@virtium.com
www.linkedin.com/company/virtium
949-888-2444
twitter.com/virtium