Bringing Real-Time Object Detection to MCUs with Edge Impulse FOMO
By Jan Jongboom, Edge Impulse
We humans rely heavily on sight to perform many daily tasks, from some of the most basic to the most complex. With one look, we know if there are people in the room with us, if there’s an elephant nearby, or how many free parking spaces are available. Despite the importance of vision, though, many embedded devices still can’t perceive things visually. Wouldn’t it be amazing if we could teach all our devices to see the world the way we do?
In recent years, there have been some amazing developments in computer vision, fueling progress in things like self-driving cars and biometric immigration gates (very useful if, like me, you travel a lot!). But these use cases are incredibly computationally expensive, requiring costly GPUs or special accelerators to run.
The awesome thing is that not all computer-vision tasks require such intensive compute. Any yes/no question (“Do I see an elephant?,” “Is this label properly attached to the bottle?”) can add tremendous value to constrained embedded devices. What’s more, these problems of image classification can even be solved by today’s microcontrollers. Imagine if we could add even more advanced vision capabilities to every embedded device!
Figure 1: FOMO classification within Edge Impulse Studio.
Say Hello to FOMO
We’re making it a reality. We developed a novel neural network architecture for object detection called Faster Objects, More Objects, or FOMO (Figure 1). It’s designed from the ground up to run in real-time on microcontrollers, so embedded engineers can (ahem) avoid the fear of missing out when it comes to computer vision.
Fast, Lean & Flexible
FOMO can run on a 32-bit MCU, like an Arm Cortex-M7, at 30 frames per second. And on a Raspberry Pi 4 class device, you’ll be able to do object detection at about 60 frames per second. That’s roughly 30 times faster than MobileNet SSD or YOLOv5.
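To put those frame rates in perspective, here is a small back-of-the-envelope sketch. It converts a frame rate into the time available per frame, and works out what frame rate a model "30 times slower" would reach. The function names are hypothetical, and the baseline figure is only what the quoted ratio implies, not a measured benchmark.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available to process a single frame, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def implied_baseline_fps(fps: float, speedup: float) -> float:
    """Frame rate of a model that is `speedup` times slower."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return fps / speedup


# Frame rates quoted in the article.
for device, fps in [("Arm Cortex-M7", 30.0), ("Raspberry Pi 4", 60.0)]:
    print(f"{device}: {fps:.0f} fps -> {frame_budget_ms(fps):.1f} ms per frame")

# "Roughly 30 times faster" at 60 fps implies a baseline near 2 fps.
print(f"Implied baseline: {implied_baseline_fps(60.0, 30.0):.1f} fps")
```

At 30 frames per second, the whole pipeline (capture, preprocessing, and inference) has to fit in about 33 ms per frame.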
Figure 2: Run object detection on a wide variety of dev boards, including the Arduino Portenta.
Figure 3: Here’s a former iteration of the FOMO approach used to count individual bees.
Figure 4: Training on the centroids of beer bottles. Top: the source labels. Bottom: the inference result.
FOMO scales down to about 100 kilobytes of RAM, making it possible to run object detection in real time on everything from highly constrained Arm Cortex-M4 cores to more powerful ones, like the Cortex-M7 cores on the Arduino Portenta H7 (Figure 2), the new Arduino Nicla Vision (another dual Arm Cortex-M7/M4 CPU), or even specialized DSPs such as the Himax WE-I.
FOMO can scale from the tiniest microcontrollers all the way to full gateways or GPUs. This high degree of flexibility also makes FOMO useful when fault detection requires identifying variations that are very, very small within an image.
In an MCU with strictly limited compute and memory capacity, it’s best to use an image size of about 96x96 pixels. But with a larger microcontroller device, 160x160 pixels is probably fine. The important thing is that FOMO is fully convolutional, so it works on any arbitrary input size. If you need higher granularity, more detail, or more objects, you can just scale up the input resolution.
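A quick sketch shows how input resolution drives both memory use and detection granularity. It assumes 8-bit pixel values and a network that downsamples the input by a factor of 8 before producing its output grid. Both are assumptions made for illustration, not figures from this article, and the function names are hypothetical.

```python
def input_buffer_bytes(width: int, height: int, channels: int = 3,
                       bytes_per_value: int = 1) -> int:
    """Raw size of one input frame (assumes 8-bit values by default)."""
    return width * height * channels * bytes_per_value


def output_grid(width: int, height: int, stride: int = 8) -> tuple:
    """Output grid of a fully convolutional network.

    `stride` is the total downsampling factor (assumed to be 8 here).
    """
    return width // stride, height // stride


for size in (96, 160):
    cols, rows = output_grid(size, size)
    gray = input_buffer_bytes(size, size, channels=1)
    rgb = input_buffer_bytes(size, size, channels=3)
    print(f"{size}x{size}: {gray} B grayscale, {rgb} B RGB, "
          f"{cols}x{rows} output grid ({cols * rows} cells)")
```

Under these assumptions, a 96x96 RGB frame takes 27,648 bytes and yields a 12x12 grid, while a 160x160 RGB frame takes 76,800 bytes and yields a 20x20 grid. The input buffer is only one part of the RAM a model needs, but the trend is clear: a higher resolution buys a finer grid, and therefore more detectable objects, at the cost of memory and compute.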
It Sees the Little Stuff
As long as the objects in the frame are of similar size and don’t overlap, this new architecture can even spot and count lots of very small objects very effectively (Figure 3). That’s something that MobileNet SSD and YOLOv5, despite being larger and more capable models, can’t do very well.
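To see why similar size and no overlap matter, consider a minimal sketch of counting objects from a grid of per-cell scores. It assumes the model outputs one probability per grid cell and that neighboring cells above a threshold belong to the same object. The function name, the threshold, and the sample grid are all hypothetical, and this is not necessarily how Edge Impulse post-processes FOMO output.

```python
from collections import deque


def find_centroids(grid, threshold=0.5):
    """Group adjacent above-threshold cells and return one (x, y) per group."""
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    centroids = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or grid[r][c] < threshold:
                continue
            # Breadth-first search over 4-connected neighbors.
            queue = deque([(r, c)])
            seen[r][c] = True
            cells = []
            while queue:
                y, x = queue.popleft()
                cells.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and grid[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # The centroid is the mean cell position of the group.
            cx = sum(x for _, x in cells) / len(cells)
            cy = sum(y for y, _ in cells) / len(cells)
            centroids.append((cx, cy))
    return centroids


# Made-up scores for a 6x4 grid: three single-cell objects, one spanning two cells.
sample = [
    [0.9, 0.1, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.8, 0.7, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.6, 0.0, 0.0, 0.0, 0.9],
]
objects = find_centroids(sample)
print(f"Counted {len(objects)} objects at {objects}")
```

In this sketch, two objects that touch or overlap would light up adjacent cells and be merged into a single detection, which is why the approach works best when objects are well separated and roughly the same size.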
No More Missing Out
FOMO is available today, runs on a wide variety of computing platforms, and is compatible with Linux systems, Cortex-M microcontrollers, and specialized DSPs. Add a camera and Edge Impulse, and you’re all set.
With FOMO, you can quickly add object detection to just about any camera-based device, and avoid the fear of missing out that, until now, embedded engineers have had to deal with when it came to computer vision (Figure 4).
To learn more about FOMO and experiment with your own algorithm, visit edgeimpulse.com/fomo.
220207-01
About the Author
Jan Jongboom is an embedded engineer and machine learning advocate, always looking for ways to gather more intelligence from the real world. He has shipped devices, worked on the latest network tech, and simulated microcontrollers, and there’s even a monument in San Francisco with his name on it. He currently serves as cofounder and CTO of Edge Impulse, the leading development platform for embedded machine learning with 80,000+ projects.