Laurent PILATI — NXP — Audio Machine Learning for Mobility


Audio Machine Learning for Mobility
Laurent Pilati - Engineering Manager - Voice Audio Solution

November 2018

Company Public – NXP, the NXP logo, and NXP secure connections for a smarter world are trademarks of NXP B.V. All other product or service names are the property of their respective owners. © 2018 NXP B.V.


AI for Mobility: Myth and reality of Machine Learning


PUBLIC


AI for Mobility: Myth and reality of Machine Learning

https://mapr.com/blog/demystifying-ai-ml-dl/



AI for Mobility: AI is missing common sense

[Figure labels: frequent words; proper names; segmentation of words from sentences; typical word stress; language prosody; segmentation into large clauses; typical vowels; typical consonants; segmentation into small clauses; phonotactics; phonotactic illusions; loss of non-native contrasts]

Source: Emmanuel Dupoux - EHESS, Paris, 15 January 2018



AI for Mobility: Importance of common sense
1. Speech recognition in the car



AI for Mobility: Importance of common sense
Speech recognition in the car: Lombard effect

The Lombard effect with Thai lexical tones: an acoustic analysis of articulatory modifications in noise - Benjawan Kasisopa, Virginie Attina, Denis 2014 in INTERSPEECH



AI for Mobility: Importance of common sense
Speech recognition in the car: Lombard effect

[Figure labels: Non-Lombard; Lombard; Compensated Lombard]

The impact of the Lombard effect on audio and visual speech recognition systems - Ricard Marxer, Jon Barker, Najwa Alghamdi, Steve Maddock



AI for Mobility: Importance of common sense
2. Source separation through neuroevolution

[Figure labels: Bias; Input; Evolved network; Output; Raw signal; Enhanced signal]

1. End-to-end approach: let the neural network find the best discriminative features
2. Fitness score
3. Small networks

Evolving recurrent neural networks that process and classify raw audio in a streaming fashion - Adrien Daniel - INTERSPEECH 2017
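The three ideas listed above (end-to-end processing of the raw signal, a fitness score, small networks) can be illustrated with a minimal sketch. It is not the method of the cited paper: the topology is fixed and only the weights are mutated, the "audio" is synthetic (white noise vs. a noisy sine), and every name and parameter below (`make_dataset`, `run_network`, `evolve`, population size, mutation strength) is a hypothetical choice made for illustration.

```python
import math
import random

N_HIDDEN = 2
# Parameters: input weights, recurrent weights, hidden biases, output weights, output bias.
N_PARAMS = N_HIDDEN + N_HIDDEN * N_HIDDEN + N_HIDDEN + N_HIDDEN + 1

def make_dataset(rng, n_per_class=10, length=80):
    """Toy stand-ins for raw audio: class 0 = white noise, class 1 = noisy sine."""
    data = []
    for _ in range(n_per_class):
        data.append(([rng.gauss(0.0, 1.0) for _ in range(length)], 0))
        phase = rng.uniform(0.0, 2.0 * math.pi)
        tone = [math.sin(0.2 * t + phase) + rng.gauss(0.0, 0.3) for t in range(length)]
        data.append((tone, 1))
    return data

def run_network(params, signal):
    """Feed the signal one sample at a time (streaming) and return the mean output."""
    n = N_HIDDEN
    w_in = params[0:n]
    w_rec = params[n:n + n * n]
    b = params[n + n * n:2 * n + n * n]
    w_out = params[2 * n + n * n:3 * n + n * n]
    b_out = params[3 * n + n * n]
    h = [0.0] * n
    total = 0.0
    for x in signal:
        h = [
            math.tanh(w_in[i] * x + sum(w_rec[i * n + j] * h[j] for j in range(n)) + b[i])
            for i in range(n)
        ]
        total += sum(w_out[i] * h[i] for i in range(n)) + b_out
    return total / len(signal)

def fitness(params, data):
    """Fitness score: fraction of signals classified correctly (score > 0 means class 1)."""
    correct = sum(1 for signal, label in data
                  if int(run_network(params, signal) > 0.0) == label)
    return correct / len(data)

def evolve(data, rng, pop_size=16, n_elite=4, generations=10, sigma=0.3):
    """Elitist evolution: keep the best genomes, refill with mutated copies."""
    pop = [[rng.gauss(0.0, 0.5) for _ in range(N_PARAMS)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(pop, key=lambda p: fitness(p, data), reverse=True)
        elite = ranked[:n_elite]
        history.append(fitness(elite[0], data))
        children = [[w + rng.gauss(0.0, sigma) for w in rng.choice(elite)]
                    for _ in range(pop_size - n_elite)]
        pop = elite + children
    return elite[0], history

rng = random.Random(0)
data = make_dataset(rng)
best, history = evolve(data, rng)
print("best fitness per generation:", history)
```

Because the elite genomes are carried over unchanged and the fitness is evaluated on a fixed dataset, the best fitness cannot decrease from one generation to the next. The network has only 11 parameters, which is the sense in which it is "small" here.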



AI for Mobility: Importance of common sense
2. Source separation through neuroevolution

Speech vs. Music



AI for Mobility: Importance of common sense
2. Audio source separation through neuroevolution

[Figure labels: Talker B; Talker A; Talker A]



AI for Mobility: Importance of common sense
3. Audio context recognition

[Figure: actual class vs. predicted class]
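A figure with actual classes on one axis and predicted classes on the other is usually a confusion matrix. As a minimal sketch of how such a matrix and the total accuracy are computed, the example below uses made-up context labels and made-up predictions; none of these values come from the slides.

```python
from collections import Counter

def confusion_matrix(actual, predicted, classes):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]

def total_accuracy(matrix):
    """Correct decisions (the diagonal) divided by all decisions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Hypothetical audio contexts and classifier outputs, for illustration only.
classes = ["car", "street", "office"]
actual = ["car", "car", "car", "street", "street", "office", "office", "office"]
predicted = ["car", "car", "street", "street", "street", "office", "car", "office"]

matrix = confusion_matrix(actual, predicted, classes)
for name, row in zip(classes, matrix):
    print(f"{name:>7}: {row}")
print("total accuracy:", total_accuracy(matrix))  # 6 correct out of 8 -> 0.75
```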



AI for Mobility: Importance of common sense
3. Audio context recognition: machine hit rate vs. time

[Figure: "Total Accuracy vs Time Analysis"; y-axis: total accuracy (%), 70 to 88; x-axis: analysis time (s), 0 to 30]
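One simplified way to see why accuracy can rise with analysis time is a majority vote over short frames. The sketch below assumes that each frame is classified independently and is correct with the same probability `p`; both the independence assumption and the value `p = 0.7` are hypothetical, and this is not necessarily how the plotted curve was obtained.

```python
from math import comb

def majority_vote_accuracy(p, n_frames):
    """Probability that more than half of n_frames independent decisions are correct."""
    if n_frames % 2 == 0:
        raise ValueError("use an odd number of frames to avoid ties")
    needed = n_frames // 2 + 1
    return sum(
        comb(n_frames, k) * p**k * (1 - p) ** (n_frames - k)
        for k in range(needed, n_frames + 1)
    )

# More frames means a longer analysis window.
for n in (1, 3, 5, 9, 15):
    print(n, round(majority_vote_accuracy(0.7, n), 3))
```

With a single frame the result is `p` itself, and with three frames it is 3p²(1-p) + p³, which equals 0.784 for p = 0.7. Real audio frames are correlated, so the gain in practice is smaller than this idealized model suggests.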


AI for Mobility: Importance of common sense
3. Audio context recognition: human hit rate vs. time

AES 110th Convention: Recognition of Everyday Auditory Scenes: Potentials, Latencies and Cues – Peltonen et al.



NXP vision on AI: Video, Voice, Vibration
i.MX 8QuadMax – Edge Computing Node for Vision

[Figure labels: ICS; GPU1; GPU0; CPU; Collect Video; Machine Vision Object Detection; CNN Neural Net Classifier; Machine Decision]

i.MX 8QuadMax: a complete machine vision and neural net processing edge node


NXP vision on AI: Video, Voice, Vibration
1. Voice activity detection for speech enhancement
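For readers unfamiliar with voice activity detection (VAD), the simplest audio-only form marks a frame as active when its short-time energy exceeds a threshold. The sketch below is that baseline only, not the detector used in the talk; the frame length, the threshold and the function names are hypothetical choices.

```python
import math

def frame_energy_db(frame):
    """Mean power of one frame in dB (a small floor avoids log of zero)."""
    power = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(power + 1e-12)

def energy_vad(samples, frame_len=160, threshold_db=-30.0):
    """Return one boolean per non-overlapping frame: True where energy exceeds the threshold."""
    flags = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        flags.append(frame_energy_db(frame) > threshold_db)
    return flags

# Synthetic test signal at 16 kHz: silence, a 200 Hz tone, silence (two frames each).
silence = [0.0] * 320
tone = [0.5 * math.sin(2 * math.pi * 200 * n / 16000) for n in range(320)]
flags = energy_vad(silence + tone + silence)
print(flags)  # [False, False, True, True, False, False]
```

An energy threshold alone cannot tell speech from wind or road noise, which is one motivation for combining audio with other modalities.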



NXP vision on AI: Video, Voice, Vibration
1. Multimodal: voice activity detection for speech processing





NXP vision on AI: Video, Voice, Vibration
1. Multimodal: voice activity detection for speech processing: wind noise reduction

Samsung S7 output

WNR processing combining Audio + Video

Note: Same context of recording – but not same microphone



NXP vision on AI: Video, Voice, Vibration
2. Multimodal: voice activity detection for speech recognition in the car



NXP vision on AI: Video, Voice, Vibration
2. Multimodal: voice activity detection for speech recognition

DOA (direction of arrival)

Source: DOA algorithm module from Interview Mode 2.0 - Thomas Camier
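As background on direction of arrival, a common textbook approach with two microphones is to estimate the time delay between the channels by cross-correlation and convert it to an angle with a far-field model. The sketch below shows that generic approach under stated assumptions (two microphones 0.1 m apart, 16 kHz sampling, speed of sound 343 m/s, angle measured from broadside); it is not the algorithm of the module cited above, and all names are hypothetical.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature

def estimate_delay(sig_a, sig_b, max_lag):
    """Return the lag in samples by which sig_b trails sig_a (peak of the cross-correlation)."""
    n = len(sig_a)
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Only use indices where both sig_a[i] and sig_b[i + lag] exist.
        val = sum(sig_a[i] * sig_b[i + lag]
                  for i in range(max(0, -lag), min(n, n - lag)))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag

def delay_to_angle_deg(lag, sample_rate, mic_spacing):
    """Far-field model: delay = spacing * sin(angle) / c."""
    ratio = SPEED_OF_SOUND * (lag / sample_rate) / mic_spacing
    ratio = max(-1.0, min(1.0, ratio))  # guard against rounding beyond +/-1
    return math.degrees(math.asin(ratio))

# Synthetic example: microphone B receives the same noise burst 3 samples later.
rng = random.Random(1)
mic_a = [rng.gauss(0.0, 1.0) for _ in range(400)]
mic_b = [0.0] * 3 + mic_a[:-3]

lag = estimate_delay(mic_a, mic_b, max_lag=5)
angle = delay_to_angle_deg(lag, sample_rate=16000, mic_spacing=0.1)
print("lag (samples):", lag, "angle (degrees):", round(angle, 1))
```

With these assumed values the largest physically possible delay is about 4.7 samples, so a search range of 5 lags covers every direction. In a multimodal setup, an angle estimate like this could be compared with the position of a face seen by a camera, though the slides do not detail how the two are combined.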



NXP vision on AI: New AI / IoT era will create new industry leaders
“The 4th Tectonic Shift in Computing”

[Figure labels: Security; Edge Processing; Machine Learning; Connected Data]

Source: Jeffer



A Position of Strength to Better Serve Our Customers

7th largest semiconductor company (2)
11,000 engineers
Operations in 33 countries
Headquarters: Eindhoven, Netherlands
60+ year history
$9.3B annual revenue (3)
31,000 employees
9,000 patent families

#1 Automotive
#1 Broad-based MCUs (1)
#1 Secure Identification
#1 Communications Processors
#1 RF Power Transistors

Sources: IHS, ABI Research, Strategy Analytics, The Linley Group
(1) MCU market excluding Automotive (2) Excludes memory (3) Posted revenue for 2017




BACK-UP



False Positive with Face recognition algorithm



SophIA – November 7-8-9
• Technical Session
• Master Class
• Keynote
• Talk

