www.hiddenbrains.co.uk

TRANSITION OF SIRI’S VOICE FROM ROBOTIC TO HUMAN: NOTE THE DIFFERENCE

TRANSITION OF SIRI’S VOICE

As an iOS user, how many times a day do you talk to Siri? Probably quite a few. If you pay close attention, you will have noticed that Siri’s voice sounds much more human in iOS 11 than it did before. This is because Apple is digging deeper into artificial intelligence, machine learning, and deep learning to offer the best possible personal assistant experience to its users. Since its introduction with the iPhone 4S and through to iOS 11, Siri has evolved to sound and interact more like a person. To respond to users’ voice commands, Siri uses speech synthesis combined with deep learning.

SPEECH SYNTHESIS: AN INTEGRAL PART OF SIRI’S FUNCTIONING

Speech synthesis is the artificial production of human speech. It is essential in several domains, including virtual personal assistants, games, and entertainment. While the two traditional approaches, unit selection and parametric synthesis, have seen many improvements, deep learning has made steadily greater inroads into the field.

Applying deep learning to speech synthesis has given rise to a new approach known as direct waveform modeling. It has the potential to offer both the high quality of unit selection synthesis and the flexibility of parametric synthesis.
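Direct waveform modeling has a model output the audio signal itself, sample by sample, rather than stitching together recorded units or driving a vocoder from parameters. A common way to do this is autoregressively: each new sample is predicted from the samples generated before it. The toy Python sketch below shows only that generation loop. As an assumption for illustration, the trained neural network is replaced by a fixed linear predictor (the hypothetical `coeffs`) chosen so the output is a pure tone; it is not how Siri's models actually work.

```python
import math
from typing import List, Sequence


def generate_waveform(coeffs: Sequence[float], seed: Sequence[float], n_samples: int) -> List[float]:
    """Generate audio one sample at a time, each predicted from earlier samples.

    Toy stand-in: a real direct-waveform model would use a trained neural
    network here; `coeffs` is a hypothetical fixed linear predictor.
    coeffs[0] weights the previous sample, coeffs[1] the one before, and so on.
    """
    order = len(coeffs)
    if order == 0 or len(seed) < order:
        raise ValueError("need at least one coefficient and len(seed) >= len(coeffs)")
    samples = list(seed)
    for _ in range(n_samples):
        recent = samples[-order:]  # the last `order` samples, oldest first
        # Weighted sum of the recent samples, newest first, gives the next sample.
        next_sample = sum(c * s for c, s in zip(coeffs, reversed(recent)))
        samples.append(next_sample)
    return samples[len(seed):]  # drop the seed, keep only the generated audio


# Usage: with coeffs [2cos(w), -1] the recurrence x[n] = 2cos(w)x[n-1] - x[n-2]
# reproduces a sine wave, here a 220 Hz tone at a 16 kHz sample rate.
SAMPLE_RATE = 16_000
omega = 2 * math.pi * 220.0 / SAMPLE_RATE
tone = generate_waveform([2 * math.cos(omega), -1.0], [0.0, math.sin(omega)], 160)
```

In a real system the predictor is far richer and usually outputs a probability distribution over the next sample rather than a single value, but the loop structure is the same.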

HOW THE TEXT-TO-SPEECH SYSTEM (TTS) WORKS

1. Recording a Human Voice

The first major task in building a text-to-speech system for a virtual personal assistant is to record a human voice. This voice should be not only pleasant to hear but also clear and easy for everyone to understand.

2. Segmentation of Speech Units

The recorded speech is divided into small units, which are later joined together according to the input text to create the response. Optimizing these speech units for specific devices, or making them work across a range of devices, requires analyzing the acoustic characteristics of each phone (an individual speech sound) and the prosody of the speech.
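The segment-and-join idea can be sketched in a few lines. This is a minimal Python illustration of concatenative synthesis under stated assumptions: audio is a plain list of float samples, each unit in a hypothetical `inventory` is keyed by a label, and neighboring units are joined with a short linear crossfade to soften the seam. Real systems choose among many candidate recordings per unit and join them more carefully.

```python
from typing import Dict, List, Sequence


def concatenate_units(units: Sequence[Sequence[float]], overlap: int) -> List[float]:
    """Join speech units end to end, crossfading up to `overlap` samples at each joint."""
    if overlap < 0:
        raise ValueError("overlap must be non-negative")
    if not units:
        return []
    out = list(units[0])
    for unit in units[1:]:
        n = min(overlap, len(out), len(unit))
        start = len(out) - n
        for i in range(n):
            # Linear crossfade: the previous unit fades out as this one fades in.
            w = (i + 1) / (n + 1)
            out[start + i] = (1 - w) * out[start + i] + w * unit[i]
        out.extend(unit[n:])  # the rest of the unit follows the blended region
    return out


def synthesize(labels: Sequence[str], inventory: Dict[str, Sequence[float]], overlap: int = 2) -> List[float]:
    """Look up each unit label in a (hypothetical) recorded inventory and join the audio."""
    missing = [label for label in labels if label not in inventory]
    if missing:
        raise KeyError(f"no recording for units: {missing}")
    return concatenate_units([inventory[label] for label in labels], overlap)
```

Each joint shortens the output by the crossfade length, so two 4-sample units joined with `overlap=2` yield 6 samples.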

3. Use of Machine Learning

Though this sounds straightforward, getting the pattern of stress and intonation (prosody) exactly right is difficult for developers. Moreover, stringing units together this way is computationally too heavy for a mobile phone. Machine learning solves these challenges to an extent: given enough training data, a text-to-speech system can learn these patterns and how to select and join different pieces of audio to deliver natural, human-like output.
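One standard way to frame this selection problem is unit selection. For every position in the utterance, pick one of several recorded candidates so that the sum of a target cost (how well a unit matches the desired prosody) and a join cost (how smoothly it connects to its neighbor) is as small as possible. Dynamic programming (the Viterbi algorithm) finds that minimum exactly. In machine-learning-based systems a trained model can supply these costs. In this Python sketch both costs are simple hypothetical pitch-based functions, and each unit is assumed to be an `(id, start_pitch, end_pitch)` tuple.

```python
from typing import List, Sequence, Tuple

# Hypothetical unit description: (unit id, pitch at start in Hz, pitch at end in Hz).
Unit = Tuple[str, float, float]


def select_units(candidates: Sequence[Sequence[Unit]], target_pitch: Sequence[float],
                 join_weight: float = 1.0) -> List[str]:
    """Viterbi search: pick one candidate per position minimizing target + join cost."""
    if not candidates or len(candidates) != len(target_pitch) or any(not c for c in candidates):
        raise ValueError("need one non-empty candidate list per target position")

    def target_cost(u: Unit, pitch: float) -> float:
        # Assumed cost: distance between the unit's mean pitch and the requested pitch.
        return abs((u[1] + u[2]) / 2 - pitch)

    def join_cost(a: Unit, b: Unit) -> float:
        # Assumed cost: pitch jump at the seam between consecutive units.
        return join_weight * abs(a[2] - b[1])

    # cost[j]: cheapest total cost of any path ending at candidate j of the current position.
    cost = [target_cost(u, target_pitch[0]) for u in candidates[0]]
    backpointers: List[List[int]] = []
    for t in range(1, len(candidates)):
        prev = candidates[t - 1]
        new_cost, pointers = [], []
        for u in candidates[t]:
            totals = [cost[i] + join_cost(prev[i], u) for i in range(len(prev))]
            best = min(range(len(prev)), key=totals.__getitem__)
            new_cost.append(totals[best] + target_cost(u, target_pitch[t]))
            pointers.append(best)
        cost = new_cost
        backpointers.append(pointers)

    # Trace back from the cheapest final candidate to recover the whole path.
    j = min(range(len(cost)), key=cost.__getitem__)
    path = [j]
    for pointers in reversed(backpointers):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [candidates[t][k][0] for t, k in enumerate(path)]
```

With `join_weight=0` the search simply takes the best-matching unit at each position; raising it trades some prosodic accuracy for smoother joins.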

APPLE’S EFFORTS IN IMPROVING SIRI’S VOICE

Having committed to improving Siri’s voice, engineers at Apple worked with a female voice actor to record 20 hours of US English speech. From these recordings they generated 1 to 2 million audio segments, which were then used to train the deep learning system. Next, they tested the output by asking subjects to choose between the previous and the new Siri voice. Most of them preferred the new, more natural and human-like voice. Listeners noticed a clear difference, from robotic to natural, when Siri answered trivia questions, acknowledged completed requests, and gave navigation instructions.
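The figures above imply a rough size for each segment. Assuming, only for this back-of-the-envelope estimate, that the segments tile the 20 hours of recordings with no gaps or overlaps, the average segment lasts between 36 and 72 milliseconds:

```latex
\[
20\,\text{h} \times 3600\,\tfrac{\text{s}}{\text{h}} = 72{,}000\,\text{s},
\qquad
\frac{72{,}000\,\text{s}}{2 \times 10^{6}} = 36\,\text{ms},
\qquad
\frac{72{,}000\,\text{s}}{1 \times 10^{6}} = 72\,\text{ms}
\]
```

That is roughly the length of a single speech sound or a fraction of one, consistent with the recordings being cut into small units rather than whole words.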

The following graph shows the results of A/B pairwise subjective listening tests:
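In an A/B pairwise listening test, each listener hears the same material rendered by two systems and says which one they prefer, or that they have no preference. The Python sketch below shows one plausible way to tally such votes and check whether a preference is larger than chance, using an exact two-sided sign test that drops "no preference" votes. The vote labels and the choice of test are assumptions for illustration; the text does not say how Apple analyzed its results.

```python
from collections import Counter
from math import comb
from typing import Iterable

# Assumed vote labels: "A" (prefers system A), "B" (prefers system B), "none" (no preference).
VALID_VOTES = ("A", "B", "none")


def tally_votes(votes: Iterable[str]) -> Counter:
    """Count pairwise-preference votes, rejecting unknown labels."""
    counts: Counter = Counter()
    for vote in votes:
        if vote not in VALID_VOTES:
            raise ValueError(f"unexpected vote: {vote!r}")
        counts[vote] += 1
    return counts


def sign_test_p_value(a: int, b: int) -> float:
    """Exact two-sided sign test on A vs B votes ("none" votes are left out).

    Returns the probability of a split at least this lopsided if listeners
    had no real preference (each decided vote equally likely to go either way).
    """
    n = a + b
    if n == 0:
        return 1.0
    k = min(a, b)
    one_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * one_tail)


if __name__ == "__main__":
    # Hypothetical votes for illustration only; these are not Apple's data.
    example = tally_votes(["B", "B", "A", "none", "B", "B"])
    print(example, sign_test_p_value(example["A"], example["B"]))
```

A small p-value suggests the preference is unlikely to be chance alone; it says nothing about how large or audible the difference is.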

WHEN WILL USERS GET TO EXPERIENCE THE NEW VOICE OF SIRI?

iPhone 8 will be the first Apple phone to ship with iOS 11 and the new Siri voice, and the latest iPad release will also feature it. Apple never stops experimenting with technology to discover new possibilities; with Siri’s voice now improved, it is observing how end users react.

Artificial intelligence and deep learning are becoming firmly rooted in virtual personal assistants and other applications. The future looks bright for these technologies, as people are reacting positively to them.
