CONTENTS
Spring 2013, Volume 18, Number 1

>> FEATURES

14 Speech Finds a Home (COVER STORY)
Talking to the walls might not mean you’re crazy.
BY LEONARD KLIE

20 The Government Is Listening
Speech technology sees government growth spurt.
BY MICHELE MASTERSON

36 Speech Developer Programs Have Hit Prime Time
From the novice to the experienced, it’s boom time for developers.
BY MICHELE MASTERSON

>> FYI

9 Speech Pioneer Ray Kurzweil Joins Google
Ray Kurzweil, often credited with the invention of text-to-speech, has been named director of engineering at Google.

10 Clearing Up the Hype Around Hypervoice
When it comes to voice technologies, what you say and what you do will be more closely linked in the future.

11 Speech to Play a Bigger Role in Translation Technologies
The language industry is expected to grow substantially this year and next, due to rising demand for professional translation and localization services.

12 Drivers Prefer Speech, But It Needs Work
In an era of tablets and smartphones, conventional automobile control interfaces might soon become as antiquated as whitewall tires, rumble seats, and chrome bumpers.

13 Overheard Underheard
Under-the-radar speech news.

13 Soundbytes
News roundup.

>> COLUMNS

2 Editor’s Letter
The Appliances Have Ears
BY DAVID MYRON

4 Interact
When Bad IVRs Are Good Enough
BY SUSAN HURA

6 Inside Outsourcing
Raise Your Mobile Profile
BY KEVIN BROWN

7 View from AVIOS
Beyond Speech in the Car
BY THOMAS SCHALK

44 The Business Case
The Emergence of Real-Time Solutions for Contact Centers
BY DONNA FLUSS AND DEBORAH NAVARRA

46 Voice Value
Universal Design Offers Options—and Access
BY ROBIN SPRINGER

48 Forward Thinking
Singing the Praise of Speech Recognition
BY MOSHE YUDKOWSKY

>> DEPLOYMENTS

40 CBE Lets Companies Collect with CallMiner Platform
Eureka Speech Analytics helps extract meaningful business intelligence.
BY MICHELE MASTERSON

42 Mobile Technology Solves TV Troubles
Synchronoss’ SmartCare IVR gives Mediacom subscribers on-the-go options via their smartphones.
BY LEONARD KLIE
www.speechtechmag.com
SPRING 2013
Speech Technology | 1
EDITOR’S LETTER
Editorial Director David Myron dmyron@infotoday.com News Editor Leonard Klie lklie@infotoday.com Managing Editor Sherri Lerner slerner@infotoday.com Senior Designer Laura Hegyi
The Appliances Have Ears

TV navigation technologies from Nuance Communications, ActiveVideo, and Vlingo turned more than a few tech enthusiasts’ heads at last year’s International Consumer Electronics Show in Las Vegas. The show helped set into motion some partnerships that would soon bring speech technology into our living rooms. Within a week of the show, Nuance announced that its Dragon TV platform provides the voice recognition capabilities of the LG Magic Remote for the LG CINEMA 3D Smart TV. Shortly after the LG agreement, Nuance announced that Dragon TV would voice-enable Panasonic digital TVs as well. And, by the end of the year, Nuance and LG announced plans to add natural language capabilities to their listening clickers. Using Nuance’s voice and natural language understanding capabilities, Dragon TV viewers could find content by speaking channel numbers, station names, and show and movie names.

While I’m not yet convinced that barking at my TV is a better way to channel surf than thumbing my remote, it certainly has its advantages for targeted search. Navigating through my TV’s search screen and painstakingly pressing every letter in the name, or type of show I’m looking for, is a hassle that will only get more cumbersome as digital TVs offer access to more shows, movies, music, and Internet and social media content.

So there is some value in speech-enabling digital televisions, but what about other household appliances? Wouldn’t it be great if we could use our voice to dim the lights, preheat the oven, adjust the thermostat, or unlock the front door? How far away are we from a speech-enabled home? This is what inspired our cover story, “Speech Finds a Home” (page 14), by News Editor Leonard Klie.

In fact, we’re not the only ones who think about this. The creators of the Distant Speech Interaction for Robust Home Applications (DIRHA) project, launched last year, had a similar concept in mind when they set out to create the speech-enabled home. The program, which will last three years, will investigate and test solutions for voice-enabled interaction with machines in “smart homes.” According to industry insiders, these smart-home technologies aren’t too far out of reach, even for consumers. Read our cover story to find out what needs to happen to make speech-enabled homes a reality and which vendors are leading the way.
Staff Writer Michele Masterson mmasterson@infotoday.com Proofreader Greg Edmondson Contributors Kevin Brown, Donna Fluss, Susan Hura, Deborah Navarra,
Thomas Schalk, Robin Springer, Moshe Yudkowsky Editorial Advisory Board The Editorial Advisory Board of Speech Technology magazine is composed of prominent figures in research, development, and applications of speech technology. The members will assist the magazine’s editorial staff by recommending articles and column topics or prospective authors, and offering advice on technical matters and industry trends. Additional responsibilities will include critiquing the magazine’s content and design. If you are interested in participating, contact David Myron, Editorial Director, Speech Technology magazine, 237 W. 35th St., 14th Floor, New York, NY 10001, (212) 251-0608.
ADVERTISING SALES / PRINT & ONLINE Eastern & Central Ad Director Adrienne Snyder adrienne@infotoday.com (201) 327-2773 • Mountain & Pacific Ad Director Dennis Sullivan
dennis@infotoday.com (800) 248-8466, ext. 538 MARKETING Marketing Manager, Events & Circulation Sheila Willison sheila@infotoday.com • Director of Web Events DawnEl Harris dawnel@infotoday.com CORPORATE HEADQUARTERS
Information Today, Inc., 143 Old Marlton Pike, Medford, NJ 08055 EXECUTIVE MANAGEMENT President and CEO Thomas H. Hogan • Chairman Roger R. Bilboul • Vice President, Administration John Yersak • Vice President, Content Dick Kaser • Group Publisher Bob Fernekees INFORMATION TECHNOLOGY Vice President, Information Technology
Bill Spence
PRODUCTION Vice President, Graphics and Production M. Heide Dengler • Ad Trafficking Coordinator Michael Hardwick
Speech Technology (ISSN: 1088-5803; USPS 23156) is published four times a year, February (Spring), June (Summer), August (Fall), and November (Winter), by Information Today, Inc., 143 Old Marlton Pike, Medford, NJ 08055 USA; Phone: (609) 654-6266; Fax: (609) 654-6266; Internet: www.infotoday.com. Registered in U.S. Patent & Trademark Office. Periodicals postage paid at Medford, N.J., and additional mailing offices. ©, Copyright, 2013, Information Today, Inc. All rights reserved. No part of this publication may be reproduced in whole or in part in any medium without the express permission of the publisher. P R I N T E D I N U S A POSTMASTER: Send address changes to Speech Technology, P.O. Box 3599, Northbrook, IL 60065 Rights and Permissions Permission to photocopy items is granted by Information Today, Inc. provided that a base fee of $3.50 plus $0.50 per page is paid directly to Copyright Clearance Center (CCC), or provided that your organization maintains an appropriate license with CCC. Visit copyright.com to obtain permission to use these materials in academic coursepacks or for library reserves, interlibrary loans, document delivery services, or as classroom handouts; for permission to send copies via email or post copies on a corporate intranet or extranet; or for permission to republish materials in books, textbooks, and newsletters. Contact CCC at 222 Rosewood Drive, Danvers, MA 01923; (978) 750-8400; Fax: (978) 646-8600; www.copyright.com. If you live outside the USA, request permission from your local Reproduction Rights Organization. (For a list of international agencies, consult www.ifrro.org.) For all other requests, including making copies for use as commercial reprints or for other sales, marketing, promotional and publicity uses, contact the publisher in advance of using the material. For a copy of our Rights and Permissions Request form, contact Lauree Padgett, lpadgett@infotoday.com. Online Access Visit our Web site at www.speechtechmag.com
If you find these types of speech technology developments intriguing, then mark your calendar for the SpeechTEK conference (August 19–21, 2013) at the New York Marriott Marquis. Visit www.speechtek.com for updates on the largest speech technology conference in the United States.
Searchable archive of all articles with digital document delivery: www.iti-infocentral.com Contents also available online under direct licensing arrangements with EBSCO, NewsBank, ProQuest, and Gale, and through redistribution arrangements with information service providers, including Dow Jones Factiva, LexisNexis, OCLC, STN International, and Westlaw. Subscription Information SUBSCRIPTIONS: Print version free to qualified recipients within the U.S. Digital version free to qualified recipients worldwide. Subscription rates for nonqualified subscribers: U.S. subscription rate—$29; Canada and Mexico—$94; overseas delivery—$130. All rates to be prepaid in U.S. funds. Subscribe online: www.speechtechmag.com/subscribe. BACK ISSUES: $8 per copy, prepaid only. CHANGE OF ADDRESS: Mail requests, including a copy of the current address label from a recent issue and indicating the new address, to Speech Technology Magazine, P.O. Box 3006, Northbrook, IL 60065-9736 or call (847) 559-7301. Reprints For quality reprints of 500 copies or more, contact Sheila Willison at (877) 993-9767, sheila@infotoday.com.
David Myron Editorial Director dmyron@infotoday.com @dmyron on Twitter
Disclaimers Acceptance of an advertisement does not imply an endorsement by the publisher. Views expressed by authors and other contributors are entirely their own and do not necessarily reflect the views of the publisher. While best efforts to ensure editorial accuracy of the content are exercised, publisher assumes no liability for any information contained in this publication. The publisher can accept no responsibility for the return of unsolicited manuscripts or the loss of photos. Privacy Policy Occasionally we make a portion of our mailing list available to organizations whose products or services we think might be of interest to our customers. If you do not wish to receive such mailings, please send a copy of your mailing label with a request to be removed from the third-party mailing list to Speech Technology magazine, Customer Service, P.O. Box 3599, Northbrook, IL 60065-9736 or call (800) 248-0588 or (847) 559-7301. Editorial Office 237 W. 35th St., 14th Floor, New York, NY 10001; (212) 251-0608; www.speechtechmag.com
SUSAN HURA
INTERACT
When Bad IVRs Are Good Enough

Users accustomed to substandard systems can’t imagine any better

I recently ran a usability test on an IVR system that left me wondering about the value of such tests. The system was, in a word, awful. The call started with a long list of infrequently used options, each explained in exhaustive detail; for example, “If you’re calling to talk about or make changes to an existing account or discuss opening a new account, say, ‘Account Services.’” The system identified calls via ANI without informing callers that ANI was being used or asking them to confirm their identity, and then launched into a status readout that had not been introduced or requested. The readout contained a great deal of information presented in a mix of recorded prompts and poorly tuned text-to-speech, which failed to include pauses that would have made the information far easier to comprehend. I was sure that the test was going to produce hours of scathing commentary.

I was quite wrong. Participants didn’t hate the system; on the contrary, some said it was easy enough to use and understand. To be fair, the majority did successfully complete their tasks. Participants had some complaints, but overall, they were satisfied.

I asked a colleague, “Why can’t they see how much better it could be?” And then the light bulb went on: Maybe they can’t picture a system that is designed better. If participants can’t imagine anything better, and they’re able to complete the tasks in front of them, maybe this bad system doesn’t seem so bad.

The role of user testing versus the role of design is a longstanding debate in the larger (that is, non-VUI) design community. Even a casual Google search will return articles and blog posts about how user testing is worthless because it fails to produce new ideas. The thinking is that no amount of usability testing will lead us to the next big thing or the next-generation user experience. In a literal sense, I agree: User testing does not directly produce innovation. But this is a fundamental misunderstanding of the purpose of user testing. We test to understand the interaction from users’ perspectives and find instances in which the existing design fails, not to devise solutions to the problems. Solving interaction problems isn’t the user’s job.

That’s not to say users don’t try to solve the problems they discover in a system. Usability participants often share what they think you should do to fix the problems they discover, but we should be cautious in taking these suggestions at face value. Participants often recommend solutions based on other systems that they are accustomed to. There’s no way to judge the quality of these systems, whether the solution used there will also work here, or if the participant is even faithfully representing the interaction. We need to listen past the specific suggestions and hear what users really desire. When participants in my test said they didn’t want the company to use their phone number to identify their call, I sensed their discomfort with ANI being used without their knowledge, but the solution was not to discontinue ANI identification as participants suggested. A better solution is to add prompting that explains what’s occurring and puts callers at ease about the company using information it already has to provide a more streamlined experience.

We can’t expect usability results to reveal how to fix the system we’re testing, only what needs to be fixed. Usability testing is an excellent method for finding flaws, but what we should do to eliminate them is usually not revealed. Imagining a better interaction is the role of the designer. Looking back at my own experience, I realize I had confused my role and that of the user. First, I made the rookie mistake of thinking I knew what the usability results would be before listening to the users. Then, I was confused when the users didn’t see the solutions to problems that seemed so clear to me. What looked like a failure of imagination on the part of the user was really a reminder to this long-time usability tester to approach each test with a beginner’s mind.

Susan Hura, Ph.D., is a principal and founder of SpeechUsability, a VUI design consulting firm. She can be reached at susan@speechusability.com.
KEVIN BROWN
INSIDE OUTSOURCING
Raise Your Mobile Profile

Look to outsourcing to give your customers the technology they expect

Whether you’re a smartphone geek with the latest mobile technology or a typical mobile user who couldn’t care less about showing off your gear, imagine yourself in this scenario. You’re walking down the main shopping venue in your hometown talking to your call center from a mid-1980s’ Motorola “brick” cell phone. No, you’re not Marty McFly in Back to the Future. For 99.9 percent of those reading this column, that’s how you and your contact centers appear to today’s smartphone users.

Since the turn of the century (yes, you may be that out of date!), organizations have tremendously increased the amount of information they know about their customers through Web sites and smartphone applications. Yet such channels are almost never synchronized with a business’s voice channels, leaving customers feeling as if they are being punished for using these lower-cost self-service channels.

Customers hate repeating information they have already provided via a Web site, IVR, or smartphone app. In fact, it’s always among the top three most irritating call center attributes in nearly all customer experience surveys. VoxPeritus is seeing a shift from hold times to repeating information as the call center irritant callers most hate. It appears that the more tech-savvy consumers become, the more they expect their providers to be as well. Callers seem to understand that staffing to prevent queues at all times is expensive, but question why sharing data to support self-service doesn’t pay for itself. And though we all can agree that our customers aren’t always right, in this case, they are.

Our research has found that contact center executives’ most frequent explanation as to why they haven’t synchronized their channels is cost. The second most quoted reason is complexity coupled with risk. While these executives realize that their voice channel is critical, they wrongly assume that the solution puts their telephony infrastructure and delivery capabilities at risk.

The good news is that there are simple, low-risk, and cost-effective ways to bring your contact center out of brick-carrying mode and into the future without the need for Doc Brown, a DeLorean, or even a flux capacitor.

Here’s a tip on outsourcing: Start by checking out the Web callback and smartphone offerings from Fonolo, Jacanda, Radish, and Virtual Hold. By no means is this a comprehensive list, but it should help you understand the functionality available from outsourcing partners. They all offer Webcasts that can provide insights into their approach to how to share data from your Web presence and smartphone app with your CSRs, thereby supporting use of these cost-effective self-service channels.

All of these providers use the callback model for their smartphone and Web-to-call offerings. I am not a fan of callbacks for a number of reasons—too busy, bad connection, etc. Fill in your personal situation, we all have them.

But the trade-off from complex and expensive integration to real-time inbound queues seems acceptable. If you are in a smartphone app or online in a company’s Web site and need to discuss a “What the heck is this” question, wouldn’t you rather accept a callback in minutes from a fully briefed CSR than have to wait in a queue and then repeat yourself? Outsourcing this functionality as soon as possible to bring your organization into this century makes a lot of sense. Do your due diligence with a smidgen of conservative reticence, and you should still end up with a strong ROI.

Do you have banking apps, airline apps, or others on your smartphone from companies that need to read this column and wake up to the 21st century? Pass on the message. Maybe we need to put their executives in a hold queue and play The Who’s “Going Mobile” a few hundred times to get the point across. Doc Brown confirms that it truly isn’t flux-capacitor science!

Kevin Brown is managing director at VoxPeritus, where he specializes in speech solutions consulting. He has 20 years of experience designing and delivering speech-enabled solutions in on-premises and hosted environments. He can be reached at kevin.brown@voxperitus.com or follow him on Twitter: @CustExperGuru.
THOMAS SCHALK
VIEW FROM AVIOS
Beyond Speech in the Car

Automotive safety calls for a holistic approach

Widely accepted is the notion that speech interfaces fit nicely into the driving experience, particularly when a task requires text entry. Speech can be used to manage secondary tasks such as navigation systems, music, phones, messaging, and other functionalities—making it possible to be more productive while driving without the burden of driver distraction. However, actual usage of such speech enablement has fallen short of expectations, spurring some to blame the low usage on the unreliability of speech in the car. Regardless, keeping the primary task of driving in mind, user interfaces for secondary tasks shouldn’t require lengthy and/or frequent eye glancing, nor very much manual manipulation. The ultimate goal is to provide natural interfaces simple enough to allow the driver to enjoy technological conveniences while maintaining a focus on driving.

In today’s vehicles, a speech button is commonly used to initiate a speech session that may have visual and manual dependencies for task completion. Once the button is pushed, a task can be selected from a voice menu. And without doubt, the trend is toward freely spoken speech, with no boundaries on what a user can say. But even with such advanced capabilities, perhaps a speech button is still not a good idea. Arguably, a visual-manual interface should be used for task selection, and with appropriate icon design, the user experience would be natural. Navigation, music, messaging, infotainment, and other functionality can be easily represented with icons.

“Please say a command” is an unnatural prompt, yet common today. From a multimodal perspective, we consider speech, manual touch, and gesture to be input modalities (from the driver to the vehicle) and visual (including heads-up displays), sound, and haptic (touch) feedback to be output modalities (from the vehicle to the driver).

Sound can be an audio prompt that should be natural. Examples include “Where would you like to go?”; “Please say your text message”; “What’s your zip code?”; and “Say a song name or an artist.” And yes/no queries can certainly be natural and necessary.

A voice menu can be thought of as an audio version of a choices list. An audio interface can be cumbersome to use for list management, because each item in the list has to be played to the driver followed by a yes/no query. Complex items take longer to play and are more difficult for the driver to remember. However, using a visual-manual interface for list management is much quicker, easier, and natural. Consider the following use case:

1. Driver taps the navigation icon.
2. Vehicle asks driver, “Where would you like to go?”
3. Driver says, “An Italian restaurant that’s not too expensive.”
4. Vehicle displays top five search results, including address, distance, and price category.
5. Driver glances briefly.
6. Driver taps to select restaurant.

A similar scenario can be given for managing music and infotainment. In this case, the best practice may be simply highlighting the tapped result to avoid the tendency to stare at the screen during the audio playback. Long eye glances are dangerous, with a maximum of two seconds being a critical limit for safety.

Of course, we still need gestures. Controlling volume with speech is silly when you think about it. It is much more natural to use gesture by turning a knob or pressing and holding a button usually found on a steering wheel. We use gesture as an input modality every time we drive—to steer, to accelerate, and to brake (sorry, speech won’t work for these tasks). So we should avoid using speech instead of gesture when fine motor skills are required.

Speech in the car should be approached holistically. We should think in terms of a cognitive model for secondary driving tasks—a model that will indicate the best use for speech and other modalities. Simple tasks such as voice dialing can be done with an audio-only interface, combining speech and sound. But when tackling more complex tasks, we can’t expect a one-size-fits-all interface.

Thomas Schalk, Ph.D., is vice president of voice technology at Agero, a company that provides telematics services to the automotive market. He is a member of the AVIOS board of directors and a former president of the organization. He can be reached at tschalk@agero.com.
>> NEWS >> TRENDS >> ANALYSIS
Speech Pioneer Ray Kurzweil Joins Google

Google has hired Ray Kurzweil, often credited with the invention of text-to-speech, as its director of engineering.

“Ray’s contributions to science and technology, through research in character and speech recognition and machine learning, have led to technological achievements that have had an enormous impact on society—such as the Kurzweil Reading Machine, used by Stevie Wonder and others to have print read aloud,” says Peter Norvig, director of research at Google. “We appreciate his ambitious, long-term thinking, and we think his approach to problem-solving will be incredibly valuable to projects we’re working on at Google.”

While at Google, Kurzweil will focus on machine learning and language processing, particularly to enhance the company’s popular search engine.

“My focus will be enabling computers to understand the semantic content of natural language and to use that understanding to enhance Google applications, such as search and question answering. The plan will be to combine my decades of experience in the [artificial intelligence] field with the Google scale [and] resources here in data, computing, and users,” Kurzweil tells Speech Technology magazine in an email.

For years, Kurzweil had forecasted that search engines would become more predictive with search results. During his opening keynote at SpeechTEK 2008, Kurzweil said, “We’ll have search engines that are like little assistants that won’t wait to be asked—if they see you struggling with some information, they’ll pop up information.”

Industry insiders are reacting favorably to the news. “By hiring Ray Kurzweil, Google has brought on board a person steeped in speech automation, who has also dedicated his professional career to understanding and defining the pace at which machines display human-like and then superhuman-like qualities,” says Dan Miller, senior analyst and founder of Opus Research. “His vision can be a framework for Google’s diverse initiatives around predictive search, natural language understanding, and mobile assistance.”

The latter, in particular, could help Google compete against Apple’s mobile assistant, Siri, launched on the iPhone 4S. “Google is one of a very short list of companies with the technology and market reach to introduce a powerful mobile assistant,” Miller says. “An example is the combination of Google Now with Google Voices to create a personal assistant that ‘understands’ what a person is saying and presents a number of relevant responses based on location, time of day, previous searches, and, perhaps, the recommendations of friends in a social network.”

Some industry observers expect Kurzweil to have a hand in other, more futuristic technology. “He will be an inspiration to Google by providing forward-looking insight, ideas, and suggestions,” says Jim Larson, vice president of Larson Technical Services and co-chair of the SpeechTEK conference, who adds that Kurzweil’s knowledge will benefit such projects as Google Glass.

Google Glass aims to provide users with wearable glasses that combine augmented reality with reality. During the 2008 SpeechTEK keynote, Kurzweil predicted the arrival of such wearable technology: “We’ll have augmented real reality—we’ll see real reality but we’ll have virtual reality overlaid on it…. [This technology] will be built into our eyeglasses, so as you look at someone, it will remind you what their name is, that it’s their birthday next week.”

Google has stated plans to release early prototypes of Google Glass to developers this year and make the augmented reality glasses publicly available in 2014.

—Michele Masterson and David Myron
Clearing Up the Hype Around Hypervoice

When it comes to voice technologies, what you say and what you do will be more closely linked in the future.

That’s because hypervoice, an emerging model for organizing and navigating conversational data by transforming voice components into native Web objects and breaking them down into smaller bits of audio that can be searched, shared, and secured independently, is rapidly picking up steam.

In fact, the Hypervoice Consortium, which advances hypervoice standards and applications, launched in December under the guidance of Martin Geddes, a telecom expert and futurist. Telefonica Digital, HarQen, and Voxeo Labs have joined Geddes as charter members of the organization.

With hypervoice, users can link, tag, and share small parts of their conversations as easily as they can share text. One of the primary benefits of hypervoice is the ability to share quick highlights from a conversation with stakeholders not on an original call.

“Hypervoice links what we say to what we do, creating a unified activity stream of everything we say and do,” Geddes says.

It does this by creating metadata for recorded voice files that are part of the digital interactions people have while speaking and listening to each other. The solution doesn’t use voice recognition or a phoneme system.

“Instead, we assume that the significance of what we say is best captured by our actions—the notes we take, the customer records we modify, the slides we share, the tags we annotate the audio with, and the agenda items we advance through,” Geddes explains.

“These interactions with digital objects—all the gestures and responses—can then be used as data to make voice conversations truly searchable, shareable, and syndicated.”

This allows someone, for example, to subscribe to all of the shared conversations that are relevant to the accounts he manages. The metadata allows him to navigate to potentially important parts of those conversations, Geddes points out.

“The ability to annotate audio conversations provides tremendous business value, as people can now discover the information shared in meetings and, more importantly, take action on it without losing the context in which it was shared,” said Alan Lepofsky, vice president and principal analyst at Constellation Research, in a statement.

Another early hypervoice evangelist is Jason Goecke, president and CEO of Voxeo Labs. Voxeo Labs is working to provide operators with hypervoice-enabled applications on the Ameche communications platform-as-a-service, which will enable carriers to embrace hypervoice fully as a key innovation, according to Goecke.

In October, HarQen showcased the first hypervoice application at Oracle OpenWorld. HarQen added hypervoice conversations as a service to Oracle Social Network, enabling collaborators to go from typing to talking, and then capturing their conversations, with just a click. And with HarQen-enabled CallMe functionality, users could join, interact with, and review conversations directly from within the activity stream.

Oracle Sales and Marketing Cloud Service users will be able to capture important sales calls, using their notes as the natural markup of the audio being recorded. In addition, hypervoice will help trace crucial agreements back to the point of decision. For example, users will be able to find and listen to the exact point in time when parties reached agreement during a contract negotiation.

“Hypervoice conversations hold the promise of enabling a whole new paradigm for human communications,” said E. Kelly Fitzsimmons, cofounder of HarQen, at the time of its release. “We are eager to partner with others via the consortium to help ensure the richness of future applications.”

“We are excited to support this initiative both because it extends the reach of voice and because it makes working globally ever so much easier,” said Tracy Isacke, head of the Silicon Valley office of Telefonica Digital, in a statement. “Being part of a leading telecommunications company that operates around the globe, this innovation will help us internally and externally.”

“Given the importance of linking what we say to what we do, ultimately hypervoice conversations will replace telephony in the enterprise,” Geddes predicts. —Leonard Klie
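To make the hypervoice idea concrete, here is a minimal sketch of a recording whose notes, tags, and slide changes are stored as time-stamped metadata, so a listener can jump to the moment an agreement was reached without any speech recognition. The class names, fields, and sample data are hypothetical illustrations, not HarQen's or the Hypervoice Consortium's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    offset_s: float  # seconds from the start of the recording
    kind: str        # hypothetical labels: "note", "tag", "slide"
    text: str


@dataclass
class Conversation:
    title: str
    annotations: list = field(default_factory=list)

    def annotate(self, offset_s, kind, text):
        """Record a user action alongside its position in the audio."""
        self.annotations.append(Annotation(offset_s, kind, text))

    def find(self, keyword):
        """Return the audio offsets whose annotation text mentions keyword."""
        keyword = keyword.lower()
        return sorted(
            a.offset_s for a in self.annotations if keyword in a.text.lower()
        )


call = Conversation("Contract negotiation")
call.annotate(95.0, "note", "Pricing discussed")
call.annotate(1312.5, "tag", "Agreement reached on renewal terms")
call.annotate(640.0, "slide", "Renewal options")

# Offsets a player could seek to; the audio itself is never transcribed.
agreement_points = call.find("agreement")
```

The search runs over what participants did during the call rather than over the audio, which matches the article's point that the approach does not depend on voice recognition.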
10 | Speech Technology
SPRING 2013
www.speechtechmag.com
>> NEWS >> TRENDS >> ANALYSIS
Speech to Play a Bigger Role in Translation Technologies

The language industry is expected to grow substantially this year and next, due to rising demand for professional translation and localization services, according to the Globalization and Localization Association, an industry trade group based in Andover, Mass.

Machine translation technologies were expected to be a big part of the growth, which was predicted to hit 12 percent in 2012 and is expected to continue at a similar pace this year, according to Hans Fenstermacher, CEO of the association.

Though machine translation technologies involving speech have seen limited usage thus far, Fenstermacher says they are a growing area within the industry. Part of the reason for their slow adoption has been the complex, multilayered processes involved. “First [the system] has to recognize what is being said, convert it to text, translate the text into the other language, and then convert it back to speech for output,” he says.
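The multilayered process Fenstermacher describes can be sketched as three chained stages: recognition, text translation, and speech output. The stage functions below are toy stand-ins (the "audio" is just encoded text and the translator is a one-entry phrasebook); they are assumptions made for illustration and do not represent any vendor's engine.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SpeechTranslator:
    recognize: Callable[[bytes], str]   # audio -> source-language text
    translate: Callable[[str], str]     # source text -> target text
    synthesize: Callable[[str], bytes]  # target text -> audio

    def run(self, audio: bytes) -> bytes:
        source_text = self.recognize(audio)
        target_text = self.translate(source_text)
        return self.synthesize(target_text)


# Toy stages, for illustration only.
PHRASEBOOK = {"where is the hospital": "dónde está el hospital"}


def toy_recognize(audio: bytes) -> str:
    return audio.decode("utf-8").strip().lower()


def toy_translate(text: str) -> str:
    return PHRASEBOOK.get(text, "[no translation]")


def toy_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = SpeechTranslator(toy_recognize, toy_translate, toy_synthesize)
spoken_output = pipeline.run(b"Where is the hospital")
```

Because each stage feeds the next, an error made during recognition carries through translation and synthesis, which is one way to read the "complex, multilayered" caveat.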
Fenstermacher says the technology has vastly improved in the past few years. “There are multiple technologies at work here, but the speech recognition and output is really quite good today,” he states.

Nonetheless, most applications of technology in the language industry today still involve text-to-text translation, especially for Web content. This is particularly true among three language groups dubbed “the Triple A”—Asian, African, and Arabic. Internet adoption in these emerging markets is surging. In the past decade, Internet usage has grown more than 20 times faster in Africa and 17 times faster in the Middle East than in the United States, according to research from Common Sense Analysis. These language groups can pose unique challenges for speech technologies because of the many dialects, accents, and regional variations.
In these and other languages, speech is producing adequate results for some very specialized uses with very limited domains and phrase requirements. One of the biggest users of such technology is the military; U.S. forces have fielded several device types in combat zones to aid in interactions with the locals. Machine translation technology is also being spurred by faster delivery of translations, demand for real-time translations, and increasing interest from corporate investors. “We will see speech [translation] become much more prevalent in the next few years,” Fenstermacher predicts. He does caution, however, that despite the technological advances, businesses cannot afford to rely on machine translations alone. Increasingly they will need professional translation and localization services to compete and maintain their brands in global markets, he says. —Leonard Klie
Drivers Prefer Speech, But It Needs Work

The days of knobs, buttons, and dials on the car dashboard are numbered. In an era of tablets and smartphones, these conventional automobile control interfaces might soon become as antiquated as whitewall tires, rumble seats, and chrome bumpers.

Increasingly, drivers today want voice activation in their vehicles, particularly for use with their in-car navigation systems. In a survey of more than 20,000 people who recently purchased or leased new 2012 cars with factory-installed navigation systems, J.D. Power & Associates found that 67 percent of owners without voice activation would want it in their next navigation systems, and 80 percent of those who have it would want it again.

But while it’s such a sought-after feature, satisfaction with voice activation came in at 544 on a 1,000-point scale, the lowest score among all navigation system features in the study. In fact, input and selection controls were among six of the top 10 complaints logged in the study, which was released in early January. Other issues included problems with the visual display, mapping, and connectivity.

Part of the reason for speech’s poor scores, according to Mike VanNieuwkuyk, executive director of global automotive at J.D. Power & Associates, is that similar applications on smartphones have raised owner expectations, and vehicle manufacturers are having a hard time keeping up. In the study, 47 percent of vehicle owners indicated they use downloaded apps on their smartphones for navigation in their vehicles, up from 37 percent in 2011. Additionally, 46 percent of owners said they wouldn’t repurchase a factory-installed navigation system if their smartphone navigation apps could be displayed on central screens inside their vehicles.

Part of the appeal of these smartphone apps is the cost—most are free—but they also offer more up-to-date maps, a more familiar interface, and better voice recognition, VanNieuwkuyk wrote in the report. In-car-system manufacturers “have a window of opportunity to either improve upon the current navigation system platforms or focus on new ways to integrate smartphones,” he concluded.

Beyond that, he said navigation systems are no longer viewed as standalone systems but rather as part of a much broader multimedia, safety, and infotainment package.

And the dashboard is only going to get more sophisticated as the safety impacts of speech interfaces in the car become more widely known. One study conducted by Agero and the Virginia Tech Transportation Institute found that a combination of speech and visual interfaces is best for interacting with in-vehicle navigation and infotainment systems.

In the study, two groups of drivers (18- to 30-year-olds and 65- to 75-year-olds) were tested for their levels of distraction while driving on the Virginia Smart Road, a 2.2-mile closed test track in Blacksburg, Va. Participants were asked to complete selected tasks on conventional portable navigation devices and specially designed devices featuring interactive speech and speech/display screen capabilities.

“Across all key measurements—driving performance, ease of use, workload demand, and task execution, defined as the successful retrieval and selection of information—both the speech-only and speech-and-visual interfaces reduced distractions,” noted Tom Schalk, vice president of voice technology at Agero, in the report. “The measurements also indicate a speech-only driver interface is best for entering a destination...while a combination of speech and visual cues is best for selecting a particular search result from a list.”

The Agero/Virginia Tech study also uncovered another important fact: Speech/visual interfaces, as well as speech-only interfaces, received better marks for perceived mental demand, frustration level, and situational awareness. In other words, when using a navigation device, a combination of speech and visual cues, followed by speech-only, are the most intuitive ways for drivers to request, obtain, and sort real-time information about destinations and routes. By contrast, conventional navigation devices that require touch-screen interactions fared poorly. —Leonard Klie
Under-the-radar speech news
BY LEONARD KLIE

We’re all familiar with the phrase “The pen is mightier than the sword.” Well, now, thanks to voice technologies, the pen is also mightier than disabilities.

Inspired by his younger sister, Julie, who has autism and often finds it difficult to speak, 12-year-old Eric Zeiberg created a handwriting-to-speech application for the iPad that helps people with speech disabilities communicate. Zeiberg, who lives in Connecticut, saw a huge gap in the assistive technology market, so he created HandySpeech, which is marketed by iSpeak4U and is available for download through the Apple App Store for $29.99.

With the app, users write what they want to say in any one of 13 languages, and the software converts the handwriting into speech. The user can choose between male or female voices.

iSpeech provides the application’s text-to-speech capability. “We’re excited to have the opportunity to support such a young developer with a great cause,” says iSpeech CEO Heath Ahrens. “This is an inspiration to our entire developer community.”

PhatWare supplies the handwriting recognition technology, which adapts to the user’s writing style. The application recognizes cursive, print, and mixed handwriting styles. Users can also input text via the keyboard to be read by the application. Simple finger gestures can be used to insert special characters.

“I hope that HandySpeech will provide much needed help and open new opportunities for people in need,” Zeiberg said in a video on the iSpeak4U Web site. “The application is dedicated to courageous people who struggle every day to overcome their disabilities.”

SOUND BYTES

>> Siri Sparks Chevy’s Interest
Chevrolet this year will integrate Siri, Apple’s voice-enabled digital virtual assistant, into its Spark and Sonic models in the United States. Through the cars’ standard Chevrolet MyLink infotainment system, customers with compatible iPhones will be able to direct Siri to make calls; play songs from their iTunes library or the radio; listen to, compose, and send text messages; and access their calendars.

>> CereProc Gets Regional
Speech synthesis company CereProc has updated and expanded the English language text-to-speech voices it offers to Android users as downloadable mobile applications with regional accents from Scotland, Ireland, and England. The voices pair with mobile apps that provide navigation directions, newsfeeds, e-books, and SMS text messages via synthesized speech.

>> SAP Users Just Need to Ask
EasyAsk, a natural language technology provider, has released Quiri for SAP CRM, a solution that enables mobile employees to interact with their CRM data on smartphones and tablets, using a Siri-like voice interface. EasyAsk Quiri uses natural language technology to allow users to speak their business questions into their mobile devices and receive highly accurate answers in an instant.

>> The Bird’s the Word
Text-to-speech provider NeoSpeech is the voice behind Buddy Bird ToDo, an all-in-one organizer that reads aloud a user’s to-do list. The app is specifically designed for Apple’s iPhone and iPod Touch devices. It can read all to-do tasks, tasks due today, tasks due soon, overdue tasks, recurring tasks, or tasks assigned to user-specified lists. All tasks can also be organized into tabs or lists and sorted via filters. Users can even direct their devices to complete tasks by making calls, sending email or text messages, searching the Web, and more. They can also synchronize tasks with iCloud and the popular online service Toodledo.com.

>> Dragon Mobile Emerges
Nuance Communications has expanded the personal assistant capabilities of its Dragon Mobile Assistant for Android and has extended its beta offering to the broader Android community. Now supporting devices running Android 2.3 and higher, Dragon enables nearly all Android users in the United States to use speech to send text messages, make calls, set appointments, and now set the alarm, play particular songs, get instant access to apps, and more. When set to its always-listening Driver Mode, Dragon’s personal assistant experience is also completely hands-free.

>> Microsoft Is Multilingual
In the release of Windows Phone 8, Microsoft has expanded its set of speech recognition languages. From the six supported in the last release, the company has increased support to 15 languages and dialects, including U.S., U.K., and Indian English; Brazilian and European Portuguese; French; German; Italian; Japanese; Polish; Russian; Mexican and European Spanish; and variations of Chinese spoken in mainland China, Taiwan, and Hong Kong. It also added 22 new display languages to Windows Phone 8, bringing the total to 50 languages.
SPEECH FINDS A HOME
Talking to the walls might not mean you’re crazy
By Leonard Klie
In Star Trek: The Next Generation, Captain Jean-Luc Picard enters his quarters and tells the ship’s computer to adjust the lights, prepare a cup of hot Earl Grey tea, play a Klingon opera, and open a video conference with an admiral at Starfleet Command in San Francisco—all with simple voice commands. His requests are carried out flawlessly, and not just because he’s the guy in charge of the Enterprise. The ship’s computer responds just as well to verbal commands from an ensign.

That level of man-machine interaction isn’t entirely possible just yet, but industry insiders agree we’re pretty close.

“The technology is here today. [This] kind of futuristic scenario could be days or weeks away,” says Bill Scholz, president of the Applied Voice Input/Output Society (AVIOS) and president and founder of NewSpeech Solutions, a speech technology consultancy.

Gary Clayton, chief creative officer at Nuance Communications, agrees. “From a pure technology standpoint, we’re already there, and have been for quite a while,” he says.

“Star Trek is a very realistic paradigm,” adds Todd Mozer, CEO of Sensory, a provider of embedded speech technologies for consumer products.
THE SPEECH-ENABLED HOME
Speech interfaces not only exist today, but they are widely popular for everything from smartphones and car telematics to TVs and video game consoles. Experts agree that it’s just a matter of time before they spread to other devices and applications, both in the consumer and business worlds.

On the consumer front, if a research project underway in Europe is successful, the prototype of a fully voice-enabled home—complete with voice-enabled lighting, thermostat, security system, and a host of everyday appliances and mobile devices—will be completed by 2015, well ahead of Star Trek creator Gene Roddenberry’s vision.

That three-year project, which launched in January 2012 with $6.3 million in funding from the European Union, is called Distant Speech Interaction for Robust Home Applications (DIRHA). It involves research and development around multichannel acoustic processing, speech recognition and understanding, speaker identification and verification, near-field and far-field speech processing, and spoken dialogue management. It even considers power consumption and energy efficiency.

DIRHA’s goal is to create a pervasive microphone array so users can say what they want from anywhere in the house and have their requests recognized and understood. The project also seeks to make it possible for the system to identify and capture an individual speaker from several yards away, in a crowded room, and with music playing.
The DIRHA project has the potential not only to dramatically change the way people interact with technology, but also to make a real difference for those who can’t easily move around, such as the elderly or disabled. In addition to the home scenarios, the distant-speech interaction systems can find use in robotics, telepresence, surveillance, video conferencing, and industry automation.

A part of this project would be the development of systems that do not rely on the user to push a button to initiate a voice command. Rather, the system would need to be in a constant listening mode.

“We shouldn’t have to push buttons to use the speech recognition,” Mozer says. “The whole purpose of speech in this case is so we would not have to use a lot of button pushes.”

His company, Sensory, has been at the forefront of development in this area. Its TrulyHandsfree Trigger is a voice technology that makes devices and applications come alive—or “wake up”—with a spoken word or phrase.

Another company that is tackling this problem is Conexant. It, too, has created wake-up technology for speech-enabled devices.

“Power has been a big issue. If the device has to be on all the time, it can really run hot and burn a lot of power,” says Saleel Awsare, the company’s vice president and general manager.

“Our equipment scans for a human speech pattern, and when it detects one, it brings up the speech engine to listen for a command, and then it activates the appliance or device to carry out the command,” he explains.

But despite all the advances and ongoing research, a fully speech-enabled home is still a very ambitious project. That’s not to say that it can’t be done. In fact, many people think research and development will gain momentum as the economy improves.

“As housing construction starts up again, you’ll see more [smart-home technology] in the development plans,” predicts Jim Larson, an independent consultant and professor at Portland State University in Oregon.

Larson sees smart-home technology playing out especially well in the senior living environment, leading to a whole host of applications that can help the elderly lead independent lives and stay connected with loved ones. These technologies, he says, can provide greater security, convenience, and connectivity, and can even be instrumental in monitoring and reporting on daily activities and healthcare regimens. The same system can, for example, keep track of a person’s medicines and alert him when he forgets to take a pill at the designated time.

Deborah Dahl, chair of the Multimodal Interaction Working Group at the World Wide Web Consortium (W3C) and principal at speech and language consulting firm Conversational Technologies, shares Larson’s enthusiasm. “It’s been a dream for so long for the disabled, the elderly, and the just plain lazy to be able to control the lights, heat, air-conditioning, TV, etc., without having to move around a lot,” she says. “There’s no reason that we couldn’t have speech as a central interface. It makes a lot of sense [to have] one speech interface so we wouldn’t have to worry about so many different dials, knobs, and buttons.”

But for the speech-enabled home to meet with any success, it will need to gain consumer acceptance, something Bill Meisel, president of TMA Associates and an executive director of AVIOS, thinks will not be easy.

“Yelling across the room, while possible, is not something people are comfortable with,” he says. “The whole idea of walking around the house talking to the walls does not appeal to a lot of people.”

Perhaps even more problematic will be getting speech vendors, system developers, appliance manufacturers, and others to come together on a set of standards.
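The wake-up approach Awsare describes earlier in this section (a low-power scan for speech that brings up the recognizer only when needed) can be sketched as a simple gate in front of a command table. This is an illustrative model under stated assumptions, not Conexant's or Sensory's implementation: the energy threshold, the `Frame` type, and the command names are all hypothetical, and the "speech engine" is reduced to a dictionary lookup.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    energy: float   # toy loudness measure standing in for a speech detector
    text: str = ""  # what a full recognizer would decode if it were run


@dataclass
class WakeUpGate:
    commands: dict                 # spoken phrase -> device action
    threshold: float = 0.5         # assumed cutoff for "sounds like speech"
    engine_runs: int = 0           # how often the costly engine was woken
    executed: list = field(default_factory=list)

    def process(self, frame):
        # Stage 1: cheap, always-on check. Quiet frames never wake the engine.
        if frame.energy < self.threshold:
            return None
        # Stage 2: bring up the speech engine and look for a known command.
        self.engine_runs += 1
        action = self.commands.get(frame.text.strip().lower())
        # Stage 3: activate the device only when a command was recognized.
        if action is not None:
            self.executed.append(action)
        return action


gate = WakeUpGate(commands={"lights on": "lamp.power_on"})
gate.process(Frame(0.1))               # background noise: engine stays asleep
gate.process(Frame(0.9, "Lights on"))  # speech detected: command carried out
```

Counting `engine_runs` separately from `executed` shows the power argument: the expensive stage runs only on frames that pass the cheap check.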
Right now, all the work has been centered on individual devices and appliances, with nothing connected or integrated. And all the technology has been offered through OEM agreements with individual device manufacturers. Very little is offered directly to consumers as an aftermarket or add-on product.

Lacking Uniformity

“The ideal is to have one platform that everything plugs into rather than each device having its own system,” Nuance’s Clayton asserts. “At the end of the day, the user doesn’t want ten different systems that he has to remember commands and passwords for.”

Yaron Oren, chief technical officer at iSpeech, a home automation systems provider, also sees the need for a single, converged platform. “The whole space will take a huge step forward when someone brings together one complete solution that works well with multiple systems,” he says.

Dahl maintains that this is very possible. “We’ll eventually see speech decoupled from each device itself and run on a network that sends commands directly to all of the devices on that network,” she says. According to Dahl, many of the pieces for this single network are already in place to some degree. However, “all the people involved need to put them together,” she says.

The Remote Control App

Others don’t believe the industry will ever come to agreement on a single standard, citing the still fragmented environment around HDTV despite the government’s 2009 mandate. They think moving all the technology to the cloud is a more likely possibility.
Already Veveo, a natural language conversational interface and search platform provider, has turned all its efforts to the cloud. Veveo’s technology has been adopted by some of the largest TV service providers to help subscribers search programming choices by voice right from their living room couches. “We don’t put anything on the set-top box,” says Sam Vasisht, chief marketing officer at Veveo. “A lot of these set-top boxes are already IP-enabled.” ABI Research recently predicted that the worldwide IPTV subscriber base would grow to 79.3 million people this year, continuing the steady growth that began a few years ago. “Everything has to be IP-based,” Vasisht continues. “You really need some central hub for all the commands. There has to be something that sends specific commands to the device.” For Veveo, that one hub is the mobile phone. And the company is not alone in thinking this way.
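The "central hub" idea that Vasisht and Dahl describe, with speech handled in one place and specific commands sent on to each device, might look like the following sketch. The `Hub` class, the device names, and the handler functions are hypothetical; a real system would sit behind a recognizer and a network protocol that this toy leaves out.

```python
class Hub:
    """Routes an already-recognized utterance to the device it names."""

    def __init__(self):
        self._handlers = {}

    def register(self, device, handler):
        # Each device supplies its own handler; the hub only does routing.
        self._handlers[device.lower()] = handler

    def dispatch(self, utterance):
        words = utterance.lower().split()
        for device, handler in self._handlers.items():
            if device in words:
                command = " ".join(w for w in words if w != device)
                return handler(command)
        raise LookupError(f"no registered device named in: {utterance!r}")


def tv(command):
    return f"tv -> {command}"


def lights(command):
    return f"lights -> {command}"


hub = Hub()
hub.register("tv", tv)
hub.register("lights", lights)

reply = hub.dispatch("TV channel 5")
```

Keeping recognition and routing in the hub means a new appliance only has to register a handler, which is the decoupling Dahl anticipates.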
“What’s really going to happen is it will all converge on the mobile phone,” Meisel says. “You will have an app on your mobile phone that you will use to control everything.”

But in the mobile space as well, a lack of standards could derail any immediate progress toward a speech-enabled home.

“With smartphones you can do a lot today, but there are so many different models, manufacturers, and carriers,” Awsare says. Getting the device manufacturers to agree on a single mobile platform, or to develop technologies that can be deployed across all platforms, will be tough, he adds.

It’s What’s On

The car and smartphone markets today are saturated with voice interfaces that let users perform a host of functions in eyes-free and hands-free modes. The TV, therefore, is seeing the lion’s share of the development efforts and consumer attention today. Major manufacturers of TVs and set-top boxes, as well as cable and satellite service providers, are investing in speech systems that let consumers search for and access TV programming and other content with voice commands.

Nuance’s Dragon TV technology, for example, is incorporated into models from Samsung, Panasonic, LG Electronics, and several other manufacturers, allowing consumers to use voice commands to search program guides, change channels, search for content on the Web, connect with friends and family via Skype, and access social media content from sites such as Facebook, Twitter, and YouTube.

“Traditional search on televisions is tedious and amazingly outdated,” said Michael Thompson, senior vice president and general manager of Nuance Mobile, in a statement. “Dragon TV brings an amazing voice experience directly to the living room, similar to what people do every day on their phones and in their cars.”

Other developers of these smart TVs include ActiveVideo Networks, Veveo, iSpeech, Sensory, and Conexant. Even Apple, Google, and AT&T have publicly stated their interest in smart TV technology.
It’s What’s Next

Beyond the TV, the larger smart-home market is only starting to take shape, with iSpeech and a few other manufacturers taking a leadership position. iSpeech last year unveiled iSpeech Home, a complete home automation system that blends speech recognition and text-to-speech technologies to interact with a number of systems and devices around the home.

Panasonic, though, is right on iSpeech’s heels, with its release in Japan last fall of a number of smart, connected appliances, including air conditioners, refrigerators, microwaves, washer-dryers, rice cookers, blood pressure monitors, and calorie counters. Owners of these appliances can remotely program and operate them with their iPhones or Android smartphones. These devices can connect with cloud-based support services to automatically report device faults to the manufacturer.

Google also announced its entry into the home automation market last year with the launch of Android@Home, a network of connected accessories that would use Android as the central operating platform. Google has said little about the initiative since then, however.

And then there’s Vivint, the largest home automation services company in North America. The company offers electronic door locks, door and window sensors, lighting and small appliance controls, an IP camera, and a programmable thermostat, all of which can be controlled through smartphones. Its clients number more than 675,000 throughout the United States and Canada.

Of these companies, iSpeech is the only one so far that has committed to speech as an interface. “Voice control is a natural interface to offer a more compelling experience,” Oren says.

“We have been pleasantly surprised at how many device manufacturers are looking at this,” Oren adds. “There is quite a lot going on beyond just the TV.

“There are a limitless number of products out there that could be made easier to use with a speech interface,” he continues. “The technology has such broad applicability.”

The Business Case

In business, the same technology used to talk to the alarm clock, and have it talk back to the user, could be used for controlling printers, fax machines, phones, computers, calculators, and any other piece of office equipment.

One industry that really stands to gain from these technologies is healthcare. “Hospitals can really use this,” Awsare says. “They want to keep doctors’ hands sterile, and so they don’t want them to touch a lot of stuff.”
Other opportunities exist in the entertainment, government, education, retail, automotive, security, and telecommunications sectors, many contend. The technology will also expand opportunities for e-commerce, giving TV viewers the opportunity to buy products via voice right through their TVs. “If you see a commercial on TV with a call to action, shouldn’t you be able to take that action?” Clayton asks. “If you see an ad for a travel destination, you can book a trip. If you see an ad for a restaurant, you can make a dinner reservation.” The business benefits could be great, Veveo’s Vasisht states. “All of this automation puts information and service at our fingertips,” he says. “If you can see an ad or a program about a vacation and the system takes you somewhere to book a trip to that location, that’s really powerful. This frictionless interaction can be a real benefit to any company that runs ads on TV.” That, Clayton maintains, is an easy way for companies to become more involved in the daily lives of consumers. Sensory has taken it a step further. In May, it launched a speaker identification product to accompany its TrulyHandsfree speech recognition technology, allowing companies to personalize offerings based on information about the types of programs a customer views, for example. “A lot of companies have figured out ingenious ways to make their products more useful by offering recommendations and tracking individual usage models and settings. But in a shared device situation, this oftentimes becomes meaningless because each user has individual preferences and desires,” Mozer says. 
“For example, I often get recommended movies targeting twelve-year-old girls, not because that is my preferred viewing experience, but because I have a twelve-year-old daughter who also uses the TV.”

As voice control of consumer devices becomes more prevalent, Mozer expects speaker identification technology to improve the user experience on shared devices by providing recommendations and suggestions based on an individual’s habits, behavior, and lifestyle or by automatically adjusting the device to the user’s unique preferences.

Not for Everything

And while almost everyone would agree that speech could easily become the most natural interface for interacting with many of the products we use every day around the house or office, it might not be the right fit for everything.

Having a voice interface for a washing machine, for example, might seem attractive at first, but it isn’t really all that practical, Awsare says. After all, with a washer, you still have to load the laundry into the machine and add the detergent. Voice is not going to do that for you.

Similarly, to use a microwave, someone still needs to stand in front of the machine, open the door, put the food in it, and then close the door. Having to push a button to start it doesn’t detract from the user experience, Awsare says.

Advances in other technologies, such as robotics, could change that.

“Our personal lives are getting automated very quickly—much more quickly than our business lives,” Vasisht says. “As consumers get more comfortable [with technology], they’ll demand it everywhere.”

That was the case when Apple first introduced Siri on the iPhone 4S in October 2011. Now, the technology is pervasive on mobile phones, with similar virtual assistant offerings from Nuance (Nina), Angel (Lexee), and Taptera (Sophia). But Apple really gets all the credit for taking the speech interface to the mainstream.

“Siri has done a lot to raise the consciousness about speech applications. When Siri came out, it changed the way people [thought] about speech technology and how to use it,” Clayton says.

“Apple did a good job of showing the consumer what you can do with voice. Now it’s up to the rest of the industry to show what else it can do,” Awsare adds.
News Editor Leonard Klie can be reached at lklie@infotoday.com.
THE GOVERNMENT IS LISTENING
Speech technology sees government growth spurt
BY MICHELE MASTERSON
Contrary to what people may think, the government does hear us, thanks to speech technology advances. At the federal, state, and local level, applications such as voice biometrics and speech recognition are showing up in a range of deployments, from Medicare compliance to Hurricane Sandy notifications.

“There are many reasons to compel government agencies to make improvements, and the easy and obvious first level of improvement is about engaging with speech,” says Scott Fischer, chief operating officer at MicroAutomation, a Voxeo preferred partner.
SPEECH IN GOVERNMENT
Healthy Changes
Thanks to a slew of ever-changing laws concerning eligibility, liability, compliance, and confidentiality, speech technology in the government healthcare sector is hot.
The most significant U.S. legislation to affect speech technology in the next year is likely to be the Health Information Technology for Economic and Clinical Health (HITECH) Act, enacted in February 2009, which mandates that all healthcare providers switch to electronic health records (EHRs), also known as electronic medical records (EMRs), by 2014. The federal government will offer tax incentives under the Medicare EHR Incentive Program to providers that make the switch and can prove meaningful use. The program, which falls under the HITECH Act, stipulates that organizations that qualify for the Medicare EHR Incentive Program and achieve meaningful use by 2014 will be eligible for incentive payments, but those that haven’t complied by 2015 will suffer penalties.
“With meaningful use, there are government incentives [and] government penalties for hospital systems to deploy EMRs, but you can’t just install software,” says Peter Mahoney, chief marketing officer at Nuance Communications. “Medical facilities actually have to prove that they’re being used in a meaningful way. Dragon is one of the key technologies that is used to help hospitals—not only in EMR systems to get them deployed, but to get doctors to actually use them.”
Another growth area comes from a change in the government’s Medicare program, involving accountable care organizations (ACOs)—organizations formed by groups of doctors and other healthcare providers to coordinate care for people with Medicare. Currently, participation in an ACO is voluntary for providers. The Medicare Shared Savings Program (MSSP) and other initiatives related to ACOs are made possible by the 2010 Affordable Care Act. ACOs serve more than a million people with Medicare in 40 states and Washington, D.C.
Mahoney explains that hospitals are transitioning from a model of being paid for delivering a service, such as a test, to being paid based on patient outcomes. “To do this, you need to do a really good job of tracking patient information and categorizing it, which is driving significant deployment of natural language,” he says. “Healthcare is somewhat ahead of the curve in using speech technology in some areas because of the necessity across all healthcare, whether it’s government or not. We’ve seen high adoption here.”
There are many healthcare initiatives, all with some common considerations—the need to protect privacy while providing secure access and to provide an audit
trail that meets industry regulatory standards, according to VoiceVault. The company points out that collecting traditional pen-and-paper signatures is not only time consuming, but costly as well, whereas obtaining voice signatures is a relatively simple process and as legally binding as a written contract.
Voice biometrics can be used to facilitate compliance, secure access to EMRs, reduce costs, and provide iPad access to EMRs for medical practitioners in a secure manner.
VoiceVault helps companies use voice biometrics to generate legally binding e-signatures over the phone to efficiently add security to the health insurance application process. The company’s e-signatures are recognized as legally binding under several government acts, such as the E-Sign Act, HIPAA/CMS, and FDA 21CFR Part 11. Already, VoiceVault has amassed some large clients in the healthcare industry, including WellPoint/Anthem, Aetna, and Blue Cross/Blue Shield of Kansas City.
“Traditionally, health insurers are a little behind the times from a technology standpoint, but we really see that they are engaging in voice biometrics,” says Julia Webb, executive vice president of sales and marketing at VoiceVault. “I look at it as a great install base, where we can extend the enrolled voiceprints to other applications, such as information about benefits or premiums. It’s also very relevant right now because of healthcare reform. The healthcare providers we’ve been talking to are saying, ‘How are we going to deal with this major influx of applications from people who are moving policies?’”
Using voice biometrics, VoiceVault has seen mailing and back-end costs decrease, and an increase in closure rates as well. The company says that its VoiceSign solution has enabled organizations to increase telephone sales closure rates by more than 20 percent and reduce administrative costs associated with the typical paper trail that accompanies handwritten signatures by up to 80 percent.
“With changes in healthcare laws, we see people looking for individual medical policies moving away from small group plans. Voice signature solutions allow customers to apply over the phone,” Webb says, adding the efficient verification process enables organizations to handle a large number of applications and policy changes.
Speech Arms Users
Voice biometrics is also widely used for surveillance and identification in military and police organizations, with users including agents, soldiers in the field, and forensic specialists.
In December 2012, Speech Technology Center (STC), based in Russia, rolled out the world’s first bimodal system on a nationwide level in Ecuador. The solution combines facial and voice identity
biometrics that are used for criminal forensics. Voice samples and photographs of suspected criminals can be placed in the database and used for comparison purposes to detect matches among suspects. The state-of-the-art system took about a year to roll out. The company’s voice biometrics solution has a reliability rate of 97 percent, and when voice is combined with facial biometrics, the rate is closer to 100 percent. The technology’s algorithms are able to deliver reliable results even if a face has been altered. “The entire system is more reliable because you’re not using just one form of identification,” says Alexey Khitrov, president of SpeechPro, the U.S. subsidiary of STC. “The idea was to make the system as usable as possible. You can use just the face, the voice, or both.” “Voice and face identification are providing new and valuable investigative capabilities,” said Mikhail Khitrov, chief executive officer of STC, in a statement. “The biometric technologies providing the foundation of the system have proven to be reliable and robust in even the most challenging conditions. As biometric technologies mature, we’re seeing a growing demand for these kinds of tailored voice and multimodal biometric solutions—not just in Latin America, but in the global marketplace.” In June 2010, SpeechPro launched the world’s first nationwide automatic voice identification system for the government of Mexico. That deployment helped more than 250 law enforcement agencies throughout Mexico collect, manage, and search hundreds of thousands of voiceprints in their fight against crime. The company is reportedly negotiating similar deployments elsewhere in Latin America, Asia, and Europe. For now, though, the focal point is Latin America. “In Latin America in particular, they are dealing with crimes like kidnapping, drug trafficking, and organized crime,” Alexey Khitrov says. 
“It’s such a big issue that [governments] are willing to boost their technology to counter it.” STC is also making headway in U.S. law enforcement following a strategic partnership deal last year with Data Works Plus to integrate voice
identification technologies into its law enforcement software application platform. Several law enforcement agencies have committed to pilot the system, which will take voice samples of criminals when they are arrested. STC’s success in Ecuador and Mexico is “helping to build confidence in it throughout the U.S. and Europe,” Alexey Khitrov says. “We’re seeing a growing interest in biometrics in general as a result.” Madrid-based voice biometrics software solution provider Agnitio has been a growing presence in the field. In early 2012, the company launched Kivox Mobile, a product for secure authentication in mobile devices using a person’s voice. The software can perform voice authentication on Android smartphones or tablets without being connected to a network. “Kivox Mobile will bring voice authentication closer to the user,” said Emilio Martinez, CEO of Agnitio, in a statement. “You will teach your device to recognize your voice, whenever you want and at the pace you want. The device will improve its recognition capabilities over time, and you will be able to test how it works in multiple situations. Agnitio technology is used around the world to identify persons of interest to fight terrorism and improve homeland security. Reducing the size of our engine so that it can run locally in a consumer mobile device is truly unique.” In May 2012, the U.S. government’s Advanced Analytic Capabilities Subgroup of the Technical Support Working Group awarded Agnitio a research and development contract to deliver improved voice biometrics–based mobile phone security capabilities. With the contract, Agnitio’s mobile voice authentication solution will be used in various devices and operating systems to secure remote authentication in field tactical operations. “Our technology has been tested by many end users,” Martinez says. “In addition to being accurate…our systems can process millions of conversations in a few minutes. The time you need to create
a voiceprint is roughly thirty to forty seconds, but once you have that, you can do matching in a fraction of a second. We’re very optimistic about voice biometrics.”
The U.S. Army is employing several types of speech technologies. Next IT, a provider of Intelligent Virtual Assistant (IVA) technology, created Sgt Star, a virtual assistant that the Army uses to communicate with recruits and their families. Sgt Star—who routinely shares personal information with potential recruits—has answered more than 11 million sensitive, personal, and potentially life-altering questions ranging from “Will I have to serve in combat?” to “Are the showers co-ed?”
“[An] advantage of our platform is that it is able to work with any speech recognition commodity, since the power is in the conversational understanding,” said Denise Caron, Next IT’s chief technology officer, in a statement. “We believe in proving that our technology works, and Sgt Star is the first of many upcoming deployments, including releases in healthcare.”
The U.S. Army’s intelligent virtual assistant, Sgt Star, lets the Army communicate with recruits and their families, and has answered more than 11 million questions to date.
Several speech technology providers, including SRI International, are working on more technical solutions. SRI was recently awarded a $7.1 million contract for the first phase of a five-year, $41.5 million project with the Defense Advanced Research Projects Agency (DARPA). SRI will provide research with the goal of developing systems that can translate
foreign languages accurately, no matter what the source, and provide clarification and instantaneous interpretation. The DARPA contract is part of the BOLT program, a worldwide research project focusing on language technology. SRI also has worked with DARPA’s Global Autonomous Language Exploitation effort, which develops software that analyzes and translates speech and text in various languages. Part of the initiative, the Spoken Language Communication and Translation System for Tactical Use, uses technology to facilitate communication between the U.S. military and foreigners.
The International Computer Science Institute (ICSI) has long been involved in government-driven speech initiatives and in November announced its Babel project, on which it was working with the Intelligence Advanced Research Projects Activity. Babel focuses on building speech recognition solutions with self-imposed time and data limitations for a variety of languages. ICSI said that a team of researchers will be focusing on speech technology’s basic principles rather than smaller improvements that have been made to existing technology. The scientists believe this study could be helpful for keyword-search systems for languages without much transcribed audio.
Nelson Morgan, the deputy director and leader of the speech group at ICSI, says that the advances made in speech recognition have also, ironically, been a hindrance. Because they have the “curse of being relatively good,” there’s been less impetus to change the technology, he explained in a statement.
Notification Services
Speech solutions are also a hot commodity in providing the technology behind N11 services, such as 911 and 511, and have more than proved their worth in disasters, such as Hurricane Sandy, and for notifications, such as in the case of mass shootings.
VoltDelta N11 offerings include a number of features that help callers to quickly pinpoint the information they need. Exceptional disambiguation works to avoid confusion for like-sounding topics or points of interest to increase success rates. Personalization remembers the stretch of road the motorist last asked about to more quickly provide targeted traffic updates.
“Government is a key focus of our business,” says Terry Saeger, senior vice president and general manager. “We’re getting a lot of traction in the 511 space. That’s a pretty complex speech recognition task because of all the points of interest and the unbounded grammars that are involved with space, locations, and points of interest. With 511, it’s not just speech, it’s information that’s needed, such as accidents and bad weather.”
VoltDelta, which has a contract with the state of New Jersey, proved its mettle when Hurricane Sandy hit the state.
“We have a hybrid deployment model with the state, where we have some premise-based equipment and also have our traditional hosted call center technology in our data centers,” Saeger says. “When the storm hit, obviously there were huge spikes in traffic, five or six times the normal volume, but it wasn’t a challenge. The building that housed the on-premise equipment got flooded and lost power, and we had seamless failover to our data centers.”
Saeger maintains that the future of 511 is bright, and he sees more mobile applications integrating with speech applications. He also points to a lot of 511 turnover this year.
“Because of the nature of the 511 business, they tend to be five-year contracts,” Saeger says. “511 really got going about ten years ago, so companies have been going through a few [contract] cycles, and now we’re going into a stage where states are going through evaluation processes.”
Government IVRs
MicroAutomation has several contracts with government agencies, such as the Department of Treasury’s Financial Management System (FMS), which provides a type of help desk to callers asking about tax issues.
“In every regard, [government agencies are] like the customer service department of a major business, in that they’re helping people using the same technologies of intuitive menus; they are looking at natural language processing within their IVRs; they’re looking at multiple languages,” MicroAutomation’s Fischer explains. “All of these commercial characteristics are equally relevant.”
Fischer says that such technologies involving speech are either in place or on a roadmap to be put into place.
“Sometimes what we’re finding is that the government can appear to be slow to adopt to many of these new technologies and capabilities, but that’s only because they have to be bulletproof and have to withstand not only usage requirements but any political issues that could potentially arise,” Fischer notes.
“As the world is starting to embrace speech, the standard is now becoming speech recognition, and the growing standard is going to be natural language,” he continues. “There have been investments in a lot of technologies. Government customers are getting smarter and realizing that they don’t have to throw out what they had five or ten years ago, but improve upon it.”
Staff Writer Michele Masterson can be reached at mmasterson@infotoday.com.
Speech TECHNOLOGY
The 2013 Annual Reference Guide
Visit online at www.SpeechTechMag.com/BuyersGuide and discover the full functionality of our interactive guide!
Welcome to the 2013 Speech Technology Reference Guide. This past year has indeed seen a proliferation of mobile speech-assisted applications which are especially well suited for speech interfaces, and 2013 and the near future beyond also bode well for the trend to continue with strength and momentum. In this 16th publication of Speech Technology’s Annual Reference Guide, we’ve seen the transformation of speech technologies from an era of frustration and unrequited promise to the current era of casual confidence and assumed reliability. As great a boon as speech-enabled voice apps have been in the mobile market, so, too, have speech-enabled apps succeeded in the drive to improve customer experiences at a cost that makes sense for business executives and boardrooms.
Speech Technology magazine and SpeechTEK Conference & Exhibition remain committed to our core readership and audience, but we have branched out to support the practical application of speech (and other) technologies with the Customer Service Experience (CSE) Conference we launched in the summer of 2012. CSE is colocated with both SpeechTEK and CRM Evolution conferences and will be reprised at the New York Marriott Marquis Hotel, August 19-21, 2013. Colocating these three independent conferences gives us the opportunity to comingle attendees for maximum networking and cross learning. We’ve also made every effort to encourage attendees to sample sessions in all three conferences by making “All Access” passes very inexpensive. Take advantage of our content offerings, which promote improving customer experiences from three unique lenses.
As you continue your speech technology explorations, please make sure to visit us online, where you will find a wealth of targeted information built into the online version of this print Reference Guide, which is simply not possible to present in a magazine.
This is going to be a great year in the speech world!
Bob Fernekees
Group Publisher
Speech Technology magazine
Sponsored Content
Angel 1850 Towers Crescent Plaza Tysons Corner, VA 22182 Phone: 888-MyAngel Contact: sales@angel.com Web: angel.com
ANGEL OVERVIEW
Angel is a leading provider of cloud-based Customer Experience Management (CEM) solutions for Interactive Voice Response (IVR) and Contact Centers. These solutions enable mid-market and enterprise organizations to quickly deploy voice, SMS, chat, mobile, and Business Intelligence (BI) applications that all put the Customer Experience (CX) First. Angel’s solutions are built on an on-demand, software-as-a-service (SaaS) platform and require no investment in hardware, software, or human resources. More than 1,000 customers worldwide turn to Angel’s CEM solutions to delight their customers and their bottom line.
KEY BENEFITS AND FEATURES
■ Interactive prototyping
■ Fastest deployment
■ Flexibility and control
■ Full integration
■ Personalized, interactive communications
■ Highly customizable
■ Real-time reporting
■ Variable greetings
■ Call recording and analysis
■ Support with PASS
ANGEL INNOVATION
Traditional views of call center or voice automation infrastructure center around “boxes”. These functional units were added, in chain-like mode, to phone lines, to construct somewhat sophisticated voice offerings. Unfortunately, this approach has proven complex to manage, difficult to scale, and very costly. Angel was designed as an alternative to these cumbersome applications. The solution was built from the ground up as a service-oriented architecture, and as such, can interoperate seamlessly with other web-service architectures. Angel takes traditional “boxes”, e.g., IVR, ACD, PBX, and Call Recording, and introduces them as components of a platform that can be used and re-mixed as needed in applications that seek to solve a specific business challenge.
At the bottom of the Angel technology stack is the telecom infrastructure, making use of technologies such as SIP and VoIP. On top of it, a standards-based VoiceXML platform provides the interface to drive caller interactions with speech recognition, media playback and speech synthesis. These resources are managed by the application definitions produced within Site Builder. Site Builder and the Voice site framework run on enterprise-grade Java and Oracle database technology. As calls terminate, reporting information is gathered for further analysis in a data warehouse.
In addition, Angel has published a powerful set of APIs for developers. With these high-level APIs, developers can embed voice functionality into their enterprise applications. We currently offer a transaction API, an outbound API, and a call queue API.
Index to Topic Centers
[Table: Index to Topic Centers. Rows are the companies listed below; columns are the topic centers defined below, plus Speech Processing Solutions.]
Companies: 3M Health Information Systems; All Worldly Voices; Angel.com; AVST/Applied Voice and Speech Technologies, Inc.; Calabrio Inc.; Chant; Digital Base; iSpeech, Inc.; LumenVox; Nexidia; Parlance; Plum Voice Portals; Sensory, Inc.; SOFTEL Communications; Verint; VoiceVault; VoltDelta; Voxeo; VoxPeritus; Walsh Media
Analytics
Solutions for analyzing speech to extract useful information about the content or the speakers.
Authoring Environments
Individual products or suites of products to aid in the design, deployment, and maintenance of speech solutions.
Auto Attendant Solutions
Telephony applications that work in conjunction with a live operator to incorporate functions such as call answering, call transferring, unified messaging, voicemail, and information on demand.
Call Center Solutions
Speech-enabled technology for the call center with the goal of replacing or enhancing current IVR systems.
Carrier/Service Provider
Companies that provide speech-enabled solutions to the carrier and service provider industry.
Computer Telephony Integration (CTI)
Solutions that integrate telephony and computing to provide a platform for applications that streamline or enhance business processes.
Consumer Electronics
Companies that embed their speech solutions into products like talking dictionaries, toys, games and other consumer products used in the home.
Desktop Applications
Desktop software solutions that rely on speech recognition, speech synthesis, and/or speaker verification.
Embedded Solutions
Companies that develop speech solutions that are small enough to be integrated into other devices.
Managed Services
Service providers that allow businesses to invest in advanced customer interaction technologies without capital investment or the risks associated with implementation.
Mobile Solutions
Speech technology solutions used in mobile devices such as wireless phones and multimodal devices.
Outsourced Services
Third-party providers of speech-related services and solutions.
Packaged Applications
Pre-built, generic solutions for specific applications (e.g., auto attendant).
Platform Providers/Hosting
Hardware and software architecture for speech-enabled solutions, which can have additional software written and customized by an outside vendor.
Professional Services/Consulting
Companies that offer technical and operational expertise in this industry. These companies can be vendor-neutral or tied to a specific vendor.
Service Creation and Management Solutions
Companies that provide subscribers and the associated service-specific operations the complete life cycle of a voice application including design, development, deployment, means for analysis, monitoring and maintenance capabilities.
Speaker Verification/Biometrics
Solutions using an automatic process that analyzes human voice characteristics to identify a speaker (or caller).
Speech Recognition
Solutions that enable machines to understand and act on human speech inputs.
Speech Search
Companies that provide audio mining or search-by-voice solutions.
Speech Solution Integration
Consulting firms and vendor-based departments that work with clients to make sure that all the speech solutions and other systems work together seamlessly.
Speech-to-Text
The process of converting spoken words into written format for processes such as transcription and voicemail-to-email.
Testing Solutions
Solutions that serve as one of the last critical phases before implementation to ensure customer efficiency and satisfaction and to confirm an application works within the given infrastructure.
Text-to-Speech
Technology solutions with the capabilities to interpret electronic text and generate audible speech from the text.
Translation Services
The process of changing written or spoken words into another language through automated or human-based products and services.
Voice over IP (VoIP)
Providers of telephony services that send voice data as digital packets over the Internet.
Voice Prompt Recording Services
Voice talent and recording services for IVR, voicemail, ACD, PBX and speech recognition applications.
Voice User Interface (VUI) Design
Companies or service providers that write voice prompts and scripts, outline call flows, record professional-sounding prompts, and conduct usability testing.
Sponsored Content
Verint® Systems
Phone: 1-800-4VERINT Email: info@verint.com Web: www.verint.com Twitter: @verint.com Facebook: facebook.com/verint
Enhance Customer Experiences and Corporate Performance with Voice of the Customer Analytics
Customers use a variety of channels to communicate with—and about—your business, including phone, chat, email, text messages, review sites, social media such as Facebook and Twitter, and more. With Verint® Voice of the Customer Analytics™ solutions, you can capture and centralize this information across multiple channels, analyze it efficiently, and share it across your enterprise for strategic planning and quick decision making.
Unlike point solutions that are typically scattered across different departments and lines of business, Verint Voice of the Customer Analytics offers a holistic view of the customer experience, providing insight into rising trends, market dynamics, and more. Armed with this information, you can:
• Understand what’s really happening with your customers, spot issues and patterns across channels, and better predict customer needs, expectations, and behaviors.
• Use customer feedback and sentiment from multiple communications and social media channels to drive decisions on services, processes, and products.
• Improve the overall customer experience by developing targeted customer treatment strategies while delivering a consistent experience across channels.
• Enhance operational efficiency through a detailed understanding of key performance metrics for your business.
• Uncover and take action on developing trends and areas of opportunity for competitive advantage.
Verint Voice of the Customer Analytics solutions capitalize on the powerful integrations within our award-winning Impact 360® Workforce Optimization™ suite. They include:
Impact 360 Speech Analytics™—Mines recorded phone conversations to surface the intelligence essential for building effective cost containment, revenue generation, and customer service strategies.
Impact 360 Text Analytics™—Captures and analyzes customer interactions and sentiments across all text-based communication channels, including survey responses, email, Web chat, and social media sources.
Verint Enterprise Feedback Management™—Provides an enterprise-wide customer feedback capability, enabling organizations to capture targeted, highly segmented comments and sentiment through surveys across customer touchpoints—IVR, email, mobile, social media, SMS and the Web—and push reports to drive timely insight across the business.
About Verint
Verint® Systems Inc. (NASDAQ: VRNT) is a global leader in Actionable Intelligence® solutions. In the enterprise intelligence market, our Workforce Optimization and Voice of the Customer Analytics software helps organizations of all sizes capture and analyze multichannel customer interactions and sentiments, optimize the customer experience, and improve enterprise performance. Thousands of organizations worldwide rely on Verint solutions as a strategic asset to help increase customer satisfaction and loyalty, enhance products and services, reduce operating costs, and drive revenue. Contact a Verint representative today for more information or to schedule a demo.
©2013 Verint Systems Inc. All rights reserved worldwide.
Sponsored Content
Voxeo Corporation 189 South Orange Avenue 10th Floor Orlando, FL 32801 Phone: +1 (407) 418 1800 Fax: +1 (407) 264 8530 Twitter: @voxeo www.voxeo.com
About Voxeo
Voxeo helps businesses worldwide improve customer service while lowering costs. Our multi-channel communications platform enables personalized self-service, proactive outbound customer care and targeted sales. In addition to traditional inbound and outbound IVR, Voxeo supports shifting communication preferences to mobile, web and social interactions. Our unique “design once, deploy anywhere” solution minimizes the cost and complexity of migration to a multi-channel contact center and Unified Customer Experience. More than 250,000 developers, 45,000 companies and half the world’s largest service providers use Voxeo.
Improve Automated Customer Service
Voxeo helps automate customer interactions in ways that benefit the business and the consumer. We offer:
• A single solution for IVR, text, chat, mobile web and social channels.
• An application lifecycle management environment that is proven to cut application development costs by up to 50% and ongoing maintenance by 80%.
• A simplified approach to improving automation success with personalization and integrated analytics. One large retailer was able to increase call containment from 70% to 90%.
• A 10x performance advantage over other standards-based IVR platforms. Prophecy has the ability to support more than 6,000 concurrent calls per server.
• Cloud hosting options that eliminate barriers to adopting the latest technologies and customer interaction channels.
Design once, deploy anywhere
Implement a customer contact solution that is ready today to support evolving communication preferences. Voxeo provides a clear migration path to multi-channel customer service, eliminates duplicate investments and minimizes risk. Use one application and platform to drive consistent customer interactions via voice, text messaging, mobile web, mobile native and smartphone apps and social networks.
Personalize every interaction
Personalization increases automation rates, streamlines interactions and improves customer satisfaction. Powered by built-in integration to popular Business Intelligence and CRM systems, Voxeo enables dynamic self-service across multiple channels and languages based on customer types, values, preferences, transaction histories and more. Voxeo makes it easier to create and update personalized dialogs, enabling enhanced inbound interactions, proactive outbound support and targeted promotions across communication channels.
Optimize performance with cross-channel analytics
Continually adapt and refine your applications to identify opportunities and quickly resolve issues. Voxeo delivers actionable cross-channel analytics and more than 60 pre-defined, customizable reports covering the areas of administration and maintenance, application development and tuning, as well as business and caller analysis.
Deploy on your terms with the Voxeo Triple Cloud
Voxeo provides complete flexibility to meet your specific business needs today and protect your investment for the long term. Take advantage of our global cloud infrastructure, deploy in a private cloud on your premise or in a managed services environment, or implement a hybrid cloud for unique efficiencies around bursting and failover. Companies around the world are using Voxeo’s Triple Cloud to balance conflicting business requirements, enhance communications and simplify migration to a multi-channel strategy that serves and engages today’s mobile customer.
Our experience is your advantage
Hundreds of thousands of applications have been deployed on Voxeo solutions. Our global cloud is the largest hosted multi-channel communications platform in the world, with 90,000 ports spanning 6 data centers in North America, Europe and Asia. Voxeo Customer Obsession Teams are aligned to support your needs, exceed your expectations, and rapidly mobilize to apply the experience gained from working with thousands of customers, partners and developers.
Freedom to build great communication apps
Everything Voxeo offers is openly available for you to try. Sign up for the Evolution Developer Portal and get started for free. You don’t have to sign an NDA. You don’t have to talk to a sales person if you don’t want to. If you need tech support, we’re available 24x7 to help. Get started today at www.voxeo.com/free
RG30 Speech Technology 2013 Reference Guide
www.speechtechmag.com
2013 Annual
Sponsored Content
3M Health Information Systems 575 W. Murray Blvd. Salt Lake City, UT 84123 USA Phone: 800-367-2447 Email: 3MhisSales@mmm.com Web: www.3Mhis.com
Speech Reference Guide TECHNOLOGY
3M Health Information Systems delivers dictation, transcription, speech recognition and electronic signature solutions to meet the healthcare industry's need for integrated voice and document management, including 3M™ VoiceScript™ and 3M™ ChartScript™ Software. 3M can help your organization reduce transcription costs and turnaround times, integrating with other information systems across your enterprise. 3M also offers a web-based solution specifically developed for MTSOs. 3M Health Information Systems is a global provider of medical record coding, terminology and reimbursement solutions designed to improve clinical and financial performance. Our innovative software and consulting services focus on solutions for coding, ICD-10, dictation/transcription, speech recognition, web-based and mobile technology, and clinical terminology to support the electronic health record.
Speech Processing Solutions One Deerfield Centre 13560 Morris Road, Suite 1400 Alpharetta, GA 30004 Tel: 877-SPEECH2 (877-773-3242) Fax: 877-SPEECH4 (877-773-3244) Email: pds.marketing@philips.com Web: www.philips.com/dictation
Put Your Voice to Work!
Over 50 years of experience and an outstanding reputation for quality and reliability have made Speech Processing Solutions the number one manufacturer in the professional dictation market. With its headquarters and production centre located in Vienna, Austria, Speech Processing Solutions has been the driving force in delivering voice technologies to users around the world. Revolutionary digital dictation products, such as the Philips SpeechMike, the Philips Digital Pocket Memo, and the Barcode Module are helping professionals to work more efficiently. We continually strive to innovate, develop and design superior dictation products and complete workflow solutions to make voice processing a natural part of everyday work life.
A broadly based and highly customizable range of professional dictation products enables you to optimize your entire documentation workflow on an unprecedented scale. From office-based desktop dictation to mobile dictation, from conference recording to entire workflow solutions with the SpeechExec software family, our products support every imaginable scenario. Professional voice recording with dictation functionality on BlackBerry, iPhone/iPad and Android as well as speech recognition integration via our SpeechExec software boosts voice technologies to the next level.
Philips’ digital dictation management software enables companies to manage their dictation workload on an enterprise-wide basis. Integration of state-of-the-art mobile and stationary recording devices and seamless integration with other software automates the document creation process. With today’s high tech tools your voice can do the typing, whether you’re in or out of the office!
For more information, visit: www.philips.com/dictation
Follow us: Twitter: www.twitter.com/speech_com Facebook: www.facebook.com/philipsvoicetracer YouTube: www.youtube.com/philipsdictation LinkedIn: www.linkedin.com/company/speech-processing-solutions
All Worldly Voices 2610 Westwood Drive Nashville, TN 37204 USA Phone: 615-321-8802 Email: sales@worldlyvoices.com Web: www.worldlyvoices.com All Worldly Voices has been producing professional voice prompts for 20 years. We translate and record voice prompts in 35 languages digitized to any NP format. We’re the choice of Speech Technology leaders, Fortune 500 companies and businesses from around the globe. Email your script for a free quote today!
AVST (Applied Voice & Speech Technologies, Inc.) 27042 Towne Centre Drive, Suite 200 Foothill Ranch, CA 92610 USA Phone: 949-699-2300 Web: www.avst.com AVST delivers the industry’s most interoperable Unified Communications (UC) platform that brings best-of-breed voice, mobile and business process applications to the enterprise. By connecting new and existing technologies, AVST frees organizations from the constraints of a closed, single-source UC solution, unlocking the full potential of their communications infrastructure.
Chant, Inc. 8921 S. Sepulveda Blvd. Suite 207 Los Angeles, CA 90045 USA Phone: 310-410-9895 Fax: 310-410-9896 Email: speechtek@chant.net Web: www.chant.net Chant is a leading provider of software and services that help organizations gain competitive advantage using speech and natural user interface (NUI) technology. Whether you develop desktop, server, embedded, mobile, telephony, or web applications, Chant software makes it easy to integrate speech and NUI technology. Chant application-ready components simplify using the SDKs from speech and NUI technology vendors.
Digital Base Productions 2715 81st Street Lubbock TX 79423 USA Phone: 800-776-1446 Fax: 806-745-7424 Email: sales@digitalbase.com Web: www.digitalbase.com At Digital Base Productions we offer 65 languages with some of those talents being multi-lingual. When you need timely responses Digital Base will be available to you. Our pricing is competitive and you can use your prompts for ANY application. Digital Base Productions will meet or beat your deadlines.
iSpeech 211 Warren Street Newark, NJ 07103 USA Phone: 917-338-7723 Email: sales@iSpeech.org Web address: www.iSpeech.org iSpeech is a leading provider of speech technology and mobile apps, including the award winning app, DriveSafe.ly®. Developers, organizations and consumers worldwide choose iSpeech for high quality, scalable and easy to use text-to-speech (TTS) and speech recognition (ASR) solutions. The iSpeech Cloud has been used over a billion times.
LumenVox 3615 Kearny Villa Road Suite 202 San Diego, CA 92123 USA Phone: 858-707-7700 Fax: 858-707-7072 Email: LVinfo@Lumenvox.com Web: www.LumenVox.com LumenVox is a speech automation solutions company that provides native 64-bit Automated Speech Recognition (ASR), Text-to-Speech (TTS), Call Progress Analysis (CPA), and Natural Language Understanding (NLU) technologies. LumenVox v.11.0 software is highly scalable, reliable and certified by many industry partners to operate with their platforms. Through accurate speech recognition and powerful tools, our software helps hundreds of contact centers, platform providers, IVR developers, system integrators and IP–PBX vendors achieve their speech enablement goals.
Nexidia 3565 Piedmont Rd., NE Building Two, Suite 400 Atlanta, GA 30305 USA Phone: 866-355-1241 Fax: 404-495-7221 Email: info@nexidia.com Web: www.nexidia.com Nexidia provides customer interaction analytics solutions with patented technologies and breakthrough applications that enable companies to drive business transformation by capturing, making sense of, and using the full range of communications they have with customers. As the traditional voice of the customer expands from the contact center to include surveys, email, chats, and even social media sites, Nexidia provides software and service expertise to help companies synthesize this data into both a tactical tool for operational improvements and a catalyst for strategic business transformation.
Sensory, Inc. 4701 Patrick Henry Drive, Bldg 7 Santa Clara CA 95054 USA Phone: 408-625-3300 Fax: 408-625-3350 Web: www.sensoryinc.com Sensory sells speech recognition and speaker verification technology into consumer electronics. Our target markets include mobile phones, PCs, Bluetooth devices, televisions, automotive and home electronics. Sensory has over 30 patents issued or applied for, and has shipped over 150 million devices. Key customers include AT&T, BlueAnt, Hasbro, Mattel, Motorola, Plantronics, Samsung, Sega, Sony, Toshiba, VTech and many, many more. Sensory's TrulyHandsfree™ approach to keyword spotting has enabled a new generation of products that require no touching to activate speech!
Parlance Corporation 400 West Cummings Park, Suite 2000 Woburn, MA 01801 USA Contact Person: Mark Bedard Phone: 888-700-NAME (6263) Email: mbedard@parlancecorp.com Web: www.parlancecorp.com Parlance Corporation provides highly effective operator services solutions that reduce operator workload and enhance the caller experience. A customer-first attitude, combined with innovative tools, applications and services, has made Parlance the leading operator services provider to leading healthcare facilities, colleges & universities, and Fortune 500 enterprises since 1996. Visit www.parlancecorp.com.
PlumVoice 131 Varick Street 9th Floor New York, NY 10013 USA Phone: 800-995-7586 Fax: 646-349-5972 Email: sales@plumgroup.com Web: www.plumvoice.com Plum Voice is the leading single-source provider of automated telephony solutions and web-based IVR application development tools. For businesses of all sizes that demand high-performance, versatile, scalable IVR, Plum offers a unique combination of hosted and premise-based VoiceXML IVR systems as well as complete professional services and industry-specific voice applications.
SOFTEL Communications 400 Perimeter Center Terrace NE Suite 900 Atlanta, GA 30346 USA Phone: 877-4-SOFTEL Fax: 781-998-7737 Email: jcognata@softel.com Web: www.softel.com Since 1993, SOFTEL has provided systems integration and enterprise application development for leading organizations across the globe. This rich history of multi-platform systems integration to optimize contact center, speech self-service, mobility, and unified communications has enabled us to expand our expertise in developing complementary solutions and products for the industry.
VoiceVault 400 Continental Blvd., 6th Floor El Segundo CA 90245 USA Phone: 310-426-2792 Email: info@voicevault.com Web: www.voicevault.com VoiceVault Inc. is a leading provider of voice biometrics with focused expertise delivering identity verification solutions to the Financial Services and Healthcare markets. Voice biometrics is a method of identifying individuals by measuring their unique vocal characteristics. The approach relies on the simple fact that speaking is completely natural and effortless and that no two voices are exactly the same. Our solutions enhance multi-factor authentication processes with something you are — your voice.
VoltDelta 3750 Monroe Ave. Suite 4B Pittsford, NY 14534 Phone: 866-436-1169 Email: info@voltdelta.com Web: www.voltdelta.com twitter.com/voltdeltanews youtube/voltdeltachannel VoltDelta OnDemand provides voice recognition and contact center solutions to enhance customer care with Software as a Service (SaaS) efficiency. Unique CrystalWAVE speech technology works to engage callers while reducing costs. VoltDelta’s multi-channel platform with international reach supports more than 2 billion calls and SMS messages per year with exceptional reliability.
VoxPeritus 13771 N Fountain Hills Blvd. Suite 114-353 Fountain Hills, AZ 85268-3733 Phone: +1 623-295-9211 Fax: +1 734-468-4253 Email: info@voxperitus.com Web: www.voxperitus.com VoxPeritus brings strong brand management knowledge, coupled with years of customer experience improvement skills in contact centers’ multiple channels. We work closely with branding and marketing teams to clearly understand each client’s current state of brand management across all channels, and then partner with clients’ operations, IT and marketing teams to align contact centers’ multiple channels to appropriately reinforce the brand. The VoxPeritus team has experienced and corrected customer experience faux pas from subtle to the ridiculous, and improved our clients’ brand value while reducing their operating costs through increased use of self-service and reduced queuing.
Walsh Media, Inc. 2100 Clearwater Drive Suite 300 Oak Brook, IL 60523 USA Phone: 630-574-8008 Fax: 630-574-8118 Email: info@walshmedia.com Web: www.walshmedia.com Professional voice solutions for Contact Center Applications: persona development; script verbiage consultation; professional international voice talent; expert telecom file formatting; ongoing support. We employ best practices to enhance the customer experience in IVR, Speech Recognition, PBX, ACD, Message-on-hold and web audio applications. Now offering TTS alternatives.
For additional interaction with vendors, keep your attention focused on our eWeekly newsletter and Web sites for information on upcoming Webinars and Round Table Discussions concerning specific technology and vertical areas of interest. Visit us at www.speechtechmag.com/webinars
It’s Not Too Late to be included in our online guide at www.speechtechmag.com/AnnualReferenceGuide Please contact your integrated marketing manager for details. Mountain & Pacific
Eastern & Central
Dennis Sullivan Advertising Director 203-445-9178 dennis@infotoday.com
Adrienne Snyder Advertising Director 201-327-2773 adrienne@infotoday.com
COME HANG OUT WITH US! Speech Technology is online! Come hang out and get social with us. It’s the perfect place to connect with others in the industry, get insights, discounts, and industry news!
Speech Developer Programs Have Hit Prime Time
From the novice to the experienced, it’s boom time for developers
By Michele Masterson
Did you ever think that you could one day go to WalMart and add items to your shopping list by speaking into your phone? How about getting into your car and having a navigation assistant speak directions to you? Until very recently, such scenarios may have seemed futuristic at best, but these and thousands of other applications are now part of the commercial market, thanks to speech technology developer programs.
Several speech tech giants are sharing the wealth of their technology and offering developers platforms to play, create, and ultimately sell applications that can turn into profits. “The whole idea is to allow developers to build applications in speech where there’s minimal or no knowledge of speech,” says Mazin Gilbert, assistant vice president of technical research at AT&T Labs.
SPRING 2013
Developer Projects
Angel recently unveiled Lexee, a software development kit (SDK) that uses Angel’s Site Builder to create conversational, personalized voice solutions that can quickly adjust mobile solutions depending on customer needs or market changes. The Lexee SDK is a Web-based point-and-click application that does not require any coding background. Lexee enables businesses to provide a voice-activated iOS or Android mobile solution to their customers, track the impact of their mobile solutions with analytics, and allow their users to be more productive and flexible by letting them have conversations via their mobile apps.
Prior to the official launch, Salesforce.com incorporated Lexee into its mobile application, which allows users to verbally
request information and reports to be pulled from their Salesforce.com account. Rather than requiring users to search for information manually, Lexee enables mobile applications to perform tasks and execute transactions, such as updating sales information or quickly pulling reports, all by voice commands.
In July 2012, AT&T launched seven AT&T Watson-enabled speech APIs that developers can access to quickly create apps and services with voice recognition and transcription capabilities. The first set of APIs focuses on seven areas: Web search, local business search, question and answer, voicemail to text, short message service (SMS), a U-verse electronic programming guide, and dictation for general use of speech recognition. “This includes an open speech or generic API, and that is sort of the holy grail of being able to transcribe speech into
text,” Gilbert says. “That API is trained on a million-plus words and hundreds of thousands of speakers, and that’s available to developers who want to do speech recognition and don’t have a clear notion of what application they need.”
Initially, the APIs will be available for Android and iOS, with more AT&T Watson Speech APIs coming for areas such as gaming, social media, speaker authentication, and language translation.
Developers at Verdatum, a provider of software focused on voice productivity, management, and workflow, used AT&T’s APIs in-house for its Verbble solution, a voice input application for mobile workforces that provides a proprietary talk, type, and tap shell over the top of business applications and services, such as Salesforce.com, Oracle CRM On Demand, Word/DOCX, PDF,
SQL, and Outlook/Exchange. Verbble enables users to employ a native device application to complete data input, validation, and editing. When the data is input, a single tap routes it back to the original system programmatically, as if the user was at her desk the entire time.
“Speech recognition is only one part of the Verbble platform, but it is certainly the most visible and impactful component,” says Michael Fitzpatrick, Verdatum’s chief technical officer. “Integrating the AT&T Speech API expanded our platform’s technology reach and prepared us to be able to leverage some of the pending advanced functionality of the AT&T Speech API.”
Nuance Communications has the largest speech developer program, NDEV, which has more than 12,000 subscribed members. NDEV Mobile brings Nuance’s Dragon speech platform to mobile developers via the Dragon Mobile SDK, and offers broad language coverage and support for mobile app developers targeting the iOS, Android, and Windows Phone 7 platforms. The program has yielded many voice-enabled apps, including Price Check by Amazon, Ask for iPhone, Merriam-Webster, Dictionary.com, RemoteLink from OnStar, SpeechTrans, Yellow Pages, AirYell from Avantar, iTranslate, Taskmind, SayHi Translate, Vocre, Bon’App, Dolphin Sonar, and Sonico iTranslate, among others.
Coupons.com’s Grocery iQ is an iPhone and iPad app that integrates Nuance’s Dragon voice technology through the NDEV Mobile developer program. Grocery iQ creates, manages, and shares shopping lists and helps users find and use coupons as well. Free to download, the app lets users add items to their shopping lists by simply speaking what they want. By integrating Dragon voice recognition into the iOS version of Grocery iQ, users are able to dictate multiple items in a continuous list for automatic recognition and addition to their list. Users can also add items by typing or scanning bar codes on product packaging using the camera on their mobile device.
[Photo caption: Coupons.com’s Grocery iQ is an app for the iPad and iPhone that integrates Nuance’s Dragon voice technology to let shoppers manage and add to shopping lists, find coupons, and more.]
Tearing Down the Walls
Rather than viewing their members as competitors, speech tech companies are seeing the benefits of extending their technologies to the developer community and are focused on lowering the barriers of entry.
At Voxeo, the welcome mat is out for developers. “We’ve always been open to developers,” says Tobias Goebel, director of mobile strategy. “We’re really tearing down the walls of getting in touch with our technology, and you can start building apps and trying them out free of charge.”
Voxeo developers can get a free account in its host environment as well as a free download of its premise product. The company has a customer developer forum called Evolution, where users can sign up for a free account. From there, developers can start writing voice XML applications and upload voice XML scripts, and Voxeo will provide free phone numbers for testing. The company also offers APIs for SMS, so in addition to building IVR applications, developers can build SMS apps, such as confirmation messages.
Goebel says that this is a fully self-sufficient approach that lets developers get their own accounts, resources, and documentation. “You don’t even have to sign up to get an evaluation download of our software; you don’t have to interact with Voxeo at all,” he says. “If you like how it works and you’re ready to go to production, then you would contract with us. We also offer deployments without a contract, which is a pay-as-you-go model.”
VoiceVault has had a developer program in place for more than a year, and has roughly 400 developers. It also has a self-registration program and provides access to its APIs and its voice biometric engine, which can be accessed free of charge for 90 days. Documentation is free, and there is a self-help community developer forum.
“We have very few requirements. We’ve made it open, easy to use, and as frictionless as possible,” says Nik Stanbridge, director of product marketing at VoiceVault. “If someone wants to join our program, they don’t have to talk to a salesperson. We want to encourage developers to talk to each other and not feel as though they would get a hard sell from us, which tends to put people off.”
After the 90-day trial period, a developer may reach out to the company and extend the trial. This is done on a case-by-case basis. Typically, at some point during this extended trial, project and pricing are discussed. Since each project is different, costs are custom tailored.
Powered by the AT&T Watson speech engine, AT&T’s Speech API supports speech-enabled apps that run on virtually any cell network in the United States. There are seven speech contexts available that are built and maintained by AT&T, and which the company tunes on an ongoing basis. Developers can send audio, and AT&T sends the text of what an end user said. Key features include native and HTML5-based SDKs and seven optimized speech contexts.
“We’re providing the software that goes into your application, and this software basically talks to our API that sends speech in real time and is able to recognize it,” Gilbert says. “Some developers want to build their own APIs, they want to specialize in their platform; some of them don’t have that expertise and they want to pull the software into their application. We’re doing this so people don’t have to reinvent the wheel.”
Gilbert, stressing the openness of the program, says there are no prerequisites for developers. “There are no requirements,” he says. “The whole idea is that we’re trying to fuel innovation in the industry, we’re not acting as a filter. Building speech applications takes anywhere between a minimum of three months to three years, mostly in IVR types of applications. The approach we’ve taken is to make it simple—they are just plug and play.”
There is a registration or introductory fee of $99 for all of AT&T’s APIs. In the coming year, there will be a monetization pricing model.
Nuance features three service tiers for developers (Silver, Gold, and Emerald) and provides access to Nuance Mobile SDKs, training materials, partner APIs, and support services to facilitate Nuance speech and text input solution integration with developer applications.
Our mission “was to make it really easy for third-party developers to integrate speech, both dictation as well as text-to-speech, and integrate those core services into applications,” says Kenneth Harper, director of product management and marketing at Nuance. “We’ve seen some developers who have been able to fully integrate Dragon Dictation and text-to-speech in their application in a matter of days. That is our goal, to simplify how third parties can get access to this technology.”
The Silver program is free to develop and free to go live. It provides automated speech recognition dictation and search models for more than 20 languages; network TTS for over 45 languages; a speech kit SDK; support for Android, iOS, and Windows Phone 7 platforms; Bluetooth; a customizable UI; and help with an application via a centralized speech resource.
“The Silver program is targeted toward those types of application developers where volume isn’t high,” Harper says. “Maybe they’re going to be shipping their application to 10,000 consumers, in which case there aren’t going to be a lot of transactions that we’re handling on our side.”
The Gold tier runs about $300 to develop and $3,000 to go live. It provides the same offerings as the Silver program, plus an HTTP interface, help via an online ticketing system, and Secure Sockets Layer. “Here, we do have a fee. In exchange for that, we support a lot more volume,” Harper says. “This is for developers who expect very high downloads, and this is where some of our more successful applications fall into.”
The Emerald program offers everything in the Silver and Gold tiers, plus additional speech capabilities, dedicated tech support, the highest service-level agreements, and consulting services. Pricing in this model is customized, with more services offered depending on a developer’s unique needs. “Maybe they have a unique domain or they want us to build a custom language model to help improve accuracy of dictation or maybe they want us to do some consulting in the area of design,” Harper says. “This is where there is a lot more customization, and we also expect that with these types of applications, volume is going to be highest. This is where we have very successful applications that are on hundreds of thousands of mobile phones.”
In the coming year, Harper says Nuance has a goal of exposing more speech technology to third parties and expanding the Silver and Gold communities. “Over time we’re going to offer more abilities to customize dictation and text-to-speech for specific use cases that developers care about,” he says, recognizing the growing need for customized and vertically focused solutions.
Harper expects a steep increase in developers and handset manufacturers alike using speech technology. “A big part of our business is selling our solutions to handset manufacturers, such as a personal assistant that’s available out of the box. But that’s really only half of the mobile phone ecosystem. Our vision with the NDEV program is [to] get lots of third parties also integrating speech into their applications for the downloadable market.
We’re starting to move in that direction, where there’s going to be a pervasive voice ecosystem on the phone.”
The future for speech developers looks bright. “Speech has become mainstream,” Harper points out. “One reason for that is what Apple has done with Siri. Apple has done a tremendous job of using speech and natural language to create a good experience around speech-enabled interfaces. That’s created a certain level of awareness in the market that we haven’t seen before.”
Programs Mentioned
Check out the following Web sites for more information on the developer programs highlighted here.
• Angel Lexee SDK: http://www.angel.com/labs/lexee.php
• AT&T Developer Program: http://developer.att.com
• Nuance NDEV Mobile: http://dragonmobile.nuancemobiledeveloper.com/
• VoiceVault Developer Program: http://www.voicevault.com/developers/
• Voxeo Developer Program: http://evolution.voxeo.com/
Staff Writer Michele Masterson can be reached at mmasterson@infotoday.com.
>> SUCCESS STORIES
CBE Lets Companies Collect with CallMiner Platform
Eureka Speech Analytics helps extract meaningful business intelligence | BY MICHELE MASTERSON
CBE Group has a long history in the accounts receivable management business, having served customers since 1933. Today, the Cedar Falls, Iowa–based business employs nearly 1,000 people in four locations, and has over 500 clients in a variety of industries, including healthcare, utilities, satellite telecommunications, financial services, education, and government. The company focuses on maximizing the recovery of accounts receivables for lenders and service providers via first-party accounts receivable collections and charged-off debt recovery.
CBE originally installed CallMiner Eureka Speech Analytics Version 7.1 in November 2010, and upgraded to Version 8.1 in July 2012. According to CBE, the upgrade of the software strengthens its investment in speech analytics and positions it to deliver greater value in its accounts receivable management solutions. The long-term impact of the upgrade will be seen through increased performance and compliance, the company said.
The integration and application of speech analytics has been vital to CBE’s ability to proactively ensure compliance, pinpoint training opportunities, and monitor trends, all without the time-intensive task of listening to recorded calls. For clients, the payoff is significant. Not only does the system offer a valuable resource, but it spawns greater compliance, efficiency, and ultimately performance, as well, according to CBE.
CallMiner Eureka speech analytics software allows CBE to extract meaningful business intelligence from recorded calls and complete quality assurance and compliance reviews on every collection call made. CallMiner notes indicators of stress, excess silence, or other signs to indicate calls where additional collector training may promote a more favorable outcome.
“We’re making hundreds of thousands of calls a day,” says Chad Benson, chief operating officer of CBE Group. “Being able to efficiently and systematically analyze every single one makes it possible to respond quickly to change, measure the success of client initiatives, monitor expectations of associates, and have a better understanding of what is being communicated on the phone.”
CallMiner Eureka captures customer conversations from a variety of sources: predominantly recorded phone calls, but also chats, emails, and any other channel in which organizations communicate with their customers. The system directly connects with these third-party sources of data and pulls the conversations in.
Along with those conversations come metadata or attributes associated with the conversation, as well as information on which agent handled the interaction, when it occurred, and whether it was in a specific location
or department. This may include a customer identifier, which allows Eureka to associate any CRM data with the interaction, such as the length of time someone has been a customer, the services they subscribe to, or the amount of their outstanding debt. The first thing Eureka does is normalize conversations into data; in the case of calls, this means recognizing words spoken on the call and any acoustic metrics, such as silence, agitation, tempo, length of contact, etc. With text-based communication, Eureka parses the text out of the source file of the conversation into words and punctuation. Next is automated categorization, in which contacts with similar characteristics are tagged according to a set of rules. These rules typically regard language patterns (the presence or absence of certain language), whether the appropriate procedures are followed in the collections process, and other key components specified by the Fair Debt Collections Practices Act. Every contact (or targeted contact) is then scored based on a combination of measures. The system takes disparate measures of activity on the call, such as “Was the proper procedure followed?” and “How much silence was on the call?” and combines those into a weighted average or summed score. At this point, all of the unstructured information stored in the conversations has been converted into structured data that can be analyzed, searched, and measured. Lastly, Eureka looks at thresholds, trends, and other aspects of correlation and then presents feedback in an easyto-understand and consumable fashion for contact center managers, supervisors, and agents. Eureka also delivers reports and scorecards to users and provides a full analytics application for conducting searches, discovery, and root cause analysis. www.speechtechmag.com
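The scoring step described above, in which disparate measures are combined into a weighted average, can be illustrated with a minimal sketch. This is not CallMiner's implementation: the measure names, the weights, and the way silence is normalized are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Measure:
    """One scored aspect of a contact (all names here are illustrative)."""
    name: str
    value: float   # normalized to the range 0.0 (worst) .. 1.0 (best)
    weight: float  # relative importance chosen by the analyst


def weighted_score(measures: list[Measure]) -> float:
    """Combine disparate measures into one weighted average on a 0-100 scale."""
    total_weight = sum(m.weight for m in measures)
    if total_weight <= 0:
        raise ValueError("at least one measure needs a positive weight")
    weighted_sum = sum(m.value * m.weight for m in measures)
    return 100.0 * weighted_sum / total_weight


def silence_measure(silence_seconds: float, call_seconds: float) -> float:
    """Turn raw silence into a 0..1 value where less silence scores higher."""
    if call_seconds <= 0:
        raise ValueError("call length must be positive")
    ratio = min(max(silence_seconds / call_seconds, 0.0), 1.0)
    return 1.0 - ratio


# Hypothetical call: procedure followed, one disclosure missed,
# 30 seconds of silence in a 300-second call.
call = [
    Measure("proper_procedure_followed", 1.0, weight=3.0),
    Measure("required_disclosure_present", 0.0, weight=2.0),
    Measure("low_silence", silence_measure(30.0, 300.0), weight=1.0),
]
score = weighted_score(call)
```

With these made-up weights the hypothetical call lands at about 65 out of 100. Raising the weight on a compliance measure pulls the score toward that measure, which is the kind of emphasis a score builder tool would let an analyst set.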
CBE turned to speech analytics after seeing a negative trend in certain areas of performance within a team of employees. The platform allowed the company to quickly identify specific factors contributing to the trend, determine the root causes, and then implement a fast plan for improvement. Continued trending showed a 299 percent improvement in just seven months.

A recent policy change by CBE affected the order in which certain disclaimers are stated during a phone call. Now, with speech analytic capabilities, CBE can measure compliance with the change through a custom tracking element within CallMiner. With continual trend monitoring, CBE can proactively pinpoint individuals or teams that can benefit from additional training. It is anticipated that this proactive measurement will reduce the time to implement the change by 200 percent.

After a client’s competitive analysis metrics were released, CBE implemented corresponding custom speech analytic measurements. Not only was CBE able to inspect what was expected, but it was also able to proactively influence trends. Scorecard performance improved from an average of 89.6 percent over three quarters to a recent 98.4 percent.

CallMiner’s latest upgrade significantly increased the benefits CBE brings to its clients. Key features include:
• a score builder tool that allows CBE to weight criteria in a way that pinpoints analysis and training on high-priority goals like compliance and performance;
• an enhanced format with improved graphics, drill-down capabilities, export features, tag clouds, and criteria-based sorting that enhances the ability to analyze scores and see trends; and
• the capability for supervisors to analyze scored calls, make comparisons, identify trends, and extract specific details, allowing them to learn from their success and be proactive about individual improvement.

“Utilizing speech analytics in our operations has made a significant difference for us and, more importantly, our clients,” says Benson. “Speech analytics has proven to be a benefit for our clients, directly and indirectly. When we can be more efficient, more compliant, and more productive, it means more bottom-line ROI for our clients.

“The key to our success with speech analytics doesn’t come from just having the tool,” he continues. “We’ve invested a great amount of time and resources into truly integrating it into our business and our culture. We’ve customized the program, incorporated it into our training and quality assurance programs, and extended access to everyone in operational management so they can leverage the tool right on the call center floor.”
App at a Glance SINCE INSTALLING CALLMINER EUREKA SPEECH ANALYTICS, CBE GROUP HAS:
seen a 9.6 percent increase in collection revenues; reduced the time needed to implement changes by 200 percent; and noted scorecard performance improvement from an average of 89.6 percent over three quarters to 98.4 percent.
Mobile Technology Solves TV Troubles
Synchronoss’ SmartCare IVR gives Mediacom subscribers on-the-go options via their smartphones | BY LEONARD KLIE

Solving their service and billing issues has become easier for the approximately 1 million subscribers to Mediacom Communications, the nation’s eighth-largest triple-play cable TV, phone, and Internet service provider. Those subscribers, located in smaller cities throughout 22 states in the Southeast and Midwest, can now use their smartphones or tablets to access the company’s customer service department.

This improvement is thanks to a mobile application, MediacomConnect, which is powered by SmartCare, a cloud-based product of the Synchronoss Mobile Content Management Platform. The application was launched in November. In the first month alone, the mobile application was downloaded by more than 12,000 customers. Since then, the company has seen about 1,000 additional downloads per week, with very little marketing.

The application is able to guide customers through the process of troubleshooting problems with their TV or Internet service, schedule service appointments, and view and pay bills. Customers can use a voice-activated or on-screen search to quickly connect to the information they need. And if they book service appointments, the app can automatically add that information to their phones’ calendars.
MediacomConnect is designed specifically to extend key customer care services and functions to the mobile environment.

“Consumers increasingly rely on smartphones for anywhere, anytime access to information, and that trend is driving the need for self-service capabilities that further enhance the customer experience,” says Tapan Dandnaik, senior vice president of customer service and financial operations at Mediacom, headquartered in Middletown, N.Y. “By expanding customer interaction options to include mobile devices, our subscribers gain direct access to their accounts and the ability to communicate with us in a manner that meets their preferences and fits their busy schedules.”

If needed, customers can connect to the contact center right through the app. The 24-hour call center employs about 800 agents, but if the call center is busy, customers can schedule callbacks for when it is convenient for them. “An agent is never more than a button press away,” Dandnaik says.

During a customer’s call with an agent, Synchronoss’ computer telephony integration makes all the account information and the troubleshooting steps already taken available to the agent as a screen pop. That goes a long way toward keeping customer satisfaction high, Dandnaik says.

Prior to adopting Synchronoss, Mediacom had been using interactive voice response technology from SpeechCycle (acquired by Synchronoss early last year) for its regular phone channel. “Our relationship with SpeechCycle provided a lot of value to our customers, so [implementing Synchronoss] was a natural move,” Sonia Boska, senior manager for customer experience at Mediacom, says.

“Our goal has always been to make it easy for our customers to do business with us,” Dandnaik adds. “Synchronoss really bought into our vision of what we want to do for our customers.”

Prior to the official rollout of the mobile offering in November, the company tested the application for about a month with about 2,000 subscribers, who gave it high marks for ease of use. Before that, the company subjected the app to internal testing among its employees. “When we launched it, we were very confident that our customers would like it and use it,” Boska says.

Roughly 30 percent to 40 percent of the customers who have already downloaded the application have used it for troubleshooting problems with their service; about 25 percent have used it to pay their bills or check their accounts.

“These are people that are getting satisfaction when they use [the application], and there’s no need for them to call the contact center,” Dandnaik says.

“Customers are using the app, and we can look at that as call deflection,” Boska adds.

The new application is available as a free download for use on iPhones, iPads, or Android smartphones. Dandnaik says that Mediacom will likely add support for Windows Mobile 8 and other mobile platforms in the not too distant future. The app took about a year and a half to build, largely because of the amount of integration that went with it, according to Dandnaik.

The company also plans to update its Web site to allow for more multichannel and cross-channel interactions. The customer service portion of the company’s home page currently provides information on how to chat with, call, or email Mediacom customer support.

“Service providers are quickly realizing the value of adding a mobile self-service application as a way to improve customer relationships,” said Biju Nair, executive vice president of product management and chief strategy officer at Synchronoss, in a statement. “With the adoption of SmartCare, Mediacom will enhance the customer experience by providing its subscribers with an application that utilizes the rich capabilities of the smartphone and [lets them] gain access to the answers they need, when and how they want them.”

Dandnaik says, “By deploying innovative solutions like MediacomConnect, we hope to further simplify the customer service experience for all our customers.”
App at a Glance
SINCE ROLLING OUT THE MOBILE CUSTOMER SERVICE APP FROM SYNCHRONOSS IN NOVEMBER, MEDIACOM HAS SEEN:
12,000 downloads in the first month; 1,000 downloads a week since; and 30 percent to 40 percent of users troubleshoot with it and 25 percent of users pay bills with it, leading to a significant amount of calls deflected from the contact center.
DONNA FLUSS AND DEBORAH NAVARRA
THE BUSINESS CASE
The Emergence of Real-Time Solutions for Contact Centers
Improving the customer (and agent) experience with analytics and guidance

Contact centers are real-time organizations. They respond to customer phone calls and, increasingly, text messages, in real time, yet depend upon procedures and systems that are reactive. This is fine for assessing the performance of a contact center or looking for trends, but it’s not ideal when agents need guidance or next-best-action recommendations to get their jobs done.

In the past two years, vendors have been working to build real-time capabilities for contact centers. They are being delivered in the form of real-time guidance and real-time speech analytics. The opportunity for such applications is huge, and DMG expects to see new capabilities introduced to the market over the next few years. Workforce optimization (WFO), performance management, and speech analytics vendors are starting to deliver solutions dedicated to influencing or altering the outcome of calls.
A New Generation of Real-Time Applications

The new breed of real-time–enabled WFO solutions utilizes predictive analytics, decision engines, business rules, and workflow to drive smarter interactions and help contact center agents consistently take the right actions to address customers’ needs. End users are beginning to place more emphasis on real-time capabilities that enable agents and supervisors to react in the moment, alter outcomes, and have a positive impact on the customer and agent experience. Following are some examples of how real-time applications are being used to improve various aspects of contact center and agent performance and the overall customer experience.

Real-time guidance. The presence or absence of a speech and/or desktop event can trigger a predefined set of instructions or a script. During customer interactions, agents are presented with real-time coaching prompts to remind them to complete a critical compliance step, notify them of exceeded limits, and possibly provide them with an optimal upsell and/or cross-sell offer. Based on the real-time context of the interaction, an agent can also be given links to a knowledge base, an Internet/intranet site, or other training material. Real-time guidance is of particular benefit for new hires, as it can help expedite the training process while maintaining quality standards.

Real-time next-best action. Next-best-action capabilities rely on an application’s ability to pull real-time data from disparate systems and databases and combine it with contextual information about the interaction from the agent’s desktop and/or conversation. Information about the agent’s profile (tenure, skill proficiency, key performance indicators, etc.) can be factored into the analysis through integrations with quality assurance, workforce management, and other applications. Data about the customer profile and preferences can be retrieved through integrations with the CRM or servicing system. The data, gathered in real time while the caller is on the line, is aggregated and compared with business rules or predictive analytics models. Based on the real-time data analysis and client-defined business rules, the application presents callouts or pop-up boxes that provide agents with the most appropriate action to take or offer to make.

Real-time workflows. A real-time decisioning engine can provide contact center agents with next-best-action and next-best-offer recommendations while they interact live with customers by phone or in a chat session. Some solutions can provide real-time, step-by-step guidance based on an agent’s skill level and quality scores, as well as the customer’s profile and data on past interactions and preferences. Some solutions can also take into consideration what the customer is saying.

Real-time process automation. This application automates data entry and propagation, screen navigation, and lookups. It can also aggregate relevant data from multiple applications into a single view to improve operational efficiency and customer satisfaction.

Real-time reports and dashboards. These feedback tools give supervisors and agents a visual snapshot of what is happening at the agent desktop or in the contact center environment, in the moment.

Real-time alerts. Pop-up alerts and workflow flags notify users when action is needed, metrics have breached acceptable thresholds, or something out of the ordinary has occurred.

Analytics is going to play an essential role in the future of all service organizations, including contact centers. WFO vendors are the leading providers of analytics to contact centers, and many are continuing to make substantial investments in their analytical capabilities. Today, leading WFO vendors are offering many analytics applications, including speech analytics, text analytics, desktop analytics, and performance management. But these are just the beginning of the analytical capabilities expected to emerge in the next five years. Predictive analytics has been talked about for years, but is just now attracting investment dollars and is starting to be delivered in product packages. Capabilities such as next-best action and real-time guidance use business rules and, increasingly, predictive capabilities, to help contact center agents consistently take the right actions to address customers’ needs (see chart).
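The trigger mechanism behind real-time guidance, where the presence or absence of a speech or desktop event fires a predefined prompt, can be sketched in a few lines. This is a simplified illustration under stated assumptions, not any vendor's engine: the `Rule` structure, the event names, and the prompts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A hypothetical guidance rule: fire when an event is present or absent."""
    event: str
    fire_when_present: bool
    prompt: str
    after_seconds: float = 0.0  # only evaluate once the call is this old


def guidance_prompts(rules, seen_events, elapsed_seconds):
    """Return the prompts whose trigger condition currently holds."""
    prompts = []
    for rule in rules:
        if elapsed_seconds < rule.after_seconds:
            continue  # too early in the call to judge this rule
        present = rule.event in seen_events
        if present == rule.fire_when_present:
            prompts.append(rule.prompt)
    return prompts


# Illustrative rules: one fires on absence (a missed compliance step),
# one fires on presence (a cue for a retention offer).
RULES = [
    Rule("disclosure_read", False, "Read the required disclosure.", after_seconds=30),
    Rule("customer_mentions_cancel", True, "Offer the retention plan."),
]

early = guidance_prompts(RULES, set(), elapsed_seconds=10)
late = guidance_prompts(RULES, {"customer_mentions_cancel"}, elapsed_seconds=45)
```

Ten seconds into the call nothing fires, because the absence rule is not yet evaluated and the presence rule has no matching event. At 45 seconds both prompts are returned. A production system would draw its events from live speech and desktop analytics and would likely layer predictive models on top of fixed rules like these.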
Real-Time Capabilities and Benefits by Application

Quality Assurance
Definition: Determines how well agents adhere to internal policies and procedures
Real-time capabilities: Real-time scoring of interactions based on the presence or absence of speech and/or desktop events
Use cases/benefits:
• Real-time agent guidance triggered based on quality monitoring-driven criteria and/or agent skill level
• Real-time alerts to supervisors

Workforce Management (WFM)
Definition: Forecasts and schedules agent staffing needs; may include long-term planning capabilities
Real-time capabilities: Real-time visibility into forecasting and scheduling needs, intra-day management, and real-time adherence
Use cases/benefits:
• Real-time multichannel queue management
• Monitoring of channel status or agent/group activity to assess real-time intra-day staffing needs based on volume
• Real-time agent schedule adherence
• Real-time delivery of targeted training materials during low-volume periods
• Real-time intra-day optimization to expedite and optimize current-day schedule adjustments
• Real-time communication with agents

Speech Analytics
Definition: Provides real-time analysis of channel-separated audio streams based on content, context, and emotion detection
Real-time capabilities: Identifies customer needs, wants, and insights in real time
Use cases/benefits:
• Real-time next-best-action and next-best-offer recommendations for agents
• Real-time script adherence
• Real-time compliance monitoring

Text Analytics
Definition: Extracts information from unstructured text-based data by applying linguistic and/or statistical techniques for extraction and modeling
Real-time capabilities: Alerts to signal changes in topics, emerging issues, and sentiment
Use cases/benefits:
• Topic detection and monitoring
• Reputation management
• Sentiment analysis
• Product management and development
• Customer experience
• Voice of the customer (VoC) insights
• Transactional processing and decision support
• Theme and sentiment extraction
• Content mining to analyze structured metadata in tandem with unstructured data
• Auto-categorization of text-based comments based on context and brand/business-specific rules

Desktop Analytics
Definition: Measures and provides transparency into how well agents interact with their desktop servicing applications, and the overall performance of these supporting systems
Real-time capabilities: Real-time monitoring and measurement of all desktop application events and usage
Use cases/benefits:
• Real-time alerts for managers and IT staff about system operations and performance
• Real-time ability to append agent-entered data to recorded interactions (e.g., order-taking, payment processing, etc.)
• Real-time feed of events of employee activity, to be used to track real-time schedule adherence
• Real-time prompts based on specific desktop actions or thresholds of application usage
• Real-time workload balance
• Ability to be used with predictive analytics or client-defined business rules to provide real-time guidance and process automation

Contact Center Performance Management
Definition: Empowers line managers and supervisors to make ongoing tactical adjustments and improvements in real time to achieve departmental and enterprise goals
Real-time capabilities: Aggregation of real-time data from a variety of contact center operating systems, specifically the automatic call distributor, computer telephony integration, WFM, sales, collections, etc., to populate real-time reports and dashboards
Use cases/benefits:
• Real-time displays of KPIs, metrics, and other statistics to facilitate immediate intervention
• Real-time performance information for third-party compensation systems
• Real-time activity alerts

Coaching
Definition: Communicates with agents to assist them in improving their performance
Real-time capabilities: Real-time coaching prompts to remind agents to complete a critical compliance step, notify them of exceeded limits, and provide them with an optimal upsell/cross-sell offer and script; can also provide links to a knowledge base, an Internet/intranet site, or other training material based on the real-time context of the interaction
Use cases/benefits:
• Expediting the training process for new hires
• Improvement of quality standards
• Ensuring consistency of responses across channels and agent population
• Improving outcomes
• Improving adherence to policy, process, and procedures
• Improving customer satisfaction

eLearning
Definition: Assists in the creation, issuance, and tracking of training courses
Real-time capabilities: Real-time targeted delivery of training and other content during low-volume periods
Use cases/benefits:
• Delivering customized training content to each agent who needs help
• Ensuring that the agents get the information they need on a timely basis
• Improvement of call quality, customer satisfaction, and agent productivity
• Reduction in canceled training sessions due to high call volume and other departmental priorities

Surveying/VoC
Definition: Captures and measures customer satisfaction with a company’s products and service
Real-time capabilities: Real-time speech analytics, multichannel customer feedback, decisioning, agent guidance, and feedback-based alerts and workflows to enable organizations to improve the customer experience
Use cases/benefits:
• Real-time pop-up alerts and workflow flags to notify users automatically when action is needed or a customer has crossed a threshold of dissatisfaction
• Real-time churn decisioning

Reporting/Dashboards
Provides user-configurable, real-time reports and dashboards for any contact center data sources
Use cases/benefits:
• Providing automated real-time tickers, alerts, and broadcast messaging
• Providing supervisors with a visual snapshot of what is happening in the moment
• Empowering managers to intervene while customers are still on the line
• Enabling agents and supervisors to alter the outcome of sales and service interactions

Source: DMG Consulting LLC
Final Thoughts

There is great potential to use real-time insights from analytics applications to alert agents and supervisors about sales or problem-solving opportunities immediately, while the caller is still on the phone, and to provide feeds to other contact center applications or processes. DMG expects to see significant investments in the area of real-time capabilities during the next five years by both vendors and other customers, as this is going to be one of the next major waves of investments in contact centers.

Donna Fluss (donna.fluss@dmgconsult.com) is founder and president of DMG Consulting, a provider of contact center and analytics research, market analysis, and consulting. Deborah Navarra is a senior analyst and consultant with DMG. She is a contact center technology industry analyst and operations specialist with more than 25 years of experience.
ROBIN SPRINGER
VOICE VALUE
Universal Design Offers Options and Access
For many, this solution provides more than just convenience

Speech recognition technology is being used to replace more and more functions, seemingly daily. We use speech recognition to replace our hands when we are typing. We use it to replace our eyes while we are driving. We even use it to replace ID cards that allow us access to restricted areas, whether those areas are on a laptop computer, an ICU in a hospital, or a nuclear facility.

With Apple’s introduction of Siri, it seems the whole world jumped on a collective bandwagon, wanting all speech all the time. Ironically, though, with the heightened focus on speech technologies, in many ways, people today are communicating less often by speaking and more often in silence, by methods such as texting.

With this juxtaposition of speech and nonspeech options, we seem to be in the midst of a social experiment. What is the right mix? When should we include speech as an option? When, if ever, should we require it? Under what circumstances should we forgo speech? How do these answers change as we ask the questions to people throughout the vast social spectrum?

There is no one technology that is right for every person at every time in every circumstance. This is where Universal Design comes in. A concept whereby products and places are designed to be usable by the greatest number of people, Universal Design incorporates redundancy; products and places must offer more than one way to use or control them. If there’s a touch screen, there should also be discernible keys. If you can enter information by voice, you should also be able to enter information by touch screen and keyboard/keypad and even with a stylus.

Convenient Versus Cool

Devices today are getting smaller. That’s great if you want to put your computer in your purse, but not if you lack fine motor skills. Sometimes it seems as if manufacturers make products to be as cool as possible, without considering the social effect. Putting accessibility aside for a moment, some tasks can be more efficiently completed by not using one’s voice.

Take calling your pharmacy for a refill. What information does the pharmacy need? Your name; an identifying feature, perhaps the last four digits of your phone number; and the number of the prescription you want to refill. That’s pretty much it. Here, dual-tone multifrequency is faster and more accurate than speech, unless you cannot trigger the buttons on your telephone keypad.

Steps Versus Ramps

In an attempt to make society more accessible for people who have disabilities, laws were put in place requiring that buildings provide a means for a person with a disability to make use of the structure. Fire and smoke alarms should have an audible siren and blinking lights to alert those who have vision impairment or hearing loss. If the entrance to the building is a set of stairs, the building owner must add a ramp.

But not all ramps are created equal. One may have such a sharp incline it is better suited for skateboards than wheelchairs. Another traverses the side of the building multiple times, exponentially increasing the distance one must cover by foot, walker, crutches, or wheelchair.

Moving the accessibility conversation to our connected lives, manufacturers may offer more than one way for users to interface with products, but will there always be sufficient choices? Legacy features are already disappearing because they are considered passé, even though they are relied upon by people with disabilities. We have voice in, we have voice out, we can type on a virtual keyboard without lifting our fingers from the screen. We even have tactile feedback, by which items on the screen vibrate to confirm to the user that he has implemented the action he intended. But try finding a smartphone with a physical keypad.

It’s not that every product must have every option, but we have to be careful not to abandon features that will also abandon users. We also have to ensure that the most accessible products are not the most expensive ones.

Functionality should take priority over technology. Universal Design promotes choice for some. But it provides access for others.

Robin Springer is an attorney and the president of Computer Talk, Inc. (www.comptalk.com), a consulting firm specializing in speech recognition and other hands-free technology services. She can be reached at (888) 999-9161 or contactus@comptalk.com.
MOSHE YUDKOWSKY
FORWARD THINKING
Singing the Praise of Speech Recognition Applications are out there—if you know where to look t last census, I owned 15 or so types of musical don’t integrate speech technologies, there must be a reason. instruments, though I’m not proficient in all just My preliminary conclusion: Voice is just too difficult. yet. And I may purchase a set of bagpipes, which, surprisMy current framework of choice for development on ingly, is still legal here in Chicago. mobile devices is Phonegap, which lets me create code in Every Tuesday, I’m at the Celtic Knot in Evanston, Javascript and HTML5 and works cross-platform. The playing in an Irish session (more properly known as a “sei- code supports access to the core functions of mobile siun”). I also sing, and after extensive research of folk songs applications, but not speech recognition. So I thought I’d just prior to February 14 last year, I discovered a half-dozen found the doom and gloom I was looking for...but not love songs in which nobody dies, and now when someone quite. Android provides a decent set of application procalls for a love song, I can offer a range of body counts. gramming interfaces to let developers access its capabiliThe other night, one of the singers was two-thirds of the ties, and if you’re willing to program in Java, there’s a way through a song and stumbled over a verse. So I had to number of simple tutorials that explain how to use speech ask myself: Wouldn’t it be nice to have an application that recognition. And since Phonegap encourages the developcould follow along as we sang and whisper the next verse ment of plug-ins to extend its basic capabilities, I tapped in our ear? This would make for a fine out a few more searches and found an mobile application, using speech recognition extensive official list of plug-ins, includWhen it comes to and text-to-speech. 
I admit that I don’t believe it’s possible to write this application for a mobile device; the task would be difficult enough for a desktop or server. Unlike ordinary dictation on my mobile device, in which I speak a sentence or two and then wait for and check the transcription, this task requires real-time tracking of each utterance. But if I cue the device in advance with the text I’m supposed to sing, the recognition might be possible in real time. I also admit that I find it difficult to believe that I could get text-to-speech to sing in any manner that would be other than hideously annoying. But problems like these are what the fun of research and development is all about.

Since this article was due at the time, I thought I had my topic in hand: the difficulty of incorporating speech technology into mobile applications. As proof, I had the paucity of such applications on my phone. I have perhaps two dozen applications on my Android phone, not including the ones forced on me by Google and my service provider, none speech-enabled. The Google search app and mapping app are speech-enabled; I’ve got the Swype beta keyboard from Nuance that accepts dictation as well as touch input, but none of my other apps accept speech input directly and just one provides TTS output. If useful applications [...]

[...]ing one that extends PhoneGap to use speech recognition on Android and another that enables text-to-speech.

A few test sentences with Android speech recognition or Swype’s dictation will convince you that recognition takes place elsewhere, as otherwise it’s simply too good, and that indeed is the case: Absent a network connection, recognition fails. Probably as a consequence of this client-server relationship, my ability to set parameters for recognition appears to be very limited.

So I’m afraid this article is entirely free of gloom and doom. All the building blocks for speech technology are available, some free and some for a fee; recognition continues to improve; and more powerful phones can’t help but improve recognition speed and possibly accuracy.

The missing ingredient seems to be a reason to use speech technology on the phone: does anyone really use Siri or its relatives? I’ve never seen anyone use these personal assistants outside of a demo. But today I’m optimistic: This is not a failure of speech technology, but an opportunity waiting to be exploited.

Moshe Yudkowsky, Ph.D., is president of Disaggregate Consulting and author of The Pebble and the Avalanche: How Taking Things Apart Creates Revolution. He can be reached at speech@pobox.com.
Books for Customer-Centric Business Pros from the Publisher of

EXCELLENCE EVERY DAY: Make the Daily Choice—Inspire Your Employees and Amaze Your Customers
By Lior Arussy. 240 pages, ISBN 978-0-910965-79-8, $24.95, ebook also available

FACE2FACE: Using Facebook, Twitter, and Other Social Media Tools to Create Great Customer Connections
By David Lee King. 216 pages, ISBN 978-0-910965-99-6, $24.95, ebook also available

WEB OF DECEIT: Misinformation and Manipulation in the Age of Social Media
Edited by Anne P. Mintz. 224 pages, ISBN 978-0-910965-91-0, $29.95, ebook also available

THE MOBILE MARKETING HANDBOOK, SECOND EDITION: A Step-by-Step Guide to Creating Dynamic Mobile Marketing Campaigns
By Kim Dushinski. 248 pages, ISBN 978-0-910965-90-3, $29.95, ebook also available

DESIGNING THE DIGITAL EXPERIENCE: How to Use Experience Design Tools and Techniques to Build Websites Customers Love
By David Lee King. 200 pages, ISBN 978-0-910965-83-5, $24.95, ebook also available

DANCING WITH DIGITAL NATIVES: Staying in Step With the Generation That's Transforming the Way Business Is Done
Edited by Michelle Manafy and Heidi Gautschi. 408 pages, ISBN 978-0-910965-87-3, $27.95, ebook also available

MOB RULE LEARNING: Camps, Unconferences, and Trashing the Talking Head
By Michelle Boule. 248 pages, ISBN 978-0-910965-92-7, $24.95, ebook also available

CRM IN REAL TIME: Empowering Customer Relationships
By Barton J. Goldenberg. 384 pages, ISBN 978-0-910965-80-4, $39.95, ebook also available

Look for these titles wherever books and ebooks are sold, or order direct from the publisher at www.infotoday.com. For more information, call (800) 300-9868; outside the U.S. call (609) 654-6266. Visit our website at www.infotoday.com or email custserv@infotoday.com. Write to Information Today, Inc., 143 Old Marlton Pike, Medford, NJ 08055.
Learn to:
■ streamline business processes
■ create great customer experiences
■ improve customer satisfaction and loyalty
■ increase profitability
■ generate high returns on CRM investments
■ benefit from web 2.0 and social CRM
■ prepare for customer trends that are reshaping the marketplace
■ leverage technologies that will change customer relationships

August 19–21, 2013
Marriott Marquis | New York, NY

Super Early Bird Pricing: For a limited time, register for an All Access Pass before the program is released and save $300. Regular registration opens soon!

#CRMe13