Thesis - Advertising Futures: The possible implementation of advertising in Voice-First interfaces


ADVERTISING FUTURES:

THE POSSIBLE IMPLEMENTATION OF ADVERTISING IN VOICE-FIRST INTERFACES

By Nikolaj J. Madsen

Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Bachelor of Arts in Marketing Communications at the design akademie berlin, Hochschule für Kommunikation und Design.

Supervised by: Prof. Markus Wente & Prof. Dr. Holger Zumholz

Submitted July 10th, 2017

Cover illustration: Sandy van Helden



Legal notice: All artworks presented in this paper may be protected by copyright law. Any public publication or distribution of this work may lead to legal prosecution.



Contents

Abstract
Introduction
Chapter 1: An introduction into the Voice-first world
    Voice-First Interfaces
    Voice-based digital assistants
    Current state of advertising in voice-first interfaces
Chapter 2: Three fields affecting the possible voice-first advertising implementation
    Chapter introduction: The Computer
    Human-Computer-Interaction
    Pervasive Computing
    Pervasive Advertising
    Marketing 4.0
Chapter 3: Restrictions and ground rules for voice-first advertising acceptance
    Control
    Privacy
    Ethics
Chapter 4: The voice-first advertising framework
    Conjectured tactics for implementing advertising into voice-first interfaces
Conclusion
Limitations and future research
References


ABSTRACT

The diffusion of voice-first interfaces such as Google's Google Home, Apple's newly announced HomePod, and Amazon's Alexa/Echo creates exciting new opportunities for marketers. However, very limited knowledge exists about the possible implementation of advertisements in this new medium. This paper examines the drivers and limitations of users' acceptance of advertising in voice-first interfaces by studying the fields of Human-Computer-Interaction, Pervasive Computing (Pervasive Advertising), and Marketing 4.0. The study identifies three key limitations to voice-first advertising: (1) users' desire to be in control of their system, (2) respect for user privacy, and (3) acting ethically. This study proposes a conceptual model consisting of four future voice-first advertising scenarios (calm, trigger, relevance, engagement) combined with a chronological structure which advertisements must follow in order to be accepted. This study further emphasizes the bygone era of advertisements as finished products. In a voice-first world, advertisements must be 'packages' containing the required information digital assistants need to achieve their goal of supporting the user. This study is intended to establish a knowledge foundation, inspire, and provide possible research questions for future empirical researchers willing to examine the field of voice-first advertising, to either confirm or discard. ◆



Introduction

The use of voice-first interfaces/digital assistant technologies for advertising purposes presents huge opportunities and challenges for the future. Whether we like it or not, voice-first powered digital assistants are infiltrating an ever-increasing part of our daily lives. These technologies have the potential to fundamentally change the way humans and computers interact. Ever since the early computer era of the 1960s, computers have become more intelligent. However, all prior computer interfaces force humans to learn arcane commands to interact with the machine. What if we did not need to learn arcane commands? What if we could use the most effective and powerful communication tool ever conceived, a tool evolved over millions of years: our voice? When machines understand humans, and not the other way around, how will that affect marketers' ability to implement advertising into this new relationship? The fundamental direction of advertising in this new voice-based medium is being determined now, and the direction we choose will influence our understanding and the appearance of advertising for years to come. We are now at a crossroads, where marketers and advertisers must make decisions that will influence the way we advertise in a voice-first driven world. One direction is to continue current traditional and digital advertising practices. However, this creates a world clogged with voice-first spam, where people feel spied upon or manipulated into unwanted purchasing decisions. We can, however, also choose to build the future in a beneficial direction: a world in which digital assistants, and also the advertisements in this medium, actually achieve their potential to support users in making their everyday life easier and more convenient. Any information we desire, be it news, product information, contact with friends and family, or inspiring experiences, is provided in a personalized way everywhere and at any time.
To this day, we have limited knowledge of how advertising can take place in voice-first interfaces, leaving advertising as we know it today silent in a voice-first world. Research in the fields of marketing and computer science is rich in both cases, but we lack knowledge of how a new pervasive and voice-based computing environment affects the way we think about digital advertising. When we no longer have a visual space on which ads can be displayed, how do we fit advertisements into this new medium in a way consumers are willing to accept? As humans, we choose what to focus on with our line of vision. We can decide to ignore banner ads even as they are displayed right in front of us. But we have far less control over what we hear, or do not hear. So how do we make people comfortable with audible advertisements? Should advertising change its perspective? As machines become more human, should advertising as well? First, this paper provides an overview of the essential concept of voice-first interfaces/digital assistants and the current state of advertisements in this medium. Next, a presentation of the concepts Human-Computer-Interaction (HCI), Pervasive Advertising (Pervasive Computing), and Marketing 4.0 provides valuable knowledge leading to the identification of key limitations voice-first advertising must comply with to be accepted by users. Based on the identification of three key constraints to voice-first advertising, this paper proposes a 'voice-first advertising framework' presenting four future advertising scenarios (calm, trigger, relevance, and engagement) structured into a chronological order, creating a strategic model for marketing and advertising professionals to build advertising tactics and formats for voice-first interfaces. To validate the legitimacy of the 'voice-first advertising framework,' this paper ends by stating eight open questions to the proposed framework for future empirical research to either confirm or discard. ◆


Chapter 1: An introduction into the Voice-first world

Voice-first Interfaces

According to Cohen, Giangola, and Balogh (2004, p. 5), a Voice-First Interface or Voice User Interface (VUI) is "what a person interacts with when communicating with a spoken language application." Other authors, such as Hura (2008, p. 197), describe a VUI as "the script to a conversation between an automated system and a user." In other words, a Voice User Interface enables humans to interact with a computer through voice. What distinguishes a VUI from other computer interfaces is speech recognition technology, which makes the computer able to capture and decode its user's spoken input. The elements, or scripts, behind VUI technology are categorized as prompts, grammars, and dialog logic (Cohen, Giangola, and Balogh 2004, p. 5). The prompts, also called system messages, are the prerecorded or computer-synthesized elements of speech played by the computer to the user. All the possible responses users can provide to the prompts are defined by the grammars. VUI systems are thereby only capable of understanding words, phrases, or sentences the grammar includes. Dialog logic represents the actions taken by the VUI system, for example providing answers to a given question based on information the computer is capable of acquiring from a database (Cohen, Giangola, and Balogh 2004, p. 5).

Roemmele (2016) states that voice-first interfaces have three advantages over traditionally used computer interfaces. First, voice is an ambient medium, in contrast to an intentional medium (typing, clicking, etc.). Visual activity requires a singular, focused attention (cognitive load), whereas speech allows humans to do something else while speaking. Second, voice is descriptive rather than referential. With spoken language, humans describe objects in terms of their role and attributes. In comparison, most human-computer interactions are referential, detailing form and physical shape. Third, voice requires more modest physical resources. Contrary to manual or visual modalities, voice-based interactions can be scaled down to minimal and simplistic inputs. Instead of having to open an app, type in an address, and press 'start navigation,' voice interaction allows for very light inputs such as "show me the way home." However, voice as a medium is simultaneously also the greatest hindrance to the placement of advertisements in voice-first interfaces. While voice-based interaction does not require a singular focus, thereby allowing humans to multitask, humans are not in control of their hearing the way they are in control of their vision. On a graphical surface, for example a computer screen, users can decide where to direct their focus and attention. This allows the user to feel in control of the medium. With vision, unwanted content can easily be ignored, even as the content is displayed. Audio, on the other hand, is different. Humans cannot control what they hear or do not hear. This is a problem for ads in a voice-based medium, as today's receivers of advertisements are conditioned to be in control while consuming content. Netflix lets users pick exactly what to watch without any interruptions, and YouTube's TrueView ads can be skipped after five seconds. This has set a fundamental expectation that today's digital advertisements can be skipped or ignored if the receivers are not interested.
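The division into prompts, grammars, and dialog logic described above can be sketched in a few lines of code. The sketch below is only an illustration of the concept, not an implementation from the literature; the weather intent and all names are invented for demonstration.

```python
# Minimal sketch of the three VUI building blocks described by
# Cohen, Giangola, and Balogh (2004): prompts, grammars, and dialog logic.
# All names and the weather example are illustrative assumptions.

# Prompts: the system messages the computer plays back to the user.
PROMPTS = {
    "weather": "It is sunny today.",
    "fallback": "Sorry, I did not understand that.",
}

# Grammar: every utterance the system is able to recognize,
# mapped to the intent it expresses.
GRAMMAR = {
    "what is the weather": "weather",
    "how is the weather": "weather",
}

def dialog_logic(utterance: str) -> str:
    """Map a recognized utterance to the system's action (here: a prompt)."""
    intent = GRAMMAR.get(utterance.lower().strip())
    return PROMPTS.get(intent, PROMPTS["fallback"])

print(dialog_logic("What is the weather"))
```

Note how the sketch mirrors the limitation stated above: any utterance outside the grammar, however reasonable, falls through to the fallback prompt.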
The fact that humans are not in control of their hearing forces voice-first advertisements to address this problem differently than by offering the option to ignore or skip advertised messages users find irrelevant. Voice-first or VUI interfaces have been around for some years now, but only recently made a significant move into our homes and everyday lives as they transitioned into Digital Assistants. ◆



Voice based Digital Assistants

The most common use of voice-first interfaces today is found in digital assistants. A digital assistant is hard to define, as the academic community has yet to provide a universal definition. Some researchers refer to the technology as Virtual Assistants (Geoffroy et al. 2002, Schmeil and Broll 2007), Voice Assistant (Aron 2011), Virtual Personal Assistant (Imrie and Bednar 2013), Personal Digital Assistant (Ferguson et al. 2005, Milhorat et al. 2014), Intelligent Assistant (Sadun and Sande 2014), etc. This paper understands a Digital Assistant as a software application programmed with artificial intelligence capable of carrying out electronic tasks for the user. Such technologies currently available on the market include Apple's Siri, Google's Google Assistant, Amazon's Alexa, Facebook's M, Microsoft's Cortana, Samsung's Bixby, etc. But not all Digital Assistants interact with their users by voice alone. For example, Microsoft's Cortana lets its users chat with the assistant, Samsung's Bixby uses a combination of voice, text, or touch, and chatbots on messaging platforms such as Facebook Messenger are becoming more and more familiar. This paper therefore further differentiates between Digital Assistants and Voice-Based Digital Assistants. A Voice-Based Digital Assistant is a software application programmed with artificial intelligence which can understand natural language and is capable of carrying out electronic tasks for the user. Throughout this paper, 'voice-first interface' is used as a description for a 'voice-based digital assistant,' limiting the scope of this paper to Digital Assistant usage through voice. The possible implementation of advertising in non-voice-based aspects of Digital Assistant usage (e.g. chatbots) is not considered in this paper. The current situation of Digital Assistants is rather complex.
Each major technological corporation offers its own digital assistant, each with different capabilities, built into different devices and controlled by different forms of input, from touch to text to voice. What exactly a digital assistant is, and what it looks like, can be hard for users to clarify. In response to a questionnaire by iProspect, users answered the question "what's a digital assistant anyways?" with "It's a thing on my kitchen counter that I talk to, that searches the web and plays songs when I ask it to" (Olson et al. 2017a, p. 12). In the quest to make the Digital Assistant scenery more palpable, this paper proposes a who, where, how classification of the current Digital Assistant landscape. Who describes the particular assistant, for example Apple's Siri, Google's Google Assistant, Amazon's Alexa, Facebook's M, Samsung's Bixby, etc. Where is a description of where these assistants 'live.' Apple's Siri can, for example, be found in different devices such as iPods, iPhones, iPads, Macs, and the newly introduced HomePod. How describes how the user interacts with the assistant, for example through voice, text, gestures, etc. The who, where, how classification provides an understanding of the existence of multiple Digital Assistants: Siri, Google Assistant, Alexa, Bixby, etc. The where aspect of the classification tells us that each of these assistants is not bound to an individual device but has multiple 'homes.' The Google Assistant found in a Google Home smart speaker is the same assistant the user will find on their Android smartphone. And lastly, how makes us understand that each Digital Assistant offers different ways of interaction. Some assistants offer interaction by voice alone, others exclusively by text, and some offer a combination of different possible inputs. The who, where, how classification is universal to all Digital Assistants currently available to the public. The most adopted Digital Assistants today are Apple's Siri, which is installed in about one billion iPhones worldwide (Townsend 2017), Google's Google Assistant, Microsoft's Cortana, Facebook's M, Samsung's Bixby, and Amazon's Alexa.
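As a minimal sketch, the who, where, how classification can also be expressed as a simple data structure. The assistants and devices below repeat examples from the text; the dictionary layout, the device lists, and the helper function are illustrative assumptions only.

```python
# Hypothetical sketch of the who / where / how classification.
# The assistants listed are examples from the text; the exact device
# and input lists are assumptions made for illustration.
assistants = [
    {"who": "Apple's Siri",
     "where": ["iPhone", "iPad", "Mac", "HomePod"],
     "how": ["voice"]},
    {"who": "Google's Google Assistant",
     "where": ["Android smartphone", "Google Home"],
     "how": ["voice", "text"]},
    {"who": "Samsung's Bixby",
     "where": ["Galaxy smartphone"],
     "how": ["voice", "text", "touch"]},
]

def supports_voice(entry: dict) -> bool:
    """Per this paper's scope, an assistant is 'voice-based' if voice is one of its inputs."""
    return "voice" in entry["how"]

# Filter down to the voice-based assistants this paper is concerned with.
voice_based = [a["who"] for a in assistants if supports_voice(a)]
```

The where lists deliberately allow one assistant to appear in several 'homes,' matching the observation that the same Google Assistant lives in both the Google Home speaker and an Android smartphone.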
As these assistants are based on machine learning/artificial intelligence, they will get to know their users very well over time. This allows for an immensely personalized experience, as these assistants are able to adapt themselves to each individual user. With the increasing popularity of these digital assistants, we can expect an increasing amount of internet traffic coming from voice-powered systems as well. In 2016, twenty percent of mobile search queries were made via voice (Molla 2017). This trend challenges not only the way we currently publish information and content online, but also the way we advertise on digital platforms (see: search engine marketing). As advertising is one of the most important sources of revenue for the majority of websites (excluding e-commerce sites), the successful implementation of advertisements in the voice-first interface world is critical for the free and open access to information online as we know it today. ◆




VOICE-FIRST ADVERTISING

Current state of advertising in voice-first interfaces

Advertising has yet to find its foothold in voice-first interfaces but is expected to completely disrupt the current pay-per-click online marketing business model we know today (Forbes 2016). As of today, no paid placement of advertisements is possible in Google Assistant/Google Home, Amazon Alexa, or Apple Siri/HomePod. But with the statement by Sridhar Ramaswamy, Senior Vice President of advertising and commerce at Google, that "…one thing that we are all clear about is the days of three top text ads followed by ten organic results is a thing of the past in the voice first world" (Forbes 2016), the current non-existent state of paid/sponsored messages in Google's Digital Assistant is expected to change. So far, however, only one known attempt at 'voice advertisement' in Google's Google Home/Google Assistant exists. The attempt has been named 'The Disney case,' in which information regarding Disney's Beauty and the Beast was seamlessly blended into the "My Day" daily Google Home briefing. For background, "My Day" in the Google Home system is a daily briefing consisting of weather, commute, calendar, reminders, and the latest news. Many user-made videos of the Google Home promoting Disney exist online, and the promotion was not met with wide acceptance by users. Welch (2017) summarizes the case briefly: when users asked Google Home "How is my day?", the assistant would reply with the usual daily summary (weather, commute, calendar, reminders) but insert the message "by the way, Disney's live action Beauty and The Beast opens today… In this version of the story, Belle is the inventor instead of Maurice. That rings truer if you ask me. For some more movie fun, ask me something about Belle" (Welch 2017), and then continue with the daily news update. This sudden interruption of the standard "My Day" update left users feeling deceived by their system and caused them to turn to social media to complain. The vast amount of user complaints eventually made Google take down the Disney promotion and publish the statement: "this wasn't intended to be an ad. What's circulating online [user videos of the Google Home promoting Disney's Beauty and the Beast, ed.] was a part of our My Day feature, where after providing helpful information about your day, we sometimes call out timely content. We're continuing to experiment with new ways to surface unique content for users and we could have done better in this case" (Welch 2017). Whether or not Google calls the information regarding the opening of Disney's live action film advertising, 'The Disney case' as explained above used a 'takeover approach' very similar to how traditional radio advertisement takes over the flow of a radio program. This is standard procedure in the radio medium, and users/listeners are aware of and accept that ads will eventually interrupt the broadcast. 'The Disney case' tells us that users do not allow a transfer of 'radio advertising practice,' such as interrupting the 'broadcast' of the 'My Day' briefing, to a voice-first interface. Users expect to be in control of their digital assistant, and 'intrusive' messages such as 'The Disney case' will not be accepted. This raises the question of what 'voice-first advertising' actually is, and where the line between advertising and information lies. Or even better: what should voice-first advertising be? In this section, voice-first was explained as being the way humans interact with digital assistants, not a complete system of its own. We saw that these digital assistants are found in many 'homes,' from smartphones to computers and smart home speaker devices, and allow for multiple forms of interaction. Voice-first advertisements must respect the user's quest to remain in control of their system, and any attempt to overrule the user's control will make users turn against the promoted message. The following section briefly introduces the term computer and how our understanding of a computer has evolved through time. Next, this paper takes a closer look at the three fields affecting the possible implementation of advertisements into voice-first interfaces: Human-Computer-Interaction (HCI), Pervasive Advertising (Pervasive Computing), and Marketing 4.0. ◆


ILLUSTRATION BY SANDY VAN HELDEN



Chapter 2: Three fields affecting the possible voice-first advertising implementation

Chapter introduction: The Computer

The word computer was first recorded in 1613 and referred to a person carrying out calculations or computations. The word kept this meaning until the mid-20th century, but beginning at the end of the 19th century it began to take on a different meaning, describing a machine that carries out computations (Oxford 1989). Computing as a profession was typical within the social sciences, statistical research, astronomy, and ballistics testing from the late 19th century until the 1970s. Langley Memorial Aeronautical Laboratory (LMAL), the main research center for the US National Advisory Committee for Aeronautics (NACA, the precursor to NASA), employed hundreds of human computers to carry out calculations from 1935 to 1970 (NASA 2017). It is an often-mistaken impression that much of the important HCI research and technological innovation occurred in private industry. Myers (1998) states that without the research done at universities and government research labs, the technological improvements regarding computers we have seen since the 1940s would not have occurred as rapidly. Until World War II, most government research spending was invested in the Department of Agriculture, but the war changed everything (Grudin 2012, p. 7). In 1941, at the University of Pennsylvania's School of Engineering, two students, Mauchly and Eckert, met by chance and soon developed a shared vision to 'make electricity think.' Supported by the US Army in its need for a machine capable of quickly calculating ballistic trajectories, Mauchly and Eckert invented ENIAC, the world's first computer, which occupied 1,800 square feet and weighed 30 tons. Today, ENIAC's entire capabilities would fit on a chip the size of a coin. For us to understand how computers, and the way humans interact with them, have changed over time, this paper now takes a closer look at Human-Computer-Interaction developments from the pre-digital-computer age up until the birth of the internet. The following HCI developments are a summary of the main conclusions of Jonathan Grudin's (2012) publication: A Moving Target—The Evolution of Human-Computer Interaction. ◆


Human-Computer-Interaction

The purpose of this section is to provide an overview of the state of current HCI research and then describe the HCI technological evolution through time in more detail. Towards the end, this section focuses on the Graphical User Interface and points out the limitations in current HCI and computer interfaces which result in a human-to-computer interaction gap due to the need for users to learn complicated commands. Finally, this section ends by focusing on the promise voice-first technology has shown in the recognition of human emotions, resulting in a more natural form of Human-Computer-Interaction.

Research in Human-Computer-Interaction (HCI) has been spectacularly successful and has fundamentally changed our understanding of what a computer is and what it is capable of doing (Myers 1998). The blueprint for Human-Computer-Interaction was established by Ron Baecker in 1987 with his Readings in Human-Computer-Interaction. The second edition of his book was published in 1995 and was followed in 2003 by Richard Pew's first edition of the Human-Computer-Interaction Handbook. During the period from 1987 to 1995, three definitions of HCI have defined how the academic community perceives HCI. Hewett et al. (1992, p. 5) state: "Human-computer-interaction is a discipline concerned with the design, evaluation and implementation of interactive computing systems for human use and with the study of major phenomena surrounding them." Dix et al. (1993, p. xiii) provide the definition: "human-computer-interaction, or HCI, is, put simply, the study of people, computer technology and the ways these influence each other. We study HCI to determine how we can make computer technology more usable by people." And finally, Preece et al. (1994, p. 1) suggest: "human-computer-interaction (HCI) is about designing computer systems that support people so that they can carry out their activities productively and safely." Baecker lists all of the above definitions in his 2nd edition of Readings in Human-Computer-Interaction. Further references and historical insight regarding HCI can be found in the account of European contributions by Brian Shackel (1997), the HCI engineering history by Brad Myers (1998), and the history of metaphor in design by Alan Blackwell (2006). Banker and Kaufmann (2004) and Zhang, Scialdone, and Carey (2009) cover HCI research in management information systems. A review of the pre-digital history of information science is found in Rayward (1983; 1998), and Burke (1998) focuses on early digital efforts.

The development of human and computer interaction

The evolution of how we understand computers today begins in the century before the development of the first digital computers, with what Grudin (2012, p. 4) calls Human-Tool-Interaction and Information Processing. During this time, two fields of research contributed to the development of HCI: one focused on making human-interface tools more efficient, the other on how to organize, represent, process, and distribute information most effectively. From 1945–1955, an era of human-computer interaction occurred which Grudin (2012, p. 7) names managing vacuum tubes. Early computers, such as ENIAC, required entire crews to operate. The crew was divided into three groups: managers, programmers, and operators. Managers would oversee the entire operation, distribute the outcome, and specify which programs were to be written. Programmers would write programs in collaboration with mathematicians, decomposing mathematical tasks into small components the computer would be capable of handling. Once written, a program could take days to set up. Operators would set switches, insert punchcards, and connect wires as the individual program required. Early human-computer-interaction was highly sophisticated, demanding



specially skilled workers for the machine to function. From 1955–1965, transistors became the new technological breakthrough (Grudin 2012, p. 8). Solid-state, transistor-based computers were still complex but more reliable than vacuum-tube computers, eliminating the need for engineers to constantly maintain the machine for it to operate. Transistor-based computers were still primarily used for scientific and research purposes and could only perform 'batch processing,' meaning a program would run from its beginning until it terminated, as programs were still written on cards or tape. With transistors, researchers could perform previously unimaginable calculations far more efficiently, but the human-computer interaction was limited to operating the hardware, planning the programming, and using the output (Grudin 2012, p. 8). Although the abilities and power of the machines in the transistor age were limited, the HCI research taking place during this period has become the foundation for how we interact with computers today. Grudin (2012, p. 9) calls Ivan Sutherland's 1963 Ph.D. thesis "maybe the most influential document in the history of HCI." As the first in history, Sutherland demonstrated computer graphics by representing icons on a digital display. His findings not only had a determining impact on HCI twenty years after his discovery but are still one of the foundations for the way we interact with computers today. 1965–1980: HCI prior to personal computing (Grudin 2012, p. 11). The computers of this age still required a crew (managers, programmers, and operators) to function. Although these crews were significantly smaller than before, managers would still decide which programs programmers should write, and operators would still punch the programs onto cards and load these into the computer.
At this point, managers who received the paper-printed computer outputs were called 'computer users,' although they never interacted directly with the machine themselves (Grudin 2012, p. 11). 1980–1985: Discretionary use comes into focus (Grudin 2012, p. 16). A major shift in human-computer interaction took place at the beginning of the 1980s with the introduction of the PC, following the success of the Graphical User Interface. Innovative products such as Apple's Lisa and Xerox's Star started to disrupt the way humans interact with computers. The new machines, PCs, were designed for greater interactive use and no longer required teams of skilled individuals to be operated. By 1985, the Graphical User Interface had become the dominant medium for how humans and computers interact. Graphical user interfaces gave birth to the development of software, and HCI was no longer restricted to hardware. To Grudin (2012), the most recent significant development in HCI occurred in the 1990s with the birth of the Internet. The Internet gave rise to a wave of new technologies, and computer users were now able to communicate, access, and share information in ways no one just a few years earlier had imagined possible. Grudin's (2012) description of the HCI development makes it clear that the way humans interact with computers has changed dramatically from the introduction of ENIAC until today's technologies. As HCI involves both computers and humans, the field of HCI is influenced not only by computer science but also by communication science, psychology, linguistics, the social sciences, and, not to forget, the graphic and industrial design disciplines. The driving goal behind HCI is to improve the user experience, which has had an immediate impact on UX design. Since the 1980s, several design philosophies have emerged, and Norman (2013) presents six principles of user interface design to be taken into account during all stages of any design process (visibility, feedback, affordance, mapping, constraint, and consistency). With the rise of software development, much of the current HCI focus has been directed towards the Graphical User Interface.

The development of Graphical User Interfaces

All Graphical User Interfaces we know today are based on the WIMP paradigm (Windows, Icons, Menus, and Pointing devices) (Porta 2007, p. 198). Porta (2007) categorizes the current graphical user interfaces into four categories depending on the different inputs and/or outputs they accept and produce. Multimedia user interfaces provide at least two various kinds of outputs. They focus on the media, e.g. sounds, graphics, text, etc.
By combining at least visual elements with text, the vast majority of recently developed graphical interfaces all exploit some sort of multimedia. Preceptive user interfaces try to acquire and decode explicit and implicit information about the user and his or her’s surroundings, thereby providing the computer with perceived ca-

pabilities. With Preceptive user interfaces, computers become able to “hear,” and “see,” etc. Multimodal user interfaces allow for multiple forms of input e.g. gestures, voice, etc. Combined the Multimedia, Preceptive, and Multimodal make up the Perceptual user interfaces (PUIs), which integrates all three categories abilities to provide users with a more natural way of interaction. The aim of PUIs is making machines aware of their environment and able to sense the people interacting with them, projecting the computer’s output in a visual form. As computer use become an increasingly larger part of all aspects of our life, interface research is currently moving towards new directions, Porta (2007). According to Turk and Robertson (2000), WIMP will most certainly not match all the expectations and use of computers in the future. In a time where computers find their way into electronics and products like never before, the need for new interfaces outside WIMP increase, (Porta 2007, p. 198). We might expect that with the continued employment of computers into everyday appliances and industrial machinery, computers are encouraged to adapt communicational abilities which until yet only have characterized human-to-human interaction. The expert knowledge required to operate the early age computers was dramatically higher compared to the knowledge which is needed for operating computers today. Although today’s hardware and software has become more intuitive for humans to use, humans still have to learn and understand commands to interact with computers. Regardless of technology or interface, modern human and computer interaction (HCI) still primarily takes on an explicit command-answer based level. In contrast, Human-to-human interaction according to Scherer et al. (2011) is multifaceted consisting of: “manifold interactive feedback loops between interlocutors, comprising social components (e.g. 
display rules, social state), moods, feelings, personal goals, nonverbal and paralinguistic conversation channels and more," (Scherer et al. 2011, p. 1). This leaves an interaction-gap between human-to-human interaction and human-to-computer interaction. The interaction-gap is a significant issue for HCI, as users often expect to be able to interact with the machine the way they interact with humans (Hura 2008, p. 197). Even though voice-first interfaces have become more advanced in recent years in decoding and


understanding human language; Google, for example, claims its Google Assistant understands human language with 95 percent accuracy (Glaser 2017). Still, no voice-first interface or speech recognition engine is yet fully capable of decoding full human intention. This is mainly due to humans' common combined use of verbal and nonverbal elements of communication in transmitting their message. The complete meaning of a message is thus not carried by the spoken words alone; some information is conveyed implicitly instead of explicitly. For example, when Julian hears someone knocking on his door, he asks: "Who is there?" Suppose the person knocking, Emma, is Julian's girlfriend. Instead of saying "This is Emma," Emma would answer "It is me." Because Julian knows the features of Emma's voice, he will be able to identify Emma immediately without additional clarification. This type of implicit human communication shows that, to build voice-first interfaces capable of interacting with people as naturally as people interact with each other (natural dialog), computers must learn to understand all aspects of human communication, including both explicit and implicit cues. As technology and HCI research advance, human-computer interaction is becoming more multifaceted, attempting to close the interaction-gap by developing new technologies outside of, or in addition to, the WIMP and graphical user interface environment. In order to close the interaction-gap, it is crucial for computers to be able to understand their users' affective states as well as to recognize social signals, which are composed of multiple behavioral cues (Vinciarelli, Pantic and Bourlard 2009), as demonstrated in the example above. Most current research into the recognition of human emotions focuses on the so-called 'big six' emotions introduced by Darwin (1978) and Ekman (1993). Speech Emotion Recognition (SER) is one of the emerging fields of human-computer interaction.
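To make the idea behind SER concrete, the following sketch illustrates the general principle of classifying an utterance's emotion from acoustic features. It is purely illustrative: the two features (mean pitch and mean energy) and the per-emotion centroid values are invented for this example, and a real SER system would extract far richer features (e.g. MFCCs) from actual audio.

```python
# Toy Speech Emotion Recognition (SER) sketch: nearest-centroid
# classification in a 2-D acoustic feature space. All centroid values
# are invented for illustration only.
from math import dist

# Hypothetical per-emotion centroids: (mean pitch in Hz, mean energy in dB).
EMOTION_CENTROIDS = {
    "neutral": (120.0, 55.0),
    "happy":   (180.0, 65.0),
    "angry":   (200.0, 75.0),
    "sad":     (100.0, 45.0),
}

def classify_emotion(pitch_hz, energy_db):
    """Return the emotion whose centroid is closest to the utterance's features."""
    features = (pitch_hz, energy_db)
    return min(EMOTION_CENTROIDS,
               key=lambda e: dist(features, EMOTION_CENTROIDS[e]))

# A high-pitch, high-energy utterance falls closest to the "angry" centroid.
print(classify_emotion(195.0, 72.0))  # angry
```

Nearest-centroid matching is only one of many possible approaches; the SER literature cited above relies on statistical and machine-learned classifiers over much larger feature sets.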
The field of emotional cues has been studied since the 1950s but has only recently experienced great progress (Vogt, Andre and Wagner 2008; Zeng, Roisman and Huang 2009), mainly due to technological improvements in recording, storing, and processing audio and visual information; wearable computers; non-intrusive sensor developments; and the urge to evolve the human-computer interface from point-and-click to sense-and-feel (Ramakrishnan and El Emary 2011). The

understanding of emotions has always been essential in human-to-human communication. Emotion-oriented computing aims at the recognition and fusion of emotion from speech, facial expression, or any other biological channel, letting computers' capabilities for identifying human emotions become 'more human.' Research regarding computer-automated recognition of emotions in facial expressions is widely studied and very rich (Chibelushi, Deravi and Mason 2002; Elwakdy, Elsehely, Eltokhy and Elhennawy 2008; Grimm, Kroschel and Narayanan 2008; Ranjan 2010). However, as the computer-automated recognition of emotions in facial expressions is complex and requires high-resolution cameras to be accurate, this technology remains profoundly expensive. Speech, rather than facial expression, has consequently proven more promising for recognizing human emotions (Ramakrishnan and El Emary 2011). Speech Emotion Recognition is an important subject for the future of human-computer interaction, as speech is the fundamental form of human communication. According to Ramakrishnan and El Emary (2011), the aim of Speech Emotion Recognition is to establish a very natural interaction between humans and computers by making computers able to understand not only words but also more subtle cues such as emotions, feelings, and moods coded within the spoken language.

Summary. In this section, we have seen that HCI is the study of how humans interact with computers. HCI involves the design, evaluation, and implementation of interactive computing systems that support people in carrying out their activities productively and safely. We learned that two fields of research have been major contributors to the development of HCI: one focusing on making human use of tools more efficient, the other on how to organize, represent, process, and distribute information most effectively.
Two technological revolutions, PCs with the graphical user interface and the Internet, have been the most impactful modern events in changing the way humans interact with computers. We further saw that all graphical user interfaces we use today are based on the WIMP paradigm (Windows, Icons, Menus and Pointing devices), but as technology continues to develop, WIMP will most certainly not match the expectations and uses of computers in the future. Even though computers today are significantly easier to operate than the first mainframe computers, current human-computer interaction (HCI) still takes place mainly through explicit (command-answer based) interaction, whereas human-to-human interaction is multifaceted, consisting of explicit and implicit communication, which creates an interaction-gap. To close this gap, it is crucial for computers to be able to understand their users' affective states as well as to recognize social signals composed of multiple behavioral cues. Speech Emotion Recognition technology has proved especially useful in decoding such cues. Speech Emotion Recognition aims to establish a very natural interaction between humans and computers, in which computers understand not only words and language but also intentions implicitly coded within vocal communication. Voice-first interfaces are thereby a promising technology for computers to adopt human-to-human communication abilities and thus 'become more human' in the way they respond to human inputs. Ever since their introduction, computers have become smaller and more powerful, and all signs suggest they will continue to do so. Computers and the Internet have fundamentally changed our everyday life. As we will see in the next section, the future of computing is going to be 'invisible,' integrating computers into almost all aspects of our everyday life. ◆



Chapter 2:

Pervasive computing

This section introduces the idea of pervasive computing and the idea of 'technology that disappears.' The purpose is to briefly present the ideas behind the technology and to describe the three core principles of pervasive computing. At the end of this section, the concept of pervasive advertising, an outgrowth of pervasive computing technology, will be introduced and analyzed in more detail. Pervasive computing is a concept of computing introduced by Mark Weiser in 1991. In his paper The Computer for the 21st Century, published in Scientific American, Weiser (1991) writes: "My colleagues and I at the Xerox Palo Alto Research Center think that the idea of a 'personal' computer itself is misplaced and that the vision of laptop machines, dynabooks and 'knowledge navigators' is only a transitional step toward achieving the real potential of information technology. Such machines cannot truly make computing an integral, invisible part of people's lives. We are therefore trying to conceive a new way of thinking about computers, one that takes into account the human world and allows the computers themselves to vanish into the background," (Weiser 1991, p. 94). He concludes his paper with the statement: "specialized elements of hardware and software, connected by wires, radio waves and infrared, will be so ubiquitous that no one will notice their presence…The most profound technologies are those that disappear. They weave themselves into the fabrics of everyday life until they are indistinguishable from it," (Weiser 1991, p. 94). Schmidt et al. later define pervasive computing as: "Pervasive computing describes the trend that connected computational devices become interwoven with artifacts in our everyday life. Hence processing, sensing, activation, and communication are embedded into devices and environments, making computing an integral part of our daily life," (Schmidt et al. 2008, p. 9). With his paper, Weiser (1991) planted the seeds of the pervasive computing idea, which by today has been widely accepted by the technology industry. Most profoundly, pervasive computing technology has shown itself in the rise of the IoT (Internet of Things) and now also in voice-first controlled smart home speakers such as the Google Home, Amazon Echo/Alexa, and Apple's HomePod. As we saw in the section about the development of human and computer interaction, our understanding of computers and the way we interact with them have changed through time. Weiser et al. (1998) categorize the development of HCI into three main eras. First, the mainframe era, beginning in the 1940s with the development of ENIAC and defined by massive machines needing entire crews to operate. Computing power was a rare resource during this time, and only well-funded organizations had the financial means to acquire and operate these machines. Mainframe computers are still in operation today (e.g. supercomputers needed for calculating advanced simulations and weather forecasts) but have widely been replaced by PCs. By the early 1980s, the personal computing era arose: computers became individualized, considerably smaller, and easier to operate. With the arrival of the internet, computers and machines became interconnected, creating a distributed medium. Weiser et al. (1998) describe the time we currently live in as the pervasive computing era, characterized by ever-increasing mass computer deployment, from the hundreds of computers we access by browsing online to the

deep integration of computers into devices from lightbulbs to cars to toasters to clothes: literally everything.

The Pervasive Computing Concept

Müller et al. (2011) mention five key technologies (processing, storage, networking, sensors, actuators) which are crucial prerequisites to pervasive computing. For a detailed description of each technology, see Müller et al. (2011, p. 14-17). The technological advancements in processing power, storage space, networking abilities, sensors, and actuators make possible the three core principles of pervasive computing: automation, interactivity, and ubiquity. Automation: beginning with the industrial revolution, many work processes have been automated, and this trend continues today as an increasing number of machine operations become fully computer-controlled, going beyond what is understood by mechanization (Müller et al. 2011, p. 17). Due to automation in manufacturing, the prices of products entering the market have decreased dramatically over the last couple of decades. With the latest technological advancements, automation is no longer restricted to manufacturing but has found its way into almost every industry: telecommunications (telephone switchboards), finance (ATMs and online banking), and even creative industries such as advertising, with services like Google AdWords and real-time-bidding advertising engines. From a computer scientist's point of view, the ultimate computer automation is the development of artificial intelligence (AI) capable of emulating all human cognitive capabilities (strong AI). As strong


AI has proven extremely difficult to achieve, scientists and developers today focus on developing technologies capable of solving only specific problems (weak AI). With continuing developments in processing, storage, networking, sensor, and actuator technologies, the possible extent of computer automation is limited only by the imagination of scientists and developers. With personal digital assistants, automation reaches far beyond the commercial industries, infiltrating phases of personal everyday life. Interactivity, the second principle of pervasive computing, is concerned with the ability of multiple sensors and processors to connect and share data with one another. It further focuses on the interaction between human and machine, HCI (Human-Computer-Interaction), as discussed in the previous section. Pervasive computing interactivity mainly focuses on eliminating the interaction-gap (see above) between human-computer interaction and human-to-human interaction, utilizing processing, storage, networking, sensor, and actuator technologies to develop machines' abilities to understand and decode human intentions through sharing immense amounts of data. Pervasive computing interactivity addresses both implicit and explicit interaction. Schmidt (2000) defines implicit interaction as "an action, performed by the user that is not primarily aimed to interact with a computerized system but which such a system understands as input." Pervasive computing interactivity aims at understanding users' affective states as well as recognizing social signals, which, as we learned in the previous section, are composed of multiple behavioral cues. The pervasive computing ideology thereby contributes to making computers 'more human.'
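Schmidt's distinction between explicit and implicit interaction can be sketched as a toy event handler. The event types and responses below are hypothetical; a real pervasive system would fuse data from many sensors rather than switch on single events.

```python
# Toy sketch of explicit vs. implicit interaction (after Schmidt 2000).
# Event names and rules are invented for illustration.
def interpret(event):
    """Map a sensed event to a system response."""
    if event["type"] == "voice_command":
        # Explicit interaction: the user deliberately addresses the system.
        return "execute: " + event["utterance"]
    if event["type"] == "room_entered":
        # Implicit interaction: presence is sensed and treated as input,
        # although the user did not intend to interact with the system.
        return "turn_on_lights"
    if event["type"] == "no_motion_30min":
        # Implicit interaction: prolonged inactivity triggers standby.
        return "enter_standby"
    return "ignore"

print(interpret({"type": "voice_command", "utterance": "play music"}))  # execute: play music
print(interpret({"type": "room_entered"}))                              # turn_on_lights
```

The point of the sketch is that the same system input loop handles deliberate commands and incidentally sensed behavior alike, which is precisely what lets the technology 'disappear' for the user.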
As Speech Emotion Recognition has proved suitable for decoding human emotions, this paper predicts that the combination of pervasive computing with Speech Emotion Recognition (SER) technology is the most promising path toward a stronger and more human-like form of human-computer interaction. Ubiquity, the last principle of pervasive computing, allows people to get in contact with one another instantly and effortlessly at any time or place. It further focuses on the implementation of multiple sensors and technologies into individual devices, for example the integration of cameras, movement sensors, GPS, and microphones into mobile phones (smartphones), as

well as the linking of TVs to the internet, so that this technology is no longer restricted to a specific location. More and more processors and sensors are being implemented into everyday appliances, toys, clothes, and tools. This integration allows users to easily connect the virtual world with the physical world, making new forms of interaction and communication possible.


Pervasive Advertising

Following the introduction of the pervasive computing ideology (Weiser 1991), a few marketing and advertising researchers, as well as professionals, have begun to pay attention to the opportunities this new technology affords marketing and advertising. The web and digital technologies once revolutionized advertising, providing a richer channel for reaching and interacting with the target group than any former medium ever did. However, as more and more advertisers have adopted the online medium for placing their advertising messages, increasing amounts of advertisements promote products or services that are barely, if at all, relevant to the consumers (Bublitz et al. 2013a, p. 290). Ferdinando et al. (2009) state that the only way to avoid irrelevant advertisements is through personalization. After decades of research and development, the pervasive computing technologies described above are now finally ready to reshape the way humans and computers interact. An increasing number of marketing and advertising researchers and professionals believe pervasive computing is going to revolutionize advertising once more. Ranganathan and Campbell (2002) write: "One of the reasons for the high effectiveness of online advertising is that users interact with the web at a far more personal and intimate level than they interact with other advertising media like the radio or television. Pervasive computing environments deal with users at an even more intimate level; hence such environments are even better advertising platforms than the web. Pervasive environments allow the delivery of relevant advertising in suitable ways to selected consumers," (Ranganathan and Campbell 2002, p. 10). Bublitz et al. (2013a) support Ranganathan and Campbell's (2002) statement and write: "pervasive advertising stands out from other forms of ad serving by using contextual information of consumers," (Bublitz et al. 2013a, p. 290).

In other words, pervasive advertising allows for an ever deeper and better contextual personalization of advertisements, with context understood as knowledge of the situation, interests, and activities of the potential customer (Carrara and Orsi 2011). According to Bublitz et al. (2013a, p. 290), pervasive advertising research faces two main challenges: first, delivering the right ad at the right time and in the best way possible; and second, finding the consumer's context without being invasive. Current studies in this field (see Mostafa 2013; Bublitz et al. 2013b; Kostakos and Ojala 2013) state that a lack of knowledge about finding the perfect balance between what consumers need and the objectives of advertisers still exists. All three publications agree that the field of pervasive advertising has seen significant advancement. Kotler and Keller (2008) define advertising as: "any paid form of non-personal presentation and promotion of ideas, goods, or services by an identified sponsor." Pervasive computing represents the implementation of sensors and processors into everyday objects, merging technological features so smoothly into users' surroundings that they become 'technology that disappears.' With this understanding, Müller et al. (2011) define pervasive advertising as: "the use of pervasive computing technologies for advertising purposes," (Müller et al. 2011, p. 20). The goal of traditional advertising has always been to transmit information, evoke emotions, and trigger actions. The three principles of pervasive computing (automation, interactivity, and ubiquity) make this technology especially capable of reaching these advertising goals. Different researchers have published different hypotheses for how pervasive computing may affect advertising. Müller et al. (2011) write: "it is our belief that advertising will be the business model that drives pervasive computing," (Müller et al. 2011, p. 20). Müller et al.
(2011) propose that pervasive computing has the potential to change advertising in six particular ways: symmetric communication, long tail, experiences, personalization, audience measurement, and automated persuasion. The following descriptions of these six ways summarize the full descriptions published in Pervasive Advertising by Müller et al. (2011, p. 22-25).




Symmetric communication. "Power to the people." Classical advertising can be classified as a unidirectional communication model with a mass media approach, where a few advertisers distribute their messages to a large audience. The outcome of this approach is an asymmetrical distribution of power in which advertisers control which ads are displayed to whom and where. For some receivers, this asymmetrical distribution of power creates a feeling of being controlled by advertisers, and their best armor of resistance is to ignore, protest against, or vandalize this type of advertising. As pervasive advertising is interactive, it offers a very interesting opportunity to transfer a significant amount of power from the advertisers to the audience. This is fundamentally opposed to unidirectional communication and allows the audience to communicate and give feedback directly to the advertiser and among other receivers. In this pervasive advertising universe, companies must treat their audience as equal partners. This can benefit both parties, as a closer bond between the two is established. Eventually, symmetric communication may lead to a democratization of advertising.

The Long Tail. "Me, too." As we saw earlier, one of the main principles of pervasive computing is automation. As advertising adopts this technology, many of the processes currently required in the production of advertisements will also become automated. This will significantly lower the costs and effort required to produce individualized campaigns. We already see this automation in online advertising, with bots continuously optimizing ad text and graphical elements to generate the best possible outcome. Starting a campaign with pervasive advertising may eventually be as easy as filling out an online form, and the price may be only a few dollars (Müller et al. 2011, p. 23). This affordable price enables tiny companies to launch their own local advertising campaigns.
This strongly favors pervasive advertising, as big corporations are not the only ones interested in advertising to promote their products and services. Small local restaurants and market stalls must also promote their business, and the likely low price of running a pervasive advertising campaign might be exactly what these businesses, as

well as big corporations, are looking for.

Experiences. "The wow Effect." As computers are integrated into an increasing number of objects surrounding all aspects of everyday life, pervasive computing offers a compelling medium that can respond to all human senses. Norman (2013, p. 50) suggests three levels of processing (visceral, behavioral, and reflective), which according to Müller et al. (2011, p. 23) can be translated into three levels of interactive computer systems. The first level, visceral, can be described as the "first impression" a specific technology makes and is often based

on visual impressions (Müller et al. 2011, p. 23). The second level, behavioral, relates to the look and feel of a given technology and can be described as "how it feels." The highest level, reflective, is the home of conscious cognition: what we think others think about us using a specific technology. Müller et al. (2011, p. 23) claim that most traditional advertising is not interactive and therefore does not surpass Norman's lowest, visceral level. This is a big issue for traditional advertising, as all three levels of processing work together in determining a person's like or dislike of a product or service (Norman 2013, p. 54). Pervasive advertising has the ability to address all three levels, by having a look and feel and at the


same time making receivers reflect when they interact with the ads. This ability makes pervasive advertising superior to traditional advertising. The pervasive advertising experience, addressing all three levels of processing, can follow and surprise the user at any given time. And as pervasive advertising is digital, it is remarkably easy to produce new advertisement experiences. All of these factors put together make pervasive advertising repeatedly able to produce a 'wow effect.'

Personalization and Context Adaptivity. "Just for Me, Just for Now." Personalization and context adaptivity are at the core of pervasive computing, making it a natural tool for advertising. Traditional advertising personalization is based on basic demographics, clustering people into segment groups. Pervasive computing opens up a whole new dimension of personalization by measuring all kinds of conditions. This information is collected by the many sensors and processors, allowing advertisers to build very precise user profiles and letting advertisements adapt to the user's context as identified by the pervasive computing system. Personalization may also no longer be restricted to specific user data such as previous search history or interests stated on various social media. The many interconnected sensors in the pervasive computing system open the door to advanced emotion recognition technology. Home voice-first assistant systems that constantly listen to and monitor the environment, such as Google Home, Amazon Alexa, or Apple HomePod, will through Speech Emotion Recognition technology eventually be able to identify, very precisely, the human emotions implicitly embedded in our vocal expressions. Imagine this 'emotional data' as a basis for advertising personalization, where for example individuals identified as hungry, or even 'hangry,' are shown advertisements for food or snack products or services.
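The idea of 'emotional data' as a personalization input can be sketched as a simple lookup from a detected state to an ad category. The states, categories, and the consent flag below are all hypothetical and invented for illustration; the consent check anticipates the user-control and privacy restrictions discussed later in this thesis.

```python
# Hypothetical mapping from an SER-detected state to an ad category.
# States, categories, and the consent flag are invented for illustration.
AD_CATEGORIES = {
    "hungry":   "food_delivery",
    "tired":    "coffee",
    "stressed": "wellness",
}

def select_ad(detected_state, opted_in):
    """Return an ad category only if the user consented to emotion-based ads."""
    if not opted_in:
        return None  # no consent: respect user control and privacy
    return AD_CATEGORIES.get(detected_state)  # None if no matching category

print(select_ad("hungry", opted_in=True))   # food_delivery
print(select_ad("hungry", opted_in=False))  # None
```

The sketch is deliberately trivial; its point is that emotional state becomes just another targeting signal, and that gating it behind explicit consent is a design choice, not a technical necessity.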
Audience Measurement. "Did you see me?" All advertising campaigns are driven by goals which must have specific, measurable factors in order to determine the campaign's success. In online advertising, KPIs are widely used as indicators of campaign performance. The web provided online advertising with unique abilities to measure campaign success compared to traditional

advertisements, through the easy measurement of clicks and ad views. Tools like Google Analytics have made campaign measurement highly transparent, allowing advertisers to optimize campaigns based on live data. Pervasive advertising allows measurements of data far exceeding the collection of clicks or views. For instance, user behavior: whether a certain audience saw an advertisement can be measured by computer vision and face detection. As technology progresses, eye tracking technologies will become cheap enough to be widely integrated into our electronic devices, in a similar way to how iris scanners are currently becoming standard in smartphones. These technologies, in combination with artificial intelligence, will allow advertisers to set extremely specific goals for advertisements, for example: 60% of blonde-haired females between 35 and 50 should have read the first text line of this ad promoting shampoo. As a result of the increased measurement possibilities, pervasive computing allows pervasive advertising to rapidly identify which aspects of an ad the audience prefers and adapt the campaign accordingly; or, in a best-case scenario, pervasive computing automation will make all the adjustments on its own to provide the best outcome.

Automated Persuasion. "Wouldn't You Like This?" Fogg (2002) defines persuasive technology as: "using computers to change what we think and do." Regarding computers' ability to persuade humans, IJsselsteijn et al. (2006) write, based on Fogg's (2002) findings: "computers can have a number of distinct advantages over human persuaders. They can be more persistent (irritatingly so, in fact), they can allow anonymity (useful in cases where sensitive issues are at play), they can access and control a virtually unlimited store of data (retrieving exactly the right nuggets of information at the right time), and they can use many modalities (text, audio and video clips, rich graphics and animations, etc.)
to create a seamless and convincing experience. Moreover, successful pieces of persuasive software can be easily replicated and distributed (addressing large numbers of people at the same time), and, with computers becoming increasingly ubiquitous and embedded, persuasive [persuasive pervasive] technology may gain access to areas where human persuaders would not be welcomed (e.g., bedroom, bathroom) or are physically unable to go (e.g., inside clothing or household appliances),” (IJsselsteijn et al.

2006, p. 1-2). There is no doubt that automated persuasion is highly interesting for advertising purposes. However, persuasive technology is not limited to advertising; it can, for example, help people quit smoking, lose weight, or make any other behavior change an individual desires. For a technological system to persuade humans, the system must be perceived as credible and trustworthy. As advertising seeks to change certain behaviors or attitudes towards a specific product or service, automated persuasion is well suited for advertising purposes.

Summary. With pervasive computing, Mark Weiser (1991) introduced a new concept of computing. This section has explained how Weiser's idea of 'technology that disappears' describes the trend that connected computational devices, sensors and processors, become integrated into everyday devices and appliances, from lightbulbs to cars to toasters to clothes: literally everything. We have seen that pervasive computing rests on three core principles: automation, interactivity, and ubiquity. Automation describes the growing computer control of machine operations as well as of aspects of society outside manufacturing, such as telecommunications, banking, and advertising. Interactivity describes the ability of sensors and processors to share data with one another and to use this interactive connectivity to decode human intention. Lastly, ubiquity is the vast implementation of multiple sensors into individual devices. Pervasive advertising uses the three principles of pervasive computing (automation, interactivity, and ubiquity) to reach the advertising goals of transmitting information, evoking emotions, and triggering actions. The pervasive advertising concept proposed by Müller et al.
(2011) suggests that pervasive advertising will change traditional digital advertising in six particular ways (symmetric communication, long tail, experiences, personalization, audience measurement, and automated persuasion), creating far more relevant advertisements by utilizing the data from the pervasive computing system. ◆



Chapter 2:

Marketing 4.0

The last field influencing the possible implementation of advertising into voice-first interfaces which this paper covers is the newly introduced Marketing 4.0 concept by Kotler et al. (2017). As we saw in the two earlier sections, voice-first advertising will be highly affected by technological progress (HCI and pervasive computing), but voice-first advertising is also subject to influence by progress in marketing research, as advertising is part of an organization's larger marketing operation. Traditionally, the standards of advertising and marketing research, what Kotler (2010, p. 6) describes as Marketing 1.0 (product-centric) and Marketing 2.0 (consumer-centric), served the pre-internet advertising industry well but are incomplete in a voice-first advertising context. Müller et al. (2011, p. 11) state that the study of digitalization in marketing and advertising research is lagging behind other research-based fields. Implicitly, traditional approaches to advertising all assume a unidirectional communication in which advertising is something marketers 'do to people' (Pavlou and Stewart 2001, p. 218). This unidirectional idea of advertising is also the foundation on which classical marketing theories, such as the widely accepted AIDA (attention, interest, desire, action) model, are based. In 2008, Kotler et al. (2008) briefly described what we today understand as digital marketing through the concept of online marketing as an 'extension of direct marketing.' This statement by Kotler et al. (2008) indicates that marketing and advertising researchers and professionals did not yet see digital marketing as a key element of companies' marketing operations in 2008, but only as an extension of already existing direct marketing operations. Since then, marketing and advertising researchers have attempted to adapt themselves to the digital

era but have failed to keep up with the speed at which technological innovations are being adopted by consumers. Many consumers today perceive advertisements that take a traditional approach (Marketing 1.0 and Marketing 2.0) as spam; recent research from multiple industries has shown that today's consumers believe more in the f-factor (families, friends, Twitter followers, and Facebook fans) than in marketing communication (Kotler 2017, p. 12). Today's relationship-driven world, powered by social media and mobile connectivity, is starting to fundamentally challenge these long-accepted traditional concepts of marketing research, and thereby also advertising. Marketing 4.0 is what Kotler et al. (2017) suggest will influence the next generation of marketing and advertising. In their book Marketing 4.0: Moving from Traditional to Digital, Kotler et al. (2017) describe how the power structures we know today are under dramatic


change, affecting not only marketing but also macro levels of society. Largely thanks to the internet and social media, Kotler et al. (2017) argue that powers are now shifting in three ways.

From exclusive to inclusive: on a macro level, national powers are shifting from the 'old' exclusive superpowers such as the European Union and the United States to Asia. As Asia's economic influence grows, the Western world's political powers are in decline, and military powers are being replaced by a different, inclusive approach of economic support and diplomacy (Kotler et al. 2017, p. 8). On a micro level, the online world of social media has influenced humans to embrace social inclusivity. Humans now interact with each other across borders, enabling people to build relationships without regard to geographical and demographical boundaries.

Powers are also shifting from vertical to horizontal. Globalization is leveling the international playing field. Companies' competitiveness is no longer determined by their size, skills, location, or past advantages. Globalization makes it easier for small and young companies to compete with big market players. In marketing, this trend is seen in the market shifting away from high-volume mainstream brands towards low-volume niche brands (Kotler et al. 2017, p. 11). Kotler et al. (2017) describe that in the horizontal context, brands should not treat customers as targets but consider them peers and friends of the brand instead.

The last shift in power Kotler et al. (2017) describe is from individual to social. Companies used to have control over their marketing communication, communicating with consumers individually. When making purchase decisions, regardless of low- or high-involvement products, individual preferences used to be superior to social conformity; the relative weight of these two factors (individual preferences and social conformity) varies from one person to another.
In our connected world, consumers care more and more about the opinions of others. Product ratings and reviews are looked up online instantly, and consumers' own opinions can be shared just as easily. The social aspect of the connected society allows for a more intense corroboration among consumers, building a crowd-based picture of the company rather than adopting the image the brand intends to project (Kotler et al. 2017, p. 13). In a social world, companies must have honest claims and strong reputations to achieve social confirmation, as it is almost impossible to hide

flaws and fake claims (Kotler et al. 2017, p. 14). It is important for marketers to reflect this new horizontal, inclusive, and social market landscape when building brands and marketing and advertising strategies for the digital economy. Kotler et al. (2017, p. xvii) describe Marketing 4.0 as: "a deepening and a broadening of human-centric marketing [Marketing 3.0] to cover every aspect of the customer's [digital] journey." Marketing 3.0 is the development of the previously introduced product-centric marketing (Marketing 1.0) and customer-centric marketing (Marketing 2.0). The above-stated description of Marketing 4.0 clarifies that Kotler et al. (2017) still believe human-centric marketing (Marketing 3.0) will be the key to differentiating companies and their offers in the digital era. The authors do, however, state that a lot has happened, especially in technological advancements, since the introduction of Marketing 3.0, and that it thereby needs adjustments (Kotler et al. 2017, p. xvi). Marketing 4.0 can be described as an outgrowth of Marketing 3.0, sharing the same human-centric idea but paying significant attention to the aspects of the customer's digital journey. The authors write: "In human-centric marketing, marketers approach customers as whole human beings with minds, hearts, and spirits. Marketers fulfill not only customers' functional and emotional needs but also address their latent anxieties and desires… Marketers must adapt to this new reality and create brands that behave like humans — approachable and likable but also vulnerable. Brands should become less intimidating. They should become authentic and honest, admit their flaws, and stop trying to seem perfect. Driven by core values, human-centric brands treat customers as friends, becoming an integral part of their lifestyle" (Kotler et al. 2017, p. 109). By moving from traditional to digital marketing, Kotler et al. (2017) claim four radical changes to marketing will occur. First, Kotler et al.
(2017) claim marketing will change from segmentation and targeting to customer confirmation. They reason that the segmentation and targeting of customers, which allow for an efficient allocation of corporate resources, exemplify a vertical relationship between companies and customers. Kotler et al. (2017, p. 47) argue that in today's digital economy, where

consumers in significant numbers develop horizontal social connections with other consumers, communities are the new segments. Communities differ from segments in being formed directly by consumers within boundaries that they, not the company, define. As these communities are self-organized, they are immune to spamming and irrelevant advertising and will, in fact, reject a company's attempt to force its way into them (Kotler et al. 2017, p. 48). Permission marketing is a concept introduced by Seth Godin and revolves around the idea of companies 'asking for permission' to deliver their messages. Permission marketing is a promising approach for advertising practice in a world of customer confirmation; digital permission marketing must be authentic, carrying core values, with minds, hearts, and spirits, and have a sincere desire to help customers, including acting as a true friend. Kotler et al. (2017) further claim digital marketing will change marketing from brand positioning and differentiation to brand clarification of characters and codes. Traditionally, what distinguished one company's offerings from its competitors' was its brand positioning. With the high transparency of the digital economy (particularly due to the rise of social media), companies can no longer get away with communicating false, unverifiable promises. The vast possibilities for communication offered by the internet and social media allow companies to position themselves as anything they like, but Kotler et al. (2017, p. 49) state that unless a community-driven consensus on the brand positioning is established, such positionings are no more than a corporate posture. Kotler et al. (2017, p. 49) state that the traditional marketing practice of continuously communicating the same brand identity and positioning may no longer be enough. Kotler et al.
(2017) thereby criticize the concept of brand positioning, one of the core aspects of marketing communication, as no longer relevant enough for the digital economy. Kotler et al. (2017, p. 49) argue that disruptive technologies, dominating the terms of the digital market, force brands to constantly adapt themselves. Brands must be constantly dynamic, not only carry and communicate one predetermined position, for example a position as the always serious brand. Just



like humans act differently when exposed to different situations, brands must as well. Instead of a consistent brand positioning, Kotler et al. (2017) propose a consistent brand character and codes. A brand character is described as the brand's raison d'être, the authentic reason for its existence (Kotler et al. 2017, p. 49). It is thereby clear that, to Kotler et al. (2017), the traditional brand positioning approach is too superficial for the digital economy. In the digital world, brands and companies should be 'something bigger' than a positioning. They should have a core purpose for their existence, and when that remains true to its roots, the outer imagery, the perceived position, of the company or brand is flexible enough to adapt itself to any future challenge. Another aspect of traditional marketing that digital marketing is going to change will, according to Kotler et al. (2017), be the change from selling the four P's to commercializing the four C's. The traditional tool introduced by McCarthy, called the marketing mix (product, price, place, promotion), has been used by companies for many years. The classical way in which companies plan what to produce,

set their prices, distribute their products, and finally promote their offerings is fundamentally being challenged by the stronger consumer participation of the connected world. The marketing mix, the four P's, must be redefined as the four C's (co-creation, currency, communal activation, and conversation) to fit the increasing consumer participation in the connected world (Kotler et al. 2017, p. 50). Note that the original idea of the four C's was first published by Lauterborn (1990) but has been adapted by Kotler et al. (2017). According to Kotler et al. (2017), in the digital world co-creation is the new product development strategy. Dynamic pricing, based on demand and market capacity, is the new currency for optimizing companies' profitability. Communal activation describes the new channels in which consumers demand almost instant access to products and services. And conversations are the new two-sided affair of digital promotion, allowing receivers to respond to communicated messages instantly. The last aspect of traditional marketing Kotler et al. (2017) state digital marketing is going to affect is the change

from customer service processes to collaborative customer care. Kotler et al. (2017) describe this change as traditional marketing's treatment of customers as targets changing into digital marketing's treatment of customers as equals. Through this customer-care approach, companies express their genuine concern for customers by listening, responding, and consistently following through on terms collaboratively created by both the company and the customer (Kotler et al. 2017, p. 52). It is thereby clear that Kotler et al. (2017) believe the key to successful customer care in the connected world is collaboration. For more information and in-depth explanations regarding these four changes, see Kotler et al. (2017, pp. 47-52).

Summary

Marketing 4.0 is described as a deepening and a broadening of the human-centric marketing approach with a much stronger focus on the digital economy. We saw that powers in today's world are shifting from exclusive to inclusive, from vertical to horizontal, and from individual to social, challenging not only the marketing aspects of our world but also macro levels of our society. Kotler et al. (2017) claim these shifts in power will change marketing in the following ways: marketing will change from segmentation and targeting to customer confirmation, from brand positioning and


differentiation to brand clarification of characters and codes, from selling the four P's to commercializing the four C's, and from customer service processes to collaborative customer care. Through this change in marketing, advertisements and brands in the digital world should 'become more human' by being likable but also vulnerable. They should be driven by core values and be authentic and honest 'creatures'. In Marketing 4.0, brands and advertisements should no longer treat consumers as targets or segments, but as friends, establishing relationships and becoming an integral part of consumers' lifestyles. ◆


ILLUSTRATION BY SANDY VAN HELDEN



Chapter 3:

Restrictions and ground rules for voice-first advertising acceptance

This section introduces restrictions — ground rules — voice-first advertising must obey for users to accept the presence of advertisements in their voice-first interface. Three restrictions are particularly pertinent when it comes to voice-first advertising: control, privacy, and ethics.

Control

It has always been human nature to exercise control over one's environment by making choices. Such choices include complex and emotionally important decisions which may take place only once in a lifetime, for instance what university to attend, as well as basic intuitive decisions which occur many times each day, e.g. deciding where to focus your visual attention. The WIMP paradigm is an interface design engineered around humans' desire for decision making. Windows, icons, menus, and pointing devices allow the users of the interface to always remain in control. An example is the close button, almost always visibly indicated in the top corner of a computer program, allowing the user to shut down any computer operation immediately. This type of computer interface design has led to computer users' clear expectation of always being in control of their computer and its operations. Voice-first interfaces are not based on the WIMP paradigm. When a system design such as a voice-first interface takes away the buttons, the icons, and the physical input controls, it must operate in a way in which users still feel they are in power of the system. 'The Disney Case' clearly demonstrates the negative consequences of ignoring users' demand to be in control of their system. It proves that the

users of voice-first interfaces do not allow advertisements to interrupt the 'broadcast.' Thereby the statement can be made that the techniques and tactics of traditional advertising (e.g. radio spots), as well as digital advertising (e.g. online banners), are not suitable in a voice-first world, as they, in their current form, cannot be implemented into the voice-first medium while simultaneously letting users remain in control. Advertisements must be redesigned to let users stay in control in order to fit the new voice-first medium. The change in power from vertical to horizontal presented by Kotler et al. (2017) supports this statement: in the horizontal world, brands, and thereby also advertising, should stop treating consumers as pure targets and instead adopt a greater human-centric approach and start to treat consumers like friends of the brand. The artificial intelligence technology powering today's voice-first interfaces allows for great automation, and the machine learning technology behind AI further allows the voice-first interface to get to know the user very well. Establishing a 'friendship' between the user and the advertisement is thereby, from a technical point of view, not impossible. But as AI becomes smarter and capable of more complex problem solving, advertisers must always remember to respect users' desire to remain in control. No one likes to feel forced to do, think, or act in a certain way.


Privacy

One of the main concerns regarding digital advertising is the privacy of the users/audience. The rise of ad-blocking software is a clear indicator of users' concerns regarding their personal data. As new technologies such as pervasive computing devices (Internet of Things) continue to infiltrate almost every aspect of everyday life, users are expected to become increasingly aware of their privacy in the digital economy. In a voice-first world, the 'pay-per-click' advertising industry we know today is a thing of the past (Forbes 2016). It is thereby absolutely necessary for voice-first advertising technology to be built on the principle of respecting the user's privacy, in order to secure revenue for the millions of websites that depend on advertising revenue. As researchers and professionals are aware of this relationship, the technological structure expected to be the tool for future advertising is built from the ground up to respect users' privacy by staying within the strict lines of the EU Directive 2002/58/EC, also known as the E-privacy Directive, while also addressing advertisers' goal of being able to effectively and precisely target their advertisements. While some might think of such a technology as fantasy, Carrara, Orsi, and Tanca (2013) write: "we propose PervADs: an ads/ coupon distribution platform for pervasive environments providing: (i) a rich semantic formalism for the description of pervasive ads (PADs), (ii) a novel architecture for their pervasive distribution, and (iii) a client-side, ads-filtering mechanism based on context matching, ensuring the privacy of personal data. PervADs enables one-to-one interaction between businesses and consumers without intermediate third-party entities. The PADs published by businesses are locally and privately filtered on the devices of potential customers making very hard for advertisers to collect personal consumer's data. PervADs also provides businesses with an autonomous and inexpensive advertising infrastructure enabling a fine-grained monitoring of the performance of the advertising campaign," (Carrara, Orsi, and Tanca 2013, p. 217). The PervADs technology is currently being developed, and I personally believe such an advertising distribution technology could experience great success if adopted correctly by key market players.
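The client-side, context-matching filter described in the PervADs quotation above can be illustrated with a minimal sketch. This is a hypothetical simplification, not the actual PervADs implementation; the `PervasiveAd` structure, its field names, and the exact-match rule are all assumptions made for illustration:

```python
from dataclasses import dataclass

@dataclass
class PervasiveAd:
    """A broadcast ad carrying its own context descriptors (hypothetical format)."""
    offer: str
    required_context: dict  # e.g. {"location": "city_center", "interest": "coffee"}

def filter_ads_locally(ads, user_context):
    """Client-side filtering: the user's device matches broadcast ads against
    locally stored context, so personal data never leaves the device."""
    matches = []
    for ad in ads:
        # An ad is shown only if every context descriptor it requires
        # is satisfied by the locally held user context.
        if all(user_context.get(k) == v for k, v in ad.required_context.items()):
            matches.append(ad)
    return matches

# The user's context stays on the device; advertisers never see it.
user_context = {"location": "city_center", "interest": "coffee"}
ads = [
    PervasiveAd("2-for-1 espresso", {"location": "city_center", "interest": "coffee"}),
    PervasiveAd("Gym trial week", {"interest": "fitness"}),
]
accepted = filter_ads_locally(ads, user_context)
```

The key privacy property is where the matching happens: the advertiser broadcasts to everyone, and only the device decides what is relevant, so targeting works without any personal data being collected.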

Ethics

As explained earlier, pervasive computing opens up a new level of persuasive pervasive advertising formats. As these ads are expected to be produced and distributed by automated systems, such as voice-first interfaces, it is critical for these systems to understand what is accepted and perceived as ethical and unethical. Researchers such as Fogg (2002) have identified ethical concerns related to the use of persuasive computer technology, and findings such as theirs must be taken into account by future voice-first advertising formats. Persuasion has always been an integral part of advertising and will continue to play a major role in pervasive voice-first advertising, as computers and digital assistants have been proved to have distinct advantages over human persuaders (Fogg 2002). For computers to avoid unethical computer persuasion, researchers must provide boundaries for what is seen as ethical and unethical. Müller et al. (2011, p. 27) propose: "it is our belief that any intention to persuade audiences against their own interests is unethical." Furthermore, it is widely accepted that any attempt to influence vulnerable groups, such as children or the mentally ill, is unethical, as is the use of methods such as deception, coercion, or surveillance. For more information regarding ethical persuasion technology see Reitberger et al. (2011), and for a detailed description of the six unique ethical concerns regarding persuasive computer technology see Fogg (2002, pp. 213-219).

Summary

This section introduced three restrictions to voice-first advertising. Firstly, voice-first advertisements should at all times allow users to feel in control of their system and its actions. Any attempt to take over users' sense of control will only lead to failure. This section further expressed the importance of advertisers respecting users' privacy in order to deliver revenue to the millions of websites dependent on advertising revenue to produce their content. The PervADs distribution platform for pervasive environments is a promising technology providing advertisers the necessary tools to respect user privacy. Lastly, voice-first advertisements should act ethically and thereby not persuade audiences against their own interests. ◆



Chapter 4:

The voice-first advertising framework
Conjectured tactics for implementing advertising into voice-first interfaces

In the last sections, I have been looking at fields which this paper proposes affect the possible implementation of advertising in voice-first interfaces. Based on the knowledge gained from the analysis of Human-Computer-Interaction, Pervasive Computing (Pervasive Advertising), and Marketing 4.0, three constraints for implementing advertising into voice-first interfaces have been identified: control, privacy, and ethics. I would now like to propose the 'voice-first advertising framework,' see Figure 1.1, which conjectures four future advertising scenarios (calm, trigger, relevance, and engagement) in a single framework. The framework suggests a chronological structure of the four, providing a strategic model for marketing and advertising professionals to build advertising tactics and formats for voice-first interfaces. As research into voice-first interfaces, Human-Computer-Interaction, pervasive advertising, digital marketing, and Marketing 4.0 develops, the 'voice-first advertising framework' as proposed in this paper will further evolve, affecting its elements and how it translates into specific voice-first advertising formats.

Concluding this section, each element of the proposed 'voice-first advertising framework' is translated into a working hypothesis for future empirical research to either confirm or discard, testing the validity of the 'voice-first advertising framework'.

The voice-first advertising framework is structured into four stages (calm, trigger, relevance, and engagement) which should be understood as a timeline. Voice-first advertising should begin at the left of the framework and chronologically work its way through the model. Successful voice-first advertisements can thereby not be engaging without first having built relevance and been triggered. The key limitations to voice-first advertising are identified as constraints that apply throughout the whole framework; overstepping the user's privacy, control, or ethics at any stage leads to failure. Only advertisements which follow the chronological structure and simultaneously do not overstep the constraints will be accepted by users and thereby able to be implemented into voice-first interfaces. The first element of the framework,

calm, introduces the concept of voice-first advertising calmness. For voice-first advertisements to overcome the challenge of letting users remain in control, they must stay calm until they are wanted. As speech is a medium in which humans cannot control what they hear and do not hear, calm advertising in a voice-first context does not mean allowing users to easily skip or ignore advertised messages, but remaining completely silent. This is a showdown with today's standard for buying media placement, in which marketers and advertisers have the power to choose where to place their messages. In today's world, powers are shifting from a vertical to a horizontal context in which companies no longer treat customers as targets but consider them friends. Marketers and advertisers must realize that in the voice-first world they can no longer buy their way into the medium. Just as a hunting tiger hides among the foliage waiting for the perfect moment to attack, voice-first advertising must wait for users to allow it to appear. 'The Disney Case' is a clear indicator of what happens when promoted messages forget


the concept of remaining calm. Users will build self-made communities and turn on the intrusive promoted message, as they expect to be in control of their voice-first system. Users of voice-first interfaces do not accept any sudden interruption of the 'broadcast' without prior allowance. Advertising in a voice-first world is no longer something marketers 'do to people'; advertisements must be an integral part of the interaction, and this starts by respecting users' desire to be in control of the system. Any attempt by advertisements to out-power voice-first interface users will be turned down immediately and only hurt the companies which have not adapted to the new reality, where users expect brands, and thereby also advertisements, to behave like humans treating the users as friends. The trigger practice can be described as a 'Tinder match' between the right ad (solution) and the right audience (user need). The framework distinguishes between explicit and implicit trigger identification practices. Explicit trigger identification is characterized by users giving an explicit input to the system. This could, for example, be asking specific questions regarding a particular product or service category. When users ask questions such as "Where can I find a good place for lunch?", the assistant would trigger/allow previously calm sponsors with a relevant offer to step out of their silence and place their offer in the system's response to the user's question. The explicitly triggered placement of advertising in voice-first interfaces is similar to current search engine advertising. SEA (Search Engine Advertising) is based on written search queries, using keywords to identify whether or not the advertisement is relevant to the user's intent. Spoken search queries differ from written search queries in that human verbal questions tend to be less precise and more semantic, with follow-up questions supplementing the initial query.
This makes it hard for computers to understand the user's exact intent. As an example, a spoken question for information regarding Barack Obama might sound like "Who is Barack Obama?". If the user is interested in more information about Barack Obama, for example his age, the next spoken question might sound like "How old is he?" instead of explicitly asking "How old is Barack Obama?". While speaking, humans expect a certain understanding of context, leading to less specific question formulation. Marketers and advertisers in the voice-first world

must keep the 'human' aspect of spoken questions in mind and voice-search-optimize their content and offers for voice-first interfaces to index and present correctly. Implicit trigger identification is the automated identification of a user's need by decoding subtle cues within the spoken language. Speech Emotion Recognition technology allows computers to understand and decipher human intentions, feelings, and cognitive states, such as hunger, thirst, boredom, stress, etc. Some human emotions translate into a particular product or service need; for example, the identification of hunger might result in a voice-based digital assistant asking for permission to look up local restaurants, or whether the user is interested in information regarding the food delivery options available at the current location. Implicitly triggered voice-first advertisements thereby have the ability to be highly supportive and suggestive without being intrusive, as they are only shown upon the voice-first interface's identification of a currently existing user need. The major challenge of identifying user needs and triggering the right ad for the right audience depends on the user data provided to the voice-first system. The pervasive computing system offers a great opportunity for collecting extensive amounts of user data from multiple sources. Voice-first advertising must thereby be seen as part of a larger pervasive computing (IoT) system, as proposed by the PervADs system, in which data should be shared by all parties involved, but with respect for the user's privacy, to offer users the best experience possible. In the relevance phase, advertisements must not only offer the right solution to a user need but also prove themselves relevant, customized to the current situation (contextual personalization) and in the right way (interrelation personalization).
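The explicit and implicit trigger identification practices described above can be sketched as a simple lookup. The keyword and emotion mappings below are hypothetical placeholders; a real system would rely on natural language understanding and Speech Emotion Recognition rather than plain string matching:

```python
# Hypothetical sketch of the trigger stage: ads stay 'calm' until an explicit
# query or an implicitly recognized user state matches a trigger.

EXPLICIT_TRIGGERS = {   # keyword in spoken query -> ad category (assumed mapping)
    "lunch": "restaurants",
    "pizza": "food_delivery",
}
IMPLICIT_TRIGGERS = {   # recognized user state -> ad category (assumed mapping)
    "hunger": "food_delivery",
    "boredom": "entertainment",
}

def trigger_ad_category(spoken_query=None, recognized_state=None):
    """Return the ad category allowed to step out of silence, or None."""
    # Explicit trigger: the user directly voices a product/service need.
    if spoken_query:
        for keyword, category in EXPLICIT_TRIGGERS.items():
            if keyword in spoken_query.lower():
                return category
    # Implicit trigger: a need inferred from subtle cues in the speech.
    if recognized_state:
        return IMPLICIT_TRIGGERS.get(recognized_state)
    return None  # no trigger: every ad remains calm
```

The important property is the default: when neither trigger fires, the function returns nothing, which is exactly the 'calm' behavior the framework demands.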
Helping users accomplish what they want or need is key for voice-first advertisements to be implemented into voice-first interfaces. The wording 'digital assistant' implies a personal user-assistant relationship, and today's users of voice-first interfaces not only prefer a personalized experience, they directly expect it. Personalization is the only way for voice-first advertisements to achieve relevance. Contextual personalization is the customization of the sponsored message to the user's current situation and activities. Situations could be any given physical situation the user is currently in. Examples include whether the

user is at home or at work, and whether the user is alone or in the company of others. Advertisers can use this information to customize their offering; for example, if a user is identified as hungry and at work, the advertised offer differs from when the user is identified as hungry and at home. By fitting sponsored messages to the user's current situation, the advertised offer becomes much more relevant. Activities refers to the user's current physical activity. This could, for instance, be whether the user is cooking, reading, biking, driving, doing groceries, streaming TV shows, etc. Interrelation personalization, in contrast to contextual personalization, is based not on the current situation of the user but on the system's knowledge of the user's personality gained over time. This has a highly human-centric focus and includes, for example, the system's knowledge of the tone in which the user likes to be addressed, and the user's standpoints, interests, 'taste,' 'style,' and preferences. For example, advertisements regarding the upcoming weekend's cultural offerings could be customized to the individual user's preference, e.g. going to concerts, and 'taste in music', e.g. classical music. Not only the content of the sponsored message (a classical concert) could be personalized; the tone of voice in which the offer is promoted could be personalized as well. Through interrelation personalization, brands and advertisers can adopt a human-like behavior, acting as a 'friend' only suggesting offers they know the user will find relevant. This respect for the user's interests and personality establishes an authentic and honest relationship between the user and the advertised offer, which might not only be something the user finds relevant but even something the user appreciates.
By 'speaking in eyesight,' interrelation-personalized messages allow for a very natural interaction between the user and the advertised offer, establishing a 'friendship' by having 'things in common' (interests, standpoints, taste, style, etc.). For voice-first advertisements to achieve relevance, it is important that marketers and advertisers take both contextual and interrelation personalization into account. Ads which fit the context of the user do not automatically also meet the interrelation aspect. For example, suppose a vegetarian user wants to order a pizza. The trigger is hunger, and the need is pizza. Based on the relationship established between the user and the voice-first interface (interrelation personalization), no pizzas containing meat should be suggested. However, if the vegetarian user's



context changes, for instance if the user is not alone, meat-containing pizzas should be offered as well. From this example it is clear that for voice-first advertisements to achieve relevance, both contextual personalization and interrelation personalization must be carried out. As human-to-human interaction is multifaceted, the engagement phase of the framework forces voice-first advertisements to adopt multifaceted aspects of communication in their interaction with users. This means advertisements must establish a complex dialog logic and also be able to respond to the way users react to the promoted message. If a user asks questions, the advertisement should provide answers. If the user shows interest, the advertisement could start a purchase process. And if the user neglects the advertisement, it could even apologize for the intrusion. Multifaceted user-advertisement engagement should be highly emotional and clearly show brand character. Brands and other sponsors of voice-first advertisements should be driven by core values and openly communicate and support them. Allowing users to question the sponsored messages directly opens up room for dialog. This not only makes the advertisement approachable and hopefully likable, but also vulnerable. By being vulnerable, advertisements and brands have the potential to establish a very natural, human-like brand identity. Vulnerability allows brands to feel authentic and honest. It allows the ad and the brand to 'be human': struggling to find their place in the world, supporting their case, admitting their flaws, and no longer trying to seem perfect, just like the users themselves. Imagine advertisements where users can engage with a brand by having a conversation with the brand or its brand ambassadors. For instance, imagine voice-first interface users having a virtual talk with George Clooney over a cup of Nespresso coffee.
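The interplay of contextual and interrelation personalization in the vegetarian pizza example can be sketched as follows. The field names and the `alone` heuristic are illustrative assumptions, not part of the framework itself:

```python
def relevant_offers(offers, context, profile):
    """Relevance stage: an offer must pass both the contextual filter
    (current situation) and the interrelation filter (learned preferences)."""
    result = []
    for offer in offers:
        # Interrelation personalization: long-term knowledge of the user
        # (here: vegetarian preference), relaxed when contextual
        # personalization says the user is in company.
        if context.get("alone", True) and offer.get("contains_meat") and profile.get("vegetarian"):
            continue
        result.append(offer)
    return result

offers = [
    {"name": "Pizza Margherita", "contains_meat": False},
    {"name": "Pizza Salami", "contains_meat": True},
]
profile = {"vegetarian": True}  # interrelation knowledge gained over time

# User is alone: only the vegetarian option is suggested.
alone = relevant_offers(offers, {"alone": True}, profile)
# User has company: meat-containing pizzas may be offered as well.
with_guests = relevant_offers(offers, {"alone": False}, profile)
```

The point of the sketch is that neither filter alone suffices: the profile rules out meat, but the context can override that rule, exactly as in the example above.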
The voice-first advertising practice proposed by the ‘voice-first advertising framework’ can in brief be summarized as triggering the right ad (solution), customized to the current situation (contextual relevance), in the right way (interrelation relevance), to the right audience (need identification), offering the right amount of engagement. But voice-first advertisements must operate within certain boundaries. At the top of the framework we find the lid, containing limitations voice-first advertisements must respect at all times in order for users to accept the presence of advertisements in

their voice-first interface. In every stage of the framework, voice-first advertisements must operate within the boundaries of letting users feel in control, respecting the users’ privacy, and operating ethically. For users, remaining in control of their system is of absolute importance. Voice-first advertisements should be supportive and suggestive but must not take over the system. Users should at all times feel the system is under their control, and advertisements taking that feeling away will only cause users to immediately protest against the advertisement. For voice-first ads to be relevant they must be personalized to the individual user, but personalization can also go too

far. Personalized voice-first advertisements must not make users feel uncomfortable by giving them the impression their system knows too much about them, thereby overstepping their privacy. This leads to the last limitation of voice-first advertisements: ethics. Advertisements in voice-first interfaces must be presented in a way users feel carries their best interests. Any attempt to persuade users against their own will, or without regard to their best interest, is unethical.

The framework is further portrayed as a box because advertisements in voice-first interfaces should no longer be delivered as a ‘finished product’. Voice-first advertisements should be delivered as ‘advertisement packages’ containing an offer which can be paired (triggered) to solve a specific user need, and including a guide for voice-first interfaces to customize the offer, making the advertisement relevant (relevance) to the user. In a voice-first interface world, “one size fits all” certainly no longer functions. Marketers and advertisers must embrace the new technological possibilities and allow for a greater automated production of advertisements based on individual user profiles, not on segments and target groups built from demographic data. In a voice-first world, advertisers should no longer produce finished voice-first advertisements, but feed digital assistants ‘advertisement packages’ with the information the digital assistants need to help their user achieve what the user wants, or get what the user needs.

As the proposed ‘voice-first advertising framework’ is built solely upon the knowledge gathered from the study of Human-Computer-Interaction, Pervasive Computing (Pervasive Advertising), and Marketing 4.0, this paper ends with an open question on the proposed framework. To prove the validity of this proposition, further empirical research must be done in order to confirm the following hypotheses:

H1: The future of voice-first advertising will be calm. Successful voice-first advertising must remain silent until the user allows for its presence.

H2: Voice-first advertisements must be either explicitly or implicitly triggered by the user for their presence to be accepted.

H3: The future of advertising will be interconnected. Successful voice-first advertisements understand that they are part of a larger pervasive computer system where traditional and digital marketing tactics are no longer suitable.

H4: Voice-first advertising must be personalized to the user’s context, and to the interrelation between the voice-first digital assistant and the user, in order to be relevant.

H5: Voice-first advertisements must be engaging, establishing a complex dialog logic that allows for deeper levels of multifaceted interaction.

H6: Voice-first advertising must chronologically follow the structure of the ‘voice-first advertising framework’ to be accepted by users.

H7: Voice-first advertising must not overstep the users’ need for control, the users’ privacy, or the users’ ethics, to be accepted.

H8: The days of the advertisement as a finished product are gone. In the pervasive computing and voice-first driven world, ads must be provided as ‘advertisement packages’ allowing for vast amounts of computer-automated customization in the production, personalization, and distribution of advertisements.
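The ‘advertisement package’ idea can be sketched as a small data structure holding a trigger stage (pairing an offer to a user need) and a relevance stage (customizing the offer to context). This is a hypothetical illustration; the field names (brand, offer, need_keywords, variants) and the example sponsor are my assumptions, not part of the thesis:

```python
from dataclasses import dataclass, field

@dataclass
class AdPackage:
    """Sketch of an 'advertisement package': not a finished ad, but the
    material a digital assistant needs to assemble one for its user."""
    brand: str
    offer: str                                       # the sponsored solution
    need_keywords: set = field(default_factory=set)  # needs this offer can solve
    variants: dict = field(default_factory=dict)     # context -> customized offer

    def matches(self, user_need: str) -> bool:
        """Trigger stage: pair the package with an explicit or implicit need."""
        return user_need in self.need_keywords

    def personalize(self, context: str, default: str) -> str:
        """Relevance stage: customize the offer to the user's current context."""
        return self.variants.get(context, default)

# The assistant, not the advertiser, assembles the final sponsored message.
package = AdPackage(
    brand="PizzaCo",  # hypothetical sponsor
    offer="pizza delivery",
    need_keywords={"hunger", "dinner"},
    variants={"vegetarian": "Margherita", "group": "mixed family bundle"},
)
```

Under this sketch, the vegetarian-pizza example from Chapter 4 becomes: `package.matches("hunger")` triggers the offer, and `package.personalize("vegetarian", default="Pepperoni")` returns the meat-free variant, while a changed context (the user is not alone) selects a different variant from the same package.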



Conclusion

Voice-first is an interface design for voice-controlled interaction between humans and computers. The most common use of voice-first interfaces today is found in personal digital assistants such as Apple’s Siri, Google’s Google Assistant, Amazon’s Alexa, and Microsoft’s Cortana. In this paper, we saw that advertising has yet to find its place and form in voice-first interfaces, and that this new interface design challenges current, traditional as well as digital, practices and formats of advertising. Just as the Internet forced traditional advertising to develop new digital and online advertising formats, Mark Weiser’s idea of a new pervasive computing age is a good indicator for the next generation of pervasive and voice-based advertising, exceeding what we currently understand by digital/online advertising.

To understand a possible implementation of advertising in voice-first interfaces this paper proposed studying the fields of Human-Computer-Interaction, Pervasive Computing (Pervasive Advertising), and Marketing 4.0. The study led to the identification of three key limitations voice-first advertising must respect in order to be accepted by voice-first interface users. Remaining in control of the voice-first system was found to be of absolute importance to users. Secondly, users’ concern for their privacy was found to be a critical issue for voice-first advertising in a pervasive computing world. And third, voice-first advertising must act ethically, carrying the best interest of the user at heart.

I strongly believe voice-first advertising is going to disrupt the current pay-per-click online advertising industry, and as marketers and advertisers it is our responsibility to shape the development of voice-first advertising in a meaningful way. For meaningful advertisements to successfully take place in voice-first interfaces this paper proposes a ‘voice-first advertising framework’. The framework proposes a future in which advertisements are no longer finished products (ads) but ‘packages’ containing an offer which can be paired to a specific user need and including a vast amount of information allowing digital assistants to build sponsored messages following the chronological order of the framework. The framework is structured into four stages and argues that successful voice-first advertisements follow a chronological structure. First, voice-first advertising must remain calm, silent, until triggered. Explicit or implicit user cues (need identification) are used to match the right user need with the right sponsored solution or offer. Advertisers can thereby no longer specifically choose placements in a voice-first world, but must wait for users to ‘allow’ their presence. The idea of contextual and interrelation personalization aims at constructing voice-first advertisements with high user relevance, to avoid being considered spam. Finally, the framework points out the need for voice-first advertising to establish a complex dialog, providing deep levels of engagement and letting users question and comment on the promoted message.

The ‘voice-first advertising framework’ strongly suggests advertisements in a voice-first world become more ‘human,’ in line with the idea of Marketing 4.0. The most important task of advertisement in this new medium is to understand when it is wanted and to offer an experience the user of the voice-first interface finds relevant. The voice-first interface is not a mass medium for advertisers to promote a pre-set message to as large an audience as possible. Voice-first advertisements should understand their role as supporting digital assistants in creating a more convenient everyday life for their users.

I hope the ‘voice-first advertising framework’ will prove itself valid and useful for marketers and advertising professionals operating within the voice-first world, and will inspire future researchers to study this field, collectively constructing a solid foundation for advertising practices in the approaching era of voice-first. ◆


Limitations and future research

This study has multiple methodological limitations calling for the reader’s caution in interpreting the findings. First, due to the limited period of 30 workdays for producing this paper, the findings presented are not based on empirical research observations but on the study of existing research fields. It should be noted that the study of the possible implementation of advertising in voice-first interfaces provided in this paper is limited to the study of Human-Computer-Interaction, Pervasive Computing (Pervasive Advertising), and Marketing 4.0. Other disciplines, such as consumer psychology, general marketing research, and other areas of computer science, are not considered. The purpose of this paper was to establish a foundation and provide possible research questions for future empirical researchers willing to examine the field of voice-first advertising and either confirm or discard them. This paper has focused on voice-based digital assistants alone and has not considered other forms of digital assistants (e.g., chatbots). Nor has this paper considered ambient computing or Davis’ Interpersonal Reactivity Index.

To prove the legitimacy of the ‘voice-first advertising framework,’ empirical research needs to be done to confirm or discard the hypotheses proposed in this paper. Voice-first advertising has the potential to revolutionize how we think about advertising and to boost the adoption of pervasive computing technology. But it seems that although some professionals are excited about the pervasive computing future, a considerable number of marketing and advertising professionals are unaware of the recent developments in computing technology and how they affect their industry. Simultaneously, many computer scientists have great knowledge within their field but lack basic marketing and advertising expertise. Research combining the fields of computer science and marketing is needed to inform the design of future advertising formats and the development of future technologies for the distribution of digital advertisements.

Another area for future examination is the PervAD technological platform, with a specific focus on voice-first interfaces. This study only briefly introduced the PervAD technology; further research is necessary into overcoming its two main challenges: delivering the right ad at the right time in the best way possible, and finding the consumer’s context without being invasive. Furthermore, a lack of knowledge still exists in the search for the perfect balance between what consumers need and the objectives of advertisers. Also, marketing research has finally provided answers to marketing and advertising practices in the digital economy, but as we are entering another era of pervasive computing, the current state of research in digital marketing is no longer sufficient. Research into how pervasive computing and voice-first interfaces affect current marketing theories and practices within organizations and agencies seems highly relevant. ◆



References

Aron, J. (2011). How innovative is Apple’s new voice assistant, Siri? New Scientist, 212(2836), p. 24. doi: 10.1016/S0262-4079(11)62647-X
Baecker, R., & Buxton, W. (1987). A historical and intellectual perspective. In R. Baecker & W. Buxton (Eds.), Readings in HCI: A multidisciplinary approach, pp. 41–54. Morgan Kaufmann, San Francisco.
Banker, R. D., & Kaufmann, R. J. (2004). The evolution of research on Information Systems: A fiftieth-year survey of the literature in Management Science. Management Science, 50(3), p. 281–298.
Blackwell, A. (2006). The reification of metaphor as a design tool. ACM Transactions on Computer-Human Interaction, 13(4), p. 490–530.
Bublitz, F., Almeida, H., Luiz, S. O. D., and Perkusich, A. (2013a). Pervasive Advertising: An Approach for Consumers and Advertisers. IEEE Third International Conference on Consumer Electronics, Berlin, p. 290-294.
Bublitz, F., Silva, L. C. e., Oliveira, E. A. da S., Luiz, S. O. D., Almeida, H. O. de, and Perkusich, A. (2013b). A Petri Net Model Specification for Delivering Adaptable Ads through Digital Signage in Pervasive Environments. In Proceedings of the 24th International Conference on Software Engineering & Knowledge Engineering (SEKE’2013), p. 405–410. Knowledge Systems Institute Graduate School.
Burke, C. (1998). A rough road to the information highway: Project INTREX. In Hahn, T. B., and Buckland, M. (Eds.), Historical studies in Information Science, p. 132-146. Information Today / ASIS, Medford, NJ.
Carrara, L., and Orsi, G. (2011). A New Perspective in Pervasive Advertising. University of Oxford Department of Computer Science.
Carrara, L., Orsi, G., and Tanca, L. (2013). Semantic Pervasive Advertising. In Faber, W., and Lembo, D. (Eds.), Web Reasoning and Rule Systems: 7th International Conference, RR 2013, Mannheim, Germany, July 2013, Proceedings, p. 216-222. Springer, Berlin.
Chibelushi, C. C., Deravi, F., & Mason, J. S. D. (2002). A review of speech-based bimodal recognition. IEEE Transactions on Multimedia, 4(1), p. 23–37.
Cohen, M. H., Giangola, J. P., and Balogh, J. (2004). Voice User Interface Design. 1st ed. Pearson Education, Boston, MA.
Darwin, C. (1978). The expression of emotion in man and animals, 3rd edition. Harper Collins, London.
Dix, A., Finlay, J., Abowd, G., and Beale, R. (1993). Human-Computer Interaction. Prentice-Hall.
Ekman, P. (1993). Facial expression and emotion. American Psychologist, 48, p. 384–392.
Elwakdy, M., Elsehely, E., Eltokhy, M., and Elhennawy, A. (2008). Speech recognition using a wavelet transform to establish fuzzy inference system through subtractive clustering and neural network (ANFIS). International Journal of Circuits, Systems and Signal Processing, 4(2), p. 264–273.
Ferdinando, A. D., Rosi, A., Lent, R., Manzalini, A., and Zambonelli, F. (2009). MyAds: A System for Adaptive Pervasive Advertisements. Pervasive and Mobile Computing, 5(5), p. 385–401.
Ferguson, H., Smith Myles, B., and Hagiwara, T. (2005). Using a personal digital assistant to enhance the independence of an adolescent with Asperger syndrome. Education and Training in Developmental Disabilities, 40, p. 60–67.
Fogg, B. J. (2002). Persuasive Technology. Morgan Kaufmann, San Francisco.
Forbes. (2016). Voice-First Technology Is About To Kill Advertising As We Know It. Available at: https://www.forbes.com/sites/quora/2016/12/05/voice-first-technology-is-about-to-kill-advertising-as-we-know-it/#711f89cc3788 (Accessed: 15 June 2017) [Web]
Geoffroy, F., Aimeur, E., and Gillet, D. (2002). A Virtual Assistant for Web-Based Training in Engineering Education. In Cerri, S. A., Gouardères, G., and Paraguaçu, F. (Eds.), Intelligent Tutoring Systems, ITS 2002. Lecture Notes in Computer Science, vol. 2363. Springer, Berlin, Heidelberg. Retrieved from https://link.springer.com/chapter/10.1007/3-540-47987-2_34
Glaser, A. (2017). Google’s ability to understand language is nearly equivalent to humans. Available at: https://www.recode.net/2017/5/31/15720118/google-understand-language-speech-equivalent-humans-code-conference-mary-meeker (Accessed: 2 June 2017) [Web]
Grimm, M., Kroschel, K., & Narayanan, S. (2008). The Vera am Mittag German audio-visual emotional speech database. In IEEE International Conference on Multimedia & Expo, Hannover, Germany, p. 23–26.
Grudin, J. (2012). A Moving Target: The Evolution of Human-Computer Interaction. In J. Jacko (Ed.), Human-Computer Interaction Handbook, 3rd edition. Taylor & Francis.
Hewett, T., Baecker, R., Card, S., Carey, T., Gasen, J., Mantei, M., Perlman, G., Strong, G., and Verplank, W. (1992). ACM SIGCHI Curricula for Human-Computer Interaction. Report for the ACM SIGCHI Curriculum Development Group.
Hura, S. L. (2008). Voice User Interfaces. In Kortum, P. (Ed.), HCI Beyond the GUI: Design for Haptic, Speech, Olfactory, and Other Nontraditional Interfaces, p. 197-228. Elsevier.
IJsselsteijn, W., Kort, Y. d., Midden, C., Eggen, B., and Hoven, E. v. d. (2006). Persuasive Technology for Human Well-Being: Setting the Scene. In IJsselsteijn, W., Kort, Y. d., Midden, C., Eggen, B., and Hoven, E. v. d. (Eds.), Persuasive Technology: First International Conference on Persuasive Technology for Human Well-Being, PERSUASIVE 2006, Eindhoven, The Netherlands, May 2006, Proceedings, p. 1-5. Springer, Berlin.
Imrie, P., and Bednar, P. (2013). Virtual Personal Assistant. In M. Martinez & F. Pennarola (Eds.), ItAIS 2013: Proceedings of the 10th Conference of the Italian Chapter of AIS. Università Commerciale Luigi Bocconi, Milan, Italy.
Kostakos, V., and Ojala, T. (2013). Innovations in Ubicomp Products. IEEE Pervasive Computing, January–March 2013, p. 8–13.
Kotler, P., Kartajaya, H., and Setiawan, I. (2017). Marketing 4.0: Moving from Traditional to Digital. John Wiley & Sons Inc., Hoboken, New Jersey, US.
Kotler, P., and Keller, K. L. (2008). Marketing Management, 13th edn. Prentice Hall, Herts.
Lauterborn, B. (1990). New marketing litany: Four P’s passé; C-words take over. Available at: http://www.rlauterborn.com/pubs/pdfs/4_Cs.pdf (Accessed: 14 June 2017)
McCartney, S. (2001). ENIAC. 1st ed. Berkley Books, New York, US.
Milhorat, P., Schlögl, S., Chollet, G., Boudy, J., Esposito, A., and Pelosi, G. (2014). Building the next generation of personal digital assistants. 2014 1st International Conference on Advanced Technologies for Signal and Image Processing (ATSIP), p. 458-463. doi: 10.1109/ATSIP.2014.6834655
Molla, R. (2017). Internet Trends Report: All The Slides, Plus Analysis. Available at: https://www.recode.net/2017/5/31/15693686/mary-meeker-kleiner-perkins-kpcb-slides-internet-trends-code-2017 (Accessed: 2 June 2017) [Web]
Mostafa, M. M. (2013). More than words: Social networks’ text mining for consumer brand sentiments. Expert Systems with Applications, 40(10), p. 4241–4251.
Müller, J., Alt, F., and Michelis, D. (2011). Pervasive Advertising. Springer-Verlag London, London.
Myers, B. A. (1998). A brief history of human computer interaction technology. ACM Interactions, 5(2), p. 44–54.
NASA (2017). Human Computers. Available at: https://crgis.ndc.nasa.gov/historic/Human_Computers (Accessed: 30 May 2017) [Web]
Norman, D. (2013). The design of everyday things. Revised and expanded edition. Basic Books, New York.

Olson, C., Hull, J., and Francis, P. (2017a). Optimizing for Voice Search: Real Talk about AI and Intelligent Agents. Available at: https://cache.webcasts.com/content/adwe001/1150834/content/e8e841d8a83afe1ce4ee2e01cfe40d4438b44bba/pdf/AdweekWebinarDigitalAssistantsFINAL.pdf (Accessed: 15 June 2017)
Olson, C., Hull, J., and Francis, P. (2017b, June 15). Optimizing for Voice Search: Real Talk about AI and Intelligent Agents. [Webinar] In Adweek webinar series. Recorded version available at: https://event.webcasts.com/viewer/event.jsp?ei=1150834&tp_key=21f37f9d57
Oxford. (1989). Oxford English Dictionary. Oxford University Press.
Pavlou, P. A., and Stewart, D. W. (2001). Interactive Advertising: A new conceptual framework towards integrating elements of the marketing mix. In Moore, M., and Moore, R. S. (Eds.), New Meanings for Marketing in a New Millennium: Proceedings of the 2001 Academy of Marketing Science (AMS) Annual Conference, p. 218-222. Springer, Cham.
Pew, R. (2003). Evolution of HCI: From MEMEX to Bluetooth and beyond. In J. A. Jacko & A. Sears (Eds.), The Human-Computer Interaction Handbook, p. 1–17. Lawrence Erlbaum, Mahwah, NJ.
Porta, M. (2007). Human–computer input and output techniques: An analysis of current research and promising applications. Artificial Intelligence Review, 28(3), p. 197-226.
Preece, J., Rogers, Y., Sharp, H., Benyon, D., Holland, S., and Carey, T. (1994). Human-Computer Interaction. Addison-Wesley.
Ramakrishnan, S., and El Emary, I. M. M. (2011). Speech emotion recognition approaches in human computer interaction. Telecommunication Systems, 52(3).
Ranjan, S. (2010). Exploring the discrete wavelet transform as a tool for Hindi speech recognition. International Journal of Computer Theory and Engineering, 2(4), p. 642–645.
Rayward, W. B. (1983). Library and information sciences: Disciplinary differentiation, competition, and convergence. In F. Machlup & U. Mansfield (Eds.), The study of information: Interdisciplinary messages, p. 343-405. Wiley, New York.
Rayward, W. B. (1998). The history and historiography of Information Science: Some reflections. In Hahn, T. B. & Buckland, M. (Eds.), Historical studies in Information Science, p. 7-21. Information Today / ASIS, Medford, NJ.
Reitberger, W., Meschtscherjakov, A., Mirlacher, T., and Tscheligi, M. (2011). Ambient Persuasion in the Shopping Context. In Müller, J., Alt, F., and Michelis, D. (Eds.), Pervasive Advertising, p. 309-324. Springer-Verlag London, London.
Roemmele, B. (2016). There Is A Revolution Ahead and It Has A Voice. Available at: https://techpinions.com/there-is-a-revolution-ahead-and-it-has-a-voice/45071 (Accessed: 8 June 2017) [Web]
Sadun, E., and Sande, S. (2014). Talking to Siri: Mastering the language of Apple’s intelligent assistant. 3rd ed. Que Publishing.
Schmeil, A., and Broll, W. (2007). MARA: A Mobile Augmented Reality-Based Virtual Assistant. 2007 IEEE Virtual Reality Conference, Charlotte, NC, p. 267-270. doi: 10.1109/VR.2007.352497
Schmidt, A. (2000). Implicit human-computer interaction through context. Personal and Ubiquitous Computing, 4, p. 191–199. Springer, London.
Schmidt, A., Kern, D., Streng, S., and Holleis, P. (2008). Magic beyond the screen. IEEE MultiMedia, 15(4), p. 8–13.
Shackel, B. (1997). HCI: Whence and whither? Journal of ASIS, 48(11), p. 970–986.
Townsend, T. (2017). Google has bigger challenges with Home than just recognizing different voices. Available at: https://www.recode.net/2017/5/14/15527306/google-problems-voice-control-ai-okay-google (Accessed: 17 June 2017) [Web]
Turk, M., and Robertson, G. (2000). Perceptual user interfaces. Communications of the ACM, 43(3), p. 33–34.
Vinciarelli, A., Pantic, M., and Bourlard, H. (2009). Social signal processing: Survey of an emerging domain. Image and Vision Computing, 27(12), p. 1743–1759.
Vogt, T., Andre, E., and Wagner, J. (2008). Automatic recognition of emotions from speech: A review of the literature and recommendations for practical realisation. In C. Peter & R. Beale (Eds.), LNCS Vol. 4868: Affect and Emotion in HCI, p. 75–91.
Weiser, M. (1991). The computer of the 21st century. Scientific American, 265(3), p. 94-104.
Weiser, M., and Brown, J. S. (1998). The coming age of calm technology. In Denning, P. J., and Metcalfe, R. M. (Eds.), Beyond Calculation: The Next Fifty Years of Computing. Copernicus, New York.
Welch, C. (2017). Google Home is playing audio ads for Beauty and the Beast. A ‘timely’ experiment sounded very close to a movie advertisement. Available at: https://www.theverge.com/circuitbreaker/2017/3/16/14948696/google-home-assistant-advertising-beauty-and-the-beast (Accessed: 7 June 2017) [Web]
Zeng, Z., Pantic, M., Roisman, G. I., and Huang, T. S. (2009). A survey of affect recognition methods: Audio, visual, and spontaneous expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(1), p. 39–58.
Zhang, P., Li, N., Scialdone, M. J., and Carey, J. (2009). The intellectual advancement of Human-Computer Interaction research: A critical assessment of the MIS literature. AIS Transactions on Human-Computer Interaction, 1(3), p. 55-107.

LIST OF IMAGES

IMG1. Front page. Helden, Sandy Van. ‘Conversational Assistant 1’ [illustration]. Retrieved from: https://static1.squarespace.com/static/57e28fc12e69cf8358a5bf1e/t/58946140725e2570d8d243f5/1486119236728/Conversational-Assistant-1-thumbnail.png?format=original, accessed July 8th 2017.
IMG2. Page ii. Helden, Sandy Van. ‘Conversational interfaces’ [illustration]. Retrieved from: https://static1.squarespace.com/static/57e28fc12e69cf8358a5bf1e/t/5853be59440243be1963b8da/1481883234344/?format=2500w, accessed July 8th 2017.
IMG3. Page 1. Helden, Sandy Van. [illustration]. Retrieved from: https://static1.squarespace.com/static/57e28fc12e69cf8358a5bf1e/t/5821b99c15d5dbd2a2374dc7/1478605222352/?format=2500w, accessed July 8th 2017.
IMG4. Page 2. Helden, Sandy Van. [illustration]. Retrieved from: https://cdn-images-1.medium.com/max/1200/1*Qo7uVAC5mugDISAdi44wKQ.png, accessed July 8th 2017.
IMG5. Page 4. Helden, Sandy Van. [illustration]. Retrieved from: https://static1.squarespace.com/static/57e28fc12e69cf8358a5bf1e/t/58738d68bebafb70ccf4446e/1479985781407/?format=2500w, accessed July 8th 2017.
IMG6. Page 5. [illustration]. Retrieved from: http://www.newyorker.com/wp-content/uploads/2011/02/110228_cn-cogito_p465.jpg, accessed July 8th 2017.
IMG7. Page 6. Helden, Sandy Van. [illustration]. Retrieved from: https://static1.squarespace.com/static/57e28fc12e69cf8358a5bf1e/t/5853be26bebafbe99c8a5e44/1481883207424/?format=2500w, accessed July 8th 2017.
IMG8. Page 8. Schwartz. [illustration]. Retrieved from: http://www.popsci.com/sites/popsci.com/files/styles/medium_1x_/public/images/2016/08/14045699_10153943539263869_5186499795430517113_n.jpg?itok=DwuXSjdM, accessed July 8th 2017.
IMG9. Page 11. [illustration]. Retrieved from: http://blog.fitzcarraldoeditions.com/wp-content/uploads/2017/01/18ai-cover2-superJumbo-v4.jpg, accessed July 8th 2017.
IMG10. Page 13. Helden, Sandy Van. [illustration]. Retrieved from: https://static1.squarespace.com/static/57e28fc12e69cf8358a5bf1e/t/583d61bf46c3c40654768019/1480417737172/?format=2500w, accessed July 8th 2017.
IMG11. Page 14. ‘Elements’ [illustration]. Retrieved from: https://media.newyorker.com/photos/59516de13826ed7f272f9f97/4:3/w_300,c_limit/2016-Elements.png, accessed July 8th 2017.
IMG12. Page 16. Helden, Sandy Van. [illustration]. Retrieved from: https://static1.squarespace.com/static/57e28fc12e69cf8358a5bf1e/t/5894606315d5db8ef4ade657/1486119036552/conversation%2C-for-better-or-worse.png?format=2500w, accessed July 8th 2017.

Declaration: Herewith I declare that this thesis is the result of my independent work. All sources and auxiliary materials used by me in this thesis are cited completely.

Berlin, . . . . . . . . . . . .

(Nikolaj J. Madsen)


Nikolaj J. Madsen July 2017


