Computational Study of Primitive Emotional Contagion in Dyadic Interactions


IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, VOL. 11, NO. 2, APRIL-JUNE 2020

Giovanna Varni, Isabelle Hupont, Chloe Clavel, and Mohamed Chetouani

G. Varni, I. Hupont, and M. Chetouani are with the Institute for Intelligent Systems and Robotics, Sorbonne University, Paris 75005, France. E-mail: {varni, hupont, mohamed.chetouani}@isir.upmc.fr. C. Clavel is with the Institut Mines-Télécom, Télécom ParisTech, CNRS LTCI, Paris 75013, France. E-mail: chloe.clavel@telecom-paristech.fr.

Manuscript received 6 Mar. 2017; revised 2 Nov. 2017; accepted 21 Nov. 2017. Date of publication 28 Nov. 2017; date of current version 29 May 2020. (Corresponding author: Giovanna Varni.) Recommended for acceptance by A. A. Salah. Digital Object Identifier no. 10.1109/TAFFC.2017.2778154

Abstract—Interpersonal human-human interaction is a dynamical exchange and coordination of social signals, feelings and emotions, usually performed through and across multiple modalities such as facial expressions, gestures, and language. Developing machines able to engage humans in rich and natural interpersonal interactions requires capturing such dynamics. This paper addresses primitive emotional contagion during dyadic interactions in which roles are prefixed. Primitive emotional contagion is defined as the tendency people have to automatically mimic and synchronize their multimodal behavior during interactions and, consequently, to converge emotionally. To capture emotional contagion, a cross-recurrence based methodology that explicitly integrates short- and long-term temporal dynamics through the analysis of both facial expressions and sentiment was developed. This approach is employed to assess emotional contagion at unimodal, multimodal and cross-modal levels and is evaluated on the Solid SAL-SEMAINE corpus. Interestingly, the approach is able to show the importance of adopting cross-modal strategies for addressing emotional contagion.

Index Terms—Primitive emotional contagion, facial expressions analysis, sentiment analysis, cross-recurrence quantification analysis

1 INTRODUCTION

In the last decade, researchers in Human-Machine Interaction (HMI) have worked to endow virtual agents with socio-affective skills, mainly focusing on face-to-face communication [1], [2]. These agents have been employed in a variety of applications such as intelligent tutoring systems [3], serious games [4], and health-care scenarios [5]. Interacting with them, users increase their willingness to disclose and their feeling of rapport [6], or report a better quality of experience [7]. Current research on virtual agents has mainly focused on the study of single communication modalities, such as gaze and smile [8], [9], and on the relation among intrapersonal modalities, for example language and gestures [10]. However, when people interact in dyads or groups, they consciously and unconsciously exchange feelings and emotions simultaneously through and across multiple modalities (see [11] for a survey on multimodal emotional perception). Therefore, models and technologies allowing virtual agents to engage humans in these sophisticated forms of interpersonal interaction are required [12], [13].


The scarce interpersonal dynamics models that can be found in the human-agent interaction literature still rely on the classical information-transmission metaphor of communication in which, turn by turn, user and agent produce (encode) and receive (decode) messages that travel across channels between them [14]. These models therefore fail to fully emulate human communication dynamics and its cross-modal nature. During a real interaction, the partners, like in a dance, continually co-regulate their behaviors [15]. This communication does not necessarily imply the use of the same modalities by each partner but consists in an interpersonal cross-modal exchange, i.e., a dynamic interleaving of several modalities leading to the emergence of communicative behaviors such as engagement and synchrony.

The dynamic interplay of emotions during interactions is referred to in the psychological literature as emotional contagion. Hatfield and colleagues [16] argue that emotional contagion is a multiply determined (i.e., with many possible causes), multilevel (i.e., occurring through different communication modalities) family of phenomena. It "can manifest as similar responses (e.g., as when smiles elicit smiles) or complementary responses (e.g., when the sight of a stroke aimed leads to a drawing back of the site of the blow)". In particular, Hatfield and colleagues introduce an automatic and largely unconscious form of contagion they call primitive emotional contagion [17] and define as "the tendency to automatically mimic and synchronize facial expressions, vocalizations, postures, and movements with those of another person's and, consequently, to converge emotionally". An important step toward the development of advanced models of interpersonal human-agent interaction is the measurement of such emotional contagion, especially taking cross-modality into account. This would make it possible to improve, even in long-term interactions, the naturalness and believability of virtual agents, avoiding for example the perception of uncanniness [18].


This paper applies a computational approach to investigate the dynamics of primitive emotional contagion (hereinafter referred to simply as emotional contagion) in face-to-face human conversational dyadic interactions where the roles of the two partners are established in advance. The focus is on examining and measuring the extent to which the emotions of one partner (the follower) converge to the emotions of the other partner (the leader) across conversational time. The relevance of exploring such a scenario in HMI is twofold. First, interactions involving artificial agents are mainly scripted in terms of roles and behavior. Second, when an agent acts as a tutor, guide or interviewer, it is implicitly assumed that the agent is the interaction leader: it tries to engage the user and lead her to accomplish the task or to trust it. In particular, the affective polarity (positive versus negative) expressed by the partners through facial expressions and semantics was taken into account here. Although these two specific modalities were addressed, the proposed approach is scalable and could be applied to other communication modalities such as, for example, gestures and physiological signals.

More specifically, this paper (i) investigates methodologies for measuring cross-modal emotional contagion; and (ii) examines the short- and long-term temporal dynamics of such emotional contagion. Addressing these issues will help to provide a baseline for future studies aimed at creating computational tools and models to automatically measure affective interpersonal dynamics in interactions involving virtual partners. In particular, a virtual partner could exploit such measures of emotional contagion as an additional input to improve its reasoning model and its multimodal behavior. In this way, it could plan its actions not just in accordance with the emotional state of the user but also taking into account the emotional dynamics of the interpersonal interaction at several temporal granularities.

The paper is organized as follows. Section 2 reviews related work. Section 3 describes how affective polarity time series were built from facial expressions and semantics and introduces the automated measures of emotional contagion. Section 4 details how these measures were applied to a reference use case. Section 5 presents results and discussion. Section 6 concludes the paper.

2 RELATED WORK

2.1 Emotional Contagion in Human-Human Interaction

The social nature and function of emotions has only recently been investigated by psychologists [19]. They started to shift attention from the paradigm of emotion as an intrapersonal system, having as subcomponents appraisals, physiology, experience and expressive behavior, to an interpersonal system in which partners share and adapt these subcomponents over time [20]. This dynamic adaptation has been studied under various names, the best known being emotional contagion [16]. Previous experiments demonstrated that this phenomenon does not necessarily occur among acquaintances [21], [22] and that its temporal scale ranges from a few seconds to several weeks [16], [23]. Emotional contagion has been studied in dyads as well as in larger groups and, only recently, in social networks.

Several previous studies focused on dyadic close relationships such as romantic relationships or parent-infant dyads. For example, Westman found that a partner experiencing stressful situations at work can transmit this stress to the other partner at home [24].


Sels et al. [25] conducted a study involving 50 couples and modeled emotional contagion for each couple. Their findings highlight that not all the couples reported strong evidence of sharing emotions and that the specific patterns signaling this contagion show a large degree of inter-dyad differences. As regards the parent-infant relationship, Waters and colleagues [26] focused on the transmission of high/low-arousal positive and negative stress in mother-child interaction by measuring the co-variation of their physiological responses. Weisman et al. [27] investigated this kind of interaction by taking into account the interplay between non-verbal features and hormonal changes.

At the group level, Barsade [28] studied emotional contagion in a managerial decision-making scenario and its influence on work group dynamics. She investigated the amount of contagion occurring among the members of the groups, the role played by valence and arousal in this contagion, and the influence of positive valence on cooperation and task performance. Through the analysis of the participants' self-reports and the ratings of the recordings provided by external observers, she found the predicted contagion effect among the members and also the expected effect of positive valence in increasing the level of cooperation and the perception of task performance. However, no support was found for the hypotheses about the role of valence and arousal.

Finally, in the last few years, some studies have verified whether emotional contagion could also occur in larger groups exclusively through online text-based interaction. Kramer and colleagues carried out experiments on Facebook by manipulating the affective content of users' News Feeds and found a small but significant correlation between the number of words with positive/negative valence and the valence of the stimulus they posted [29]. Ferrara and Yang explored emotional contagion via Twitter in a similar study [30]. Unlike Kramer and colleagues, they did not manipulate the content of the texts, and they observed that, on average, users reply by expressing the same valence as the stimulus.

All these studies show that emotional contagion is a ubiquitous component of social interaction, which has also been hypothesized to be an innate human mechanism facilitating social connection and coordination. However, the analysis, modeling and evaluation of emotional contagion are difficult because they require combining multiple individual behaviors, which are themselves multimodal and have their own dynamics. Consequently, very few studies have proposed approaches to capture this phenomenon in an automated way, or architectures suitable for HMI.

2.2 Emotional Contagion in Human-Agent Interaction

Virtual agents need to recognize and interpret in real time the verbal and nonverbal cues of their human partner, and to rapidly generate responses consistent with both partners' current goals, beliefs, intentions and expectations [31]. In particular, the multimodal alignment of the virtual agent's behavior to the user has been widely acknowledged as an interpersonal strategy for improving the quality of interaction. The simplest, yet most popular, alignment process that can be found in the literature consists in making the virtual agent mirror its partner's behavior.


Several works demonstrated that a virtual agent is more persuasive and better liked when it mimics a human speaker's behaviors, such as smiles [32], head nods/shakes [33], gaze movements [1], body postures [34], and linguistic style [35]. Higher-level subjective alignment processes, less frequent in the state of the art, take into account the user's socio-emotional behavior and subsequently define socio-emotional strategies linking the user input to the agent output. Endowing virtual agents with such socio-emotional capabilities has been proved to significantly enhance their believability, trustworthiness and lifelikeness during interaction [36]. The user's affect is typically detected by analyzing different channels, such as facial expressions, audio and body gestures. Agents' emotional response models can be based on hand-crafted expert rules [3], on machine-learned rules [37], on Social Sciences literature [38] or on well-known computational models of affect such as OCC (Ortony, Clore and Collins's model of emotions) [39], ALMA (A Layered Model of Affect) [40] or BDTE (Belief-Desire Theory of Emotions) [41]. For example, Andre et al. [42] applied politeness strategies in answer to the user's negative emotional states: the more the interlocutor was in a negative state, the more polite their virtual guide had to be. Also, in the study of D'Mello and Graesser [43], a virtual tutor provided emotional feedback and modulated the complexity of a learning task to regulate the student's disengagement and boredom. In any case, low and high user-agent alignment levels interleave. For instance, copying gestures can be viewed as a way to maintain an empathetic connection [32], build emotional bonding [44] and increase the feeling of rapport with the user [1].

Nevertheless, even though nowadays virtual agents are largely equipped with abilities for using natural language, conducting dialog, and expressing emotions and nonverbal behavior, their alignment strategies still rely on "ping-pong communication", which is insufficient to emulate human-like conversation [31]. Alignment processes, mostly based on if <state> then <action> production rules, focus on the adaptation of the virtual agent to its interlocutor, but do not take into account the reciprocal adaptation of this interlocutor. Behaviors are computed in reaction to the partner's behavior, but not in interaction with it. Current virtual agents are not yet capable of achieving such dynamical interpersonal coupling and coordination. In particular, at the affective level, the need for an "emotional resonance" stated by Gratch et al. [1] is far from being fully accomplished.

2.3 Analysis of Interpersonal Emotion Dynamics

Most previous efforts in affective computing were devoted to the computation of intrapersonal measures of emotion over short interaction segments by analyzing unimodal and multimodal features and their relation (e.g., [45]). The extraction of interpersonal measures of emotion has received little attention in the literature [37]. This is partially due to the time-consuming nature of collecting behavioral data for different aspects of interpersonal connectedness, and also to the difficulty of developing algorithms that can take into account the time course of the interaction among different modalities and between interlocutors. An interesting psychological model that takes into account the interpersonal dynamics of emotions is the Temporal Interpersonal Emotion System (TIES) [20]. The basic assumption of TIES is that human relationships are dynamic self-organizing systems.

This implies that: (i) relationships can be described through the temporal evolution of some variables, (ii) the next value of these variables depends on their past value(s) and on other possible influencing factors, and (iii) the temporal evolution of each sub-component of the system influences the behavior of the system as a whole. For example, in the case of emotional contagion, the descriptive variables can be arousal or other features of emotion computed from different modalities such as facial expressions. During a conversation, the arousal of a person's facial expression partially depends on its previous values and partially on the interaction with other people (e.g., a person can smile in response to a specific behavior of another person). Further, the affective state of one person can affect that of the other until a shared emotional state is reached.

Several previous studies proposed both mathematical and graphical models of TIES. Robin and colleagues [46] conceived a formal mathematical model based on coupled logistic equations to illustrate how coordination develops over time in an interpersonal relationship. They took into account both the overt behavioral component and internal states such as emotions. However, they implemented their model only in computer simulations and claimed that empirical research involving humans should be carried out to fully validate their findings. Regarding graphical models, Granic and Lamey [47] combined state space grid graphical analysis with log-linear multivariate models to study mother-child interactions. More recently, researchers addressed interpersonal emotion dynamics by adopting subspace representation techniques and machine learning approaches. For example, Lee et al. [48] investigated the relationship between vocal synchrony and the affective processes in distressed married couples' interactions. By using a PCA-based measure, they achieved 62 percent accuracy in differentiating positive and negative emotions expressed during interactions. Following this study, Yang et al. [49] investigated the dynamic adaptation of vocal and gesture behavior in affectively rich interactions through functional data analysis (FDA) techniques. From the machine learning perspective, Xiao et al. [50] explored the dynamics of positive and negative emotions through head motion clustering models and Kullback-Leibler divergence-based similarity measures.

Another interesting and powerful approach to investigate the behavior of dynamical systems exploits the Cross-Recurrence Plot (CRP) and its quantification through Cross-Recurrence Quantification Analysis (CRQA) [51]. CRP and CRQA show various advantages over the other methods. First, they do not require any assumptions on the data such as stationarity, which is a strong requirement for correlation-based methods (e.g., cross-correlation, PCA) and information-theoretic methods (e.g., mutual information). Second, they capture both linear and nonlinear contributions to the dynamics; nonlinear contributions are neglected by correlation-based methods. Third, they provide an easy way to quantify and to visualize over a bi-dimensional space (the CRP) how the dynamics of interactions described through multivariate time series unfold over different time scales. Finally, CRQA is a fully unsupervised method, which is relevant to overcome the experimental issues of gathering labeled data from a large population of participants.
CRQA has already been successfully exploited to quantitatively measure how, and to what extent, a dyad exhibits couplings of features from several modalities during conversational tasks (e.g., [52], [53], [54], [55]). However, in all these studies, cross-modality and emotions were neglected.


Fig. 1. Methodology pipeline. Facial expressions and sentiment polarity time series were extracted from video recordings and transcripts, for each person (leader and follower). Each modality was analyzed separately by building CRPs from the facial expressions or sentiment affective polarity time series and applying CRQA on them. Each element of the CRP was colored using the following coding: green, red and blue represent a positive, negative and objective polarity match between the sentiment or facial expressions of the two persons, respectively. Purple and black stripes indicate when one of the two persons was silent. The same kind of analysis was performed using the cross-modal time series.

Varni et al. [56] developed a CRQA software module within a multimodal system aimed at the real-time analysis of nonverbal affective social interaction. They analyzed the head movements of a string duo acting different emotions during the performance of a musical excerpt, and studied the extent to which the duo synchronized depending on the emotion being played. However, they analyzed time series of kinematic features only.

3 MEASURING EMOTIONAL CONTAGION

In this section, the proposed approach is described. Emotion was addressed here by taking into account two modalities: semantics and facial expressions. These two modalities were chosen mainly because of their different dynamics and because they are the modalities most used in face-to-face interactions. It is noteworthy that para-verbal communication (e.g., voice quality and laughter) was not included in the analysis. For this reason, emotional information from constructs such as irony was not taken into account. Fig. 1 depicts a general overview of the approach. Time series of the affective polarity of facial expressions and sentiment were automatically extracted from videos and dialogue transcripts, respectively. Cross-recurrence plots (CRPs) and cross-recurrence quantification analysis (CRQA) were computed from these series. CRQA was carried out both at short term and at long term to explore the temporal dynamics of emotional contagion. Short term was defined in accordance with the dynamics of each modality, whereas long term referred to the whole interaction. The tool used to compute CRPs and CRQA was built on top of Python code available in the SyncPy library [57].

3.1 Building Affective Polarity Time Series

3.1.1 Sentiment Polarity Time Series

Sentiment scores were extracted through SENTIWORDNET 3.0, a lexical resource supporting sentiment classification and opinion mining applications [58].

The time-resolved version of a dialogue D between a leader L and a follower F was analyzed to extract the positive, negative and objective polarity of sentiment. The following steps were performed.

Pre-processing: D was decomposed into a set of utterances U = {U1, U2, ..., Un}. Very long utterances were split into two or more sub-utterances taking into account punctuation and conjunctions. Then, each utterance was processed to expand all the contracted forms and to remove stop words. All the remaining terms were classified into their parts-of-speech (POS) and the resulting tags were remapped into the SENTIWORDNET format. Verbs and nouns were lemmatized and, finally, for each term its synset was extracted according to its POS tag and its sentiment score was computed. The sentiment score was represented as (positive_score, negative_score, objective_score). When a synset lookup returned an empty list, stemming was applied to the term, and the synset and sentiment score were recomputed.

Dealing with modifiers: utterances included modifiers such as negations, intensifiers, and downtoners, which affect the sentiment score of adjectives in different ways. Modifiers from the Google Web 1T 5-grams database [59], which reports a list of the most common modifiers and their frequencies of use, were included in the analysis. When a negation occurred in an utterance, the positive_score of the next adjective was swapped with its negative_score; the objective_score remained unchanged and the scores of the negation were removed. When a modifier (intensifier or downtoner) was encountered and the next term was an adjective, the sentiment score of this adjective was modified in accordance with the polarity of the modifier and its strength, and the modifier scores were then removed from the sentiment time series. The strength of modifiers was expressed as a percentage indicating how much the positive/negative score of the next adjective had to be changed. The percentages were drawn from English grammar and the work of Taboada and colleagues [60].
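The pre-processing and modifier-handling steps can be sketched as follows. This is a minimal illustration assuming NLTK's tokenizer, POS tagger and SentiWordNet interface (with the corresponding NLTK data installed); the negation set and modifier strengths below are small illustrative placeholders, not the Google Web 1T 5-grams lists used in the paper, and the first synset is taken as a crude word-sense choice.

# Hedged sketch of the per-term SentiWordNet scoring step (Section 3.1.1).
# The modifier lists and strengths are illustrative placeholders.
from nltk import pos_tag, word_tokenize
from nltk.corpus import sentiwordnet as swn, stopwords, wordnet
from nltk.stem import WordNetLemmatizer

NEGATIONS = {"not", "no", "never"}              # placeholder subset
INTENSIFIERS = {"very": 0.25, "really": 0.15}   # placeholder strengths (fractions)
STOP = set(stopwords.words("english")) - NEGATIONS - set(INTENSIFIERS)
LEMMATIZER = WordNetLemmatizer()

def to_wn_pos(tag):
    """Remap Penn Treebank tags to WordNet/SentiWordNet POS tags."""
    return {"J": wordnet.ADJ, "N": wordnet.NOUN,
            "V": wordnet.VERB, "R": wordnet.ADV}.get(tag[0])

def term_scores(utterance):
    """Return (term, pos_score, neg_score, obj_score) for each scored term."""
    tokens = [t.lower() for t in word_tokenize(utterance) if t.isalpha()]
    tagged = [(t, to_wn_pos(p)) for t, p in pos_tag(tokens) if t not in STOP]
    scores, negate, boost = [], False, 0.0
    for term, wn_pos in tagged:
        if term in NEGATIONS:
            negate = True
            continue
        if term in INTENSIFIERS:
            boost = INTENSIFIERS[term]
            continue
        if wn_pos is None:
            continue
        lemma = LEMMATIZER.lemmatize(term, wn_pos) if wn_pos in ("n", "v") else term
        synsets = list(swn.senti_synsets(lemma, wn_pos))
        if not synsets:
            continue  # the paper falls back to stemming here
        s = synsets[0]
        pos, neg, obj = s.pos_score(), s.neg_score(), s.obj_score()
        if wn_pos == wordnet.ADJ:
            if negate:                           # negation: swap positive and negative scores
                pos, neg = neg, pos
            pos = min(1.0, pos * (1 + boost))    # simplified intensifier handling
            neg = min(1.0, neg * (1 + boost))
            negate, boost = False, 0.0
        scores.append((term, pos, neg, obj))
    return scores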


Fig. 2. Semantic polarity extraction for utterance U: on the right, the original utterance, the pre-processed utterance, and the polarity of each term. Finally, the utterance polarity was computed starting from the histograms of the labeled positive, negative and objective terms (bottom left). On the left, each term is sketched on the SENTIWORDNET plane.

Utterance's sentiment polarity: Fig. 2 shows how the sentiment polarity of each utterance U was computed. First, the sentiment polarity scores of each term in U were compared with each other: when the positive (negative) score was greater than or equal to the negative (positive) and objective scores, the term was counted as positive (negative); otherwise it was counted as objective. Then, from the number of positive, negative and objective terms occurring in U, its sentiment label was computed using Algorithm 1 below.

Algorithm 1. Utterance's Sentiment Polarity Computation. The labels associated to the polarity values were chosen in accordance with the colorimetry proposed by SENTIWORDNET.

Data: utterance U ∈ D
Result: Up, sentiment polarity of U
for each U ∈ D do
  if (#pos terms = 0) and (#neg terms = 0) then
    Up ← obj (label 'B')
  else if (#pos terms != 0) and (#neg terms = 0) then
    Up ← pos (label 'G')
  else if (#pos terms = 0) and (#neg terms != 0) then
    Up ← neg (label 'R')
  else if (#pos terms != 0) and (#neg terms != 0) then
    (Pc, Nc, Oc) ← Centroid(scores of pos/neg terms)
    if Pc > Nc then
      Up ← pos (label 'G')
    else if Nc > Pc then
      Up ← neg (label 'R')
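A compact Python rendering of Algorithm 1 could look as follows; it is assumed to receive the (positive, negative, objective) score triples of the utterance's terms, e.g., the scores from the previous sketch with the term names dropped. Ties that the pseudocode leaves open (a positive score equal to the negative score, or an equal centroid) are resolved arbitrarily here.

def utterance_polarity(scores):
    """Sentiment label of an utterance from its terms' (pos, neg, obj) scores."""
    pos_terms = [t for t in scores if t[0] >= t[1] and t[0] >= t[2]]
    neg_terms = [t for t in scores if t[1] > t[0] and t[1] >= t[2]]
    if not pos_terms and not neg_terms:
        return "B"                       # objective (blue)
    if pos_terms and not neg_terms:
        return "G"                       # positive (green)
    if neg_terms and not pos_terms:
        return "R"                       # negative (red)
    mixed = pos_terms + neg_terms        # mixed case: compare the centroid coordinates
    pc = sum(t[0] for t in mixed) / len(mixed)
    nc = sum(t[1] for t in mixed) / len(mixed)
    if pc > nc:
        return "G"
    if nc > pc:
        return "R"
    return "B"                           # equal centroid: left objective by convention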

Finally, to build the sentiment polarity time series of D, the sentiment polarities of the utterances were aggregated together.

The resulting time series was re-sampled in order to have the same temporal resolution as the facial polarity time series, that is, 200 ms (see Section 3.1.2). The sentiment label of the whole utterance was replicated at each of the resulting samples. Fig. 3 shows a 14 s segment of a scarf plot of the re-sampled sentiment polarity time series corresponding to the following exemplary utterances from a dialogue involving two persons (L and F):

L: Hi, I am Mr. X (1)
F: Hi, my name is Mr. Y (2)
L: Have we met before, Mr. Y? (3)
F: Mmmmh, maybe. How are you? (4)
L: Oh well, I am...I am happy (5)
F: Great! You look really a very happy person (6)
L: I am a happy person and I like to make other people happy (7)

Segments of dialogue in which a person does not speak were synthesized in the time series by using a prefixed unique label for each person and are represented as dashed boxes in the plot.

The algorithm was evaluated on 808 utterances from the transcripts of the Solid SAL-SEMAINE corpus [61]. This corpus consists of recordings of dyadic dialogues between a user and an operator playing the role of a virtual agent who follows a simple conversational scenario. The virtual agents are characterized by different personalities showing positive or negative affective polarity and are designed to evoke emotional responses. Therefore, it is reasonable to expect that an agent with a positive/negative personality should preferentially adopt positive/negative words when interacting. Table 1 summarizes the number of positive, neutral and negative utterances detected by the algorithm. A χ2 test for independence was run to check whether there was a relationship between the polarity expected from the agent and the polarity detected by the algorithm (neutral utterances were not taken into account).

Fig. 3. Scarf plot of the sentiment polarity time series associated to the short dialogue involving two persons (L and F ). The numbers in each box correspond to the utterance number. Dashed boxes and lines codify silence.

TABLE 1
Evaluation: Contingency Table of the Utterances' Polarity Detected by the Sentiment Analysis Algorithm (No. of Utterances)

Agent's polarity   Detected positive   Detected neutral   Detected negative   Total
Positive                  82                 192                 34            308
Negative                  97                 246                157            500
Total                    179                 438                191            808


Fig. 4. Scarf plot showing the facial expressions polarity time series associated to the short dialogue.

The relationship between these variables was significant: χ2(1, N = 370) = 32.4, p < .001.
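As a sanity check of the reported statistic, the test can be reproduced from Table 1 with SciPy (an illustration, not the authors' code); neutral utterances are excluded as stated above, and chi2_contingency applies the Yates continuity correction to 2x2 tables by default.

from scipy.stats import chi2_contingency

# Rows: agent's targeted polarity (positive, negative);
# columns: utterances detected as positive and negative in Table 1.
table = [[82, 34],
         [97, 157]]
chi2, p, dof, expected = chi2_contingency(table)
print(round(chi2, 1), dof, p)   # approximately 32.4, 1, p < .001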

3.1.2 Facial Expressions Polarity Time Series

Each frame of both partners' facial video sequences has to be assigned an affective polarity (negative, neutral or positive). The facial expression polarity detection pipeline followed in this work involves the steps explained below (see top left of Fig. 1). First, the face was segmented from the whole input image by means of the Viola and Jones Haar cascade algorithm [62]. After that, 12 facial landmarks (4 in the eyebrows, 4 in the eyes, 1 in the nose and 3 in the mouth) were extracted with the Supervised Descent Method provided by the Intraface library [63]. On the basis of the facial landmark positions and following the methodology presented in [64], 3 specific rectangular regions of interest (ROIs) were defined: a mouth ROI, a frown ROI and an eyes ROI. To cancel face rotations, each ROI was aligned with respect to different surrounding facial landmarks. Then, Histogram of Oriented Gradients (HOG) descriptors were computed from them. The 3 HOG descriptors extracted from the ROIs were then concatenated into a single feature vector, and the classification of each frame's polarity was performed by a pre-trained Support Vector Machine (SVM) model. This model, built using the LIBSVM library [65], makes use of a linear kernel and tackles multi-class classification into the "negative", "neutral" and "positive" categories through a one-versus-all strategy.

The MUG database was used for training and testing the model [66]. MUG contains onset-apex-offset facial videos at 896x896 pixel resolution from 52 different persons showing 6 basic facial emotions ("happiness", "anger", "disgust", "fear", "sadness" and "surprise"). The "negative" class training set was built by selecting apex and mid-apex frames from the negative emotion sequences (i.e., "anger", "disgust", "fear", "sadness"). The same approach was followed to build the "positive" category set, this time using the videos labeled as "happiness". Finally, the "neutral" class images were selected from the first (onset) and last (offset) frames of the video sequences. A final training set containing 2000 samples per polarity class was obtained. Regarding testing and benchmarking, the C value of the SVM soft-margin cost function and the HOG parameters were optimized using a grid-search procedure. The accuracy of the resulting facial expression polarity detector, in accordance with a 10-fold cross-validation strategy without subject overlap between training and testing samples, was 82.01 percent for the "negative" class, 84.78 percent for the "neutral" class and 96.05 percent for the "positive" class.


Fig. 5. Scarf plot showing cross-modal polarity time series.

This led to an overall accuracy of 87.61 percent and a macro-F1 score of F1M = 0.79. For each video sequence, facial expression polarity was extracted frame by frame using the SVM model and the corresponding time series was built. Then, the time series was smoothed to remove the effect of noisy frames by applying a majority-voting strategy over a centered moving window of 10 frames (i.e., 200 ms). Fig. 4 depicts a 14 s segment of the facial expressions polarity time series from the previous exemplary dialogue. As in Fig. 3, colored boxes are used to emphasize the polarity labels. In contrast to the sentiment time series, where segments codifying silence are present, the facial expressions polarity time series are continuous, that is, a polarity value is provided at each time instant.
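The smoothing step can be sketched as follows, assuming per-frame polarity labels at roughly 50 fps so that the 10-frame centered window spans about 200 ms; ties within a window are resolved by the first label encountered.

from collections import Counter

def smooth_labels(labels, window=10):
    """Majority-vote smoothing of frame-level polarity labels over a centered window."""
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        chunk = labels[max(0, i - half):i + half]   # roughly centered 10-frame window
        smoothed.append(Counter(chunk).most_common(1)[0][0])
    return smoothed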

3.1.3 Cross-Modal Polarity Time Series

Cross-modal polarity time series were built from the sentiment polarity time series and the facial expressions polarity time series. Fig. 5 depicts a cross-modal time series. More specifically, each cross-modal time series was obtained by filling the silent instants of the sentiment polarity time series (dashed gray boxes in Fig. 3) with the coincident time instants of the facial expressions polarity time series. This approach made it possible to have a continuous polarity signal even when people were silent and to avoid spurious effects on facial expressions of the mouth movements related to vocalization. In other words, the facial expressions channel is assigned a zero confidence when a given participant speaks. It is important to highlight that the methodology applied here to build cross-modal time series could easily scale to new affective inputs coming from other modalities (e.g., body gestures or EEG). A confidence could be assigned to each new channel over time and cross-modal time series could be built by averaging the polarity information available at each time instant. Once the cross-modal time series is built, the CRQA analysis can be performed exactly as explained in the following section; a sketch of the fusion rule is given below.
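A minimal sketch of this fusion rule, under the assumption that both series are already aligned on the same 200 ms grid and that silence in the sentiment series is marked with a dedicated placeholder label:

SILENCE = "S"   # placeholder label marking silent samples in the sentiment series

def cross_modal_series(sentiment, facial, silence=SILENCE):
    """Use sentiment polarity while the person speaks, facial polarity otherwise."""
    assert len(sentiment) == len(facial)
    return [f if s == silence else s for s, f in zip(sentiment, facial)]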


3.2 CRQA for Dynamical Analysis

CRQA was adopted to investigate emotional contagion in each dyad. Concretely, the times at which a polarity value of the leader recurs with (i.e., is close to) a polarity value of the follower were identified. The focus is on the co-visitation patterns of the time series in the polarity space. Cross-recurrence was first introduced visually through a black and white plot, the Cross-Recurrence Plot (CRP): a square/rectangular area spanned by two time series describing two dynamical systems. Black points correspond to the times at which the two systems co-visit the same area in the feature space, whereas white points correspond to the times at which each system runs in a different area. A CRP is expressed by the following cross-recurrence matrix (CR):

CR_{i,j}(f1, f2) = \Theta(\varepsilon - \| f1_i - f2_j \|), \quad i = 1, \dots, N, \; j = 1, \dots, M,   (1)

where f1 and f2 ∈ IR^d are the d-dimensional time series of the two systems, having N and M samples, respectively; ε is the threshold used to claim closeness between two points, Θ(·) is the Heaviside function and ‖·‖ is a norm. In this study, f1 and f2 ∈ IR^3 are the 3-dimensional polarity time series of the two partners during a dialogue of N samples. The threshold ε was set to zero so that only exact matches of the same polarity were considered as recurrent. Polarity can assume three different values: positive, negative and objective. In order to distinguish polarity matches among them, the following color coding was used in the CRPs: a cross-recurrence point was green-, red- or blue-tinted when a match in positive, negative or objective polarity occurred, respectively.

A CRP can be analyzed both at the graphical level and at the numerical level (through CRQA). Both of these approaches are informative about the common dynamics of the underlying systems. At the former level, a qualitative analysis can be carried out starting from the patterns a CRP shows. Typical patterns are single isolated points, periodic diagonal lines, and vertical/horizontal lines; these patterns are hints of randomness, periodicity and laminar states, respectively. At the latter level, CRQA offers several measures to quantify these patterns (see [51] for a complete survey). CRQA also holds for categorical data (see [67] and [68]). The following measures were extracted.

"Long-term" measures

Cross-Recurrence Rate (cRR). It is defined as

cRR(\varepsilon) = \frac{1}{N^2} \sum_{i,j=1}^{N} CR_{i,j}(\varepsilon),   (2)

and measures the density of recurrence points in a CRP, that is, it corresponds to the ratio between the number of matrix elements shared by the two partners and the number of available elements (i.e., all the elements in the matrix). cRR represents the overall extent to which the partners were using the same polarity. In order to take into account only the influence that the leader exerted on the follower, cRR was computed only on the lower triangular part of the anti-diagonal matrix of the CRP.

Average diagonal line length (L) and entropy (E). L represents the average length of a recurrent trajectory in a CRP. It is defined as

L = \frac{\sum_{l=l_m}^{N} l \, P(l)}{\sum_{l=l_m}^{N} P(l)},   (3)

where l_m is the minimal diagonal length to be taken into account and P(l) is the histogram of the diagonal line lengths. This measure expresses how stable a recurrent trajectory is. More specifically, in the analysis of polarity, a high L corresponds to long temporal segments of shared polarity between the two partners. A measure strictly connected to L is the entropy of the lengths of the diagonal lines. It is defined as

E = - \sum_{l=l_m}^{N} p(l) \ln p(l), \qquad p(l) = \frac{P(l)}{N_l},   (4)

and measures the complexity of the diagonal lines. In polarity analysis, a high E indicates that there is not a preferred diagonal length, that is, the partners are using a plurality of patterns of polarity sharing. A low E stands for a predominant length, that is, the partners tend to repeat the same patterns of polarity sharing. L and E were computed only on the lower triangular part of the anti-diagonal matrix of the CRP.

"Short-term" measures

τ-cross-Recurrence (cRR_τ). The oriented diagonal lines in a CRP represent slices of time during which the two systems run parallel, although with a certain relative delay. The diagonal-wise cross-Recurrence Rate (cRR_τ) measures the probability of occurrence of similar values in both systems at a given relative delay τ. With reference to cRR, it expresses the density of recurrent points over time distances from the main diagonal line of the CRP. cRR_τ is computed as

cRR_\tau = \frac{1}{N-\tau} \sum_{j-i=\tau} CR_{i,j} = \frac{1}{N-\tau} \sum_{l} l \, P_\tau(l),   (5)

where P_τ(l) is the number of diagonal lines of length l on the diagonal parallel to the main diagonal at lag τ. Diagonal lines above the main diagonal are identified by a positive τ, diagonal lines below the main diagonal by a negative τ; the main diagonal is the diagonal line of reference (τ = 0). cRR_τ quantifies the extent to which the partners were engaged in a shared polarity during the dialogue, that is, it measures the amount of cross-recurrence occurring in temporal proximity. Since the two analyzed modalities (facial expressions and semantics) have different dynamics, two different temporal proximity criteria were adopted. As for the other measures, cRR_τ was computed only on the lower triangular part of the anti-diagonal matrix of the CRP.

τ-average diagonal line length and τ-entropy (L_τ, E_τ). L_τ and E_τ are the average length of the diagonal lines on the τ-lagged diagonal parallel to the main diagonal of a CRP, and the entropy of the histogram of the diagonal line lengths on the same τ-lagged diagonal, respectively.
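For categorical polarity series and an exact-match threshold, the cross-recurrence matrix of Eq. (1) and the long-term measures of Eqs. (2)-(4) can be sketched as follows; this is an illustration with hypothetical function names, and the restriction to the lower triangular part of the anti-diagonal matrix used in the paper is omitted for brevity.

import numpy as np

def cross_recurrence_matrix(leader, follower):
    """CR[i, j] = 1 when leader[i] and follower[j] share the same polarity label."""
    x = np.asarray(leader)
    y = np.asarray(follower)
    return (x[:, None] == y[None, :]).astype(int)

def diagonal_line_histogram(cr, l_min=2):
    """Histogram P(l) of diagonal line lengths, counting only lines with l >= l_min."""
    n, m = cr.shape
    hist = {}
    for k in range(-(n - 1), m):
        run = 0
        for v in np.append(np.diagonal(cr, offset=k), 0):   # trailing 0 closes the last run
            if v:
                run += 1
            else:
                if run >= l_min:
                    hist[run] = hist.get(run, 0) + 1
                run = 0
    return hist

def crqa_measures(cr, l_min=2):
    """Cross-recurrence rate, average diagonal length L and entropy E."""
    crr = cr.mean()                                          # Eq. (2)
    hist = diagonal_line_histogram(cr, l_min)
    total = sum(hist.values())
    if total == 0:
        return crr, float("nan"), float("nan")
    L = sum(l * c for l, c in hist.items()) / total          # Eq. (3)
    probs = np.array([c / total for c in hist.values()])
    E = float(-(probs * np.log(probs)).sum())                # Eq. (4)
    return crr, L, E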

4 USE CASE: THE SAL-SEMAINE CORPUS

4.1 The Solid SAL Audiovisual Corpus

The corpus from the SEMAINE (Sustained Emotionally coloured MAchine-human Interaction using Nonverbal Expression) project [61] was adopted as a test-bed. It is grounded in the Sensitive Artificial Listener (SAL) induction technique [69] and includes emotionally-coloured audio-video recordings of dialogues between an operator and a user. Following the SAL paradigm, the operator can be a person simulating a machine or a real machine, whereas the user is always a human partner. Here, the focus was only on the Solid SAL scenario, where the operator is a human acting one of 4 roles differing in personality. The roles are: Poppy (cheerful and outgoing), Prudence (pragmatic), Spike (aggressive), and Obadiah (pessimistic).


Fig. 6. The CRPs obtained from session 107 (Obadiah) of the SAL-SEMAINE corpus. In accordance with the personality of the character, the plots show a large number of red points. (a) The CRP from the facial expressions polarity time series. (b) The CRP from the sentiment polarity time series. (c) The CRP from cross-modal affective polarity time series.

The dialogues were non-scripted in terms of sentences to allow for the most spontaneous interaction possible. However, each operator was trained on the affective content and had to remain as constant as possible during a session. The operator is therefore considered the leader and the user the follower of the conversation. The user and operator were situated in separate rooms, equipped with video screens, and recorded using wearable microphones and frontal cameras (780 x 580 pixels, 49.979 fps). Twenty-one SAL sessions (that is, about 84 dialogues) out of 24 were also fully transcribed into text files. These transcripts were additionally time-aligned with the turn-taking changes. This study included 21 pairs of audio-video recordings (from SAL sessions 15, 21, 25, 26, 52, 54, 60, 65, 67, 71, 73, 78, 79, 82, 83, 84, 94, 95, 96, 107, and 127) of Poppy (7 clips, M=333.280 s, SD=67.860 s), Spike (7 clips, M=315.140 s, SD=72.630 s) and Obadiah (7 clips, M=302.860 s, SD=122.600 s). Recordings of Prudence were not included due to the neutral affective characterization of the character in terms of polarity of emotion. The other recordings of the corpus were either not suitable for facial expressions polarity extraction (e.g., due to frequent out-of-plane head rotations) or had problems in their transcripts.

4.2 The Solid SAL Annotations

The Solid SAL corpus was continuously annotated by raters (6 to 8 raters for each dialogue) along several dimensions (e.g., valence, arousal, and so on) via the FeelTrace tool [70]. All the recordings of the users were annotated, whereas only a few annotations are available for the operators. To verify that the partners were really expressing emotions with a specific valence, the inter-rater reliability of the valence expressed by users and operators in each dialogue was assessed through a two-way mixed, consistency, average-measures ICC. All the obtained ICCs were in the excellent range (ICC > 0.75). This indicates that a minimal amount of measurement error was introduced by the independent raters. Then, starting from the average of the scores provided by the raters, the percentage of time for which the user/operator showed the expected valence according to the character's personality was computed. This analysis showed that the operators exhibited the expected valence at each instant of the interaction.

Moreover, during the interaction with Obadiah, 4 users (from sessions 15, 54, 107, 127) exhibited negative valence for more than half of the interaction; during the interaction with Spike, 5 users (from sessions 52, 73, 78, 84, 95) exhibited negative valence for more than half of the interaction; during the interaction with Poppy, all 7 users exhibited positive valence for more than half of the interaction.
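As an illustration of this reliability check (not the authors' code), a two-way mixed, consistency, average-measures ICC can be obtained, for example, with the pingouin package from a long-format table of ratings; the column names below are placeholders.

import pingouin as pg

def average_measures_icc(df):
    """Two-way mixed, consistency, average-measures ICC (reported by pingouin as ICC3k)."""
    icc = pg.intraclass_corr(data=df, targets="time_sample",
                             raters="rater", ratings="valence")
    return icc.set_index("Type").loc["ICC3k", "ICC"]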

4.3 Facial Polarity Cross-Recurrence Analysis

4.3.1 Long-Term Analysis

A CRP was created for each of the 21 dialogues from the facial expressions polarity time series of the operator and the user. As previously described, the following color coding was used in the CRPs: green points correspond to the sharing of positive polarity, red ones to the sharing of negative polarity, and blue ones to the sharing of neutrality. When there was no match between the facial polarity of the operator and the user, white points were used. Two additional colors were adopted, purple and black, indicating when the operator or the user was talking, respectively. This was done to guarantee that only the emotional contribution from non-speaking faces was taken into account in the analysis. Indeed, as facial appearance changes resulting from mouth movements could strongly impact the performance of facial expressions polarity detection, this modality was not considered reliable for speaking faces. Fig. 6a shows an example of a CRP from facial expressions polarity time series. Each point in the plot corresponds to 200 ms. In accordance with the personality of the character, there are very few green points and a prevalence of red points. cRR, L, and E were computed for each dialogue.

4.3.2 Short-Term Analysis

The diagonal-wise cRR_τ profile was computed with a maximum delay τ_max = 3 s. This implies that only the user's facial expressions occurring at most 3 s after an operator's facial expression were evaluated. This value was chosen because facial expressions typically last between 0.5 and 4 s on the face [71]. The threshold on diagonal length was set to l_m = 2 (400 ms). In this way, single isolated recurrence points, which are a hint of, for example, stochastic or uncorrelated behavior, were filtered out. Fig. 7 shows the area of the plot that was taken into account to compute cRR_τ (a), and the cRR_τ profile in terms of negative, positive and neutral polarity (b).


Fig. 7. Short-term facial polarity CRQA. (a) CRP obtained by using the facial expressions affective polarity time series. The area contributing to the computation of cRR_τ is highlighted. This area is bounded on its upper side by the anti-diagonal (yellow dotted line) and on its lower side by a diagonal line intercepting the time axis of the user at τ_max = 3 s. (b) Profile of cRR_τ in terms of negative (red), positive (green), and neutral (blue) polarity. The profile of the negative polarity reaches higher values in accordance with the larger number of red points composing diagonal patterns.

All the diagonal lines in the area bounded above by the anti-diagonal (dotted yellow line) and below by the diagonal intercepting the t_user axis at τ_max (solid yellow line) contribute to the profile.
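The diagonal-wise profile can be sketched as follows, reusing the cross_recurrence_matrix helper from the earlier sketch (rows indexing the leader, columns the follower); with a 200 ms step, τ_max = 3 s corresponds to 15 lags and l_m = 2 to 400 ms.

import numpy as np

def line_weighted_count(diag, l_min=2):
    """Sum of l * P_tau(l) over runs of 1s of length >= l_min in one diagonal."""
    total, run = 0, 0
    for v in np.append(diag, 0):        # trailing 0 closes the last run
        if v:
            run += 1
        else:
            if run >= l_min:
                total += run
            run = 0
    return total

def crr_tau_profile(cr, tau_max=15, l_min=2):
    """cRR_tau (Eq. 5) for lags 1..tau_max at which the follower trails the leader."""
    profile = {}
    for tau in range(1, tau_max + 1):
        diag = np.diagonal(cr, offset=tau)   # follower sample tau steps after the leader
        profile[tau] = line_weighted_count(diag, l_min) / max(diag.size, 1)
    return profile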

4.4 Sentiment Polarity Cross-Recurrence Analysis

4.4.1 Long-Term Analysis

A CRP was also created for each of the 21 dialogues from the sentiment polarity time series of the operator and the user. The green, red, blue and white color codes have the same meaning as in the CRPs of the facial polarity cross-recurrence analysis. Purple and black indicate here when the operator and the user were silent, respectively. This was done to guarantee that only the contribution from semantics was taken into account in the analysis. Fig. 6b shows a CRP built from the sentiment polarity time series. As in the corresponding plot from facial expressions polarity, there is a prevalence of red points. Then, cRR, L, and E were computed for each dialogue.

4.4.2 Short-Term Analysis

As for facial expressions polarity, the diagonal-wise cRR_τ profile was also computed. However, some arrangements were made to take into account that, to transmit her emotional polarity, the operator has to make a greater and longer effort through semantics than through facial expressions, and that the time scale at which semantics can influence the user's affective state is longer (e.g., [72], [73]). An adaptive τ_max changing in accordance with the conversational turns was used. A conversational turn (CT) consisted of a speaking turn of the operator followed by a speaking turn of the user. Concretely, at each conversational turn n with temporal length t_n, cRR_τ was computed on the submatrix with size S_n equal to

S_n = S_{n-1} + t_n \left(1 + \tfrac{2}{3}\right).   (6)

The choice of increasing the length of the CT by 2/3 was made in order to use this length as the τ_max value for this turn (denoted by τ_n). This allowed the theoretical limit of recurrence, which imposes the use of only 2/3 of the data to avoid obtaining degraded CRQA measures [51], to be overcome. The corresponding τ_n values were thus computed as

\tau_n = \sum_{i=1}^{n} t_i.   (7)

This procedure was repeated iteratively over all the CTs in the dialogue and the cross-recurrence contributions were cumulated at each iteration. Fig. 8 shows a graphical example of this procedure (the figure refers to the cross-modal affective polarity time series, whose analysis was performed using the same procedure, and is used here only for the sake of graphical clarity).
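A minimal illustration of Eqs. (6)-(7) as reconstructed above, with the conversational-turn lengths expressed in samples; turn_lengths is a placeholder input.

def adaptive_windows(turn_lengths):
    """Submatrix sizes S_n and turn-dependent lags tau_n from turn lengths t_n."""
    sizes, taus, s, tau = [], [], 0, 0
    for t in turn_lengths:
        s += t * (1 + 2 / 3)   # Eq. (6): each turn enlarged by 2/3 of its length
        tau += t               # Eq. (7): cumulative conversational-turn length
        sizes.append(s)
        taus.append(tau)
    return sizes, taus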

4.5 Cross-Modal Cross-Recurrence Analysis

4.5.1 Long-Term Analysis

A CRP (see Fig. 6c) was created for each of the 21 dialogues from the cross-modal polarity time series built as explained in Section 3.1.3. The color coding was exactly the same as in the previous CRPs. The only difference was that, due to cross-modality, purple and black dots disappear from the plot. The resulting CRP clearly shows, already from a graphical point of view, the power of a cross-modal approach to capture insights into emotional contagion. In this case too, cRR, L, and E were computed for each dialogue.

4.5.2 Short-Term Analysis

Semantics and facial expressions have different dynamics. While the τ_max for the short-term analysis of facial expressions was set to 3 s (cf. Section 4.3.2), an adaptive τ_max strategy depending on the conversational turn duration was adopted for semantics (cf. Section 4.4.2). To include the cross-modal contributions of both modalities at the short term, the diagonal-wise cRR_τ profile was computed as for the most restrictive one, i.e., semantics. Fig. 8a shows this approach; Fig. 8b shows the cRR_τ profile.

5 RESULTS AND DISCUSSION

5.1 Long-Term Analysis

Quantification of emotional contagion. A first inspection of the cRR values showed that the maximum cRR frequently occurs for the objective polarity.


Fig. 8. Short-term cross-modal polarity CRQA of session 107 (Obadiah). On the left (a), the CRP over which cRR_τ was computed using an adaptive τ_max changing in accordance with the length of the conversational turns. On the right (b), the corresponding cRR_τ profile. τ_n denotes the τ_max used at each conversational turn n.

As concerns semantics, this can be explained by the fact that all the dialogues of SAL-SEMAINE were scripted in emotional content only on the operator's side. The operators used short utterances including only positive (negative)/objective terms in accordance with the character's personality. On the contrary, the users spoke freely and were encouraged to talk extensively about different topics. This implied a larger probability of using objective terms, resulting in a larger amount of possible objective matches than of positive or negative matches. The second cRR maximum corresponds to the Targeted Polarity (TP), i.e., the one corresponding to the character's personality. The only exceptions are sessions 26 (Poppy), 25 and 95 (Spike), where the first or second maximum occurs for the Non-Targeted Polarity (NTP). For similar reasons, for facial expressions too, the maximum cRRs frequently occur for the objective polarity. Interestingly, in the case of Poppy the maximum always appears either for the objective or the positive polarity, with values much higher than the ones for the negative polarity. However, this is not always the case for Obadiah and Spike, where sometimes the maximum cRR or its second maximum occurs for the positive polarity (sessions 65, 67 and 84) and the differences in cRRs are not so large. This result is in line with previous psychological studies that found negative facial expressions to be less mimicked than Duchenne smiling in a natural dyadic interactive setting [74].

CRQA of each separate modality was performed only over the trustworthy parts of the dialogue, i.e., non-speaking segments for facial expressions and speaking segments for semantics (see the purple and black stripes in the CRPs, Fig. 7). Thus, there was no room to directly compare the results of each separate modality versus cross-modality, as cross-modality takes into account more conversational information per se. However, it is of great interest to analyze to what extent the cross-modal approach goes beyond the mere sum of the modalities' contributions. To that end, a pair of paired sign tests using a Bonferroni-adjusted α = .025 per test was run to verify whether there was a significant difference between the median of the cRR values of the sum of the two modalities and the cRR values obtained by using a cross-modal approach.

More specifically, the tests compared the cRR obtained for the TP of the character and the cRR obtained for the NTP of the character, respectively. In both cases, cross-modality elicited a statistically significant (S=21 (TP), S=20 (NTP), p < .001) median increase in the cRR values. For the TP, the median increase was equal to 10.9 percent, whereas for the NTP it was equal to 0.67 percent. Boxplots illustrating the differences are shown in Fig. 9a. This witnesses how the adoption of a computational approach able to catch cross-modality boosts the detection of the emotional links established between two partners.

Patterns of emotional contagion. cRR quantifies the extent to which the operator and the user shared the same polarity. However, this measure does not provide any information about how emotional contagion is structured in patterns over time. In order to tackle this issue, the analysis of the shared polarity patterns (L) in the dialogue and of their complexity (E) was performed. The statistical significance of L and E was tested through rank order statistics by computing the L and E generated by 100 shuffled surrogates for each dialogue. These surrogates were generated by shuffling the temporal order of the user's samples; this procedure basically consists in permuting the user's samples without replacement. The test was done to check whether independent and identically distributed (i.i.d.) noise gave the same values of the measures obtained by using the original time series from the operator and the user; the null hypothesis to which these surrogates correspond is that they are indistinguishable from i.i.d. noise. (Note that shuffling the data does not change the cRR of a CRP but only the shape of the patterns; for this reason, for the long-term analysis, the test was carried out only on L and E.) The L and E from the original data were statistically different (p < .05) from those of the surrogates for each dialogue. Then, the average over modalities (i.e., semantics and facial expressions) of the L and E values obtained for the TP and NTP were tested against the L and E values obtained for the TP and NTP with the cross-modal approach. A Bonferroni-adjusted α = .025 per test was used.
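The surrogate test can be sketched as follows, reusing the cross_recurrence_matrix and crqa_measures helpers from the earlier sketch; the user's series is permuted without replacement 100 times and the original L and E are compared against the resulting null distributions by their rank.

import numpy as np

def surrogate_test(leader, follower, n_surrogates=100, seed=0):
    """Rank-order significance of L and E against shuffled-follower surrogates."""
    rng = np.random.default_rng(seed)
    _, L_obs, E_obs = crqa_measures(cross_recurrence_matrix(leader, follower))
    null_L, null_E = [], []
    for _ in range(n_surrogates):
        shuffled = rng.permutation(follower)       # permutation without replacement
        _, L_s, E_s = crqa_measures(cross_recurrence_matrix(leader, shuffled))
        null_L.append(L_s)
        null_E.append(E_s)
    # fraction of surrogates at least as extreme as the observed value
    p_L = (np.sum(np.array(null_L) >= L_obs) + 1) / (n_surrogates + 1)
    p_E = (np.sum(np.array(null_E) >= E_obs) + 1) / (n_surrogates + 1)
    return p_L, p_E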


Fig. 9. (a) Boxplot of the cRR values obtained summing up the modalities (sum) and using a cross-modal approach (cross). (b) and (c) Boxplots of the L and E values obtained averaging the modalities (avg) and using a cross-modal approach. Light gray and black bold lines indicate the NTP (Non Targeted Polarity) and TP (Targeted Polarity) boxes, respectively. Outliers are marked as full black dots. Statistical significance of the differences is reported via * (*** p < .001). Red dots in (c) stand for the mean.

A pair of paired sign tests was run on the average diagonal line lengths L;5 the median increases for the TP and the NTP were equal to 1.2 s (S=21, p < .001) and 0.7 s (S=19, p < .001), respectively. A pair of paired t-tests run over the entropy E revealed a significant difference in the use of the recurrent patterns:6 the average increases for the TP (t = 6.1361, df = 16, p < .001) and the NTP (t = 8.6785, df = 16, p < .001) were equal to 1 and 0.7, respectively. Although the values of E were very small (the range of E is [0, +∞)), the analysis shows that emotional contagion is characterized by a predominant polarity pattern, i.e., there is a preferred diagonal line length. Panels (b) and (c) of Fig. 9 show the boxplots of the L and E values. 5. When L was equal to NaN, a conservative approach was used for the analysis by setting NaN to 0. 6. The sessions where E was NaN were removed from the analysis.

These tests revealed that the cross-modal approach guaranteed the detection of more stable and more complex temporal segments over which the user and the operator were sharing the same polarity.
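As a point of reference for the measures compared above, the sketch below computes cRR, the average diagonal line length L, and the diagonal line-length entropy E from a binary cross-recurrence matrix, following standard CRQA definitions (e.g., [51]). The minimum line length of 2 is an assumption, not a parameter reported in the paper.

```python
# Sketch of the long-term CRQA measures: cRR (recurrence rate), L (average
# diagonal line length) and E (Shannon entropy of the diagonal line lengths).
import numpy as np
from collections import Counter

def diagonal_line_lengths(crp):
    """Collect the lengths of all diagonal runs of recurrent points."""
    n_rows, n_cols = crp.shape
    lengths = []
    for k in range(-(n_rows - 1), n_cols):
        run = 0
        for v in np.diagonal(crp, offset=k):
            if v:
                run += 1
            elif run:
                lengths.append(run)
                run = 0
        if run:
            lengths.append(run)
    return lengths

def crqa_measures(crp, l_min=2):
    crp = np.asarray(crp, dtype=bool)
    crr = crp.mean() * 100.0                         # recurrence rate, in percent
    lines = [l for l in diagonal_line_lengths(crp) if l >= l_min]
    if not lines:
        return crr, float("nan"), float("nan")       # mirrors the NaN cases noted above
    avg_l = float(np.mean(lines))                    # L: average diagonal line length
    counts = np.array(list(Counter(lines).values()), dtype=float)
    p = counts / counts.sum()
    entropy = float(-(p * np.log(p)).sum())          # E: entropy of the line-length distribution
    return crr, avg_l, entropy
```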

Effect of the expressed polarity. The effect of the polarity (positive versus negative) expressed by the operator on the amount of emotional contagion was investigated for the cross-modal approach. For this analysis, the CRQA metrics from the sessions of Obadiah and Spike were aggregated together. An unpaired Wilcoxon test revealed no significant difference between the two polarities' cRRs (V=23.5, p > .05). A Wilcoxon rank sum test and a Welch t-test showed that the polarity expressed by the operator did not significantly affect (p > .05) either the average length L or the entropy E. This indicates that the extent to which the partners were using a polarity, as well as the stability and the complexity of the patterns of shared polarity, did not change in accordance with the polarity of the expressed emotion. Concerning cRR, this result quantitatively supports the findings by [28], but it is in contrast with the psychology literature. Several studies showed that people provide a stronger emotional response when they experience stimuli with negative polarity (e.g., [75]). Moreover, people tend to pay more attention to negative cues and to attribute negative polarity during social comparisons [76]. There are several possible explanations of this result. First, Solid SAL does not include characters covering each quadrant of the affective circumplex model: a character having positive polarity and low arousal is missing. This implies that the positive polarity is covered only by the sessions involving Poppy, resulting in an unbalanced dataset. Furthermore, as already observed in [28], some users may have found the behavior of Spike too exaggerated and hostile at certain times. This could imply two possible reactions in the users: a lack of attention, or an opposite-polarity escalation appearing as smiles or short joyful laughter episodes (this was confirmed by visual inspection of the corpus).
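A hedged sketch of the polarity-effect comparison just described is given below, using SciPy's Mann-Whitney U test (equivalent to the Wilcoxon rank-sum test) and Welch's t-test on a CRQA metric grouped by the character's polarity; the grouping of sessions and the variable names are illustrative assumptions, not details from the paper.

```python
# Compare a CRQA metric (cRR, L, or E) between sessions with a positive
# operator polarity and sessions with a negative one (Obadiah and Spike pooled).
from scipy.stats import mannwhitneyu, ttest_ind

def polarity_effect(metric_pos_sessions, metric_neg_sessions, alpha=0.05):
    u_stat, p_ranksum = mannwhitneyu(metric_pos_sessions, metric_neg_sessions,
                                     alternative="two-sided")   # Wilcoxon rank-sum
    t_stat, p_welch = ttest_ind(metric_pos_sessions, metric_neg_sessions,
                                equal_var=False)                # Welch's t-test
    return {"ranksum_p": p_ranksum, "welch_p": p_welch,
            "any_significant": min(p_ranksum, p_welch) < alpha}
```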

5.2 Short-Term Analysis
Quantification of emotional contagion. A hypothesis test based on surrogates was adopted to validate the cRRt values resulting from the analysis. For each dialogue and each polarity, 100 shuffled surrogates of the user's time series were built. Then, a set of cRRt values was computed from these surrogates and the original time series of the operator. For each dialogue and for each modality, these new 100 cRRt series were aggregated in the matrices Mp (positive), Mn (negative), and Mo (objective). Each matrix had a size equal to 100 × the dialogue length. To control the Type I error rate at α = .05 over the multiple hypothesis tests, a False Discovery Rate (FDR) approach was used [77]. In this way, the statistical significance of each value of cRRt was checked. Then, a pair of paired sign tests using a Bonferroni-adjusted α = .025 per test was run on the maxima of cRRt for the TP and the NTP, comparing the sum of the two modalities with cross-modality. These tests involved only the maxima occurring at the statistically significant times detected by FDR, that is, all the maxima for semantics and cross-modality and only 13 maxima for facial expressions. This resulted in taking into account only the contribution of semantics when it was not possible to sum up the values from the two modalities. The median increase in the TP was equal to 15.4 percent (S=12, p < .001). However, there was no significant difference between the medians for the NTP. Panel (a) of Fig. 10 depicts the boxplots of the differences.
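The surrogate-plus-FDR step can be sketched as follows, assuming that, for each time point, the observed cRRt is compared against the 100 surrogate cRRt values. The empirical p-value definition and the Benjamini-Hochberg procedure are standard choices consistent with [77], but the exact implementation details are not given in the paper.

```python
# For each time point t, rank the observed cRR_t against the surrogate
# distribution, then threshold the resulting p-values with Benjamini-Hochberg.
import numpy as np

def empirical_pvalues(observed, surrogate_matrix):
    """observed: (T,) cRRt series; surrogate_matrix: (n_surrogates, T)."""
    observed = np.asarray(observed)
    surrogate_matrix = np.asarray(surrogate_matrix)
    n_surr = surrogate_matrix.shape[0]
    exceed = (surrogate_matrix >= observed[None, :]).sum(axis=0)
    return (exceed + 1) / (n_surr + 1)          # conservative empirical p-values

def benjamini_hochberg(pvals, q=0.05):
    """Boolean mask of time points declared significant at FDR level q."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m
    passed = p[order] <= thresholds
    mask = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()         # largest rank meeting the BH criterion
        mask[order[:k + 1]] = True
    return mask
```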


At short-term, as at long-term, the use of a cross-modal approach captured a larger amount of emotional contagion. Patterns of emotional contagion. To provide more details about how the dynamics of emotional contagion unfolds, the analysis of the t-average diagonal line length (Lt) and the t-entropy (Et) for the TP and the NTP was performed.


Fig. 10. (a) Boxplot of the cRRt values obtained summing up the modalities and using a cross-modal approach. (b) and (c) Boxplots of Lt and Et values obtained averaging the modalities and using a cross-modal approach. Not significant (n.s.) indicates p > .05. Red dots in (c) stand for the mean.

A surrogates-based test was carried out for the Lt and Et values of each session through a rank-order statistic, as already done for the long-term analysis. This test showed that only a few values (3, all from facial expressions) were not significant (p > .05); the subsequent analysis took this into account. The values obtained from the cross-modal approach were tested against the values obtained by averaging the contributions of the single modalities. For all the tests, a Bonferroni-adjusted α = .025 per test was used. A pair of paired sign tests was run on the Lt values. Statistically significant median increases (1.3 s and 0.6 s in the TP (S=21, p < .001) and in the NTP (S=20, p < .001), respectively) were found when the cross-modal approach was adopted. Concerning Et, two paired t-tests showed significant differences both for the TP (t = 8.0926, df = 11) and the NTP (t = 4.0924, df = 11). More specifically, by using a cross-modal approach, the mean Et increases were 1.4 (TP) and 0.7 (NTP). Panels (b) and (c) of Fig. 10 show the boxplots of the Lt and Et values. Therefore, also at short-term, the cross-modal approach guaranteed the detection of more stable and more complex temporal segments of shared emotional polarity. Effect of the expressed polarity. Concerning the effect of the polarity expressed by the operator, a Wilcoxon rank sum test did not detect any significant difference in the cRRt values (p > .05). The effect of the polarity on Lt and Et was also investigated. Two Wilcoxon rank sum tests did not reveal significant differences due to the polarity either (p > .05).7 This indicates that the CRQA metrics did not change according to the polarity of the expressed emotion at the short term either. The possible explanations of this result are the ones outlined in Section 5.1. 7. Note that the reduced number of available Et values for the test could have affected the results.
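The rank-order surrogate test applied to Lt and Et can be sketched as below, where the user's samples are permuted without replacement and the observed value is compared with the upper quantile of the surrogate distribution. The quantile-based decision rule, the fixed random seed, and all names are our own assumptions.

```python
# Surrogate-based rank-order test: shuffle the user's time series, recompute
# the CRQA measure against the operator's series, and compare the observed
# value with the surrogate distribution.
import numpy as np

rng = np.random.default_rng(0)

def rank_order_test(observed_value, user_series, operator_series,
                    measure, n_surrogates=100, alpha=0.05):
    """measure(operator_series, shuffled_user_series) -> scalar CRQA measure.

    Returns True if the observed value exceeds the (1 - alpha) quantile of
    the surrogate distribution, i.e., it is unlikely under i.i.d. shuffling.
    """
    surrogate_values = []
    for _ in range(n_surrogates):
        shuffled = rng.permutation(user_series)      # destroys temporal order only
        surrogate_values.append(measure(operator_series, shuffled))
    threshold = np.quantile(surrogate_values, 1.0 - alpha)
    return observed_value > threshold
```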

6 CONCLUSION AND FUTURE WORK

This paper is a starting point for the automated analysis of emotional contagion in dyadic scenarios involving humans and/or virtual partners. It provides a reliable methodology, scalable in terms of the number and type of modalities, to investigate the interplay of the partners' emotions both over the whole interaction and at shorter time scales.

This methodology was successfully applied to the well-known Solid SAL-SEMAINE corpus, making use of the semantics and facial expressions modalities. The paper addressed interpersonal cross-modal emotional exchange by interleaving the affective polarity time series obtained from the modalities of each partner, looking at their use during the interaction and analyzing them with techniques from complex systems analysis (CRQA). The results showed that the adoption of a cross-modal computational approach improved the detection of emotional contagion, in both its amount and its structure, at long-term (cRR, L, and E) as well as at short-term (cRRt, Lt, and Et). Other strategies could be envisaged in the future to implement a cross-modal analysis of modalities, e.g., by weighting each modality according to the trustworthiness of its channel or by applying a majority voting algorithm. The methodology and the obtained results have an impact on HMI. For example, in a serious game designed for learning the multimodal expression of emotions and eliciting emotional responses in autistic children, a virtual partner able to adapt its multimodal behavior in accordance with the measures we proposed could be included. The short-term measures (cRRt and Lt) could be used to obtain a first hint about whether the child is able to exhibit a coherent response and, in case of an incoherent response, to devise strategies for quickly engaging the child in the interaction (e.g., smiling). In parallel, the virtual partner could store knowledge about the long-term dynamics. This knowledge could be useful both for the virtual partner, to devise more complex strategies, and for therapists, for a posteriori analysis of the emotional behavior of the child. Such analysis can benefit from measures taken from the single modalities or by interleaving them. Obviously, several time scales could be adopted in parallel in accordance with the scenario and the modalities involved. Moreover, our methodology could also support advances in the social/human sciences, allowing researchers to explore in greater depth the dynamics and the structure of emotional contagion.

ACKNOWLEDGMENTS
This research has been supported by the Laboratory of Excellence SMART (ANR-11-LABX-65), supported by French State funds managed by the ANR within the Investissements d'Avenir programme (ANR-11-IDEX-0004-02). G. Varni thanks Dr. G. Arnulfo for the discussions on FDR. Giovanna Varni and Isabelle Hupont contributed equally to this work.


REFERENCES
[1] J. Gratch, S.-H. Kang, and N. Wang, "Using social agents to explore theories of rapport and emotional resonance," Social Emotions Nature Artifact, 2013, Art. no. 181.
[2] B. Van Straalen, D. Heylen, M. Theune, and A. Nijholt, "Enhancing embodied conversational agents with social and emotional capabilities," in Agents for Games and Simulations, Berlin, Germany: Springer, pp. 95–106, 2009.
[3] E. Cerezo, F. J. Seron, I. Hupont, and S. Baldassarri, Affective Embodied Conversational Agents for Natural Interaction. London, U.K.: INTECH Open Access Publisher, 2008.
[4] K. Anderson, et al., "The tardis framework: Intelligent virtual agents for social coaching in job interviews," in Proc. Advances in Comput. Entertainment, 2013, pp. 476–491.
[5] G. Stratou, et al., "A demonstration of the perception system in simsensei, a virtual human application for healthcare interviews," in Proc. Int. Conf. Affect. Comput. Intell. Interaction, 2015, pp. 787–789.
[6] G. M. Lucas, J. Gratch, A. King, and L.-P. Morency, "It's only a computer: Virtual humans increase willingness to disclose," Comput. Human Behavior, vol. 37, pp. 94–100, 2014.
[7] V. Demeure, R. Niewiadomski, and C. Pelachaud, "How is believability of a virtual agent related to warmth, competence, personification, and embodiment?" Presence: Teleoperators Virtual Environments, vol. 20, no. 5, pp. 431–448, 2011.
[8] K. Ruhland, et al., "A review of eye gaze in virtual agents, social robotics and HCI: Behaviour generation, user interaction and perception," Comput. Graph. Forum, vol. 34, no. 6, pp. 299–326, 2015.
[9] M. Ochs, R. Niewiadomski, P. Brunet, and C. Pelachaud, "Smiling virtual agent in social context," Cognitive Process., vol. 13, no. 2, pp. 519–532, 2012.
[10] S. Marsella, Y. Xu, M. Lhommet, A. Feng, S. Scherer, and A. Shapiro, "Virtual character performance from speech," in Proc. ACM SIGGRAPH/Eurographics Symp. Comput. Animation, 2013, pp. 25–35.
[11] M. Klasen, Y.-H. Chen, and K. Mathiak, "Multisensory emotions: Perception, combination and underlying neural processes," Rev. Neurosciences, vol. 23, no. 4, pp. 381–392, 2012.
[12] L.-P. Morency, "Computational study of human communication dynamic," in Proc. ACM Workshop Human Gesture Behavior Understanding, 2011, pp. 13–18.
[13] E. Delaherche, M. Chetouani, A. Mahdhaoui, C. Saint-Georges, S. Viaux, and D. Cohen, "Interpersonal synchrony: A survey of evaluation methods across disciplines," IEEE Trans. Affect. Comput., vol. 3, no. 3, pp. 349–365, Jul.-Sep. 2012.
[14] M. Argyle, Bodily Communication. Abingdon, U.K.: Routledge, 1988.
[15] A. Fogel, Developing Through Relationships: Origins of Communication, Self and Culture. London, U.K.: Harvester Wheatsheaf, 1993.
[16] E. Hatfield, J. Cacioppo, and R. Rapson, Emotional Contagion. Cambridge, U.K.: Cambridge Univ. Press, 1994.
[17] E. Hatfield, J. Cacioppo, and R. Rapson, "Primitive emotional contagion," in Emotion and Social Behavior, Thousand Oaks, CA, USA: Sage Publications, pp. 151–177, 1992.
[18] A. Tinwell, The Uncanny Valley in Games and Animation. Boca Raton, FL, USA: CRC Press, 2014.
[19] A. H. Fischer, et al., "Social functions of emotion," Handbook of Emotions, vol. 3, pp. 456–468, 2008.
[20] E. A. Butler, "Temporal interpersonal emotion systems: The ties that form relationships," Personality Social Psychology Rev., vol. 15, no. 4, pp. 367–393, 2011.
[21] S. D. Pugh, "Service with a smile: Emotional contagion in the service encounter," Academy Manag. J., vol. 44, no. 5, pp. 1018–1027, 2001.
[22] T. W. Buchanan, S. L. Bagley, R. B. Stansfield, and S. D. Preston, "The empathic, physiological resonance of stress," Social Neuroscience, vol. 7, no. 2, pp. 191–201, 2012.
[23] M. J. Howes, J. E. Hokanson, and D. A. Loewenstein, "Induction of depressive affect after prolonged exposure to a mildly depressed individual," J. Personality Social Psychology, vol. 49, no. 4, 1985, Art. no. 1110.
[24] M. Westman, "Stress and strain crossover," Human Relations, vol. 54, no. 6, pp. 717–751, 2001.
[25] L. Sels, E. Ceulemans, K. Bulteel, and P. Kuppens, "Emotional interdependence and well-being in close relationships," Frontiers Psychology, vol. 7, 2016, Art. no. 283.
[26] S. F. Waters, T. V. West, and W. B. Mendes, "Stress contagion: Physiological covariation between mothers and infants," Psychological Sci., vol. 25, no. 4, pp. 934–942, 2014.

[27] O. Weisman, et al., “Dynamics of non-verbal vocalizations and hormones during father-infant interaction,” IEEE Trans. Affect. Comput., vol. 7, no. 4, pp. 337–345, Oct.-Dec. 2016. [28] S. G. Barsade, “The ripple effect: Emotional contagion and its influence on group behavior,” Administ. Sci. Quart., vol. 47, no. 4, pp. 644–675, 2002. [29] A. D. Kramer, J. E. Guillory, and J. T. Hancock, “Experimental evidence of massive-scale emotional contagion through social networks,” Proc. National Academy Sci., vol. 111, no. 24, pp. 8788– 8790, 2014. [30] E. Ferrara and Z. Yang, “Measuring emotional contagion in social media,” PloS One, vol. 10, no. 11, 2015. [31] S. Kopp, “Social resonance and embodied coordination in face-toface conversation with artificial interlocutors,” Speech Commun., vol. 52, no. 6, pp. 587–597, 2010. [32] K. Prepin, M. Ochs, and C. Pelachaud, “Beyond backchannels: Coconstruction of dyadic stance by reciprocal reinforcement of smiles between virtual agents,” in Proc. Int. Annu. Conf. Cogn. Sci. Soc., 2013, pp. 1163–1168. [33] J. N. Bailenson and N. Yee, “Digital chameleons automatic assimilation of nonverbal gestures in immersive virtual environments,” Psychological Sci., vol. 16, no. 10, pp. 814–819, 2005. [34] G. Castellano, M. Mancini, C. Peters, and P. W. McOwan, “Expressive copying behavior for social agents: A perceptual analysis,” IEEE Trans. Syst. Man Cybern., Part A: Syst. Humans, vol. 42, no. 3, pp. 776–783, May 2012. [35] H. P. Branigan, M. J. Pickering, J. Pearson, and J. F. McLean, “Linguistic alignment between people and computers,” J. Pragmatics, vol. 42, no. 9, pp. 2355–2368, 2010. [36] H. Boukricha, C. Becker, and I. Wachsmuth, “Simulating empathy for the virtual human max,” in Proc. 2nd Int. Workshop Emotion Comput., 2007, pp. 22–27. [37] R. Zhao, T. Sinha, A. Black, and J. Cassell, “Socially-aware virtual agents: Automatically assessing dyadic rapport from temporal patterns of behavior,” in Proc. 16th Int. Conf. Intell. Virtual Agents, 2016, pp. 218–233. [38] J. Gratch, et al., “Virtual rapport,” in Proc. Int. Workshop Intell. Virtual Agents, 2006, pp. 14–27. [39] J. Dias, S. Mascarenhas, and A. Paiva, “Fatima modular: Towards an agent architecture with a generic appraisal framework,” in Emotion Modeling, Berlin, Germany: Springer, 2014, pp. 44–56. [40] P. Gebhard, “Alma: A layered model of affect,” in Int. Joint Conf. Auton. Agents Multiagent Syst., 2005, pp. 29–36. [41] F. Kaptein, J. Broekens, K. Hindrinks, and M. Neerincx, “CAAF: A cognitive affective agent programming framework,” in Proc. 16th Int. Conf. Intell. Virtual Agents, 2016, pp. 317–330. [42] E. Andre, M. Rehm, W. Minker, and D. B€ uhler, “Endowing spoken language dialogue systems with emotional intelligence,” in Affective Dialogue Systems, Berlin, Germany: Springer, 2004, pp. 178–187. [43] S. D’Mello and A. Graesser, “AutoTutor and affective autotutor: Learning by talking with cognitively and emotionally intelligent computers that talk back,” ACM Trans. Interact. Intell. Syst., vol. 2, no. 4, pp. 1–39, 2013. [44] N. Jaques, D. McDuff, Y. L. Kim, and R. Picard, “Understanding and predicting bonding in conversations using thin slices of facial expressions and body language,” in Proc. 16th Int. Conf. Intell. Virtual Agents, 2016, pp. 64–74. [45] A. Zadeh, R. Zellers, E. Pincus, and L.-P. Morency, “Multimodal sentiment intensity analysis in videos: Facial gestures and verbal messages,” IEEE Intell. Syst., vol. 31, no. 6, pp. 82–88, Nov.-Dec. 2016. [46] R. R. Vallacher, A. Nowak, and M. 
Zochowski, “Dynamics of social coordination: The synchronization of internal states in close relationships,” Interaction Stud., vol. 6, no. 1, pp. 35–52, 2005. [47] I. Granic and A. V. Lamey, “Combining dynamic systems and multivariate analyses to compare the mother–child interactions of externalizing subtypes,” J. Abnormal Child Psychology, vol. 30, no. 3, pp. 265–283, 2002. [48] C.-C. Lee, et al., “Computing vocal entrainment: A signal-derived PCA-based quantification scheme with application to affect analysis in married couple interactions,” Comput. Speech Language, vol. 28, no. 2, pp. 518–539, 2014. [49] Z. Yang and S. S. Narayanan, “Analyzing temporal dynamics of dyadic synchrony in affective interactions,” in Proc. INTERSPEECH, 2016, pp. 42–46. [50] B. Xiao, P. Georgiou, B. Baucom, and S. Narayanan, “Modeling head motion entrainment for prediction of couples’ behavioral characteristics,” in Proc. IEEE Int. Conf. Affect. Comput. Intell. Interaction, 2015, pp. 91–97.


[51] N. Marwan, M. C. Romano, M. Thiel, and J. Kurths, “Recurrence plots for the analysis of complex systems,” Physics Reports, vol. 438, no. 56, pp. 237–329, 2007. [52] D. C. Richardson, R. Dale, and N. Kirkham, “The art of conversation is coordination: Common ground and the coupling of eye movements during dialogue,” Psychological Sci., vol. 18, pp. 407– 413, 2007. [53] M. M. Louwerse, R. Dale, E. G. Bard, and P. Jeuniaux, “Behavior matching in multimodal communication is synchronized,” Cognitive Sci., vol. 36, pp. 1404–1426, 2012. [54] E. Delaherche, G. Dumas, J. Nadel, and M. Chetouani, “Automatic measure of imitation during social interaction: A behavioral and hyperscanning-eeg benchmark,” Pattern Recog. Lett., vol. 66, pp. 118–126, 2015. [55] R. Fusaroli and K. Tyl en, “Investigating conversational dynamics: Interactive alignment, interpersonal synergy, and collective task performance,” Cogn. Sci., vol. 40, no. 1, pp. 145–171, 2016. [56] G. Varni, G. Volpe, and A. Camurri, “A system for real-time multimodal analysis of nonverbal affective social interaction in usercentric media,” IEEE Trans. Multimedia, vol. 12, no. 6, pp. 576–590, Oct. 2010. [57] G. Varni, M. Avril, A. Usta, and M. Chetouani, “SyncPy: A unified open-source analytic library for synchrony,” in Proc. 1st Workshop Model. INTERPERsonal SynchrONy Influence, 2015, pp. 41–47. [58] S. Baccianella, A. Esuli, and F. Sebastiani, “Sentiwordnet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining,” in Proc. 7th Conf. Int. Language Resources Eval., 2010, pp. 2200–2204. [59] T. Brants and A. Franz, “Web 1t 5-gram version 1,” 2006, https:// catalog.ldc.upenn.edu/ldc2006t13 [60] M. Taboada, J. Brooke, M. Tofiloski, K. Voll, and M. Stede, “Lexicon-based methods for sentiment analysis,” Comput. Linguistic, vol. 37, no. 2, pp. 267–307, 2011. [61] G. McKeown, M. Valstar, R. Cowie, M. Pantic, and M. Schr€ oder, “The semaine database: Annotated multimodal records of emotionally colored conversations between a person and a limited agent,” IEEE Trans. Affect. Comput., vol. 3, no. 1, pp. 5–17, Jan.-Mar. 2012. [62] P. Viola and M. J. Jones, “Robust real-time face detection,” Int. J. Comput. Vision, vol. 57, no. 2, pp. 137–154, 2004. [63] X. Xiong and F. Torre, “Supervised descent method and its applications to face alignment,” in Proc. IEEE Conf. Comput. Vision Pattern Recog., 2013, pp. 532–539. [64] I. Hupont and M. Chetouani, “Region-based facial representation for real-time action units intensity detection across datasets,” Pattern Anal. Appl., pp. 1–13, 2017. [65] C.-C. Chang and C.-J. Lin, “LIBSVM: A library for support vector machines,” ACM Trans. Intell. Syst. Technol., vol. 2, pp. 1–27, 2011. [66] N. Aifanti, C. Papachristou, and A. Delopoulos, “The MUG facial expression database,” in Proc. 11th IEEE Int. Workshop Image Anal. Multimedia Interactive Serv., 2010, pp. 1–4. [67] F. Orsucci, K. Walter, A. Giuliani, C. L. Webber, and J. Zbilut, “Orthographic structuring of human speech and texts: Linguistic application of recurrence quantification analysis,” Int. J. Chaos Theory Appl., vol. 4, pp. 80–88, 1978. [68] M. Coco and R. Dale, “Cross-recurrence quantification analysis of categorical and continuous time series: An R package,” Quantitative Psychology Meas., vol. 5, 2013, Art. no. 510. [69] E. Douglas-Cowie, R. Cowie, C. Cox, N. Amir, and D. Heylen, “The sensitive artificial listener: An induction technique for generating emotionally coloured conversation,” in Proc. Workshop Corpora Res. Emotion Affect, 2008, pp. 
1–4. [70] R. Cowie, E. Douglas-Cowie, S. Savvidou*, E. McMahon, M. Sawey, and M. Schr€ oder, “’FEELTRACE’: An instrument for recording perceived emotion in real time,” in Proc. ISCA Tutorial Research Workshop Speech Emotion, 2000, pp. 19–24. [71] D. Matsumoto and H. S. Hwang, “Evidence for training the ability to read microexpressions of emotion,” Motivation Emotion, vol. 35, no. 2, pp. 181–191, 2011. [72] T. Kaukomaa, A. Per€ akyl€a, and J. Ruusuvuori, “How listeners use facial expression to shift the emotional stance of the speakers utterance,” Res. Language Social Interaction, vol. 48, no. 3, pp. 319– 341, 2015. [73] I. Hupont, S. Ballano, E. Cerezo, and S. Baldassarri, “From a discrete perspective of emotions to continuous, dynamic, and multimodal affect sensing,” Emotion Recognit., John Wiley and Sons, Inc., pp. 461–491, 2015, doi: 10.1002/9781118910566.ch18. [74] U. Hess and P. Bourgeois, “You smile-I smile: Emotion expression in social interaction,” Biological Psychology, vol. 84, no. 3, pp. 514– 520, 2010.


[75] R. F. Baumeister, E. Bratslavsky, C. Finkenauer, and K. D. Vohs, "Bad is stronger than good," Rev. General Psychology, vol. 5, no. 4, 2001, Art. no. 323.
[76] D. E. Kanouse and L. R. Hanson Jr., "Negativity in evaluations," in E. E. Jones, D. E. Kanouse, H. H. Kelley, R. E. Nisbett, S. Valins, and B. Weiner (Eds.), Attribution: Perceiving the Causes of Behavior, Hillsdale, NJ: Lawrence Erlbaum Associates, pp. 47–62, 1987.
[77] C. R. Genovese, N. A. Lazar, and T. Nichols, "Thresholding of statistical maps in functional neuroimaging using the false discovery rate," Neuroimage, vol. 15, no. 4, pp. 870–878, 2002.

Giovanna Varni received the MSc degree in biomedical engineering and the PhD degree in electronic, computer, and telecommunications engineering from the University of Genoa, Italy, in 2005 and 2009, respectively. From 2009 to 2014, she worked as a postdoctoral researcher at InfoMus Lab-DIBRIS, University of Genoa. She then moved to Paris, where she was a postdoctoral researcher at the Institute for Intelligent Systems and Robotics (ISIR), University Pierre and Marie Curie-Paris 6, France. Since September 2017, she has been Maître de Conférences at LTCI, Télécom ParisTech, Paris-Saclay University, Paris, France. Her research interests include the area of socio-affective human-machine interaction. She was involved in several FP7 EU-ICT STREP and FET projects.

Isabelle Hupont received the MSc and PhD degrees in computer science from the University of Zaragoza, Spain, in 2008 and 2010, respectively. From 2006 to 2015 she was a research manager at the Aragon Institute of Technology, participating in several R&D+i European projects. She is currently a post-doctoral researcher at ISIR (Sorbonne University, Paris, France). Her research focuses on multimodal affective computing, social signal processing, artificial intelligence, and computer vision.

Chloé Clavel is an associate professor in affective computing in the Greta team of the multimedia group of the Signal and Image Processing Department of Télécom ParisTech. Her research focuses on two issues: acoustic analysis of emotional speech and opinion mining through natural language processing. After her PhD degree, she worked in the laboratories of two big French companies, Thales Research and Technology and EDF R&D, where she developed her research around audio and text mining applications. At Télécom ParisTech, she is currently working on interactions between humans and virtual agents, from the analysis of users' socio-emotional behavior to socio-affective interaction strategies.

Mohamed Chetouani received the MS degree in robotics and intelligent systems from the University Pierre and Marie Curie (UPMC), Paris, in 2001, and the PhD degree in speech signal processing from the same university in 2004. He is currently a full professor in signal processing and pattern recognition with UPMC. His research activities, carried out in the Institute for Intelligent Systems and Robotics, cover the areas of non-linear signal processing, feature extraction, pattern classification and fusion for human-centered interaction analysis: verbal and non-verbal communication, and physiological signals. He is an associate editor of several journals and has served as a chairman of several international workshops related to non-linear speech processing, human-robot interaction, and human-centered multimodal signal processing.


