59 minute read

Effect of Foreign Accent on Immediate Serial Recall Kit Ying Chan, Ming Ming Chiu, Brady A. Dailey, and Daroon M. Jalil

Effect of Foreign Accent on Immediate Serial Recall

Kit Ying Chan 1

Advertisement

, Ming Ming Chiu 2

, Brady A. Dailey 3

, and Daroon M. Jalil 4

1

Department of Social and Behavioural Sciences, City University of Hong Kong, Hong Kong 2

Department of Special Education and Counselling, The Education University of Hong Kong, Hong Kong 3

Department of Linguistics, Boston University, Boston, MA, USA 4

Department of Psychology, Old Dominion University, Norfolk, VA, USA

Abstract: This study disentangled factors contributing to impaired memory for foreign-accented words –misperception and disruption of encoding. When native English and Cantonese-accented words were presented auditorily for serial recall (Experiment 1), intrusion errors for accented words were higher across all serial positions (SPs). Participants made more intrusion errors during auditory presentation than visual and auditory presentation, and more errors for accented words than native words. Lengthening the interstimulus intervals in Experiment 2 reduced intrusion, repetition, order, and omission errors in the middle and late SPs during accented word recall, suggesting that extra time is required for identification and encoding of accented words into memory. Analyses of the intrusions showed that a majority of them were misperceptions and sounded similar to the stimulus words. These findings suggest that effortful perceptual processing of accented speech can induce perceptual difficulty and interfere with downstream memory processes by exhausting the shared pool of working memory.

Keywords: serial recall, foreign accent, speech perception, short-term memory, listening effort

Foreign accent refers to the extent to which the pronunciation of second language (L2) learners deviates from native speaker norms (Munro & Derwing, 1995a). The acousticphonological deviations include different subsegmental (Caramazza, Yeni-Komshian, Zurif, & Carbone, 1973), segmental (Flege & Hillenbrand, 1984; Munro & Derwing, 1995a), suprasegmental (Reed, 2000; Riazantseva, 2001), and temporal characteristics (Munro & Derwing, 1998; Temple, 2000). These deviations induce a mismatch between the speech inputs and the native listener’s representations, resulting in increased misperceptions, processing time, and vulnerability to noise compared to native speech (Van Wijngaarden, 2001). Most research has focused on the accent-induced perception costs and the perceptual learning of accented speech (Clarke & Garrett, 2004; Reinisch & Holt, 2014; Witteman, Weber, & McQueen, 2013). Few studies have examined the influence of foreign accent on memory. Gill (1994) studied how regional and foreign accents affected comprehension, and subsequent recall. Lectures by native North American English speakers were rated as most comprehensible, followed by those of British English speakers and then by

those of Malaysian English speakers. Native listeners recalled significantly more information from native North American English speakers than from British or Malaysian English speakers. Gill (1994) suggested that comprehending messages from unfamiliar regional or foreign-accented speakers requires more cognitive resources, resulting in fewer available resources to encode information for subsequent recall.

Furthermore, Pickel and Staller (2012) showed that a perpetrator’s accent influenced witnesses’ memories of the perpetrator’s message and physical appearance. Witnesses listening to a message spoken by a native-speaking perpetrator rather than by a foreign-accented perpetrator performed significantly better on a secondary visual search task, suggesting that processing foreign-accented speech is more effortful. Witnesses subsequently recalled more correct details and fewer incorrect details from messages by native-speaking perpetrators. Pickel and Staller (2012) also proposed that processing accented speech demands more cognitive resources, leaving fewer cognitive resources for remembering the speech. However, these studies did not examine the intelligibility of the foreign-accented stimuli. For less intelligible foreign-accented words, initial lexical access failure rather than subsequent memory processes might account for the poor recall. A study by Cho and Feldman (2013) accounted for word intelligibility, and their participants were given 2 s

to process indexical information of the native and accented stimuli. Then participants either repeated the word or did nothing, followed by a visual presentation of the stimulus for 1.5 s. The free recall of foreign-accented words was superior to that of native words, and memory for less intelligible words was better. Cho and Feldman (2013) suggested that the visual feedback after the initial auditory presentation might make the less intelligible words more salient and memorable. More elaborated traces were formed for accented words due to the more variable indexical information. However, this study could not disentangle the impact of accent-induced recognition difficulty and indexical processing on memory.

Past studies showed that speech recognition interfaces with memory functions (Mattys, Davis, Bradlow, & Scott, 2012), especially when the speech is sufficiently degraded that raises working memory (WM) demands (Francis & Nusbaum, 2009; Surprenant, 1999, 2007). Rabbitt (1968) auditorily presented listeners with two lists of four digits either with or without noise at a masking level that permitted correct identification. Regardless of the initial presentation condition, recall of the first list was poorer when the second list was presented with noise than without it. Rabbitt (1968) proposed that even though recognition of the noisedegraded second list was successful, it required effort and depleted processing resources, which would otherwise be available for effective encoding and rehearsal of the first list. Increased listening effort in adverse conditions can interfere with downstream higher-level cognitive processing (effortfulness hypothesis). The perceptual processing of accented words, whether correct lexical access is achieved or not, demands increased listening effort (Pickel & Staller, 2012; Van Engen & Peelle, 2014) and can interfere with downstream memory encoding. As a continuous accented speech stream unfolds, recognition and encoding of the prior words might not be completed before presentation of the next word, which might cause loss of maintained items in short-term memory (STM). Hence, both accented word recognition failure and impaired memory encoding might contribute to poor accented word recall.

The current study aimed to disentangle these two potential explanations by contrasting the recall of foreignaccented words in serial recall (SR) tasks with an auditory and a bimodal presentation mode, respectively (Experiment 1). Recall performance at the initial, middle, and final SPs reflects different memory processing. For example, recall performance at initial SPs reflects transfer of information to long-term memory (LTM), and recall performance at final SPs reflects retrieval from STM (Glanzer & Cunitz, 1966; Murdock, 1962). Foreign-accented words with varying intelligibility were used as stimuli, and their intelligibility was measured with a perceptual identification (PI) task. During auditory presentation, low intelligibility, accented

words might induce recognition difficulty that contributes to poor serial recall. During bimodal presentation, the target word was simultaneously presented visually and auditorily to increase correct accented word recognition, so recall performance likely reflects memory processing alone. Contrasting the two conditions can help determine the contribution of accent-induced perceptual difficulty to poor accented word recall.

To further test whether accents impair memory encoding, a slower presentation rate was adopted in the auditory SR task in Experiment 2 to provide extra time for the perceptual processing and encoding to be fully completed before the arrival of the next word. The extra time also allows indexical processing of the stimuli. Contrasting Experiments 1 and 2 helped identify the contribution of impaired memory processing to the poor recall of accented words, as well as the influence of indexical processing on accented word recall.

Recall errors were categorized as omissions, intrusions, orders, or repetitions (Hurlstone, Hitch, & Baddeley, 2014). Accent and task manipulations might differentially impact different error types, which reflect different cognitive processes. For example, recall of items absent from the original list (i.e., intrusion) could be due to successful recall of misperceptions. Recall in an incorrect SP (order) reflects lower efficiency in encoding relational information for items. Erroneous repetition of an item during recall (re petition) reflects lower efficiency in suppressing representation of an item after recall (Henson, 1998). No response during recall (omission) reflects forgetting or retrieval failure. Error analyses provide further insights into whether the accented words tend to be misperceived, and how accent impacts different perceptual and memory processes. Chan and Vitevitch (2015) showed that a particular word (rather than many different similar sounding words) accounted for a majority of the misperceptions of accented words. Results from Cho & Feldman’s (2013) memory recognition task also showed that both native and accented stimuli activated phonologically similar words, but accented words did not broadly activate more phonological neighbors. These previous findings suggested that misperceptions of accented words are likely to be similar sounding. To test how likely intrusions of accented words were actually misperceptions, intrusions from the SR tasks and misperceptions from the PI task were subjected to phonological analyses and their phonological similarity with the stimuli was examined.

Experiment 1

Experiment 1 examined the impact of accents on the recognition and immediate recall of spoken words using a PI task

and SR tasks with two presentation modes, auditory and bimodal, adopted from Frankish (2008) study. In the SR task, the auditory group only heard the stimulus words, whereas the bimodal group simultaneously heard and saw the stimulus words. The auditory group was expected to have difficulty identifying the accented stimuli. The bimodal group could derive the identity of the stimuli from the synchronized visual input and was expected to approach perfect identification.

The intelligibility of the accented items was expected to vary depending on their lexical characteristics (Chan & Vitevitch, 2015; Imai, Walley, & Flege, 2005). This experiment took advantage of this variability to examine how intelligibility of the accented stimuli might be a significant predictor for intrusions and omissions, which are related to accent-induced misperception or recognition failure, in the auditory group. Without identification difficulty, the bimodal group is expected to commit fewer intrusions than the auditory group. Comparing the performances of the two groups can help determine the extent to which accentinduced identification difficulties contribute to poor accented word recall.

By comparing the native and accented word recall in the initial and final SPs, we examined the impact of accents on the transfer of an item into LTM and its retrieval from STM (Glanzer & Cunitz, 1966; Murdock, 1962). Based on previous findings about effortful processing for foreign-accented speech (Gill, 1994; Pickel & Staller, 2012), we predicted that fewer resources would remain for encoding the temporal relation of the items, transferring items into LTM, and maintaining items in STM. Hence, more omission or order errors were predicted for accented words.

Method

Participants Fifty-three native speakers of American English were recruited from the Introductory Psychology participant pool at the James Madison University in Virginia. All participants were right-handed and reported no history of speech or hearing disorders. They were randomly assigned to the auditory group (27) or the bimodal group (26).

Materials The nine English stimulus words, bud, cot, dog, fin, gas, job, lice, pool, and soak, were used to construct 36 stimulus lists for each accent condition. Eight 9  9 Latin squares were used to make the stimulus lists, so that each stimulus word appeared twice at each SP. Presentation of the stimulus lists was blocked by accent. To prevent participants from knowing the identity of the stimulus word before hearing the accented stimuli, the accented block was always presented before the native block. Potential practice effects

are modeled as a control variable in the analysis. The sequence of lists within each block was randomized under the constraint that no stimulus word appeared at the same SP on two successive trials. For the PI task, the same accented stimulus words were used.

Survey The survey asked about participants’ first language, their fluency in any other language(s), any history of hearing and speech disorder, history of studying Cantonese, any family members or close friends with a Cantonese accent, and regular contact with non-native speakers of English.

Speakers Two female students were recruited from James Madison University to record the stimuli. The native speaker of American English was from the South Atlantic region of the United States. The non-native speaker was from the southern part of China with Cantonese as her native language and English as a L2 with a strong accent. The stimuli were recorded digitally at a 44.1 kHz sampling rate using Adobe Audition CS5 and the PreSonus AudioBox Studio Set (PreSonus Audio Electronics, Inc.) connected to a Dell PC (Dell Inc.). The amplitude of the individual sound files was increased to their maximum without distortion using Praat (Boersma & Weenink, 2009). The duration of the native (M = 562 ms, SD = 55.6) and accented stimuli (M = 499 ms, SD = 112) did not differ significantly, F(1, 8) = 2.31, p > .05.

Procedure Each participant sat in front of a Dell PC with a set of Sennheiser HD 25-SP II headphones (Sennheiser Electronic GmbH & Co. KG). The presentation of stimuli and collection of responses were controlled by Paradigm 2.0 (Perception Research Systems, 2007). The whole experiment lasted about 60 min. Participants first completed a language background questionnaire followed by the SR task and the PI task.

For the SR task, each participant received 36 accented trials followed by 36 native trials separated by a short break. Each trial started with a 500 ms warning tone with a “+” sign appearing simultaneously on the computer screen, followed by a blank screen of 1 s. Participants in the auditory group heard the word list over the headphones. For the bimodal group, a word was presented auditorily and a synchronized visual display of that word appeared on screen for 500 ms. The bimodal group were instructed to attend to both the visual and auditory channels. The stimulus words were separated by a 150-ms interstimulus interval (ISI). List presentation was followed by an 18-s interval for written recall of the words in the order of their presentation. Participants were instructed to fill in the nine spaces on the response sheet from left to right, and to guess when

Table 1 . Analytic difficulties about the outcomes and explanatory variables in the current data and the statistics strategy adopted

Outcome variables

Analytic difficulty Statistics strategy

Nested data (Trials within people) Multilevel analysis (aka hierarchical linear modeling, Goldstein, 2011)

Differences across serial positions Multilevel cross-classification (Goldstein, 2011)

Discrete variable (yes/no) Logit/Probit

Multiple dependent variables (Y 1 , Y 2 , ...) Multivariate outcome models (Goldstein, 2011)

Explanatory variables

Analytic difficulty Statistics strategy Cross-level moderation (e.g., Trial  Serial position) Random effects model (Goldstein, 2011) False positives (Type I errors) Two-stage linear step-up procedure (Benjamini et al., 2006) Robustness of results Separate multilevel, single outcome models; Analyses of subsets of the data

uncertain. Recall was monitored by the experimenter to ensure participants’ compliance with these instructions. A practice trial was presented before the experiment and was excluded from data analyses.

The PI task was the same for both groups. It included a randomized presentation of 90 trials consisting of 10 repetitions of the 9 accented words. Each trial started with the word “READY” appearing on the screen for 500 ms followed by the stimulus word presented over the headphones. Participants were given as much time as they needed to type on the keyboard, the word that they heard. They could see and correct their responses on the screen before hitting the ENTER key to initiate the next trial. Participants had five practice trials that were excluded from data analyses.

Data Analysis

For the PI task, a response was scored as correct if its phonological transcription matched the stimulus. For the SR tasks, a response was scored as correct only if the target word or a phonetically equivalent spelling was recalled in the correct SP. Recall errors were further classified with respect to output SP as omission, intrusion, order, or repetition errors following the same scoring criteria from McCormack, Brown, Vousden, and Henson (2000). An omission error was recorded for any non-response. An intrusion error was recorded for recall of any word outside the study list. An order error was recorded when a word was recalled in an incorrect SP. A repetition error was recorded for any erroneously repeated recall of any word beyond its number of occurrence. Data for the perceptual identification tasks and the serial recall tasks for each condition can be found in the Electronic Supplementary Material, ESM 1. We addressed analytic difficulties in the outcomes (nested data, differences across serial positions, discrete variables, and multiple dependent variables) with multivariate outcome, multilevel, cross-classification Logit/Probit analyses

(see the Equation 1 in Appendix A) and Wald tests (see Table 1; Goldstein, 2011). Furthermore, we dealt with analytic difficulties in the explanatory variables (cross-level moderation, false positives, robustness) with a random effects model (Goldstein, 2011), the two-stage linear step-up procedure (Benjamini, Krieger, & Yekutieli, 2006), separate multilevel, single outcome models, and analyses of subsets of the data (see Table 1). Variables were centered, and we used MLwiN 3.00 software (Charlton, Rasbash, Browne, Healy, & Cameron, 2017). The explanatory variables included: word identification error rate, accented speech , bimodal presentation, time of trial, serial position, regular contact with a non-native speaker, and interactions among variables at the same trial level: accented Speech  Time of Trial. Practice effects are controlled for with the explanatory variable time of trial.

Results and Discussion

Table 2 shows the mean identification error rates and standard deviations per stimulus word in the PI task. The mean identification error rates showed a large variability in the auditory group, ranging from 0% to 41.1%. Compared to native words, an accuracy rate of about 60% seems low, but it is near the typical range for accented speech –52.1% for isolated words in Chan and Vitevitch (2015), 52.8% for isolated words embedded in noise in Imai et al. (2005), and 51–54% for sentences in Bent and Bradlow (2003). The bimodal group almost achieved perfect identification with minimal variability.

Statistical power differs for each level. For an effect size of0.1, statistical power exceeded .99 at both the serial position level (34,344 potential errors) and at the trial level (3,816 trials). For 53 participants, however, statistical power is only .86 for an effect size of 0.4. Accented word identification error rate, accent, bimodal presentation, time of trial, serial position, and its interactions were all linked to recall errors (see Table 3). Figure 1 displays the mean accuracy

Table 2. Mean identification error rates (%) and standard deviations (SD) for each of the nine stimulus words in the perceptual identification task, Experiments 1 –2

Experiment 1 Experiment 2 Auditory Bimodal Accented Native M SD M SD M SD M SD

Bud 4.44 19.00 0.80 2.70 9.13 28.70 1.20 2.70 Cot 38.50 48.00 0.00 0.00 53.50 49.90 20.80 1.69 Dog 0.00 0.00 0.00 0.00 1.30 3.44 0.00 0.00 Fin 41.10 46.00 2.70 4.50 48.30 50.70 20.40 1.26 Gas 1.85 4.80 1.20 3.30 2.17 5.18 0.80 1.69 Job 1.48 5.30 0.00 0.00 0.00 0.00 0.00 0.00 Lice 5.56 20.00 0.40 2.00 13.90 34.20 9.20 6.27 Pool 15.20 36.00 0.00 0.00 9.57 28.70 1.20 1.93 Soak 8.15 27.00 0.00 0.00 17.80 38.60 0.00 0.00

rates (Figure 1A), mean error rates for intrusion (Figure 1B), omission (Figure 1C), repetition (Figure 1D), and order (Figure 1E) as a function of output SP for the two accent conditions in the two groups.

Recall performance was lower among accented words than native words across all SPs in the auditory group. When the accent word identification error rate was higher than otherwise, the following error rates were higher: intrusion (β = 0.012, SE = 0.001, p < .001), omission (β = 0.003, SE = 0.001, p < .05), and order (β = 0.003, SE = 0.001, p < .01). The less intelligible the accented words were, the higher the intrusion, omission, and order error rates were. The auditory group made significantly more intrusions during recall of accented words than native words, across all SPs, β = 0.318, SE = 0.084, p < .001. These results suggest that less intelligible accented stimuli might contribute to misperceptions and successful recall of the misperception, thereby yielding intrusions. Figure 1C suggested a higher omission error rate for accented words than native words, especially in late SPs, regardless of presentation mode. However, the regression model showed no significant accent effect on omission error rates in the auditory group, except at SPs 4 and 9, controlling for other explanatory variables. The other explanatory variable that might account for this discrepancy is the significant interaction between time of trial and SP at middle and late SPs; see Table 3 fo r details. As the practice effect was larger at middle and late SPs, and native trials were presented after the accented trials, it might have partially accounted for the fewer omissions observed for native word recall at middle and late SPs. Contrary to our predictions, order errors for accented words and native words were not significantly different in the auditory group. One possible explanation is that accented words recalled in an incorrect order were also misperceived and counted as intrusions instead.

The bimodal group made fewer intrusions than the auditory group, for both accented words, β = 1.143 (= 2.016 + 0.318 + 0.555), p < .001, and native words, β = 2.016, SE = 0.249, p < .001, which showed a larger reduction, β = 0.555, SE = 0.105, p < .001. As the intelligibility of accented stimuli was lower than that of native stimuli in this study, this latter result was unexpected. Even without recognition difficulty, the bimodal group still made significantly more intrusions for recalling accented words than native words (1.143 > 2.016; β accented = 1.143 (= 2.016 + 0.318 + 0.555); β native = 2.016, respectively). This result suggests that accents might exert other detrimental effects on memory in addition to recognition difficulty.

More omissions occurred in the bimodal group than the auditory group at SPs 7–9, β sp7 = 0.791, SE = 0.084; β sp8 = 1.339, SE = 0.086; β sp9 = 1.383, SE = 0.089; all ps<.001; during recall of both native words, β = 1.748, SE = 0.731, p < .05, and accented words, β = 1.546 (= 1.748 + 0.027 + 0.175; p < .05). Increased omissions in final SPs suggest that the additional information in the bimodal presentation interferes with direct retrieval of information from STM or its maintenance. The bimodal group also showed significantly more order errors than the auditory group at SPs 3–6 (β sp3 = 0.380, SE = 0.079; β sp4 = 0.271, SE = 0.079; β sp5 = 0.439, SE = 0.079; β sp6 = 0.239, SE = 0.080; all ps<.001). Encoding of relational information for items at middle SPs was poorer as they were less likely to be rehearsed enough to enter LTM or to be retrieved directly from STM. Taken together, these results might suggest that integrating the auditory and visual cues from bimodal information might increase participants’ cognitive load regardless of accent type, which can yield poor encoding of the relational information among middle items, as well as poor maintenance of information in STM.

The time of trial effect was significant for intrusions (β = 0.007, SE = 0.002, p < .001), omissions (β = 0.010, SE = 0.002, p < .001), and order errors (β = 0.005, SE = 0.001, p < .001). Participants were slightly less likely to make these errors in later trials, suggesting a small practice effect less than a tenth the size of other significant regression coefficients. The practice effect was larger for accented words than native words for omission, β = 0.013, SE = 0.004, p < .01. Participants were slightly more likely to commit repetitions in later trials (β = 0.010, SE = 0.001, p < .001) , suggesting participants’ weaker response suppression in later trials. Controlling for this practice effect, the other explanatory variables still showed significant effects. Other variables and interactions were not significant.

The current results suggest that accent-induced identification difficulty accounted for the lower recall performance of accented words compared to native words. In bimodal

Table 3. Summary of multivariate outcome, 3-level cross-classification analyses of intrusion, omission, repetition, and order errors for Experiment 1

Experiment 1 Explanatory variable Intrusion (SE) Omission (SE) Repetition (SE) Order (SE)

Constant 2.862*** (0.125) 3.284*** (0.413) 3.162*** (0.172) 1.247*** (0.128) Word identification error rate 0.012*** (0.001) 0.003* (0.001) 0.003** (0.001) Accented 0.318*** (0.084) 0.027 (0.082) Bimodal 2.016*** (0.249) 1.748* (0.731) 0.373 (0.225) Contact w non-native 0.126 (0.826) 0.236 (0.344) 0.206 (0.255) Time of trial 0.007*** (0.002) 0.010*** (0.002) 0.010*** (0.001) 0.005*** (0.001) SP 2 0.187** (0.063) 0.435** (0.153) 0.905*** (0.070) SP 3 0.368*** (0.056) 0.967*** (0.141) 1.462*** (0.068) SP 4 0.993*** (0.064) 1.152*** (0.138) 1.693*** (0.067) SP 5 0.214** (0.062) 1.553*** (0.061) 1.803*** (0.141) 1.711*** (0.067) SP 6 0.312*** (0.061) 1.795*** (0.060) 2.055*** (0.138) 1.466*** (0.067) SP 7 0.044 (0.087) 1.599*** (0.061) 2.249*** (0.136) 1.139*** (0.068) SP 8 1.669*** (0.062) 2.217*** (0.136) 0.607*** (0.071) SP 9 0.203** (0.069) 1.485*** (0.062) 1.142*** (0.149) 0.992*** (0.079) Accented  Bimodal 0.555*** (0.105) 0.175* (0.084) Accented  Time of trial 0.013** (0.004) Accented  SP 2 Accented  SP 3 Accented  SP 4 0.381* (0.171) Accented  SP 5 Accented  SP 6 Accented  SP 7 Accented  SP 8 Accented  SP 9 0.391* (0.164) Bimodal  SP 2 Bimodal  SP 3 0.198* (0.097) 0.380*** (0.079) Bimodal  SP 4 0.271** (0.079) Bimodal  SP 5 0.439*** (0.079) Bimodal  SP 6 0.293*** (0.080) Bimodal  SP 7 0.500** (0.168) 0.791*** (0.084) Bimodal  SP 8 1.339*** (0.086) Bimodal  SP 9 1.383*** (0.089) Contact w non-native  SP 2 0.275* (0.130) Contact w non-native  SP 3 0.382** (0.126) Contact w non-native  SP 4 0.635*** (0.120) 0.571*** (0.125) Contact w non-native  SP 5 0.821*** (0.113) 0.423* (0.175) 0.872*** (0.125) Contact w non-native  SP 6 0.948*** (0.111) 0.565** (0.166) 0.957*** (0.125) Contact w non-native  SP 7 1.139*** (0.113) 0.662*** (0.160) 0.911*** (0.126) Contact w non-native  SP 8 1.493*** (0.113) 0.903*** (0.160) 0.706*** (0.132) Contact w non-native  SP 9 1.631*** (0.114) 0.834*** (0.202) Time of trial  SP 2 Time of trial  SP 3 Time of trial  SP 4 0.015*** (0.004) Time of trial  SP 5 0.011*** (0.002) Time of trial  SP 6 0.012*** (0.002) Time of trial  SP 7 0.017*** (0.002) Time of trial  SP 8 0.018*** (0.002) Time of trial  SP 9 0.030*** (0.004)

(Continued on next page)

Experiment 1 Explanatory variable Intrusion (SD) Omission (SD) Repetition (SD) Order (SD)

Variance at each level

Subject 36% 65% 25% 14%

Trial

4%

6%

SP 64% 31% 75% 80% Explained variance at each level

Subject 0.603 0.125 0.084 0.076

Trial

0.000

0.000

SP 0.037 0.100 0.123 0.158 Total variance explained 0.240 0.111 0.113 0.137 BIC 0.245 1.267 0.375 0.873

Notes. The default category for comparison are: Accent type –Native; Presentation mode –Auditory; SP = SP1; Contact w non-native –No regular contact with non-native speakers. BIC = Bayesian information criterion. Initially, nonsignificant explanatory variables were removed to preserve degrees of freedom without increasing omitted variable bias. Some explanatory variables were initially significant but were no longer significant after addition of interaction terms; these variables remain in the model for proper interpretation of the results. SE = standard error. *p < .05; **p < .01; ***p < .001.

presentation, participants showed fewer intrusions for both native and accented words. With the visual display to reduce misperception, participants were more likely to successfully identify, encode, and retrieve words. However, recall of accented words still showed significantly more intrusions than native words, suggesting that the detrimental effects of accents go beyond just misidentification of words. Omissions and order errors occurred more often at late and middle SPs, respectively, in the bimodal condition than in the auditory condition. In the bimodal condition, integrating the visual display with the auditory stimuli might increase the overall cognitive load. This might cause poorer encoding of the relational information among middle items, as well as poorer maintenance of information in STM regardless of accent.

Overall, this pattern of results suggested that recognition difficulty induced by accents contributed to increased intrusions during recall. Apart from misperception, recognition difficulty induced by acoustic-phonetic deviations in accented speech might also disrupt encoding of the stimulus into memory. To test whether foreign accent also disrupts memory processing, we increased ISI in Experiment 2 to allow extra time for processing and encoding of the foreign-accented words.

Experiment 2

Experiment 2 aimed to examine whether foreign accents exert detrimental effects on memory processing in addition to misidentification of words. A foreign accent induces mismatches between the speech input and the representations stored in listeners’ memories, so more processing time might be required to resolve these mismatches during

accented word recognition (Munro & Derwing, 1995b; Van Engen & Peelle, 2014). With only 150-ms ISI in Experiment 1, identification and phonological encoding of the accented words might be incomplete and disrupted by successive stimuli. To test whether accents incur extra processing costs on the phonological encoding and rehearsal of stimuli, ISI was increased to 4 s in Experiment 2. The longer ISIs provided participants sufficient time to finish recognizing, encoding, and rehearsing accented words to facilitate later retrieval. Improvement of accented word recall performance in this experiment compared to the auditory condition in Experiment 1 would reflect the processing costs induced by accent on phonological encoding, as well as the benefit of having extra time for rehearsal. However, long ISIs do not help participants comprehend accented words that would otherwise be misrecognized, so accent-induced misperception was expected to remain. A native condition with longer ISIs serves as a baseline for comparison with the accented condition, in which accent-induced misperceptions and recognition failures cause errors. Increasing the ISIs increased the duration of the whole experiment. To reduce participants’ potential fatigue, we kept the total duration of the whole experiment comparable with Experiment 1 by collecting data on the native and accented conditions from two randomly assigned, separate groups of participants.

Method

Participants Fifty participants with the same profile of attributes described in Experiment 1 were recruited for Experiment 2. Participants were randomly assigned to the two conditions (25 participants in each condition).

(A)

0.0 20.0 40.0 60.0 80.0 100.0 Mean Accuracy Rate (%)

(B)

Mean Error Rate (%)

0.0 20.0 40.0 60.0 80.0 100.0 123456789 Output Serial Position

123456789 Output Serial Position

(C)

(D)

(E)

Mean Error Rate (%) 0.0 20.0 40.0 60.0 80.0 100.0

Mean Error Rate (%) 0.0 20.0 40.0 60.0 80.0 100.0 123456789 Output Serial Position

123456789 Output Serial Position

0.0 20.0 40.0 60.0 80.0 100.0 Mean Error Rate (%)

123456789 Output Serial Position

Auditory Accent

Auditory Native

Bimodal Accent

Bimodal Native

Auditory Accent

Auditory Native

Bimodal Accent

Bimodal Native

Auditory Accent Auditory Native Bimodal Accent Bimodal Native

Auditory Accent Auditory Native Bimodal Accent Bimodal Native

Auditory Accent Auditory Native Bimodal Accent Bimodal Native

Figure 1. (A) Mean accuracy rates (%), (B) mean error rates (%) for intrusion, (C) omission, (D) repetition, and (E) order error with the error bars representing 95% confidence intervals are plotted as a function of output SP for the two accent conditions in the serial recall task with auditory and bimodal presentation, Experiment 1.

Materials The same set of native and accented stimulus words from Experiment 1 were used as stimuli in the SR and PI tasks.

Procedure The procedure was identical to the auditory condition of Experiment 1 except for the following. Each participant only received the first 36 trials in the SR task with 4-s ISIs. For

the native condition, native stimuli from the SR task were used as stimuli in the PI task.

Results and Discussion

Data for the perceptual identification tasks and the serial recall tasks for each condition can be found in ESM 1. The scoring criteria in Experiment 1 were also used in Experiment 2. The mean identification error rate and standard deviation for the PI task are shown in Table 2. The native words were highly intelligible with a mean identification error rate of 5.95%, ranging from 0% to 20.8%. Like Experiment 1, the mean identification error rates of the accented words varied substantially, ranging from 0% to 53.5%.

Comparison of Accented Conditions Across Experiments For the SR task, we pooled only the participants in the accented condition across Experiments 1 and 2 during the data analysis. For an effect size of 0.1, statistical power exceeded .99 at both the SP level (25,272 potential errors) and trial level (2,808 trials). For 78 participants, statistical power is .95 for an effect size of 0.4. The analysis was the same as Equation 1 except that the explanatory variable, 4-second ISI, was added, and accented was omitted, along with their interaction variables. Figure 2 displays the mean accuracy rates (Figure 2A), mean error rates for intrusion (Figure 2B), omission (Figure 2C), repetition (Figure 2D), and order (Figure 2E) as a function of output SP for the accent conditions in the SR task with auditory presentation and 150-ms ISI in Experiment 1 and 4-s ISI in Experiment 2. A summary of the m ultivariate outcome, 3-level cross-classification analyses of intrusion, omission, repetition, and order errors for Experiment 2 is shown in Table 4. As the bimodal condition results were the same as those in Experiment 1, we focus on the results related to Experiment 2. Consistent with Experiment 1, accent word identification error rate significantly predicted intrusion (β = 0.019, SE = 0.001, p < .001), omission (β = 0.002, SE = 0.001, p < .01), and order error rates (β = 0.002, SE = 0.001, p < .01). The less intelligible the accented words were, the higher the likelihood of intrusion, omission, or order errors during recall. Long ISIs in Experiment 2 resulted in significantly fewer intrusions and repetitions during accented word recall, β intrusion = 0.591, SE = 0.252, p < .05; β repetition = 0.734, SE = 0.248, p < .01. Interactions between ISI and SP were significant for all types of errors during accented word recall. The auditory presentation mode with a longer 4-s ISI resulted in fewer intrusions during accented word recall at SPs 6 and 7, β SP6 = 0.333; β SP7 = 0.283, and fewer repetitions at

SPs 6, 8, and 9, β SP6 = 0.540; β SP8 = 0.742; β SP9 = 1.687. Compared to 150-ms ISIs, 4-s ISIs resulted in significantly fewer order errors during accented word recall at SPs 5, 8, and 9, β SP5 = 0.330; β SP8 = 0.410; β SP9 = 0.455, as well as fewer omissions in SPs 5 and 9, β SP5 = 0.623; β SP9 = 0.576. The intrusion and omission results implied that with short ISIs, listeners struggled to identify and encode the accented words into memory, especially those in the middle SPs, as the extra processing costs induced by accents accumulate during the progressive stimulus presentation. Longer ISIs provided more time for resolving mismatches induced by accent so that identification and encoding of the accented words could be completed without interference from incoming stimuli. Results from Experiment 2 demonstrated that the disruption of phonological encoding incurred by accents can partially account for performance deficits for recall of accented words presented auditorily. Experiment 2 consistently shows that longer ISIs aid encoding and recall of middle and late items. Fewer repetitions during accented word recall occurred with 4-s ISIs, especially in the middle and late SPs. This implies that with longer ISIs, participants could better encode the accented stimuli and were less likely to erroneously repeat an item that has been recalled earlier. Also, fewer order errors occurred in middle and late SPs during accented word recall with 4-s ISI than with 150-ms ISI. This suggested that longer ISIs allowed participants to better encode the relational information among the middle and late items. As people’s recall for middle items is typically worse than those in the initial and final SPs, it is not surprising that longer ISIs benefit the middle items more than others (Glanzer & Cunitz, 1966). With the build-up of cognitive load from rehearsing early items and interference from incoming stimuli, the middle items are typically not rehearsed enough to be transferred to LTM nor maintained long enough for retrieval from STM (Glanzer & Cunitz, 1966). With longer ISIs, the middle items were more likely to be processed completely and rehearsed enough to be transferred to LTM.

The fewer order and omission errors at late SPs with longer rather than shorter ISIs might be explained by the listeners using the additional acoustically coded representation of the final item stored in a separate sensory buffer store, namely echoic memory (Neisser, 2014) or precategorical acoustic storage (Crowder & Morton, 1969) with the longer ISIs. As the final list item was not followed by other stimuli, this additional acoustically coded representation of the final item stored in echoic memory (Neisser, 2014) was not overwritten by subsequent auditory events. With longer ISIs, there was sufficient time for the listeners to process the indexical information of the final stimuli in echoic memory, such as the gender, voice, and accent of the

(A)

(B)

(C)

Mean Accuracy Rate (%) 0.0 20.0 40.0 60.0 80.0 100.0

Mean Error Rate (%)

0.0 20.0 40.0 60.0 80.0 100.0 123456789 Output Serial Position

123456789 Output Serial Position

(D)

Mean Error Rate (%)

0 20 40 60 80 100

Mean Error Rate (%)

0.0 20.0 40.0 60.0 80.0 100.0

(E)

0.0 20.0 40.0 60.0 80.0 100.0 Mean Error Rate (%)

123456789 Output Serial Position

123456789 Output Serial Position

123456789 Output Serial Position

150-ms ISI Accent 150-ms ISI Native 4-s ISI Accent 4-s ISI Native

150-ms ISI Accent 150-ms ISI Native 4-s ISI Accent 4-s ISI Native

150-ms ISI Accent 150-ms ISI Native 4-s ISI Accent 4-s ISI Native

150-ms ISI Accent 150-ms ISI Native 4-s ISI Accent 4-s ISI Native

150-ms ISI Accent 150-ms ISI Native 4-s ISI Accent 4-s ISI Native

Figure 2. (A) Mean accuracy rates (%), (B) mean error rates (%) for intrusion, (C) omission, (D) repetition, and (E) order error with the error bars representing 95% confidence intervals are plotted as a function of output SP for the two accent conditions in the serial recall task with auditory presentation and 150 ms interstimulus interval (ISI) in Experiment 1 and 4 s ISI in Experiment 2.

speaker (Nygaard, Sommers, & Pisoni, 1995). This additional indexical information of the final items might make it more temporally distinctive from the prior items. Therefore, the final items were more likely to be accurately retrieved without being mis-ordered or lost.

The time of trial effect was significant for omissions (β = 0.013, SE = 0.002, p < .001) and order errors (β = 0.006, SE = 0.002, p < .01). Participants were slightly less likely to make omission and order errors in later trials,

showing small practice effects. The significant time of trial and SP interaction at SPs 4–9 for omission suggests fewer omission errors for middle and late SPs at later trials, β SP4 = 0.019; β SP5 = 0.025; β SP6 = 0.019; β SP7 = 0.018; β SP8 = 0.025; β SP9 = 0.045. The time of trial effect was also significant for repetitions (β = 0.010, SE = 0.003, p < .01), suggesting participants’ weaker response suppression in later trials. Other variables and interactions were not significant.

Table 4. Summary of multivariate outcome, 3-level cross-classification analyses of intrusion, omission, repetition, and order errors for Experiments 1 and 2, accented conditions only

Explanatory variable Intrusion (SE) Omission (SE) Repetition (SE) Order (SE)

Constant 2.817*** (0.132) 2.312*** (0.340) 3.595*** (0.141) 1.221*** (0.120) Word identification error rate 0.019*** (0.001) 0.002** (0.001) 0.002** (0.001) Bimodal 1.624*** (0.257) 1.406* (0.616) 0.407 (0.211) 4-second interstimulus interval 0.591* (0.252) 0.814 (0.621) 0.734** (0.248) 0.219 (0.214) Regular contact with non-native speaker 0.414 (0.574) 0.107 (0.260) 0.096 (0.200) Time of trial 0.002 (0.002) 0.013*** (0.002) 0.010** (0.003) 0.006** (0.002) SP 2 0.848*** (0.078)

SP 3 0.402*** (0.070) 0.788*** 1.347*** (0.077) SP 4 0.931*** (0.071) 1.022*** (0.134) 1.543*** (0.075)

SP 5 0.166* (0.067) 1.137*** (0.069) 1.438*** (0.125) 1.408*** (0.078) SP 6 0.281*** (0.068) 1.625*** (0.065) 1.240*** (0.146) 1.183*** (0.076) SP 7 0.138* (0.070) 1.799*** (0.077) 1.714*** (0.129) 0.855*** (0.080) SP 8 0.125 (0.067) 1.737*** (0.072) 1.500*** (0.140) 0.183* (0.087) SP 9 1.288*** (0.082) 0.284 (0.245) 1.364*** (0.110) Bimodal  SP 2 Bimodal  SP 3 0.426*** (0.092) Bimodal  SP 4 Bimodal  SP 5 Bimodal  SP 6 0.607*** (0.104) Bimodal  SP 7 0.943*** (0.123)

Bimodal  SP 8 1.227*** (0.108) Bimodal  SP 9 1.108*** (0.127) 4-second interstimulus interval  SP 2 4-second interstimulus interval  SP 3 4-second interstimulus interval  SP 4 4-second interstimulus interval  SP 5 0.623*** (0.115) 4-second interstimulus interval  SP 6 0.333* (0.132) 4-second interstimulus interval  SP 7 0.283* (0.136) 0.385** (0.128) 4-second interstimulus interval  SP 8

0.540* (0.224)

0.742*** (0.210)

0.330** (0.098)

0.194 (0.104) 0.410** (0.123)

4-second interstimulus interval  SP 9 0.576*** (0.148) 1.687*** (0.443) 0.455* (0.199) Contact w non-native  SP 2 0.314* (0.146)

Contact w non-native  SP 3 0.469** (0.143) Contact w non-native  SP 4 0.337** (0.120) 0.494*** (0.141) Contact w non-native  SP 5 0.764*** (0.142) Contact w non-native  SP 6 0.698*** (0.143) Contact w non-native  SP 7 0.320** (0.114) 0.411* (0.175) 0.686*** (0.145) Contact w non-native  SP 8 0.713*** (0.116) 0.626*** (0.154) Contact w non-native  SP 9 0.918*** (0.120) 0.544* (0.226) Time of trial  SP 2 Time of trial  SP 3

Time of trial  SP 4 Time of trial  SP 5

0.019** (0.006) 0.025*** (0.006)

Time of trial  SP 6

0.019*** (0.005)

Time of trial  SP 7

0.018** (0.005)

Time of trial  SP 8 0.015* (0.006)

0.025*** (0.005)

Time of trial  SP 9

0.045*** (0.006)

Variance at each level

Subject 32% 54% 22% 13%

Trial

6%

Explanatory variable Intrusion (SE) Omission (SE) Repetition (SE) Order (SE)

Explained variance at each level

Subject 0.515 0.081 0.184 0.105

Trial

0.000

0.000

SP 0.024 0.112 0.104 0.162

Total variance explained 0.183 0.088 0.122 0.145 BIC 0.073 0.697 0.702 0.871

Notes. The default category for comparison are: Accent type –Native; Presentation mode –Auditory; SP = SP1; Contact w non-native –No regular contact with non-native speakers. BIC = Bayesian information criterion; SE = standard error. Initially, nonsignificant explanatory variables were removed to preserve degrees of freedom without increasing omitted variable bias. Some explanatory variables were initially significant but were no longer significant after addition of interaction terms; these variables remain in the model for proper interpretation of the results. *p < .05; **p < .01; ***p < .001.

Comparison of the Accented and Native Conditions Participants from the native and accented conditions were pooled. The analysis equation is the same as 1 except that bimodal and its interactions were removed. For an effect size of 0.1, statistical power exceeded .99 at the SP level (16,800 potential errors) and is .99 at the trial level (1,800 trials). For 25 participants, statistical power is only .52 for an effect size of 0.4. Accent, serial position, time of trial, and their interactions were linked to recall errors. Participants with higher word identification error rates had more intrusions, β = 0.027, SE = 0.001, p < .001 (see Table 5). Participants made more intrusions when recalling accented words than native words, β = 0.798, SE = 0.276, p < .01. Even though word recognition was more likely to be completed with 4-s ISIs, misperceptions and recognition failure still occurred for accented words with low intelligibility. There were significantly more omissions, but fewer order errors for accented words than native words at SPs 8 and 9 (omission: β sp8 = 0.572, SE = 0.140, p < .001; β sp9 = 0.589, SE = 0.168, p < .001; order: β sp8 = 0.370, SE = 0.139, p < .01; β sp9 = 0.550, SE = 0.216, p < .005). Accented words were more likely than native words to be mis-recognized, so they were more likely to be omitted rather than mis-ordered during recall.

Time of trial was significant for intrusions, (β = 0.007, SE = 0.003, p < .05), and order errors (β = 0.010, SE = 0.003, p < .001). In later trials, participants were slightly more likely to make intrusions, but less likely to make order errors. Time of trial was significant for omissions at SPs 4–9, showing fewer omissions in later trials at the middle and late SPs. The interaction between time of trials and accent was significant for repetition, β = 0.019, SE = 0.004, p < .01, suggesting that participants showed weaker response suppression in later trials for accented words than native words.

Phonological Analyses of Intrusions and Misperceptions Previous research showed that misperceptions for native words (Vitevitch & Luce, 1999) and foreign-accented words

(Chan & Vitevitch, 2015; Cho & Feldman, 2013) are likely to sound similar to the target words. To determine whether intrusions were likely a result of successful recall of misperceptions, intrusions from SR tasks were matched with misperceptions from the PI tasks and their phonological similarity to the stimuli was examined. Details of the phonological transcription and matching on similarity are shown in Appendix B.

Intrusions in Experiments 1 and 2 were categorized into either matching with misperceptions or not, and the corresponding frequency distribution is displayed in Table 6. As expected, chi-square tests of independence showed a significant association between accent and matching with misperceptions for intrusions during the auditory presentation with 150-ms ISIs, w 2 (1, N = 2,994) = 232.0, p < .001; during the bimodal presentation with 150-ms ISIs, w 2 (1, N = 561) = 17.8, p < .001; and the auditory presentation with 4-s ISIs, w 2 (1, N = 1,897) = 71.1, p < .001. A higher proportion of intrusions from accented words than native words matched with misperceptions, confirming that intrusions from accented words were more likely to stem from misperception.

For intrusions from accented words in Experiment 1, a chi-square test of independence also showed a significant association between presentation mode and matching with misperceptions, w 2 (1, N = 2,171) = 148.5, p < .001. Compared to bimodal presentation, a higher proportion of intrusions from auditory presentation matched with misperceptions, confirming that accented stimuli were more likely to be misperceived with auditory presentation, and their successful recall manifested as intrusions.

Aligned with findings from Chan and Vitevitch (2015) and Cho and Feldman (2013), a majority of the misperceptions for accented words sounded similar to the stimulus words: 72.7% and 85.9% for the auditory groups with 150-ms ISIs and 4-s ISIs, respectively. These contrast with only 40% for the bimodal group. Also, only 31% of the misperceptions for native words sounded similar to the stimuli. Intrusions in Experiments 1 and 2 were categorized as sounding similar to the stimuli or not, and further

Explanatory variable Intrusion (SE) Omission (SE) Repetition (SE) Order (SE)

Constant 2.847*** (0.138) 2.804*** (0.264) 3.887*** (0.145) 1.352*** (0.126) Word identification error rate 0.027*** (0.001) Accented 0.798** (0.276) 0.140 (0.527) 0.104 (0.285) 0.406 (0.252) Regular contact with non-native speaker Time of trial 0.007* (0.003) 0.004 (0.003) 0.002 (0.004) 0.010*** (0.003) SP 2 0.681*** (0.084) SP 3 0.484*** (0.093) 0.502** (0.185) 0.876*** (0.083) SP 4 0.942*** (0.088) 0.576** (0.181) 1.205*** (0.081) SP 5 1.203*** (0.086) 1.110*** (0.162) 1.012*** (0.082)

SP 6 1.389*** (0.085) 0.662*** (0.178) 1.003*** (0.082) SP 7 1.377*** (0.085) 1.004*** (0.165) 0.548*** (0.085)

SP 8 0.855*** (0.089) 0.711*** (0.176) SP 9 0.082 (0.102) 0.710** (0.274) Accented  Time of trial 0.019* (0.009) Accented  SP 2 Accented  SP 3 Accented  SP 4 Accented  SP 5 Accented  SP 6 Accented  SP 7

0.115 (0.091) 1.449*** (0.123)

Accented  SP 8 0.572*** (0.140) Accented  SP 9 0.589*** (0.168) Contact w non-native  SP 2 Contact w non-native  SP 3 Contact w non-native  SP 4 Contact w non-native  SP 5 Contact w non-native  SP 6 Contact w non-native  SP 7 Contact w non-native  SP 8 Contact w non-native  SP 9 Time of trial  SP 2

0.370** (0.139) 0.550* (0.216)

Time of trial  SP 3 Time of trial  SP 4 0.016* (0.008) Time of trial  SP 5 0.037*** (0.007) Time of trial  SP 6 0.030*** (0.007) Time of trial  SP 7 0.037*** (0.007) 0.018** (0.006) Time of trial  SP 8 0.057*** (0.008) Time of trial  SP 9 0.070*** (0.009) Variance at each level Subject 31% 44% 20% 15%

Trial

8%

12%

SP 69% 48% 80% 73%

Explained variance at each level

Subject 0.456 0.028 0.070 0.047

Trial

0.000

0.000

SP 0.054 0.077 0.068 0.158 Total variance explained 0.179 0.053 0.069 0.122 BIC 0.223 0.509 1.150 0.743

Notes. The default category for comparison are: Accent type –Native; Presentation mode –Auditory; SP = SP1; Contact w non-native –No regular contact with non-native speakers. BIC = Bayesian information criterion; SE = standard error. Initially, nonsignificant explanatory variables were removed to preserve degrees of freedom without increasing omitted variable bias. Some explanatory variables were initially significant but were no longer significant after addition of interaction terms; these variables remain in the model for proper interpretation of the results. *p < .05; **p < .01; ***p < .001.

Table 6. The frequency distribution (relative frequency in parentheses) of intrusions in each of the conditions in Experiments 1 and 2 across matching with misperceptions or not

Accent Presentation mode ISI Matched with misperceptions Not matched with misperceptions

Accented Auditory 150 ms 1,060 (59.4%) 724 (40.6%) Native Auditory 150 ms 376 (31.1%) 834 (68.9%) Accented Bimodal 150 ms 98 (25.3%) 289 (74.7%) Native Bimodal 150 ms 17 (9.8%) 157 (90.2%) Accented Auditory 4 s 1,188 (84.0%) 227 (16.0%) Native Auditory 4 s 318 (66.0%) 164 (34.0%)

Note. ISI = interstimulus interval.

Table 7. The frequency distribution (relative frequency in parentheses) of intrusions in each of the conditions in Experiments 1 and 2 across matching with misperceptions or not and similar sounding to the stimulus words or not Similar Sounding Dissimilar Sounding Accent Presentation mode ISI Matched with misperceptions Not matched with misperceptions Total Matched with misperceptions Not matched with misperceptions Total Accented Auditory 150 ms 881 (49.40%) 99 (5.55%) 980 (54.9%) 179 (10.00%) 625 (35.0%) 804 (45.1%) Native Auditory 150 ms 366 (30.20%) 87 (7.19%) 453 (37.4%) 10 (0.83%) 747 (61.7%) 757 (62.6%) Accented Bimodal 150 ms 97 (25.10%) 95 (24.50%) 192 (49.6%) 1 (0.26%) 194 (50.2%) 195 (50.4%) Native Bimodal 150 ms 17 (9.77%) 70 (40.20%) 87 (50.0%) 0 (0.00%) 87 (50.0%) 87 (50.0%) Accented Auditory 4 s 1,052 (74.40%) 22 (1.55%) 1,074 (75.9%) 136 (9.61%) 205 (14.5%) 341 (24.1%) Native Auditory 4 s 318 (66.00%) 25 (5.19%) 343 (71.2%) 0 (0.00%) 139 (28.8%) 139 (28.8%)

Note. ISI = interstimulus interval.

categorized into matching with misperceptions or not; the corresponding frequency distribution is displayed in Table 7. As expected, chi-square tests of independence showed significant associations between accent and similarity with stimuli in intrusions during the auditory presentations with 150-ms ISIs, w 2 (1, N = 2,994) = 88.4, p < .001, and 4-s ISIs, w 2 (1, N = 1,897) = 4.27, p < .038. A higher proportion of intrusions was phonologically similar to the stimuli for accented words than native words in both auditory groups regardless of ISIs.

Intrusions for accented words during auditory or bimodal presentation did not differ with respect to similarity to stimuli, w 2 (1, N = 2,171) = 3.62, p > .05. Compared to the bimodal presentation, a much higher proportion of similar sounding intrusions for accented words during auditory presentation matched with misperceptions, w 2 (1, N = 1,172) = 180, p < .00001. This result confirms that similar sounding intrusions for accented words from the auditory presentation condition were likely misperceptions.

On the other hand, intrusions from the bimodal condition were equally likely to sound similar to the stimuli even though they did not stem from misperceptions. This intrusion of similar sounding words during the bimodal condition likely occurred during memory processing rather than perceptual processing. This implies that the similar sounding words were strongly activated during recall. This result is consistent with Cho and Feldman’s (2013) finding

in a memory recognition task: false alarms of phonologically similar words for both native and accented words presented bimodally.

General Discussion

This study disentangled the factors that might contribute to impaired memory for foreign-accented words –misperception/recognition failure and disruption of memory encoding. The auditory group in Experiment 1 tended to misidentify accented words and showed more intrusions in the recall of accented words than native words. The bimodal group almost identified the accented word perfectly with the synchronized visual display in the SR task to aid recognition. The bimodal group showed fewer intrusions than the auditory group, supporting our hypothesis that increased intrusions in accented word recall was partially due to successful recall of misperceptions. However, the bimodal group still showed more intrusions for accented words than native words, implying that foreign accents exert other detrimental effects on recall in addition to inducing recognition difficulty. The bimodal condition yielded more omission and order errors at the late and middle SPs, respectively. Experiment 2 used longer ISIs and demonstrated that the extra processing costs incurred by accents on phonological

encoding also account for the poorer accented word recall. Participants showed fewer intrusions and repetitions during accented word recall, especially in the middle and late SPs. During accented word recall, fewer order and omission errors occurred in middle and late SPs, respectively. Longer ISIs provided sufficient time to resolve accent-induced mismatches, so that participants could complete identification, encoding, and rehearsal of the accented words, particularly in the middle SPs, without interference from incoming stimuli.

Successful recall of misperceptions accounted for a higher proportion of intrusions for accented words than for native words, especially during auditory presentation. Intrusions for accented words from the auditory and bimodal presentation modes did not differ with respect to similarity to the stimuli, but a much higher proportion of these similar sounding intrusions matched with misperceptions during auditory presentation than during bimodal presentation. The similar sounding intrusions for accented words in the bimodal group were not a result of misperceptions, implying that the similar sounding words were strongly activated during recall.

Consistent with past research by Gill (1994) and Pickel and Staller (2012), the current results showed that foreign accents impaired memory. Because the intelligibility of the accented words in those previous studies was not measured, the possibility that failure to perceive the accented words partially contributed to the impaired recall could not be excluded. The current results suggest that accented words were likely to be misperceived or not recognized. Although these are actually perception errors, subsequent accurate encoding and retrieval of the misperceived words would be incorrectly counted as memory errors. Thus, it is particularly crucial for memory researchers studying foreign-accented speech to measure the intelligibility of the foreign-accented stimuli.

The current findings contrast with Cho and Feldman's (2013) finding that foreign-accented and less intelligible words were recalled better. A possible explanation for this discrepancy is that participants in that study received visual feedback on word identification and were given enough time to recognize and encode the words into memory. The current study disentangled the impacts of visual feedback and extra processing time on accented word recall. With only a synchronized visual display and no extra processing time, recall of accented words still showed more intrusions than recall of native words. Aligned with the effortfulness hypothesis, even when recognition was successful, foreign-accented speech induced effortful perceptual processing, much like noise-degraded or synthetic speech. The increased effort in perceptual processing of accented speech appeared to drain the cognitive resources deployed for phonological encoding of words and to interfere with subsequent memory processes (Cousins, Dar, Wingfield, & Miller, 2014; Francis & Nusbaum, 2009; Wild et al., 2012). This effort likely results in poorer representations of accented words in memory, which might explain the mis-recall of similar sounding words that were not misperceptions in the bimodal presentation. Further research is needed to determine whether the intrusion of similar sounding words occurs during encoding, storage, or retrieval.

This study also isolated the impact of accents on disrupting the encoding of items into memory. With shorter ISIs, there were more intrusions and repetitions in accented word recall, especially in the middle and late SPs, and more order and omission errors in the middle and late SPs. Without adequate time to process accented words, subsequent items can disrupt the phonological encoding of earlier items into memory, as well as the encoding of relational information between items. The observed trade-off between processing and storage in handling foreign-accented speech can be explained by the Ease of Language Understanding (ELU) model (Ronnberg et al., 2013). The ELU model emphasizes the important role of WM in online language processing and its interaction with LTM, especially for listening in adverse conditions. According to the ELU model, WM capacity is required for explicit compensatory processing, such as inference-making, semantic integration, and inhibiting irrelevant information. When lexical access is delayed by a mismatch between the speech signal and the listener's representation in LTM (Ronnberg et al., 2013), these explicit processing mechanisms are slower (operating over seconds) and are supported by the modality-general verbal WM that is limited and shared by memory operations and other higher-level cognitive functions.

The ELU model can also account for the additional omission and order errors in the bimodal condition compared with the auditory condition in Experiment 1. Integrating the visual cue with the auditory input in the bimodal condition places extra demands on the shared pool of modality-general WM, thereby leaving less WM for phonological encoding and for encoding relational information among items. This finding showed that synchronized visual cues might not be the best compensatory strategy for improving accented word recall. Other compensatory strategies that do not draw from the same pool of WM resources would be preferable.

We also considered whether the current findings can be explained by the item-order trade-off observed in serial recall and recognition of long and short words (Hendry & Tehan, 2005). Item errors include intrusions, omissions, and repetitions. Serial recall of short words was more accurate than that of long words, as less time is needed to process short words, leaving more time available for encoding their order information. Long words, however, were recognized more accurately than short words, likely because the additional time needed to process long words results in more substantial item processing. The item-order trade-off implies that increased item processing comes at the expense of order processing (Hendry & Tehan, 2005). In the current study, accented words were more difficult to process than native words, so more item processing was needed for accented words. If there were an item-order trade-off, accented words should show fewer item errors and more order errors than native words. Instead, accented words showed more intrusions than native words, with no significant difference in order errors in either presentation mode with 150-ms ISIs. Accented word recall also showed more intrusions and omissions, but fewer order errors, than native word recall with 4-s ISIs. The item-order trade-off does not seem to account for the different results in the auditory and bimodal conditions either. The bimodal presentation had a differential influence across types of item errors: based on Figure 1, the bimodal group had fewer intrusions across all SPs but more omissions only at late SPs. With the aid of the visual display, the bimodal condition was expected to reduce overall item processing, leaving more resources for processing order information. Contrary to this prediction, more order errors were observed in the bimodal condition.

The current findings also have implications for echoic memory. With long ISIs, participants showed fewer order and omission errors at the final SPs. This result aligns with the finding by Nygaard et al. (1995) that variation in talker characteristics improved serial recall at 4-s ISIs. Superior recall of words at the final SPs can be attributed to their additional acoustic representation in echoic memory (Conrad & Hull, 1968). When given sufficient time, listeners could fully encode details of the speaker's accent in echoic memory and use them as distinctive temporal order cues for serial recall (Nygaard et al., 1995).

Findings from this study should be considered in light of some limitations. Like many previous studies on serial recall (Frankish, 2008; Roodenrys & Miller, 2008; Vitevitch, Chan, & Roodenrys, 2012), this study used the same set of stimuli across trials to facilitate comparison across experiments. This might create an interference effect that worsens recall performance (Baddeley, 1966). Further studies could use a larger set of stimuli spoken by multiple speakers to increase the generalizability of the findings to other speakers, accents, and words, as well as to minimize the influence of perceptual adaptation to speakers and accents on performance.

In summary, the present findings suggest that foreign accents impair word recognition and serial recall by causing misperception and by disrupting memory encoding. Effortful perceptual processing of accented speech can interfere with subsequent memory processes by exhausting the limited, shared pool of modality-independent WM. Given the crucial role of WM in processing foreign-accented speech, further studies could examine compensatory strategies for accented speech processing that require less engagement of WM. The relation between individual differences in WM capacity and variability in accented speech recall also warrants further study.

Electronic Supplementary Material

The electronic supplementary material is available with the online version of the article at https://doi.org/10.1027/1618-3169/a000430

ESM 1. Data (.xlsx). The study design and data for the perceptual identification tasks and the serial recall tasks for each condition in Experiments 1 and 2.

References

Baddeley, A. D. (1966). The influence of acoustic and semantic similarity on long-term memory for word sequences. The Quarterly Journal of Experimental Psychology, 18, 302–309. https://doi.org/10.1080/14640746608400047

Benjamini, Y., Krieger, A. M., & Yekutieli, D. (2006). Adaptive linear step-up procedures that control the false discovery rate. Biometrika, 93(3), 491–507.

Bent, T., & Bradlow, A. R. (2003). The interlanguage speech intelligibility benefit. Journal of the Acoustical Society of America, 114, 1600–1610. https://doi.org/10.1121/1.1603234

Boersma, P., & Weenink, D. (2009). Praat: Doing phonetics by computer (Version 5.1.05) [Computer program]. Retrieved from http://www.praat.org/

Caramazza, A., Yeni-Komshian, G. H., Zurif, E. B., & Carbone, E. (1973). The acquisition of a new phonological contrast: The case of stop consonants in French-English bilinguals. Journal of the Acoustical Society of America, 54, 421–428. https://doi.org/10.1121/1.1913594

Chan, K. Y., & Vitevitch, M. S. (2015). The influence of neighborhood density on the recognition of Spanish-accented words. Journal of Experimental Psychology: Human Perception and Performance, 41, 69–85. https://doi.org/10.1037/a0038347

Charlton, C., Rasbash, J., Browne, W. J., Healy, M., & Cameron, B. (2017). MLwiN (Version 3.00). Bristol, UK: Centre for Multilevel Modelling, University of Bristol.

Cho, K. W., & Feldman, L. B. (2013). Production and accent affect memory. The Mental Lexicon, 8, 295–319. https://doi.org/10.1075/ml.8.3.02cho

Clarke, C. M., & Garrett, M. F. (2004). Rapid adaptation to foreign-accented English. The Journal of the Acoustical Society of America, 116, 3647–3658. https://doi.org/10.1121/1.1815131

Conrad, R., & Hull, A. J. (1968). Input modality and the serial position curve in short-term memory. Psychonomic Science, 10, 135–136. https://doi.org/10.3758/bf03331446

Cousins, K. A., Dar, H., Wingfield, A., & Miller, P. (2014). Acoustic masking disrupts time-dependent mechanisms of memory encoding in word-list recall. Memory & Cognition, 42, 622–638. https://doi.org/10.3758/s13421-013-0377-7

Crowder, R. G., & Morton, J. (1969). Precategorical acoustic storage (PAS). Perception & Psychophysics, 5, 365–373. https://doi.org/10.3758/bf03210660

Flege, J. E., & Hillenbrand, J. (1984). Limits on phonetic accuracy in foreign language speech production. The Journal of the Acoustical Society of America, 76, 708–721. https://doi.org/10.1121/1.391257

Francis, A. L., & Nusbaum, H. C. (2009). Effects of intelligibility on working memory demand for speech perception. Attention, Perception, & Psychophysics, 71, 1360–1374. https://doi.org/10.3758/app.71.6.1360

Frankish, C. (2008). Precategorical acoustic storage and the perception of speech. Journal of Memory and Language, 58, 815–836. https://doi.org/10.1016/j.jml.2007.06.003

Gill, M. M. (1994). Accent and stereotypes: Their effect on perceptions of teachers and lecture comprehension. Journal of Applied Communication Research, 22, 348–361. https://doi.org/10.1080/00909889409365409

Glanzer, M., & Cunitz, A. R. (1966). Two storage mechanisms in free recall. Journal of Verbal Learning and Verbal Behavior, 5, 351–360. https://doi.org/10.1016/S0022-5371(66)80044-0

Goldstein, H. (2011). Multilevel statistical models (Vol. 922). West Sussex, United Kingdom: John Wiley & Sons.

Hendry, L., & Tehan, G. (2005). An item/order trade-off explanation of word length and generation effects. Memory, 13, 364–371. https://doi.org/10.1080/09658210344000341

Henson, R. N. A. (1998). Item repetition in short-term memory: Ranschburg repeated. Journal of Experimental Psychology: Learning, Memory, and Cognition, 24, 1162–1181. https://doi.org/10.1037/0278-7393.24.5.1162

Hurlstone, M. J., Hitch, G. J., & Baddeley, A. D. (2014). Memory for serial order across domains: An overview of the literature and directions for future research. Psychological Bulletin, 140, 339–373. https://doi.org/10.1037/a0034221

Imai, S., Walley, A. C., & Flege, J. E. (2005). Lexical frequency and neighborhood density effects on the recognition of native and Spanish-accented words by native English and Spanish listeners. The Journal of the Acoustical Society of America, 117, 896–907. https://doi.org/10.1121/1.1823291

Luce, P. A., & Pisoni, D. B. (1998). Recognizing spoken words: The neighborhood activation model. Ear and Hearing, 19, 1–36. https://doi.org/10.1097/00003446-199802000-00001

Mattys, S. L., Davis, M. H., Bradlow, A. R., & Scott, S. K. (2012). Speech recognition in adverse conditions: A review. Language and Cognitive Processes, 27, 953–978. https://doi.org/10.1080/01690965.2012.705006

McCormack, T., Brown, G. D. A., Vousden, J. I., & Henson, R. N. A. (2000). Children's serial recall errors: Implications for theories of short-term memory development. Journal of Experimental Child Psychology, 76, 222–252. https://doi.org/10.1006/jecp.1999.2550

Munro, M. J., & Derwing, T. M. (1995a). Foreign accent, comprehensibility, and intelligibility in the speech of second language learners. Language Learning, 45, 73–97. https://doi.org/10.1111/j.1467-1770.1995.tb00963.x

Munro, M. J., & Derwing, T. M. (1995b). Processing time, accent, and comprehensibility in the perception of native and foreign-accented speech. Language and Speech, 38, 289–306. https://doi.org/10.1177/002383099503800305

Munro, M. J., & Derwing, T. M. (1998). The effects of speaking rate on listener evaluations of native and foreign-accented speech. Language Learning, 48, 159–182. https://doi.org/10.1111/1467-9922.00038

Murdock, B. B. J. (1962). The serial position effect of free recall. Journal of Experimental Psychology, 64, 482–488. https://doi.org/10.1037/h0045106

Neisser, U. (2014). Cognitive psychology: Classic edition. New York, NY: Taylor & Francis.

Nygaard, L. C., Sommers, M. S., & Pisoni, D. B. (1995). Effects of stimulus variability on perception and representation of spoken words in memory. Perception & Psychophysics, 57, 989–1001. https://doi.org/10.3758/bf03205458

Perception Research Systems. (2007). Paradigm [Computer software]. Retrieved from http://www.paradigmexperiments.com

Pickel, K. L., & Staller, J. B. (2012). A perpetrator's accent impairs witnesses' memory for physical appearance. Law and Human Behavior, 36, 140–150. https://doi.org/10.1037/h0093968

Rabbitt, P. M. A. (1968). Channel-capacity, intelligibility and immediate memory. The Quarterly Journal of Experimental Psychology, 20, 241–248. https://doi.org/10.1080/14640746808400158

Reed, M. (2000). He who hesitates: Hesitation phenomena as quality control in speech production, obstacles in non-native speech perception. Journal of Education, 182, 67–91. https://doi.org/10.1177/002205740018200306

Reinisch, E., & Holt, L. L. (2014). Lexically guided phonetic retuning of foreign-accented speech and its generalization. Journal of Experimental Psychology: Human Perception and Performance, 40, 539–555. https://doi.org/10.1037/a0034409

Riazantseva, A. (2001). Second language proficiency and pausing: A study of Russian speakers of English. Studies in Second Language Acquisition, 23, 497–526. https://doi.org/10.1017/S027226310100403X

Ronnberg, J., Lunner, T., Zekveld, A., Sorqvist, P., Danielsson, H., Lyxell, B., ... Rudner, M. (2013). The Ease of Language Understanding (ELU) model: Theoretical, empirical, and clinical advances. Frontiers in Systems Neuroscience, 7, 31. https://doi.org/10.3389/fnsys.2013.00031

Roodenrys, S., & Miller, L. M. (2008). A constrained Rasch model of trace redintegration in serial recall. Memory & Cognition, 36, 578–587. https://doi.org/10.3758/mc.36.3.578

Surprenant, A. M. (1999). The effect of noise on memory for spoken syllables. International Journal of Psychology, 34, 328–333. https://doi.org/10.1080/002075999399648

Surprenant, A. M. (2007). Effects of noise on identification and serial recall of nonsense syllables in older and younger adults. Aging, Neuropsychology, and Cognition, 14, 126–143. https://doi.org/10.1080/13825580701217710

Temple, L. (2000). Second language learner speech production. Studia Linguistica, 54, 288–297. https://doi.org/10.1111/1467-9582.00068

Van Engen, K. J., & Peelle, J. E. (2014). Listening effort and accented speech. Frontiers in Human Neuroscience, 8, 577. https://doi.org/10.3389/fnhum.2014.00577

Van Wijngaarden, S. J. (2001). Intelligibility of native and non-native Dutch speech. Speech Communication, 35, 103–113. https://doi.org/10.1016/S0167-6393(00)00098-4

Vitevitch, M. S., Chan, K. Y., & Roodenrys, S. (2012). Complex network structure influences processing in long-term and short-term memory. Journal of Memory and Language, 67, 30–44. https://doi.org/10.1016/j.jml.2012.02.008

Vitevitch, M. S., & Luce, P. A. (1999). Probabilistic phonotactics and neighborhood activation in spoken word recognition. Journal of Memory and Language, 40, 374–408. https://doi.org/10.1006/jmla.1998.2618

Wild, C. J., Yusuf, A., Wilson, D. E., Peelle, J. E., Davis, M. H., & Johnsrude, I. S. (2012). Effortful listening: The processing of degraded speech depends critically on attention. The Journal of Neuroscience, 32, 14010–14021. https://doi.org/10.1523/JNEUROSCI.1528-12.2012

Witteman, M. J., Weber, A., & McQueen, J. M. (2013). Tolerance for inconsistency in foreign-accented speech. Psychonomic Bulletin & Review, 21, 512–519. https://doi.org/10.3758/s13423-013-0519-8

History
Received January 19, 2018
Revision received September 14, 2018
Accepted September 17, 2018
Published online February 19, 2019

Acknowledgment
The authors also wish to thank several undergraduate research assistants, including Emily Wingate, Catherine Mathers, Ashley Heberling, Mariah Hawes, Erin Lee, and Allison Isrin, for their help with data collection.

Open Data
The study design and data are available in the Electronic Supplementary Material, ESM 1.

ORCID
Kit Ying Chan: https://orcid.org/0000-0002-5386-9020

Kit Ying Chan
Department of Social and Behavioural Sciences
Academic 1, Y7419
City University of Hong Kong
Tat Chee Avenue, Kowloon
Hong Kong
vivien.chanky@cityu.edu.hk

Appendix A

Analysis Equation for Experiment 1

For the vector Error_yijk, the error type y (intrusion, omission, repetition, order) at serial position i in trial j by person k occurs with an expected value via the logit or probit link function (F) of the grand mean intercept β_y, with unexplained components (residuals) at the person, trial, and serial-position levels for the outcome variable y (g_yk, f_yjk, e_yijk).

$$
\begin{aligned}
\mathrm{Error}_{yijk} ={} & \beta_{y} + e_{yijk} + f_{yjk} + g_{yk} \\
& + \beta_{y1}\,\mathrm{ContactNonnativeSpeaker}_{yk} \\
& + \beta_{y2k}\,\mathrm{WordIdentificationErrorRate}_{yjk} \\
& + \beta_{y3k}\,\mathrm{AccentedSpeech}_{yjk} \\
& + \beta_{y4k}\,\mathrm{Bimodal}_{yjk} \\
& + \beta_{y5k}\,\mathrm{TimeOfTrial}_{yjk} \\
& + \beta_{y6jk}\,\mathrm{SerialPosition}_{yijk} \\
& + \beta_{yxk}\,\mathrm{TrialInteractions}_{yjk} \qquad (1)
\end{aligned}
$$
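
As a concrete starting point, the sketch below fits a simplified, single-level version of Equation 1 as a logistic regression in Python. It is an illustration only: the data file and column names are hypothetical placeholders, and the paper's actual analysis was a cross-classified multilevel model (fit in MLwiN; Charlton et al., 2017) that additionally estimates the person-, trial-, and serial-position-level components g_yk, f_yjk, and e_yijk.

```python
# Illustrative sketch, not the authors' analysis script: a single-level logistic
# regression approximating only the fixed-effect part of Equation 1.
# The CSV file and its column names are hypothetical placeholders.
import pandas as pd
import statsmodels.formula.api as smf

# Assumed layout: one row per serial position x trial x participant,
# with a binary error indicator (here, intrusion) and the Equation 1 predictors.
df = pd.read_csv("serial_recall_errors.csv")

model = smf.logit(
    "intrusion ~ contact_nonnative_speaker + word_identification_error_rate"
    " + accented_speech + bimodal + time_of_trial + serial_position",
    data=df,
)
result = model.fit()
print(result.summary())
```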

Appendix B

Details for Phonological Analyses of Intrusions and Misperceptions

Misperceptions from the perceptual identification tasks and intrusions from the serial recall tasks in Experiments 1 and 2 were phonologically transcribed and compared with the transcriptions of the nine stimulus words. Misspellings, transpositions of letters, and typographical errors involving a single letter in an intrusion were cleaned up and corrected under specific conditions: (a) the omission of a letter in a word was corrected only if the response did not form another English word, and (b) the substitution or addition of a single letter in the word was corrected only if the letter was within one key of the target letter on the keyboard.
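
As an illustration of rule (b), one way to operationalize "within one key" is to look up letter positions on a QWERTY layout and treat keys in the same or adjacent rows and columns as neighbors. The sketch below is a hypothetical implementation of that check only, not the authors' cleaning script, and the specific adjacency criterion is an assumption.

```python
# Hypothetical sketch of the "within one key" check from rule (b); the QWERTY
# layout and the row/column adjacency criterion are assumptions, not details
# taken from the paper's actual cleaning procedure.
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]

def key_position(ch):
    """Return the (row, column) of a letter on the QWERTY layout, or None if absent."""
    for r, row in enumerate(QWERTY_ROWS):
        if ch in row:
            return r, row.index(ch)
    return None

def within_one_key(a, b):
    """True if letters a and b sit on the same or adjacent keys (row and column within 1)."""
    pa, pb = key_position(a.lower()), key_position(b.lower())
    if pa is None or pb is None:
        return False
    return abs(pa[0] - pb[0]) <= 1 and abs(pa[1] - pb[1]) <= 1

print(within_one_key("t", "r"))  # True: adjacent keys on the same row
print(within_one_key("t", "m"))  # False: keys far apart
```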

For each of the conditions in Experiments 1 and 2, the phonological transcriptions of the intrusions, the misperceptions, and the nine stimulus words were then compared to determine whether they were an exact match or phonologically similar. Two words are considered phonologically similar if the addition, deletion, or substitution of a single phoneme in one word forms the other word (Luce & Pisoni, 1998). For example, the word cat is phonologically similar to the words at, scat, fat, cot, and cap.
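
To make this criterion concrete, the sketch below checks whether two phoneme sequences differ by exactly one addition, deletion, or substitution, that is, whether they are neighbors in the sense of Luce and Pisoni (1998). It is a minimal illustration, not the analysis code used in the study, and it assumes the words have already been transcribed into phoneme lists.

```python
# Minimal sketch of the one-phoneme-edit criterion for phonological similarity
# (Luce & Pisoni, 1998); assumes words are already transcribed as phoneme lists.
def phonologically_similar(a, b):
    """True if sequences a and b differ by exactly one addition, deletion, or substitution."""
    if a == b:
        return False  # identical transcriptions are an exact match, not a neighbor
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        # Substitution: exactly one position differs.
        return sum(x != y for x, y in zip(a, b)) == 1
    # Addition/deletion: removing one phoneme from the longer form yields the shorter one.
    longer, shorter = (a, b) if len(a) > len(b) else (b, a)
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))

# Examples: "cat" /k ae t/ vs. "at" /ae t/ differ by one deletion; "cat" vs. "cop"-like
# transcriptions differ by two substitutions and are not neighbors.
print(phonologically_similar(["k", "ae", "t"], ["ae", "t"]))        # True
print(phonologically_similar(["k", "ae", "t"], ["k", "aa", "p"]))   # False
```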
