ERP Responses to Congruent and Incongruent Audiovisual Pairs of Segments from Transforming Speech to Song Phrases
Julie Herndon, Auriel Washburn & Takako Fujioka
Center for Computer Research in Music and Acoustics (CCRMA), Department of Music, Stanford University, USA
Introduction
The Speech-to-Song Illusion (SSI)
Speech and song are both found within most human societies, and individuals are typically capable of differentiating between the two without concerted effort. The Speech-to-Song Illusion occurs when short speech phrases come to be perceived as song after several consecutive repetitions (Deutsch, 2003, 2011). Notably, this illusion occurs for some speech phrases but not all, suggesting that something unique about the characteristics of 'transforming' phrases leads to the eventual perception of song after repetition.
Acoustics of Music and Language
Investigating the properties associated with transforming and non-transforming speech phrases can therefore provide novel insights into the acoustic properties associated more closely with music than language (e.g., Vanden Bosch der Nederlanden, Hannon, & Snyder, 2015), as well as the relatedness of the perceptual processes associated with each (e.g., Tierney, Dick, Deutsch, & Sereno, 2013).
N400 ERP Component in EEG
First found to occur in response to linguistic semantic incongruities (Kutas & Hillyard, 1984), the event-related potential (ERP) N400 component identified using electroencephalography (EEG) has also been found to occur for musical semantic incongruities (e.g., Steinbeis & Koelsch, 2008). While many studies have investigated the N400 by labeling music with various iconic, indexical, or symbolic meanings (Koelsch, 2011; Steinbeis & Koelsch, 2011), its occurrence has not been investigated in relation to the SSI.
Musical and Linguistic Semantic Congruity: Current Study
In the current study, several transforming phrases were used to examine the association between the pitch contour and the written form of the words in a shortened segment of the original phrase.
In doing this, we were interested in whether individuals would exhibit neural activity associated with unexpected pairs of events when presented with pitch contours and written words from different segments of a phrase. More specifically, we expected that EEG would reveal an N400 component when the pitch contours and written words did not come from the same part of the transforming speech phrase. The goal of the present study was therefore to determine whether individuals process semantic congruity between the pitch and word content of a transforming speech phrase in a manner similar to the way in which they process musical and linguistic semantic incongruities separately.
Results
• Our initial interest in this study was an N400 response to incongruent trials, but it is difficult to isolate N400 activity in the current results given the diffuse motor activity starting around 200 ms and continuing toward the end of the epoch. The difference waveform for the incongruent and congruent conditions did not reveal any dramatic differences between them.
• Notable though slight differences between congruent and incongruent pairs were found at 140 ms.
• Given the location of the negativity in the topography of the difference at 140 ms, we further examined the difference between the congruent and incongruent waveforms at the POz electrode.
Grand-average waveforms across all participants of EEG activity at the POz electrode for congruent trials, incongruent trials, and the difference between them.
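The condition comparison above amounts to a trial-average subtraction per condition at one electrode. A minimal sketch follows; the array names, shapes, and synthetic data are illustrative assumptions, not the actual analysis pipeline.

```python
import numpy as np

def difference_waveform(congruent_epochs, incongruent_epochs):
    """Average each condition across trials, then subtract (incongruent - congruent).

    Inputs are hypothetical (n_trials, n_samples) arrays of single-electrode
    EEG in microvolts, e.g. epochs at POz spanning -200 to 1000 ms.
    """
    cong_avg = congruent_epochs.mean(axis=0)
    incong_avg = incongruent_epochs.mean(axis=0)
    return incong_avg - cong_avg

# Synthetic example: 40 trials per condition, 1200 samples (1200 ms at 1000 Hz).
rng = np.random.default_rng(0)
cong = rng.normal(0.0, 1.0, size=(40, 1200))
incong = rng.normal(0.0, 1.0, size=(40, 1200)) - 0.5  # shifted only for illustration
diff = difference_waveform(cong, incong)
print(diff.shape)  # (1200,)
```

A real pipeline would baseline-correct each epoch against the pre-stimulus window before averaging; that step is omitted here for brevity.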
Hypotheses
• We expected that perceiving the transforming speech phrases as song would result in a linking between pitch contour and word content, leading to automatic processing of the semantic relation between the two.
• We therefore predicted that a larger N400 response would occur when individuals were presented with incongruent musical and linguistic stimuli (i.e., pitch contours and written words from different segments of the previously heard speech phrase).
Method
Average reaction times (left) and percent responses (right) for trials in which the correct responses were provided. The four trial conditions are presented separately for comparison (Congruent: 1-SS, 2-WW; Incongruent: 3-SW, 4-WS).
Participants
• Five CCRMA affiliates (1 male, 4 female) between 27 and 30 years of age (M = 28.0, SD = 1.41).
• Participants had between 20 and 25 years of musical experience (M = 22.0, SD = 2.0).
EEG Recordings
• We used a Neuroscan SynAmps RT amplifier with a 64-channel EEG QuikCap.
• Sound stimulation was delivered to both ears using insert earphones.
Stimuli
• Stimuli consisted of 12 transforming phrases from audiobooks used in previous studies of the SSI (Tierney et al., 2013).
• Each phrase was split into "strong" and "weak" segments.
• Splitting was based on a perceived pitch change, a change in volume, or the accentuation of certain syllables. The start of the strong segment could be marked by a dramatic change in one or more of these characteristics, the whole segment could contain more variation in one or more of the characteristics, or both. Pink highlighting indicates the strong segment of a phrase.
• After identifying the strong and weak segments of each of the 12 phrases, we created 1) equal-tempered contours of piano notes that closely matched the pitch changes of the strong and weak segments of the spoken phrases, and 2) visual stimuli displaying written representations of the words comprising each of the strong and weak segments.
Procedure
• During a single experimental trial, participants first saw a cross on a TV screen in front of them for 800 ms. This was followed by three repetitions of one of the 12 speech phrases, allotted a total of 6 s. Participants then heard the piano contour of either the strong or the weak segment of the same phrase, allotted a total of 2 s. Last, they saw the written words associated with either the strong or the weak segment of the phrase.
• Participants were required to press a button to indicate whether the piano contour and written words came from the same segment of the phrase ('yes' response, congruent) or from different segments ('no' response, incongruent).
Data analysis
• EEG epoch length was 1200 ms (-200 to 1000 ms).
• Channels with voltage exceeding ±100 µV were discarded in each trial.
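The epoching and per-trial channel-rejection steps above can be sketched as follows. This is a minimal illustration, assuming a 1000 Hz sampling rate, a continuous (n_channels, n_samples) array, and known event sample indices; none of these names or values come from the actual recording setup.

```python
import numpy as np

SFREQ = 1000            # assumed sampling rate, Hz
PRE_MS, POST_MS = 200, 1000  # epoch window: -200 to 1000 ms around each event
THRESH_UV = 100.0       # rejection threshold, ±100 µV

def extract_epochs(eeg, event_samples):
    """Cut (n_channels, 1200)-sample epochs around each event index."""
    pre = PRE_MS * SFREQ // 1000
    post = POST_MS * SFREQ // 1000
    return np.stack([eeg[:, s - pre:s + post] for s in event_samples])

def bad_channel_mask(epochs):
    """Per-trial boolean mask of channels whose absolute voltage exceeds
    the threshold anywhere in the epoch (True = discard that channel)."""
    return np.abs(epochs).max(axis=2) > THRESH_UV  # (n_epochs, n_channels)

# Toy usage: 64 channels, 6 s of flat data with one artificial artifact.
eeg = np.zeros((64, 6000))
eeg[2, 1500] = 150.0  # spike on channel 2, inside the first epoch only
epochs = extract_epochs(eeg, event_samples=[1000, 3000])
mask = bad_channel_mask(epochs)
print(epochs.shape)            # (2, 64, 1200)
print(mask[0, 2], mask[1, 2])  # True False
```

In practice this screening would run after filtering and re-referencing; here only the thresholding logic described in the poster is shown.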
Contact: jherndon@stanford.edu, auriel@stanford.edu
Average reaction times (left) and percent responses (right) for trials in which incorrect responses were provided. The four trial conditions are presented separately for comparison (Congruent: 1-SS, 2-WW; Incongruent: 3-SW, 4-WS).
• Participants were able to correctly identify corresponding phrase segments, and responded correctly more often to Strong-Strong and Weak-Strong stimulus pairs.
• Reaction times for trials in which participants made correct responses (as evaluated through the button-press measure) were generally shorter in the congruent conditions than in the incongruent conditions.
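The behavioral summary above reduces to a mean reaction time over correct trials within each congruency class. A minimal sketch with made-up trial records (the numbers are invented, not the study's data):

```python
from statistics import mean

# Hypothetical trial log: (condition, correct?, reaction time in ms).
# Condition codes follow the poster: SS/WW congruent, SW/WS incongruent.
trials = [
    ("SS", True, 620), ("WW", True, 700), ("SW", False, 910),
    ("WS", True, 850), ("SS", True, 640), ("SW", True, 880),
]

def mean_rt(trials, congruent):
    """Mean reaction time over correct trials of one congruency class."""
    codes = {"SS", "WW"} if congruent else {"SW", "WS"}
    return mean(rt for cond, correct, rt in trials if correct and cond in codes)

print(mean_rt(trials, congruent=True))   # shorter, matching the reported pattern
print(mean_rt(trials, congruent=False))
```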
Conclusions
1. Given that the 400 ms time point was mostly influenced by participants' motor responses, we chose to evaluate a component with a peak around 140 ms. The timing of this response, along with its location as indicated by the topography, suggests that it was likely a visual N1 component.
2. The fact that participants correctly responded to strong targets more often indicates that melodic and syntactic emphasis affects retention and identification of speech segments.
3. Three of the participants were familiar with the spoken phrases. These included the two participants with the fastest reaction times and the two participants who showed a higher number of correct responses in the WW condition than in any of the other three conditions, indicating that previous experience with the stimuli likely affects how an individual perceives and reacts to the stimuli in the current task.
References
Deutsch, D. (2003). Phantom words and other curiosities. Philomel Records.
Deutsch, D., Henthorn, T., & Lapidis, R. (2011). Illusory transformation from speech to song. Journal of the Acoustical Society of America, 129, 2245-2252.
Koelsch, S. (2011). Toward a neural basis of music perception - a review and updated model. Frontiers in Psychology, 2(110), 1-20.
Kutas, M., & Hillyard, S. A. (1984). Brain potentials during reading reflect word expectancy and semantic association. Nature, 307, 161-163.
Steinbeis, N., & Koelsch, S. (2008). Comparing the processing of music and language meaning using EEG and fMRI provides evidence for similar and distinct neural representations. PLoS ONE, 3(5), e2226.
Steinbeis, N., & Koelsch, S. (2011). Affective priming effects of musical sounds on the processing of word meaning. Journal of Cognitive Neuroscience, 23(3), 604-621.
Tierney, A., Dick, F., Deutsch, D., & Sereno, M. (2013). Speech versus song: Multiple pitch-sensitive areas revealed by a naturally occurring musical illusion. Cerebral Cortex, 23(2), 249-254.
Vanden Bosch der Nederlanden, C. M., Hannon, E. E., & Snyder, J. S. (2015). Everyday musical experience is sufficient to perceive the speech-to-song illusion. Journal of Experimental Psychology: General, 144(2), e43.