Tone deafness by John Sloboda

Musicae Scientiae Spring 2008, Vol XII, n° 1, 000-000

Establishing an empirical profile of self-defined “tone deafness”: Perception, singing performance and self-assessment

KAREN J. WISE* AND JOHN A. SLOBODA
School of Psychology, Keele University

ABSTRACT

Research has suggested that around 17% of Western adults self-define as “tone deaf” (Cuddy, Balkwill, Peretz & Holden, 2005). But questions remain about the exact nature of tone deafness. One candidate for a formal definition is “congenital amusia” (Peretz et al., 2003), characterised by a dense music-specific perceptual deficit. However, most people self-defining as tone deaf are not congenitally amusic (Cuddy et al., 2005). According to Sloboda, Wise and Peretz (2005), the general population defines tone deafness as perceived poor singing ability, suggesting the need to extend investigations to production abilities and self-perceptions. The present research aims to discover if self-defined tone deaf people show any pattern of musical difficulties relative to controls, and to offer possible explanations for them (e.g. perceptual, cognitive, productive, motivational). Thirteen self-reporting “tone deaf” (TD) and 17 self-reporting “not tone deaf” (NTD) participants were assessed on a range of measures for musical perception, cognition, memory, production and self-ratings of performance. This paper reports on four measures to assess perception (Montreal Battery of Evaluation of Amusia), vocal production (songs and pitch-matching) and self-report. Results showed that the TD group performed significantly less well than the NTD group in all measures, but did not demonstrate the dense deficits characteristic of “congenital amusics”. Singing performance was influenced by context, with both groups performing better when accompanied than unaccompanied. The TD group self-rated the accuracy of their singing significantly lower than the NTD group, but not disproportionately so, and were less confident in their vocal quality. The TD participants are not facing an insurmountable difficulty, but are likely to improve with targeted intervention.

INTRODUCTION

Over the last few decades of music research, evidence has been growing that the ability to engage with the music of one’s culture is a fundamental human characteristic. It seems we may be “hard-wired” to be musical. Infancy research is uncovering a number of musical processing predispositions, with infants appearing to perceive music in fundamentally similar ways to adults (see Trehub, Schellenberg & Hill, 1997; Trehub 2003 for reviews). Meanwhile, neuroscientific evidence has identified neural networks specialised for music processing (Peretz, 2003) and illuminated sophisticated implicit abilities in adult non-musicians (Koelsch, Gunter, Friederici & Schröger, 2000). Increasingly, it appears there can be no hard and fast distinction between those who “have it” and those who “don’t”, the “musical” and the “unmusical”. Instead, there is a continuum linking the skills of the average person, who may have little or no musical training, to elite professionals (Howe, Davidson & Sloboda, 1998). Unfortunately, notions of “talent”, and therefore also “lack of talent” are often

associated with individual differences in musical ability. Music is particularly susceptible to a talent explanation of success. O’Neill (2002) reported that children were more ready to ascribe success in music to innate abilities, than success in sport. The fallout from this discourse of “talent” is that large numbers of people in our society believe they lack the capacity to be musical. In a recent large-scale survey of Canadian undergraduates, 17% considered themselves to be “tone deaf ” (Cuddy, Balkwill, Peretz & Holden, 2005), suggesting that this belief is widespread. The term “tone deaf ” is, however, problematic, in the sense that it is essentially a lay-term, with no agreed definition. The term is one of many used indiscriminately in the literature to refer to a range of anomalous musical behaviours, and implies some underlying disability. However, the categorisation of people into musically “able” and “disabled” on the basis of behaviour can be problematic and inappropriate (see for example the evidence for developmental and contextual variability in children’s singing described in Welch, 2001). It has been suggested that “tone deafness” may be a myth (Kazez, 1985) and that it is simply a misnomer for underdeveloped skill, perhaps due to lack of experience or inappropriate learning environments (Welch, 2001). However, recently evidence has been growing that a small minority of people might indeed have an innate musical impairment. Isabelle Peretz and colleagues have identified a dense perceptual deficit they have called “congenital amusia” (Ayotte, Peretz & Hyde, 2002; Peretz, Ayotte, Zatorre, Mehler, Ahad, et al., 2002). Those affected have severe difficulties in a variety of basic musical tasks, such as detecting wrong notes in a melody, recognising familiar tunes and tapping with the beat. However, they have no difficulties in identifying other environmental sounds or processing speech prosody. 
The basis of congenital amusia is thought to be a neurological anomaly affecting the processing of fine-grained pitch information (Peretz et al., 2002). The current assessment measure for congenital amusia is the Montreal Battery of Evaluation of Amusia (MBEA), which reliably distinguishes between people with congenital amusia and the general population (Peretz, Champod & Hyde, 2003). It covers the main perceptual aspects of music processing, including melody, meter, rhythm and incidental memory. The MBEA was originally designed for use with brain injured patients, and is conceptually based on a model of music processing derived primarily from clinical studies of acquired amusias (Peretz et al., 2003). The term “congenital” is used to differentiate between acquired amusias and musical deficits occurring in the absence of brain trauma, or indeed any more general cognitive or organic impairment. However this term must for the time being be regarded with scepticism, since empirical evidence for a genetic basis is currently lacking, and as yet there has been no childhood research that would identify early signs and a developmental course. It is not certain exactly how many people are affected by congenital amusia. One estimate frequently quoted in the literature is around 4% of the Western population. However, this figure comes from a much earlier paper by Kalmus and Fry (1980), and refers to what they called “tune deafness” or “dysmelodia”. This estimate therefore cannot be relied upon and may be on the high side, as the criteria used by Kalmus and Fry were less rigorous than those that have since been specified for congenital amusia. There are as yet no available data for the occurrence of congenital amusia in the population, as currently defined and measured. Nonetheless, what is evident is that the numbers of people claiming to be tone deaf are much higher than any estimate of the occurrence of musical processing deficits. 
As mentioned above, Cuddy and colleagues (2005) found that 17% of their sample of over 2000 university students self-defined as tone deaf. When 100 of these 17% were tested on the MBEA, the majority scored normally. The present research aims to explore the possible explanations for this discrepancy. Initially, two obvious possibilities appear. One is that people claiming to be tone deaf do not actually have any musical difficulties relative to the general population, but for some reason believe they do. The second is that they are impaired

in some domain, but this is not identified by the MBEA. However, in order to begin untangling these issues, it is first necessary to address the question of what people actually mean when they talk of themselves or others being “tone deaf ”. In a recent study, 15 people between the ages of 18 and 70 took part in semistructured interviews about their musical abilities, and their ideas about tone deafness (Sloboda, Wise & Peretz, 2005). They had previously answered a screening questionnaire in which they were asked how musical they thought they were, and whether they considered themselves tone deaf. Participants were recruited to reflect a range of responses to those questions. Some participants self-defined as tone deaf, whereas others did not, and self-ratings of musicality ranged from “not at all” to “extremely”. Three main findings emerged from thematic analysis of the interview transcripts. First, “tone deafness” was primarily associated with perceived difficulties with singing. This is not surprising, but previous research in this area has largely focused on perception, with relatively little attention given to the productive elements of reported musical difficulties. Second, participants did not use the term “tone deafness” simply to describe a lack of musical skill and experience. Being “unmusical” was often associated with a lack of opportunity or training, and was therefore seen as open to improvement. By contrast, being “tone deaf ” was not only specifically associated with singing difficulties, but was also often seen as permanent. Furthermore, people sometimes described themselves as both musical and tone deaf at the same time, for example they might play an instrument or feel they have a sense of rhythm, but feel unable to sing. Third, people’s self-assessments had an intensely social nature, being based on comparisons with other people, or other people’s judgements. 
Negative feedback on their singing, or the fear of it, was salient to those who self-defined as tone deaf. For that reason they often reported acute embarrassment and the avoidance of singing when others might hear them. Along the same lines, Cuddy et al. (2005, p. 317) reported that self-defined tone deaf participants “rated their vocal abilities more negatively than the NTD [self-defined not tone deaf ], and rated attraction to, and engagement with, music less positively than the NTD.” These findings are in agreement with other research that has demonstrated the links between negative musical-social experiences, musical self-concept and musical behaviour. A belief that one is musically impaired can have consequences for a person’s musical engagement (e.g. avoidance, inhibition), and such beliefs can be socially generated and maintained, for example through negative judgements of one’s singing (Knight, 1999; Lidman-Magnusson, 1997). The present research therefore aims to keep in focus the role of self-perception and beliefs in perceived tone deafness, alongside a more thorough investigation of the associated musical difficulties. To this end, we have designed a comprehensive battery involving both perceptual and performance tests, for use with musically untrained adults. Also included are self-report measures, to gain an insight into how people’s self-assessments relate to their performance skills. We aim to address two research questions. First, are there any patterns of musical difficulties shown by adults who consider themselves tone deaf? Second, if there are, can the precise patterns suggest functional explanations for the difficulties? Table 1 shows the categories of tasks included in the battery, and some simple possible outcome patterns. The functional explanations they would imply can be seen on the left. For example, if a deficit were found on all measures except basic vocal control, this would imply a strong perceptual deficit (row 1). 
This is the pattern that would be expected for true amusics. If a deficit were found on the vocal tasks only, as in row 3, then this would imply some production or sensorimotor deficit. If there were no observed deficits on any task, as in the last row, then false attribution would be a likely explanation. Analysis of performance in individual tasks will allow these broad categories of explanation to be further broken down, for example,

distinguishing between low-level (e.g. organic or motor) and higher-level (e.g. cognitive or sensorimotor) explanations of singing difficulties.

TABLE 1

The focus of this paper is singing performance, together with aspects of the self-report measures and the MBEA scores.

METHOD

PARTICIPANTS

Participants were 13 self-defined “tone deaf” (TD) and 17 self-defined “not tone deaf” (NTD) participants, all undergraduate psychology students at Keele University, UK. They took part in return for course credits. Participants were 11 females and 2 males in the TD group, aged 18-21 (M = 19.15, SD = 0.987), and 11 females and 6 males in the NTD group, aged 18-24 (M = 19.41, SD = 1.46). Participants were recruited by a combination of poster advertising and personal invitation following a short screening survey. This was distributed to the first-year cohort, asking “Do you consider yourself to be musical?” and “Do you consider yourself to be tone deaf?” Both questions were answered on a scale of 1-5 from “not at all” to “extremely”. Participants scoring themselves highly for tone deafness (4 or 5) were personally invited to participate in the study. This was done because they may have been reluctant to respond to usual advertising. Participants were made fully aware of the nature of the study before agreeing to take part. The average years of music involvement were 1.92 (SD = 2.72) for the TD group and 4.65 (SD = 4.97) for the NTD group. This difference was not significant. Music involvement was defined as vocal and/or instrumental practical activity (for example in an orchestra, band or choir), or learning to play an instrument (including self-taught) or sing. In addition, all participants reported engaging in informal singing activities, such as singing to themselves or along to the radio, even if only in private.

MEASURES AND PROCEDURE

The categories of measures reported here are as follows:
- Singing (matching pitches and short patterns; the song “Happy Birthday”)
- Perception (MBEA)
- Self-report (self-assessment of performance; background questionnaire)

These measures are part of a larger battery of tests designed to assess basic musical capacities in perception and production.
The overriding concern for the design of the battery was to allow participants to give of their best. The tasks were therefore designed to be achievable, and not intimidating to people without musical training. The researcher, who is the primary author of this paper, is a singing teacher experienced in working with unconfident adult beginners. Before completing the reported singing tasks, participants were first taken through a graded series of basic vocal tasks starting with speech, and progressing gently into sung sounds. This was done by way of a warm-up, and to help participants feel comfortable, but will also provide valuable information about participants’ vocal skills for future analysis. Participants were tested individually in a sound-attenuated room. Each testing session lasted approximately two hours, including breaks as needed. Water was provided.

Montreal Battery of Evaluation of Amusia (MBEA)

The MBEA is described in detail in Peretz et al. (2003), hence only a brief description is given here. It comprises six subtests assessing the main aspects of musical processing as follows:
- Melodic: Scale, Contour, Interval
- Temporal: Rhythm, Meter
- Memory: Incidental memory

Stimuli are derived from 30 specially written melodies, composed according to Western tonal rules and delivered with digitized piano timbre. Test items are designed such that responses are binary choice. In the first four subtests listed above, melodies are presented twice, the second time either with or without a change, and participants are required to respond “same” or “different”. In the Meter subtest participants hear an extended harmonised version of each melody and are required to identify whether it is a “waltz” or a “march”. The incidental memory test appears last in the testing sequence and requires participants to distinguish melodies previously heard in the battery from unheard foils. In all subtests participants are provided with practice examples on which they are given feedback. Average performance in the general population is extremely high at around 88% (Peretz et al., 2003). By contrast, congenital amusics perform poorly, often around chance level. The criterion adopted by Peretz and colleagues for congenital amusia is a score of two standard deviations or more below the mean of the general population.

Singing measures: songs

For the song tasks, participants first sang a song of their own choice. They were asked to choose something they liked and knew well, and to sing it twice to give them the chance to do their best. Participants were then asked to sing the traditional song “Happy Birthday”, chosen because it was known to all participants, and because of the (infamous) upward octave leap in the middle. They sang “Happy Birthday” twice unaccompanied, starting on a pitch of their own choice. They then sang it twice more, accompanied by the researcher playing a digital piano: once starting on a comfortable pitch that had been previously identified for that person, and once starting one tone (two semitones) higher or lower than that pitch.
Accompaniment consisted of the melody played in three octaves: at the participant’s actual pitch, plus one octave higher and one octave lower for audibility. All the song performances were rated for accuracy by two independent judges, each an accomplished musician and singer, and experienced in working with amateur voices through teaching or choir training. Judges heard the performances in random order and were not aware which performances were by TD participants. Quantitative assessment of pitch accuracy in the song performances was not undertaken in this study. We believe that a rating scale is more ecologically valid than a purely mathematical analysis in this context. In addition, most of the child research has used rating scales, and this kind of analysis allows closer comparison between adult singing and child development. This connection is important, given the social nature of singing and singing judgements. Our mathematical analysis of pitch accuracy in the vocal pitch-matching task (see below) provides a useful comparison. Finally, as demonstrated below, there was high inter-rater reliability using this scale.

TABLE 2

Table 2 shows the rating scale used by the expert judges. The first four levels are informed by Welch, Rush and Howard’s developmental continuum of singing skill (1991) and Rutkowski’s (1990) scale of voice use. It has been suggested that adults who have difficulty with singing in tune may have been halted at some early stage in their singing development (Lidman-Magnusson, 1997; Welch, 1994a). This notion is the rationale for using a developmental framework as the basis for singing assessment in this study. However, both the scales mentioned were developed to describe children’s singing. In Welch’s sequence, beyond the level of pitch stability within individual phrases (our level 4), there is only one further category, namely accurate singing or “no significant melodic or pitch errors”.
Since the present research is with adults, we added categories to distinguish between performances at

higher levels of accuracy. For example, level 5 incorporates key “drift” and levels 6 and 7 both deal with mistunings of the occasional note, but differentiate by the degree of mistuning. Singing measures: Pitch matching This task required participants to sing back, to the neutral syllable /na/, 6 single pitches, and four each of patterns of 2, 3 and 5 notes. Stimuli were pre-recorded by a male and a female model, so that each participant responded to a model of their own gender. Models were musicians who could sing accurately and clearly, but were not vocally trained, so as to provide stimuli with little vibrato and a timbre as close as possible to participants’ own voices. This was done in light of research showing that the vocal model can affect the accuracy of inexperienced singers. Children’s accuracy is best when they respond to a child model, slightly worse when responding to an adult female model singing at the same pitch, and worst when responding to an adult male model singing an octave lower (Green, 1989). This suggests that both timbre and octave of vocal models play a role. Vibrato has also been shown to adversely affect the pitch-matching accuracy of uncertain singers (Yarborough, Bowers & Benson, 1992). Stimuli had an overall range of one octave, from B3 (B below middle C) to B4 for women and B2 to B3 for men, using the notes of the D major scale. However, no leaps were greater than a perfect 5th (7 semitones). Patterns were composed to make musical sense, i.e. to sound tuneful. This kind of test is not new (cf. Welch, 1994b), but as far as the authors are aware it has always been presented in an echo format, so that the participant hears the pattern then sings it back during silence. However, this may not reflect the kind of singing activities most people engage in, namely sing-along activities, such as singing with CDs or the radio, or with other people, for example at church or sporting events. 
Singing along may improve pitch accuracy by allowing people to compare their output against a concurrent external reference. However, Minami (1994) reported that children were less accurate when singing along to a recording, possibly because they devoted less attention to their accuracy. Therefore, along with a standard echo condition, a “synchronised” (sync) condition was included in which participants heard the pattern or note sung, then sang it back along with a replay of the same pattern. To help participants synchronise with the second playing, it was preceded each time by two “clicks” in tempo, though participants were not assessed on the accuracy of their timing. All participants performed both conditions and the order of presentation was counterbalanced. Stimuli in both conditions were presented in blocks of increasing length, starting with single notes and ending with 5-note patterns. There were two practice items at the beginning of each new block. The internal order of stimuli within each block was varied across conditions.

The sung responses were analysed for pitch accuracy. The fundamental frequency (in Hertz) of each produced note was extracted using the voice analysis software Praat, and the difference between the participant’s produced note and the model was calculated. In so doing, readings were transformed to give final difference scores in cents (100ths of a semitone). Negative values indicated notes sung lower in pitch than the model, and positive values indicated notes sung higher than the model. The absolute values of these differences were averaged to give each participant’s mean deviation from the model in cents. Eight mean scores were calculated for each participant, one for each stimulus length (1, 2, 3 or 5 notes) in each of the two conditions (echo and sync).

Self-report measures

Participants rated their own vocal performances by answering these questions on a scale of 1-7:
a) How accurately do you think you sang the tune? (By accurate I mean whether you think you got the notes right). (Very inaccurately-Very accurately)
b) To what extent did you feel in control of the quality of the sound that you were able to produce? (Not at all-Completely)
c) How did you think you did compared to how an average person of your age would do on the task? (Much worse-Much better)

These questions were answered on paper as the testing session proceeded.
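The conversion from frequency readings to cents described for the pitch-matching analysis can be illustrated with a minimal Python sketch. It assumes the standard definition of a cent (1200 times the base-2 logarithm of the frequency ratio, so that 100 cents equal one equal-tempered semitone). The function names and the frequency values are hypothetical, and the sketch does not reproduce the Praat-based extraction step itself.

```python
import math


def cents_difference(produced_hz, model_hz):
    """Signed deviation of a produced note from the model, in cents.

    Negative values mean the note was sung flat relative to the model,
    positive values mean it was sung sharp.
    """
    return 1200.0 * math.log2(produced_hz / model_hz)


def mean_absolute_deviation(produced_hz, model_hz):
    """Average of the absolute cents deviations over a set of notes."""
    diffs = [abs(cents_difference(p, m)) for p, m in zip(produced_hz, model_hz)]
    return sum(diffs) / len(diffs)


# Hypothetical fundamental-frequency readings (Hz) for a three-note pattern.
model_notes = [293.66, 329.63, 369.99]
sung_notes = [290.00, 331.00, 369.99]

for sung, model in zip(sung_notes, model_notes):
    print(f"{sung:7.2f} Hz vs {model:7.2f} Hz: {cents_difference(sung, model):+.1f} cents")
print(f"mean absolute deviation: {mean_absolute_deviation(sung_notes, model_notes):.1f} cents")
```

Averaging absolute rather than signed values prevents flat and sharp errors from cancelling each other out, which matches the averaging step described in the text.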

RESULTS

PERCEPTION (MBEA)

A t-test showed that the TD group achieved significantly lower overall scores on the MBEA than the NTD group (t(28) = 1.859, p = .037). However, the difference was very small, with the TD group having an overall mean score of 81.04% (SD = 6.45) and the NTD group 85.58% (SD = 6.77).

FIGURE 1

Figure 1 shows the profile of scores for the TD group, the NTD group and a group of people with congenital amusia (from Peretz et al., 2003) on the MBEA. As can be seen, the profile of the TD group is very similar to that of the NTD group, both distributions being negatively skewed, and very dissimilar to the rather flat distribution of the congenital amusics’ scores, which are on average much lower.

SINGING: PITCH MATCHING

Results for the pitch matching battery showed that there was an outlier in the TD group, performing consistently much less accurately than the rest of the group. The data from this participant were excluded from the analysis. Also excluded were the data from one of the five-note patterns, which was difficult to remember for over half the participants in both groups. Participants frequently sang an incorrect but musically plausible tune, resulting in large deviations from the model for this pattern, biasing their results. The scores for the five-note stimuli are therefore derived from the remaining three five-note patterns.

FIGURE 2

FIGURE 3

Figure 2 shows results for the echo condition, and Figure 3 the synchronised condition. Scores represent average cents deviation from the model, therefore lower scores mean greater accuracy. A three-way (2 × 2 × 4) mixed ANOVA was employed, with the between-participants factor of group (TD and NTD) and the within-participants factors of response condition (echo and sync) and stimulus length (1, 2, 3 and 5 notes). Where assumptions of sphericity were violated, the more stringent Greenhouse-Geisser adjustment is reported. There were three main effects.
First, there was a main effect of group, with the TD group (M = 40.00) being less accurate overall than the NTD group (M = 24.55), F(1, 27) = 10.90, p = .003. There was also an effect of stimulus length, with accuracy decreasing as stimuli got longer, F(2.008, 54.229) = 36.32, p < .001. Lastly, there was a main effect of condition, with the synchronised condition (M = 28.49) performed more accurately than the echo condition (M = 36.06), F(1, 27) = 8.98, p = .006. There were also three significant two-way interactions. First, there was a significant group by length interaction, F(2.008, 54.229) = 5.90, p = .005, with the TD group being more disadvantaged than the NTD group by the longer stimuli. The TD were aided much more than controls by singing along, as shown by the group by condition interaction, F(1,27) = 5.77, p = .023. Singing along mitigated the effect of the 5-note stimuli, especially for the TD group (condition by length

interaction, F(1.771, 47.804) = 4.70, p = .017). It is also interesting to note that, as shown by the large standard deviations (Fig. 2), the TD group were very variable in their performance in the echo condition, especially in five-note stimuli. This variability was much reduced in the sync condition (Fig. 3).

SINGING: “HAPPY BIRTHDAY”

As shown in Table 3, inter-rater reliabilities were 0.79 and above between the two expert judges for each of the four performances of “Happy Birthday”.

TABLE 3

Scores were therefore averaged across the two judges, and since there were no significant differences between participants’ first and second performances in each condition (accompanied and unaccompanied), the mean of the two performances was taken.

TABLE 4

An extra inter-rater reliability check was done after collapsing the scores of first and second performances. Table 4 shows that it remained high, even when the groups were considered separately, being above 0.8 for each type of “Happy Birthday” performance (accompanied or unaccompanied) in each group (TD and NTD).

FIGURE 4

Figure 4 shows the median ratings for the groups’ performances of “Happy Birthday” in the two conditions. The data did not meet assumptions for parametric tests, the distributions being differentially non-normal across the groups/conditions and therefore not correctable by transformation. Judges’ ratings for both groups in accompanied performance were negatively skewed, while for unaccompanied performance the TD group’s scores were positively skewed. Therefore nonparametric tests were used. The results of a Mann-Whitney U test showed that the TD group (Mdn = 3.75) were rated significantly lower in accuracy than the NTD group (Mdn = 5.25) in unaccompanied performance (U = 47.00, p = .002). The TD group (Mdn = 5.5) also ranked significantly lower than the NTD group (Mdn = 6.5) in accompanied performance (U = 44.5, p = .002).
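For readers unfamiliar with the Mann-Whitney U statistic reported above, the following minimal Python sketch shows how U can be obtained from rank sums. It assumes the common convention of assigning tied values the mean of their ranks and reporting the smaller of the two U values; the paper does not state which software or convention was used, and the ratings below are hypothetical, not the study data. No p-value is computed.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return the smaller of the two U statistics for two independent groups."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return min(u_a, u_b)


# Hypothetical judges' ratings on the 1-7 accuracy scale.
td_ratings = [3.0, 3.5, 4.0, 5.0]
ntd_ratings = [4.5, 5.0, 5.5, 6.0, 6.5]
print("U =", mann_whitney_u(td_ratings, ntd_ratings))
```

A small U relative to the product of the group sizes indicates that one group's ratings sit consistently below the other's.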
A Wilcoxon Signed Ranks Test showed that both groups significantly improved their performances when they had accompaniment (TD group: Z = -1.773, p = .04; NTD group: Z = -2.383, p = .007).

SELF-RATINGS

Figure 5 shows participants’ self-ratings for their “Happy Birthday” performances in accompanied and unaccompanied mode. For each there are three ratings: accuracy of the tune, the participant’s feeling of control over their vocal quality, and how well they felt they did compared to average.

FIGURE 5

A series of three 2 × 2 mixed ANOVAs were carried out, one for each rating type (accuracy, quality and performance compared to average), with the within-participants factor of mode (accompanied and unaccompanied), and the between-participants factor of group (TD and NTD). Results for accuracy ratings showed that the TD group (M = 3.69) rated themselves significantly lower than the NTD group (M = 4.62) overall, F(1,28) = 4.467, p = .044. There was no significant effect of mode and no interaction. Results for voice quality ratings also showed a

significant effect of group, with the TD group (M = 3.27) being less confident about their vocal quality and their feeling of control over it than the NTD group (M = 4.62), F(1,28) = 13.126, p = .001. Again, there was no significant effect of mode and no interaction. Results for ratings of performance compared to average did not show any significant effects or interaction.

Relationship of self-ratings to performance

To examine how participants’ self-ratings related to their actual performance, a correlation was carried out between participants’ and judges’ ratings of accuracy in the “Happy Birthday” performances. As judges’ ratings were non-normally distributed, the non-parametric Spearman’s Rho was used. Overall the participants had moderate success in judging their accuracy for both the unaccompanied (Rho = .440, df = 28, p = .015) and accompanied (Rho = .492, df = 28, p = .006) conditions. Lastly, we need to determine whether the TD group’s lower ratings of themselves are a realistic reflection of their performance, or whether they are underestimating themselves. To assess this, partial correlations were carried out to find the relationship between group and self-ratings for accuracy, controlling for actual performance as rated by the judges. For both conditions, there was no relationship between group and self-ratings when performance was controlled for (unaccompanied: R = .104, df = 27, p = .590; accompanied: R = .122, df = 27, p = .528). Therefore the difference between the groups’ accuracy self-ratings disappears when their actual level of performance is taken into account.
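The logic of controlling for judged performance can be illustrated with a first-order partial correlation. The Python sketch below assumes the textbook formula based on Pearson coefficients; the paper does not specify which coefficient underlies the reported R values, so this is an illustration of the technique rather than a reconstruction of the analysis. The variable names and data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def partial_r(x, y, z):
    """Correlation between x and y with the linear effect of z removed."""
    r_xy = pearson_r(x, y)
    r_xz = pearson_r(x, z)
    r_yz = pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical data: group membership (0 = NTD, 1 = TD), self-rated accuracy,
# and judges' accuracy ratings for eight participants.
group = [0, 0, 0, 0, 1, 1, 1, 1]
self_rating = [5, 6, 4, 5, 3, 4, 2, 4]
judged = [5.5, 6.5, 4.0, 5.0, 3.5, 4.5, 2.5, 3.0]

print("group vs self-rating:", round(pearson_r(group, self_rating), 3))
print("controlling for judged accuracy:", round(partial_r(group, self_rating, judged), 3))
```

If the partial coefficient is close to zero while the simple correlation is not, the group difference in self-ratings is accounted for by the difference in judged performance, which is the pattern the text reports.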

DISCUSSION

The results showed differences in performance between the two groups, both in perception and singing, with the TD group scoring significantly lower on objective measures (singing “Happy Birthday”, vocal pitch matching, MBEA). However, as a group the TD participants are not congenitally amusic. The results from the MBEA replicate the findings of Cuddy et al. (2005) that the self-defined tone deaf score marginally less well than those self-defining as not tone deaf, but with a profile of scores very similar to average and very dissimilar to people with congenital amusia. Our results as well as those of Cuddy et al. produced the negatively skewed profile for TD and NTD groups that is typical of normal performance on the MBEA. Like Cuddy et al. we also found that for the NTD group the distribution peaked at 85-89%. This is slightly lower than the peak of the control participants in Peretz et al. (2003), which was 90-94%, but nonetheless the majority of participants scored very highly. There may be some self-defined tone deaf people who are genuinely amusic, and indeed Cuddy and colleagues identified a small number of these potential “true amusics” in their TD group. However, in their study as well as our own there were low scorers present in both groups, so perhaps this warrants further investigation. What is not clear is the exact reason for the small but significant difference between the TD and NTD groups in overall MBEA scores. One possibility put forward by Cuddy et al. is that the NTD may have more highly developed mental schemata for music. There was evidence in their own study that the TD participants were less motivated to seek out and engage with music than the NTD, and that individuals reporting high listening interest were “privileged on the MBEA tests” (p. 320).
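The criterion of two standard deviations or more below the general-population mean, used to identify potential amusics among low scorers, can be sketched as follows. This is a minimal Python illustration only: it assumes the sample standard deviation of a normative group is used, the function names are hypothetical, and the scores are invented for the example rather than taken from this study or from the published MBEA norms.

```python
import statistics


def amusia_cutoff(normative_scores):
    """Score lying two standard deviations below the normative mean."""
    mean = statistics.mean(normative_scores)
    sd = statistics.stdev(normative_scores)  # sample SD; an assumption here
    return mean - 2 * sd


def flag_low_scorers(scores, cutoff):
    """Return the scores that fall at or below the cut-off."""
    return [s for s in scores if s <= cutoff]


# Hypothetical percentage-correct scores for a normative group and a test group.
normative = [92, 88, 85, 90, 81, 94, 87, 89, 83, 91]
tested = [86, 79, 90, 72, 84]

cutoff = amusia_cutoff(normative)
print(f"cut-off: {cutoff:.1f}%")
print("at or below cut-off:", flag_low_scorers(tested, cutoff))
```

Because the cut-off depends on the normative sample, a small or unrepresentative comparison group would shift it, which is one reason low scorers in both groups merit individual follow-up.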
The results from the pitch matching battery also support the conclusion that the “tone deaf” participants do not have basic perceptual problems, as they can sing as accurately as controls on shorter stimuli, especially when accompanied. It must also be noted that the accuracy of both groups overall was extremely good, with the mean deviations from target pitches being in most cases less than half a semitone, and therefore small in musical terms. The only exceptions were the TD group’s five-note patterns in unaccompanied mode, and even here the mean deviations were still within a semitone. The accuracy of around 20-30 cents on single notes is also
considerably better than previous studies have shown for the (untrained) general population, with typically reported deviations of over a semitone on single pitches (Murry, 1990; Murry & Zwiner, 1991; Amir, Amir & Kishon-Rabin, 2003). Watts, Moore and McCaghren (2005) reported that untrained singers judged as accurate in song-singing matched single pitches with an average deviation of 0.93 semitones, while those judged as inaccurate deviated by 2.2 semitones on single pitches. The greater accuracy obtained in the present study may be due to the use of human voice stimuli rather than the digital tones typically used in other studies. The TD group also performed better than we might expect true amusics to do. Even when singing “Happy Birthday” unaccompanied, the tone deaf group by and large reproduced the contour of the tune correctly, and achieved melodic accuracy within phrases. Furthermore, throughout the singing tasks, even the simple scaffolding of accompaniment or singing along to a model significantly improved their performance. The large variability in singing accuracy displayed by the TD group was also markedly reduced by singing along, implying that far from being beyond help, those who performed least accurately when singing alone were most helped by support. The self-defined tone deaf are therefore not suffering from an insurmountable deficit, but are likely to benefit from some kind of targeted intervention. If the self-defined tone deaf participants are not suffering from a perceptual deficit, it might be asked what accounts for their poorer singing performance. In the pitch-matching tasks the TD participants are more disadvantaged than the NTD group by longer patterns, yet do improve with accompaniment. This might suggest that they simply have not encoded the tunes efficiently on a first hearing, and the accompaniment provides a prompt. 
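The pitch deviations discussed above are expressed in cents, where one equal-tempered semitone is 100 cents, so a deviation of 0.93 semitones corresponds to 93 cents. As a rough illustration of how such a figure can be derived from measured fundamental frequencies, the following Python sketch uses the standard formula of 1200 times the base-2 logarithm of the frequency ratio. The function names and frequency values are hypothetical, and averaging unsigned deviations is an assumption, since this section does not state whether signed or unsigned deviations were averaged in the study.

```python
import math


def cents_deviation(sung_hz, target_hz):
    """Signed deviation of a sung pitch from its target, in cents."""
    return 1200.0 * math.log2(sung_hz / target_hz)


def mean_absolute_deviation(sung, targets):
    """Mean unsigned deviation in cents across paired sung/target pitches."""
    devs = [abs(cents_deviation(s, t)) for s, t in zip(sung, targets)]
    return sum(devs) / len(devs)


# Hypothetical fundamental frequencies in Hz (NOT data from the study).
targets = [220.0, 246.94, 261.63]
sung = [222.0, 245.0, 266.0]

mean_dev = mean_absolute_deviation(sung, targets)
print(f"mean deviation: {mean_dev:.1f} cents ({mean_dev / 100:.2f} semitones)")
```

Positive values from `cents_deviation` indicate a sung pitch that is sharp of the target, and negative values one that is flat.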
The MBEA only assesses recognition memory, not reproductive memory, and it may be possible to succeed in a recognition task without having a strong enough representation of the tune to reproduce it accurately. This possibility might be investigated by giving participants repeated hearings of stimuli before asking them to sing. However, the same pattern of improvement with accompaniment is seen in “Happy Birthday” performances, a song that all participants know well. It might be said, though, that listening to any gathering of people singing “Happy Birthday” sometimes makes one wonder if everyone learns it accurately in the first place, and this is something for future research. In any case, singing is a complex skill and involves far more than just remembering the tune. Singing accurately when unaccompanied requires both precise memory for the pitch (or sequence of pitches) and the ability to compare one’s actual output with the intended one, when the original target is no longer present (Welch, 1985). It is perhaps this more complex combination of exact pitch (and interval) memory and self-monitoring in which the TD participants lack skill. The accompaniment may be facilitating the self-monitoring process by providing an audible target. A further related issue for these participants is of course vocal skill. While singing accompanied improves the TD group’s performance, it does not bring it quite to the level of the NTD group, and the TD group also report low confidence in their vocal control. Future analysis of participants’ vocal use in these and other tasks in the battery will aim to clarify these issues. With regard to the role of self-assessments, the results show that the TD group is not disproportionately negative when their actual level of performance is taken into consideration. It seems they are quite good at making consistent judgements of their own performance within this specific frame of reference.
A self-label of tone deafness therefore does reflect less accurate singing performance as well as perceived singing difficulties. It must also be noted that although the TD group’s self-ratings for accuracy and vocal control were lower than the NTD group’s, mean ratings for both groups fell between 3 and 5 on a seven-point scale. The differences were therefore not large and the TD group did not seem to be reporting extremely negative views of their performance. However, the relationship between self-concept and
performance is complex, and though every effort was made to help participants feel at ease, negative self-beliefs may also be contributing to the TD group’s performance. For example, the expectation of doing badly may lead to tension and inhibition of the respiratory system and vocal mechanism, leading to sub-optimal performance and low self-assessments, and thus establishing a negative cycle. If we want to improve people’s musical self-concept, then perhaps we should focus not on momentary self-assessments of performance, but on the more lasting (mis)attributions people make for their level of skill. The label of tone deafness may often carry with it the limiting assumption of permanent impairment. However, this research has demonstrated the possibility of improvement of singing skill in those who believe themselves tone deaf.

In conclusion, most people who self-define as tone deaf do not have a perceptual deficit, but do sing less accurately and feel less confident in their singing than average controls. However, they are able to make accurate judgements about the quality of their singing, and can improve with appropriate scaffolding. Challenging the label of “tone deafness” may involve changing people’s belief that their difficulties are caused by a permanent impairment, through demonstrating this possibility of improvement.

ACKNOWLEDGEMENTS

We thank Graham Welch for guidance with the pitch matching battery, Alexandra Lamont for her helpful feedback on the research, Alinka Greasley and Richard Laing for help with preparation of stimuli, and all the participants who agreed to sing. We would also like to thank Lola Cuddy and an anonymous reviewer for their helpful comments on an earlier draft.

Address for correspondence: Karen J. Wise and John A. Sloboda School of Psychology Keele University Keele, Staffordshire, ST5 5BG, UK e-mail: k.j.wise@psy.keele.ac.uk j.a.sloboda@psy.keele.ac.uk

• REFERENCES Amir, O., Amir, N., & Kishon-Rabin, L. (2003). The effect of superior auditory skills on vocal accuracy. Journal of the Acoustical Society of America, 113, 1102-8. Ayotte, J., Peretz, I., & Hyde, K. (2002). Congenital Amusia: A group study of adults afflicted with a music-specific disorder. Brain, 125, 238-51. Cuddy, L. L., Balkwill, L., Peretz, I., & Holden, R. R. (2005). Musical difficulties are rare: A study of “tone deafness” among university students. In G. Avanzini, L. Lopez, S. Koelsch, & M. Majno (eds), The Neurosciences and Music II: From perception to performance. Annals of the New York Academy of Sciences, 1060, 311-21. Green, G. A. (1989). The effect of vocal modelling on pitch-matching accuracy of elementary school children. Journal of Research in Music Education, 38, 225-31. Howe, M. J. A., Davidson, J. W., & Sloboda, J. A. (1998). Innate talents: Reality or myth? Behavioural and Brain Sciences, 21, 399-407. Kalmus, H., & Fry, D. B. (1980). On tune deafness (dysmelodia): Frequency, development, genetics and musical background. Annals of Human Genetics, 43, 369-82. Koelsch, S., Gunter, T., Friederici, A. D., & Schröder, E. (2000). Brain indices of music processing: “Non-musicians” are musical. Journal of Cognitive Neuroscience, 12, 520-41. Knight, S. (1999). Exploring a cultural myth: What adult non-singers may reveal about the nature of singing. In B. A. Roberts & A. Rose (eds), The phenomenon of singing (pp. 144-54). St John’s, Newfoundland: Memorial University Press. Kazez, D. (1985). The myth of tone deafness. Music Education Journal, 71, 46-7. Lidmann-Magnusson, B. (1997). Factors influencing singing development in poor pitch singers. Proceedings of the third triennial ESCOM conference, 339-43. Minami, Y. (1994). Some observations on the pitch characteristics of children’s singing. In G. Welch & T. Murao (eds), Onchi and singing development: A cross-cultural perspective (pp. 18-24). 
London: The Roehampton Institute / David Fulton Publishers. Murry, T. (1990). Pitch-matching accuracy in singers and nonsingers. Journal of Voice, 4, 317-21. Murry, T., & Zwiner, P. (1991). Pitch matching ability of experienced and inexperienced singers. Journal of Voice, 5, 197-202. O’Neill, S. (2002). The self-identity of young musicians. In R. MacDonald, D. Hargreaves, & D. Miell (eds), Musical Identities (pp. 79-96). Oxford University Press. Peretz, I. (2003). Brain specialization for music: New evidence from congenital amusia. In I. Peretz & R. Zatorre (eds), The cognitive neuroscience of music (pp. 192-203). New York: Oxford University Press. Peretz, I., Ayotte, J., Zatorre, R. J., Mehler, J., Ahad, P., Penhune, V., et al. (2002). Congenital amusia: A disorder of fine-grained pitch discrimination. Neuron, 33, 185-91. Peretz, I., Champod, A. S., & Hyde, K. (2003). Varieties of musical disorders: The Montreal Battery of Evaluation of Amusia. Annals of the New York Academy of Sciences, 999, 58-75. Rutkowski, J. (1990). The measurement and evaluation of children’s singing voice development. The Quarterly Journal of Teaching and Learning, 1, 81-95. Sloboda, J. A., Wise, K. J., & Peretz, I. (2005). Quantifying tone deafness in the general population. In G. Avanzini, L. Lopez, S. Koelsch, & M. Majno (eds), The Neurosciences and Music II: From perception to performance. Annals of the New York Academy of Sciences, 1060, 255-61. Trehub, S. (2003). Musical predispositions in infancy: An update. In I. Peretz & R. Zatorre (eds), The cognitive neuroscience of music (pp. 3-20). New York: Oxford University Press. Trehub, S., Schellenberg, G., & Hill, D. (1997). The origins of music perception and cognition: A developmental perspective. In I. Deliège & J. Sloboda (eds), Perception and cognition of music (pp. 103-28). Hove, UK: Psychology Press. Watts, C., Moore, R., & McCaghren, K. (2005). The relationship between vocal pitch-matching skills and pitch discrimination skills in untrained accurate and inaccurate singers. Journal of Voice, 19, 534-43. Welch, G. F. (1979). Poor pitch singing: A review of the literature. Psychology of Music, 7, 50-8. Welch, G. F. (1985). A schema theory of how children learn to sing in tune. Psychology of Music, 13, 3-18. Welch, G. F. (1994a). Onchi and singing development: Pedagogical implications. In G. Welch & T. Murao (eds), Onchi and singing development: A cross-cultural perspective (pp. 82-94). London: The Roehampton Institute / David Fulton Publishers. Welch, G. F. (1994b). The assessment of singing. Psychology of Music, 22, 3-19. Welch, G. F. (2001). The misunderstanding of music: An inaugural lecture. University of London, Institute of Education. Welch, G. F., Rush, C., & Howard, D. M. (1991). A developmental continuum of singing ability: Evidence from a study of five-year-old developing singers. Early Child Development and Care, 69, 107-19. Yarborough, C., Bowers, J., & Benson, W. (1992). The effect of vibrato on the pitch-matching accuracy of certain and uncertain singers. Journal of Research in Music Education, 40, 30-38.

* Escom Young Researcher Award 2006.
(1) Paul Boersma and David Weenink, Institute of Phonetic Sciences, University of Amsterdam, www.praat.org.

Table 1 Outcome predictions

X = poor performance relative to controls or norms A = average or above average performance relative to controls or norms

Table 2 Rating scale for singing accuracy in song performance

Table 3 Interrater reliabilities between two judges for the four Happy Birthday performances

* Significant at the 0.001 level

Table 4 Interrater reliabilities between two judges for mean accompanied and unaccompanied Happy Birthday ratings, split by group

* Significant at the 0.001 level

Figure 1. Distribution of MBEA scores in three groups.

Figure 2. Echo condition. Mean deviation in cents from model pitches. Error bars show standard deviations.

Figure 3. Sync condition. Mean deviation in cents from model pitches. Error bars show standard deviations.

Figure 4. Median expert ratings of “Happy Birthday” performances.

Figure 5. Self-ratings of accompanied and unaccompanied “Happy Birthday” performances for accuracy, vocal quality and performance compared to average. Error bars show standard deviations.