Effects of Culture on Musical Pitch Perception

Patrick C. M. Wong; Valter Ciocca; Alice H. D. Chan; Louisa Y. Y. Ha; Li-Hai Tan; Isabelle Peretz

doi:10.1371/journal.pone.0033424

Abstract

The strong association between music and speech has been supported by recent research focusing on musicians' superior abilities in second language learning and neural encoding of foreign speech sounds. However, evidence for a double association—the influence of linguistic background on music pitch processing and disorders—remains elusive. Because languages differ in their usage of elements (e.g., pitch) that are also essential for music, a unique opportunity for examining such language-to-music associations comes from a cross-cultural (linguistic) comparison of congenital amusia, a neurogenetic disorder affecting the music (pitch and rhythm) processing of about 5% of the Western population. In the present study, two populations (Hong Kong and Canada) were compared. One spoke a tone language in which differences in voice pitch correspond to differences in word meaning (in Hong Kong Cantonese, /si/ means ‘teacher’ and ‘to try’ when spoken in a high and mid pitch pattern, respectively). Using the On-line Identification Test of Congenital Amusia, we found Cantonese speakers as a group tend to show enhanced pitch perception ability compared to speakers of Canadian French and English (non-tone languages). This enhanced ability occurs in the absence of differences in rhythmic perception and persists even after relevant factors such as musical background and age were controlled. Following a common definition of amusia (5% of the population), we found Hong Kong pitch amusics also show enhanced pitch abilities relative to their Canadian counterparts. These findings not only provide critical evidence for a double association of music and speech, but also argue for the reconceptualization of communicative disorders within a cultural framework. 
Along with recent studies documenting cultural differences in visual perception, our auditory evidence challenges the common assumption of universality of basic mental processes and speaks to the domain generality of culture-to-perception influences.

Citation: Wong PCM, Ciocca V, Chan AHD, Ha LYY, Tan L-H, Peretz I (2012) Effects of Culture on Musical Pitch Perception. PLoS ONE 7(4): e33424. https://doi.org/10.1371/journal.pone.0033424

Editor: Alexander Borst, Max-Planck Institute of Neurobiology, Germany

Received: September 29, 2011; Accepted: February 13, 2012; Published: April 11, 2012

Copyright: © 2012 Wong et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This work is supported by grants from the National Science Foundation (BCS-0719666) and the National Institutes of Health (R01DC008333, R21DC007468, R03HD051827, and R21DC009652) awarded to PW, by grants from the Natural Sciences and Engineering Research Council of Canada, Canada Institute of Health Research and a Canada Research Chair to IP, and by a grant from College of Humanities, Arts & Social Sciences, Nanyang Technological University (M58100050), awarded to AHDC. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

The present study examines how differences in cultural backgrounds affect the way in which people perceive auditory signals. Specifically, we focus on language, a prominent aspect of culture, and on pitch, a perceptual attribute that not only forms the basic building blocks of music, but also conveys crucial information about talker identity, spoken emotion, and in some instances word meaning (in the case of tone languages). We investigate how speakers of a language in which pitch is used to mark word meaning (Cantonese Chinese speakers) differ from those who do not speak such a language (English and French speakers) when processing musical pitch.

Definitions of culture often include systems of communication (e.g., language), visual and performing arts (e.g., music), religions, and social norms [1]. Often excluded from these definitions are basic mental processes, such as sensory perception, memory and attention [2], [3]. Interestingly, there is now growing evidence suggesting that some aspects of perceptual processing also differ among people from different societies [4]–[6]. Much of the evidence comes from studies of visual recognition memory and eye-tracking, in which participants view pictures with a clear focal object in the context of a background scene [7]. Westerners (mostly comprised of North Americans, but also including Western Europeans) in these studies have a higher rate of attending to the foreground object whereas East Asians (Koreans, Japanese, and Chinese) attend to both the foreground object and the background. In addition, cultural differences in worldview, representations of self, and even thinking styles have been documented extensively: East Asians tend to be more collective, interdependent, and holistic, while Westerners tend to be more individualistic, independent, and analytic [8]–[10]. These broader and higher-level cultural differences have been used to explain differences in visual perception. For example, because East Asians view the world more holistically, they see both the background and foreground object in a picture, and because Westerners are more analytic, they focus more on the salient foreground object alone [11].

The present study focuses specifically on one prominent aspect of culture, language, and examines how experience in speaking different languages may affect auditory perception, which is itself the foundation of spoken language processing. We examine speakers of tone and non-tone languages. In tone languages, pitch is used to signal word meaning in addition to phrasal meaning through intonation (“speech melody”), whereas in non-tone languages such as English and French, pitch is used to signal intonation only. Languages that use lexical pitch are estimated to account for about 70% of the world's languages [12], and include Cantonese and Mandarin Chinese. For example, in Mandarin, the syllable /ma/ can mean ‘mother’ or ‘to scold’ depending on whether it is spoken with a high or falling pitch pattern, respectively. In languages where pitch is used at the phrasal level, changing pitch cannot change word meaning, but can signify the speaker's emotion and intent, as well as the declarative/interrogative status of the phrase. Participants in our study included Cantonese speakers recruited in Hong Kong, China and Singapore, as well as English and French speakers recruited in Montreal, Canada and Singapore. We focused on a dominant aspect of auditory processing, pitch, with rhythm as a control condition, by using the On-Line Identification Test of Congenital Amusia [13]. Previous research has been conducted on amusia, a condition affecting music processing which is sometimes known as tone deafness [14] and dysmelodia [15]. In typically developing children, sensitivity to aspects of music including pitch and rhythm can be observed before one year of age [16]; however, congenital amusia is found in about 4–6% of the Western (non-tone language speaking) population [13], [15], and is associated with neural [17]–[19] and genetic [20], [21] factors.

The ascending auditory pathway contains neural structures that are associated with increasing levels of complexity of sound processing [22]. What makes the connection between musical processing (including amusia) and lexical tone processing a particularly interesting line of inquiry is that it offers a unique opportunity for understanding the convergence, divergence, and interactions of types of auditory processing along this pathway, especially in relation to the association between language and music [23], [24], [25], [26], [27], [28], [29]. More specifically, as pitch is used as a primary functional unit in both music and tone languages, questions arise as to whether extensive experience with one type of pitch processing may influence another type of pitch processing, and under what circumstances (or at which level of the auditory pathway) mutual influence occurs. This question has gained much attention in recent years, including studies that found musical training to facilitate lexical tone perception [30], tone language learning [31], [32], [33], and the encoding of lexical tone patterns in the rostral brainstem [24], [34], [35], [36]. What remains to be investigated is whether extensive experience with tone languages influences musical pitch processing, including in listeners with no obvious pitch deficits. Acknowledging the fact that “music” has an intricate combinatorial property that spans beyond its basic building block of pitch, we define musical pitch processing as the processing of musical melodies that require the integration of local and longer distance pitch syntactic information. This definition also acknowledges the important contributions of pitch intervals [37], [38], [39], [40] and rapid frequency modulations [41], [34], [35], [36] that, when combined following musical combinatorial rules, build music. We believe an investigation of language-to-music relationships will provide evidence for a true double association of pitch processing in music and speech.

Participants in the current study completed the On-Line Identification Test of Congenital Amusia (Hong Kong version) in Experiment 1, the original version of which was normed in Montreal on native Canadian English and Canadian French speakers. This test has three conditions [13] that require detection of incongruities in short melodies composed of Western tonal keys and rhythmic structures. The Off-beat condition assesses rhythm perception and requires participants to detect melodies in which temporal incongruities are present. The Out-of-Key and Mistuned conditions both assess pitch perception, with the former containing notes that are tuned correctly but violate the tonal (syntactic) relationships given the key of the melody, and the latter containing notes that are mistuned by a quarter tone in addition to syntactic violations. The psychometrics of this test have been reported elsewhere [13]. Sample melodies from each condition are presented in Figure 1. Participants in Hong Kong completed a version of this test in which the written instructions were translated into Chinese, and their performance was compared with that of the Montreal participants. Given that pitch is used more extensively in tone than in non-tone languages, we investigate not only whether amusia exists in tone language speaking populations [42], [43], but also whether tone language speakers have better pitch (but not rhythm) perception skills relative to non-tone language speakers as a group regardless of whether they demonstrate poor music perception abilities. We found that Cantonese speakers show enhanced ability in processing musical pitch in the Out-of-Key condition relative to those who speak a non-tone language. As will be discussed, both groups of participants performed at near-ceiling in the Mistuned condition in Experiment 1. As such, we performed a follow-up experiment (Experiment 2) by increasing the difficulty of the task to determine whether group differences exist in judging mistuned musical notes.

Figure 1. Example of a melody in the On-Line Identification Test of Congenital Amusia.

Conditions for the melody are shown: no incongruity (A), time incongruity (B: ∥ refers to the silence of 5/7 of the beat duration), mistuned pitch incongruity (C: the circled note refers to the mistuned pitch), and out-of-key pitch incongruity (D).

https://doi.org/10.1371/journal.pone.0033424.g001

Experiment 1

Methods

Ethics Statement.

All experimental procedures for both experiments were approved by the Institutional Review Boards of Northwestern University, University of Hong Kong, and Nanyang Technological University of Singapore, and all participants provided informed consent in accordance with these boards' requirements.

Participants.

Hong Kong participants were recruited by posting an advertisement on the University of Hong Kong internet communication system during spring-summer 2007 and winter-spring 2008. Potential participants were asked to encourage their friends and family members outside of the university to participate. All Hong Kong participants self-reported that Chinese was their native language and that they had no known hearing or brain deficits. Characteristics of the Canadian participants were reported in Peretz et al. (2008) [13]. Comparisons of basic characteristics of the Hong Kong and Canadian participants are described in the Supporting Information S1.

Because normative data for the Hong Kong version of the On-Line Identification Test of Congenital Amusia have not been reported previously, we include such data in the Supporting Information S1. Table S1 in the Supporting Information S1 summarizes the characteristics of the Hong Kong participants. In total, 446 participants completed the entire test, 408 of whom were younger than 40 years old. Because younger and older adults were consistently found to differ in their performance [13], [44], and because very few Hong Kong participants were 40 years old or older, direct comparisons of the Hong Kong and Canadian participants only included those who were younger than 40 years old. However, the Supporting Information S1 provides details of the performance of all Hong Kong participants (Figure S1 & Table S2).

Methods and Materials.

The test materials consisted of a translated version of the On-Line Identification Test of Congenital Amusia reported in Peretz et al. (2008) [13]. A native Cantonese speaker from Hong Kong who is fluent in both Cantonese and English translated the test from English to idiomatically appropriate Chinese (traditional Chinese characters are used per conventions in Hong Kong). Eight native Cantonese speakers from Hong Kong proofread the translation and ensured cultural appropriateness for Hong Kong. After several revisions, the final version of the On-Line Identification Test of Congenital Amusia (Hong Kong version) was used for testing. The test procedures are identical to Peretz et al. (2008) [13] and will only be described briefly here. Participants completed the test at a location convenient to them over the internet using a standard web browser. They made (in)congruity judgments by using a computer mouse to indicate ‘yes’ or ‘no’ in three test conditions: Off-beat, Out-of-Key, and Mistuned. In the Off-beat condition, half the melodies contained a silence of 5/7 of the beat duration prior to the first downbeat in the third bar of the four-bar melody; in the Out-of-Key condition, the pitch of the same downbeat violated the tonal/syntactic relationship when considering the rest of the melody; and in the Mistuned condition, the same pitches were mistuned by a quarter tone with the addition of a tonal/syntactic violation. In other words, melodies in the Mistuned condition are incongruent in two ways. Figure 1 illustrates the three conditions (taken from Figure 1 of Peretz et al., 2008) [13]. Participants were first tested with the Off-beat condition, followed by the Mistuned condition and finally the Out-of-Key condition. In each condition, participants were presented with 24 melodies (12 containing no incongruity and 12 containing an incongruity) one at a time, in a random but fixed order. The task was to detect whether an incongruity occurred in each melody, by clicking a “yes” button whenever there was an anomaly, and a “no” button when there was none. Participants received two examples before each condition and were provided with feedback after these two trials only. The entire test lasted about 15 minutes. After the test, participants were asked to answer a series of questions concerning their health, musical, and educational history.

Calculation of Adjusted Scores.

When comparing amusic participants from the two groups, we adjusted their Out-of-Key scores for degree of musical training using a general linear model with musical training as the covariate. The adjustment was computed as follows:

ŷ = a + B × (degrees of musical training), where ŷ is the fitted value, a is the intercept, and B is the linear coefficient for musical training.

Adjusted Score = Raw Out-of-Key Score − ŷ
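
To make the adjustment concrete, the following is a minimal sketch of the two formulas above, assuming an ordinary least-squares fit of the raw score on the musical training level. The function names and the data are hypothetical, and the sketch does not settle whether the fit was pooled across groups or done per group, which the text does not state.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y = a + B * x; returns (a, B)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def adjusted_scores(training, raw_scores):
    """Adjusted Score = raw score minus the fitted value y-hat."""
    a, b = fit_line(training, raw_scores)
    return [y - (a + b * x) for x, y in zip(training, raw_scores)]


# Hypothetical illustration data (not from the study):
# musical training level (1-5) and raw Out-of-Key score (% correct).
training = [1, 1, 2, 3, 4, 5]
raw = [70.0, 80.0, 85.0, 85.0, 95.0, 95.0]
adjusted = adjusted_scores(training, raw)
```

Because the adjusted scores are regression residuals, they average to zero; a positive value means a participant scored higher than expected for their level of training.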

Results

We report here results from participants between 18 and 39 years old, which include 408 participants from Hong Kong (267 females) and 154 participants from Canada (99 females). Table 1 summarizes the results from the two groups of participants. The Supporting Information S1 provides a detailed report. We first report comparisons of the two participant populations broadly, and then focus on the amusic individuals in the second section.

Table 1. Comparison of Hong Kong and Canadian Younger Participants (18–39 years old only).

https://doi.org/10.1371/journal.pone.0033424.t001

General Populations.

Figure 2 shows group performance for each condition and the Global Score (the composite of the three conditions). To assess group differences, a 2×3 mixed-effects ANOVA was conducted and revealed a main effect of condition [F (2, 1120) = 234.392, p<.001, η_p² = .295], a main effect of group [F (1, 560) = 7.389, p = .007, η_p² = .013], and a significant group×condition interaction [F (2, 1120) = 18.650, p<.001, η_p² = .032]. Post hoc one-way ANOVAs performed on each condition revealed no significant group difference in the Off-beat [F (1, 560) = .073, p = .788] and Mistuned [F (1, 560) = .017, p = .896] conditions, but a significant difference in the Out-of-Key condition [F (1, 560) = 30.261, p<.001, η_p² = .051] (significant after Bonferroni correction). Thus, the main effect of group was driven by the substantial difference in the Out-of-Key condition. A one-way ANOVA also revealed a significant group difference on the Global Score [F (1, 560) = 7.389, p = .007, η_p² = .013]. It is worth pointing out that over 82% of the participants in each group correctly classified 22 out of 24 stimuli in the Mistuned condition, which suggests that the lack of group difference was likely due to a ceiling effect.
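
As a reading aid, the reported effect sizes can be related to the F statistics through the standard identity η_p² = (F × df_effect) / (F × df_effect + df_error). The sketch below applies that identity to the values quoted in the paragraph above; the function name is our own, and this is a consistency check, not a reanalysis of the data.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (label, F, df_effect, df_error, effect size as reported in the text)
reported = [
    ("condition", 234.392, 2, 1120, 0.295),
    ("group", 7.389, 1, 560, 0.013),
    ("group x condition", 18.650, 2, 1120, 0.032),
    ("Out-of-Key, post hoc", 30.261, 1, 560, 0.051),
]

for label, f_value, df1, df2, published in reported:
    computed = partial_eta_squared(f_value, df1, df2)
    print(f"{label}: computed {computed:.3f}, reported {published:.3f}")
```

Each computed value agrees with the reported one to three decimal places.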

Figure 2. Comparisons of Hong Kong and Canadian participants (younger participants only) on the Online Amusia Test.

Error bars indicate standard error of the mean. Only the Out-of-Key condition shows a significant group difference (**p<.001); note that after controlling for age, education, and musical training differences, the group difference in the Out-of-Key condition remained. The Global Score also showed a significant group difference (*p = .007).

https://doi.org/10.1371/journal.pone.0033424.g002

The two groups of participants also differed in age, education, and level of musical training (see Supporting Information S1 for details). Therefore, it is important to ascertain that the group difference in the Out-of-Key condition remained after these factors were controlled. We conducted an ANCOVA with the Out-of-Key score as the dependent variable, group as a fixed factor, and age, education, and musical training as covariates. The main effect of group remained [F (1, 540) = 5.913, p = .015, η_p² = .011]. Figure 3, which shows participants' Out-of-Key performance divided by musical training, demonstrates that both groups were affected by musical training, but overall Hong Kong participants outperformed Canadian participants. These results suggest that speakers of a tone language have increased ability in musical pitch (but not rhythm) processing.

Figure 3. Hong Kong and Canadian younger participants' Out-of-Key performance divided by musical training.

Musical training is classified into 5 levels: 1 = less than one year, 2 = 1–3 years, 3 = 4–6 years, 4 = 7–10 years, and 5 = more than 10 years.

https://doi.org/10.1371/journal.pone.0033424.g003

It is worth noting that some of our data did not show a normal distribution (see Supporting Information S1). Although the use of parametric statistics was still justified because of our large sample size [45], we performed a non-parametric test (Independent-Samples Mann-Whitney U Test) on all measures, including the Global Score, to further validate our results. Our initial findings were replicated. There was no statistically significant group difference in the Off-Beat (p = .962) and Mistuned (p = .601) conditions, but a significant group difference was found in the Out-of-Key condition (p<.0001) and the overall Global Score (p = .016).

Amusic Participants.

In defining amusia, it is common practice to consider participants' overall performance in both pitch and rhythm processing [13], [44]. Following the same procedures as previous research [13], participants' performance in all three conditions was averaged to form a Global Score, and amusia was defined as a Global Score more than two standard deviations below the group mean. Using this criterion, Hong Kong and Canadian participants with a Global Score below 78.4% and 73.9%, respectively, were classified as amusic. We found 3.9% of the Hong Kong participants to be amusic, compared to 5.2% of the Canadians. Note that the use of 2 standard deviations as a criterion simply reflects our effort to conform to previous studies. Alternatively, we also employed a criterion-based approach with Global Score cutoffs of 70%, 80%, and 90% (selected arbitrarily) and found a higher percentage of the Canadian participants to perform below all of these cutoffs (Table 2).
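
The classification rule described above can be sketched as follows. This is an illustration under stated assumptions: the Global Score is taken as the plain average of the three condition scores, the sample standard deviation is used (the text does not say whether a sample or population formula was applied), and the scores and function names are hypothetical.

```python
from statistics import mean, stdev


def global_score(off_beat, mistuned, out_of_key):
    """Average of the three condition scores (% correct)."""
    return mean([off_beat, mistuned, out_of_key])


def amusia_cutoff(global_scores, n_sd=2.0):
    """Group-specific cutoff: mean minus n_sd sample standard deviations."""
    return mean(global_scores) - n_sd * stdev(global_scores)


def percent_below(global_scores, cutoff):
    """Percentage of participants whose Global Score falls below the cutoff."""
    return 100.0 * sum(s < cutoff for s in global_scores) / len(global_scores)


# Hypothetical Global Scores for one group (not from the study).
scores = [88.0, 90.0, 92.0, 90.0, 91.0, 89.0, 90.0, 93.0, 87.0, 90.0, 60.0]
cutoff = amusia_cutoff(scores)
flagged = [s for s in scores if s < cutoff]

# Criterion-based alternative with fixed cutoffs, as in Table 2.
fixed = {c: percent_below(scores, c) for c in (70.0, 80.0, 90.0)}
```

Because the cutoff is derived from each group's own mean and standard deviation, the two groups end up with different thresholds, which is why the fixed-cutoff comparison is a useful complement.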

Table 2. Percentage of participants from each group who performed below three arbitrary cutoff criteria (based on Global Score).

https://doi.org/10.1371/journal.pone.0033424.t002

As the distributions of participant populations were not normal in our study or in previous studies (see Supporting Information S1), a standard deviation does not readily translate into a fixed percentage of the populations for a meaningful quantitative comparison of amusic participants. This fact is complicated by group differences in degree of musical training. In order to understand and compare differences of the amusic participants in the two groups, we adopted two statistical procedures. We first derived a set of scores adjusted for differences in degrees of musical training (see Methods) from the two groups. We only focused our analyses on the Out-of-Key condition as it is the only condition that showed group differences as discussed. We then compared the performance of the bottom 5% of the participants based on the adjusted scores derived from the Hong Kong (n = 20) and Canadian (n = 7) groups and found the lowest-performing (amusic) Hong Kong participants to significantly outperform their Canadian counterparts [One-way ANOVA: F(1, 25) = 8.448, p<.008, η_p² = .253].
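
A minimal sketch of this comparison is given below, assuming that the bottom 5% is obtained by sorting the adjusted scores and rounding the count down (which happens to reproduce n = 20 of 408 and n = 7 of 154), and that the groups are compared with a standard two-group one-way ANOVA. The function names are hypothetical and no study data are used.

```python
import math
from statistics import mean


def bottom_fraction(scores, fraction=0.05):
    """Lowest `fraction` of scores, with the count rounded down."""
    k = math.floor(len(scores) * fraction)
    return sorted(scores)[:k]


def one_way_anova(group_a, group_b):
    """Return (F, df_between, df_within, eta squared) for two groups."""
    groups = [group_a, group_b]
    all_scores = list(group_a) + list(group_b)
    grand_mean = mean(all_scores)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    # With a single factor, partial eta squared equals eta squared.
    eta_sq = ss_between / (ss_between + ss_within)
    return f_value, df_between, df_within, eta_sq


# Group sizes implied by the reported sample sizes.
n_hk = len(bottom_fraction(list(range(408))))
n_ca = len(bottom_fraction(list(range(154))))
```

With 20 and 7 participants the within-group degrees of freedom are 20 + 7 − 2 = 25, matching the reported F(1, 25).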

Responses to Self-Assessment Questionnaire.

Peretz et al. (2003; 2008) [13], [31] observed that three statements on their questionnaire were particularly important for identifying amusic participants. These statements are: “I cannot recognize tunes without the help of the lyrics,” “I cannot tell if I sing out of tune” and “I have been told I sing out of tune.” Of the three, the participant's ability to detect whether someone else is singing out-of-tune was reported to most effectively distinguish amusic from non-amusic individuals. Table 3 summarizes the results and shows that, in general, more amusics than non-amusics from both populations self-reported ‘yes’ to all three statements. However, this pattern is much less pronounced in the Hong Kong group.

Table 3. Percentage of responses (proportion of participants) to questions relevant for the identification of amusic individuals.

https://doi.org/10.1371/journal.pone.0033424.t003

Discussion

We found evidence in Experiment 1 that Hong Kong participants outperformed Canadian participants in the Out-of-Key condition but not in the Off-Beat condition, suggesting that Hong Kong participants have elevated melodic pitch perception abilities. Both groups of participants performed at near-ceiling in the Mistuned condition. To determine whether group differences exist in judging mistuned notes, we increased the difficulty level of the task by conducting a follow-up experiment (Experiment 2). In the Mistuned condition in Experiment 1, the incongruent melodies each contained a musical note that was not only mistuned by a quarter-tone, but was also a note that resulted in a tonal/syntactic violation. In Experiment 2, the tonal/syntactic violation was eliminated. New participants were tested.