1. Introduction
In English, prosodic prominence (or pitch accent) is associated with several acoustic cues, such as duration, F0, intensity, and spectral emphasis (Beckman, 1986; Breen et al., 2010; Cole et al., 2010, among others). Prominence is assigned to words that convey semantic or pragmatic meaning in discourse context (Pierrehumbert & Hirschberg, 1990). It tends to occur on content words since they often convey information about what and who are being discussed in discourse. Function words can also carry prominence if they are under contrastive focus. In Autosegmental-Metrical Theory (Liberman, 1975; Pierrehumbert, 1980), the rightmost content word in a prosodic phrase (i.e., intermediate phrase) carries the nuclear pitch accent. In other words, only one pitch accent is obligatory in an intermediate phrase. Additional pitch accents are optional and can occur depending on rhythm, information status, or focus in discourse context. Hirschberg (1993) examined the distribution of pitch accents in relation to word classes, information status, and focus in various English corpora. She found that pitch accent assignment is best predicted by the part-of-speech of a word, whereas information status is not a strong predictor of the distribution of pitch accents. Although part-of-speech is the strongest factor accounting for pitch accent assignment in the corpora, this does not imply a one-to-one mapping between content words and pitch accents. In the same study, Hirschberg stated that direct mapping between pitch accents and content words is not felicitous and would result in unnatural-sounding utterances.
The complex relation between prominence and linguistic factors (e.g., parts-of-speech, rhythm, discourse meaning) may make learning prosody difficult for Korean learners of English (Im, 2019; Lee et al., 2017; Um et al., 2001, among others). Im (2019) investigated the perception of prominence by Korean learners of English and native English speakers. Participants were asked to rate prominence while listening to a speech in real time. Results showed that Korean learners of English rated prominence similarly to native English speakers in relation to the referential, lexical, and contrastive meanings of a word in the speech, but the two groups differed in that Korean learners of English were more likely to mark prominence on lexically given words. This tendency seems to be related to the parts-of-speech of the lexically given words: all the lexically given words in the speech were content words. Um et al. (2001) examined the production of pitch accents by Korean learners of English compared with native English speakers. Both groups of speakers were asked to read aloud question-answer pairs of sentences. Results showed that native English speakers produced prominence on few content words carrying new information or focus. Native English speakers tended not to assign prominence to repeated expressions or to new expressions where the preceding expressions were accented (i.e., prominence clash condition). Korean learners of English, however, assigned prominence to every content word and some function words (e.g., I, you, they), regardless of givenness or rhythm in the utterances.
The previous study above suggests that Korean learners of English privilege content words as landing locations of prominence, especially in their production, but it is still unclear how Korean learners of English have established such a relationship between prominence and content words. It is possible that Korean learners of English perceive all content words as prominent, which leads to the one-to-one association between prominence and content words in their production. Alternatively, it is possible that Korean learners of English perceive prominence on few content words, similarly to native English speakers, but produce prominence on all content words, unlike native English speakers. These possibilities cannot be distinguished on the basis of production data alone, which leads us to consider perception data for a better understanding of the mapping between prominence and content words among Korean learners of English.
The current study investigates the perception of prosodic prominence by Korean learners of English compared with native English speakers in relation to word class information in a speech. Both groups of speakers judged prosodic prominence while listening to the speech in real time. Parts-of-speech and three acoustic cues (max F0, mean phone duration, and mean intensity) were obtained for each word in the speech. The three acoustic cues were included in the analysis to control for their potential influence on perceived prominence. The current study asks whether Korean learners of English perceive prominence as a function of word classes similarly to native English speakers and attempts to provide an explanation for the association between prominence and content words in the productions of Korean learners of English observed in the previous study.
2. Method
The speech material was a TED talk entitled “Try something new for thirty days” (Cutts, 2011) delivered by a male speaker of American English in a clear and engaging manner (361 words, t=2’ 25”). The speech material was selected because it is a clear speech covering a non-technical topic. Korean learners of English might have difficulties in understanding conversational (reduced) speech or speech on technical topics (e.g., lectures, political speeches), which might influence their perceptual judgments of prominence. For these reasons, a clear speech on a non-technical topic was considered the ideal speech material for the perception experiments.
In the perception experiments, thirty-five native English speakers (23 females and 12 males, mean age 24.3) and thirty native Korean speakers (26 females and 4 males, mean age 20.6) were asked to select words perceived as prominent while listening to the speech online (Rapid Prosody Transcription; Cole et al., 2010). The native English speakers were undergraduate or graduate students at a Midwestern university in the U.S. The native Korean speakers were undergraduate students majoring in English at a university in Seoul, Korea, and were advanced learners of English (average TEPS score of 820 out of 990). Advanced learners of English were recruited because they were expected to have few difficulties in understanding the meanings of utterances, difficulties which might otherwise influence their perceptual judgments of prominence.
In the experiments, prominent words were described as the “words that stand out in the speech stream by virtue of being louder, longer, more extreme in pitch, or more crisply articulated than other words in the same utterance.” Participants were able to listen to the speech twice. They were provided with a transcript of the speech presented without punctuation and capitalization on an online interface (Language Markup and Experimental Design Software; Mahrt, 2013), as shown in Figure 1. Punctuation and capitalization were removed because they indicate the syntactic boundaries of utterances and might influence listeners’ judgments of prominence.
Parts-of-speech were annotated for each word using Penn Treebank P.O.S. tags (Taylor et al., 2003). Adjectives, adverbs, conjunctions, determiners, interjections, modals, nouns, numbers, prepositions, pronouns, and verbs were obtained from the speech. They were categorized into content words (adjectives, adverbs, nouns, numbers, and verbs) versus function words (conjunctions, determiners, interjections, modals, prepositions, and pronouns).
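The content versus function word categorization described above can be sketched in code. The following is a minimal illustration in Python, assuming the standard Penn Treebank tag labels; the two tag sets and the word_class helper are hypothetical and are not taken from the annotation procedure of the study. Note that the Penn Treebank tag IN covers both prepositions and subordinating conjunctions, so it is simply treated as a function-word tag here.

```python
# Hypothetical tag sets (an assumption for illustration, not the study's own list).
# Content words: adjectives, adverbs, nouns, numbers, and verbs.
CONTENT_TAGS = {
    "JJ", "JJR", "JJS",                       # adjectives
    "RB", "RBR", "RBS",                       # adverbs
    "NN", "NNS", "NNP", "NNPS",               # nouns
    "CD",                                     # cardinal numbers
    "VB", "VBD", "VBG", "VBN", "VBP", "VBZ",  # verbs
}
# Function words: conjunctions, determiners, interjections, modals,
# prepositions, and pronouns.
FUNCTION_TAGS = {"CC", "DT", "UH", "MD", "IN", "TO", "PRP", "PRP$"}


def word_class(tag):
    """Map a Penn Treebank tag to 'content' or 'function'."""
    if tag in CONTENT_TAGS:
        return "content"
    if tag in FUNCTION_TAGS:
        return "function"
    # Tags outside the two sets are reported rather than silently guessed.
    raise ValueError(f"tag not covered by this sketch: {tag}")


# Illustrative (word, tag) pairs; the tags are assumed, not the study's annotation.
tagged = [("try", "VB"), ("something", "NN"), ("new", "JJ"),
          ("for", "IN"), ("thirty", "CD"), ("days", "NNS")]
classes = [word_class(tag) for _, tag in tagged]
```

Raising an error for uncovered tags, rather than defaulting to one class, keeps any tag that falls outside the eleven listed parts-of-speech visible during annotation.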
Three acoustic cues, max F0 (Hz), duration (ms), and mean intensity (dB), were obtained for each word using ProsodyPro (Xu, 2013). Mean phone duration was calculated by dividing the total duration of a word by the number of phones in the word. The three acoustic cues of each word were z-normalized (centered and scaled) using the mean and standard deviation of the words in the entire speech. Max F0, mean phone duration, and mean intensity will be referred to as F0, duration, and intensity, respectively.
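The two computations described above, mean phone duration and z-normalization, can be illustrated with a short Python sketch. The function names are hypothetical, and the use of the sample standard deviation (rather than the population standard deviation) is an assumption, since the text does not specify which one was used.

```python
from statistics import mean, stdev


def mean_phone_duration(word_duration_ms, n_phones):
    """Total word duration divided by the number of phones in the word."""
    if n_phones <= 0:
        raise ValueError("a word must contain at least one phone")
    return word_duration_ms / n_phones


def z_normalize(values):
    """Center and scale values using the mean and SD over all words.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


# Illustrative values only; these are not measurements from the speech.
durations = [mean_phone_duration(300.0, 4), mean_phone_duration(210.0, 3)]
z_values = z_normalize([1.0, 2.0, 3.0])
```

After normalization, a value of 0 corresponds to the mean of the speech, and positive values indicate that a word is above the mean for that cue.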
A generalized linear mixed-effects model was run using the lme4 package (Bates et al., 2015) in R (R Core Team, 2019). The listeners’ binary judgment of perceived prominence (1 for the words perceived as prominent and 0 for the words perceived as non-prominent) was modeled in relation to L1 groups (native English speakers and Korean learners of English), word classes (function words and content words), three acoustic cues (F0, duration, and intensity), the interaction between L1 groups and word classes, the interactions between L1 groups and the three acoustic cues, and the interactions between word classes and the three acoustic cues. Participants were entered as random effects. Note that the three acoustic cues and their interactions with L1 groups or word classes were included as control variables to tease apart the effects of the acoustic cues from those of the L1 groups or word classes on perceived prominence in the statistical analysis.
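For concreteness, the model structure described above can be written out as a logistic mixed-effects equation. This is a sketch under stated assumptions: the random effect for participants is assumed to be a by-participant random intercept, the categorical predictors are assumed to be dummy coded, and the symbols (G for L1 group, W for word class, F, D, and I for the z-normalized F0, duration, and intensity) are introduced here for illustration only.

```latex
% y_{ij}: binary prominence judgment of listener i for word j (1 = prominent)
% Assumed: by-participant random intercept u_i; dummy-coded G_i and W_j
\operatorname{logit} P(y_{ij} = 1)
  = \beta_0 + \beta_1 G_i + \beta_2 W_j
  + \beta_3 F_j + \beta_4 D_j + \beta_5 I_j
  + \beta_6 G_i W_j
  + G_i \left( \beta_7 F_j + \beta_8 D_j + \beta_9 I_j \right)
  + W_j \left( \beta_{10} F_j + \beta_{11} D_j + \beta_{12} I_j \right)
  + u_i,
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2)
```

Under this formulation, the coefficient on the product of G and W corresponds to the interaction between L1 groups and word classes, which is the term of primary interest in the current study.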
For further analysis, prominence (p-) scores (Cole et al., 2010) were obtained to visualize the perceived prominence of each word by the two L1 groups. The p-score of a word was calculated by dividing the sum of prominence responses for the word by the total number of participants in each L1 group. A p-score ranges from 0 to 1, where 1 indicates that all the listeners rated the word as prominent, while 0 means that none of the listeners did so. The p-scores will be discussed informally only.
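The p-score calculation described above can be sketched as follows. This is a minimal Python illustration, assuming that the responses of one L1 group are stored as one list of 0/1 judgments per listener; the p_scores function and the example data are hypothetical.

```python
def p_scores(responses):
    """Compute one p-score per word for a single L1 group.

    responses: list of per-listener lists of 0/1 judgments,
               all of the same length (one entry per word).
    """
    n_listeners = len(responses)
    if n_listeners == 0:
        raise ValueError("at least one listener is required")
    n_words = len(responses[0])
    if any(len(listener) != n_words for listener in responses):
        raise ValueError("all listeners must rate the same number of words")
    # Sum of prominence responses for each word divided by group size.
    return [
        sum(listener[w] for listener in responses) / n_listeners
        for w in range(n_words)
    ]


# Illustrative data: four listeners rating three words (not from the study).
example = [
    [1, 0, 1],
    [1, 0, 0],
    [1, 0, 1],
    [1, 0, 0],
]
scores = p_scores(example)
```

In this example, the first word is marked by every listener, the second by none, and the third by half of the listeners.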
It is expected that Korean learners of English judge prominence on content words more frequently than do native English speakers. Korean learners of English, however, would not differ from native English speakers in rating prominence on function words. The reasoning is that in the previous study (Um et al., 2001), Korean learners of English were found to be more likely to produce prominence on content words than native English speakers. The difference in assigning prominence on function words was small between the two groups of speakers. Considering these findings from the previous production study, similar predictions were made in the current perception study.
3. Results
In Section 3.1, the current study examines how word class information is phonetically encoded by the speaker in this speech. In Section 3.2, it describes the results from the perception experiments on how word class information influences the judgments of prominence by Korean learners of English compared with native English speakers.
In this speech, word class information is found to be weakly associated with phonetic cues. Word classes are presented in relation to F0 of a word in Figure 2, duration in Figure 3, and intensity in Figure 4. Across Figures 2–4, there are overlaps of phonetic values between content words (left boxplot) and function words (right boxplot), although F0 and duration tend to be higher for content words than function words in Figures 2 and 3, respectively. Put differently, in this TED talk, content words are produced with higher pitch and longer duration, but not stronger intensity, than function words, although the differences in pitch and duration are small between content versus function words.
The results from the perception experiments show that Korean learners of English differ from native English speakers in judging prominence in relation to word classes. Figure 5 describes prominence scores by native English speakers (red) and Korean learners of English (blue) for one of the utterances analyzed in this study, “so, I decided to follow in the footsteps of great American philosopher, Morgan Spurlock, and try something new for thirty days.”
In Figure 5, both groups of speakers rate prominence infrequently on function words such as “so,” “I,” “to,” “in,” “the,” “of,” “and,” and “for” in the utterance. However, Korean learners of English tend to judge prominence more frequently than native English speakers on most content words, for example, “decided,” “follow,” “footsteps,” “philosopher,” “try,” “thirty,” and “days.” This suggests that there are differences between the two L1 groups in rating prominence on content words in this speech.
Table 1 presents the results from the generalized linear mixed-effects model, where listeners’ prominence judgment was modeled in relation to L1s, word classes, acoustic cues, and their interactions.
In Table 1, L1 groups are not a significant factor in estimating perceived prominence. This means that the two groups of speakers do not significantly differ in making overall judgments of prominence. Word classes and the three acoustic cues are significant factors in estimating prominence judgment, which suggests that these are important factors for listeners in rating prominence in this speech. There is a significant interaction between L1 groups and word classes in the model. This means that Korean learners of English significantly differ from native English speakers in rating prominence between content versus function words. Also, the interactions between word class information and acoustic cues are significant, suggesting that acoustic cues influence listeners’ prominence judgments differently between content words versus function words. Finally, the interactions between L1 groups and acoustic cues are significant in the model, which means that the L1 groups differ in relying on acoustic cues while making prominence judgments.
Figure 6 is a visualization of the estimated effects of the interaction between L1 groups and word classes on perceived prominence. The x-axis is word classes, and the y-axis is estimated effects of word classes on prominence judgments by native English speakers (left panel) and Korean learners of English (right panel).
In Figure 6, the estimated effects of content words are higher than those of function words for both groups of speakers, but there are greater differences between content words versus function words for Korean learners of English than for native English speakers. This suggests that Korean learners of English are more likely to mark prominence on content words than function words compared with native English speakers, although both L1 groups favor content words as locations of perceived prominence.
Figures 7–9 show the estimated effects of F0, duration, and intensity of a word, respectively, on prominence ratings by native English speakers (left panel) and Korean learners of English (right panel). The x-axis is z-normalized acoustic values, and the y-axis is the estimated effects of acoustic values on perceived prominence.
In Figure 7, the slope is greater for Korean learners of English than for native English speakers, suggesting that Korean learners of English are more likely than native English speakers to rate prominence if the F0 of a word increases. In other words, Korean learners of English tend to be more sensitive to changes in pitch than are native English speakers while judging prominence.
In Figure 8, the slope is greater for native English speakers than Korean learners of English, meaning that native English speakers are more likely to rely on duration than Korean learners of English in judgments of prominence.
In Figure 9, the slope is negative for both L1 groups. If the intensity of a word increases, both L1 groups are less likely to rate prominence. This is surprising because the increase of intensity is expected to be associated with a greater likelihood of perceived prominence. The results seem to be related to the speech style of the speaker. The speaker tends to speak loudly throughout the speech to address a large audience. Instead of speaking more loudly, the speaker softens his voice to draw the attention of the audience. Due to this speech style, both L1 groups tend to rate prominence if the intensity of a word decreases. In Figure 9, the slope is steeper for Korean learners of English than for native English speakers, suggesting that Korean learners of English are more sensitive to the changes in intensity of a word than are native English speakers while rating prominence.
For further analysis, the relation between the parts-of-speech of a word and prominence scores is examined. Figure 10 shows the parts-of-speech of the content words while Figures 11–12 describe those of the function words. The distinction between Figure 11 and Figure 12 is made for display purposes only. Across Figures 10–12, the x-axis is prominence scores obtained from native English speakers (left panel) and Korean learners of English (right panel). The y-axis is the distribution of the data. Color-coded lines represent different parts-of-speech.
In Figure 10, all the content words, especially verbs (purple) and nouns (blue), are skewed toward lower prominence scores for native English speakers, while they are more evenly distributed along the prominence scores for Korean learners of English. In other words, Korean learners of English are more likely to mark prominence on content words, especially verbs and nouns, than are native English speakers. Further qualitative examination reveals that nouns and verbs are not necessarily produced with higher F0 than adjectives and adverbs in the speech. In this speech, the speaker talks about his experiences (e.g., the places he visited) for thirty days and uses some expressions, especially nouns and verbs, which are repeated or inferable from the previous expressions in discourse context. Native English speakers were found to consider the givenness of a word in utterances (Um et al., 2001), and they are likely to have rated the repeated or inferable words as non-prominent in the current study, which results in low p-scores in Figure 10. Korean learners of English, however, are likely to have judged repeated or inferable words as prominent, which is reflected in high p-scores in Figure 10, since they were found to assign prominence to every content word regardless of its information status in discourse context (Um et al., 2001).
In Figure 11, determiners (red), modals (green), prepositions (blue), and pronouns (purple) are distributed on lower prominence scores for both L1 groups, confirming that these function words are weakly associated with perceived prominence by both L1 groups.
In Figure 12, interjections (blue) are similarly distributed for both L1 groups. Conjunctions (red) are skewed toward lower prominence scores for native English speakers, while they are more evenly distributed along the prominence scores for Korean learners of English. Qualitative examination reveals that conjunctions are not produced with higher F0 than the other parts-of-speech in this speech. Overall, Figures 11–12 show that most function words, except conjunctions, are weakly associated with perceived prominence for both L1 groups. Korean learners of English rate prominence on conjunctions more frequently than do native English speakers.
4. Discussion
The current study has examined how word class information influences perceptions of prosodic prominence by Korean learners of English compared with native English speakers in a clear and engaging speech style. Both L1 groups were asked to judge prominence while listening to the speech in real time. Parts-of-speech and three acoustic cues, F0, duration, and intensity, were obtained for each word in the speech. In this public speech, the speaker tended to produce content words higher in pitch and longer in duration than function words. The intensity, however, was not different between content words and function words. The results from the perception experiments showed that Korean learners of English were more likely to judge prominence on content words than were native English speakers. The two groups did not differ in rating prominence on function words, except conjunctions. Korean learners of English rated conjunctions as prominent more frequently than did native English speakers.
Perception of prosodic prominence is influenced by expectation-driven and signal-driven factors (Cole et al., 2010). In the current study, word classes and acoustic cues were operationalized as expectation-driven and signal-driven factors, respectively. These factors were found to be significant in judgments of prominence by both L1 groups. Word classes and acoustic cues seemed to contribute to perception of prominence independently, although acoustic cues could moderate word class information or vice versa. One might argue that content words are associated with more enhanced acoustic cues than are function words, and that this could have driven the biased judgment of prominence on content words over function words by Korean learners of English. If enhanced acoustic cues had led to more frequent judgments of prominence on content words, both L1 groups should have benefited from those acoustic cues and shown similar patterns of rating prominence. However, this was not the case. One might also argue that Korean learners of English and native English speakers differ in relying on acoustic cues, and that this might have led to the different judgments of prominence on some parts-of-speech. Indeed, Korean learners of English were found to weigh F0 and intensity to a greater extent than native English speakers. Korean learners of English were more likely to rate prominence on nouns and verbs among content words and conjunctions among function words, but our qualitative analyses showed that these parts-of-speech were not necessarily higher in F0 than other parts-of-speech in this speech. This suggests that Korean learners of English relied on word classes to a greater extent than acoustic cues because they rated prominence on content words, especially nouns and verbs, even in the absence of strong acoustic cues. Native English speakers, however, do not weigh word class information as much as do Korean learners of English since the exclusive association between prominence and content words is not observed among native English speakers.
There are other expectation-driven factors that may influence perception of prominence, for instance, information status (Pierrehumbert & Hirschberg, 1990; Sityaev, 2000), speech rhythm (Buring, 2007; Calhoun, 2007), speech style (Hirschberg, 1993; Im et al., 2018), and emotion (Chodroff & Cole, 2018). Native English speakers were found to consider the aforementioned factors in their perception of prominence. Korean learners of English, however, may have limited access to these factors in their perception of prominence because they are less fluent in English than are native English speakers. While processing word-level information (e.g., parts-of-speech), Korean learners of English might have fewer opportunities for considering utterance-level or discourse-level information (e.g., speech rhythm, discourse meaning) than native English speakers. This is perhaps the reason why Korean learners of English in the current study rate prominence on most, if not all, content words in the speech. This is in alignment with the findings from the previous study (Um et al., 2001) that Korean learners of English produce prominence on content words while paying less attention to rhythms and discourse meanings of the words.
Another speculation about how Korean learners of English have established the one-to-one relation between content words and prominence is that most Korean learners of English in the current study had not lived in an English-speaking country for more than one month and are likely to have been exposed mainly to clear speech (e.g., textbook recordings, news, interviews). In clear speech, words are accented more frequently than in conversational speech (Im et al., 2018). Korean learners of English could have explicitly or implicitly learned that prominence is associated with content words, which carry semantic or pragmatic meanings in discourse. Such knowledge could have guided Korean learners of English to make frequent judgments of prominence on content words in this speech, which also happened to be a clear speech. Will Korean learners of English favor content words as the locations of prominence in other speech styles, such as conversational speech? This needs to be examined in a future study, but it is probable that direct mapping between prominence and content words will also be observed in conversational speech. One of the differences between Korean learners of English and native English speakers lies in the relative weight of word class information in perception of prominence, in alignment with the results on production of prominence from the previous study (Um et al., 2001). In teaching L2 English prosody, it should be emphasized to Korean learners of English that the assignment of prominence is subject to the givenness of a word or speech rhythm in utterances, among many other factors, and that the frequent assignment of prominence to content words is infelicitous.
5. Conclusion
This study has investigated how Korean learners of English perceive prosodic prominence in relation to word classes in a public speech, compared with native English speakers. Korean learners of English were found to judge prominence on content words, especially nouns and verbs, more frequently than native English speakers. The two groups of speakers, however, did not differ in perceiving prominence on function words, except conjunctions. The current study presents evidence on the link between perceived prominence and content words by Korean learners of English, in alignment with the previous study on production of prominence.