1. Introduction
Second-language acquisition presents many challenges for learners, particularly in mastering the segmental and prosodic elements of the target language's phonetic and phonological systems. A large body of prior research has documented that L2 speakers of English often fail to attain native-like proficiency at the segmental, prosodic, syllabic, and phonotactic levels (de Jong & Park, 2012; Han, 1996; Jang, 2009; Kabak & Idsardi, 2007; Lee, 2016, 2018, 2022; Masuda & Arai, 2010; Park, 2020a, 2020b; Shin & Iverson, 2011, 2013, 2014; Shin & Lee, 2016; Verdonschot & Masuda, 2020; White & Mattys, 2007). Among the most notable challenges for L2 learners of English are vowel epenthesis and the misperception of stress patterns, both of which stem from typological differences between the native language (L1) and the second language (L2).
Regarding vowel epenthesis, previous studies have proposed that one of its underlying causes lies in the divergent phonotactic systems of the two languages. For example, Dupoux et al. (1999) found that Japanese listeners perceived illusory vowels within consonant sequences that the phonotactic constraints of their native language prohibit. Korean likewise imposes phonotactic constraints absent in English: consonant clusters are prohibited in both the onset and the coda of a syllable. Consequently, Korean speakers parse adjacent consonants as belonging to different syllables, treating them as heterosyllabic sequences rather than tautosyllabic clusters. When speaking English, Korean speakers therefore tend to insert the epenthetic vowel [ɨ] to break up consonant clusters. Notably, these speakers do not produce epenthetic vowels in adjacent consonant sequences in Korean, because these consonants belong to different syllables (Kang, 2002). In addition, only seven consonants, [p], [k], [t], [m], [n], [l], and [ŋ], are permitted in coda position in Korean; other consonants undergo neutralization or deletion. When a syllable ending in a consonant is followed by a syllable beginning with a consonant, Korean speakers apply processes such as nasalization (e.g., /tam-lon/ → [tamnon] ‘discourse’), gemination (e.g., /sil-lok/ → [sillok] ‘chronicle’), and neutralization (e.g., /sus-ča/ → [sutč’a] ‘number’).
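To make these constraints concrete, the following toy sketch encodes them in Python. It is illustrative only: the romanized segment labels, the partial neutralization mapping, and the function names are our own assumptions rather than part of the study, and the sketch ignores many details of Korean phonology (such as tensing and the behavior of /h/).

```python
# Toy model of the Korean syllable constraints described above (illustrative only).
# Segments are romanized strings; "ŋ" stands for the velar nasal.

# The seven consonants permitted in coda position.
PERMITTED_CODAS = {"p", "t", "k", "m", "n", "l", "ŋ"}

# Assumed neutralization targets for some obstruents that cannot surface as codas
# (e.g., /s/ -> [t], as in /sus-ča/ -> [sutč'a]); deliberately incomplete.
NEUTRALIZATION = {"s": "t", "ss": "t", "č": "t", "čh": "t", "th": "t",
                  "ph": "p", "kh": "k", "kk": "k"}


def neutralize_coda(consonant: str) -> str:
    """Return the surface realization of a single coda consonant."""
    if consonant in PERMITTED_CODAS:
        return consonant
    return NEUTRALIZATION.get(consonant, consonant)


def is_licit_syllable(onset: list[str], coda: list[str]) -> bool:
    """True if the syllable has at most one onset and one coda consonant,
    and any coda consonant is one of the seven permitted codas."""
    if len(onset) > 1 or len(coda) > 1:
        return False
    return all(c in PERMITTED_CODAS for c in coda)


def break_onset_cluster(onset: list[str]) -> list[str]:
    """Repair an English onset cluster by inserting [ɨ] after every consonant
    except the last, so that each consonant heads its own syllable."""
    return [c + "ɨ" for c in onset[:-1]] + onset[-1:]


print(is_licit_syllable(["s", "t", "r"], []))   # False: complex onset
print(is_licit_syllable(["l"], ["k"]))          # True, e.g. the second syllable of [sillok]
print(neutralize_coda("s"))                     # t
print(break_onset_cluster(["s", "t", "r"]))     # ['sɨ', 'tɨ', 'r']
```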
Regarding prosody, Korean lacks lexical stress and the sentence-level prosodic focus found in English, which presents another hurdle for Korean L2 learners. Because their L1 has no contrastive stress or focus patterns, Korean L2 speakers face challenges in both perceiving and producing English prosody. Previous studies investigating the origins of such prosodic difficulties suggest that they may arise when the L2 speaker's L1 lacks contrastive (variable) word stress, as in languages with fixed stress (Dupoux et al., 2001, 2008; Peperkamp et al., 2010; but see Altmann, 2006). For instance, given the absence of contrastive stress in French, Dupoux et al. (2001) administered a perceptual discrimination task with nonword minimal pairs differing only in stress to French speakers and compared their performance with that of Spanish speakers, whose L1 has contrastive lexical stress. The French speakers had difficulty identifying contrastive stress patterns, suggesting that the characteristics of learners' native language play a pivotal role in the acquisition of L2 prosodic features.
Moreover, these two common L2 errors may influence each other, since the occurrence of epenthetic vowels may also be shaped by differences in rhythmic structure between the L1 and the L2. Ding and Xu (2016) found that three phonetic correlates, namely epenthetic vowels, unreduced unstressed vowels, and the absence of contrastive stress, could together contribute to the syllable-timed impression of Chinese speakers' L2 English speech. Park (2006) likewise demonstrated that Korean learners of English inserted an epenthetic vowel into oral stop + nasal sequences (e.g., witness [ˈwɪtnəs] → witVness [ˈwɪtənəs]; atmology [ætˈmɒlədʒɪ] → atVmology [ætəˈmɒlədʒɪ]) instead of applying the L1 phonological rule of stop-nasal assimilation (e.g., natmal → [nanmal]). The L1 rule was applied only when both the first and second syllables were stressed (e.g., Batman [ˈbætˌmæn] → [bænmæn]). It is therefore plausible that epenthetic vowels interact with stress contrast in the production of L2 speakers whose L1 and L2 differ in rhythmic structure.
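The two competing repairs of a stop + nasal sequence can be contrasted in a minimal sketch. We assume forms are written as strings of single-character segments without stress marks; the function names are hypothetical, and the code simply restates the patterns reported above rather than modeling when each repair is chosen.

```python
# Toy sketch contrasting two repairs of an oral stop + nasal sequence.
# Forms are strings of single-character segments (an assumption).

NASAL_OF = {"p": "m", "t": "n", "k": "ŋ"}   # homorganic nasal for each oral stop
NASALS = set(NASAL_OF.values())


def stop_nasal_assimilation(form: str) -> str:
    """L1 Korean rule: an oral stop immediately before a nasal becomes the
    homorganic nasal, e.g. /natmal/ -> [nanmal]."""
    out = list(form)
    for i in range(len(form) - 1):
        # Check the original form so that changes never feed each other.
        if form[i] in NASAL_OF and form[i + 1] in NASALS:
            out[i] = NASAL_OF[form[i]]
    return "".join(out)


def stop_nasal_epenthesis(form: str, vowel: str = "ə") -> str:
    """L2 repair reported by Park (2006): insert a vowel between the stop and
    the nasal instead of assimilating, e.g. [wɪtnəs] -> [wɪtənəs]."""
    out = []
    for i, seg in enumerate(form):
        out.append(seg)
        if seg in NASAL_OF and i + 1 < len(form) and form[i + 1] in NASALS:
            out.append(vowel)
    return "".join(out)


print(stop_nasal_assimilation("natmal"))   # nanmal
print(stop_nasal_assimilation("bætmæn"))   # bænmæn
print(stop_nasal_epenthesis("wɪtnəs"))     # wɪtənəs
```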
The current study therefore aims primarily to investigate whether L2 listeners' perception of epenthetic vowels is correlated with their perception of lexical stress contrast and sentence focus, as these two prosodic levels reflect the rhythmic structure of a stress-timed language. We also test whether the perception of epenthetic vowels improves as L2 learners' proficiency increases. Finally, we test whether the identification of English vowels is correlated with the perception of epenthetic vowels, in order to determine whether L2 listeners' detection of epenthetic vowels also depends on accurate identification of English vowels. A detailed methodological description is provided below.
2. Method
A total of 25 Korean learners of English (10 males, 15 females) participated in this study, all of whom were born and raised in Korea. At the time of the experiment, their mean age was 21.32 years (SD=1.41), and their mean age of acquisition of English was 8.7 years (SD=2.61). They self-reported their English proficiency as intermediate to upper-intermediate. Participants also took a cloze test (Brown, 1980), in which they filled in the blanks of a paragraph; the test measures contextual comprehension, vocabulary size, and grammatical ability. The mean cloze score was 35.32 (SD=7.55) out of 50, indicating proficiency levels ranging from upper-intermediate to advanced.
In this study, four types of perception experiments were conducted. In the three oddity experiments (vowel epenthesis oddity test, lexical stress oddity test, and sentence focus oddity test), participants listened to three consecutive stimuli and were asked to identify the oddball, whose position was randomly chosen by the computer. In the vowel identification experiment, participants were asked to identify the vowel in each auditory stimulus. The stimuli for each experiment are described below.
The stimuli for the vowel epenthesis oddity test consisted of one real word (“abduction”) and six compound words (“egg timer,” “garbage truck,” “package tour,” “pig tail,” “milk tea,” “ridge tile”). These words were selected because, in a previous phonetic experiment, Korean L2 speakers produced epenthetic vowels most frequently in them (see Shin & Iverson, 2011, 2014 for more detail). Accordingly, tokens of these words produced by three Korean L2 learners, both with and without epenthetic vowels, were used in this study.
Each trial consisted of three tokens, one of which differed from the other two in syllable structure: this deviant token either included or lacked an epenthetic vowel. For instance, in the sequence “abduction [əbˈdʌkʃən] – abduction [əbˈdʌkʃən] – abuduction [əbuˈdʌkʃən],” the third token, which contains the epenthetic vowel, is the oddball. Conversely, in the sequence “abuduction [əbuˈdʌkʃən] – abduction [əbˈdʌkʃən] – abuduction [əbuˈdʌkʃən],” the second token, which lacks the epenthetic vowel, is the oddball. See Figure 1 for an example spectrogram of the token “abuduction” with the epenthetic vowel. In total, 78 trials (13 sets of three tokens × 6 repetitions) were presented in the vowel epenthesis oddity test.
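The trial structure can be sketched as follows. The study does not specify how the oddball form or its position was selected beyond random choice by the computer, so the procedure below (a coin flip for which form is the oddball, a uniform draw for its position, and a shuffled trial order) and the placeholder file names are assumptions for illustration.

```python
import random


def make_trial(form_a: str, form_b: str, rng: random.Random) -> tuple[list[str], int]:
    """Return (tokens, answer) for one three-interval oddity trial.

    One of the two forms is randomly chosen as the oddball; it appears once,
    at a random position, and the other form fills the remaining two slots.
    """
    standard, oddball = (form_a, form_b) if rng.random() < 0.5 else (form_b, form_a)
    answer = rng.randrange(3)          # 0 = first, 1 = second, 2 = third
    tokens = [standard] * 3
    tokens[answer] = oddball
    return tokens, answer


def build_block(stimulus_sets, repetitions=6, seed=None):
    """Create repetitions x len(stimulus_sets) trials in random order."""
    rng = random.Random(seed)
    trials = [make_trial(a, b, rng)
              for a, b in stimulus_sets
              for _ in range(repetitions)]
    rng.shuffle(trials)
    return trials


# Hypothetical placeholder sets: 13 (no-epenthesis, epenthesis) token pairs.
sets = [(f"set{i}_plain.wav", f"set{i}_epenthetic.wav") for i in range(13)]
block = build_block(sets, repetitions=6, seed=1)
print(len(block))  # 78
```

With 13 stimulus sets and 6 repetitions, the block contains 78 trials, matching the count given above.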
The lexical stress stimuli consisted of 68 English disyllabic stress minimal pairs, recorded by one female and one male native speaker of Southern British English. Fifty-eight pairs were taken from an English dictionary, and an additional ten pairs were chosen from Cutler (1986). For the recording, the speakers were instructed to place stress according to the grammatical category (noun or verb) of each word (e.g., the noun “compact” [ˈkɑːmpækt] vs. the verb “compact” [kəmˈpækt]). Each word was recorded twice, after which a trained phonetician manually inspected each token and selected those with the clearest and least ambiguous stress patterns for the perception experiment. In addition, F0 and duration values of the vocalic portions of the first and second syllables were examined to confirm that the stimuli contrasted significantly in stress pattern (p<.01) before the lexical stress oddity experiment was constructed.
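One way such an acoustic check could be run is sketched below for vowel duration. The study does not name the statistical test, so the log duration ratio, the paired t statistic, and the measurements are all hypothetical; a p-value would then be obtained from a t distribution with n − 1 degrees of freedom (e.g., with `scipy.stats.ttest_rel`, not used here to keep the sketch dependency-free).

```python
import math
from statistics import mean, stdev


def duration_ratio(syll1_ms: float, syll2_ms: float) -> float:
    """Log ratio of first- to second-syllable vowel duration.
    Positive values mean the first vowel is longer (a cue to initial stress)."""
    return math.log(syll1_ms / syll2_ms)


def paired_t(x: list[float], y: list[float]) -> float:
    """Paired-samples t statistic for the differences x - y."""
    diffs = [a - b for a, b in zip(x, y)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


# Hypothetical vowel durations (ms) for three noun/verb pairs: (syllable 1, syllable 2).
nouns = [(180, 110), (170, 120), (190, 100)]
verbs = [(90, 200), (100, 185), (85, 210)]

noun_ratios = [duration_ratio(a, b) for a, b in nouns]
verb_ratios = [duration_ratio(a, b) for a, b in verbs]
t = paired_t(noun_ratios, verb_ratios)
print(f"t({len(nouns) - 1}) = {t:.2f}")
```

The same comparison could be repeated with mean F0 in place of duration.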
For the sentence focus stimuli, 61 Bamford-Kowal-Bench (BKB) sentences were used (Bench et al., 1979). Each BKB sentence was paired with questions designed to elicit focus on either the initial or the final words of the target sentence. For instance, the BKB sentence “The house had nine rooms” was paired with two questions: “What had nine rooms?” and “What did the house have?” The first question elicits an answer with focus on the first noun phrase (“the HOUSE had nine rooms”), whereas the second elicits focus on the second noun phrase (“the house had NINE ROOMS”). The questions were presented in random order on a computer monitor together with their corresponding answers, and speakers of Southern British English were instructed to read only the answer in response to each question. After the recording sessions, a trained phonetician evaluated each sentence and selected the BKB sentences with the most distinctive and clear sentence stress patterns for use in the sentence focus oddity experiment. F0 values and the F0 contour at the contrastive focus position were then examined to confirm that each sentence was produced with the intended focus (p<.01).
For the vowel identification test, 14 words in a /bVt/ frame were used, following a previous study (Iverson & Evans, 2009). A female speaker of Southern British English produced the following words: beat [i], bit [ɪ], bet [e], Burt [ɜ], bat [æ], Bart [ɑ], bot [ɒ], but [ʌ], bought [ɔ], boot [u], bait [eɪ], bite [aɪ], bout [aʊ], and boat [əʊ]. Each word was repeated four times, yielding a total of 56 trials. On each trial, an auditory stimulus was played in random order over headphones, and participants identified the word they heard from 14 options displayed on the computer monitor.
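The 56-trial list can be generated and scored as in the minimal sketch below, which assumes each word is simply presented four times in a shuffled order; the function names are hypothetical. With 14 response options, chance accuracy on this test is 1/14, or about 7%.

```python
import random
from collections import Counter

# The 14 /bVt/ response words listed above.
WORDS = ["beat", "bit", "bet", "Burt", "bat", "Bart", "bot",
         "but", "bought", "boot", "bait", "bite", "bout", "boat"]


def build_vowel_id_trials(words, repetitions=4, seed=None):
    """Return a shuffled list in which each word appears `repetitions` times."""
    trials = [w for w in words for _ in range(repetitions)]
    random.Random(seed).shuffle(trials)
    return trials


def score(responses, targets):
    """Proportion of responses that match the target word."""
    return sum(r == t for r, t in zip(responses, targets)) / len(targets)


trials = build_vowel_id_trials(WORDS, seed=7)
print(len(trials), Counter(trials)["boat"])   # 56 4
print(f"chance = {1 / len(WORDS):.3f}")
```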
In the three oddity tests (vowel epenthesis, lexical stress, and sentence focus), participants were asked to identify which of three consecutively played stimuli differed from the other two. Two of the three tokens in each trial shared the same syllabic structure (vowel epenthesis oddity test) or prosodic structure (lexical stress and sentence focus oddity tests). After each trial, three buttons labeled ‘first’, ‘second’, and ‘third’ were displayed on the computer monitor, and participants selected the button corresponding to the stimulus that differed from the other two. The interstimulus interval was 1,000 ms.
In the vowel identification experiment, participants heard the 14 /bVt/ words in random order and were asked to identify the orthographic form of each auditory stimulus. After each token was played, the 14 English words were displayed on the computer monitor as response options.
Each perception experiment began with a short practice session to familiarize participants with the task. Each trial was played only once, and no feedback was provided during the experiment. After completing the perception experiments, all participants took the cloze test (Brown, 1980) to assess their English proficiency and filled out a background questionnaire.
The results were analyzed with mixed-effects logistic regression, using a generalized linear mixed-effects model fitted with the lme4 package (Bates et al., 2015) in R (R Core Team, 2021). The dependent variable was accuracy on the vowel epenthesis oddity test (1=correct, 0=incorrect), and the independent variables were accuracy on the lexical stress oddity test (1=correct, 0=incorrect), the sentence focus oddity test (1=correct, 0=incorrect), and the vowel identification test (1=correct, 0=incorrect), as well as the cloze test score (1–50). Subject and trial were included as random effects. Since we were examining the effect of two prosodic levels (word stress and sentence focus) on the perception of epenthetic vowels, we tested the main effects of the two prosodic oddity tests and their interaction, as well as the main effect of vowel identification, on accuracy in the vowel epenthesis oddity test.
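The structure of this model can be illustrated with a small sketch of its linear predictor. In lme4 notation, a model of this form might be written roughly as `glmer(epenthesis ~ stress * focus + vowel + cloze + (1 | subject) + (1 | trial), family = binomial)`; the variable names and the assumption of random intercepts are ours. The Python function below evaluates such a predictor for given coefficients and converts it to a probability; the coefficients are placeholders, not the estimates in Table 1.

```python
import math


def inv_logit(eta: float) -> float:
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def predicted_probability(stress: int, focus: int, vowel: int, cloze: float,
                          coef: dict, u_subject: float = 0.0,
                          w_trial: float = 0.0) -> float:
    """Probability of a correct response on the epenthesis oddity test.

    stress, focus, vowel: 1 = correct, 0 = incorrect on the other tests.
    cloze: cloze score (1-50). u_subject, w_trial: random intercepts
    (0 corresponds to an average subject and trial).
    """
    eta = (coef["intercept"]
           + coef["stress"] * stress
           + coef["focus"] * focus
           + coef["stress:focus"] * stress * focus   # interaction term
           + coef["vowel"] * vowel
           + coef["cloze"] * cloze
           + u_subject + w_trial)
    return inv_logit(eta)


# Placeholder coefficients, NOT the estimates reported in Table 1.
coef = {"intercept": 0.0, "stress": 0.0, "focus": 0.0,
        "stress:focus": 0.5, "vowel": 0.0, "cloze": 0.0}

for s in (0, 1):
    for f in (0, 1):
        p = predicted_probability(s, f, vowel=1, cloze=35, coef=coef)
        print(s, f, round(p, 3))
```

Because the interaction enters as the product of two 0/1 codes, it shifts the prediction only in the cell where both prosodic tests were answered correctly.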
3. Results
The mixed-effects logistic regression revealed significant main effects of Word Stress, Sentence Focus, and Proficiency (p<.01). These findings suggest that accuracy on the vowel epenthesis oddity test is influenced by performance in recognizing word stress and sentence focus and by performance on the cloze test. The negative coefficients for Word Stress and Sentence Focus indicate that participants were less accurate in identifying lexical stress oddity stimuli (mean accuracy 64%; chance level=33%) and sentence focus oddity stimuli (mean accuracy 51%; chance level=33%) than epenthesis oddity stimuli (mean accuracy 87%; chance level=33%). Conversely, the positive coefficient for Proficiency indicates that higher cloze test scores are associated with a higher probability of a correct response on the epenthesis oddity test (p<.01).
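The 33% chance level follows from the three-interval design: a listener who guesses picks the oddball with probability 1/3. As an illustration (not the analysis used in this study), an exact one-sided binomial test can check whether an individual's score exceeds that level; the 68-of-78 score below is hypothetical.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the exact one-sided p-value for
    getting k or more trials correct out of n by guessing alone."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


CHANCE_ODDITY = 1 / 3   # three intervals, one oddball
N_TRIALS = 78           # trials in the vowel epenthesis oddity test

k_correct = 68          # hypothetical listener (68/78 is about 87%)
p_value = binomial_upper_tail(k_correct, N_TRIALS, CHANCE_ODDITY)
print(f"{k_correct}/{N_TRIALS} correct: p = {p_value:.2e}")
```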
The model also revealed a significant two-way interaction between Word Stress and Sentence Focus (p<.01), indicating that the probability of correctly perceiving epenthetic vowels varied depending on the combination of performance on these two prosodic tests. The positive interaction coefficient (b=0.43) suggests that the predicted probability of correctly perceiving epenthetic vowels increases when both word stress and sentence focus are correctly identified. Table 1 presents the results of the logistic regression analysis for the epenthesis oddity test.
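To make the interaction coefficient concrete, assume that Word Stress (WS) and Sentence Focus (SF) are coded 0/1 as in the model and that all other predictors are held constant. The interaction term then adds to the log-odds only when both are correct, so b=0.43 multiplies the odds of a correct epenthesis response by about 1.54 beyond what the two main effects alone would predict:

```latex
\begin{aligned}
\operatorname{logit} P(\text{correct}) &= \beta_0 + \beta_{\mathrm{WS}}\,\mathrm{WS} + \beta_{\mathrm{SF}}\,\mathrm{SF} + \beta_{\mathrm{WS}\times\mathrm{SF}}\,(\mathrm{WS}\cdot\mathrm{SF}) + \cdots \\
\Delta_{\mathrm{WS}=\mathrm{SF}=1} &= \beta_{\mathrm{WS}\times\mathrm{SF}} = 0.43 \quad \text{(log-odds)} \\
\text{odds ratio} &= e^{0.43} \approx 1.54
\end{aligned}
```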
Figures 2–4 show the predicted probabilities of a correct response on the vowel epenthesis oddity test as a function of performance on the lexical stress oddity test (Figure 2), the sentence focus oddity test (Figure 3), and the cloze test (Figure 4).
4. Discussion
In this study, we investigated whether L2 learners' perceptual ability to detect epenthetic vowels is correlated with their perceptual sensitivity to prosodic structures such as lexical stress and sentence focus. Given that cross-linguistic differences in rhythmic structure stem from differences in syllable structure and in the presence of stress, the current study aimed to determine whether these perceptual abilities are interconnected. Our findings reveal a significant association between the detection of epenthetic vowels and the perception of prosodic stress at both the word and sentence levels, as supported by the main effects of Word Stress and Sentence Focus.
This association may be attributed to the need for L2 learners to discern the relative differences between the vowels of stressed and unstressed syllables within words. In other words, the need to acquire a different type of rhythmic structure might have enhanced Korean learners' perceptual sensitivity to the vowels of stressed versus unstressed syllables. Thus, as their accuracy in identifying stress increases, L2 learners may also become better able to detect epenthetic vowels within consonant sequences, since they become more attuned to the acoustic characteristics of the vowels that signal lexical stress.
Furthermore, the same explanation can be applied to the relationship between the perception of vowel epenthesis and that of sentence focus. Since the same acoustic cues, such as vowel duration, F0, and intensity, signal both lexical stress at the word level and focus at the sentence level, L2 learners could become more sensitive to these cues, which may in turn improve their perception of epenthetic vowels. It therefore seems plausible that perceptual sensitivity to prosodic prominence at both the word and sentence levels contributes to the perception of epenthetic vowels, as suggested by the significant interaction between Word Stress and Sentence Focus.
The current study also found a significant effect of proficiency, indicating that perceptual sensitivity to epenthetic vowels increases with learners' L2 proficiency, consistent with previous studies (Masuda & Arai, 2010; Park, 2020a, 2020b; Sung, 2014; Verdonschot & Masuda, 2020). For example, Sung (2014) found that, in a word-likeness judgment task with nonword stimuli containing initial consonant sequences, higher-level Korean learners of English showed perceptual tendencies more similar to those of native English listeners than lower-level learners did. Similarly, Park (2020a, 2020b) investigated whether English proficiency among Korean L2 learners correlates with their perception and production of epenthetic vowels in word-initial consonant clusters. These studies found that L2 proficiency was associated with both the production and the perception of epenthetic vowels, but for different cluster types: in production, proficiency was related to epenthesis in [voiceless stop + liquid] clusters, whereas in perception, measured with an AXB discrimination task, it was related to epenthesis in [voiced/voiceless stop + glide] clusters. Thus, it seems plausible that L2 proficiency, in addition to cluster type (see Berent et al., 2007), affects the production and perception of vowel epenthesis. This likely occurs because, as their L2 proficiency increases, learners acquire the rhythmic structure and other features of the L2 and thus become less influenced by their L1 phonology in perception. However, further research is needed to clarify how phonetic factors are involved in vowel epenthesis.
Taken together, the current study demonstrates the interconnectedness of the perceptual abilities involved in acquiring a new rhythmic structure. It also suggests that the influence of L1 phonology on the acquisition of L2 rhythmic structure diminishes as L2 proficiency increases.