1. Introduction
The present study replicated and extended a previous study on the effect of orthography on second language (L2) production (Han et al., 2024). Specifically, Han et al. (2024) investigated whether multiple orthographic length markers in English for the same consonant or vowel might mislead Korean learners of English into producing the target sound as its spelling suggests. That study was motivated by two facts: 1) English and Korean use distinct alphabetic orthographies, so inter-orthographic transfer may not occur; and 2) Korean has no phonemic length contrast for vowels, whereas the contrastive status of consonant length is debated, which can lead to an asymmetry in grapheme-to-phoneme correspondences between vowels and consonants. The authors found that, unlike native English speakers, Korean learners of English produced the same vowel as short when it was spelled with one letter and as long when it was spelled with two letters (e.g., /i/ in delete vs. defeat), whereas they showed no such spelling effect for consonants (e.g., /k/ in acute vs. accuse). Furthermore, the Korean speakers' mean long-to-short duration ratios for vowels were relatively small compared with those of speakers whose native language (L1) has phonemic length contrasts, such as Italian or Japanese.
The lack of an orthography effect for consonants in Korean learners was explained by two factors. First, Korean learners might not transfer L1 orthographic knowledge when decoding English spellings: Korean and English do not share an orthography, even though both languages use an alphabetic system (Taylor & Taylor, 2014). Second, whether Korean has a consonant length contrast is still under debate. The Korean tense consonants have been analyzed either as geminates of the corresponding lax consonants or as a single consonantal category specified by the laryngeal feature [+stiff vocal cords]. The results of Han et al. (2024) align with the latter, featural view of the tense consonants: Korean learners, who are familiar with doubled letters for tense consonants, did not associate two letters in English words with long consonant durations.
The findings of Han et al. (2024), however, may be attributable to the task used to evaluate the L2 learners’ production. Han et al. (2024) used a delayed repetition task to prevent direct imitation from sensory memory. Following Bassetti et al. (2020) and Sokolović-Perović et al. (2020), this task consisted of four stages. First, participants saw a simple English sentence with the target word underlined (e.g., ‘I have thrown away all my floppy disks’) while hearing the same sentence produced by a native English speaker. Second, participants counted five numbers on the computer screen in reverse order to clear traces of the sentence from working memory. Third, they heard a truncated sentence from which the target word and the words following it had been removed (e.g., ‘I have thrown away all my’), followed by the carrier sentence ‘I say ___ only.’ Finally, participants read the carrier sentence aloud, filling the blank with the missing target word (e.g., ‘I say floppy only’). Because orthographic information was available only indirectly in this procedure, orthography effects for consonants as well as vowels might emerge in a production task that engages orthographic information more directly.
There is evidence that the activation of orthography in speech production is task dependent. For example, orthographic inconsistency (e.g., canon, contract, konijn vs. contract, Colbert, cadeau) produced an inhibitory effect in a read-aloud task but not in picture naming, word generation, or associative naming tasks, which do not directly involve orthography (Bi et al., 2009; Roelofs, 2006). This suggests that orthography influences speaking only when orthographic representations are highly relevant to the task. However, some recent studies have found orthographic priming effects even in picture-naming tasks (Qu & Damian, 2019; Wang et al., 2023). Wang et al. (2023), for example, found a facilitative orthographic effect when picture names shared part of their written forms, even though no written forms were presented. Using a blocked cyclic picture-naming paradigm, they asked Mandarin Chinese speakers to name a series of pictures repeatedly in orthographically homogeneous or heterogeneous conditions. In the homogeneous condition, no written forms were shown, but the names of the four pictures shared the radical of the first character (e.g., 钅 in 钉子 ‘nail’ /ding1zi0/, 钱包 ‘wallet’ /qian2bao1/, 锦旗 ‘silk banner’ /jin3qi2/, and 钻石 ‘diamond’ /zuan4shi2/). Picture-naming latencies were shorter when the stimuli were orthographically related than when they were not, indicating that orthography influences spoken word production. Taken together, it remains unclear whether the activation of orthography in spoken word production is task dependent.
To examine whether overt presentation of the number of letters affects the length of English target sounds produced by Korean learners, the present study used a read-aloud task (Bassetti et al., 2018; Glushko, 1979), because previous studies have shown that this task clearly engages orthographic representations during speaking. In the read-aloud task, only the spellings of the target words were presented, without any intervening filler task or auditory material. Participants could therefore focus on the spellings provided in the experiment.
In addition, the present study investigated the role of individual L2 proficiency in orthography effects, asking whether orthographic effects on spoken word production change as L2 vocabulary size expands. Vocabulary size was chosen as the measure of L2 proficiency for two reasons. The first was the PAM-L2-based hypothesis that a larger L2 vocabulary drives rephonologization in adult L2 learners (Best & Tyler, 2007). The second was recent findings suggesting that the lexicon, rather than general or syntactic L2 proficiency, is central to phonological development (Bundgaard-Nielsen et al., 2011a, 2011b, 2012; Llompart, 2021). For instance, Bundgaard-Nielsen and colleagues showed that L2 vocabulary size is positively associated with L2 vowel perception and/or production: Japanese learners of English with larger L2 vocabularies produced English vowels with higher overall intelligibility scores than those with smaller vocabularies (Bundgaard-Nielsen et al., 2012). More recently, Daidone & Darcy (2021) investigated multiple factors that may influence the lexical encoding of L2 Spanish consonantal contrasts by L1 English learners. These factors were perceptual accuracy, phonological short-term memory, inhibitory control, attention control, and L2 vocabulary size. Perceptual accuracy, inhibitory control, and attention control did not influence the lexical encoding of the Spanish target contrasts, and phonological short-term memory was significant only for certain sounds, whereas vocabulary size predicted lexical encoding across all target contrasts. These results suggest that, among the factors examined, vocabulary size is the most important for the accurate lexical encoding of L2 contrasts.
In the present study, a multiple-choice test of L2 English vocabulary size (the Vocabulary Size Test; Nation & Beglar, 2007) was administered. This test is known to estimate learners’ recognition vocabulary irrespective of the vocabulary level the learner has focused on during L2 acquisition (Bundgaard-Nielsen et al., 2011b). According to Nation (2006), if 98% coverage of a text is needed for unassisted comprehension, a vocabulary of 6,000 to 7,000 word families is needed to comprehend spoken text, and one of 8,000 to 9,000 word families to comprehend written text.
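The sketch below roughly illustrates how a VST score maps onto Nation’s (2006) thresholds. It assumes the 140-item version of the test (14 frequency levels of 1,000 word families, 10 items per level), in which each correct answer stands for 100 word families. That format, the helper names, and the use of each range’s lower bound as the cut-off are assumptions made for illustration. This study does not report them.

```python
from typing import Dict

# Assumed 140-item VST format: 14 levels x 10 items, 1,000 word families per level.
N_ITEMS = 140
FAMILIES_PER_ITEM = 1000 // 10  # each correct item stands for 100 word families

# Nation (2006), 98% coverage: spoken text 6,000-7,000; written text 8,000-9,000 families.
SPOKEN_RANGE = (6000, 7000)
WRITTEN_RANGE = (8000, 9000)


def vst_size(raw_score: int) -> int:
    """Hypothetical helper: convert a raw VST score into estimated word families."""
    if not 0 <= raw_score <= N_ITEMS:
        raise ValueError(f"raw score must be between 0 and {N_ITEMS}")
    return raw_score * FAMILIES_PER_ITEM


def nation_2006_check(size: int) -> Dict[str, bool]:
    """Compare a size estimate with the lower bound of each Nation (2006) range (assumed cut-off)."""
    return {
        "spoken": size >= SPOKEN_RANGE[0],
        "written": size >= WRITTEN_RANGE[0],
    }


# Raw scores that would yield the lowest and highest sizes reported in the Method.
print(vst_size(52), nation_2006_check(vst_size(52)))
print(vst_size(103), nation_2006_check(vst_size(103)))
```

Under these assumptions, the reported mean of 7,480 word families clears the spoken-text lower bound but not the written-text one.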
2. Method
Forty-five Korean learners of English, all of whom had participated in Han et al. (2024), took part in the task. All were late learners who had started learning English after the age of 7 and had not lived abroad for more than 6 months. None spoke a dialect with a length contrast (i.e., the Gyeongsang dialect), and none had learned a foreign language with length contrasts in consonants or vowels (e.g., Japanese, Italian). All Korean learners completed the Vocabulary Size Test (VST), which showed an average L2 vocabulary of 7,480 words (range = 5,200–10,300, SD = 1,113.9). By Nation’s (2006) criteria, most participants had a vocabulary large enough to comprehend ordinary spoken English. In addition, 9 native English speakers served as controls. The participants’ biographical and language background is presented in Table 1.
The experiment was conducted following the standard ethical guidelines of Konkuk University, and all participants gave their written informed consent.
The stimulus materials (test words) for the production task were the same as those in Han et al. (2024). Most were drawn from Bassetti (2017), Bassetti et al. (2018, 2020), and Sokolović-Perović et al. (2020), and a few additional items were selected by the authors. There were 14 word pairs for vowel production and 11 word pairs for consonant production. Each pair contained the same target sound (a consonant or a vowel), spelled with a single letter in one word of the pair and with a digraph in the other (e.g., acute vs. accuse for the target consonant /k/). For the consonant production task, the target sounds were /p, k, n/, embedded in common monosyllabic or disyllabic English words. In the disyllabic pairs, the vowels immediately preceding and following the target were the same in both words of the pair. The position of primary stress was also the same within each pair. For the vowel production task, the target sounds were /i, u, oʊ/. Unlike the consonant digraphs, the vowel digraphs consisted either of identical letters (e.g., sees) or of different letters (e.g., steam). All other conditions were the same as for the consonant targets. In addition, 106 filler words were selected, none of which contained double letters for the target sounds used in the test words. The complete list of test words is presented in Appendix I.
Participants were first given a printed list of English sentences, each containing an underlined target word (e.g., ‘copy’ in ‘I kept a copy of the letter.’). All sentences were taken from the Naver online dictionary, and none had been used in Han et al. (2024). When participants encountered a word whose meaning they did not know, they were told its meaning. The participants were then tested individually in a soundproof booth equipped with a computer running E-Prime. On each trial, a blank screen appeared for 500 ms, followed by a fixation cross ‘+’ for 500 ms. The sentence with the target word underlined then appeared on the screen (e.g., I kept a copy of the letter), followed by the carrier sentence ‘I say ___ only.’ Participants read the carrier sentence aloud three times with the underlined word inserted (e.g., ‘I say copy only’) and pressed the space bar to proceed to the next trial. Responses were recorded with a Tascam HD-P2 solid-state recorder and a Shure KSM 44 microphone. In total, 154 words (48 test words + 106 filler words) were presented in four blocks, each containing a similar, but not identical, number of trials. A short practice session with 6 filler items preceded the experiment. Within each block, the target words were presented in random order, and participants could rest between blocks for as long as they wished. The task lasted approximately 30 to 35 minutes. After the production task, familiarity ratings for each test word were collected on a 7-point Likert scale (1 = unknown, 7 = well known). The mean familiarity score for the Korean learners was 6.19 (SD = 1.02). Participants were paid for their participation.
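The block structure described above can be sketched as follows. This is a minimal illustration only. The text does not say how items were assigned to blocks or whether the order was re-randomized for each participant. The sketch therefore assumes that items are split into four near-equal blocks in a fixed order and then shuffled within each block using a per-participant seed. The function names and placeholder items are hypothetical, and the actual experiment was run in E-Prime, not Python.

```python
import random
from typing import List, Sequence


def split_into_blocks(items: Sequence[str], n_blocks: int = 4) -> List[List[str]]:
    """Split items into n_blocks contiguous blocks whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_blocks)
    blocks, start = [], 0
    for b in range(n_blocks):
        size = base + (1 if b < extra else 0)
        blocks.append(list(items[start:start + size]))
        start += size
    return blocks


def randomize_within_blocks(blocks: List[List[str]], seed: int) -> List[List[str]]:
    """Return a copy of the blocks with trial order shuffled inside each block."""
    rng = random.Random(seed)  # per-participant seed (assumption)
    shuffled = []
    for block in blocks:
        order = block[:]
        rng.shuffle(order)
        shuffled.append(order)
    return shuffled


# 154 placeholder items (48 test + 106 filler words) give block sizes 39, 39, 38, 38.
items = [f"item{i:03d}" for i in range(154)]
blocks = split_into_blocks(items)
participant_order = randomize_within_blocks(blocks, seed=1)
print([len(b) for b in participant_order])
```

Because 154 is not divisible by 4, two blocks necessarily receive one more trial than the others. This is consistent with blocks of similar but not identical size.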
The target consonants and vowels produced by the participants were analyzed acoustically in Praat (Boersma & Weenink, 2016) on the basis of waveforms and spectrograms. Vowel duration was measured from the onset to the offset of a clear formant pattern, particularly the second formant (F2). When the target vowel was immediately followed by an approximant or a nasal, the boundary was placed at the point of visible spectral discontinuity. For consonants, duration was measured as the closure duration for the stops /p, k/ and as the interval of visible nasal formant structure for /n/. When a stop had multiple releases, the first was taken as the release. The few instances of pre-aspiration were excluded from closure duration (Sokolović-Perović et al., 2020). The acoustic analysis was conducted independently by the author and by a trained phonetician who was blind to the purpose of the study, and the mean of the two measurements for each token was used in the analysis. Where the two measurements differed by more than 15%, the two phoneticians reanalyzed the token together. This applied to 5.5% of the data.
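The averaging step and the 15% reanalysis rule can be expressed as a short check. The text does not state the denominator for the 15% difference. This sketch assumes the difference is taken relative to the mean of the two measurements. The durations (in ms) and function names are hypothetical.

```python
from typing import List, Sequence, Tuple

THRESHOLD = 0.15  # reanalysis criterion stated in the Method


def relative_difference(a: float, b: float) -> float:
    """Absolute difference between two measurements as a proportion of their mean (assumed)."""
    return abs(a - b) / ((a + b) / 2)


def combine_measurements(
    rater1: Sequence[float], rater2: Sequence[float], threshold: float = THRESHOLD
) -> Tuple[List[float], List[int]]:
    """Average paired measurements and return the indices of tokens needing joint reanalysis."""
    if len(rater1) != len(rater2):
        raise ValueError("both raters must measure the same tokens")
    means = [(a + b) / 2 for a, b in zip(rater1, rater2)]
    flagged = [
        i for i, (a, b) in enumerate(zip(rater1, rater2))
        if relative_difference(a, b) > threshold
    ]
    return means, flagged


means, flagged = combine_measurements([80.0, 80.0, 120.0], [82.0, 100.0, 118.0])
print(means, flagged)
```

Here only the second token (80 vs. 100 ms, about 22% of their mean) would be sent back for joint reanalysis.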
3. Results
Of the 8,100 target tokens (50 words × 3 repetitions × 54 participants), 5.8% were excluded from the analysis because of stuttering, missing responses, pronunciation errors, or difficulty in segmenting the target sounds. Among the consonant errors, fricativization of the stop /p/ was the most common ([f] for /p/ in floppy, propose, copper, pepper, weapons, oppose), whereas vowel errors showed various patterns (e.g., del[ɛ]ete; [papi] for pope). For each word pair, a duration ratio was calculated by dividing the duration of the sound spelled with double letters by the duration of the same sound spelled with a single letter (e.g., the duration of [p] in floppy divided by that of [p] in copy). The ratio compares the same sound produced by the same speaker, so this approach was intended to minimize confounds from differences in speaking rate or inherent segment duration. Table 2 shows the geometric means of the CC/C (consonant) and VV/V (vowel) ratios. Each ratio is the duration of a sound spelled with two letters (CC or VV) relative to its duration when spelled with a single letter (C or V).
Group | CC/C | VV/V
---|---|---
Native English speakers | 1.08 (0.19) | 1.01 (0.16)
Korean learners of English | 1.07 (0.24) | 1.09 (0.27)
The mean CC/C ratio of the Korean learners of English (1.07) was very close to that of the native English speakers (1.08). Korean learners of English thus appeared to produce the target consonants with very similar durations regardless of the number of letters, as the native English speakers did. In contrast, the mean VV/V ratio of the Korean learners (1.09) was higher than that of the native English speakers (1.01). Unlike the consonant targets, the target vowels seemed to be produced longer by Korean learners when spelled with two letters than when spelled with a single letter.
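The ratio computation and its geometric mean can be sketched as below. The geometric mean is the natural average for ratios because it treats a ratio and its reciprocal symmetrically: 2.0 and 0.5 average to 1.0 rather than 1.25. The durations and helper names are hypothetical. Averaging the three repetitions of each word before dividing is also an assumption, since the text does not say at which level the ratios were computed.

```python
import math
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def duration_ratio(double_ms: Sequence[float], single_ms: Sequence[float]) -> float:
    """CC/C or VV/V ratio: mean duration with two letters over mean duration with one (assumed averaging)."""
    return mean(double_ms) / mean(single_ms)


def geometric_mean(values: Sequence[float]) -> float:
    """exp of the mean log value; only defined for positive ratios."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical closure durations (ms) for one speaker, three repetitions per word.
pairs: Dict[str, Tuple[List[float], List[float]]] = {
    "floppy/copy": ([110.0, 105.0, 115.0], [100.0, 100.0, 100.0]),
    "accuse/acute": ([90.0, 90.0, 90.0], [100.0, 95.0, 105.0]),
}
ratios = [duration_ratio(double, single) for double, single in pairs.values()]
print([round(r, 3) for r in ratios], round(geometric_mean(ratios), 3))
```

In this toy example, a 10% lengthening in one pair and a 10% shortening in the other give a geometric mean just below 1, so neither direction dominates the average.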
These observations for consonant and vowel ratios were confirmed by statistical analyses. The ratios were analyzed with a linear mixed-effects model using the lmerTest package (Kuznetsova et al., 2017) in R (version 4.1.2; R Core Team, 2021). The ratio was the dependent variable. Group (Korean learners of English vs. native English speakers) and target type (consonant vs. vowel) were fixed effects, and the model included by-subject and by-word-pair random intercepts. Stepwise model selection yielded the optimal model ratio ~ group * type + (1 | subject) + (1 | word pair), as shown in Table 3.
Random effects | Variance | SD
---|---|---
Subject | 0.0000 | 0.0000
Word pair | 0.0161 | 0.1269

Pairwise comparison | Estimate | SE | df | t ratio | p-value
---|---|---|---|---|---
Consonant: Korean – native | –0.0123 | 0.0192 | 193 | –0.640 | 0.5227
Vowel: Korean – native | 0.0715 | 0.0189 | 183 | 3.774 | <.001***
As Table 3 shows, the mean CC/C ratios did not differ significantly between Korean learners of English and native English speakers. By contrast, the significant interaction between group and target type indicated that the two groups differed for vowels. To examine the interaction, a post hoc analysis was conducted using the emmeans package (Lenth et al., 2024). The pairwise comparisons showed that Korean learners of English and native English speakers differed significantly in mean VV/V ratios (estimate = 0.0715, p < .001) but not in mean CC/C ratios (estimate = –0.0123, p = .523).
Given the orthography effects observed in the Korean learners’ production of English vowels, the relationship between their vowel duration ratios and individual vocabulary size was further examined with a Pearson correlation. The vowel ratios were not correlated with vocabulary size scores (r = –0.148, t(43) = –0.979, p = .333), as illustrated in Figure 1.
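The reported test statistic follows from r and the sample size through t = r·√(n − 2)/√(1 − r²), with df = n − 2. The sketch below uses hypothetical helper names. It computes r from paired scores and reproduces t from the reported r for the 45 learners. The p-value requires the t distribution, for example via scipy.stats.pearsonr. SciPy is not used here so that the block depends only on the standard library.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between paired observations."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Reported values: r = -0.1476885 for 45 Korean learners (df = 43).
print(round(t_from_r(-0.1476885, 45), 4))
```

This reproduces the reported t of about –0.979, which is a quick consistency check on the correlation statistics.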
Figure 1 plots the nonsignificant relationship between individual vocabulary size scores and the orthography effects in the Korean learners’ production of English vowels. The bottom-left and bottom-right quadrants of the scatterplot are both crowded, indicating that participants across a wide range of vocabulary scores showed a similar range of orthography effects. In other words, some participants with relatively high vocabulary scores showed orthography effects similar to those of participants with lower scores. Figure 1 thus indicates no effect of vocabulary size, suggesting that even L2 learners with a large vocabulary may not be able to suppress the influence of orthographic information.
4. Discussion
The present study examined whether multiple orthographic length markers for the same phoneme in English mislead Korean learners into producing the phoneme with a length that corresponds to its spelling. A previous study had observed that Korean learners of English produced the same vowel, but not the same consonant, according to its spelling. We revisited this finding to assess whether a task eliciting more immediate effects of orthography on L2 spoken word production would reveal orthography effects for consonants as well as vowels. For this purpose, we used a read-aloud task in which the spellings of the target words were presented overtly, without any intervening filler task or auditory material. Additionally, the present study addressed whether orthographic effects on speech production change as L2 vocabulary size expands. This question was based on previous findings that a large L2 vocabulary drives more nativelike phonology in the target language.
The results of the read-aloud task showed that L1 Korean speakers of L2 English produced vowels, but not consonants, as long or short depending on the number of letters in the spelling, whereas native English speakers showed no such spelling effect. On average, the same vowel (e.g., [i]) was produced 1.09 times as long when it was spelled with a digraph (e.g., defeat) as when it was spelled with a single letter (e.g., delete). By contrast, the Korean learners produced consonants with similar durations regardless of spelling (e.g., [k] in acute vs. accuse). The mean long-to-short duration ratios (CC/C and VV/V) can be compared between the read-aloud task and the delayed repetition task of Han et al. (2024). The same group of participants showed slightly larger ratios in the read-aloud task for both consonants and vowels: 1.04 in the delayed repetition task versus 1.07 in the read-aloud task for consonants, and 1.08 versus 1.09 for vowels. More importantly, however, the overall patterns of the ratios were similar across the two tasks. The results of Han et al. (2024) were thus replicated, indicating that the orthography effect found in the previous work was not attributable to the task type used to evaluate L2 production. Whether orthographic input was provided overtly or indirectly, orthographic length markers for vowel phonemes, but not for consonant phonemes, may mislead L2 learners into producing homophonic word pairs as phonological minimal pairs.
The difference in orthography effects between consonants and vowels can be explained by differences in L1 grapheme-to-phoneme correspondences (see Han et al., 2024, for a more detailed explanation). In Korean consonants, doubled letters mark a laryngeal category, not a phonemic length contrast. That is, Korean tense consonants are written by doubling the letter for the corresponding lax consonant [e.g., <ㄷ> and <ㄸ> for /t/ (voiceless alveolar lax) and /t’/ (voiceless alveolar tense), respectively]. Tense and lax consonants are mainly distinguished by voice onset time (VOT) and the fundamental frequency (F0) of the following vowel. Korean learners familiar with this L1 grapheme-to-phoneme correspondence therefore produced the English target consonants with similar durations, whether they were spelled with single or double letters. Unlike consonants, Korean vowels do not use double letters; each of the seven monophthongs is represented by a separate letter. All target vowels were tense (/i, u, oʊ/), and Korean learners of English are known to rely on duration to distinguish tense from lax English vowels. The Korean learners in the present study therefore likely found it difficult to give vowels spelled with a single letter (e.g., lose, skis) the long duration typical of tense vowels. Vowels spelled with double letters or digraphs (e.g., choose, release) did not pose the same difficulty.
The present results are consistent with previous studies arguing for the automatic activation of orthography during spoken word production (Qu & Damian, 2019; Wang et al., 2023). In particular, Wang et al. (2023) examined Mandarin Chinese, a language with opaque mappings between orthography and phonology. They showed that, even without presentation of written forms, picture-naming latencies were shorter when the stimuli were orthographically related (i.e., shared radicals). That finding provides evidence that orthography contributes to spoken word production even when orthographic information is not directly relevant to the task. Taken together, these findings suggest that orthography-induced effects on L2 spoken word production are not task dependent.
The present study further showed that individual differences in vocabulary size were not related to the orthographic effects in L2 vowel production: Korean learners showed similar vowel ratios irrespective of vocabulary size. These results do not align with recent work such as Bundgaard-Nielsen et al. (2011a, 2011b, 2012), Daidone & Darcy (2021), and Llompart (2021), all of which posited that vocabulary acquisition can help bootstrap phonological learning in L2. The lack of a vocabulary size effect is likely due to the characteristics of the L2 learners who participated in the present study. That is, vocabulary test scores may not be associated with L2 speech production when the participants are typical EFL learners rather than immersed learners. The vocabulary-tuning model of L2 phonology (PAM-L2) suggests that “improvements in L2 production as well as perception during the early stages of L2 immersion may be positively associated with an expanding L2 vocabulary, because the need to decipher and comprehend L2 speech rapidly guides the learner to tune in to the phonological system of that particular language, rather than continue to perceive L2 speech on the basis of its superficial phonetic similarities to the L1.” If L2 learners have little opportunity to communicate with native speakers, a larger L2 vocabulary may not translate linearly into better production of L2 sounds. Relatedly, de Ruiter et al. (2022) assessed Chinese children’s English vocabulary and argued for a culturally appropriate receptive vocabulary test for Chinese learners of English.
Alternatively, the nonsignificant effect of vocabulary size on vowel duration may be associated with the relatively small long-to-short ratios for vowels. The effects of the number of letters on target vowel duration were statistically significant but much less pronounced than those in previous studies (Bassetti, 2017; Sokolović-Perović et al., 2020). A stronger relationship between individual differences in vocabulary size and orthography effects might emerge for learners whose L1 length contrast affects their L2 production, such as the Italian or Japanese learners examined in those studies.
In conclusion, the present results reveal that, regardless of the task used to evaluate L2 production, the number of letters affected target sound length in Korean learners’ production of English vowels, but not consonants. Furthermore, these orthography effects in L2 vowel production were not modulated by the learners’ vocabulary size.