1. Introduction
In processing auditory signals into phonetic categories, listeners are known to be sensitive to acoustic differences across categories but little (or not at all) to differences within a category, even when the physical difference along the acoustic dimension is the same. This well-known speech processing mechanism, categorical perception (e.g., Liberman et al., 1957; Liberman et al., 1961; Pisoni & Tash, 1974), has emphasized the role of abstract phonetic labels in listeners’ linguistic representation in warping the relevant acoustic space. While many researchers have viewed this distortion of acoustic space as listeners’ insensitivity to within-category acoustic details masked by abstract phonological labels, other researchers have demonstrated that fine-grained acoustic details persistently affect listeners’ processing of segments and words (Carney et al., 1977; McMurray et al., 2008; Pisoni & Tash, 1974; Schouten et al., 2003). That is, auditory stimuli with ambiguous or less typical acoustic properties delayed listeners’ category decisions or word recognition in experimental studies measuring RTs or eye movements during phonetic labelling tasks. Moreover, depending on the type of identification task, some listeners were less categorical in assigning category labels, suggesting individual differences in the phonetic encoding process (e.g., Massaro & Cohen, 1983; Schouten et al., 2003).
As one way to understand those processing delays and the associated individual differences, several recent studies proposed that listeners’ use of acoustic cues modulates the perceptual manner of judging speech categories (Kapnoula et al., 2017; Kong & Edwards, 2016). The studies demonstrated that a gradient (or less categorical) manner of perception was positively linked to listeners’ greater sensitivity to redundant cues (e.g., f0 in the English /d/–/t/ contrast) in identification tasks. Also, listeners with gradient perception tended to have better working memory capacity than others. In this sense, gradient perception indicates listeners’ ability for integrative phonetic encoding.
Inspired by the existing studies on gradient perception by English-speaking monolinguals, the current study examined the manner of category judgments by English-speaking adult learners of L2 Korean to investigate whether gradient/categorical perception would also be modulated by learners’ use of multiple acoustic cues in L2 and by individuals’ general cognitive ability. This question is particularly interesting for L1 English learners’ perception of L2 Korean stops, since the roles of acoustic cues for laryngeal stop contrasts are defined differently in the two languages. In English, voice onset time (VOT) is the primary cue that differentiates voiceless stops (e.g., /t/[th]) from voiced stops (e.g., /d/[t]), although secondary cues such as f0 also differ systematically between voiceless and voiced stops (Abramson & Lisker, 1985; Francis et al., 2008; Gordon et al., 1993; Shultz et al., 2012). By contrast, in Korean, VOT and f0 are equally important in distinguishing aspirated stops (e.g., /th/) from the other two phonation-type categories, lax and tense stops (e.g., /t/, /t’/) (Kong & Lee, 2018; Kong et al., 2011; Lee & Jongman, 2011; Silva, 2006). Given existing evidence that L2 learners gradually learn to use the cue-weighting strategies of their second language (e.g., Escudero & Boersma, 2004; Escudero et al., 2009; Flege et al., 1997; Llanos et al., 2013), we ask whether English-speaking L2 learners’ individual variability in gradient perception, if any, is associated with the acoustic cues of VOT and f0 in a different way from the pattern observed in monolingual English speakers’ perception of native stops.
The specific questions addressed in the present study were the following. One, would L2 learners show individual differences in their manner of judging L2 stop categories? If so, how would those individual differences in L2 perception be related to those in L1 perception? Two, would L2 learners’ variability in gradient or categorical responses be explained by perceptual sensitivity to VOT and f0? How would the re-weighted roles of f0 and VOT for Korean stop contrasts affect learners’ manner of judging the categories? The last question concerns an interaction between cognitive resources and L2 speech processing. Would L2 learners with better cognitive control differentiate auditory signals on a fine-grained scale (gradient perception) or map them onto abstract labels (categorical perception)? We tested L2 learners’ working memory capacity to explore this question.
In addressing the questions, we take L2 proficiency into consideration, given the well-known effect of L2 proficiency on categorical perception and relative cue-weighting strategies (Flege et al., 1997; Kong & Yoon, 2013; Piske et al., 2001). Experimental findings of the current study would broaden our understanding of how the encoding of L2 categories interacts with language-specific uses of sub-phonemic acoustic details and with general cognitive ability.
2. Methods
The audio stimuli were adopted from Kong & Edwards (2016): two sets of synthesized CV syllables based on male speakers’ natural productions. For the English stimuli, one token of /da/ and one of /ta/ (from dot and tot) produced by an adult male native English speaker were selected for manipulating the acoustic dimensions of VOT and f0. VOT values were manipulated by excising a portion of the burst release/aspiration from the /ta/ sample and pasting it before the voicing onset of the /da/ token: six log-scale values from 9 ms to 59 ms of VOT were used (9 ms, 13 ms, 19 ms, 28 ms, 40 ms, and 59 ms). At each VOT step, the f0 dimension was manipulated by replacing the original f0 value during the vowel of /da/ with one of five f0 values (98 Hz, 106 Hz, 114 Hz, 122 Hz, and 130 Hz). This procedure yielded 30 different CV syllables (six VOT steps × five f0 steps). The same procedure was implemented with a /tha/ and a /ta/ token (from ‘탓’ [that] and ‘닷’ [tat]) produced by a male Seoul Korean speaker. By pasting a portion of the burst release/aspiration from the /tha/ sample onto the /ta/ token, we created another set of 30 different syllables (six VOT × five f0 combinations) for the L2 Korean session of the speech perception task.
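The stimulus design described above can be sketched as a simple grid. This is only an illustration of the six-by-five layout, not the synthesis procedure itself; the variable names (`VOT_MS`, `F0_HZ`, `stimuli`) are hypothetical, and the f0 steps are taken to be in Hz.

```python
from itertools import product
from math import log

# Step values as listed in the text (VOT in ms, f0 in Hz).
VOT_MS = [9, 13, 19, 28, 40, 59]
F0_HZ = [98, 106, 114, 122, 130]

# One entry per synthesized CV syllable: every VOT step crossed with every f0 step.
stimuli = [{"vot_ms": v, "f0_hz": f} for v, f in product(VOT_MS, F0_HZ)]

# On a log scale the VOT steps are roughly evenly spaced:
# each step is about 1.43 to 1.48 times the previous one.
log_steps = [log(b / a) for a, b in zip(VOT_MS, VOT_MS[1:])]

print(len(stimuli))                      # number of syllables in one language set
print([round(s, 2) for s in log_steps])  # successive log ratios between VOT steps
```

The f0 steps, by contrast, are linearly spaced (8 Hz apart).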
The participating learners were given two kinds of tasks: speech perception tasks (visual analogue scaling (VAS; Massaro & Cohen, 1983) and forced-choice identification (2/3AFC)) and an executive function task (Digit N-Back) as a working memory measure.
For the VAS speech perception task, we followed the task procedures of Kong & Edwards (2016). Listeners were presented with a horizontal arrow with ‘d’ and ‘t’ labels (in the English session; ‘ㄷ’ /t/ and ‘ㅌ’ /th/ in the Korean session) at the two ends and were instructed to judge perceptual /d/- or /t/-likeness (in English; /t/- or /th/-likeness in L2 Korean) by clicking locations close to or distant from the relevant labels. How finely the learners differentiated the auditory tokens would indicate the individual listeners’ manner of category judgement. The other speech perception task was an identification task in which listeners chose either the ‘d’ or the ‘t’ label displayed on the monitor upon hearing auditory tokens in the L1 English session (2AFC). In the Korean session, listeners selected one of three consonant labels (‘ㄸ’ /t’/, ‘ㄷ’ /t/ and ‘ㅌ’ /th/: 3AFC). The learners’ categorical responses were used to estimate how listeners utilized the acoustic information of VOT and f0 in the stimulus tokens for phonetic labeling.
The VAS and AFC tasks were counter-balanced across the participants. In each language session of both tasks, a set of 30 different CV syllables was repeated three times in a random order, yielding 90 trials per listener. The speech perception tasks were administered in E-Prime2 so that responses were automatically recorded.
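As a rough illustration of the trial structure (30 syllables, three repetitions, random order), the following sketch builds one session's trial list. It assumes a single shuffle over all 90 trials; the text does not say whether randomization was done within repetition blocks instead, and the function name `build_trial_list` is hypothetical. The experiment itself was run in E-Prime2, not in Python.

```python
import random

def build_trial_list(n_stimuli=30, n_reps=3, seed=None):
    """Return a shuffled list of stimulus indices, each appearing n_reps times."""
    rng = random.Random(seed)  # a seed makes the order reproducible
    trials = [s for s in range(n_stimuli) for _ in range(n_reps)]
    rng.shuffle(trials)        # assumption: one shuffle across the whole session
    return trials

trials = build_trial_list(seed=1)
print(len(trials))  # total trials per listener in one language session
```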
Besides the speech perception tasks, we administered a Digit N-Back task in which learners were asked whether the digit on the current slide matched the digit one slide before (One-Back), two slides before (Two-Back), or three slides before (Three-Back). Accuracy and speed in the N-Back task blocks are known to indicate one’s working memory capacity, i.e., the ability to hold and process information. Counts of accurate answers and response times were automatically obtained on the E-Prime2 platform.
Twenty-two native English speakers (10 female, 12 male; in their 20s–40s) participated in the experiment. All were living in Seoul or Gyeonggido at the time of recruitment and testing. Since their length of residence in Korea varied widely, from 18 months to 13 years, we evaluated their L2 proficiency with a short test composed of question items from the TOPIK (Test of Proficiency in Korean, a test administered by the National Institute of International Education in Korea). The test was designed to assess listening comprehension, vocabulary comprehension, and reading comprehension. We used the composite scores from this test as a numeric index of L2 proficiency. Participants were monetarily compensated for their participation.
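The matching rule of the N-Back task can be sketched as follows. The function names `nback_targets` and `accuracy_count` are hypothetical, and the sketch assumes that the first n slides of a block, which have no slide n positions back, count as non-matches.

```python
def nback_targets(digits, n):
    """For each slide, True if its digit matches the digit n slides before."""
    return [i >= n and digits[i] == digits[i - n] for i in range(len(digits))]

def accuracy_count(responses, targets):
    """Number of slides on which the yes/no response agrees with the target."""
    return sum(r == t for r, t in zip(responses, targets))

sequence = [3, 5, 3, 5, 5]
two_back = nback_targets(sequence, 2)
print(two_back)  # match status of each slide under the Two-Back rule
print(accuracy_count([False, False, True, False, False], two_back))
```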
Responses from the five tasks were processed to quantify individual-level perceptual (VAS and 2/3AFC) or cognitive (WM) characteristics. In VAS, distributions of click locations along the arrow (i.e., distance from the two given labels) were quantified in terms of the polarity of the distribution envelope (Kong & Edwards, 2016). Polynomial regression models were fitted for each individual listener, and their quadratic coefficients served to capture the spread or steepness of the increase in clicks toward the edges of the arrow. In this assessment, smaller quadratic coefficients indicate more gradient judgment in phonetic labelling.
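One possible way to obtain such a quadratic coefficient is sketched below. It is a minimal illustration under stated assumptions, not the exact procedure of Kong & Edwards (2016): click locations are assumed to be scaled to the range 0 to 1, binned into ten equal bins, and converted to proportions, and a second-order polynomial is fitted to the proportions over bin centers rescaled to the range -1 to 1. The function name `quadratic_coefficient` and the parameter `n_bins` are hypothetical.

```python
def quadratic_coefficient(clicks, n_bins=10):
    """Fit proportion ~ a + b*x + c*x**2 to a click histogram and return c.

    clicks: click locations scaled to [0, 1] along the response line.
    Bin centers are rescaled to [-1, 1]; because they are then symmetric
    about zero, the least-squares estimate of c has a closed form.
    """
    counts = [0] * n_bins
    for loc in clicks:
        idx = min(int(loc * n_bins), n_bins - 1)  # loc == 1.0 falls in the last bin
        counts[idx] += 1
    props = [c / len(clicks) for c in counts]
    xs = [-1 + (2 * i + 1) / n_bins for i in range(n_bins)]  # bin centers
    sxx = sum(x ** 2 for x in xs)
    sx4 = sum(x ** 4 for x in xs)
    sy = sum(props)
    sx2y = sum(x ** 2 * y for x, y in zip(xs, props))
    return (n_bins * sx2y - sxx * sy) / (n_bins * sx4 - sxx ** 2)

# A listener who clicks only near the two labels versus one who spreads clicks evenly.
categorical = [0.02] * 15 + [0.98] * 15
gradient = [(i + 0.5) / 10 for i in range(10)] * 3
print(quadratic_coefficient(categorical) > quadratic_coefficient(gradient))
```

Under these assumptions, clicks piled up at the two ends give a large positive coefficient, while an even spread gives a coefficient near zero, which matches the interpretation that smaller coefficients indicate more gradient judgment.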
To estimate listeners’ sensitivities to the two acoustic cues (VOT and f0) for identification (2/3AFC tasks), we built mixed-effects logistic regression models separately for the L1 English and the L2 Korean tasks. In the models, the dependent variables were the binary responses of “da” and “ta” for the English session and the binary responses of “다” /ta/ and “타” /tha/ (lax-aspirated) for the Korean session, excluding the responses of “따” /t’/. The fixed-effect variables were the VOT and f0 values of each stimulus as continuous variables, whose units were scaled to make beta coefficients comparable between the two acoustic dimensions. The models included random intercepts and random slopes of VOT and f0 varying at the listener level. We used the individual random slope coefficients as a numeric index of individual listeners’ sensitivity to VOT and f0.
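Written out, a model of this kind could take the following form. The notation is ours and rests on assumptions: i indexes stimuli and j indexes listeners, the response is coded 1 for the voiceless or aspirated label (“ta” in English, “타” in Korean), both predictors are standardized, and the random effects are allowed to covary. The text does not state whether the by-listener index is the random deviation alone or the fixed slope plus the deviation.

```latex
\[
\operatorname{logit} P(y_{ij} = 1)
  = (\beta_0 + u_{0j})
  + (\beta_{\mathrm{VOT}} + u_{\mathrm{VOT},j})\,\mathrm{VOT}_i
  + (\beta_{f0} + u_{f0,j})\,\mathrm{f0}_i,
\qquad
(u_{0j},\, u_{\mathrm{VOT},j},\, u_{f0,j})^{\top} \sim \mathcal{N}(\mathbf{0}, \Sigma)
\]
```

Here the fixed slopes describe the group-level cue weights, and the listener-specific terms capture how far each listener departs from them.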
Individual learners’ performance in the Digit N-Back task was evaluated by accuracy counts and the speed of correct responses. Accuracy and response times (RTs) were calculated in each sub-block of the task (i.e., the One-/Two-/Three-Back blocks); we later discarded the values from One-Back due to a ceiling effect. Trials whose response times fell outside two standard deviations of each participant’s performance were excluded from the calculation. Averaged RTs of correct responses per listener in each block were used as WM scores for the relevant analyses below.
Finally, all the numeric indices of perceptual and cognitive characteristics were entered into (partial) correlation tests to determine their relationship with the perceptual manner of L2 category decision. L2 proficiency (i.e., test scores of Korean language) was considered as a control variable to partial out its potential effect on L2 sound category processing. We applied a relaxed criterion in interpreting the strength of correlation between variables to accommodate the relatively small sample size, treating correlation coefficients greater than .3 as a meaningful tendency to be noted.
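The RT-based score can be illustrated with the sketch below. It assumes that the two-standard-deviation window is computed over all of one participant's trials within one block and that incorrect trials are dropped after trimming; the text leaves both details open. The function name `wm_rt_score` and the sample data are hypothetical.

```python
from statistics import mean, stdev

def wm_rt_score(trials, n_sd=2.0):
    """Mean RT of correct, non-outlying trials for one participant in one block.

    trials: list of (rt_ms, correct) pairs.
    """
    rts = [rt for rt, _ in trials]
    center, spread = mean(rts), stdev(rts)
    low, high = center - n_sd * spread, center + n_sd * spread
    kept = [rt for rt, correct in trials if correct and low <= rt <= high]
    return mean(kept)  # raises StatisticsError if no trial survives

block = [(500, True), (520, True), (480, True), (510, True), (490, False),
         (505, True), (495, True), (515, True), (485, True), (2000, True)]
print(wm_rt_score(block))  # the 2000 ms trial and the incorrect trial are excluded
```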
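For a single control variable, a partial correlation can be computed from the three pairwise Pearson correlations. The sketch below shows that standard first-order formula; it is an illustration only, with hypothetical function names and toy data, and it does not compute p-values.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def partial_corr(x, y, z):
    """Correlation between x and y with the linear effect of z removed from both."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Toy data: x and y are related only through the control variable z.
z = [1, 2, 3, 4]
x = [2, 1, 2, 5]
y = [2, -1, 6, 3]
print(round(pearson(x, y), 3))
print(round(abs(partial_corr(x, y, z)), 3))
```

In the toy data the raw correlation between x and y is positive, but it disappears once z is controlled, which is the situation the partial correlation is meant to detect.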
3. Results
Figure 1 presents click distributions of the English and Korean VAS tasks, which were collapsed across all listeners. When visually inspected, click distributions of the L2 Korean task were overall flatter than those of the L1 English task, suggesting more gradient responses of category judgement in L2 perception. It is noted that the clicks in the L2 Korean task form a peak at the center of the distribution, which might reflect learners’ uncertainty in labelling L2 stop sounds. Regression curves overlaid on top of histograms confirm the observation by displaying a curve with steeper rises (i.e., categorical judgement) towards both ends of the distribution in the English task than in the Korean one.
The scatter-plot at the rightmost panel of Figure 1 shows individual listeners’ quadratic coefficients of English VAS responses as a function of those of L2 Korean VAS responses. It is observed that most data-points were clustered below a diagonal line, indicating many individual English listeners were more gradient in perceiving L2 Korean stops than native English stops. A correlation test between perceptual gradiency of English and Korean (quadratic coefficients of click histograms from VAS) returned a modestly meaningful coefficient with a positive sign (r=.32, p=.16), suggesting that perceptual gradiency in L1 is somewhat (but not strongly) suggestive of L2 gradiency.
Confirming this (consistent but) loose correlation between L1 and L2 perception in terms of gradient judgments, individual listeners’ histograms in Figure 2 present examples of individual learners whose gradiency in L1 English perception is similar to or different from their degree of gradiency in L2 Korean perception. For example, listener ID220 was categorical whereas listener ID216 was gradient in perceiving Korean stops, although both were similarly gradient in English stop perception.
Figure 3 [top panels] presents logistic curves of VOT and f0 variables estimated from a logistic mixed-effects regression model of the English task (see parameter estimations summarized in Table 1). In the English session, the group-averaged curves for VOT and f0 (black lines) rise as VOT and f0 values increase, indicating that perception of the voiceless, and phonetically aspirated stop category is associated with greater VOT and higher f0. Relatively speaking, the group-averaged VOT curve switches from /d/ to /t/ more abruptly than the f0 curve: βVOT= 5.91, βf0=1.30, which is also confirmed in the stimulus matrix (center panel), where darker cells (/t/ response) were distributed at the right side (longer VOTs). This pattern is in line with previous findings that VOT is the primary perceptual cue for monolingual English listeners (Abramson & Lisker, 1985; Francis et al., 2008; Gordon et al., 1993; Shultz et al., 2012). Looking into individual-level variations of VOT and f0 sensitivities (rightmost panel), a correlation test between by-listener coefficients of VOT and f0 slopes yielded a pattern that individual listeners’ VOT uses were inversely correlated with those of f0 although the magnitude of effect was modest (r=–.33, p=.12). English-speaking listeners who depended on VOT more than others tended to rely on f0 less in perceiving L1 stops.
Contrast | Parameter | Estimate | Std. err. | p-value |
---|---|---|---|---|
English (/d/–/t/) | VOT | 5.86 | 0.70 | <.001 |
English (/d/–/t/) | f0 | 1.38 | 0.14 | <.001 |
Korean (/t/–/th/) | VOT | 3.21 | 0.52 | <.001 |
Korean (/t/–/th/) | f0 | 1.11 | 0.21 | <.001 |
Figure 3 [bottom panels] presents regression curves of VOT and f0 based on model outputs of /t/–/th/ responses. Parameter estimates are summarized in Table 1. The curves based on the regression pattern of Korean /t/–/th/ responses were similar to those of L1 English /d/–/t/ responses, in that the group-averaged curves rise as VOT or f0 values increase, and the switch from /t/ to /th/ is more abrupt along the VOT dimension than along the f0 dimension (βVOT=3.18, βf0=1.06). One noticeably different pattern between the two models is that the coefficient difference between VOT and f0 was smaller in L2 Korean than in L1 English, suggesting that listeners’ reliance on VOT was less dominant in L2 perception and their dependency on f0 was relatively greater for the /t/–/th/ contrast. This slightly adjusted cue-weighting pattern in L2 stop perception is visualized in the stimulus matrix (middle panel), where darker cells are located at the top-right corner (longer VOT and higher f0) and lighter cells at the bottom-left corner (shorter VOT and lower f0). Although English-speaking L2 learners’ reliance on f0 was not as enhanced as Korean native speakers’ primary use of f0, this re-weighting pattern suggests that f0 is an important acoustic dimension (a targeted primary cue) that English-speaking L2 learners deliberately attend to in order to enhance its use.
We examined the relationship between L2 test scores and the manner of perceiving L1 and L2 stops. Considering the modest correlation between L1 and L2 perceptual gradiency, we employed partial correlation tests (ppcor package in R, Kim, 2012) to exclude the effect of L1 perception in estimating the relation of L2 perception with L2 proficiency, and vice versa. Results revealed no significant relationship between L2 test scores and perceptual gradiency in L1 or L2 (with L1: r=–.18, p=.43; with L2: r=–.20, p=.39).
We conducted further correlation tests to examine whether L2 proficiency is related to learners’ use of VOT and f0 in L1 English and L2 Korean stop perception. Table 2 summarizes the test outputs. The relationship of L2 proficiency with L2 acoustic sensitivity was consistent in that higher L2 test scores were negatively correlated with VOT coefficients (r=–.58, p<.01) and positively with f0 coefficients (r=.44, p<.05). These results suggest that, as learners become proficient in L2, they become better able to attend to the primary information (f0) for a stop contrast.
Interestingly, L2 proficiency was also negatively correlated with L1 VOT use, although the strength of the correlation was relatively weak. This finding that proficient L2 learners tended to utilize VOT (a primary cue in English) less (r=–.37) may suggest an effect of L2 learning on the perception of native stops (e.g., Lev-Ari & Peperkamp, 2013).
With L2 proficiency considered as a control variable, partial correlation tests showed no meaningful correlation between gradient responses in the L1 task and L1 f0 use (r=–.12). This is not congruent with observations in earlier studies that f0 use was responsible for English monolinguals’ gradient responses in native stop perception (Kapnoula et al., 2017; Kong & Edwards, 2016). Instead, it was L1 VOT coefficients that were meaningfully correlated with L1 gradiency scores, suggesting that L2 learners who were sensitive to VOT, a primary cue in the English /d/–/t/ contrast, tended to be categorical in labelling the sounds.
Along the same lines, L2 gradiency scores were correlated with L2 f0 coefficients despite a weak strength of correlation (r=.29). That is, greater sensitivity to L2 f0 (a primary cue to a Korean /t/–/th/ pair) was associated with categorical manner of judgment. We consider this direction of relationship in L2 analogous to the relationship between categorical responses and VOT utilizations in L1, in that greater uses of a primary cue were responsible for categorical judgements of speech signals. In a general sense, the current finding conforms to existing evidence that listeners’ utilizations of acoustic information modulate encoding process of speech categories accounting for individual variability in categorical perception (Kapnoula et al., 2017; Kong & Edwards, 2016).
Finally, partial-correlation tests were performed to examine the relationship between working memory capacity (WM: RTs and accuracy from N-Back tasks) and (1) perceptual gradiency scores (from VAS tasks) and (2) acoustic sensitivities to VOT and f0 in L1 and L2 stop identification tasks. L2 proficiency was also considered as a control variable in the tests to assess the correlation between cognitive control and speech processing independently of one’s control of L2 language.
Var 1 | Var 2 | Estimate | p-value |
---|---|---|---|
L1 Gradiency | L1 VOT coeff. (/d/–/t/) | .37 | .111 |
L1 Gradiency | L1 f0 coeff. (/d/–/t/) | –.12 | .59 |
L2 Gradiency | L2 VOT coeff. (/t/–/th/) | –.22 | .33 |
L2 Gradiency | L2 f0 coeff. (/t/–/th/) | .29 | .21 |
As summarized in Table 4 [top panels], faster RTs in the N-Back task were meaningfully correlated (by the relaxed criterion of coefficients greater than .3) with categorical judgements of the stops in both L1 and L2 perception, indicating an association of better WM with categorical perception. This direction of relationship contrasts with the association of better WM with gradient responses observed in English-speaking monolinguals’ native stop perception (Kapnoula et al., 2017). This discrepancy might reflect a different nature of the speech processing mechanism between L1 and L2 perception, as L2 learners’ phonetic encoding would be more oriented to an abstract level of speech (i.e., phonetic category decisions) due to established labels in L1 (e.g., Best & Tyler, 2007). Accordingly, in L2 speech perception, cognitive resources might operate in a different way from L1 perception by being linked to a higher level of speech processing.
WM measure | Block | L1 Gradiency: r | p | L2 Gradiency: r | p |
---|---|---|---|---|---|
RT | 2-Back | –0.037 | 0.878 | –0.351 | 0.129 |
RT | 3-Back | –0.328 | 0.158 | –0.327 | 0.16 |
ACC | 2-Back | 0.091 | 0.702 | –0.079 | 0.739 |
ACC | 3-Back | 0.125 | 0.599 | –0.208 | 0.379 |

WM measure | Block | L1 VOT coefficients: r | p | L2 VOT coefficients: r | p |
---|---|---|---|---|---|
RT | 2-Back | 0.178 | 0.452 | 0.126 | 0.595 |
RT | 3-Back | 0.119 | 0.618 | –0.107 | 0.654 |
ACC | 2-Back | 0.306 | 0.189 | 0.061 | 0.798 |
ACC | 3-Back | 0.378 | 0.1 | –0.105 | 0.659 |

WM measure | Block | L1 f0 coefficients: r | p | L2 f0 coefficients: r | p |
---|---|---|---|---|---|
RT | 2-Back | –0.16 | 0.499 | –0.167 | 0.481 |
RT | 3-Back | 0.006 | 0.979 | –0.356 | 0.124 |
ACC | 2-Back | 0.021 | 0.931 | 0.331 | 0.154 |
ACC | 3-Back | –0.381 | 0.097 | 0.419 | 0.066 |
WM was also correlated with L1 and L2 acoustic coefficients. With L1 coefficients, higher WM accuracy was associated with smaller L1 f0 coefficients and greater L1 VOT coefficients, indicating that listeners with better WM attended to VOT, a primary cue in L1, more than others and utilized f0, a redundant cue, less than others in identifying the English stops /d/–/t/. The pattern suggests that cognitive resources are devoted to perceptual attention to a primary cue.
Mirroring the pattern related to L1 acoustic coefficients, higher accuracy and faster RTs in the WM task were modestly correlated with greater use of f0 in L2 Korean perception. This echoes a tendency that better WM was associated with greater reliance on a primary cue, which is f0 in Korean /t/–/th/ contrast. Again, this seems to support that bilinguals’ or L2 learners’ cognitive resources operated to maximize utilizations of primary acoustic information that is defined for a specific language and phonological contrast.
4. Discussion
The current study explored English-speaking L2 learners’ individual differences in their manner of judging stop laryngeal contrasts in L2 Korean in order to investigate the roles of acoustic cue use and cognitive control as sources of individual variability in speech information processing. Experimental evidence showed that some individual learners of L2 Korean were gradient in judging the stop categories /t/ and /th/, while others were strictly categorical between the two consonant labels. This pattern of individual variability in L2 perception is consistent with existing findings on English monolinguals’ native perception (e.g., Kapnoula et al., 2017; Kong & Edwards, 2016, whose methodological framework we adopted to a large extent). Importantly, those individual differences in gradient or categorical judgments of L2 stops were accounted for by learners’ perceptual sensitivity to the acoustic cue primarily important for the stop contrast, and they interacted with listeners’ working memory (WM) control, a relationship that was assessed independently of learners’ L2 proficiency.
In L1 stop perception, the relationship of gradient/categorical responses with a primary cue to the /d/–/t/ contrast (i.e., VOT) was meaningful, although the relation with a redundant cue (f0) was not significant. In a broad sense, this finding conforms to the proposal based on monolingual English speakers’ L1 stop perception that acoustic sensitivity is responsible for listeners’ manner of perception (Kapnoula et al., 2017; Kong & Edwards, 2016). In its details, however, the current finding differs slightly from the evidence from monolingual listeners in that it was learners’ acoustic sensitivity to a primary cue, not to a secondary cue, that modulated the manner of judging stop categories: listeners with greater sensitivity to a primary cue tended to be categorical in labelling the stimuli. We may argue that the acoustic cues modulating category judgments differ in nature between monolinguals and bilinguals (adult L2 learners), with L2 learners modulating their use of primary acoustic information for the sake of perceptually differentiating between-category identities.
The same account might be applied to explain the meaningful relationship between f0 use and the manner of judging L2 Korean stops: individual listeners’ categorical responses were associated with greater sensitivity to f0, a primary cue in perceiving the Korean stops /t/–/th/. Analogous to how VOT modulated the listeners’ categorical perception of English stops, f0 must have affected the L2 learners’ categorical manner of labelling L2 Korean stops, serving as a primary cue to the lax-aspirated stop contrast in Korean (e.g., Lee et al., 2013; Schertz et al., 2015). Although the current English-speaking L2 learners only poorly realized this relative cue-weighting between f0 and VOT in Korean stop perception, the increased role of f0 as their target primary cue affected the manner of categorical perception.
While a follow-up study with a larger sample size is called for to draw a robust conclusion about categorical/gradient responses in L2 perception, it is still an important finding that L2 learners’ manner of phonetic encoding is modulated by a primary cue to yield categorical responses to the stops. Existing models of L2 speech perception (e.g., PAM-L2, Best & Tyler, 2007) appear closely relevant to understanding the trend that adult L2 learners’ signal processing had to do with a primary cue, but less so with a secondary cue. That is, adult L2 learners process speech input by mapping it onto existing categories, the difficulty of which depends on the phonemic status and acoustic similarity of the target phonetic categories in L1 and L2. In weighing within- against between-category processing, adult L2 learners might be oriented more toward differentiating category identities, which emphasizes the role of a primary cue, than toward enriching category representations with within-category acoustic details. The current findings suggest that bilinguals (including adult L2 learners) utilize linguistic resources to operate on higher-level information relatively more than on lower-level information, which is probably different from how monolinguals utilize redundant cues to yield gradient responses.
We could not find strong evidence that L2 proficiency modulated L2 learners’ perceptual manner of judging stop categories. L2 proficiency, however, turned out to be an important factor in explaining individual differences in cue-weighting strategies in L2, which is consistent with existing findings (e.g., Flege et al., 1997; Piske et al., 2001). The consistent trend was that listeners with higher proficiency were relatively more sensitive to f0 (the primary information in the Korean /t/–/th/ contrast) and less so to VOT in differentiating the Korean stops. This direction of relationship suggests that more proficient learners are better able to adjust to the language-specific acoustic characteristics of the L2 phonetic categories in terms of relative cue-weighting. Kong & Yoon (2013) reported a similar finding that L1 Korean learners of L2 English showed differential degrees of dependency on VOT (a primary cue in English) between high-proficiency and low-proficiency learners in L2 English stop identification. More generally, the current results support the view that learning L2 sounds involves fine-grained tuning of acoustic cue weights, as examined in languages other than English and Korean (e.g., Escudero & Boersma, 2004; Escudero et al., 2009; Holt & Lotto, 2006).
The relationship of gradient/categorical responses with EF control was meaningful in both L1 and L2 speech perception. Listeners with greater WM tended to judge the stop sounds in a more categorical manner than others. Unlike observations in English-speaking monolingual listeners’ native perception (Kapnoula et al., 2017), L2 learners’ WM functioned to facilitate categorical responses instead of gradient responses. As non-native listeners have been shown to devote fewer cognitive resources to processing low-level acoustic information (Mattys et al., 2010), the current evidence also points to a cognitive emphasis on the phonetic encoding of the auditory signal. Together with adult L2 learners’ manner of operating linguistic resources at an abstract level of category encoding, the present findings on cognitive resources suggest that bilingual listeners’ encoding of speech input differs from that of monolinguals, since the role of cognitive effort is defined differently.
5. Conclusion
To conclude, the present study showed that individual differences in categorical/gradient responses existed in adult L2 learners’ perception of stop consonants. Moreover, those variations were systematic in terms of learners’ perceptual use of acoustic information and their general cognitive ability. It was characteristic of adult L2 learners’ stop perception that greater use of a primary cue and better cognitive function facilitated categorical perception, supporting the view that effective L2 learners process speech in a top-down manner.