1. Introduction
Gestural overlap has been hypothesized to exist as an essential aspect of speech (Browman & Goldstein, 1986, 1988, 1989, 1990, 1991, 1992). By hypothesis, constriction in the oral cavity can be functionally classified as consonants and vowels: the consonantal tier is defined by higher stiffness as compared to the vocalic tier with less stiffness (see Saltzman & Munhall (1989) for more detailed descriptions of dynamical parameters), and gestural coordination can occur within a tier (e.g., C-to-C, V-to-V) and across tiers (e.g., C-to-V, V-to-C) (Browman & Goldstein, 1992). In particular, consonantal and vocalic gestures overlap in syllable onset (C-V) or coda (V-C): a consonantal gesture is synchronously coordinated with a vocalic gesture in syllable onset, and sequentially in syllable coda (Browman & Goldstein, 1995; Nam, 2007; Nam et al., 2009, inter alia). Synchronous coordination observed in CV sequences is considered a more stable mode of speech: CV is more frequent cross-linguistically than VC, and the former is acquired earlier than the latter at the developmental stages (Nam et al., 2013; Vihman & Greenlee, 1987). To quote Nam (2007), "… the coupling pattern for onsets results in faster stabilization into steady-state intergestural phasing than for codas..." In his computational simulation and reaction-time study with a working hypothesis that a stop consonant consists of close and release gestures, greater stability was observed in C-V coupling. In one production study, an abrupt transition shift was observed from V-C coordination (anti-phase as a less stable mode of speech production) to C-V coordination (in-phase as a more stable mode of speech production) in a repetitive task involving the production of an /ip/ sequence (Kelso et al., 1986). 
Furthermore, the stability of gestural overlap differed as a function of the nature of a compound structure: non-lexicalized C#C sequences exhibited less stability in terms of gestural overlap compared to lexicalized C#C sequences (Cho, 2001).
At the level of phonetic execution, physically measured gestures exhibit context-dependent properties (Browman & Goldstein, ms.). In Mooshammer et al. (1995), vocalic effects on horizontal movement of dorsal gestures during acoustic constriction duration were systematically examined in German with two subjects. Combining tense vowels (/i/, /u/, /a/) between V1 and V2 in /bV1gV2/, egg-shaped forward movement of the tongue body was consistently observed in most combinations of vowels; such movement was stimulated by /i/ in V2. However, this was noticeably inhibited by /i/ in V1. Similar results were obtained with a high front lax vowel /I/ in V1 compared with /ʊ/ and /a/ in the context /bV1Cɐ/. In addition, consonantal effects were observed on the dorsal gesture in non-assimilating contexts, where lingual-lingual sequences demonstrated less gestural overlap, compared to labial-lingual sequences (/k(#)t/</p(#)t/) (Kochetov et al., 2007) (cf., similar degrees of gestural overlap (/k(#)t/=/p(#)t/) in Son (2011)). With varying manners of articulation in C2, Son (2011) showed that dorsal stop /k/ in C1 was more overlapped by a coronal stop /t/ than coronal fricative /s/ in C2 (/k(#)t/>/k(#)s/). Likewise, a target of place assimilation in American English is more overlapped by a trigger, as compared to the reverse order (/d#g/>/g#d/) or non-target coronal fricative (/d#g/>/s#g/) (Byrd, 1996).
Varying degrees of inter-consonantal gestural overlap have also been attributed to other phonological factors. In American English, the degree and variability of gestural overlap depend on prosodic condition: less gestural overlap and less variability were attested in #CC, compared to CC# or C#C (Byrd, 1996). Native speakers' phonological knowledge also plays a role. Within an assimilating language such as Korean, some sequences undergo place assimilation, and gestural overlap is greater in assimilating contexts than in non-assimilating contexts. Moreover, comparable sequences were more overlapped in the non-assimilating contexts of an assimilating language than in a non-assimilating language (e.g., /k(#)t/ in Korean > /k(#)t/ in Russian (Kochetov et al., 2007)).
Rate effects on gestural overlap have also been observed with Russian C1C2 sequences: speech rate effects were more robust in high frequency C1C2 sequences compared to low frequency clusters (Pouplier et al., 2017). Increasing speech rate was considered a possible factor triggering gestural reorganization, which in turn resulted in coronal stop deletion for Brazilian Portuguese (e.g., /nd/ → [n] in partindo, ‘leaving’ (Oliveira & Marin, 2005)).
In articulatory phonology (Browman & Goldstein, 1986, 1988, 1989, 1990, 1991, 1992), a gesture is a basic phonological unit of an event taking place in the vocal tract. To quote Browman & Goldstein (1989: 202), "… gestures are units of action that can be identified by observing the coordinated movements of the vocal tract articulators." Articulators are combined in a coordinative way to achieve a task-controlled gesture (Saltzman & Kelso, 1987). This gesture is further hypothesized to be specified for a set of task variables, constriction location (CL) and constriction degree (CD) (Browman & Goldstein, 1986). Using a task-dynamic model, track-variable movement trajectories are generated by applying phasing principles and activation time for employed gestures (Saltzman & Kelso, 1987). Intergestural coordination is represented by a gestural score for linguistically meaningful units such as a word whose y-axis has a set of articulatory tiers relevant for a given word and whose x-axis has information relevant to timing (Browman & Goldstein, 1989).
In order to feed dynamic parameter values applied to gestural scores, articulatory data acquired from kinematic studies has been used. Since human speech data is much more variable than machine-generated speech, measurements of gestural overlap vary across kinematic studies. Possible measurements are tangential velocity signals of a tract variable (e.g., lip aperture (LA), tongue tip constriction degree, tongue body constriction degree), vertical movement of an articulator, and horizontal movement of an articulator. Previous kinematic studies have selected articulator movements of interest and provided analysis of overlap measuring either tangential velocity signals, vertical/horizontal movement signals, or both (Kochetov et al., 2007; Kühnert et al., 2006; Pouplier et al., 2017; Son, 2008, inter alia). On the other hand, there has not been, to the best of our knowledge, a single preferred overlap measurement. Raw onset-to-onset lag values have been used by estimating the temporal interval between movement onset of C1 and movement onset of C2 (Son, 2008). In contrast, raw constriction time lag values between the constriction offset of C1 and the constriction onset of C2 have been employed (Kochetov et al., 2007; Son et al., 2007). In Byrd (1996), a variety of measurements were utilized–raw lag (e.g., constriction time lag, the movement onset lag, the maxima lag) as well as C1 overlap as a percentage and C2 overlap as a percentage. Referring to the constriction interval of C1C2, percentages of overlap have been calculated for the interval between the constriction offset of C1 and the movement onset of C2, as well as the interval between the constriction offset of C1 and constriction onset of C2 (Kühnert et al., 2006). 
More recently, normalized measurements have been used to evaluate gestural overlap–the normalized interval of C1C2 as well as C1, normalized onset lag (the movement onset time point of C2 standardized by the interval of C1), normalized plateau lag (the relative target achievement time point of C2 standardized by the interval of C1) (Pouplier et al., 2017). Normalized gestural overlap was also employed in Marin & Pouplier (2014) when referring to the constriction interval between the constriction offset of C1 and the movement onset of C2 relative to the overall constriction duration of C1C2.
In this paper, we revisit articulatory data from three Korean speakers who participated in a cross-linguistic study on Seoul-Korean and Russian (Kochetov et al., 2007). Several different gestural overlap measures in various non-assimilating sequences (/k(#)t/, /k(#)p/, /p(#)t/) are examined to uncover whether similar temporal lags are distributed consistently across different overlap measures. In particular, we examine whether several different measures of gestural overlap are correlated in Korean non-assimilating contexts (/k(#)t/, /k(#)p/, /p(#)t/) as we consider different speech rates (fast vs. comfortable) and morphosyntactic boundaries (within-word vs. across-word boundary). In addition, limiting the scope of analysis to the tongue dorsum gesture in C1 in non-assimilating contexts (/k(#)t/ and /k(#)p/), we describe spatio-temporal coarticulatory characteristics of the tongue body and tongue tip articulators in terms of horizontal advancing movement as a function of place of articulation in C2, and interpret their implications for physiological limitations between two consecutive lingual gestures (/k(#)t/). Along with this, we examine whether greater intergestural stability is observed in physically limited lingual-lingual sequences (/k(#)t/) than in lingual-labial sequences (/k(#)p/).
Firstly, we examine whether there is any similarity in temporal lags distributed consistently across three overlap measures. Previous literature has reported specific measures of interest (Kochetov et al., 2007; Pouplier et al., 2017; Son, 2008, 2011; Son et al., 2007, among others), but has not systematically compared different measures (cf., Byrd, 1996). Son’s (2008) kinematic study used movement onset lag in C1C2, showing more gestural overlap in assimilating contexts (/p(#)k/), compared to non-assimilating contexts (/k(#)p/) with inter-speaker variability. Byrd’s (1996) electropalatography study of heterorganic sequences (e.g., /d#g/, /g#d/, /s#g/, /g#s/, /k#s/, /s#k/) with five speakers of English in Southern/Central California examined various overlap measures, including constriction onset lag in C1C2. Kochetov et al. (2007) used raw constriction plateau lag values between the constriction offset of C1 and the constriction onset of C2. More recently, normalized measures were employed for Russian heterorganic C1C2 sequences (e.g., normalized movement onset lag, normalized plateau lag, etc.) in Pouplier et al. (2017), and to a limited extent for Seoul-Korean non-assimilating heterorganic C1C2 (/k(#)t/, /p(#)t/, /k(#)s/) sequences (e.g., normalized plateau lag values (cf., raw movement onset lag values)) in Son (2011). In this paper, we examine three normalized gestural overlap measures (i.e., normalized movement onset lag, normalized constriction onset lag, and normalized constriction plateau lag) and systematically compare them to uncover whether a similar temporal coordination is attested over different time periods of C1C2 sequences (/k(#)t/, /k(#)p/, /p(#)t/).
Secondly, we further examine temporal organization in a subset of non-assimilating sequences, lingual-lingual sequences (/k(#)t/), as compared to lingual-labial sequences (/k(#)p/). In the analysis of tongue dorsum trajectories during acoustic closure of the German dorsal stop /g/ in the context /bV1gV2/ with two speakers, Mooshammer et al. (1995) consistently observed egg-shaped advancing movement of the tongue body in all possible combinations among /i/, /u/, /a/, except for high front vowel /i/ in V1 (see also Gay, 1977; Mooshammer & Hoole, 1993). In addition, consonantal effects on horizontal displacement of the tongue dorsum (/k/, /g/, /ŋ/) were greater for voiceless velar stop /k/ across the board (/k/>/g/>/ŋ/ for Speaker1; /k/>(/g/=/ŋ/) for Speaker2). Based on Mooshammer et al.’s (1995) findings, we are presently concerned with Korean velar stop /k/ in C1 followed by either a lingual gesture /t/ or a nonlingual gesture /p/ in C2 (/k(#)t/ vs. /k(#)p/) in low central vowel contexts (/a/-to-/a/). As we examined the horizontal position of the tongue dorsum at the constriction onset of the vertical tongue dorsum gesture in C1 with respect to gestural overlap, we will ponder the implications conveyed in terms of physiological constraints (Mooshammer et al., 1995).
Lastly, we examine whether lingual-lingual sequences (/k(#)t/) sharing a physically indiscrete organ exhibit more intergestural stability compared to controls (/k(#)p/) with relatively greater articulatory freedom. Greater stability has been observed in C-V sequences and lexicalized compounds (Cho, 2001; Nam, 2007), which was taken to reflect distinct phonological representations of speakers’ grammatical knowledge. In comparing intergestural variability in the horizontal distance from the tongue dorsum to the tongue tip position measured at the constriction onset in C2, we aim to uncover whether less variability is consistently observed in the two consecutive lingual gestures, which are also used as active articulators.
2. Method
We revisited Kochetov et al.’s (2007) articulatory data for three non-assimilating sequences (/k(#)t/, /k(#)p/, /p(#)t/) from Seoul Korean (see Son et al. (2007) for a detailed description of the production experiment using an electromagnetic midsagittal articulometer (Perkell et al., 1994)). Previously analyzed in Kochetov et al. (2007), the original data set was collected from three subjects (two male (K1 & K3) and one female (K2)) who produced the stimuli at two speech rates (comfortable vs. fast) and in two morphosyntactic conditions (across-word vs. within-word). While Kochetov et al. (2007) used the first five repetitions, pooled across subjects, in order to balance them against their Russian EMMA data for a systematic cross-linguistic comparison, we used all tokens collected (89 tokens for K1; 72 tokens for K2; 70 tokens for K3) and ran statistical analyses for each speaker. The carrier phrases were not identical across speakers so as to reduce data collection time (nanɨn __lanɨn malɨl tɨlə poassta (‘I have heard of ___’) for K1; neka __lako tɨləssə (‘I have heard it as __’) for K2 and K3). Speakers treated all target words as real words and produced them naturally. The stimuli used for elicitation are listed in (1).
(1) Stimuli (mostly reproduced from Kochetov et al. (2007: 1362))
We used the lp_Findgest function (with a threshold of 0.2) in MVIEW (Tiede, 2005) for gestural demarcation (e.g., the movement onset, peak velocity of the formation duration, constriction onset, constriction maxima, constriction offset, and movement offset). We demarcated gestural landmarks for the vertical tongue dorsum gesture, vertical tongue tip gesture, and lip aperture gesture. Normalized gestural overlap of C1C2 sequences was estimated by dividing the raw lag values of interest between C1 and C2 by the activation duration of C1. Greater values represent less overlap.
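The normalization described above can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the `Gesture` class, its field names, and the `normalized_lags` helper are hypothetical, and two points are assumptions rather than statements from the text, namely that the activation duration runs from movement onset to movement offset, and that the constriction onset lag is taken between the two constriction onsets.

```python
from dataclasses import dataclass


@dataclass
class Gesture:
    """Landmark times (in seconds) for one gesture; field names are hypothetical."""
    movement_onset: float
    constriction_onset: float   # start of the constriction plateau
    constriction_offset: float  # end of the constriction plateau
    movement_offset: float

    @property
    def activation_duration(self) -> float:
        # Assumption: activation spans movement onset to movement offset.
        return self.movement_offset - self.movement_onset


def normalized_lags(c1: Gesture, c2: Gesture) -> dict:
    """Return three lags divided by C1 activation duration.

    Greater values mean less overlap between C1 and C2.
    """
    dur = c1.activation_duration
    if dur <= 0:
        raise ValueError("C1 activation duration must be positive")
    raw = {
        # Onset-to-onset lag between the two movement onsets.
        "movement_onset_lag": c2.movement_onset - c1.movement_onset,
        # Assumed definition: lag between the two constriction onsets.
        "constriction_onset_lag": c2.constriction_onset - c1.constriction_onset,
        # Lag from C1 constriction offset to C2 constriction onset.
        "constriction_plateau_lag": c2.constriction_onset - c1.constriction_offset,
    }
    return {name: lag / dur for name, lag in raw.items()}
```

A negative plateau lag in this sketch would mean that the C2 constriction is reached before the C1 constriction is released, that is, the two plateaus overlap in time.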
We also estimated corresponding horizontal position of the tongue dorsum lined up with the time point of the constriction offset of C1 (Figure 1(i)) to determine how advanced the tongue dorsum was at the release of the constriction. Greater values represent greater anteriority. In order to calculate the horizontal distance between the tongue dorsum and the tongue tip (raw values of 1.ii subtracted from 1.iii), we estimated corresponding horizontal position of the tongue dorsum and the tongue tip lined up with the time point of the constriction onset of C2 (Figures 1(ii) & 1(iii)). Greater values represent greater distance between TDx and TTx.
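As a rough illustration of the distance measure, the sketch below reads the horizontal tongue dorsum and tongue tip positions at a landmark time and subtracts one from the other. The function names are hypothetical, linear interpolation between samples is an assumption (the source does not say how positions were read off at a landmark), and the sign convention assumes that greater horizontal values are more anterior, so that the distance is the tongue tip value minus the tongue dorsum value.

```python
from bisect import bisect_left


def position_at(times, values, t):
    """Linearly interpolate a sensor coordinate at time t.

    `times` must be strictly ascending and the same length as `values`.
    Times outside the recorded range return the nearest end value.
    """
    if not times or len(times) != len(values):
        raise ValueError("times and values must be non-empty and equal in length")
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    i = bisect_left(times, t)          # first index with times[i] >= t
    t0, t1 = times[i - 1], times[i]
    w = (t - t0) / (t1 - t0)           # fractional position between samples
    return values[i - 1] + w * (values[i] - values[i - 1])


def td_tt_distance(times, tdx, ttx, c2_constriction_onset):
    """Tongue tip x minus tongue dorsum x at the C2 constriction onset."""
    td = position_at(times, tdx, c2_constriction_onset)
    tt = position_at(times, ttx, c2_constriction_onset)
    return tt - td
```

The same `position_at` helper could be reused for the tongue dorsum anteriority measure by passing the constriction offset of C1 as the time point.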
Linear models in R (R Development Core Team, 2014) were constructed for each subject. Normalized gestural overlap values (in z-scores) were fitted with the lm function from the lme4 package (Bates et al., 2015). Sequence types ((/kt/, /kp/, /pt/) or (/kt/, /kp/)), boundary types (across-word vs. within-word), and speech rates (comfortable vs. fast) were used as fixed factors. We used Tukey’s Honestly Significant Difference (HSD) tests for post-hoc analysis. The pairs function was used to generate scatter plots, and the cor.test function was used to estimate Spearman’s rank correlation coefficient (rho (ρ)), a non-parametric measure of rank correlation. Levene’s test for homogeneity of variance, using medians as the center, was used to determine the stability of gestural coordination.
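For readers unfamiliar with the median-centered variant of Levene's test, the following Python sketch computes its test statistic from scratch. It is an illustration only, not the R code used in the study: `levene_median` is a hypothetical name, and the p-value is omitted because it requires the F distribution with (df1, df2) degrees of freedom, which the Python standard library does not provide.

```python
from statistics import mean, median


def levene_median(*groups):
    """Return (W, df1, df2) for Levene's test with median centering.

    W is a one-way ANOVA F statistic computed on the absolute deviations
    of each observation from its own group median.
    """
    if len(groups) < 2 or any(len(g) < 2 for g in groups):
        raise ValueError("need at least two groups with two or more values each")
    devs = []
    for g in groups:
        m = median(g)
        devs.append([abs(x - m) for x in g])
    n_total = sum(len(z) for z in devs)
    k = len(devs)
    grand = sum(sum(z) for z in devs) / n_total
    # Between-group and within-group sums of squares of the deviations.
    between = sum(len(z) * (mean(z) - grand) ** 2 for z in devs)
    within = sum((v - mean(z)) ** 2 for z in devs for v in z)
    if within == 0:
        raise ValueError("no within-group spread in the deviations")
    df1, df2 = k - 1, n_total - k
    return (between / df1) / (within / df2), df1, df2
```

With two sequence types as the groups, df1 is 1 and df2 is the number of tokens minus 2. A larger W indicates a larger difference in spread between the groups.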
3. Results
In Figure 2, three normalized gestural overlap measures are positively correlated with each other. This indicates that similar degrees of gestural overlap are manifested over three different time periods of C1C2 sequences (/kt/, /kp/, /pt/) (e.g., the movement onset, the constriction onset, and the constriction plateau).
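The rank correlation behind this comparison can be sketched in a few lines of Python. This is an illustrative reimplementation with hypothetical function names, not the cor.test call used in the study. It computes Spearman's rho as the Pearson correlation of average ranks, which also handles ties, and it does not compute a significance level.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed inputs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (sx * sy)
```

Applied to two overlap measures taken from the same tokens, a rho near 1 would mean that the tokens are ordered almost identically by both measures.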
The results for /kt/ and /kp/ sequences from three speakers are shown in Table 1(a) for K1, Table 1(b) for K2, and Table 1(c) for K3. For two speakers (K1 and K3), there is an interaction between Sequence type (/kt/ vs. /kp/) and Boundary type (across-word vs. within-word) (t=−2.51, p<0.05 for K1; t=−3.38, p<0.01 for K3) and a main effect of Sequence type (/kt/ vs. /kp/) (t=2.18, p<0.05 for K1; t=6.61, p<0.0001 for K3).
With the results of post-hoc tests using Tukey HSD, we find that less overlap is observed in /kt/ sequences in the across-word boundary condition for K1 (e.g., Figure 3(a)), and less overlap is consistently present for both boundary conditions for K3 (e.g., Figure 3(b)). For Speaker K2, there is neither interaction between factors nor main effects (p>0.05).
There is a correlation between the horizontal tongue dorsum anteriority (measured as it is lined up with the constriction offset of the tongue dorsum gesture in C1) and the gestural overlap in C1C2 (estimated with normalized constriction plateau lags). The results from two speakers (K1 and K3) indicate that the tongue dorsum gesture in C1 has progressed further in the pre-/t/ context, compared to the pre-/p/ context, and these two speakers also show less gestural overlap in lingual-lingual sequences (/kt/) (the across-word condition for K1 and both boundary conditions for K3) as shown in Figure 4.
That is, the less overlapped C1C2 is (e.g., /k(#)t/>/k(#)p/ in terms of normalized constriction plateau lag), the more advanced the horizontal tongue dorsum position is at the offset of the constriction in the dorsal gesture. This may imply a chain of effects: the tongue tip movement in C2 induces more advanced tongue dorsum movement in C1 (when measured at the tongue dorsum constriction offset in C1), which, in turn, is responsible for less gestural overlap with the tongue tip gesture. This is indirectly supported by the observation that K2 did not differ as a function of sequence type either in normalized constriction plateau lags or in the horizontal position of the tongue dorsum at this particular time point (/k(#)t/=/k(#)p/).
We measured horizontal distance from the tongue dorsum to the tongue tip, and both positional values are acquired by being lined up with the constriction onset of C2 (e.g., the tongue tip gesture in the pre-/t/ context and lip aperture in the pre-/p/ context). At this time point, the tongue tip is more spatially separated from the tongue dorsum in the /k(#)t/ sequences for all speakers as shown in Figure 5 (cf., no difference in gestural overlap for K2). This may be due to the fact that the tongue tip gesture of C2 is extended to the alveolar ridge for the constriction, being an active articulator for the /t/ event. In contrast, the tongue tip articulator is passively moving during the constricting gesture of LA, and it is relatively less extended out. This may also indicate that regardless of whether speakers demonstrate more gestural overlap and more advanced tongue dorsum position in /k(#)t/ (see section 3.2.1), greater distance between the tongue dorsum and the tongue tip is consistently observed in /k(#)t/ sequences across the board. Given this, we infer that two consecutive lingual gestures in the /k(#)t/ sequences are executed in the more anterior area.
Referring to variations in standard deviation values for each sequence type, we observed more stable intergestural coordination (i.e., smaller standard deviations) in the lingual-lingual gestures (/k(#)t/) than in the lingual-labial gestures (/k(#)p/) for two speakers (F(1, 46)=9.55, p<0.01 for K2; F(1, 44)=16.74, p<0.0001 for K3). Speaker K1 does not exhibit more stability in the lingual-lingual /k(#)t/ sequences because this speaker's standard deviations in the lingual-labial /k(#)p/ sequences are not as large as those of Speakers K2 and K3.
4. Discussion
In the current study, we confirm that gestural overlap (e.g., /k(#)t/ vs. /k(#)p/ vs. /p(#)t/) can be defined at any one of three arbitrarily chosen time points: the movement onset, the constriction onset, and the constriction plateau. The consistency we observe across several normalized gestural overlap measurements suggests that any one of these measurements can be chosen and used reliably to test a given hypothesis.
Limiting the scope of analysis to the dorsal segment in C1 (e.g., /k(#)t/ vs. /k(#)p/) with respect to normalized constriction plateau overlap, speech rate effects are absent across the board, and morphosyntactic boundary effects are observed for two speakers (e.g., K1 and K3). Speaker K1 demonstrates that gestural overlap does not differ between two sequence types in the within-word context (/kt/=/kp/), and shows a longer constriction plateau lag in the across-word context (/k#t/>/k#p/). Speaker K3 consistently exhibits a longer constriction plateau lag in /kt/ as well as in /k(#)t/, compared to their counterparts (e.g., /kp/ and /k(#)p/). That is, less overlap in a lingual-lingual sequence is observed as long as there is any difference in terms of constriction plateau lag values. The results of our current study show inter-speaker variability in line with other articulatory studies: inter-speaker variability has been observed in articulatory studies of speakers with different palate shapes in multiple languages (Bulgarian, German, English (including British English, Scottish English, American English, and Australian English), Norwegian, and Polish (Brunner et al., 2009)) and three monozygotic and two dizygotic twin pairs in German (Weirich, 2010). Note that the results of morphosyntactic boundary effects, however, were not attested in Kochetov et al. (2007), where articulatory data from identical subjects were employed. Some possible reasons we suspect are that i) Kochetov et al. (2007) used data pooled across subjects, ii) in the current study, /maktambe/ (‘mild tobacco’) was used instead of their /aktam/ (‘curse’), and iii) we tested a subset (/k(#)t/ vs. /k(#)p/) instead of their comparisons among three sequence types (/k(#)t/ vs. /k(#)p/ vs. /p(#)t/). As it seems to be beyond the scope of the current study to trace what might have caused the difference in terms of morphosyntactic boundary effects between the two studies, we leave this issue for future study.
With respect to the greater reduction at the fast speech rate observed in Kochetov et al. (2007), it should be mentioned that no statistical analysis (e.g., t-tests, analysis of variance, etc.) was carried out there; only descriptive comparisons between fast and comfortable speech rates were provided for three individual speakers. In the current study, we referred to the results of Tukey HSD tests and concluded that there is no speech rate effect. Note that gestural reduction is prone to occur more frequently at fast speech rates (Kirchner, 1998; Son, 2015) and this has been attested in place-assimilating contexts (e.g., /p(#)k/) for Korean (Son, 2008; Son et al., 2007). In contrast, we find that speech rate-dependent reduction is absent in the non-assimilating contexts of Korean: this implies that speech rate-dependent reduction is responsible for phonological processes such as place assimilation, which in turn suggests that gestural reduction is one of the factors deriving this phonological process (Jun, 1995).
In Kochetov et al. (2007), less gestural overlap was observed in back-to-front sequences (/kt/ and /kp/) compared to front-to-back sequences (/pt/) in terms of constriction plateau lag values ((/kt/=/kp/)>/pt/). As a possible explanation, this was attributed to the perceptual recoverability hypothesis (Chitoran et al., 2002), where the audible release cue of C1 in the front-to-back sequences can be recovered at any point. Kochetov et al. (2007) also mentioned physiological constraints stating that two consecutive lingual-lingual gestures such as /kt/ are physiologically limited due to mutual entrenchment and thus less overlapped. Finding no statistical difference between lingual-lingual /k(#)t/ and lingual-labial /k(#)p/ sequences, they concluded that their data did not support the alternative hypothesis. In contrast, we find evidence in the current study that physiological constraints are active for two speakers (K1 and K3) to account for greater normalized constriction plateau lag values in lingual-lingual sequences (/k(#)t/), compared to lingual-labial ones (/k(#)p/). This incompatibility between the two studies may be due to the contribution of Speaker K2 to the data pooled across subjects in Kochetov et al. (2007) to the extent that it might have made the physiological effects void. Based on this, we suggest that physiological constraints be considered active, being speaker-dependent and sensitive to morphosyntactic boundary conditions in the Korean non-assimilating contexts.
Lastly, physiological constraints are further evaluated with regard to the horizontal distance from the tongue dorsum to the tongue tip: it is greater for lingual-lingual /k(#)t/ sequences where both lingual articulators are activated, compared to lingual-labial /k(#)p/ sequences where the tongue tip articulator is passively moving during the activation duration of the upper and lower lip articulators in C2. The intergestural stability measured by Levene’s test for homogeneity of variance indicates that speakers K2 and K3 demonstrate less variability in two consecutive lingual-lingual /k(#)t/ sequences compared to lingual-labial /k(#)p/ sequences. This can be partly, but not fully, attributed to physiological constraints that might have inhibited greater variability in /k(#)t/ sequences. Although speaker K1 does not follow this pattern as closely as the other two speakers, it is worth noting that this speaker does not show greater variability in lingual-lingual /k(#)t/ sequences, either.
Greater intergestural stability has been observed in CV sequences and lexical compounds. Synchronous coordination is observed in the syllable onset: it is considered a more stable mode of speech and settles faster into steady-state phasing relations (Nam, 2007; Nam et al., 2009). When Cho (2001) applied Levene’s tests for homogeneity of variance to assess intergestural stability, less variability was also observed in lexical compounds (e.g., /pek+pal/ ‘white hair’) and within a single morpheme (/pani/ ‘name’) as compared to the non-lexicalized compounds (e.g., /pek+pal/ ‘white foot’) and across a morpheme boundary (e.g., /pan+i/ ‘class + NOM.’). Although Cho’s (2001) stimuli might not be balanced in terms of the frequency of occurrence (cf., Lin et al., 2014), his data support tighter intergestural coordination for lexical entries before morphological processes. Based on the data acquired in the current study, we learn that a tighter intergestural coordinative structure for physiologically entrenched tongue articulators (e.g., the tongue dorsum and the tongue tip) can be established even for clusters with morphosyntactic boundaries, since these articulators allow fewer degrees of freedom.