Phonetics

ToBI and beyond: Phonetic intonation of Seoul Korean ani in Korean Intonation Corpus (KICo)*

Ji-eun Kim1,**
1Department of Korean Language and Literature, Duksung Women’s University, Seoul, Korea
**Corresponding author : smart173@duksung.ac.kr

© Copyright 2024 Korean Society of Speech Sciences. This is an Open-Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Received: Jan 31, 2024; Revised: Mar 05, 2024; Accepted: Mar 06, 2024

Published Online: Mar 31, 2024

Abstract

This study investigated the variation in the intonation of Seoul Korean interjection ani across different meanings (“no” and “really?”) and speech levels (Intimate and Polite) using data from Korean Intonation Corpus (KICo). The investigation was conducted in two stages. First, IP-final tones in the dataset were categorized according to the K-ToBI convention (Jun, 2000). While significant relationships were observed between the meaning of ani and its IP-final tones, substantial overlap between groups was notable. Second, the F0 characteristics of the final syllable of ani were analyzed to elucidate the apparent many-to-many relationships between intonation and meaning/speech level. Results indicated that these seemingly overlapping relationships could be significantly distinguished. Overall, this study advocates for a deeper analysis of phonetic intonation beyond ToBI-based categorical labels. By examining the F0 characteristics of the IP-final syllable, previously unclear connections between meaning/speech level and intonation become more comprehensible. Although ToBI remains a valuable tool and framework for studying intonation, it is imperative to explore beyond these categories to grasp the “distinctiveness” of intonation, thereby enriching our understanding of prosody.

Keywords: intonation; ani; aniyo; phonetic intonation; Korean Tones and Break Indices (K-ToBI); Korean Intonation Corpus (KICo); speech corpus

1. Introduction

The TOnes and Break Indices (ToBI) convention is an annotation system designed for labeling prosody at the phonological level (Jun, 2022). Initially developed between 1991 and 1994 for annotating databases of spoken Mainstream American English (Silverman et al., 1992), it has since been adapted for various languages, including Seoul Korean (Jun, 2000). Grounded in Autosegmental-Metrical (AM) theory (e.g., Beckman & Pierrehumbert, 1986), ToBI conventions use two level tones, High (H) and Low (L), to annotate pitch accents and edge tones (Beckman et al., 2005). To capture the phonological properties of intonation, ToBI tone labels are devised to represent underlying tonal targets that are systematically meaningful across native speakers of the language variety (Jun, 2022).

K-ToBI (Beckman & Jun, 1996; Jun, 2000),1 the ToBI convention tailored for Seoul Korean, stands out due to its inclusion of the phonetic tone tier in its third version (Jun, 2000). The tier delineates “non-distinctive yet categorical tonal targets” (Jun, 2022:159), interpreted as ‘not yet’ distinctive based on current knowledge. The inclusion of the phonetic tone tier stemmed from the aspiration of the developer “to determine what factors and conditions trigger such variation, and thus improve the current model of Korean intonational phonology” (Jun, 2022:159-160). It signifies an open invitation for future research to refine our understanding of Korean intonation by building upon the existing K-ToBI framework.

However, some studies confine their analyses too narrowly within the bounds of the K-ToBI convention. Previous research has sometimes assumed, without further investigation, that utterances belong to the same intonational category solely because they share identical Intonation Phrase (IP)-final tones in K-ToBI. This simplistic perspective yields an excess of many-to-many relationships between intonation and (para)linguistic factors (e.g., Park, 2012). Furthermore, the guidelines for labeling IP-final tones in Jun (2000) are occasionally treated as definitive (e.g., Jeong, 2002; Kim et al., 2008), whereas they should be regarded as a framework for further exploration of Korean intonation.

This study aims to build upon the foundation laid by K-ToBI to refine our understanding of Korean intonation, using the Korean Intonation Corpus (KICo), a large speech corpus comprising 240 hours of recordings from forty native Seoul Korean speakers and forty L2 learners of Korean. Because labeling of the full corpus is still ongoing, we extracted only the data for the interjection ani from KICo for analysis in this study. In addition to labeling our database according to the K-ToBI convention, we conducted acoustic analyses on the final syllable aligned with the IP-final tone.2 Specifically, we examined the F0 characteristics to determine whether there are meaningful acoustic cues that are not readily discernible from the binary tones alone.

We selected the interjection ani for three reasons. First, it is a basic word with high frequency in spoken Korean, rendering our findings pertinent for practical applications such as speech synthesis, speech recognition, and Korean language education. Second, it serves dual pragmatic purposes - as both a negation and a turn-initial marker - without undergoing any segmental alternations (Kim, 2016; Yang, 2002). This emphasizes the weight of its prosody in effectively conveying meanings, which could potentially lead to clear and informative outcomes. Third, its usage across different speech levels enables us to explore the impact of politeness on Korean intonation, aligning with studies on the global acoustic profile of Korean polite speech (Grawunder & Winter, 2010; Winter & Grawunder, 2012). Consequently, our main research questions regarding ani in Seoul Korean are twofold:

  • Do the IP-final tones of ani differ by meaning?

  • Do the IP-final tones of ani differ by speech level?

2. Data and Method

2.1. Data
2.1.1. Interjection ani in Korean

In Korean, the interjection ani is predominantly recognized as a negative response to Yes/No interrogatives. For instance, it is commonly used to answer “No” to questions like “Are you sleeping?” or “Is there any problem?” Beyond this primary role, ani can also function as a discourse marker, often as a turn-initial particle in ani-prefaced sentences (Kim, 2016; Yang, 2002). In this use it is similar to ‘wait’ in phrases like “Wait, really?” and “Wait, how can that be?”, providing a moment of pause or interjection.

Given that the Korean Standard Language Dictionary (KSLD) serves as the primary reference of KICo3 and includes two pragmatic meanings of ani, both are incorporated into KICo. The definitions of ani in the KSLD are as follows, translated into English by the first author:

  • Meaning 1: A word used to negatively answer a question posed by a junior or an equal.

  • Meaning 2: A word used to express surprise, awe, or puzzlement.

Meaning 1 aligns with the primary function of ani as a negation, while Meaning 2 also reflects the secondary function as a discourse marker to initiate conversational turns. This is exemplified in the KSLD, where ani is used to initiate a dialogue or turn:

  • ani, geureol suga inni? (Wait, can that really be?)

  • ani, beolsseo dochakhaenni? (Wait, have you arrived already?)

  • ani, ige eotteoke doen irinya. (Wait, what in the world happened here?)

Meanwhile, the description “posed by a junior or an equal” in Meaning 1 relates to the speech level of ani, which is categorized as Intimate. The speech levels in Korean were traditionally divided into six levels based on politeness: Deferential (-(su)pnita), Polite ((-a/e)yo), Blunt (-(s)o), Familiar (-ney), Intimate (-a/e), and Plain ((-[nu]n)ta), ranging from the most polite to the least (Sohn, 2001). While the speech level of ani is Intimate, suitable for addressing juniors or equals, it can be flexibly inflected to various speech levels. Notably, when ani is used as a discourse marker, it is exclusively employed in its Intimate form and is not used in other speech levels, even when followed by a Polite sentence (Yang, 2002:114)4. Accordingly, this study analyzed only the Intimate level for Meaning 2, while analyzing both Intimate and Polite levels for Meaning 1.

2.1.2. ani data in Korean Intonation Corpus (KICo)

KICo is a speech corpus specifically designed to study the intonation of Korean, responding to the need for guidelines to teach intonation to L2 learners (Kim, 2023). KICo comprises two main sections: KICo-N (natives) and KICo-L (learners). KICo-N contains recordings from native Seoul Korean speakers, while KICo-L includes recordings from Mongolian and Vietnamese learners of Korean.

The ani and aniyo data analyzed in this study were sourced from KICo-N. The participants were native Seoul Korean speakers, all born and raised in either Seoul or Gyeonggi Province (the suburban area surrounding Seoul), and without any speaking or hearing impairments. The speech data were collected using the online experimental platform FindingFive (FindingFive Team, 2023). All recordings were made in a soundproof booth under the supervision of a research assistant to ensure quality and consistency. The setup included an audio interface (AG03, Yamaha) connected to a laptop and a head-mounted microphone (SM10A, Shure), with a sampling rate of 48 kHz.

The scripts used to elicit ani speech included three variations: Intimate ani with Meaning 1 (henceforth, ‘Ani_1’), Intimate ani with Meaning 2 (henceforth, ‘Ani_2’), and Polite ani with Meaning 1 (henceforth, ‘Aniyo_1’ or ‘Aniyo’). Polite ani with Meaning 2 was excluded, as ani with Meaning 2 is primarily used in its Intimate form. Each item was represented by eleven short scripts, resulting in a total of thirty-three scripts for each speaker (N=1,320=33 scripts×40 speakers). Sample Romanized scripts for ani, along with their English translations, are provided as follows:5

  • (1) Sample scripts for eliciting Ani_1

    • a. cal cinae? - ani. (Are you doing alright? - No.)

    • b. siheom cal bwasseo? - ani. (Did you do well on the test? - No.)

    • c. ne chinguya? - ani. (Are they your friend? - No.)

  • (2) Sample scripts for eliciting Ani_2

    • a. ani, cinccaro? (Wait, really?)

    • b. ani, nollaunde? (Wait, isn't it surprising?)

    • c. ani, cincca singihada. (Wait, that's really amazing.)

  • (3) Sample scripts for eliciting Aniyo_1

    • a. cal cinaeseyo? - aniyo. (Are you doing alright? - No.)

    • b. cam cal cayo? - aniyo. (Do you sleep well? - No.)

    • c. chuwo? - aniyo. (Is it cold? - No.)

2.2. Annotation

Our speech data were annotated using Praat (Boersma & Weenink, 2024), with TextGrid tiers added for organization. Each TextGrid file comprises four tiers: three interval tiers and a point tier. Figure 1 presents a screenshot of a fully annotated TextGrid file:

Figure 1. Screenshot of a recorded sample from our dataset, accompanied by fully annotated TextGrid tiers.

As depicted in Figure 1, the interval tiers were designated for marking the entire utterance duration (‘utt’), phoneme intervals (‘phon’), and the final syllable interval (‘finSyl’). The point tier (‘int’) was assigned for transcribing the IP-final tones based on the K-ToBI convention (Jun, 2000). The first and second tiers were annotated semi-automatically through WebMAUS (Kisler et al., 2017; Schiel, 1999), as the Korean Phonetic Aligner (Yoon, 2021; Yoon & Kang, 2013), the sole forced alignment tool for Korean, was not operational due to technical issues6. Since WebMAUS lacks a specific Korean option, alignment adjustments were performed manually by research assistants and further refined by the first author.

2.3. Acoustic Analysis

Four F0 parameters were selected to examine the acoustic variability of the final syllable, with the aim of determining whether IP-final tones that appear identical have distinct acoustic realizations. These parameters are the valley, peak, range, and slope of F0. We focused on F0 characteristics because F0 curves form the fundamental basis for transcribing IP-final tones. Figure 2 shows the F0 curves of an Ani_1 with H% and an Aniyo_1 with LH%:

Figure 2. F0 curves of Ani_1 (upper panel) and Aniyo_1 (lower panel), exemplifying the F0 measures.

All measurements were conducted in Praat by extracting an F0 contour (Pitch object) using Praat’s “To Pitch…” function. The measurements primarily relied on temporal segmental information of the IP-final syllable, as depicted in Figure 2. Additionally, contour tones were further subdivided to measure local changes. For instance, LH% was divided into L and H to analyze each component separately. Figure 3 illustrates the measurements of the tone segments (L and H in this case) for Aniyo_1 with LH%:

Figure 3. Local measurements of the L (upper panel) and H (lower panel) of LH% tones.

The values of F0 valley and peak were obtained using functions in Praat: ‘Get minimum…’ for the valley, and ‘Get maximum…’ for the peak. The F0 range was calculated by subtracting F0 valley from F0 peak (i.e., F0 range=F0 peak–F0 valley). The F0 slope was calculated by dividing F0 range by the duration of the segment (i.e., F0 slope=F0 range/duration). The duration itself was obtained by subtracting the start point from the end point of the segment (i.e., duration=end point–start point).
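The measures defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Praat procedure used in the study: the function name `f0_measures`, the `F0Measures` container, and the representation of the F0 contour as parallel lists of times (in seconds) and F0 values (in Hz, with `None` for unvoiced frames) are all hypothetical. The sketch takes the minimum and maximum over raw samples, so its values may differ slightly from those returned by Praat's own queries.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class F0Measures:
    valley: float    # minimum F0 inside the segment (Hz)
    peak: float      # maximum F0 inside the segment (Hz)
    f0_range: float  # peak - valley (Hz)
    duration: float  # end point - start point (s)
    slope: float     # range / duration (Hz/s)


def f0_measures(
    times: Sequence[float],
    f0: Sequence[Optional[float]],
    start: float,
    end: float,
) -> F0Measures:
    """Compute valley, peak, range, and slope for one labeled segment.

    `times` and `f0` are parallel sequences of pitch samples; unvoiced
    frames are represented as None (an assumption of this sketch).
    """
    duration = end - start
    if duration <= 0:
        raise ValueError("segment end must be later than segment start")

    # Keep only voiced samples that fall inside the segment boundaries.
    voiced = [
        hz for t, hz in zip(times, f0)
        if hz is not None and start <= t <= end
    ]
    if not voiced:
        raise ValueError("no voiced F0 samples inside the segment")

    valley = min(voiced)
    peak = max(voiced)
    f0_range = peak - valley
    # As defined in the text, slope is range divided by duration,
    # so it is never negative and does not encode rise vs. fall.
    return F0Measures(valley, peak, f0_range, duration, f0_range / duration)
```

Because the slope is defined from the range rather than from the first and last F0 values, it reflects how much F0 moves per unit time within the segment, not the direction of that movement.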

2.4. Statistical Analysis
2.4.1. Chi-square test of independence

Chi-square tests of independence were performed at a significance level of 0.05 using R (R Core Team, 2020). To address the issue of sparse data in contingency tables, cells with fewer than five cases were aggregated under ‘Others’ by using the pooling method (Bresnahan & Shapiro, 1966).

The analyses were performed to evaluate whether the changes in the IP-final tone, as transcribed according to the K-ToBI convention, were independent of the meaning (Ani_1 vs. Ani_2) or the speech level (Ani_1 vs. Aniyo_1). More specifically, we tested two hypotheses below:

  • (a) IP-final tones vary by the meaning of ani

  • (b) IP-final tones vary by the speech level of ani
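The test described in this subsection can be illustrated with a small Python sketch. It is not the R code used in the study. The function names `pool_sparse` and `chi_square`, the table layout (one list of group counts per tone label), and the counts in the test case are hypothetical, and the pooling rule shown here (merge any tone category that has a cell below five) is only one possible reading of the pooling step.

```python
from typing import Dict, List, Tuple


def pool_sparse(
    table: Dict[str, List[int]], min_count: int = 5
) -> Dict[str, List[int]]:
    """Merge tone categories with any cell below `min_count` into 'Others'."""
    pooled: Dict[str, List[int]] = {}
    others = None
    for tone, counts in table.items():
        if min(counts) < min_count:
            # Accumulate sparse categories group by group.
            if others is None:
                others = list(counts)
            else:
                others = [a + b for a, b in zip(others, counts)]
        else:
            pooled[tone] = list(counts)
    if others is not None:
        pooled["Others"] = others
    return pooled


def chi_square(table: Dict[str, List[int]]) -> Tuple[float, int]:
    """Return the Pearson chi-square statistic and degrees of freedom."""
    rows = list(table.values())                 # one row per tone category
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(row_totals)

    stat = 0.0
    for row, row_total in zip(rows, row_totals):
        for observed, col_total in zip(row, col_totals):
            # Expected count under independence of tone and group.
            expected = row_total * col_total / n
            stat += (observed - expected) ** 2 / expected

    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, dof
```

The sketch stops at the statistic and the degrees of freedom. Obtaining a p-value additionally requires the chi-square distribution, which a statistics package would normally supply.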

2.4.2. Welch’s t-test

Welch’s t-tests, which do not assume equal variances between the two samples, were conducted at a significance level of 0.05 using R (R Core Team, 2020). These tests were applied to determine whether the F0 characteristics of the final syllable differ significantly between (1) Ani_1 and Ani_2, and (2) Ani_1 and Aniyo_1, despite having the same IP-final tone.

Considering only IP-final tones that accounted for more than 5% of each group, the comparison was narrowed down to Ani_1 H% vs. Ani_2 H%, Ani_1 H% vs. Aniyo_1 H%, and Ani_1 LH% vs. Aniyo_1 LH%. More specifically, we tested the following three hypotheses:

  • (a) The F0 characteristics of H% in Ani_1 differ from those in Ani_2 (meaning).

  • (b) The F0 characteristics of H% in Ani_1 differ from those in Aniyo_1 (speech level).

  • (c) The F0 characteristics of LH% in Ani_1 differ from those in Aniyo_1 (speech level).
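For readers unfamiliar with the test, the quantities reported for each comparison (t-value, fractional degrees of freedom, and Cohen's d) can be sketched as follows. This is an illustrative Python sketch, not the R analysis used in the study. The function names are hypothetical, and the choice of a pooled standard deviation weighted by sample size for Cohen's d, reported as an absolute value, is an assumption, since the text does not state which variant was used.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(x), len(y)
    # Squared standard errors of the two sample means
    # (statistics.variance uses the n - 1 denominator).
    se1 = variance(x) / n1
    se2 = variance(y) / n2

    t = (mean(x) - mean(y)) / sqrt(se1 + se2)
    dof = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, dof


def cohens_d(x: Sequence[float], y: Sequence[float]) -> float:
    """Absolute Cohen's d with a sample-size-weighted pooled SD (assumed variant)."""
    n1, n2 = len(x), len(y)
    pooled_var = (
        (n1 - 1) * variance(x) + (n2 - 1) * variance(y)
    ) / (n1 + n2 - 2)
    return abs(mean(x) - mean(y)) / sqrt(pooled_var)
```

With this sign convention, a negative t-value means the first group has the lower mean, which is how the Ani_1 vs. Ani_2 and Ani_1 vs. Aniyo_1 comparisons would read if Ani_1 is passed as the first argument.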

3. Results

3.1. Intonational Analysis

Chi-square tests of independence were performed to examine the relationship between IP-final tones, transcribed by the K-ToBI convention, and two variables: the meaning of ani (Ani_1 vs. Ani_2) and the speech level of ani (Ani_1 vs. Aniyo_1). The results are summarized in Table 1:7

Table 1. Results of chi-square tests of independence (N=880)

| | Meaning (Ani_1 vs. Ani_2) | Speech level (Ani_1 vs. Aniyo_1) |
|---|---|---|
| DoF | 4 | 2 |
| p-value | <0.001*** | 0.499 |
| χ2-value | 543.09 | 1.391 |

*** p<0.001.

First, concerning the relationship between IP-final tones and the meaning of ani, the test revealed a significant association (χ2(4)=543.09, p<0.001). This indicates that speakers significantly alter the IP-final tones based on the meaning of ani. This finding is consistent with the visual distribution of IP-final tones illustrated in Figure 4:

Figure 4. Bar charts of the distribution of IP-final tones realized in Ani_1 and Ani_2, with infrequent tones grouped under ‘Others’.

As shown in (1) of Figure 4, the dominant IP-final tone for Ani_1 is H% (82.46%), followed by LH% (14.5%). The ‘Others’ category for Ani_1 includes L% (2.90%) and HLH% (0.14%)8. Conversely, as described in (2), the dominant IP-final tone of Ani_2 is L% (63.41%), with H% (21.59%) and LHL% (10.45%) following. In this case, ‘Others’ includes LH% (2.27%), HL% (1.82%), HLH% (0.23%) and HLHL% (0.23%).

In contrast, the association between IP-final tones and the speech level of ani (Ani_1 vs. Aniyo_1) is not significant (χ2(2)=1.391, p=0.499). This suggests that the speech level does not significantly influence the variation in IP-final tones for ani. This finding is also in line with Figure 5:

Figure 5. Bar charts of the distribution of IP-final tones realized in Ani_1 and Aniyo_1, with infrequent tones grouped under ‘Others’.

As shown in (2) of Figure 5, the most frequent IP-final tone for Aniyo_1 in our data is H% (81.36%), followed by LH% (15.91%). The ‘Others’ category includes L% (2.5%) and HL% (0.23%). Notably, the distribution pattern of Aniyo_1 closely mirrors that of Ani_1 shown in (1), with H% being the most common IP-final tone and LH% the second most common.

3.2. Acoustic Analysis

A series of Welch’s t-tests were performed to explore any significant differences in F0 characteristics of the final syllable when Ani_1 and Ani_2, as well as Ani_1 and Aniyo_1, shared the same IP-final tone. Focusing on the IP-final tones that accounted for more than 5% of occurrences in each group, three comparisons were analyzed: Ani_1 H% vs. Ani_2 H%, Ani_1 H% vs. Aniyo_1 H%, and Ani_1 LH% vs. Aniyo_1 LH%. Our acoustic analyses suggest that both the meaning and speech level of ani significantly influence the F0 characteristics of the IP-final syllable.

First, Table 2 presents the results of acoustic analyses on Ani_1 with H% vs. Ani_2 with H%:9

Table 2. Results of Welch’s t-tests of all acoustic measures for the final syllable of Ani_1 and Ani_2, where the IP-final tone was realized as H%

| Measure | Ani_1 H% M | Ani_1 H% SD | Ani_2 H% M | Ani_2 H% SD | DoF | t-value | p-value | Cohen’s d |
|---|---|---|---|---|---|---|---|---|
| Valley | 154.01 | 47.21 | 149.21 | 45.31 | 129.45 | 2.406 | 0.018* | 0.103 |
| Peak | 224.05 | 71.49 | 240.89 | 85.13 | 111.82 | –1.97 | 0.051 | 0.226 |
| Range | 70.04 | 36.1 | 91.69 | 52.14 | 104.96 | –5.474 | <0.001*** | 0.226 |
| Slope | 362.59 | 217.97 | 476.87 | 271.88 | 110.32 | –3.646 | <0.001*** | 0.497 |

* p<0.05, *** p<0.001.

The results reveal significant variation in the production of H% tones in ani according to meaning. Notably, the F0 slope shows the most pronounced effect size among the F0 measures, with Ani_2 displaying a significantly steeper slope than Ani_1 [t(110.32)=–3.646, p<0.001, Cohen’s d=0.497]. This indicates a steeper rise in the F0 curve at the final syllable with an H% tone in Ani_2. Ani_2 also demonstrates a wider F0 range than Ani_1, though with a smaller effect size [t(104.96)=–5.474, p<0.001, Cohen’s d=0.226], suggesting greater F0 variability in the H% of Ani_2. This difference in the F0 range stems primarily from Ani_2 having a significantly lower F0 valley than Ani_1, while the F0 peak does not differ significantly between the two.

Second, Table 3 displays the results of the acoustic analyses on Ani_1 and Aniyo_1, both realized with the H% tones:

Table 3. Results of Welch’s t-tests of all acoustic measures for the final syllable of Ani_1 and Aniyo_1, where the IP-final tone was realized as H%

| Measure | Ani_1 H% M | Ani_1 H% SD | Aniyo_1 H% M | Aniyo_1 H% SD | DoF | t-value | p-value | Cohen’s d |
|---|---|---|---|---|---|---|---|---|
| Valley | 154.01 | 47.21 | 168.81 | 56.21 | 718.5 | –2.555 | 0.011* | 0.19 |
| Peak | 224.05 | 71.49 | 221.77 | 70.28 | 719.98 | 0.534 | 0.593 | 0.04 |
| Range | 70.04 | 36.1 | 52.96 | 29.96 | 714.19 | 1.779 | 0.076 | 0.132 |
| Slope | 362.59 | 217.97 | 379.68 | 215.89 | 717.84 | –0.595 | 0.551 | 0.044 |

* p<0.05.

The findings indicate that H% tones in ani also vary by speech level, although the significant differences are limited. The F0 valley was the sole variable to differ significantly across speech levels, with a small effect size [t(718.5)=–2.555, p=0.011, Cohen’s d=0.19]: Ani_1 (M=154.01, SD=47.21) exhibited a lower valley than Aniyo_1 (M=168.81, SD=56.21). Although the difference was not statistically significant [t(714.19)=1.779, p=0.076], there was a trend in the F0 range, where Ani_1 H% exhibited a wider range (M=70.04, SD=36.10) than Aniyo_1 H% (M=52.96, SD=29.96).

Third, Table 4 shows the results of the acoustic analyses on the final syllable of Ani_1 and Aniyo_1, both realized with the LH% tones:

Table 4. Results of Welch’s t-tests of all acoustic measures for the final syllable of Ani_1 and Aniyo_1, where the IP-final tone was realized as LH%

| LH% | Ani_1 LH% M | Ani_1 LH% SD | Aniyo_1 LH% M | Aniyo_1 LH% SD | DoF | t-value | p-value | Cohen’s d |
|---|---|---|---|---|---|---|---|---|
| Valley | 122.45 | 42.46 | 132.92 | 43.72 | 123.04 | 1.19 | 0.236 | 0.21 |
| Peak | 197.9 | 72.77 | 202.85 | 60.58 | 119.05 | –0.682 | 0.497 | 0.121 |
| Range | 75.45 | 48.18 | 69.92 | 29.4 | 122.84 | –1.84 | 0.068 | 0.325 |
| Slope | 279.56 | 192.54 | 360.22 | 180.86 | 121.12 | –2.901 | 0.004** | 0.514 |

** p<0.01.

Table 4 reveals that LH% tones in ani vary by speech level, most notably in the F0 slope, with a medium effect size [t(121.12)=–2.901, p=0.004, Cohen’s d=0.514]. This suggests a steeper slope for Aniyo_1 LH% (M=360.22, SD=180.86) compared to Ani_1 LH% (M=279.56, SD=192.54). However, Table 4 does not specify whether the difference is driven by the L component or the H component of the contour tone. Thus, additional acoustic analyses were conducted on the individual tone components (L, H) of the LH% tones in Ani_1 and Aniyo_1, with the results summarized in Tables 5 and 6:

Table 5. Results of Welch’s t-tests of all acoustic measures for the L component of Ani_1 and Aniyo_1, where the IP-final tone was realized as LH%

| L | Ani_1 LH% M | Ani_1 LH% SD | Aniyo_1 LH% M | Aniyo_1 LH% SD | DoF | t-value | p-value | Cohen’s d |
|---|---|---|---|---|---|---|---|---|
| Valley | 122.65 | 41.89 | 130.46 | 44.92 | 127.07 | –1.025 | 0.307 | 0.179 |
| Peak | 129.63 | 43.78 | 136.07 | 45.63 | 126.35 | –0.82 | 0.414 | 0.144 |
| Range | 6.98 | 4.46 | 5.61 | 3.757 | 115.92 | 1.884 | 0.062 | 0.336 |
| Slope | 60.7 | 40.28 | 95.77 | 68.474 | 114.19 | –3.617 | <0.001*** | 0.613 |

*** p<0.001.

Table 6. Results of Welch’s t-tests of all acoustic measures for the H component of Ani_1 and Aniyo_1, where the IP-final tone was realized as LH%

| H | Ani_1 LH% M | Ani_1 LH% SD | Aniyo_1 LH% M | Aniyo_1 LH% SD | DoF | t-value | p-value | Cohen’s d |
|---|---|---|---|---|---|---|---|---|
| Valley | 126.38 | 45.5 | 138.27 | 45.59 | 125.05 | –1.484 | 0.141 | 0.261 |
| Peak | 213.9 | 80 | 213.48 | 67.47 | 116 | 0.032 | 0.974 | 0.006 |
| Range | 87.53 | 54.77 | 75.21 | 32.99 | 93.644 | 1.521 | 0.132 | 0.278 |
| Slope | 572.5 | 303.16 | 555.28 | 267.62 | 118.81 | 0.34 | 0.734 | 0.061 |

Tables 5 and 6 indicate that the difference in the F0 slope of LH% tones between Ani_1 and Aniyo_1 is driven by the L component. Specifically, the F0 slope of the L component in Aniyo_1 is notably steeper than in Ani_1, with a medium effect size [t(114.19)=–3.617, p<0.001, Cohen’s d=0.613]. Additionally, a trend was observed in the F0 range of the L component, with Ani_1 displaying a slightly wider range (M=6.98, SD=4.46) than Aniyo_1 (M=5.61, SD=3.757), although this did not reach the significance threshold [t(115.92)=1.884, p=0.062]. Conversely, the F0 characteristics of the H component did not differ significantly between Ani_1 and Aniyo_1.

4. Discussion

In this section, we discuss the implications of our results to answer the two central research questions posed in the introduction section: Do the IP-final tones of ani differ by meaning? Do the IP-final tones of ani differ by speech level? By examining these questions, we advocate for a deeper analysis of the F0 curve, extending beyond the categorical labels assigned by the ToBI conventions, to enrich our understanding of intonation.

4.1. Does the Intonation of ani Differ by Meaning?

In this study, we investigated whether the intonation of ani differs depending on its meaning. We hypothesized that ani utilizes prosodic variation to convey its two distinct meanings: as a negation (Ani_1) and as a discourse marker (Ani_2). While previous research by Yang (2002) and Kim (2016) explored the extended functions of ani, they did not closely examine the prosodic distinctions between these two meanings.

Based on chi-square tests of independence, our ToBI-based analyses uncovered a significant relationship between the meaning of ani and its IP-final tones. Specifically, we found that H% was more prevalent in Ani_1, while L% was more common in Ani_2. However, relying solely on a categorical analysis according to K-ToBI (Jun, 2000) might have led us to underestimate the distinctiveness of intonation in Ani_2, where H% accounted for 22% of the data. This could create the impression of substantial overlap in the usage of H% between Ani_1 and Ani_2.

However, our acoustic analyses painted a different picture. They revealed that, despite sharing the label of H%, the tones in Ani_1 and Ani_2 are phonetically distinct. Specifically, Ani_2 exhibited a significantly steeper F0 slope, a wider F0 range, and a lower F0 valley compared to Ani_1, though the F0 peak did not significantly vary by meaning.

These findings clarify that the subtle yet significant differences in intonation, which might be overlooked in a purely categorical approach, become evident when analyzing the actual phonetic details. Thus, while ToBI serves as a valuable scaffold for intonation studies, further investigation into acoustic properties may be necessary for a comprehensive understanding of Korean intonation.

4.2. Does the Intonation of ani Differ by Speech Level?

In this study, we examined whether the intonation of ani differs based on the speech level, specifically between Intimate and Polite forms. Previous research has suggested that Korean speakers adjust their overall F0 characteristics, such as average, range, and standard deviation, according to the politeness or formality level of speech (Grawunder & Winter, 2010; Winter & Grawunder, 2012)10. Still, these studies did not directly investigate the relationship between intonation and politeness.

Our results indicate that, according to the categorization by K-ToBI, the IP-final tones of ani do not significantly vary based on the speech level. This observation is primarily due to a substantial overlap in IP-final tones: H% tones constitute over 80% of both Ani_1 and Aniyo_1 data, while LH% tones are also present in 14% of Ani_1 and 16% of Aniyo_1 data11.

However, our acoustic analyses show notable variations in the F0 characteristics, even within categorically identical IP-final tones. Although both Ani_1 and Aniyo_1 include the H% tone, as well as the LH% tone, they exhibit distinct F0 characteristics at the IP-final syllable. Specifically, Ani_1 demonstrated a significantly lower F0 valley compared to Aniyo_1 for the H% tone, albeit with a small effect size. Regarding the LH% tone, Aniyo_1 exhibited a significantly steeper F0 slope than Ani_1, a difference driven by the F0 slope of the L component.

These findings suggest that significant intonational differences between speech levels may exist, which might not have been evident in a categorical analysis but become apparent through further phonetic analyses. Therefore, while ToBI provides a useful framework, additional analyses on the phonetic realization of tones may contribute to a more complete understanding of Korean intonation.

5. General Discussion and Conclusion

This study investigated the variation in the intonation of Seoul Korean ani across different meanings and speech levels using data from KICo, employing two main approaches:

  • First, we categorically labeled the IP-final tones in the dataset following the K-ToBI convention (Jun, 2000). Chi-square tests of independence were then used to explore whether these tones were significantly associated with either meaning (Ani_1 vs. Ani_2) or speech level (Ani_1 vs. Aniyo_1). While we observed considerable overlap between groups, significant relationships emerged between the meaning of ani and its IP-final tones.

  • Second, we scrutinized the F0 characteristics of the final syllable of ani to determine if the observed many-to-many relationships between intonation and meaning/speech level could be clarified. The results of Welch’s t-tests indicated that these apparent overlaps could indeed be significantly distinguished.

In summary, our study highlights the necessity of further analysis on phonetic intonation beyond ToBI-based categorical labels. By examining the F0 characteristics of the IP-final syllable, we gained a clearer understanding of the previously ambiguous connections between meaning/speech level and intonation. ToBI serves as a valuable tool and framework for studying intonation, as it facilitates segmentation, which is not only practical but necessary for analyzing intonation from a linguistic perspective (Bolinger, 1949). However, it’s essential to explore beyond these categorical labels to gain a deeper understanding of the “distinctiveness” of intonation, thus enhancing our understanding of prosody.

Acknowledgements

We would like to thank our research assistants, Ayeon Sim and Min-yeong Oh, from the Department of Korean Language and Literature at Duksung Women’s University, for diligently assisting with the establishment of the KICo data since 2022. Additionally, we extend our gratitude to the students from the Department of Korean Language and Literature at Duksung Women’s University for their help in recruiting participants for the KICo-N data. Last but not least, we are grateful to all participants who were willing to invest their valuable time to travel to the recording site and participate in the sessions. This work was supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (NRF-2022S1A5A8050526).

Notes

* This work was supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (NRF-2022S1A5A8050526).

1 In the K-ToBI framework (Jun, 2000), the intonational structure of Seoul Korean is composed of Accentual Phrase (AP) and Intonation Phrase (IP). An AP, being smaller than an IP but larger than a prosodic word, is characterized by a tonal sequence without final lengthening. An IP consists of one or more APs and is distinguished by a boundary tone (%) along with final lengthening. IP-final tones convey a range of linguistic and paralinguistic information, including pragmatic meaning, sentence structure, as well as attitudes and emotions.

2 Our decision to concentrate on the IP-final tones, particularly concerning ani, stems from their crucial role in conveying a wide range of linguistic and paralinguistic information in Seoul Korean, including pragmatic meanings, sentence structures, and the speaker's attitudes and emotions (Jun, 2000). In contrast, the AP-initial tone, which is systematically determined by the glottal status of the phrase-initial segment irrespective of lexical information, does not show any discernible difference between ani and aniyo, rendering a comparative analysis based on AP-initial tones less informative. Specifically, ani has only an AP-initial tone besides the IP-final tone, making it difficult to discern any meaningful distinctions between ani and aniyo based on the AP-initial tone alone. Our transcription of the AP tone pattern further revealed that the AP-initial tone for both ani and aniyo remains consistently low (L), whereas the AP-medial tone for aniyo is uniformly high (H).

3 To maintain consistency across the entire dataset of KICo, we relied on KSLD for script design. This decision was made to address the diverse levels of research available on the meanings of different words and the difficulty in determining the most pertinent studies to cite. By referencing KSLD we adopted a consistent and authoritative source for meanings, thereby upholding a standardized approach to script design for KICo.

4 Yang (2002) provides an illustrative example to support the observation that ani used as a discourse marker remains in its Intimate form even when followed by a Polite sentence.

5 The entire script is available in the appendix.

6 Please note that the first author reached out to the developer regarding technical issues on the website. The developer generously devoted considerable effort and time to resolve these challenges. Despite these efforts, the technical problems could not be fully resolved within the timeframe required for the completion of the forced alignment process of KICo and subsequent manual adjustments. We extend our gratitude to the developer for their dedication to both the development of the system for broader use and the attempts to rectify the issues in a timely manner.

7 The significance level of the p-values is indicated by star symbols: *p<0.05, **p<0.01, ***p<0.001.
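
The star convention in this note can be sketched in a few lines of Python. This is only an illustration of the thresholds stated above, not the code used in the study; the function name `significance_stars` is hypothetical, and treating values at or above 0.05 as unmarked is an assumption.

```python
def significance_stars(p: float) -> str:
    """Map a p-value to the star label defined in this note.

    Thresholds are strict inequalities: *p<0.05, **p<0.01, ***p<0.001.
    Values at or above 0.05 get no star (an assumption of this sketch).
    """
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


# Illustrative p-values only; these are not results from the study.
for p in (0.2, 0.03, 0.004, 0.0002):
    print(p, significance_stars(p))
```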

8 For clarity, IP-final tones constituting less than 5% of occurrences were grouped into the ‘Others’ category.
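
The grouping rule in this note can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually run for KICo: the function name `group_rare_tones`, the `threshold` parameter, and the example label counts are all hypothetical.

```python
from collections import Counter


def group_rare_tones(tones, threshold=0.05, other_label="Others"):
    """Merge tone categories whose share of all tokens is below `threshold`.

    `tones` is a list of IP-final tone labels, one per token.
    Returns a dict mapping each retained label (or `other_label`) to its count.
    """
    counts = Counter(tones)
    total = sum(counts.values())
    grouped = Counter()
    for tone, n in counts.items():
        # Categories with a share below the threshold go into 'Others'.
        key = tone if n / total >= threshold else other_label
        grouped[key] += n
    return dict(grouped)


# Made-up label counts for illustration; not data from the study.
labels = ["H%"] * 50 + ["L%"] * 30 + ["HL%"] * 17 + ["LH%"] * 2 + ["LHL%"] * 1
print(group_rare_tones(labels))
```

In this made-up example, LH% and LHL% each account for less than 5% of the 100 tokens, so their three tokens are pooled under 'Others'.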

9 Mean (‘M’) and standard deviation (‘SD’) values were rounded to two decimal places, while the remaining values were rounded to three decimal places. Mean and standard deviation values of F0 slope are reported in Hz/ms; those of the remaining measures are reported in Hz.

10 In the context of Grawunder & Winter (2010) and Winter & Grawunder (2012), the terms “politeness” and “formality” are used interchangeably to refer to the level of social decorum in speech.

11 It is interesting that H% was the most frequent IP-final tone for ani used as a negation in our study. This might be partly attributable to the uptalk phenomenon (Warren, 2016), which is prevalent among young adult Korean speakers. Given that the participants of KICo were all in their 20s and 30s, the intonation in our data may have been influenced by the age group of the speakers. However, further investigation is necessary to validate this potential connection.

References

1.

Beckman, M., & Jun, S. A. (1996). K-ToBI (Korean ToBI) labeling convention. Version 2. Los Angeles, CA: Ohio State University and UCLA.

2.

Beckman, M. E., Hirschberg, J., & Shattuck-Hufnagel, S. (2005). The original ToBI system and the evolution of the ToBI framework. In: S. A. Jun (Ed.), Prosodic typology: The phonology of intonation and phrasing (pp. 9-54). Oxford, UK: Oxford University Press.

3.

Beckman, M. E., & Pierrehumbert, J. B. (1986). Intonational structure in Japanese and English. Phonology, 3, 255-309.

4.

Boersma, P., & Weenink, D. (2024). Praat: Doing phonetics by computer (version 6.4.05) [Computer program]. Retrieved from http://www.praat.org/

5.

Bolinger, D. L. (1949). Intonation and analysis. Word, 5(3), 248-254.

6.

Bresnahan, J. L., & Shapiro, M. M. (1966). A general equation and technique for the exact partitioning of chi-square contingency tables. Psychological Bulletin, 66(4), 252-262.

7.

FindingFive Team. (2023). FindingFive: An online platform for creating, running, and managing your experiments. Retrieved from https://www.findingfive.com

8.

Grawunder, S., & Winter, B. (2010, May). Acoustic correlates of politeness: Prosodic and voice quality measures in polite and informal speech of Korean and German speakers. Proceedings of the Speech Prosody 2010-Fifth International Conference. Chicago, IL.

9.

Jun, S. A. (2000). K-ToBI (Korean ToBI) labelling conventions. Korean Journal of Speech Science, 7(1), 143-169.

10.

Jun, S. A. (2022). The ToBI transcription system: Conventions, strengths, and challenges. In: J. Barnes, & S. Shattuck-Hufnagel (Eds.), Prosodic theory and practice (pp. 151-181). Cambridge, MA: The MIT Press.

11.

Jeong, M. S. (2002). The teaching method of Korean intonation by basic pattern. The International Association for Korean Language Education, 3(1), 225-241.

12.

Kim, E., Park, K., Park, H., & Jin, M. (2008). Methodology of teaching Korean intonation. Journal of Korean Language Education, 19(2), 93-123.

13.

Kim, J. E. (2023). Phonetic intonation patterns of Geurae-yo in KICo. Journal of Korean Language Education, 35(1), 207-232.

14.

Kim, S. H. (2016). When speakers account for their questions: Ani-prefaced accounts in Korean conversation. In: J. D. Robinson (Ed.), Accountability in social interaction (pp. 294-320). Oxford, UK: Oxford University Press.

15.

Kisler, T., Reichel, U., & Schiel, F. (2017). Multilingual processing of speech via web services. Computer Speech & Language, 45, 326-347.

16.

Park, M. J. (2012). The meaning of Korean prosodic boundary tones. Leiden, the Netherlands: Brill.

17.

R Core Team. (2020). R: A language and environment for statistical computing [Computer software]. Vienna, Austria: R Foundation for Statistical Computing. Retrieved from https://www.R-project.org

18.

Schiel, F. (1999, August). Automatic phonetic transcription of non-prompted speech. Proceedings of the ICPhS (pp. 607-610). San Francisco, CA.

19.

Silverman, K., Beckman, M., Pitrelli, J., Ostendorf, M., Wightman, C., Price, P., Pierrehumbert, J., ... Hirschberg, J. (1992, October). ToBI: A standard for labeling English prosody. Proceedings of the Second International Conference on Spoken Language Processing (pp. 867-870). Banff, AB.

20.

Sohn, H. (2001). The Korean language. Cambridge, UK: Cambridge University Press.

21.

Warren, P. (2016). Uptalk: The phenomenon of rising intonation. Cambridge, UK: Cambridge University Press.

22.

Winter, B., & Grawunder, S. (2012). The phonetic profile of Korean formal and informal speech registers. Journal of Phonetics, 40(6), 808-815.

23.

Yang, C. (2002). Ani 'no': It doesn't always mean 'no' in Korean. Discourse and Cognition, 9(2), 101-119.

24.

Yoon, T. J. (2021). The Korean phonetic aligner program suite. Korean Forced Alignment. Retrieved from https://tutorial.tyoon.net

25.

Yoon, T. J., & Kang, Y. (2013). The Korean phonetic aligner program suite. Retrieved from https://korean.utsc.utoronto.ca/kpa

Appendices

Appendix

Below is the Romanized version of the scripts used to elicit ani data in KICo, along with their English translations in parentheses. Note that the actual recordings were conducted with the scripts written in the Korean alphabet:

Ani_1

cal cinae? - ani. (Are you doing alright? - No.)

siheom cal bwasseo? - ani. (Did you do well on the test? - No.)

repoteu da sseosseoyo? - ani. (Did you finish writing the report? - No.)

ceonyeok meogeosseo? - ani. (Did you have dinner? - No.)

chukgu caemiisseosseo? - ani. (Was soccer fun? - No.)

undong calhaesseo? - ani. (Did you exercise well? - No.)

uri naeil mannaneun geoci? - ani. (We're meeting tomorrow, right? - No.)

cigeum haecul su isseoyo? - ani. (Can you do it now? - No.)

ne chinguya? - ani. (Are they your friend? - No.)

neo godeunghaksaengini? - ani. (Are you a high school student? - No.)

geu gasu coahae? - ani. (Do you like that singer? - No.)

Ani_2

ani, iyuga mwoya? (Wait, what's the reason?)

ani, cinccaro? (Wait, really?)

ani, geureon mal haci ma. (Wait, don't say that.)

ani, mot mitgenneunde. (Wait, I can't believe it.)

ani, cincca singihada. (Wait, that's really amazing.)

ani, nollaunde? (Wait, isn't it surprising?)

ani, ceongmal? (Wait, really?)

ani, naega geureon mareul haetdago? (Wait, are you saying I said that?)

ani, ige an museowo? (Wait, isn't this scary?)

ani, daedanhada. (Wait, that's impressive.)

ani, geureohgekkaci? (Wait, to that extent?)

Aniyo_1

cal cinaeseyo? - aniyo. (Are you doing alright? - No.)

undong yeolsimhi haeyo? - aniyo. (Do you exercise hard? - No.)

noraebureuneun geo coahaeyo? - aniyo. (Do you like singing? - No.)

nagalkkayo? - aniyo. (Shall we go out? - No.)

cungguk gabwasseoyo? - aniyo. (Have you been to China? - No.)

cam cal cayo? - aniyo. (Do you sleep well? - No.)

chuwo? - aniyo. (Is it cold? - No.)

cigeum bakke biwayo? - aniyo. (Is it raining outside now? - No.)

baegopayo? - aniyo. (Are you hungry? - No.)

pamahaesseoyo? - aniyo. (Did you get a perm? - No.)

ce sarami cieunssiga coahaneun saramieyo? - aniyo. (Is that person the one Ji-eun likes? - No.)