Table 6. High-frequency vocabulary coverage of corpora
Corpus                  Token      Type      Vocabulary coverage (%)
Whole script            549,683    38,356    73.82
Read-aloud style        402,958    34,743    68.37
Conversational style    146,725    13,063    44.03
CNN_549K                549,849    28,920    61.53
CNN_402K                403,183    25,031    56.73