Table 6. High-frequency vocabulary coverage of corpora

Corpus                  Tokens    Types   Vocabulary coverage (%)
Whole script           549,683   38,356   73.82
Read-aloud style       402,958   34,743   68.37
Conversational style   146,725   13,063   44.03
CNN_549K               549,849   28,920   61.53
CNN_402K               403,183   25,031   56.73