Model (TTS + speaker encoder) | English, Seen: P-MOS | English, Seen: SECS | English, Unseen: P-MOS | English, Unseen: SECS | Korean, Seen: P-MOS | Korean, Seen: SECS | Korean, Unseen: P-MOS | Korean, Unseen: SECS |
---|---|---|---|---|---|---|---|---|
Ground truth | 3.53±0.06 | 0.98 | 3.80±0.02 | 0.98 | 4.09±0.04 | 0.99 | 3.85±0.03 | 0.99 |
FastSpeech2+Speaker ID | 2.97±0.05 | 0.85 | ND | ND | 3.45±0.05 | 0.93 | ND | ND |
FastSpeech2+GE2ESV | 2.94±0.05 | 0.88 | 2.44±0.02 | 0.89 | 3.57±0.05 | 0.94 | 3.43±0.03 | 0.85 |
FastSpeech2+ResNet34SE | 2.75±0.05 | 0.78 | 2.29±0.02 | 0.80 | 3.52±0.05 | 0.93 | 3.35±0.02 | 0.83 |
(Proposed) | 3.00±0.05 | 0.90 | 2.54±0.03 | 0.89 | 3.58±0.05 | 0.94 | 3.74±0.02 | 0.87 |