[TTS + speaker encoder] model | Seen NMOS | Seen SMOS | Unseen NMOS | Unseen SMOS |
---|---|---|---|---|
Ground truth | 4.34±0.06 | 4.72±0.02 | 4.32±0.04 | 4.72±0.02 |
FastSpeech2+Speaker ID | 3.33±0.08 | 3.58±0.07 | ND | ND |
FastSpeech2+GE2ESV | 3.48±0.06 | 3.67±0.07 | 3.12±0.05 | 2.83±0.07 |
FastSpeech2+ResNet34SE | 3.28±0.08 | 3.43±0.13 | 3.23±0.05 | 3.03±0.07 |
(Proposed) | 3.66±0.06 | 3.97±0.04 | 3.36±0.04 | 3.16±0.04 |