Phoneme embedding dimension | 256 |
Encoder layers | 4 |
Encoder hidden | 256 |
Encoder Conv1D kernel | 9 |
Encoder Conv1D filter size | 1,024 |
Encoder attention heads | 2 |
Mel-spectrogram decoder layers | 4 |
Mel-spectrogram decoder hidden | 256 |
Mel-spectrogram decoder Conv1D kernel | 9 |
Mel-spectrogram decoder Conv1D filter size | 1,024 |
Mel-spectrogram decoder attention heads | 2 |
Encoder / decoder dropout | 0.1 |
Variance predictor Conv1D kernel | 3 |
Variance predictor Conv1D filter size | 256 |
Variance predictor dropout | 0.5 |
Waveform decoder convolution blocks | 30 |
Waveform decoder dilated Conv1D kernel size | 3 |
Waveform decoder transposed Conv1D filter size | 64 |
Waveform decoder skip channel size | 64 |
Batch size | 32 |