Table 2. The range of transformer decoder network hyperparameters

Hyperparameter                 Values
attention heads                1, 2, 4, 8
linear units                   512, 1024, 2048, 4096
num blocks                     2, 4, 6, 8, 12
dropout rate                   0.0, 0.1, 0.2, 0.3, 0.4
positional dropout rate        0.1
self attention dropout rate    0.0, 0.1, 0.2, 0.3, 0.4
src attention dropout rate     0.0
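
The ranges in Table 2 can be written down as a search space and enumerated. The sketch below is illustrative only: the table does not state how the space was explored, so the exhaustive grid here is an assumption, and the underscore-style key names and the helpers `iter_configs` and `grid_size` are hypothetical names chosen for this example.

```python
import itertools

# Search space transcribed from Table 2.
# Key names are illustrative identifiers, not confirmed config keys.
SEARCH_SPACE = {
    "attention_heads": [1, 2, 4, 8],
    "linear_units": [512, 1024, 2048, 4096],
    "num_blocks": [2, 4, 6, 8, 12],
    "dropout_rate": [0.0, 0.1, 0.2, 0.3, 0.4],
    "positional_dropout_rate": [0.1],
    "self_attention_dropout_rate": [0.0, 0.1, 0.2, 0.3, 0.4],
    "src_attention_dropout_rate": [0.0],
}


def iter_configs(space):
    """Yield every combination of values as a dict (assumed full grid)."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def grid_size(space):
    """Return the number of combinations in the full grid."""
    size = 1
    for values in space.values():
        size *= len(values)
    return size


if __name__ == "__main__":
    print("total configurations:", grid_size(SEARCH_SPACE))
    print("first configuration:", next(iter_configs(SEARCH_SPACE)))
```

Under the full-grid assumption the space contains 4 x 4 x 5 x 5 x 1 x 5 x 1 = 2,000 configurations, since the two fixed rates contribute a factor of one each. A random or Bayesian search would sample from the same dictionary rather than enumerate it.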