Table 2. The range of transformer decoder network hyperparameters
Hyperparameters                 Values
attention heads                 1, 2, 4, 8
linear units                    512, 1024, 2048, 4096
num blocks                      2, 4, 6, 8, 12
dropout rate                    0.0, 0.1, 0.2, 0.3, 0.4
positional dropout rate         0.1
self attention dropout rate     0.0, 0.1, 0.2, 0.3, 0.4
src attention dropout rate      0.0
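
To make the size of this search space concrete, the ranges in Table 2 can be written as a dictionary and enumerated. The following is only a minimal sketch: the snake_case key names, the helper functions (iter_configs, sample_config, grid_size), and the choice between exhaustive enumeration and random sampling are assumptions for illustration, since the table does not state how the ranges were searched.

```python
import itertools
import random

# Ranges copied from Table 2. The snake_case keys are an assumed naming,
# not something the table itself specifies.
SEARCH_SPACE = {
    "attention_heads": [1, 2, 4, 8],
    "linear_units": [512, 1024, 2048, 4096],
    "num_blocks": [2, 4, 6, 8, 12],
    "dropout_rate": [0.0, 0.1, 0.2, 0.3, 0.4],
    "positional_dropout_rate": [0.1],
    "self_attention_dropout_rate": [0.0, 0.1, 0.2, 0.3, 0.4],
    "src_attention_dropout_rate": [0.0],
}


def grid_size(space):
    """Number of distinct configurations in the full Cartesian product."""
    size = 1
    for values in space.values():
        size *= len(values)
    return size


def iter_configs(space):
    """Yield every configuration in the space as a dict (exhaustive grid)."""
    keys = list(space)
    for combo in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, combo))


def sample_config(space, rng):
    """Draw one configuration uniformly at random (hypothetical alternative
    to the exhaustive grid)."""
    return {key: rng.choice(values) for key, values in space.items()}


if __name__ == "__main__":
    print("configurations in the full grid:", grid_size(SEARCH_SPACE))
    print("one random draw:", sample_config(SEARCH_SPACE, random.Random(0)))
```

Under these assumptions the full grid holds 4 x 4 x 5 x 5 x 1 x 5 x 1 = 2,000 configurations, which is why a sampled or otherwise pruned search may be preferable to training every combination.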