Table 1. The range of hyperparameters in the transformer encoder network

Hyperparameter             Values
output size                256
input layer                2D conv
normalize before           True, False
attention heads            1, 2, 4, 8
linear units               512, 1024, 2048, 4096
num blocks                 2, 4, 6, 8, 12
dropout rate               0.0, 0.1, 0.2, 0.3, 0.4
positional dropout rate    0.1
attention dropout rate     0.0, 0.1, 0.2, 0.3, 0.4
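The ranges in Table 1 can be written as a search-space dictionary. The sketch below is illustrative only. The key names are hypothetical and assume ESPnet-style encoder argument names. The table does not say whether the space was searched exhaustively or sampled, so the sketch shows both a full grid and uniform random sampling. Under a full grid, the table defines 2 × 4 × 4 × 5 × 5 × 5 = 4000 configurations.

```python
import itertools
import random

# Hypothetical encoding of the Table 1 ranges; key names are illustrative
# (assumed to mirror ESPnet-style encoder arguments), not taken from the source.
SEARCH_SPACE = {
    "output_size": [256],
    "input_layer": ["conv2d"],
    "normalize_before": [True, False],
    "attention_heads": [1, 2, 4, 8],
    "linear_units": [512, 1024, 2048, 4096],
    "num_blocks": [2, 4, 6, 8, 12],
    "dropout_rate": [0.0, 0.1, 0.2, 0.3, 0.4],
    "positional_dropout_rate": [0.1],
    "attention_dropout_rate": [0.0, 0.1, 0.2, 0.3, 0.4],
}


def grid_size(space):
    """Number of configurations in a full Cartesian grid over the space."""
    size = 1
    for values in space.values():
        size *= len(values)
    return size


def iter_grid(space):
    """Yield every configuration in the full grid as a dict."""
    keys = list(space)
    for combo in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, combo))


def sample_config(space, rng):
    """Draw one configuration uniformly at random (assumed random search)."""
    return {k: rng.choice(v) for k, v in space.items()}


if __name__ == "__main__":
    print(grid_size(SEARCH_SPACE))
    cfg = sample_config(SEARCH_SPACE, random.Random(0))
    # Multi-head attention splits output_size evenly across the heads.
    assert cfg["output_size"] % cfg["attention_heads"] == 0
    print(cfg)
```

Every attention-head count in the table (1, 2, 4, 8) divides the output size of 256. Because of this, each sampled configuration splits the model dimension evenly across heads.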