Table 1. Ranges of hyperparameters for the transformer encoder network

Hyperparameters	Values
output size	256
input layer	2D conv
normalized before	True	False
attention heads	1	2	4	8
linear units	512	1,024	2,048	4,096
num blocks	2	4	6	8	12
dropout rate	0.0	0.1	0.2	0.3	0.4
positional dropout rate	0.1
attention dropout rate	0.0	0.1	0.2	0.3	0.4
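
The ranges in Table 1 can be written down as a search space, which makes the size of the space explicit. The sketch below is illustrative only: the snake_case key names are hypothetical renderings of the row labels, and the exhaustive Cartesian product is an assumption, since the table does not say how the values were combined or searched.

```python
from itertools import product

# Rows of Table 1 that list a single value (held fixed).
FIXED = {
    "output_size": 256,
    "input_layer": "2D conv",
    "positional_dropout_rate": 0.1,
}

# Rows of Table 1 that list several candidate values.
# Key names are hypothetical; they are not taken from any specific toolkit.
SEARCH = {
    "normalized_before": [True, False],
    "attention_heads": [1, 2, 4, 8],
    "linear_units": [512, 1024, 2048, 4096],
    "num_blocks": [2, 4, 6, 8, 12],
    "dropout_rate": [0.0, 0.1, 0.2, 0.3, 0.4],
    "attention_dropout_rate": [0.0, 0.1, 0.2, 0.3, 0.4],
}


def iter_configs():
    """Yield one dict per combination of the searched values (assumed full grid)."""
    keys = list(SEARCH)
    for values in product(*(SEARCH[k] for k in keys)):
        yield {**FIXED, **dict(zip(keys, values))}


configs = list(iter_configs())
print(len(configs))  # 2 * 4 * 4 * 5 * 5 * 5 = 4000
```

Every listed head count divides the output size of 256, so each combination gives an integer per-head dimension. A full grid of 4,000 configurations is large, so in practice a random or staged subset of this space may be what is actually trained.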