Transformer-XL
To address this challenge, Transformer-XL employs a novel relative positional encoding. In the vanilla Transformer, positional encodings depended on the absolute index of each token; Transformer-XL's encoding instead depends on the relative distance between tokens, hence the name: relative positional encoding. In the paper, this was done by expanding the query-key multiplication inside the attention head's score.
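
Concretely (a sketch of the expansion, with notation following the Transformer-XL paper), the score under absolute positional encodings is expanded into four terms, where $E_{x_i}$ is the token embedding and $U_i$ the absolute positional encoding. Every absolute position $U_j$ on the key side is then replaced by a relative sinusoid encoding $R_{i-j}$, the key projection is split into a content part $W_{k,E}$ and a position part $W_{k,R}$, and two learned vectors $u$ and $v$ substitute for the query's own position term:

$$
A^{\mathrm{abs}}_{i,j} =
\underbrace{E_{x_i}^{\top} W_q^{\top} W_k E_{x_j}}_{(a)}
+ \underbrace{E_{x_i}^{\top} W_q^{\top} W_k U_j}_{(b)}
+ \underbrace{U_i^{\top} W_q^{\top} W_k E_{x_j}}_{(c)}
+ \underbrace{U_i^{\top} W_q^{\top} W_k U_j}_{(d)}
$$

$$
A^{\mathrm{rel}}_{i,j} =
\underbrace{E_{x_i}^{\top} W_q^{\top} W_{k,E} E_{x_j}}_{(a)}
+ \underbrace{E_{x_i}^{\top} W_q^{\top} W_{k,R} R_{i-j}}_{(b)}
+ \underbrace{u^{\top} W_{k,E} E_{x_j}}_{(c)}
+ \underbrace{v^{\top} W_{k,R} R_{i-j}}_{(d)}
$$

The intuition for dropping $U_i$ from terms $(c)$ and $(d)$ is that the query attends to all keys from the same position, so its own absolute position carries no extra information; the learned vectors $u$ and $v$ act as global content and position biases instead.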