[2010.04903] What Do Position Embeddings Learn? An Empirical Study of Pre-Trained Language Model Positional Encoding
https://arxiv.org/abs/2010.04903 · 10 Oct 2020 · In recent years, pre-trained Transformers have dominated the majority of NLP benchmark tasks. Many variants of pre-trained Transformers have kept emerging, most focusing on designing different pre-training objectives or variants of self-attention. Embedding position information in the self-attention mechanism is also an indispensable factor in …
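For context, the position embeddings the abstract refers to are vectors added to the token embeddings so that self-attention, which is otherwise order-invariant, can observe word order. Below is a minimal NumPy sketch (not code from the paper): the sinusoidal formula is the standard one from Vaswani et al. (2017), and the learned lookup table mirrors the trainable absolute position embeddings used in models like BERT and GPT-2. All names and sizes here are illustrative assumptions.

```python
import numpy as np

def sinusoidal_position_embeddings(max_len: int, d_model: int) -> np.ndarray:
    """Fixed sinusoidal encoding from Vaswani et al. (2017).

    PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    Assumes d_model is even.
    """
    positions = np.arange(max_len)[:, None]       # shape (max_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]      # shape (1, d_model // 2)
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)                  # even dimensions get sine
    pe[:, 1::2] = np.cos(angles)                  # odd dimensions get cosine
    return pe

# Learned absolute position embeddings (BERT/GPT-2 style) are simply a
# trainable table of the same shape; shown here with a random init.
learned_pe = np.random.normal(scale=0.02, size=(512, 768))

# Either variant is added to the token embeddings before self-attention:
#   hidden = token_embeddings + pe[:seq_len]
pe = sinusoidal_position_embeddings(max_len=512, d_model=768)
print(pe.shape)  # (512, 768)
```

The paper's central question is what the learned variant (the lookup table) actually captures compared with fixed encodings like the sinusoidal one above.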