[2104.12753v1] Improve Vision Transformers Training by ...
arxiv.org · Apr 26, 2021 · We observe that the instability of transformer training on vision tasks can be attributed to an over-smoothing problem: the self-attention layers tend to map different patches of the input image to similar latent representations, yielding a loss of information and a degradation of performance, especially when the number of …
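The collapse described above can be made concrete with a simple patch-similarity metric (an illustrative sketch, not necessarily the paper's exact measure): the mean pairwise cosine similarity among a layer's patch tokens. Values near 1 indicate that patches have been mapped to near-identical representations, i.e. over-smoothing.

```python
import numpy as np

def patch_cosine_similarity(tokens: np.ndarray) -> float:
    """Mean pairwise cosine similarity among patch tokens.

    tokens: (num_patches, dim) latent representations from one layer.
    Returns a value in [-1, 1]; close to 1 means the patches have
    collapsed to near-identical representations (over-smoothing).
    """
    normed = tokens / np.linalg.norm(tokens, axis=1, keepdims=True)
    sim = normed @ normed.T                   # (P, P) cosine-similarity matrix
    p = sim.shape[0]
    off_diag = sim.sum() - np.trace(sim)      # exclude each token's self-similarity
    return off_diag / (p * (p - 1))

# Diverse random tokens score near 0; near-duplicate tokens score near 1.
rng = np.random.default_rng(0)
diverse = rng.normal(size=(196, 64))                          # 196 patches, dim 64
collapsed = np.ones((196, 64)) + 0.01 * rng.normal(size=(196, 64))
print(patch_cosine_similarity(diverse))    # near 0
print(patch_cosine_similarity(collapsed))  # near 1
```

Tracking this statistic layer by layer would show whether similarity grows with depth, which is the trend the abstract attributes to deeper models.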