Transformer — PyTorch 1.10.1 documentation
https://pytorch.org/docs/stable/generated/torch.nn.Transformer.htmlTransformer¶ class torch.nn. Transformer (d_model=512, nhead=8, num_encoder_layers=6, num_decoder_layers=6, dim_feedforward=2048, dropout=0.1, activation=<function relu>, custom_encoder=None, custom_decoder=None, layer_norm_eps=1e-05, batch_first=False, norm_first=False, device=None, dtype=None) [source] ¶. A transformer model. User is able to …
TransformerEncoder — PyTorch 1.10.1 documentation
pytorch.org › torchforward (src, mask = None, src_key_padding_mask = None) [source] ¶ Pass the input through the encoder layers in turn. Parameters. src – the sequence to the encoder (required). mask – the mask for the src sequence (optional). src_key_padding_mask – the mask for the src keys per batch (optional). Shape: see the docs in Transformer class.
How to add padding mask to nn.TransformerEncoder module ...
discuss.pytorch.org › t › how-to-add-padding-mask-toDec 08, 2019 · I think, when using src_mask, we need to provide a matrix of shape (S, S), where S is our source sequence length, for example, import torch, torch.nn as nn q = torch.randn(3, 1, 10) # source sequence length 3, batch size 1, embedding size 10 attn = nn.MultiheadAttention(10, 1) # embedding size 10, one head attn(q, q, q) # self attention