Encoder-Decoder with Atrous Separable Convolution for ...
link.springer.com › chapter › 10Oct 06, 2018 · Abstract. Spatial pyramid pooling module or encode-decoder structure are used in deep neural networks for semantic segmentation task. The former networks are able to encode multi-scale contextual information by probing the incoming features with filters or pooling operations at multiple rates and multiple effective fields-of-view, while the latter networks can capture sharper object boundaries ...
Rethinking Atrous Convolution for Semantic Image Segmentation
arxiv.org › abs › 1706Jun 17, 2017 · In this work, we revisit atrous convolution, a powerful tool to explicitly adjust filter's field-of-view as well as control the resolution of feature responses computed by Deep Convolutional Neural Networks, in the application of semantic image segmentation. To handle the problem of segmenting objects at multiple scales, we design modules which employ atrous convolution in cascade or in ...
[2106.10270] How to train your ViT? Data, Augmentation, and ...
arxiv.org › abs › 2106Jun 18, 2021 · Vision Transformers (ViT) have been shown to attain highly competitive performance for a wide range of vision applications, such as image classification, object detection and semantic image segmentation. In comparison to convolutional neural networks, the Vision Transformer's weaker inductive bias is generally found to cause an increased reliance on model regularization or data augmentation ...