Transformer

SETR Dosovitskiy, Alexey, et al. “An image is worth 16x16 words: Transformers for image recognition at scale.” arXiv preprint arXiv:2010.11929 (2020). Zheng, Sixiao, et al. “Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021. 论文简介 SETR：用 Transformer 从 Sequence-to-Sequence 的角度重新思考语义分割...