Summary of Cross-aware Early Fusion with Stage-divided Vision and Language Transformer Encoders for Referring Image Segmentation, by Yubin Cho et al.
Cross-aware Early Fusion with Stage-divided Vision and Language Transformer Encoders for Referring Image Segmentation by Yubin…