Our adapted detection results on in-the-wild driving videos taken in bad weather.
During domain-adaptive training, we oversample salient object regions by warping source-domain images in place.
Our approach improves adaptation across geographies, lighting, and weather conditions, and is agnostic to the task, domain adaptation algorithm, saliency guidance, and underlying model architecture. It adds minimal memory overhead during training and incurs no additional latency at inference time.
Highlights include:
Domain adaptation methods often treat background and object regions uniformly. However, backgrounds occupy more pixels and exhibit large cross-domain variations, making them challenging to learn. Consequently, focusing on salient object-containing regions while reducing attention to background context can lead to more robust and adaptable recognition models.
We propose in-place warping of images based on the locations of object instances. Warped images retain the same size as the original images, but object regions are oversampled using saliency guidance. This approach compels networks to focus on object regions and reduce attention on background context when learning features.
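As a concrete sketch of this idea (not necessarily the paper's exact implementation), an in-place warp can be built separably per axis: take the saliency marginal along each axis, form its cumulative distribution, and invert it so that uniformly spaced output samples land more densely where saliency is high. The output keeps the input resolution, but salient rows and columns are sampled more often. Function and parameter names here are illustrative; `1e-3` is an assumed floor that guarantees a minimum sampling density everywhere.

```python
import numpy as np

def separable_warp_grid(saliency, out_len):
    """1-D sampling grid that oversamples high-saliency positions.

    saliency: non-negative 1-D marginal saliency (length = input size
    along this axis). Returns float indices into the input axis.
    """
    s = saliency + 1e-3              # floor: keep minimum density everywhere
    cdf = np.cumsum(s) / s.sum()     # monotone map: input axis -> [0, 1]
    # Inverse-CDF sampling: uniform output positions pull many samples
    # into regions where the CDF rises quickly (i.e. high saliency).
    u = (np.arange(out_len) + 0.5) / out_len
    return np.interp(u, cdf, np.arange(len(s)))

def warp_image(img, sal):
    """In-place warp: output has the same size, objects are oversampled."""
    h, w = img.shape[:2]
    gy = separable_warp_grid(sal.sum(axis=1), h)   # row marginal
    gx = separable_warp_grid(sal.sum(axis=0), w)   # column marginal
    yy = np.clip(np.round(gy).astype(int), 0, h - 1)
    xx = np.clip(np.round(gx).astype(int), 0, w - 1)
    return img[np.ix_(yy, xx)], (yy, xx)
```

Nearest-neighbor gathering is used only for brevity; a real training pipeline would resample with bilinear interpolation so gradients flow through the warp.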
Oversampling image regions requires warping guidance: a saliency signal that dictates which regions to sample more densely than others. We propose instance-level saliency guidance, which explicitly oversamples all objects during training. We could have used Static Prior Guidance [Thavamani 2021, Thavamani 2023] or Geometric Prior Guidance [Ghosh 2023]; however, these methods are not designed for domain adaptation and do not explicitly oversample object instance regions, which we find to be the most effective approach.
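One simple way to realize instance-level guidance (a sketch; the actual kernel and scale used in the paper may differ) is to place one Gaussian bump per annotated object box and sum them, so every instance contributes to the saliency map. The `sigma_scale` parameter is an assumption for illustration.

```python
import numpy as np

def instance_saliency(h, w, boxes, sigma_scale=0.5):
    """Instance-level saliency: a sum of Gaussians, one per object box.

    boxes: iterable of (x1, y1, x2, y2) in pixel coordinates.
    Each Gaussian is centered on its box and scaled to the box extent,
    so every annotated instance is explicitly marked for oversampling.
    """
    ys, xs = np.mgrid[0:h, 0:w]
    sal = np.zeros((h, w))
    for x1, y1, x2, y2 in boxes:
        cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
        sx = max(sigma_scale * (x2 - x1), 1.0)
        sy = max(sigma_scale * (y2 - y1), 1.0)
        sal += np.exp(-(((xs - cx) / sx) ** 2 + ((ys - cy) / sy) ** 2) / 2)
    return sal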
We warp source images based on saliency guidance and then unwarp the backbone features using the same guidance before making predictions. This method can be seamlessly integrated into existing domain adaptation algorithms and is agnostic to the task, domain adaptation algorithm, saliency guidance, and underlying model architecture. Empirically, we observed that warping source domain images is more effective than warping both source and target domain images.
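The unwarping step can be sketched as inverting the monotone per-axis sampling grids used for the forward warp, so that warped-space backbone features are mapped back onto the original image grid before prediction. The names below are illustrative, and a practical implementation would use differentiable bilinear sampling (e.g. a `grid_sample`-style op) rather than nearest-neighbor gathering.

```python
import numpy as np

def unwarp_features(feat, gy, gx, out_h, out_w):
    """Map warped-space features back to the original image grid.

    gy, gx: the monotone float grids used for the forward warp
    (gy[i] = original row sampled into warped row i, likewise gx).
    We invert each grid with interpolation, then gather features.
    """
    inv_y = np.interp(np.arange(out_h), gy, np.arange(len(gy)))
    inv_x = np.interp(np.arange(out_w), gx, np.arange(len(gx)))
    yy = np.clip(np.round(inv_y).astype(int), 0, feat.shape[0] - 1)
    xx = np.clip(np.round(inv_x).astype(int), 0, feat.shape[1] - 1)
    return feat[np.ix_(yy, xx)]
```

Because the warp is applied to inputs and undone on features, the detection or segmentation head sees geometry consistent with the original image, which is what lets the scheme drop into existing adaptation pipelines unchanged.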
Grad-CAM visualizations show that models trained with our method focus more strongly on salient objects, indicating better-learned features and improved scene comprehension.
Importantly, our learned features exhibit better object focus with less attention on background context. This ensures that the features are more adaptable and robust to variations in background context.
Because it learns better backbone features, our approach improves performance across various real-to-real domain adaptation tasks, including changes in weather, lighting conditions, and geography. The results below are from an adapted model pre-trained on BDD100K images taken during the day in good weather conditions.
Our method is task, adaptation algorithm, and backbone agnostic. It can be used for semantic segmentation, object detection, and other tasks with CNN or Transformer backbones.