Our adapted detection results on in-the-wild driving videos taken in bad weather.
We oversample salient object regions by warping source-domain images in-place during training while performing domain adaptation.
Technical Highlights include:
Result Highlights include:
Unsupervised Domain Adaptation (UDA) is challenging because models are trained with dominant scene backgrounds that appear dramatically different across domains. Specifically:
Our image warping oversamples object regions and undersamples background areas. This helps the model focus on salient objects and rely less on background context, leading to more robust and adaptable recognition models.
We propose instance-level saliency guidance, which explicitly oversamples all objects during training. While methods like Static Prior Guidance [Thavamani 2021, Thavamani 2023] or Geometric Prior Guidance [Ghosh 2023] could be used, they are not designed for domain adaptation and do not prioritize object instance regions.
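To make the idea of instance-level saliency concrete, here is a minimal sketch of one plausible way to build a saliency map from ground-truth boxes. The exact form of the guidance is not given in this text. The function name `instance_saliency`, the Gaussian shape, the box-proportional bandwidth, and the `floor` parameter are all assumptions made for illustration.

```python
import math

def instance_saliency(boxes, height, width, floor=0.1):
    """Return a height x width saliency map (nested lists) that sums to 1.

    boxes: list of (x0, y0, x1, y1) in pixel coordinates.
    floor: hypothetical constant background saliency, so that background
           is undersampled rather than discarded entirely.
    """
    sal = [[floor for _ in range(width)] for _ in range(height)]
    for x0, y0, x1, y1 in boxes:
        cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
        # Bandwidth tied to the box size (an assumption of this sketch).
        sx = max((x1 - x0) / 2.0, 1.0)
        sy = max((y1 - y0) / 2.0, 1.0)
        for y in range(height):
            for x in range(width):
                dx, dy = (x - cx) / sx, (y - cy) / sy
                # Every object adds its own bump, so all instances count.
                sal[y][x] += math.exp(-0.5 * (dx * dx + dy * dy))
    total = sum(sum(row) for row in sal)
    return [[v / total for v in row] for row in sal]

# Example: one small object in a 16 x 16 image.
saliency_map = instance_saliency([(2, 2, 6, 6)], height=16, width=16)
```

Because every box contributes its own term, small or off-center objects receive saliency too, which a fixed, dataset-wide prior would not guarantee.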
Our warping focuses on oversampling object regions and undersampling background regions. We warp source images based on saliency guidance and then unwarp the backbone features using the same guidance before making predictions. This method can be seamlessly integrated into existing domain adaptation algorithms and is agnostic to the task, domain adaptation algorithm, saliency guidance, and underlying model architecture. Empirically, we observed that warping only source-domain images is more effective than warping both source- and target-domain images. We do not warp or unwarp at test time.
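The warp-then-unwarp step can be illustrated with a deliberately simplified 1D sketch. It uses inverse-CDF resampling as a stand-in for the warping function, which this text does not specify. It also treats the "backbone" as the identity, so the warped signal plays the role of the features. All function names here are hypothetical.

```python
from bisect import bisect_right

def interp(xq, xs, ys):
    """Linear interpolation of (xs, ys) at xq; xs strictly increasing,
    queries outside the range are clamped to the end values."""
    if xq <= xs[0]:
        return ys[0]
    if xq >= xs[-1]:
        return ys[-1]
    j = bisect_right(xs, xq)  # xs[j-1] <= xq < xs[j]
    t = (xq - xs[j - 1]) / (xs[j] - xs[j - 1])
    return ys[j - 1] + t * (ys[j] - ys[j - 1])

def sampling_grid(saliency, n_out):
    """Source positions in [0, n] for n_out samples, denser where the
    (strictly positive) saliency is high."""
    total = float(sum(saliency))
    edges = list(range(len(saliency) + 1))
    cdf = [0.0]
    for s in saliency:
        cdf.append(cdf[-1] + s / total)
    return [interp((i + 0.5) / n_out, cdf, edges) for i in range(n_out)]

def warp(signal, grid):
    """Resample the signal (values at pixel centers k + 0.5) on the grid."""
    centers = [k + 0.5 for k in range(len(signal))]
    return [interp(g, centers, signal) for g in grid]

def unwarp(features, grid, n):
    """Map features that live on the warped grid back to uniform centers."""
    return [interp(k + 0.5, grid, features) for k in range(n)]

# Object occupies bins 4..7; background keeps a nonzero saliency.
saliency = [1, 1, 1, 1, 5, 5, 5, 5, 1, 1, 1, 1]
signal = [float(k) for k in range(12)]

grid = sampling_grid(saliency, n_out=12)
warped = warp(signal, grid)            # training-time input to the backbone
restored = unwarp(warped, grid, n=12)  # features back in original layout
on_object = sum(1 for g in grid if 4 <= g < 8)
```

In this toy case `on_object` counts how many of the 12 samples land on the object, compared with 4 under uniform sampling. Reusing the same `grid` for unwarping puts the features back in the original coordinates, so labels and prediction heads need no change. At test time both steps would simply be skipped.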
Grad-CAM visualization shows that the model trained with our method focuses more on salient objects, indicating better-learned features and improved scene comprehension.
Because the learned backbone features are better, our approach improves performance across various real-to-real domain adaptation tasks, including changes in weather, lighting conditions, and geography. The results below are from an adapted model pre-trained on BDD100K images taken during the day and in good weather conditions.
Our method is agnostic to tasks, adaptation algorithms, and backbone architectures. It applies seamlessly to object detection and semantic segmentation, using either ConvNet or Transformer backbones.