Our adapted detection results on in-the-wild driving videos taken in bad weather.
We oversample salient object regions by warping source-domain images in-place during training while performing domain adaptation.
Our approach improves adaptation across geographies, lighting, and weather conditions, and is agnostic to the task, domain adaptation algorithm, saliency guidance, and underlying model architecture. It adds minimal memory overhead during training and incurs no additional latency at inference time.
Highlights include:
We propose in-place warping of source domain images based on the locations of object instances present in them, to mitigate scale bias in domain adaptation. Warped images have the same resolution as the original images, but object regions are oversampled -- making small objects appear larger.
In-place warping thus shifts the object scale distribution, which in turn improves adaptation across diverse datasets.
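To make the idea concrete, here is a minimal numpy sketch of a saliency-guided in-place warp. The separable inverse-CDF formulation and nearest-neighbor gather are illustrative assumptions for brevity, not the paper's exact warp (a real pipeline would use differentiable bilinear sampling):

```python
import numpy as np

def saliency_warp(image, saliency, eps=1e-6):
    """Warp `image` at its original resolution so that rows/columns with
    high saliency receive more output samples (i.e., are oversampled).
    Separable inverse-CDF resampling with a nearest-neighbor gather."""
    H, W = saliency.shape
    # Marginal saliency along each axis, turned into CDFs on [0, 1].
    cy = np.cumsum(saliency.sum(axis=1) + eps); cy = cy / cy[-1]
    cx = np.cumsum(saliency.sum(axis=0) + eps); cx = cx / cx[-1]
    # Uniform steps in CDF space: more output pixels land where saliency is high.
    ys = np.interp(np.linspace(0, 1, H), cy, np.arange(H))
    xs = np.interp(np.linspace(0, 1, W), cx, np.arange(W))
    warped = image[np.round(ys).astype(int)[:, None],
                   np.round(xs).astype(int)[None, :]]
    return warped, ys, xs
```

With a saliency map concentrated on a small object, most output pixels end up sampling that object's region, so it appears larger even though the image resolution is unchanged.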
Oversampling image regions requires a warping guidance, i.e., a saliency-based signal that prioritizes some regions over others. While we could have used Static Prior Guidance [Thavamani 2021, Thavamani 2023] or Geometric Prior Guidance [Ghosh 2023], neither is designed for domain adaptation, and neither explicitly oversamples object instance regions, which is the strategy that performs best.
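One simple way to build an instance-level guidance is to place a kernel on each ground-truth box. The Gaussian form and the `floor` value below are illustrative assumptions, not the paper's exact guidance:

```python
import numpy as np

def instance_guidance(boxes, H, W, floor=0.05):
    """Saliency guidance that explicitly covers object instances: a sum of
    anisotropic Gaussians, one per ground-truth box (x1, y1, x2, y2),
    over a small uniform floor so background is never fully discarded."""
    yy, xx = np.mgrid[0:H, 0:W]
    sal = np.full((H, W), float(floor))
    for x1, y1, x2, y2 in boxes:
        cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
        sx = max((x2 - x1) / 2.0, 1.0)  # spread scaled to box size
        sy = max((y2 - y1) / 2.0, 1.0)
        sal += np.exp(-0.5 * (((xx - cx) / sx) ** 2 + ((yy - cy) / sy) ** 2))
    return sal
```

Small boxes get tight, sharp peaks, so the resulting warp magnifies exactly the instances whose scale is underrepresented in the source domain.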
We warp source images based on the saliency guidance, and before prediction, unwarp the backbone features using the same saliency guidance. This can be easily incorporated into existing domain adaptation algorithms, and is agnostic to task, domain adaptation algorithm, saliency guidance, and underlying model architecture.
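The warp/unwarp pair can be sketched as follows. For illustration the round trip is applied to the image itself, whereas in the actual method the unwarping is applied to backbone feature maps before the prediction heads; the separable inverse-CDF warp here is an illustrative assumption:

```python
import numpy as np

def cdf_axes(saliency, eps=1e-6):
    """Per-axis sampling coordinates from a saliency map (inverse-CDF form)."""
    H, W = saliency.shape
    cy = np.cumsum(saliency.sum(axis=1) + eps); cy = cy / cy[-1]
    cx = np.cumsum(saliency.sum(axis=0) + eps); cx = cx / cx[-1]
    ys = np.interp(np.linspace(0, 1, H), cy, np.arange(H))
    xs = np.interp(np.linspace(0, 1, W), cx, np.arange(W))
    return ys, xs

def warp(x, ys, xs):
    """Gather input pixels at the saliency-derived coordinates."""
    return x[np.round(ys).astype(int)[:, None], np.round(xs).astype(int)[None, :]]

def unwarp(x, ys, xs):
    """Invert the warp by interpolating the inverse index maps; the same
    guidance drives both directions, so no extra state is needed."""
    H, W = x.shape[:2]
    iy = np.interp(np.arange(H), ys, np.arange(H))
    ix = np.interp(np.arange(W), xs, np.arange(W))
    return x[np.round(iy).astype(int)[:, None], np.round(ix).astype(int)[None, :]]
```

Because the same saliency guidance parameterizes both directions, the pair drops into an existing adaptation pipeline as a preprocessing step on source images plus a feature-space inverse before the heads.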
Our investigation shows that, by shifting the source scale distribution, our method improves the learned backbone features.
As the learned backbone features are better, our approach improves performance on a variety of real-to-real domain adaptation tasks, spanning changes in weather, lighting conditions, and geography. The results shown are from an adapted model pre-trained on BDD100K images captured during the day and in good weather.
Our method is task-, adaptation-algorithm-, and backbone-agnostic: it can be used for semantic segmentation, object detection, and other tasks, with both CNN and Transformer backbones.