Transferring Labels to Solve Annotation Mismatches Across Object Detection Datasets

1NVIDIA, 2University of Toronto, Vector Institute, 3University of Ottawa, 4Georgia Institute of Technology

Can you find any cyclist/bicycle label errors across the six images?
In fact, they are all correct! From left to right, the columns are from Cityscapes, Waymo, and nuImages, respectively.

Abstract

In object detection, varying annotation protocols across datasets can result in annotation mismatches, leading to inconsistent class labels and bounding regions. Addressing these mismatches typically involves manually identifying common trends and fixing the corresponding bounding boxes and class labels. To alleviate this laborious process, we introduce the label transfer problem in object detection. Here, the goal is to transfer bounding boxes from one or more source datasets to match the annotation style of a target dataset. We propose a data-centric approach, Label-Guided Pseudo-Labeling (LGPL), that improves downstream detectors in a manner agnostic to the detector learning algorithms and model architectures. Validating across four object detection scenarios, defined over seven different datasets and three different architectures, we show that transferring labels for a target task via LGPL consistently improves the downstream detection in every setting, on average by 1.88 mAP and 2.65 AP^{75}. Most importantly, we find that when training with multiple labeled datasets, carefully addressing annotation mismatches with LGPL alone can improve downstream object detection better than off-the-shelf supervised domain adaptation techniques that align instance features.

Annotation mismatches

What are annotation mismatches?

Annotation mismatches stem from differences in annotation protocols, including class taxonomies, annotator instructions, and label post-processing.
For example, in the figure below, the Mapillary Vistas Dataset (MVD) annotates cyclists as ‘riders’, while the Waymo Open Dataset (Waymo) combines riders and bicycles into a single ‘cyclist’ class. Meanwhile, nuImages annotates bikes on sidewalks, but Waymo excludes them per its annotation instructions. Beyond these ontological mismatches, discrepancies in annotation instructions, human-machine misalignment, and cross-modality labels give rise to further unique annotation mismatches.
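
As a concrete sketch of these protocol differences (our own illustration, not data from the paper; the box coordinates are invented), the same street scene could be annotated as follows under each protocol:

scene = {
    # MVD keeps the person ("rider") and the bicycle as two separate boxes.
    "mvd": [
        {"class": "rider",   "box": [100,  80, 140, 160]},
        {"class": "bicycle", "box": [ 95, 120, 150, 200]},
    ],
    # Waymo merges the rider and the bicycle into a single "cyclist" box.
    "waymo": [
        {"class": "cyclist", "box": [95, 80, 150, 200]},
    ],
    # nuImages additionally boxes bikes parked on sidewalks, which Waymo's
    # instructions exclude, so this box has no Waymo counterpart at all.
    "nuimages": [
        {"class": "bicycle", "box": [400, 150, 460, 210]},  # parked on a sidewalk
    ],
}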

Why does it matter?

Training on datasets with annotation mismatches, such as combining nuImages and Waymo, can introduce unwanted biases and confuse models, for example causing detectors to misidentify bikes on sidewalks. We propose a data-centric approach that resolves these mismatches across datasets and improves the performance of downstream detectors in a model-agnostic manner.

Label Transfer

In this work, we propose the "Label Transfer" problem, in which a label transfer model must adjust the source labels so that the transferred labels follow the annotation protocol of the target dataset. We evaluate effectiveness by the performance of the induced downstream detectors.

Main challenge: no image is labeled under both protocols, so there are no paired labels to learn the mapping from.
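
As a rough interface sketch (our own framing, not code from the paper), the problem and its evaluation can be pictured like this:

from typing import Any, Dict, List, Tuple

# One annotation: a class name plus an (x1, y1, x2, y2) box.
Annotation = Dict[str, Any]
# A labeled dataset: (image_path, annotations) pairs.
Dataset = List[Tuple[str, List[Annotation]]]

def transfer_labels(source: Dataset, target: Dataset) -> Dataset:
    """Rewrite the source annotations so they follow the target protocol.

    Note that `source` and `target` contain different images: no image is
    annotated under both protocols, so there is no direct supervision for
    mapping one label style to the other.
    """
    raise NotImplementedError  # e.g., LGPL, described in the next section

# Evaluation is indirect: train a downstream detector on the target training
# data plus the transferred source data, then report mAP / AP75 on the
# target validation split.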

Label-Guided Pseudo-Labeling (LGPL)

To mitigate annotation mismatches, we could use a model trained on the target data to generate pseudo-labels on the source images (Arazo et al., 2019; Lee, 2013), but this discards the existing source labels. On the other hand, statistical normalization (Wang et al., 2020) aligns box statistics but ignores the image content. Label-Guided Pseudo-Labeling aims to fully leverage all available information for the label transfer problem.
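
For contrast, here is a minimal sketch of the naive pseudo-labeling baseline, assuming a torchvision-style detector that returns boxes, labels, and scores; note that the human-annotated source boxes never enter the loop.

import torch

def pseudo_label(source_images, detector, score_thresh=0.5):
    """Re-annotate source images with a detector trained on the target dataset.

    The original source labels are discarded entirely, which is exactly the
    information LGPL tries to keep.
    """
    detector.eval()
    pseudo_labels = []
    with torch.no_grad():
        for image in source_images:            # image: 3xHxW float tensor
            pred = detector([image])[0]        # torchvision-style output dict
            keep = pred["scores"] > score_thresh
            pseudo_labels.append({
                "boxes": pred["boxes"][keep],   # target-style boxes
                "labels": pred["labels"][keep], # target-style class ids
            })
    return pseudo_labels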

LGPL is inspired by the proposal-refinement strategy used in two-stage object detectors. In short, we train an RPN on the source datasets and apply this source-trained RPN to produce source-like proposals on the target images. We then train an RoI head to transfer these source-like proposals on the target images into the target labels. All components can be trained end-to-end.
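
A high-level sketch of this pipeline is below; the RPN/RoI-head interfaces (loss, propose) are our own placeholders, not the authors' implementation.

import torch.nn as nn

class LGPLSketch(nn.Module):
    """Schematic of the LGPL training loop described above (interfaces assumed)."""

    def __init__(self, backbone, rpn, roi_head):
        super().__init__()
        self.backbone = backbone   # shared image feature extractor
        self.rpn = rpn             # learns to emit source-style box proposals
        self.roi_head = roi_head   # refines proposals into target-style labels

    def training_step(self, source_batch, target_batch):
        # 1) Supervise the RPN on source images with source labels, so its
        #    proposals imitate the source annotation style.
        src_feats = self.backbone(source_batch["images"])
        rpn_loss = self.rpn.loss(src_feats, source_batch["labels"])

        # 2) Run the source-trained RPN on target images to obtain
        #    source-like proposals there.
        tgt_feats = self.backbone(target_batch["images"])
        proposals = self.rpn.propose(tgt_feats)

        # 3) Train the RoI head to map those source-like proposals to the
        #    target-style ground-truth labels; both losses are optimized
        #    jointly, so the whole model is end-to-end trainable.
        roi_loss = self.roi_head.loss(tgt_feats, proposals, target_batch["labels"])
        return rpn_loss + roi_loss

At transfer time, the trained RoI head can then be applied with the original source boxes serving as proposals on the source images, producing target-style labels; see the paper for the exact formulation.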

Experimental results

We experiment across four transfer scenarios and three detector architectures, and adopt five baseline approaches to the label transfer problem to showcase the effectiveness of LGPL. Please see the paper for more details.

Here are our main findings:

  1. LGPL outperforms all other baseline methods for every architecture.
  2. Transferring labels leads to higher-quality object detectors.
  3. LGPL outperforms off-the-shelf supervised domain adaptation.
  4. Off-the-shelf segmentation foundation models fall short in label transfer.

BibTeX

@inproceedings{
liao2024translating,
title={Translating Labels to Solve Annotation Mismatches Across Object Detection Datasets},
author={Yuan-Hong Liao and David Acuna and Rafid Mahmood and James Lucas and Viraj Uday Prabhu and Sanja Fidler},
booktitle={The Twelfth International Conference on Learning Representations},
year={2024},
url={https://openreview.net/forum?id=ChHx5ORqF0}
}