In object detection, varying annotation protocols across datasets can result in annotation mismatches, leading to inconsistent class labels and bounding regions. Addressing these mismatches typically involves manually identifying common trends and fixing the corresponding bounding boxes and class labels. To alleviate this laborious process, we introduce the label transfer problem in object detection. Here, the goal is to transfer bounding boxes from one or more source datasets to match the annotation style of a target dataset. We propose a data-centric approach, Label-Guided Pseudo-Labeling (LGPL), that improves downstream detectors in a manner agnostic to the detector learning algorithms and model architectures. Validating across four object detection scenarios, defined over seven different datasets and three different architectures, we show that transferring labels for a target task via LGPL consistently improves the downstream detection in every setting, on average by 1.88 mAP and 2.65 AP^{75}. Most importantly, we find that when training with multiple labeled datasets, carefully addressing annotation mismatches with LGPL alone can improve downstream object detection better than off-the-shelf supervised domain adaptation techniques that align instance features.
Annotation mismatches stem from differences in annotation protocols, including class taxonomies, labeling instructions, and label post-processing.
For example, in the figure below, Mapillary Vistas Dataset (MVD) annotates cyclists as ‘riders’, while Waymo Open Dataset (Waymo) combines riders and bicycles into the ‘cyclist’ class. On the other hand, nuImages annotates bikes on sidewalks, but Waymo excludes these per its annotation instructions. Beyond such ontological mismatches, discrepancies in annotation instructions, human-machine misalignment, and cross-modality labels produce further unique annotation mismatches.
Training on datasets with annotation mismatches, such as combining nuImages and Waymo, can introduce unwanted biases and confuse models, like causing detectors to misidentify bikes on sidewalks. We propose a data-centric approach to resolve these mismatches across datasets, enhancing the performance of downstream detectors in a model-agnostic manner.
In this work, we propose the "Label Transfer" problem, where a label transfer model must adjust the source labels such that the transferred labels follow the annotation protocol of the target dataset. We evaluate effectiveness by the performance of the downstream detectors trained on the transferred labels.
Main challenge: there are no paired labels, i.e., no image is annotated under both the source and target protocols.
To mitigate annotation mismatches, we could use a model trained on the target data to generate pseudo-labels on the source images (Arazo et al., 2019; Lee, 2013), but this discards the existing source labels. Statistical normalization (Wang et al., 2020), on the other hand, aligns box statistics but ignores the image content. Label-Guided Pseudo-Labeling aims to fully leverage all available information for the label transfer problem.
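To make the statistical-normalization baseline concrete, here is a minimal sketch of its core idea: rescale source box sizes so their first- and second-order statistics match the target dataset's, without ever looking at the images. The function name and the 1-D "width" representation are illustrative assumptions, not the cited paper's API.

```python
# Sketch of statistical normalization for label transfer: z-score match
# source box sizes to the target distribution (image content is ignored).
# The helper name and box representation are illustrative, not a real API.
from statistics import mean, pstdev

def normalize_sizes(source_sizes, target_sizes):
    """Rescale source box sizes so mean/std match the target dataset."""
    mu_s, sd_s = mean(source_sizes), pstdev(source_sizes)
    mu_t, sd_t = mean(target_sizes), pstdev(target_sizes)
    # Standardize under source statistics, then re-express under target statistics.
    return [(s - mu_s) / sd_s * sd_t + mu_t for s in source_sizes]

# Example: the source protocol draws tight boxes; the target protocol's
# boxes are ~1.5x wider on the same kinds of objects.
source_widths = [10.0, 20.0, 30.0]
target_widths = [15.0, 30.0, 45.0]
print(normalize_sizes(source_widths, target_widths))  # ≈ [15.0, 30.0, 45.0]
```

Note that the same rescaling is applied to every source box regardless of what is in the image, which is exactly the limitation LGPL addresses.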
LGPL adapts the two-stage pipeline of modern object detectors. In short, we train a region proposal network (RPN) on the source datasets and apply this source-trained RPN to produce source-like proposals on the target images. Finally, we train an RoI head to map the source-like proposals on target images to the target labels. All components can be trained end-to-end.
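The three-step data flow above can be illustrated with a deliberately simplified 1-D toy, where a "box" is just a width and the learned RPN/RoI components are reduced to scale factors. This only mirrors the structure of the pipeline under these toy assumptions; it is not the paper's actual networks or training procedure.

```python
# Purely illustrative 1-D toy of the LGPL data flow. The scale-factor
# "learning" is a stand-in for the RPN and RoI head, not the real method.
from statistics import mean

# Each "image" contains one object with a true size; an annotation is a box width.
# Toy source protocol: tight boxes (width = size). Toy target protocol: width = 1.5 * size.
source_data = [(s, s * 1.0) for s in (10.0, 20.0, 30.0)]  # (object size, source-style width)
target_data = [(s, s * 1.5) for s in (12.0, 24.0, 36.0)]  # (object size, target-style width)

# Step 1: "train" the RPN on the source dataset -> it captures the source box style.
rpn_scale = mean(w / s for s, w in source_data)

# Step 2: apply the source-trained RPN to target images -> source-like proposals.
proposals = [(s * rpn_scale, w_tgt) for s, w_tgt in target_data]

# Step 3: train the RoI head to map source-like proposals to target labels.
roi_scale = mean(w_tgt / p for p, w_tgt in proposals)

# Label transfer: push the original source labels through the RoI head,
# so the source boxes now follow the target annotation style.
transferred = [w * roi_scale for _, w in source_data]
print(transferred)  # [15.0, 30.0, 45.0]
```

The key point the toy preserves is that the RoI head only ever sees (source-like proposal, target label) pairs on target images, which sidesteps the lack of paired annotations across datasets.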
We experiment across four label transfer scenarios and three detector architectures, and compare LGPL against five baseline approaches to the label transfer problem to showcase its effectiveness. Please see the paper for more details.
Here are our findings:
@inproceedings{
liao2024translating,
title={Translating Labels to Solve Annotation Mismatches Across Object Detection Datasets},
author={Yuan-Hong Liao and David Acuna and Rafid Mahmood and James Lucas and Viraj Uday Prabhu and Sanja Fidler},
booktitle={The Twelfth International Conference on Learning Representations},
year={2024},
url={https://openreview.net/forum?id=ChHx5ORqF0}
}