Patent attributes
Among other things, techniques are described for cross-modality active learning for object detection. In an example, a first set of predicted bounding boxes and a second set of predicted bounding boxes are generated. The first and second sets of predicted bounding boxes are projected into the same representation. The projections are filtered, wherein predicted bounding boxes satisfying a maximum confidence score are selected for inconsistency calculations. Inconsistencies are calculated across the projected bounding boxes based on the filtered projections. An informative scene is extracted based on the calculated inconsistencies. A first object detection neural network or a second object detection neural network is trained using the informative scene.
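The abstract does not specify the projection, the confidence criterion, or the inconsistency measure. The Python sketch below illustrates the general pipeline only under labeled assumptions. It assumes both detectors' boxes have already been projected into a shared bird's-eye-view frame as axis-aligned boxes. It reads "satisfying a maximum confidence score" as passing a confidence threshold (`min_score`). It measures a box's inconsistency as 1 minus its best IoU against the other modality's boxes, averaged over a scene. Finally, it treats the highest-scoring scenes as the informative ones. All names (`Box`, `scene_inconsistency`, `select_informative`, `min_score`) are hypothetical and are not the patent's method.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class Box:
    """Axis-aligned box in an ASSUMED shared (e.g. bird's-eye-view) frame."""
    x1: float
    y1: float
    x2: float
    y2: float
    score: float  # detector confidence


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    iy = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = ix * iy
    area_a = (a.x2 - a.x1) * (a.y2 - a.y1)
    area_b = (b.x2 - b.x1) * (b.y2 - b.y1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_confident(boxes: Sequence[Box], min_score: float) -> List[Box]:
    """ASSUMPTION: keep only boxes whose confidence reaches min_score."""
    return [b for b in boxes if b.score >= min_score]


def box_inconsistency(box: Box, others: Sequence[Box]) -> float:
    """ASSUMPTION: 1 - best IoU with the other modality (1.0 if unmatched)."""
    if not others:
        return 1.0
    return 1.0 - max(iou(box, o) for o in others)


def scene_inconsistency(first: Sequence[Box], second: Sequence[Box],
                        min_score: float = 0.5) -> float:
    """Mean per-box inconsistency across both filtered modalities."""
    a = filter_confident(first, min_score)
    b = filter_confident(second, min_score)
    scores = [box_inconsistency(x, b) for x in a]
    scores += [box_inconsistency(y, a) for y in b]
    return sum(scores) / len(scores) if scores else 0.0


def select_informative(scenes: Dict[str, Tuple[List[Box], List[Box]]],
                       k: int) -> List[str]:
    """Return the k scene ids with the highest cross-modality inconsistency."""
    ranked = sorted(scenes,
                    key=lambda s: scene_inconsistency(*scenes[s]),
                    reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    scenes = {
        # Both modalities agree; the low-confidence box is filtered out.
        "agree": ([Box(0, 0, 2, 2, 0.9)],
                  [Box(0, 0, 2, 2, 0.8), Box(5, 5, 6, 6, 0.1)]),
        # The modalities place the object in different locations.
        "disagree": ([Box(0, 0, 2, 2, 0.9)], [Box(10, 10, 12, 12, 0.9)]),
    }
    print(select_informative(scenes, k=1))  # prints ['disagree']
```

Scenes selected this way would then be labeled and added to the training set of either detector. Matching in both directions keeps the score symmetric, so a box detected by only one modality counts as inconsistent regardless of which detector produced it.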