Patent attributes
A system detects multiple instances of an object in a digital image by receiving a two-dimensional (2D) image that includes a plurality of instances of an object in an environment. For example, the system may receive the 2D image from a camera or other sensing modality of an autonomous vehicle (AV). The system uses a first object detection network to generate a plurality of predicted object instances in the image. The system then receives a data set that comprises depth information corresponding to the plurality of instances of the object in the environment. The data set may be received, for example, from a stereo camera of an AV, and the depth information may be in the form of a disparity map. The system may use the depth information to identify an individual instance from the plurality of predicted object instances in the image.
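The following Python sketch shows one way depth information could help tell individual instances apart among overlapping 2D predictions. It is a hypothetical illustration, not the patented method. The swapped-in technique is a depth-aware variant of greedy non-maximum suppression (NMS). The sketch makes several assumptions:

- The first object detection network outputs axis-aligned boxes with confidence scores.
- The disparity map is pixel-aligned with the 2D image.
- Non-positive disparity values mark invalid pixels.
- The names `Detection`, `median_disparity`, `depth_aware_nms`, `iou_thresh` and `disp_thresh` are illustrative.

Two heavily overlapping boxes are merged into one instance only if their median disparities are also close. Otherwise, both are kept as separate instances at different depths.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open pixel bounds


@dataclass
class Detection:
    """Hypothetical 2D prediction from the first object detection network."""
    box: Box
    score: float


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def median_disparity(disparity: Sequence[Sequence[float]], box: Box) -> Optional[float]:
    """Median of valid (> 0) disparity values inside the box, or None if none exist."""
    x0, y0, x1, y1 = box
    values = [disparity[y][x]
              for y in range(y0, y1) for x in range(x0, x1)
              if disparity[y][x] > 0]
    return median(values) if values else None


def depth_aware_nms(detections: List[Detection],
                    disparity: Sequence[Sequence[float]],
                    iou_thresh: float = 0.5,
                    disp_thresh: float = 2.0) -> List[Detection]:
    """Greedy NMS that suppresses an overlapping box only if it is also at a similar depth."""
    kept: List[Tuple[Detection, Optional[float]]] = []
    for det in sorted(detections, key=lambda d: d.score, reverse=True):
        d = median_disparity(disparity, det.box)
        duplicate = False
        for k, dk in kept:
            if iou(det.box, k.box) <= iou_thresh:
                continue  # not overlapping enough to be a duplicate
            # If either box lacks depth, fall back to ordinary NMS behaviour.
            if d is None or dk is None or abs(d - dk) < disp_thresh:
                duplicate = True
                break
        if not duplicate:
            kept.append((det, d))
    return [k for k, _ in kept]


# Synthetic 10x10 disparity map: a near object (disparity 10) in columns 0-3
# and a farther object (disparity 3) in columns 4-9.
disparity = [[10.0 if x < 4 else 3.0 for x in range(10)] for _ in range(10)]
dets = [
    Detection((0, 0, 6, 10), 0.9),  # near object
    Detection((2, 0, 8, 10), 0.8),  # far object, overlapping the near one (IoU 0.5)
    Detection((0, 0, 5, 10), 0.7),  # duplicate prediction of the near object
]
kept = depth_aware_nms(dets, disparity, iou_thresh=0.4)
print([d.score for d in kept])  # [0.9, 0.8]
```

Setting `disp_thresh=float("inf")` reduces the function to ordinary NMS, which suppresses the overlapping far object. The median is used instead of the mean so that a minority of background pixels inside a box does not dominate its depth estimate.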