Patent attributes
The invention relates to a method and system for automatically generating and labelling reference images. In some embodiments, the method includes tracking a plurality of highlighted objects in a set of input images along with audio data associated with the plurality of highlighted objects. The method further includes cropping each of the plurality of highlighted objects from each of the set of input images based on the tracking, contemporaneously capturing an audio clip associated with each of the plurality of highlighted objects from the audio data based on the tracking, and labelling each of the plurality of highlighted objects based on text data generated from the corresponding audio clip to generate a labelled reference image.
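The crop, capture, and label steps can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes tracking has already produced, for each highlighted object, a frame index, a bounding box, and the time interval during which the object was highlighted. The names `Track`, `LabelledReference`, `crop_image`, `slice_audio`, and `label_tracks` are hypothetical, images are modelled as nested lists of pixel values, and the speech-to-text step is passed in as a caller-supplied `transcribe` function because the abstract does not specify one.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Assumption: a grayscale image is modelled as rows of pixel values.
Image = List[List[int]]


@dataclass
class Track:
    """Tracking output for one highlighted object (hypothetical structure)."""
    frame_index: int
    box: Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive
    start_s: float                  # time at which the highlight begins, in seconds
    end_s: float                    # time at which the highlight ends, in seconds


@dataclass
class LabelledReference:
    """A cropped object paired with the label derived from its audio clip."""
    crop: Image
    label: str


def crop_image(image: Image, box: Tuple[int, int, int, int]) -> Image:
    """Return the sub-image covered by the bounding box."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def slice_audio(samples: Sequence[int], sample_rate: int,
                start_s: float, end_s: float) -> List[int]:
    """Return the audio samples that fall inside [start_s, end_s)."""
    start = max(0, int(start_s * sample_rate))
    end = min(len(samples), int(end_s * sample_rate))
    return list(samples[start:end])


def label_tracks(frames: Sequence[Image], samples: Sequence[int], sample_rate: int,
                 tracks: Sequence[Track],
                 transcribe: Callable[[List[int]], str]) -> List[LabelledReference]:
    """Crop each tracked object and label it with text from its audio clip."""
    results = []
    for track in tracks:
        crop = crop_image(frames[track.frame_index], track.box)
        clip = slice_audio(samples, sample_rate, track.start_s, track.end_s)
        # Assumption: the label is the transcribed text, trimmed and lower-cased.
        label = transcribe(clip).strip().lower()
        results.append(LabelledReference(crop=crop, label=label))
    return results
```

In this sketch the tracking result drives both the image crop and the audio window, which mirrors the abstract's requirement that the audio clip be captured contemporaneously with the highlighted object. A real system would substitute an actual object tracker and speech recogniser for the placeholder inputs.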