Systems and methods presented herein are configured to train a neural network model using a first set of photographs, wherein each photograph of the first set of photographs depicts a first set of objects and includes one or more annotations relating to each object of the first set of objects; to automatically create mask images corresponding to a second set of objects depicted by a second set of photographs; to enable manual fine-tuning of the mask images; to re-train the neural network model using the second set of photographs, wherein the re-training is based at least in part on the manual fine-tuning of the mask images; and to identify one or more individual objects in a third set of photographs using the re-trained neural network model.
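By way of non-limiting illustration, the train, auto-mask, manual fine-tune, re-train, and identify sequence described above may be sketched as follows. This is a minimal sketch under stated assumptions and not the disclosed implementation: a single learned intensity threshold (the hypothetical `ThresholdModel`) stands in for the neural network model, photographs are small grayscale grids, and the names `apply_manual_corrections` and `count_objects` are likewise hypothetical helpers introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class ThresholdModel:
    """Toy stand-in for the neural network: one learned intensity threshold."""
    threshold: float = 0.5

    def fit(self, photos, masks):
        # Collect pixel intensities labeled as object (1) or background (0).
        fg, bg = [], []
        for photo, mask in zip(photos, masks):
            for row, mask_row in zip(photo, mask):
                for pixel, label in zip(row, mask_row):
                    (fg if label == 1 else bg).append(pixel)
        # Place the threshold midway between the two class means.
        if fg and bg:
            self.threshold = (sum(fg) / len(fg) + sum(bg) / len(bg)) / 2
        return self

    def predict_mask(self, photo):
        # Automatically create a mask image: 1 = object pixel, 0 = background.
        return [[1 if p > self.threshold else 0 for p in row] for row in photo]


def apply_manual_corrections(mask, corrections):
    """Return a copy of the mask with human edits {(row, col): label} applied."""
    fixed = [row[:] for row in mask]
    for (r, c), label in corrections.items():
        fixed[r][c] = label
    return fixed


def count_objects(mask):
    """Count individual objects as 4-connected regions of object pixels."""
    seen, count = set(), 0
    for r, row in enumerate(mask):
        for c, label in enumerate(row):
            if label != 1 or (r, c) in seen:
                continue
            count += 1
            seen.add((r, c))
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    inside = 0 <= ny < len(mask) and 0 <= nx < len(mask[ny])
                    if inside and mask[ny][nx] == 1 and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        stack.append((ny, nx))
    return count


# 1. Train on the first set of photographs and their annotations.
first_photos = [[[0.9, 0.1], [0.8, 0.2]]]
first_masks = [[[1, 0], [1, 0]]]
model = ThresholdModel().fit(first_photos, first_masks)

# 2. Automatically create mask images for the second set of photographs.
second_photos = [[[0.7, 0.45], [0.1, 0.2]]]
auto_masks = [model.predict_mask(p) for p in second_photos]

# 3. Manual fine-tuning: a reviewer marks pixel (0, 1) as part of the object.
corrections = [{(0, 1): 1}]
tuned_masks = [apply_manual_corrections(m, c)
               for m, c in zip(auto_masks, corrections)]

# 4. Re-train using the second set and the manually fine-tuned masks.
model.fit(second_photos, tuned_masks)

# 5. Identify individual objects in a photograph from the third set.
third_photo = [[0.4, 0.1, 0.8], [0.1, 0.1, 0.9]]
found = count_objects(model.predict_mask(third_photo))
print(found)
```

In this sketch the manual correction lowers the learned threshold on re-training, so that a dimmer object in the third photograph is detected that the initially trained model would have missed. An actual embodiment would replace the threshold with a trained segmentation network.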