Cameras capture time-stamped images of predefined areas. Individuals and items are tracked in the images. Occluded items detected in the images are preprocessed to remove pixels associated with occluded information and the pixels remaining associated with the items are cropped. The preprocessed and cropped images are provided to a trained machine-learning algorithm or a trained neural network trained to classify and identify the items. Output received from the trained neural network provides item identifiers for the items that are present in the original images.