Patent attributes
A multiple-object tracking system includes a convolutional neural network that receives a set of images of a scene that have each been extracted from a frame of a scene. Each of the images corresponds to a detected instance of one of multiple objects that appears in the scene. The convolutional neural network computes, for each image of the set, an appearance embedding vector defining a set of distinguishing characteristics for the image, and a graph network then modifies the appearance embedding vector for each image based on determined relationships between the image and a subset of the images corresponding to detection times temporally separated from a detection time. The modified appearance embedding vectors are then used to identify subsets of the images corresponding to identical targets.