Patent attributes
A method includes the following steps. A video sequence including detection results from one or more detectors is received, the detection results identifying one or more objects. A clustering framework is applied to the detection results to identify one or more clusters associated with the one or more objects. The clustering framework is applied to the video sequence on a frame-by-frame basis. Spatial and temporal information for each of the one or more clusters is determined. The one or more clusters are associated with the detection results based on the spatial and temporal information in consecutive frames of the video sequence to generate tracking information. One or more target tracks are generated based on the tracking information for the one or more clusters. The one or more target tracks are consolidated to generate refined tracks for the one or more objects.
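
The steps recited above can be illustrated with a minimal sketch. It is not the claimed implementation: the abstract does not specify the clustering framework, the association rule, or the consolidation criterion, so the sketch substitutes a greedy distance-based clustering, a nearest-neighbour association between consecutive frames, and a gap-bridging merge of track fragments. Detections are assumed to be 2D points `(x, y)`, and every name and parameter (`Cluster`, `Track`, `cluster_frame`, `build_tracks`, `consolidate`, `radius`, `gate`, `max_gap`) is hypothetical.

```python
from dataclasses import dataclass, field
from math import hypot


@dataclass
class Cluster:
    frame: int   # temporal information: index of the frame
    x: float     # spatial information: cluster centroid
    y: float


@dataclass
class Track:
    clusters: list = field(default_factory=list)


def centroid(points):
    n = len(points)
    return sum(p[0] for p in points) / n, sum(p[1] for p in points) / n


def cluster_frame(frame_idx, detections, radius=10.0):
    """Greedy clustering of one frame's detections (assumed stand-in method)."""
    groups = []
    for x, y in detections:
        for g in groups:
            cx, cy = centroid(g)
            if hypot(x - cx, y - cy) <= radius:
                g.append((x, y))
                break
        else:
            groups.append([(x, y)])
    return [Cluster(frame_idx, *centroid(g)) for g in groups]


def build_tracks(frames, radius=10.0, gate=20.0):
    """Associate clusters across consecutive frames to form target tracks."""
    tracks = []
    for t, detections in enumerate(frames):
        used = set()  # tracks already extended or created in this frame
        for c in cluster_frame(t, detections, radius):
            best, best_d = None, gate
            for i, tr in enumerate(tracks):
                last = tr.clusters[-1]
                if i in used or last.frame != t - 1:
                    continue  # only tracks seen in the previous frame
                d = hypot(c.x - last.x, c.y - last.y)
                if d <= best_d:
                    best, best_d = i, d
            if best is None:
                tracks.append(Track([c]))
                used.add(len(tracks) - 1)
            else:
                tracks[best].clusters.append(c)
                used.add(best)
    return tracks


def consolidate(tracks, max_gap=2, gate=20.0):
    """Merge fragments whose end and start are close in time and space."""
    refined = []
    for tr in sorted(tracks, key=lambda tr: tr.clusters[0].frame):
        for r in refined:
            end, start = r.clusters[-1], tr.clusters[0]
            gap = start.frame - end.frame
            near = hypot(start.x - end.x, start.y - end.y) <= gate
            if 1 <= gap <= max_gap and near:
                r.clusters.extend(tr.clusters)
                break
        else:
            refined.append(Track(list(tr.clusters)))
    return refined


# Four frames; the object near the origin is missed in frame 2.
frames = [
    [(0, 0), (1, 1), (100, 100)],
    [(5, 5), (101, 100)],
    [(102, 100)],
    [(15, 15), (103, 100)],
]
tracks = build_tracks(frames)
refined = consolidate(tracks)
```

In this toy input the missed detection in frame 2 splits one object into two track fragments, and the consolidation step rejoins them. A real system would likely replace each stage with something more robust, such as an optimal assignment instead of the greedy nearest-neighbour match.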