Patent attributes
Technology is disclosed herein for learning motion in video. In an implementation, an artificial neural network extracts features from a video. A correspondence proposal (CP) module performs, for at least some of the features, a search for corresponding features in the video based on a semantic similarity of a given feature to others of the features. The CP module then generates a joint semantic vector for each of the features based at least on the semantic similarity of the given feature to one or more of the corresponding features and a spatiotemporal distance of the given feature to the one or more of the corresponding features. The artificial neural network is able to identify motion in the video using the joint semantic vectors generated for the features extracted from the video.