A method, a computer-readable medium, and an apparatus for zero-exemplar event detection are provided. The apparatus may receive a plurality of text blocks, each of which may describe one of a plurality of pre-defined events. The apparatus may receive a plurality of training videos, each of which may be associated with one of the plurality of text blocks. The apparatus may propagate each text block through a neural network to obtain a textual representation in a joint space of textual and video representations. The apparatus may propagate each training video through the neural network to obtain a visual representation in the joint space. The apparatus may adjust parameters of the neural network to reduce, for each pair of associated text block and training video, the distance in the joint space between the textual representation of the associated text block and the visual representation of the associated training video.
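The abstract describes training only as pulling each associated text block and training video closer together in the joint space. The Python sketch below illustrates that objective. It rests on several assumptions that are illustrative and not taken from the source:

- Text blocks and videos arrive as small, hypothetical, pre-extracted feature vectors.
- The "neural network" is reduced to one linear projection per modality.
- The distance is squared Euclidean, minimized by per-pair gradient descent.

The function names (`train`, `score`, `mean_pair_distance`) are hypothetical. `score` shows one possible way to rank a video against the text of an event that has no training videos.

```python
import random

# ASSUMPTION (illustrative only): each text block is already encoded as a
# fixed-length feature vector and each training video as a pre-extracted visual
# feature vector. The "neural network" is reduced to one linear layer per
# modality that projects into a shared joint space.

def matvec(W, x):
    """Multiply a matrix stored as a list of rows by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def sq_dist(a, b):
    """Squared Euclidean distance between two joint-space vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def init_matrix(rows, cols, rng):
    """Random initial weights for one projection."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def mean_pair_distance(W_t, W_v, pairs):
    """Average squared distance between each textual and paired visual representation."""
    return sum(sq_dist(matvec(W_t, x), matvec(W_v, v)) for x, v in pairs) / len(pairs)

def train(pairs, joint_dim=2, epochs=200, lr=0.05, seed=0):
    """Per-pair gradient descent on ||W_t x - W_v v||^2 (hypothetical helper)."""
    rng = random.Random(seed)
    text_dim, video_dim = len(pairs[0][0]), len(pairs[0][1])
    W_t = init_matrix(joint_dim, text_dim, rng)   # text -> joint space
    W_v = init_matrix(joint_dim, video_dim, rng)  # video -> joint space
    history = [mean_pair_distance(W_t, W_v, pairs)]
    for _ in range(epochs):
        for x, v in pairs:
            # Residual r = textual representation - visual representation.
            r = [a - b for a, b in zip(matvec(W_t, x), matvec(W_v, v))]
            # dL/dW_t = 2 r x^T and dL/dW_v = -2 r v^T; step against each gradient.
            for i, ri in enumerate(r):
                for j, xj in enumerate(x):
                    W_t[i][j] -= lr * 2 * ri * xj
                for j, vj in enumerate(v):
                    W_v[i][j] += lr * 2 * ri * vj
        history.append(mean_pair_distance(W_t, W_v, pairs))
    return W_t, W_v, history

def score(W_t, W_v, event_text, video):
    """Hypothetical zero-exemplar score: higher means the video is closer to the event text."""
    return -sq_dist(matvec(W_t, event_text), matvec(W_v, video))

# Toy data (made up): one (text features, video features) pair per pre-defined event.
pairs = [
    ([1.0, 0.0, 1.0, 0.0], [0.9, 0.1, 0.0]),
    ([0.0, 1.0, 0.0, 1.0], [0.1, 0.8, 0.2]),
    ([1.0, 1.0, 0.0, 0.0], [0.5, 0.5, 0.9]),
]
W_t, W_v, history = train(pairs)
print(f"mean paired distance: {history[0]:.4f} -> {history[-1]:.4f}")
```

Because this objective only pulls associated pairs together, it is also minimized by mapping every input to the same point (for example, both weight matrices at zero). A common remedy in joint-embedding training is to add a term that pushes non-associated pairs apart, such as a margin-based ranking loss. That term is an addition for illustration and is not stated in the abstract.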