US Patent 11842541 Multi-resolution attention network for video action recognition

This invention classifies an action that appears in a video clip. The method receives a video clip for analysis and applies a convolutional neural network (CNN) mechanism to the frames in the clip to generate a 4D embedding tensor for each frame. It applies a multi-resolution CNN mechanism to each frame to generate a sequence of reduced-resolution blocks and, for each block, computes a kinematic attention weight that estimates the amount of motion in the block. At each resolution, the attention weights are applied to the embedding tensors of the frames to generate a weighted embedding tensor, or context, that represents all the frames in the clip at that resolution. The contexts are combined across all resolutions to generate a multi-resolution context, 3D pooling is performed to obtain a 1D feature vector, and a primary action of the video clip is classified based on the feature vector.
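The data flow in the abstract can be illustrated with a minimal NumPy sketch. It is not the patented implementation, and every specific in it is an assumption made for illustration: the CNN embeddings are taken as a precomputed array of shape (T, C, H, W), average pooling stands in for the multi-resolution CNN, the kinematic attention weight is approximated by a softmax over mean absolute frame-to-frame differences, and a plain linear layer does the classification. The function names `downsample`, `kinematic_weights`, `multi_resolution_feature`, and `classify` are hypothetical.

```python
import numpy as np

def downsample(frames, factor):
    """Average-pool grayscale frames of shape (T, H, W) by an integer factor.
    Assumption: a stand-in for the reduced-resolution blocks of the abstract."""
    t, h, w = frames.shape
    h2, w2 = h // factor, w // factor
    cropped = frames[:, :h2 * factor, :w2 * factor]
    return cropped.reshape(t, h2, factor, w2, factor).mean(axis=(2, 4))

def kinematic_weights(frames):
    """One weight per frame (requires T >= 2).
    Assumption: motion is the mean absolute change from the previous frame,
    normalized over the clip with a softmax."""
    diffs = np.abs(np.diff(frames, axis=0)).mean(axis=(1, 2))  # shape (T-1,)
    motion = np.concatenate([[diffs[0]], diffs])  # first frame reuses the first diff
    exp = np.exp(motion - motion.max())           # numerically stable softmax
    return exp / exp.sum()

def multi_resolution_feature(frames, embeddings, factors=(1, 2, 4)):
    """Weight the per-frame embeddings at each resolution, then pool.
    embeddings has shape (T, C, H, W); the result has shape (C,)."""
    contexts = []
    for factor in factors:
        weights = kinematic_weights(downsample(frames, factor))  # shape (T,)
        # Weighted sum over the frame axis gives one context of shape (C, H, W).
        contexts.append(np.tensordot(weights, embeddings, axes=(0, 0)))
    stacked = np.stack(contexts)            # multi-resolution context, (R, C, H, W)
    return stacked.mean(axis=(0, 2, 3))     # pool over resolution and space

def classify(feature, class_weights):
    """Hypothetical linear classifier: class_weights has shape (K, C)."""
    return int(np.argmax(class_weights @ feature))
```

In this sketch the same embeddings are reused at every resolution and only the attention weights change, which is one reading of the abstract. Frames with more apparent motion contribute more to each context, and a static clip reduces to a plain average of the frame embeddings.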
