Patent attributes
Implementations described herein disclose a multi-modality video recognition system. Specifically, the multi-modality video recognition system is configured to train a plurality of different classifier networks, each of the classifier networks being trained with a different one of a plurality of video streams, wherein each of the plurality of different classifier networks includes multiple intermediate layers; determine correlation matrices of related intermediate layers of each of the plurality of different classifier networks; and align the correlation matrices of the related intermediate layers of each of the plurality of different classifier networks.
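
The determine-and-align steps can be illustrated with a minimal sketch in Python with NumPy. The abstract does not state how the correlation matrices are computed or how they are aligned, so everything below is an assumption made for illustration: each correlation matrix is taken to be the pairwise cosine similarity between the samples of a batch at one intermediate layer, and alignment is taken to mean penalizing the mean squared difference between the matrices of two streams. The function names `correlation_matrix` and `alignment_loss`, and the "RGB" and "optical flow" stream labels, are hypothetical.

```python
import numpy as np

def correlation_matrix(features):
    """Pairwise cosine similarity between the samples of one batch.

    features: array of shape (n_samples, n_features), assumed here to be the
    activations of one intermediate layer of one classifier network.
    Returns an (n_samples, n_samples) matrix.
    """
    features = np.asarray(features, dtype=float)
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)  # guard against zero rows
    return unit @ unit.T

def alignment_loss(features_a, features_b):
    """Mean squared difference between two streams' correlation matrices.

    This is an assumed alignment criterion, not one taken from the source.
    Both inputs must describe the same n_samples clips, in the same order.
    """
    diff = correlation_matrix(features_a) - correlation_matrix(features_b)
    return float(np.mean(diff ** 2))

# Hypothetical activations for the same 8 clips from two modality streams.
rng = np.random.default_rng(0)
rgb_feats = rng.normal(size=(8, 64))    # e.g. a layer of an RGB-stream network
flow_feats = rng.normal(size=(8, 32))   # e.g. a related layer of an optical-flow network
loss = alignment_loss(rgb_feats, flow_feats)
```

Because each matrix in this sketch is sample-by-sample rather than feature-by-feature, the related layers of the two networks may have different widths (64 and 32 above). In training, a term like `loss` could be added to each network's classification loss so that the streams learn similar relationships among the same clips.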