Patent attributes
Embodiments relate to a system, program product, and method for employing deep learning techniques to fuse data across modalities. A multi-modal data set is received, including a first data set having a first modality and a second data set having a second modality, with the second modality being different from the first modality. The first and second data sets are processed, including encoding the first data set into one or more first vectors, and encoding the second data set into one or more second vectors. The processed multi-modal data set is analyzed, and the encoded features from the first and second modalities are iteratively and asynchronously fused. The fused modalities include combined vectors from the first and second data sets representing correlated temporal behavior. The fused vectors are then returned as output data.
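The flow described above (encode each modality into vectors, then fuse them step by step into combined vectors) can be illustrated with a minimal sketch. It is not the claimed deep learning method. The hand-written encoders stand in for learned encoders, "asynchronous" is assumed here to mean that the two modalities arrive at different timestamps, and the exponential-decay update is an assumed stand-in for the iterative fusion step. All names (`encode_numeric`, `encode_tokens`, `latest_before`, `fuse`, `decay`) are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

Vector = List[float]
Timed = List[Tuple[float, Vector]]  # (timestamp, encoded vector), sorted by time


def encode_numeric(series: Sequence[Tuple[float, float]]) -> Timed:
    """Encode numeric readings as [value, change since previous reading]."""
    out: Timed = []
    prev = None
    for t, value in series:
        delta = 0.0 if prev is None else value - prev
        out.append((t, [value, delta]))
        prev = value
    return out


def encode_tokens(events: Sequence[Tuple[float, str]], vocab: Sequence[str]) -> Timed:
    """Encode categorical events as one-hot vectors over a fixed vocabulary."""
    return [(t, [1.0 if tok == w else 0.0 for w in vocab]) for t, tok in events]


def latest_before(encoded: Timed, t: float, dim: int) -> Vector:
    """Most recent vector at or before time t, or zeros if none has arrived yet."""
    times = [ts for ts, _ in encoded]
    i = bisect_right(times, t)
    return encoded[i - 1][1] if i > 0 else [0.0] * dim


def fuse(first: Timed, second: Timed, dim_first: int, dim_second: int,
         decay: float = 0.5) -> Timed:
    """Walk the merged timeline and update a running fused state at each arrival.

    Assumption: fusion is a decayed blend of the previous fused state with the
    concatenation of the latest vector seen from each modality.
    """
    timeline = sorted({t for t, _ in first} | {t for t, _ in second})
    state = [0.0] * (dim_first + dim_second)
    fused: Timed = []
    for t in timeline:
        combined = (latest_before(first, t, dim_first)
                    + latest_before(second, t, dim_second))
        state = [decay * s + (1.0 - decay) * c for s, c in zip(state, combined)]
        fused.append((t, list(state)))
    return fused


# Usage: a numeric modality sampled at t=0 and t=2, an event modality at t=1.
vocab = ["open", "close"]
first_vectors = encode_numeric([(0.0, 1.0), (2.0, 3.0)])
second_vectors = encode_tokens([(1.0, "open")], vocab)
fused_vectors = fuse(first_vectors, second_vectors, dim_first=2, dim_second=len(vocab))
for timestamp, vector in fused_vectors:
    print(timestamp, vector)
```

Each fused vector carries components from both modalities, so behavior that co-occurs in time shows up within a single vector. In a learned system the concatenate-and-decay step would typically be replaced by a trained fusion layer.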