Patent attributes
Systems and techniques for scene classification and prediction are provided herein. A first series of image frames of an environment may be captured from a moving vehicle. Traffic participants within the environment may be identified and masked based on a first convolutional neural network (CNN). Temporal classification may be performed, using a scene classification model based on CNNs and a long short-term memory (LSTM) network, to generate a series of image frames associated with temporal predictions. Additionally, scene classification may occur based on global average pooling. Feature vectors may be generated based on different series of image frames, and a fusion feature vector may be obtained by performing data fusion based on a first feature vector, a second feature vector, a third feature vector, etc. In this way, a behavior predictor may generate a predicted driver behavior based on the fusion feature.
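The global average pooling and data fusion steps may be illustrated with a minimal sketch. The names `global_average_pool` and `fuse_by_concatenation`, the toy feature maps, and the choice of concatenation as the fusion operation are assumptions made for illustration only; the abstract states only that data fusion is performed and does not specify how.

```python
from typing import List, Sequence

# A feature map is indexed as channels x height x width (illustrative layout).
FeatureMap = Sequence[Sequence[Sequence[float]]]


def global_average_pool(feature_map: FeatureMap) -> List[float]:
    """Collapse each channel of a feature map to its mean activation."""
    pooled = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def fuse_by_concatenation(*vectors: Sequence[float]) -> List[float]:
    """Fuse feature vectors by concatenation.

    Assumption: concatenation stands in for the unspecified data fusion step.
    """
    fused: List[float] = []
    for vec in vectors:
        fused.extend(vec)
    return fused


# Toy feature maps standing in for CNN outputs on two series of image frames.
first_map = [[[1.0, 3.0], [5.0, 7.0]], [[0.0, 0.0], [2.0, 2.0]]]
second_map = [[[4.0, 4.0], [4.0, 4.0]]]

first_vec = global_average_pool(first_map)    # one value per channel
second_vec = global_average_pool(second_map)
fusion_feature = fuse_by_concatenation(first_vec, second_vec)
print(fusion_feature)
```

In this sketch each feature vector has one entry per channel, so the fusion feature has as many entries as the inputs combined; a behavior predictor would take that vector as its input.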