Methods, apparatus, and system to summarize audio-visual media with a neural network machine learning architecture.