Patent 10535371 was granted and assigned to Intel on January, 2020 by the United States Patent and Trademark Office.
Techniques are provided for video summarization, based on speaker segmentation and clustering, to identify persons and scenes of interest. A methodology implementing the techniques according to an embodiment includes extracting audio content from a video stream and detecting one or more segments of the audio content that include the voice of a single speaker. The method also includes grouping the one or more detected segments into an audio cluster associated with the single speaker and providing a portion of the audio cluster to a user. The method further includes receiving an indication from the user that the single speaker is a person of interest. Segments of interest are then extracted from the video stream, where each segment of interest is associated with a scene that includes the person of interest. The extracted segments of interest are then combined into a summarization video.