Patent attributes
A system and computer-based method for transcribing source media and extracting metadata from it. A processor-based server extracts the audio and video streams from the source media. A speech recognition engine processes the audio and/or video stream to produce a time-aligned textual transcription and to extract audio amplitude by time interval, thereby providing time-aligned, machine-transcribed media. The server processor measures the extracted audio amplitude and assigns it a numerical value on a single, universal normalized amplitude scale. A database stores the time-aligned machine-transcribed media, the time-aligned video frames, and the assigned values from the normalized amplitude scale.
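The amplitude step described above can be illustrated with a minimal sketch. The abstract does not say how amplitude is measured or what the universal scale looks like, so everything specific here is an assumption made for illustration: root-mean-square (RMS) level as the amplitude measure, samples already decoded to floats in the range -1.0 to 1.0, a fixed interval length, and an integer scale from 0 to 100. The function names `rms` and `amplitude_by_interval` are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of float samples (assumed measure)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def amplitude_by_interval(samples, sample_rate, interval_s=1.0, scale_max=100):
    """Return (start_time_in_seconds, normalized_value) for each time interval.

    Assumptions, not taken from the abstract:
      - samples are floats in [-1.0, 1.0], so 1.0 is the full-scale reference;
      - the normalized scale is the integer range 0..scale_max.
    """
    step = int(sample_rate * interval_s)
    if step <= 0:
        raise ValueError("interval must span at least one sample")

    values = []
    for start in range(0, len(samples), step):
        window = samples[start:start + step]
        # Clamp to full scale so every source maps onto the same range.
        level = min(rms(window), 1.0)
        values.append((start / sample_rate, round(level * scale_max)))
    return values


# Three one-second intervals at a toy rate of 4 samples per second.
audio = [0.5] * 4 + [-1.0] * 4 + [0.0] * 4
print(amplitude_by_interval(audio, sample_rate=4))
# prints [(0.0, 50), (1.0, 100), (2.0, 0)]
```

Because each value is keyed by the start time of its interval, it could be stored alongside transcript segments and video frames that share the same time base, which is the kind of time-aligned record the abstract describes the database as holding.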