US Patent 9548048 On-the-fly speech learning and computer model generation using audio-visual synchronization

A speech recognition computer system is trained on both video and audio input of known speech so that it can later recognize unknown speech. The speaker can be filmed by multiple cameras from multiple angles, and the audio can be captured by multiple microphones. The video and audio are sampled so that the timing of events in each stream can be determined from the content itself, independent of the clock of any audio or video capture device. Video features, such as the speaker's moving body parts, can be extracted from the video and randomly sampled for use in the speech modeling process. Audio is modeled at the phoneme level, which allows mapping to words with little additional effort. The trained system can then recognize speech text from the video and audio of unknown speech.
