US Patent 11978435 Long-context end-to-end speech recognition system

This invention relates generally to speech processing and more particularly to end-to-end automatic speech recognition (ASR) that utilizes long contextual information. Some embodiments of the invention provide a system and a method for end-to-end ASR suitable for recognizing long audio recordings such as lecture and conversational speeches. This disclosure includes a Transformer-based ASR system that utilizes contextual information, wherein the Transformer accepts multiple utterances at the same time and predicts transcript for the last utterance. This is repeated in a sliding-window fashion with one-utterance shifts to recognize the entire recording. In addition, some embodiments of the present invention may use acoustic and/or text features obtained from only the previous utterances spoken by the same speaker as the last utterance when the long audio recording includes multiple speakers.

Timeline

No Timeline data yet.

Further Resources

Title

Author

Link

Type

Date

No Further Resources data yet.

US Patent 11978435 Long-context end-to-end speech recognition system

Contents

Patent attributes

Timeline

Further Resources

References

Find more entities like US Patent 11978435 Long-context end-to-end speech recognition system