US Patent 11295721 Generating expressive speech audio from text data

A system for use in video game development to generate expressive speech audio comprises a user interface configured to receive user-input text data and a user selection of a speech style. The system includes a machine-learned synthesizer comprising a text encoder, a speech style encoder and a decoder. The machine-learned synthesizer is configured to generate one or more text encodings derived from the user-input text data, using the text encoder of the machine-learned synthesizer; generate a speech style encoding by processing a set of speech style features associated with the selected speech style using the speech style encoder of the machine-learned synthesizer; combine the one or more text encodings and the speech style encoding to generate one or more combined encodings; and decode the one or more combined encodings with the decoder of the machine-learned synthesizer to generate predicted acoustic features. The system includes one or more modules configured to process the predicted acoustic features, the one or more modules comprising a machine-learned vocoder configured to generate a waveform of the expressive speech audio.

Timeline

No Timeline data yet.

Further Resources

Title

Author

Link

Type

Date

No Further Resources data yet.

US Patent 11295721 Generating expressive speech audio from text data

Contents

Patent attributes

Timeline

Further Resources

References

Find more entities like US Patent 11295721 Generating expressive speech audio from text data