Patent attributes
An audio processing system for generating audio that includes emotionally expressive synthesized content includes a computing platform having a hardware processor and a memory storing software code that includes a trained neural network. The hardware processor is configured to execute the software code to receive an audio sequence template including one or more audio segments and an audio gap, and to receive data describing one or more words for insertion into the audio gap. The hardware processor is further configured to execute the software code to use the trained neural network to generate an integrated audio sequence from the audio sequence template and the data, the integrated audio sequence including the one or more audio segments and at least one synthesized word corresponding to the one or more words described by the data.
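
For illustration only, the data flow described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the claimed implementation: the names AudioSequenceTemplate, gap_index, placeholder_synthesizer, and generate_integrated_sequence are hypothetical, audio is represented as plain lists of float samples, and the trained neural network is replaced by a stand-in function that emits silence.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Samples = List[float]  # assumption: audio is modeled as a flat list of samples


@dataclass
class AudioSequenceTemplate:
    """Hypothetical container: one or more audio segments plus one gap.

    gap_index is the number of segments that precede the gap, so
    gap_index == 0 places the gap before the first segment.
    """
    segments: List[Samples]
    gap_index: int


# Hypothetical interface for the trained model: it sees the template
# (the surrounding audio) and the words, and returns audio for the gap.
GapSynthesizer = Callable[[AudioSequenceTemplate, Sequence[str]], Samples]


def placeholder_synthesizer(template: AudioSequenceTemplate,
                            words: Sequence[str]) -> Samples:
    """Stand-in for the trained neural network: 4 silent samples per word."""
    return [0.0] * (4 * len(words))


def generate_integrated_sequence(template: AudioSequenceTemplate,
                                 words: Sequence[str],
                                 synthesize: GapSynthesizer) -> Samples:
    """Fill the template's gap with synthesized audio and join everything."""
    if not template.segments:
        raise ValueError("template needs at least one audio segment")
    if not 0 <= template.gap_index <= len(template.segments):
        raise ValueError("gap_index is outside the template")
    if not words:
        raise ValueError("at least one word is required to fill the gap")

    gap_audio = synthesize(template, words)

    integrated: Samples = []
    for segment in template.segments[:template.gap_index]:
        integrated.extend(segment)   # original audio before the gap
    integrated.extend(gap_audio)     # synthesized word(s)
    for segment in template.segments[template.gap_index:]:
        integrated.extend(segment)   # original audio after the gap
    return integrated


template = AudioSequenceTemplate(segments=[[0.1, 0.2], [0.3]], gap_index=1)
integrated = generate_integrated_sequence(
    template, ["hello", "world"], placeholder_synthesizer
)
```

In this sketch the synthesizer receives the whole template rather than only the words, which reflects the abstract's statement that the network uses both the template and the data. How the surrounding audio actually conditions the synthesized words is not specified here and is left to the stand-in function.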