Patent attributes
In one general aspect, a computer-implemented method for text generation based on an audio speech signal can include: receiving the audio speech signal; extracting acoustic feature values of the speech signal at a predefined sampling frequency; mapping written words of a transcription of the audio speech signal to units of corresponding pronunciation objects; segmenting the audio speech signal, including mapping the units of the corresponding pronunciation objects to the received audio speech signal to determine a beginning time and an end time of the mapped units; aligning one or more units of the corresponding pronunciation objects to one or more graphemes based on a unit-grapheme mapping; determining a speed parameter for each aligned grapheme; determining acoustic parameters for each aligned grapheme; and generating, for each character of the aligned graphemes, a character shape representative of the speed parameter and the acoustic parameters associated with the respective grapheme.
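The last steps of the method (per-grapheme speed parameter, per-grapheme acoustic parameter, and per-character shape) can be illustrated with a minimal sketch. It is not the claimed implementation. It assumes that segmentation and unit-grapheme alignment have already produced units with begin and end times, that a single acoustic feature (here called energy) has been extracted at a fixed frame rate, and that the speed parameter is the time spent per character relative to a reference duration. The names `Unit`, `CharShape`, `features_in`, `shapes_for`, and `ref_char_duration` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Unit:
    symbol: str    # pronunciation unit, e.g. a phoneme label
    begin: float   # begin time in seconds (from segmentation)
    end: float     # end time in seconds (from segmentation)
    grapheme: str  # grapheme aligned to this unit (assumed already known)


@dataclass
class CharShape:
    char: str
    width: float   # horizontal stretch, derived from the speed parameter
    weight: float  # stroke weight, derived from the acoustic parameter


def features_in(frames, rate_hz, begin, end):
    """Return the feature frames whose index falls in [begin, end) seconds."""
    lo = int(round(begin * rate_hz))
    hi = int(round(end * rate_hz))
    return frames[lo:hi]


def shapes_for(units, energy_frames, rate_hz, ref_char_duration=0.08):
    """Build one CharShape per character of every aligned grapheme.

    Assumption: speed = (unit duration / characters in grapheme) divided by
    ref_char_duration, so 1.0 means "reference pace" and larger means slower.
    """
    shapes = []
    for u in units:
        per_char = (u.end - u.begin) / len(u.grapheme)
        speed = per_char / ref_char_duration
        window = features_in(energy_frames, rate_hz, u.begin, u.end)
        energy = mean(window) if window else 0.0
        for ch in u.grapheme:
            shapes.append(CharShape(ch, width=speed, weight=energy))
    return shapes


# Illustrative input: two units of the word "she" with made-up timings,
# and energy values sampled at an assumed 100 frames per second.
units = [Unit("SH", 0.0, 0.16, "sh"), Unit("IY", 0.16, 0.24, "e")]
energy_frames = [0.5] * 16 + [1.0] * 8
shapes = shapes_for(units, energy_frames, rate_hz=100)
```

In this sketch every character of a multi-character grapheme shares the same speed and acoustic values, which matches the idea of parameters being determined per grapheme and then applied per character. How the parameters are actually turned into a glyph (stretch, weight, slant, or something else) is left open here.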