Patent attributes
A method includes receiving an overlapped audio signal that includes audio spoken by a speaker that overlaps a segment of synthesized playback audio. The method also includes encoding a sequence of characters that correspond to the synthesized playback audio into a text embedding representation. For each character in the sequence of characters, the method also includes generating a respective cancelation probability using the text embedding representation. The cancelation probability indicates a likelihood that the corresponding character is associated with the segment of the synthesized playback audio overlapped by the audio spoken by the speaker in the overlapped audio signal.
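The two steps described above, encoding the characters and then producing one cancelation probability per character, can be illustrated with a minimal sketch. It is not the claimed implementation: the names `encode_text`, `cancelation_probabilities`, `EMBED_DIM`, and `audio_summary` are hypothetical. The pseudo-random character vectors stand in for a learned text encoder, and the dot product followed by a sigmoid stands in for whatever scoring network the method actually uses. The abstract does not say how the overlapped audio signal is represented, so a fixed summary vector is assumed here.

```python
import math
import random

EMBED_DIM = 8  # hypothetical embedding width, chosen only for illustration


def encode_text(chars, dim=EMBED_DIM, seed=0):
    """Encode a character sequence into a text embedding representation.

    Assumption: each distinct character maps to a fixed pseudo-random
    vector, standing in for a learned text encoder.
    """
    table = {}
    embedding = []
    for ch in chars:
        if ch not in table:
            rng = random.Random(seed * 1_000_003 + ord(ch))
            table[ch] = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        embedding.append(table[ch])
    return embedding


def cancelation_probabilities(text_embedding, audio_summary):
    """Return one cancelation probability per character.

    Assumption: the score is a dot product between the character vector
    and a summary vector of the overlapped audio, squashed by a sigmoid.
    """
    probs = []
    for vec in text_embedding:
        score = sum(v * a for v, a in zip(vec, audio_summary))
        probs.append(1.0 / (1.0 + math.exp(-score)))
    return probs


# Characters corresponding to the synthesized playback audio (example text).
playback_text = "turn left"
# Hypothetical fixed-size summary of the overlapped audio signal.
audio_summary = [0.5] * EMBED_DIM

embedding = encode_text(playback_text)
probs = cancelation_probabilities(embedding, audio_summary)

for ch, p in zip(playback_text, probs):
    print(repr(ch), round(p, 3))
```

In this sketch each probability depends only on the character's own vector, so repeated characters receive identical values. A real system would presumably use context-dependent embeddings so that the probability reflects the character's position relative to the overlapped segment.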