Patent attributes
A method (and structure and computer product) to permit zero-shot voice conversion with non-parallel data includes receiving source speaker speech data as input data into a content encoder of a style transfer autoencoder system. The content encoder provides a source speaker disentanglement of the source speaker speech data by reducing speaker style information of the input source speech data while retaining content information. The method also includes receiving target speaker input speech as input data into a target speaker encoder. The outputs of the content encoder and the target speaker encoder are combined in a decoder of the style transfer autoencoder, and the output of the decoder provides the content information of the input source speech data in the style of the target speaker speech information.
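The data flow described above can be illustrated with a minimal sketch. It is not the claimed implementation: the feature size, the layer sizes, the random untrained weights, and all function names (`content_encoder`, `speaker_encoder`, `decoder`, `convert`) are assumptions made for illustration. In particular, the narrow content bottleneck used here is only one assumed way to reduce speaker style information; the abstract does not specify how the disentanglement is achieved.

```python
import numpy as np

rng = np.random.default_rng(0)

# All sizes below are assumptions for illustration only.
N_MELS = 80        # per-frame speech feature size (e.g. mel-spectrogram bins)
CONTENT_DIM = 8    # narrow bottleneck for the content code
SPEAKER_DIM = 16   # size of the target speaker embedding

# Random, untrained weights: this sketch shows data flow, not converted audio.
W_content = rng.standard_normal((N_MELS, CONTENT_DIM)) * 0.1
W_speaker = rng.standard_normal((N_MELS, SPEAKER_DIM)) * 0.1
W_decoder = rng.standard_normal((CONTENT_DIM + SPEAKER_DIM, N_MELS)) * 0.1


def content_encoder(source_frames):
    """Map (T, N_MELS) source frames to (T, CONTENT_DIM) content codes."""
    return np.tanh(source_frames @ W_content)


def speaker_encoder(target_frames):
    """Map (T', N_MELS) target frames to one unit-norm (SPEAKER_DIM,) embedding."""
    pooled = np.tanh(target_frames @ W_speaker).mean(axis=0)
    return pooled / np.linalg.norm(pooled)


def decoder(content_codes, speaker_embedding):
    """Combine both encoder outputs and project back to (T, N_MELS) frames."""
    # Repeat the single speaker embedding for every content frame.
    tiled = np.tile(speaker_embedding, (content_codes.shape[0], 1))
    combined = np.concatenate([content_codes, tiled], axis=1)
    return combined @ W_decoder


def convert(source_frames, target_frames):
    """Source content rendered with the target speaker embedding."""
    return decoder(content_encoder(source_frames), speaker_encoder(target_frames))


# Hypothetical inputs: the two utterances are non-parallel (different lengths).
source = rng.standard_normal((120, N_MELS))
target = rng.standard_normal((75, N_MELS))
converted = convert(source, target)
print(converted.shape)  # (120, 80)
```

The output keeps the source utterance's frame count, because the time axis comes from the content codes, while the target utterance contributes only a fixed-size embedding. In this sketch nothing in the decoder depends on the target speaker's identity being known in advance, which is the sense in which such a layout could support zero-shot conversion.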