Implementations of the present disclosure provide a solution for cross-domain image translation. In this solution, a first learning network for geometric deformation from a first image domain to a second image domain is determined based on a first image in the first image domain and a second image in the second image domain, where images in the two domains have different styles and objects in those images exhibit geometric deformation with respect to each other. To generate an intermediate image, geometric deformation from the second image domain to the first image domain is performed on the second image, or geometric deformation from the first image domain to the second image domain is performed on the first image. A second learning network for style transfer from the first image domain to the second image domain is then determined based on the first image and the intermediate image, or based on the second image and the intermediate image. Accordingly, the processing accuracy of the learning networks for cross-domain image translation can be improved and their complexity can be lowered.
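
The data flow described above can be illustrated with a minimal Python sketch. It is not the disclosed implementation: `ToyImage`, `deform_1to2`, `deform_2to1`, `style_training_pair`, and the branch names are hypothetical stand-ins, and each "image" is reduced to two labels (style and geometry) so that the role of the intermediate image is visible. The sketch only shows which image pair would be used to determine the second (style) learning network in each of the two alternatives.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class ToyImage:
    """Hypothetical stand-in for an image: only two labels are tracked."""
    style: str      # domain whose appearance the image has
    geometry: str   # domain whose object shape the image has


def deform_1to2(img: ToyImage) -> ToyImage:
    # Stand-in for the first learning network (assumed already determined):
    # changes geometry to that of the second domain, leaves style untouched.
    return replace(img, geometry="second")


def deform_2to1(img: ToyImage) -> ToyImage:
    # Stand-in for the reverse geometric deformation.
    return replace(img, geometry="first")


def style_training_pair(first: ToyImage, second: ToyImage,
                        branch: str) -> Tuple[ToyImage, ToyImage]:
    """Return the (source, target) pair for determining the style network."""
    if branch == "warp_second":
        # Deform the second image toward the first domain's geometry;
        # pair the result with the first image.
        intermediate = deform_2to1(second)
        return first, intermediate
    if branch == "warp_first":
        # Deform the first image toward the second domain's geometry;
        # pair the result with the second image.
        intermediate = deform_1to2(first)
        return intermediate, second
    raise ValueError(f"unknown branch: {branch}")


first_image = ToyImage(style="first", geometry="first")
second_image = ToyImage(style="second", geometry="second")

for branch in ("warp_second", "warp_first"):
    source, target = style_training_pair(first_image, second_image, branch)
    print(branch, source, target)
```

In both branches of this sketch the two images in the returned pair share the same geometry and differ only in style, which is one way to read why the style network can be determined separately from the geometric deformation.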