Techniques are provided for converting a self-portrait image into a neutral-pose portrait image. A self-portrait input image is received that contains at least one person who is the subject of the self-portrait. A nearest pose search selects a target neutral-pose image whose pose closely matches or approximates the pose of the upper torso region of the subject in the self-portrait input image. Coordinate-based inpainting maps pixels from the upper torso region of the self-portrait input image to corresponding regions of the selected target neutral-pose image to produce a coarse result image. A neutral-pose composition step then refines the coarse result image by synthesizing details in the body region of the subject (which in some cases includes the subject's head, arms, and torso) and by inpainting pixels into missing portions of the background. The refined image is composited with the original self-portrait input image to produce a neutral-pose result image.
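The first two stages can be illustrated with a minimal sketch, assuming a few things the abstract does not specify. Poses are represented as 2D upper-torso keypoints. The keypoint set `UPPER_TORSO_KEYPOINTS`, the neck-centered and shoulder-width-normalized coordinates, and the summed Euclidean distance are all illustrative choices. The per-pixel coordinate map is supplied as input rather than predicted. Images are plain nested lists of RGB tuples. All function names are hypothetical. This is not the patented method, only one simple instance of a nearest-neighbor pose lookup followed by a coordinate-based pixel gather.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Pixel = Tuple[int, int, int]
# Hypothetical per-target-pixel (row, col) source coordinates; None = no source pixel.
CoordMap = List[List[Optional[Tuple[int, int]]]]

# Hypothetical upper-torso keypoint set (assumption; not given in the abstract).
UPPER_TORSO_KEYPOINTS = ("neck", "l_shoulder", "r_shoulder", "l_elbow", "r_elbow")


def normalize_pose(pose: Dict[str, Point]) -> List[Point]:
    """Translate keypoints so the neck is the origin, then divide by shoulder width."""
    nx, ny = pose["neck"]
    (lx, ly), (rx, ry) = pose["l_shoulder"], pose["r_shoulder"]
    scale = math.hypot(lx - rx, ly - ry) or 1.0  # guard against a degenerate pose
    return [((pose[k][0] - nx) / scale, (pose[k][1] - ny) / scale)
            for k in UPPER_TORSO_KEYPOINTS]


def pose_distance(a: Dict[str, Point], b: Dict[str, Point]) -> float:
    """Sum of Euclidean distances between corresponding normalized keypoints."""
    return sum(math.hypot(ax - bx, ay - by)
               for (ax, ay), (bx, by) in zip(normalize_pose(a), normalize_pose(b)))


def nearest_pose(query: Dict[str, Point],
                 library: Dict[str, Dict[str, Point]]) -> str:
    """Return the id of the neutral-pose library image whose pose is closest to query."""
    return min(library, key=lambda image_id: pose_distance(query, library[image_id]))


def coordinate_inpaint(source: List[List[Pixel]], coord_map: CoordMap
                       ) -> Tuple[List[List[Optional[Pixel]]], List[List[bool]]]:
    """Build a coarse image by copying source[row][col] for each target pixel.

    Target pixels whose coordinate is None or falls outside the source are left
    as None and flagged True in the returned mask; a later refinement step would
    have to synthesize or inpaint them.
    """
    h, w = len(source), len(source[0])
    coarse: List[List[Optional[Pixel]]] = []
    missing: List[List[bool]] = []
    for row in coord_map:
        out_row: List[Optional[Pixel]] = []
        miss_row: List[bool] = []
        for coord in row:
            if coord is not None and 0 <= coord[0] < h and 0 <= coord[1] < w:
                out_row.append(source[coord[0]][coord[1]])
                miss_row.append(False)
            else:
                out_row.append(None)
                miss_row.append(True)
        coarse.append(out_row)
        missing.append(miss_row)
    return coarse, missing
```

Normalizing by the neck position and shoulder width makes the search insensitive to where the subject sits in the frame and how large the subject appears, so only the arrangement of the torso and arms drives the match. The returned `missing` mask marks the regions that the neutral-pose composition step would still need to fill.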