In one embodiment, a system may capture one or more images of a user using one or more cameras, the one or more images depicting at least an eye and a face of the user. The system may determine a direction of a gaze of the user based on the eye depicted in the one or more images. The system may generate a facial mesh based on depth measurements of one or more features of the face depicted in the one or more images. The system may generate an eyeball texture for an eyeball mesh by processing the direction of the gaze and the facial mesh using a machine-learning model. The system may render an avatar of the user based on the eyeball mesh, the eyeball texture, the facial mesh, and a facial texture.
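By way of illustration only, and not limitation, the data flow described above may be sketched as follows. The sketch is not the disclosed implementation. All names (`gaze_direction`, `facial_mesh_from_depth`, `generate_eyeball_texture`, `AvatarInputs`) are hypothetical. It assumes the gaze is parameterized by yaw and pitch angles and that the depth measurements are back-projected with a pinhole camera model. The machine-learning model is represented by an arbitrary callable, and the final rendering step is omitted.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class AvatarInputs:
    """Everything the (omitted) renderer would consume."""
    eyeball_mesh: str              # placeholder handle for an eyeball mesh asset
    eyeball_texture: List[float]   # output of the machine-learning model
    facial_mesh: List[Vec3]        # 3D facial feature positions
    facial_texture: str            # placeholder handle for a facial texture asset


def gaze_direction(yaw: float, pitch: float) -> Vec3:
    """Unit gaze vector from yaw and pitch in radians (assumed parameterization)."""
    return (
        math.sin(yaw) * math.cos(pitch),
        math.sin(pitch),
        math.cos(yaw) * math.cos(pitch),
    )


def facial_mesh_from_depth(
    landmarks: Sequence[Tuple[float, float, float]],
    fx: float, fy: float, cx: float, cy: float,
) -> List[Vec3]:
    """Back-project (u, v, depth) facial landmarks to 3D with a pinhole model."""
    return [((u - cx) * d / fx, (v - cy) * d / fy, d) for u, v, d in landmarks]


def generate_eyeball_texture(
    gaze: Vec3,
    mesh: List[Vec3],
    model: Callable[[List[float]], List[float]],
) -> List[float]:
    """Feed the gaze direction and the flattened facial mesh to the model."""
    features = list(gaze) + [c for vertex in mesh for c in vertex]
    return model(features)


def build_avatar_inputs(
    yaw: float, pitch: float,
    landmarks: Sequence[Tuple[float, float, float]],
    intrinsics: Tuple[float, float, float, float],
    model: Callable[[List[float]], List[float]],
) -> AvatarInputs:
    """Chain the steps: gaze, facial mesh, eyeball texture, render inputs."""
    gaze = gaze_direction(yaw, pitch)
    mesh = facial_mesh_from_depth(landmarks, *intrinsics)
    texture = generate_eyeball_texture(gaze, mesh, model)
    return AvatarInputs("eyeball_mesh", texture, mesh, "facial_texture")
```

In this sketch, any callable that maps a flat feature list to a texture may stand in for the machine-learning model, for example `lambda f: [sum(f)]` during testing. A trained network would take its place in practice.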