A method may include obtaining a frame of a video stream, the video stream being one of multiple video streams of a video conference, obtaining face detection information identifying a face size and a face position of at least one face detected in the frame, and cropping and scaling the frame according to at least one crop and scale parameter using the face detection information to obtain a modified frame. The at least one crop and scale parameter is based on frames of the multiple video streams, the frames including the frame. The method may further include presenting the modified frame.
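
The cropping and scaling step can be illustrated with a minimal sketch. The abstract does not say how the crop and scale parameter is derived from the frames of the multiple streams, so the code below assumes one possibility: the shared parameter is the largest face-to-frame-height ratio observed across the streams, and each frame is cropped so that its face occupies that same fraction of the output. The names `Face`, `Crop`, `shared_face_fraction`, and `crop_for_face` are hypothetical and are not taken from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Face:
    cx: float    # face center x, in pixels
    cy: float    # face center y, in pixels
    size: float  # face height, in pixels


@dataclass(frozen=True)
class Crop:
    left: int
    top: int
    width: int
    height: int
    scale: float  # factor applied to the cropped region to reach the output size


def shared_face_fraction(faces, frame_heights):
    """Assumed shared parameter: the largest face-to-frame-height ratio
    among the current frames of all streams."""
    return max(f.size / h for f, h in zip(faces, frame_heights))


def crop_for_face(face, frame_w, frame_h, target_fraction, out_w, out_h):
    """Compute a crop window around the face so that, after scaling to
    out_w x out_h, the face height is about target_fraction of out_h."""
    # Crop height that makes the face fill the target fraction.
    crop_h = min(frame_h, face.size / target_fraction)
    # Keep the output aspect ratio.
    crop_w = crop_h * out_w / out_h
    if crop_w > frame_w:
        # Frame is too narrow; shrink the crop (the face then appears larger).
        crop_w = frame_w
        crop_h = crop_w * out_h / out_w
    # Center on the face, then clamp the window to the frame bounds.
    left = min(max(face.cx - crop_w / 2, 0), frame_w - crop_w)
    top = min(max(face.cy - crop_h / 2, 0), frame_h - crop_h)
    return Crop(round(left), round(top), round(crop_w), round(crop_h),
                out_h / crop_h)


# Two streams: a close-up at 1280x720 and a distant face at 640x480.
faces = [Face(640, 360, 180), Face(320, 240, 60)]
sizes = [(1280, 720), (640, 480)]
target = shared_face_fraction(faces, [h for _, h in sizes])
crops = [crop_for_face(f, w, h, target, 640, 360)
         for f, (w, h) in zip(faces, sizes)]
```

In this sketch the close-up frame is kept whole and scaled down, while the distant face is cropped tightly and scaled up, so both faces end up at the same height in their output tiles. Other choices of shared parameter (for example a median or a fixed target) would fit the abstract equally well.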