Aspects of the disclosure provide methods and apparatuses for video conferencing and/or telepresence. In some examples, a video conference or telepresence session can be performed by multiple client devices (e.g., user devices) and a media control device (e.g., a server device). For example, a first client device determines a grouping control that limits the grouping of overlay media from a second client device with immersive media of the first client device, and transmits a grouping control signal indicative of the grouping control to a media control device. Further, the first client device provides one or more media, including the immersive media, to the media control device. The media control device can then group a plurality of immersive media streams into a single group or multiple groups based on the grouping control signal received from the first client device.
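The signaling flow described above can be illustrated with a minimal sketch. Everything named here is hypothetical and chosen only for illustration: the `GroupingControl` and `MediaControlDevice` classes, their fields, and the representation of a grouping control as a set of blocked overlay senders. The disclosure does not specify a wire format or an API, so this shows one possible reading rather than the claimed implementation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class GroupingControl:
    """Hypothetical grouping control signal sent by a client device.

    Assumption: the limit is expressed as the set of client devices whose
    overlay media must not be grouped with the owner's immersive media.
    """
    owner: str
    blocked_overlay_senders: frozenset = frozenset()


@dataclass
class MediaControlDevice:
    """Hypothetical server-side device that groups received media."""
    immersive: dict = field(default_factory=dict)  # owner -> immersive stream id
    overlays: dict = field(default_factory=dict)   # sender -> overlay stream id
    controls: dict = field(default_factory=dict)   # owner -> GroupingControl

    def receive_control(self, control: GroupingControl) -> None:
        self.controls[control.owner] = control

    def receive_immersive(self, owner: str, stream_id: str) -> None:
        self.immersive[owner] = stream_id

    def receive_overlay(self, sender: str, stream_id: str) -> None:
        self.overlays[sender] = stream_id

    def build_groups(self) -> dict:
        """Return one group per immersive stream: immersive id -> overlay ids.

        Overlays from senders blocked by the immersive stream's owner are
        left out of that owner's group.
        """
        groups = {}
        for owner, immersive_id in self.immersive.items():
            control = self.controls.get(owner)
            blocked = control.blocked_overlay_senders if control else frozenset()
            groups[immersive_id] = [
                stream_id
                for sender, stream_id in sorted(self.overlays.items())
                if sender not in blocked
            ]
        return groups


# Usage: client A blocks overlays from client B but not from client C.
server = MediaControlDevice()
server.receive_control(GroupingControl("A", frozenset({"B"})))
server.receive_immersive("A", "A-360")
server.receive_overlay("B", "B-slides")
server.receive_overlay("C", "C-camera")
groups = server.build_groups()
```

In this sketch the first client device (A) sends its grouping control before its media, and the media control device consults that control only when it forms groups, so a control that arrives later would take effect on the next call to `build_groups`.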