A system generates an output audio signal for an object or virtual object by using image data of a room to select a room impulse response from a database. A headset may include a depth camera assembly (DCA) and processing circuitry. The DCA generates depth image data of the room. The processing circuitry determines room parameters, such as the dimensions of the room, based on the depth image data. A room impulse response for the room is determined by referencing a database of room impulse responses using the room parameters. The output audio signal is generated by convolving a source audio signal of the object with the room impulse response.
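As a non-limiting illustration, the flow from room parameters to output audio may be sketched in Python as follows. Several details are assumptions made only for this sketch and are not stated above: the depth image data is taken to be already converted into a 3D point cloud in a room-aligned coordinate frame, the room dimensions are estimated as the extents of that point cloud, the database is a small in-memory list with placeholder impulse responses, and the database entry is selected by smallest Euclidean distance between dimensions. The names `estimate_room_dims`, `select_rir`, `render`, and `RIR_DATABASE` are hypothetical.

```python
import numpy as np

# Hypothetical database: each entry pairs room dimensions (metres) with a
# stored room impulse response. Values are toy placeholders, not measurements.
RIR_DATABASE = [
    {"dims": (3.0, 4.0, 2.5), "rir": np.array([1.0, 0.0, 0.5])},
    {"dims": (10.0, 12.0, 4.0),
     "rir": np.array([1.0, 0.0, 0.0, 0.0, 0.6, 0.0, 0.3])},
]


def estimate_room_dims(points):
    """Estimate room dimensions as the per-axis extent of a point cloud.

    Assumes `points` is an (N, 3) array already expressed in a frame
    aligned with the room walls (an assumption of this sketch).
    """
    points = np.asarray(points, dtype=float)
    return points.max(axis=0) - points.min(axis=0)


def select_rir(room_dims, database=RIR_DATABASE):
    """Return the stored impulse response whose dimensions are closest.

    Nearest-neighbour matching by Euclidean distance is an assumed
    criterion; other matching rules could be used instead.
    """
    query = np.asarray(room_dims, dtype=float)
    best = min(
        database,
        key=lambda entry: np.linalg.norm(np.asarray(entry["dims"]) - query),
    )
    return best["rir"]


def render(source, room_dims, database=RIR_DATABASE):
    """Convolve a source audio signal with the selected impulse response."""
    rir = select_rir(room_dims, database)
    return np.convolve(source, rir)


# Two corner points stand in for a full depth-derived point cloud.
points = np.array([[0.0, 0.0, 0.0], [3.2, 3.9, 2.4]])
dims = estimate_room_dims(points)

source = np.array([1.0, 0.5])   # toy source audio signal
output = render(source, dims)   # selects the small-room entry
```

In this sketch the estimated dimensions fall closest to the first database entry, so the source signal is convolved with that entry's impulse response. For impulse responses of realistic length, an FFT-based convolution would typically be preferred over direct convolution for speed.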