Patent attributes
Image analysis of a video signal is performed to produce first metadata, and audio analysis of a multi-channel sound track associated with the video signal is performed to produce second metadata. A number of time segments of the sound track are processed, wherein each time segment is processed by either (i) spatial filtering of the audio signals or (ii) spatial rendering of the audio signals, not both, wherein for each time segment a decision was made to select between the spatial filtering or the spatial rendering, in accordance with the first and second metadata. A mix of the processed sound track and the video signal is generated. Other embodiments are also described and claimed.