Patent attributes
Disclosed is a system and method for generating a model of the geometric relationships between various audio sources recorded by a multi-camera system. The spatial audio scene module associates source signals, extracted from recorded audio, of audio sources to visual objects identified in videos recorded by one or more cameras. This association may be based on estimated positions of the audio sources based on relative signal gains and delays of the source signal received at each microphone. The estimated positions of audio sources are tracked indirectly by tracking the associated visual objects with computer vision. A virtual microphone module may receive a position for a virtual microphone and synthesize a signal corresponding to the virtual microphone position based on the estimated positions of the audio sources.