Patent attributes
The technology described herein uses a multiple-output layer RNN to process an acoustic signal comprising speech from multiple speakers to trace an individual speaker's speech. The multiple-output layer RNN has multiple output layers, each of which is meant to trace one speaker (or noise) and represent the mask for that speaker (or noise). The output layer for each speaker (or noise) can have the same dimensions and can be normalized for each output unit across all output layers. The rest of the layers in the multiple-output layer RNN are shared across all the output layers. The result from the previous frame is used as input to the output layer or to one of the hidden layers of the RNN to calculate results for the current frame. This pass back of results allows the model to carry information from previous frames to future frames to trace the same speaker.