A conference device and a computer-implemented method for training a neural network are disclosed. The conference device comprises a conference controller; a microphone array comprising a plurality of microphones for provision of audio signals representing audio from one or more sound sources; and a direction estimator connected to the conference controller and the microphone array. The direction estimator is configured to obtain, from the microphone array, a plurality of audio signals including a first audio signal and a second audio signal; determine direction data based on the plurality of audio signals, the direction data comprising an indication of an estimated probability of voice activity for one or more directions, wherein determining the direction data comprises applying an offline-trained neural network; and output audio data based on the direction data to the conference controller.
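
The data flow of the direction estimator can be illustrated with a minimal sketch. It is an illustration under stated assumptions, not the disclosed implementation: the abstract does not specify the input features, the network architecture, or the number of directions. The sketch assumes normalized cross-correlation features between the first and second audio signals and a single sigmoid layer whose weights stand in for parameters learned offline. All names (`DirectionEstimator`, `cross_correlation_features`, `determine_direction_data`) and all numeric values are hypothetical.

```python
import math
from typing import List, Sequence


def cross_correlation_features(first: Sequence[float],
                               second: Sequence[float],
                               max_lag: int) -> List[float]:
    """Normalized cross-correlation of two channels at lags -max_lag..max_lag.

    Assumed feature choice; the abstract does not say what the network sees.
    """
    n = min(len(first), len(second))
    energy = math.sqrt(sum(x * x for x in first[:n]) *
                       sum(y * y for y in second[:n])) or 1.0  # avoid /0 on silence
    feats = []
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                acc += first[i] * second[j]
        feats.append(acc / energy)
    return feats


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class DirectionEstimator:
    """Hypothetical estimator: one sigmoid output per candidate direction."""

    def __init__(self, weights: List[List[float]], biases: List[float],
                 max_lag: int) -> None:
        # Placeholder for offline-trained parameters: one weight row per direction.
        self.weights = weights
        self.biases = biases
        self.max_lag = max_lag

    def determine_direction_data(
            self, audio_signals: Sequence[Sequence[float]]) -> List[float]:
        """Return an estimated voice-activity probability for each direction."""
        first, second = audio_signals[0], audio_signals[1]
        feats = cross_correlation_features(first, second, self.max_lag)
        return [sigmoid(sum(w * f for w, f in zip(row, feats)) + b)
                for row, b in zip(self.weights, self.biases)]


# Usage: three candidate directions, one per inter-microphone lag (-1, 0, +1).
# The weights below are made up for illustration, not trained.
estimator = DirectionEstimator(
    weights=[[8.0, 0.0, 0.0], [0.0, 8.0, 0.0], [0.0, 0.0, 8.0]],
    biases=[-4.0, -4.0, -4.0],
    max_lag=1,
)
first_signal = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, 0.0]
second_signal = [0.0, 0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0]  # first signal delayed by one sample
probs = estimator.determine_direction_data([first_signal, second_signal])
most_likely_direction = probs.index(max(probs))
```

In this toy setup the second signal lags the first by one sample, so the probability for the direction associated with lag +1 is the largest. A conference controller could use such per-direction probabilities to decide, for example, which direction to prioritize; that use is an assumption here, since the abstract only states that output based on the direction data is passed to the controller.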