Patent attributes
A method of electronically documenting a conversation is provided. The method includes capturing audio of a conversation between a first speaker and a second speaker; generating conversation audio data from the captured audio; and segmenting the conversation audio data into a plurality of utterances according to a speaker segmentation technique. The method further includes, for each utterance: storing time data indicating the chronological position of the utterance in the conversation; passing the utterance to a neural network model, the neural network model configured to receive the utterance as an input and generate a feature representation of the utterance as an output; assigning the utterance feature representation to a first speaker cluster or a second speaker cluster according to a clustering technique; assigning a speaker identifier to the utterance based on the cluster assignment of the utterance; and generating a text representation of the utterance.