A computer-implemented method includes obtaining training data including utterances of speakers in acoustic conditions, preparing at least one machine learning model, each machine learning model including a common embedding model for converting an utterance into a feature vector and a classification model for classifying the feature vector, and training, by using the training data, the machine learning model to perform classification by speaker and to perform classification by acoustic condition.