Patent attributes
Pairs of feature vectors are obtained that represent speech. Some pairs represent two samples of speech from the same speakers, and other pairs represent two samples of speech from different speakers. A neural network feeds each feature vector in a sample pair into a separate bottleneck layer, with a weight matrix on the input of both vectors tied to one another. The neural network is trained using the feature vectors and an objective function that induces the network to classify whether the speech samples come from the same speaker. The weights from the tied weight matrix are extracted for use in generating derived features for a speech processing system that can benefit from features that are thus transformed to better reflect speaker identity.