A computer receives a first audio content item and applies a process to generate a representation of the first audio content item. A portion is extracted from the representation of the first audio content item. A first representative vector that corresponds to the first audio content item is determined by applying a variational autoencoder (VAE) to a first segment of the extracted portion of the first audio content item. The computer stores the first representative vector that corresponds to the first audio content item.
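
The flow described above can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the abstract: the representation is taken to be a magnitude spectrogram, the extracted portion is a fixed run of frames, the hypothetical `ToyVAEEncoder` is only the encoder half of a VAE with untrained random weights, the encoder's latent mean is used as the representative vector, and an in-memory dictionary stands in for storage.

```python
import numpy as np


def spectrogram(audio, frame_len=256, hop=128):
    """Assumed 'representation': magnitude spectrogram of the audio samples."""
    n_frames = 1 + (len(audio) - frame_len) // hop
    frames = np.stack(
        [audio[i * hop : i * hop + frame_len] for i in range(n_frames)]
    )
    window = np.hanning(frame_len)
    # Result shape: (n_frames, frame_len // 2 + 1)
    return np.abs(np.fft.rfft(frames * window, axis=1))


def extract_portion(representation, start, length):
    """Extract a contiguous run of frames from the representation."""
    return representation[start : start + length]


class ToyVAEEncoder:
    """Hypothetical stand-in for a trained VAE encoder (random, untrained weights)."""

    def __init__(self, input_dim, latent_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.w_mu = rng.normal(scale=0.01, size=(input_dim, latent_dim))
        self.w_logvar = rng.normal(scale=0.01, size=(input_dim, latent_dim))

    def encode(self, segment):
        """Return the latent mean and log-variance for one flattened segment."""
        x = segment.reshape(-1)
        return x @ self.w_mu, x @ self.w_logvar


# Assumed storage: a plain dictionary keyed by a content item identifier.
vector_store = {}


def index_audio_item(item_id, audio, encoder,
                     portion_start=10, portion_len=32, segment_len=8):
    """Receive audio, build a representation, encode a segment, store the vector."""
    representation = spectrogram(audio)
    portion = extract_portion(representation, portion_start, portion_len)
    first_segment = portion[:segment_len]      # first segment of the portion
    mu, _logvar = encoder.encode(first_segment)
    vector_store[item_id] = mu                 # assumption: latent mean is the vector
    return mu


# Usage with synthetic audio: 8000 samples of noise standing in for a real item.
audio = np.random.default_rng(1).normal(size=8000)
encoder = ToyVAEEncoder(input_dim=8 * 129, latent_dim=16)
vector = index_audio_item("item-1", audio, encoder)
```

In a real system the encoder weights would come from a VAE trained on audio segments, and the choice of which portion and segment to encode would follow whatever the full specification prescribes. The sketch only fixes the order of the steps: receive, represent, extract, encode, store.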