Patent attributes
A system accesses a first digital audio file that includes a plurality of spoken instructions. The system converts the first digital audio file to a first spectrogram image, applies a filter to determine whether an image quality of the first spectrogram image is below a predetermined image quality, and in response, generates a second spectrogram image from the first spectrogram image using a training model. The system converts the second spectrogram image to a second digital audio file and converts the second digital audio file into multiple vectors that each correspond to a particular spoken instruction. The system identifies related vectors and concatenates the related vectors together in order to create a plurality of concatenated vectors. The system generates, using the plurality of concatenated vectors, a third digital audio file that includes concatenated spoken instructions from the first digital audio file.