Automatic detection of pronunciation errors in spoken words is provided, using a neural network model trained for a target phoneme. The target phoneme may be a phoneme of the English language, and the pronunciation errors may be detected in English words.
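The abstract does not specify the network architecture, the acoustic features, or the training procedure. The following minimal sketch shows what "a model trained for a target phoneme" could look like. It assumes each target phoneme gets its own binary classifier that takes a fixed-length feature vector extracted from the audio segment aligned to that phoneme and outputs the probability that the phoneme was mispronounced. The model is a single logistic neuron, the simplest neural network, trained by stochastic gradient descent on binary cross-entropy. The class `PhonemeErrorDetector`, the two-dimensional features, and the synthetic training data are hypothetical illustrations, not the described model.

```python
import math
import random
from dataclasses import dataclass, field


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


@dataclass
class PhonemeErrorDetector:
    """Hypothetical per-phoneme detector: one logistic neuron over acoustic features."""
    target_phoneme: str
    n_features: int
    weights: list = field(default_factory=list)
    bias: float = 0.0

    def __post_init__(self):
        if not self.weights:
            self.weights = [0.0] * self.n_features

    def predict_proba(self, features):
        # Probability that this occurrence of the target phoneme is mispronounced.
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return _sigmoid(z)

    def train(self, samples, labels, lr=0.5, epochs=200):
        # SGD on binary cross-entropy; label 1 = mispronounced, 0 = correct.
        for _ in range(epochs):
            for x, y in zip(samples, labels):
                err = self.predict_proba(x) - y  # dLoss/dz for cross-entropy
                self.weights = [w - lr * err * xi for w, xi in zip(self.weights, x)]
                self.bias -= lr * err

    def is_mispronounced(self, features, threshold=0.5):
        return self.predict_proba(features) >= threshold


def make_synthetic_data(n_per_class=20, seed=0):
    # Hypothetical 2-D features: correct tokens cluster near (0.8, 0.2),
    # mispronounced tokens near (0.2, 0.8). Real features would come from audio.
    rng = random.Random(seed)
    samples, labels = [], []
    for _ in range(n_per_class):
        samples.append([0.8 + rng.gauss(0, 0.05), 0.2 + rng.gauss(0, 0.05)])
        labels.append(0)
        samples.append([0.2 + rng.gauss(0, 0.05), 0.8 + rng.gauss(0, 0.05)])
        labels.append(1)
    return samples, labels


# Train one detector for a single English target phoneme, e.g. /θ/ as in "think".
samples, labels = make_synthetic_data()
detector = PhonemeErrorDetector(target_phoneme="θ", n_features=2)
detector.train(samples, labels)
```

Training one small model per target phoneme keeps each classifier focused on the acoustic contrasts that matter for that phoneme; a full system would run the detector only on segments that a forced aligner has labeled as the target phoneme.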