Disclosed are a speech data based language modeling system and method. The method includes transcription to obtain text data, generation of a regional dialect corpus based on the text data and on speech data containing a regional dialect, and generation of an acoustic model and a language model using the regional dialect corpus. The acoustic model and the language model are generated through machine learning with an artificial intelligence (AI) algorithm, using the speech data and marking word spacing in regional dialect sentences with a speech data tag. A user can thereby access an improved regional dialect speech recognition service over 5G mobile communication technologies such as enhanced mobile broadband (eMBB), ultra-reliable low-latency communications (URLLC), or massive machine-type communications (mMTC).
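The abstract does not specify the tag format or the type of language model. As a minimal sketch of the corpus and language model steps only, assuming a hypothetical `<sp>` tag that marks a pause (treated as a word boundary) in each transcription, and swapping in a simple add-one-smoothed bigram model in place of whatever AI model the method actually trains, the pipeline might look like this. The names `PAUSE_TAG`, `mark_word_spacing`, `build_corpus`, and `BigramLM` are illustrative, and the acoustic model is not sketched.

```python
import math
from collections import Counter
from typing import List

# Hypothetical speech data tag marking a pause, used here as a word boundary.
PAUSE_TAG = "<sp>"


def mark_word_spacing(tagged_transcript: str) -> List[str]:
    """Split one tagged transcription into words at each PAUSE_TAG (assumed convention)."""
    pieces = tagged_transcript.split(PAUSE_TAG)
    return [p.strip() for p in pieces if p.strip()]


def build_corpus(tagged_transcripts: List[str]) -> List[List[str]]:
    """Build a word-segmented regional dialect corpus: one sentence per transcription."""
    return [mark_word_spacing(t) for t in tagged_transcripts]


class BigramLM:
    """Add-one (Laplace) smoothed bigram language model over a segmented corpus."""

    def __init__(self, corpus: List[List[str]]):
        self.context_counts: Counter = Counter()
        self.bigram_counts: Counter = Counter()
        for sentence in corpus:
            tokens = ["<s>"] + sentence + ["</s>"]
            # Every token except the last serves as a context for the next one.
            self.context_counts.update(tokens[:-1])
            self.bigram_counts.update(zip(tokens[:-1], tokens[1:]))
        # Possible prediction targets: all corpus words plus the end marker.
        self.vocab = {w for s in corpus for w in s} | {"</s>"}

    def prob(self, prev: str, word: str) -> float:
        """Smoothed P(word | prev)."""
        num = self.bigram_counts[(prev, word)] + 1
        den = self.context_counts[prev] + len(self.vocab)
        return num / den

    def sentence_logprob(self, sentence: List[str]) -> float:
        """Natural-log probability of a word-segmented sentence."""
        tokens = ["<s>"] + sentence + ["</s>"]
        return sum(math.log(self.prob(p, w)) for p, w in zip(tokens[:-1], tokens[1:]))


if __name__ == "__main__":
    # Placeholder tokens stand in for real regional dialect words.
    corpus = build_corpus(["w1<sp>w2<sp>w3", "w1<sp>w3"])
    lm = BigramLM(corpus)
    print(corpus)
    print(lm.prob("w1", "w2"))
```

Smoothing keeps unseen word pairs from receiving zero probability, which matters for a small dialect corpus; a production system would more likely use a stronger smoothing scheme or a neural model.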