Disclosed is a method, performed by a computing device, for determining content associated with a voice signal. The method may include generating text information by converting the voice signal. The method may include determining a plurality of target word candidates. The method may include determining a target word from among the plurality of target word candidates based on a comparison between the plurality of target word candidates and the generated text information. The method may also include determining content associated with the target word.
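
By way of a non-limiting illustration, the comparison and content-determination steps might be sketched as follows. The sketch assumes the voice signal has already been converted to text by some speech recognizer, which is not shown. The catalog `CONTENT_BY_WORD`, the function names, the character-level similarity measure (Python's standard `difflib.SequenceMatcher`), and the 0.8 threshold are all hypothetical choices made for the example, not details taken from the disclosure.

```python
from difflib import SequenceMatcher

# Hypothetical catalog: each target word candidate maps to a content item.
CONTENT_BY_WORD = {
    "weather": "Today's forecast",
    "music": "Recommended playlist",
    "news": "Top headlines",
}


def similarity(a: str, b: str) -> float:
    """Case-insensitive character-level similarity in [0, 1] (assumed measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def determine_target_word(text, candidates, threshold=0.8):
    """Compare every candidate with every token of the generated text and
    return the best-scoring candidate, or None if no score reaches the threshold."""
    tokens = text.split()
    best_word, best_score = None, 0.0
    for candidate in candidates:
        for token in tokens:
            score = similarity(candidate, token)
            if score > best_score:
                best_word, best_score = candidate, score
    return best_word if best_score >= threshold else None


def determine_content(text, content_by_word=CONTENT_BY_WORD):
    """Pick the target word for the text, then look up its associated content."""
    target = determine_target_word(text, list(content_by_word))
    return content_by_word.get(target) if target is not None else None
```

For example, under these assumptions `determine_content("what is the weather today")` selects the candidate "weather" and returns the content item stored for it, while a text that resembles none of the candidates yields `None`. A fuzzy comparison is used here rather than exact matching because recognized text may contain transcription errors; other comparison measures could equally be substituted.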