Methods and systems to voice-enable a user interface using a voice extension module are provided. The voice extension module includes a preprocessor, a speech recognition engine, and an input handler. The voice extension module receives user interface information, such as a hypertext markup language (HTML) document, and voice-enables the document so that a user may interact with any of its user interface elements using voice commands.
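
The division of work among the three components can be illustrated with a minimal sketch. It is not the claimed implementation: the class and method names are hypothetical, the "speech recognition engine" is a stand-in that receives already-transcribed text instead of audio, and the choice of links, buttons, and named inputs as the voice-enabled elements is an assumption made for illustration.

```python
from html.parser import HTMLParser


class Preprocessor(HTMLParser):
    """Hypothetical preprocessor: maps a spoken label to each interactive element."""

    def __init__(self):
        super().__init__()
        self.commands = {}  # spoken label -> (tag, target)
        self._open = None   # (tag, attrs) of the element whose text is being collected
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input":
            # Inputs carry no inner text, so the name attribute serves as the label.
            name = attrs.get("name", "")
            if name:
                self.commands[name.lower()] = ("input", name)
        elif tag in ("a", "button"):
            self._open = (tag, attrs)
            self._text = []

    def handle_data(self, data):
        if self._open is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open[0]:
            # Normalize whitespace and case so labels match spoken phrases.
            label = " ".join("".join(self._text).split()).lower()
            if label:
                attrs = self._open[1]
                target = attrs.get("href") or attrs.get("id", "")
                self.commands[label] = (tag, target)
            self._open = None


class SpeechRecognitionEngine:
    """Stand-in recognizer: assumes the utterance is already transcribed text."""

    def __init__(self, vocabulary):
        self.vocabulary = set(vocabulary)

    def recognize(self, utterance):
        phrase = " ".join(utterance.lower().split())
        return phrase if phrase in self.vocabulary else None


class InputHandler:
    """Turns a recognized phrase into an action on the matching element."""

    def __init__(self, commands):
        self.commands = commands

    def handle(self, phrase):
        tag, target = self.commands[phrase]
        if tag == "a":
            return f"navigate:{target}"
        if tag == "button":
            return f"click:{target}"
        return f"focus:{target}"


class VoiceExtensionModule:
    """Wires the three components together for one HTML document."""

    def __init__(self, html):
        preprocessor = Preprocessor()
        preprocessor.feed(html)
        preprocessor.close()
        self.engine = SpeechRecognitionEngine(preprocessor.commands)
        self.handler = InputHandler(preprocessor.commands)

    def on_speech(self, utterance):
        phrase = self.engine.recognize(utterance)
        return None if phrase is None else self.handler.handle(phrase)
```

In this sketch the preprocessor runs once per document, and its output serves both as the recognizer's vocabulary and as the input handler's dispatch table, so a phrase that is recognized always has a corresponding element to act on.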