Patent attributes
Embodiments of a system include a client device (102), a voice server (106), and an application server (104). The voice server is distinct from the application server. The client device renders (316) a visual display that includes at least one display element for which input data is receivable though a visual modality and a voice modality. The client device may receive speech through the voice modality and send (502) uplink audio data representing the speech to the voice server over an audio data path (124). The application server receives (514) a speech recognition result from the voice server over an application server/voice server control path (122). The application server sends (514), over an application server/client control path (120), a message to the client device that includes the speech recognition result. The client device updates (516) one or more of the display elements according to the speech recognition result.