The disclosed embodiments relate to a method for controlling a virtual character (or “avatar”) using a multi-modal model. The multi-modal model may receive various input information relating to a user and process that information using multiple internal models. The multi-modal model may combine the internal models to produce believable and emotionally engaging responses by the virtual character. A link to the virtual character may be embedded in a web browser, and the avatar may be dynamically generated based on a selection by a user to interact with the virtual character. A report may be generated for a client, the report providing insights into characteristics of users interacting with a virtual character associated with the client.
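
By way of illustration only, the combination of multiple internal models might be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the two internal models (`speech_model`, `facial_model`), the `Signal` type, the confidence values, the confidence-summing rule in `combine`, and the animation names are all hypothetical and do not appear in the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Signal:
    """Hypothetical output of one internal model: an emotion estimate and a confidence."""
    emotion: str
    confidence: float


# Hypothetical type: an internal model maps user input information to a Signal.
InternalModel = Callable[[Dict[str, str]], Signal]


def speech_model(user_input: Dict[str, str]) -> Signal:
    """Toy stand-in for an internal model that looks at what the user said."""
    text = user_input.get("speech", "").lower()
    if "thank" in text:
        return Signal("happy", 0.8)
    return Signal("neutral", 0.3)


def facial_model(user_input: Dict[str, str]) -> Signal:
    """Toy stand-in for an internal model that looks at the user's facial expression."""
    expression = user_input.get("facial_expression", "")
    if expression == "frown":
        return Signal("sad", 0.9)
    if expression == "smile":
        return Signal("happy", 0.7)
    return Signal("neutral", 0.2)


def combine(signals: List[Signal]) -> str:
    """Assumed combination rule: sum confidences per emotion and pick the largest total."""
    totals: Dict[str, float] = {}
    for s in signals:
        totals[s.emotion] = totals.get(s.emotion, 0.0) + s.confidence
    return max(totals, key=totals.get)


def respond(user_input: Dict[str, str], models: List[InternalModel]) -> Dict[str, str]:
    """Run every internal model on the same input and map the combined result to an avatar action."""
    emotion = combine([model(user_input) for model in models])
    # Hypothetical mapping from the estimated user emotion to an avatar animation.
    animations = {"happy": "smile", "sad": "concerned_look", "neutral": "idle"}
    return {"user_emotion": emotion, "avatar_animation": animations[emotion]}
```

In this sketch, calling `respond({"speech": "Thank you", "facial_expression": "frown"}, [speech_model, facial_model])` lets the facial signal outweigh the spoken one, so the hypothetical avatar would respond with a concerned look rather than a smile. Summing confidences is only one possible way of combining the internal models; a weighted or learned combination could be substituted.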