A system and method for creating a customized multi-media message for a recipient are disclosed. The multi-media message is created by a sender and contains an animated entity that delivers an audible message. The sender chooses the animated entity from a plurality of animated entities. The system receives a text message from the sender and a sender audio message associated with the text message. The sender audio message is associated with the chosen animated entity to create the multi-media message. The animated entity delivers the multi-media message using the sender audio message as its voice, with the mouth movements of the animated entity conforming to the sender audio message.
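
By way of illustration only, the flow described above (choose an entity, receive text and audio, associate the audio with the entity, and conform mouth movements to the audio) might be sketched as follows. All names here (`AnimatedEntity`, `MultiMediaMessage`, `mouth_track`, `create_message`, `samples_per_frame`) are hypothetical and do not appear in the disclosure. The mapping from per-frame peak amplitude to mouth openness is an assumed stand-in for whatever synchronization technique an actual embodiment uses.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class AnimatedEntity:
    """Hypothetical record for one selectable animated entity."""
    name: str


@dataclass(frozen=True)
class MultiMediaMessage:
    """Hypothetical container tying the chosen entity to the sender's input."""
    entity: AnimatedEntity
    text: str
    audio: List[float]            # sender audio samples, nominally in [-1.0, 1.0]
    mouth_openness: List[float]   # one value in [0.0, 1.0] per animation frame


def mouth_track(samples: Sequence[float], samples_per_frame: int) -> List[float]:
    """Assumed heuristic: mouth openness = peak amplitude in each frame, capped at 1.0."""
    if samples_per_frame <= 0:
        raise ValueError("samples_per_frame must be positive")
    track = []
    for start in range(0, len(samples), samples_per_frame):
        frame = samples[start:start + samples_per_frame]
        track.append(min(1.0, max(abs(s) for s in frame)))
    return track


def create_message(
    entities: Sequence[AnimatedEntity],
    choice: int,
    text: str,
    audio: Sequence[float],
    samples_per_frame: int = 4,
) -> MultiMediaMessage:
    """Associate the sender's audio with the chosen entity and derive mouth movements."""
    entity = entities[choice]  # sender picks from a plurality of entities
    return MultiMediaMessage(
        entity=entity,
        text=text,
        audio=list(audio),
        mouth_openness=mouth_track(audio, samples_per_frame),
    )


# Example: the sender picks the second entity and supplies ten audio samples.
entities = [AnimatedEntity("robot"), AnimatedEntity("cat")]
audio = [0.0, 0.1, -0.5, 0.2, 0.0, 0.0, 0.0, 0.0, 0.9, -1.2]
msg = create_message(entities, 1, "Hello", audio)
print(msg.entity.name, msg.mouth_openness)  # cat [0.5, 0.0, 1.0]
```

In this sketch the silent middle frame yields a closed mouth, and the out-of-range final sample is capped at fully open. A practical system would more likely drive the mouth shapes from phoneme or viseme analysis of the audio than from raw amplitude.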