Patent attributes
A video-to-text method is disclosed that automatically generates text descriptions of the content of a video. The invention first segments an input video sequence according to predefined semantic classes using a Mixture-of-Experts blob segmentation algorithm. The resulting segmentation is coerced into a semantic concept graph based on domain knowledge and a semantic concept hierarchy. The initial semantic concept graph is then summarized and pruned. Finally, according to the summarized semantic concept graph and its changes over time, text and/or speech descriptions are automatically generated using one of three description schemes: key-frame, key-object, and key-change descriptions.
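The pruning and key-change description steps can be illustrated with a minimal sketch. It is not the disclosed implementation: the `ConceptGraph` and `Relation` types, the salience score (for example, the fraction of the frame a segmented blob covers), the `min_salience` threshold, and the sentence templates are all hypothetical names and values chosen for illustration. The segmentation and graph-construction stages are assumed to have already produced one graph per key frame.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relation:
    """A directed edge between two concepts, e.g. (car, on, road)."""
    subject: str
    predicate: str
    obj: str


@dataclass
class ConceptGraph:
    # Concept name -> salience in [0, 1]. The salience measure is an
    # assumption of this sketch, not something the abstract specifies.
    concepts: dict[str, float] = field(default_factory=dict)
    relations: set[Relation] = field(default_factory=set)


def prune(graph: ConceptGraph, min_salience: float = 0.05) -> ConceptGraph:
    """Drop low-salience concepts and any relation that touches them."""
    kept = {c: s for c, s in graph.concepts.items() if s >= min_salience}
    rels = {r for r in graph.relations if r.subject in kept and r.obj in kept}
    return ConceptGraph(kept, rels)


def describe_changes(prev: ConceptGraph, curr: ConceptGraph) -> list[str]:
    """Generate template sentences for differences between two graphs."""
    sentences = []
    # Concepts present now but not before.
    for c in sorted(curr.concepts.keys() - prev.concepts.keys()):
        sentences.append(f"A {c} appears.")
    # Concepts present before but not now.
    for c in sorted(prev.concepts.keys() - curr.concepts.keys()):
        sentences.append(f"The {c} disappears.")
    # Relations that are new in the current graph.
    new_rels = sorted(
        curr.relations - prev.relations,
        key=lambda r: (r.subject, r.predicate, r.obj),
    )
    for r in new_rels:
        sentences.append(f"The {r.subject} is now {r.predicate} the {r.obj}.")
    return sentences


# Two hypothetical graphs, one per key frame.
frame_a = ConceptGraph(
    {"person": 0.20, "road": 0.50, "bird": 0.01},
    {Relation("person", "on", "road")},
)
frame_b = ConceptGraph(
    {"person": 0.20, "road": 0.50, "car": 0.15},
    {Relation("person", "on", "road"), Relation("car", "on", "road")},
)

pruned_a = prune(frame_a)  # the low-salience "bird" concept is dropped
pruned_b = prune(frame_b)
sentences = describe_changes(pruned_a, pruned_b)
print(sentences)
```

Pruning before comparison keeps minor segmentation noise, such as the small "bird" blob here, from producing spurious change sentences. A key-frame or key-object scheme could reuse the same graphs with different templates.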