Techniques and systems are described to model and extract knowledge from images. A digital medium environment is configured to learn and use a model to compute a descriptive summarization of an input image automatically and without user intervention. Training data, comprising images and associated text, is obtained to train the model using machine learning so that the model generates a structured image representation that serves as the descriptive summarization of an input image. The associated text is processed to extract structured semantic knowledge, which is then associated with the corresponding images. The structured semantic knowledge is processed along with the corresponding images to train the model using machine learning such that the model describes a relationship between text features within the structured semantic knowledge. Once the model is learned, the model is usable to process an input image to generate a structured image representation of that image.
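The extraction step described above, in which associated text is turned into structured semantic knowledge and tied back to its image, might be sketched as follows. This is a minimal illustration and not the method of the disclosure: the subject/predicate/object form of a `Fact`, the toy `PREDICATES` vocabulary, the keyword-splitting rule, and the names `extract_facts` and `build_training_pairs` are all assumptions introduced here. A practical system would likely use a natural language parser rather than keyword matching.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical toy vocabulary (an assumption for this sketch only).
PREDICATES = ("holding", "riding", "eating", "wearing")
STOPWORDS = {"a", "an", "the"}


@dataclass(frozen=True)
class Fact:
    """One piece of structured semantic knowledge (assumed triple form)."""
    subject: str
    predicate: str
    obj: str


def extract_facts(caption: str) -> list[Fact]:
    """Naively split a caption around the first known predicate word."""
    words = [w for w in re.findall(r"[a-z]+", caption.lower())
             if w not in STOPWORDS]
    for i, word in enumerate(words):
        # Require at least one word on each side of the predicate.
        if word in PREDICATES and 0 < i < len(words) - 1:
            return [Fact(" ".join(words[:i]), word, " ".join(words[i + 1:]))]
    return []


def build_training_pairs(captioned_images: dict[str, str]) -> list[tuple[str, Fact]]:
    """Associate each extracted fact with the image its caption belongs to."""
    return [(image_id, fact)
            for image_id, caption in captioned_images.items()
            for fact in extract_facts(caption)]


# Hypothetical image identifiers and captions.
pairs = build_training_pairs({
    "img_001.jpg": "A man riding a horse",
    "img_002.jpg": "The dog is sleeping",
})
```

In this sketch the resulting image and fact pairs stand in for the training examples from which a model would be learned. Captions that yield no fact, such as the second one here, simply contribute nothing.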