Automatically mapping and combining the application of machine learning models to answer queries according to semantic specification. A query is parsed to extract keywords from the query and to contextualize the query. Based on the keywords, machine learning models are selected that process concepts associated with the keywords. The machine learning models are sorted according to the contextualization of the query. The machine learning models are run on multimodal data according to a sorted order, where data resulting from an output of one of the machine learning models is used as input to another one of the machine learning models. A query result is output based on a result from running the machine learning models.