An example method includes receiving, by one or more processors, a representation of an utterance spoken at a computing device; identifying, by a first computational agent from a plurality of computational agents and based on the utterance, a multi-element task to be performed, wherein the plurality of computational agents includes one or more first party computational agents and a plurality of third-party computational agents; and performing, by the first computational agent, a first sub-set of elements of the multi-element task, wherein performing the first sub-set of elements comprises selecting a second computational agent from the plurality of computational agents to perform a second sub-set of elements of the multi-element task.