Data is collected from a network graph, wherein the collected data is useful for training a machine learning model on a query domain. A domain-specific template corresponding to the query domain is received, the domain-specific template defining one or more classifiers to guide collection of content relevant to the query domain from the network graph. A collection starting point is analyzed based on the one or more classifiers of the domain-specific template to identify one or more relevant instances of the content. The one or more identified relevant instances of the content are added to a contextual protocol package. Each identified relevant instance of the content is analyzed based on the one or more classifiers of the domain-specific template to identify one or more additional relevant instances of the content. The one or more identified additional relevant instances of the content are added to the contextual protocol package.
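The collection loop described above can be illustrated with a minimal sketch. Everything named here is hypothetical and chosen for illustration only: the `DomainTemplate` and `ContextualProtocolPackage` classes, the `collect` function, the adjacency-dictionary representation of the network graph, the keyword-based classifier, and the `max_instances` cap. The sketch assumes that "analyzing" a node means applying the template's classifiers to the content reachable from it, and that a breadth-first traversal is an acceptable order; the abstract does not specify either.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A classifier is assumed to be a predicate over a content string.
Classifier = Callable[[str], bool]


@dataclass
class DomainTemplate:
    """Hypothetical domain-specific template: a query domain plus classifiers."""
    query_domain: str
    classifiers: List[Classifier]

    def is_relevant(self, content: str) -> bool:
        # Assumption: content is relevant if any classifier accepts it.
        return any(classify(content) for classify in self.classifiers)


@dataclass
class ContextualProtocolPackage:
    """Hypothetical container for the collected relevant instances."""
    query_domain: str
    instances: List[str] = field(default_factory=list)


def collect(graph: Dict[str, List[str]],
            contents: Dict[str, str],
            start: str,
            template: DomainTemplate,
            max_instances: int = 100) -> ContextualProtocolPackage:
    """Collect relevant instances reachable from the starting point."""
    package = ContextualProtocolPackage(template.query_domain)
    seen = {start}
    queue = deque([start])  # nodes still to be analyzed
    while queue:
        node = queue.popleft()
        # Analyze the node: apply the classifiers to each linked instance.
        for neighbor in graph.get(node, []):
            if neighbor in seen:
                continue
            seen.add(neighbor)
            if template.is_relevant(contents.get(neighbor, "")):
                package.instances.append(neighbor)
                if len(package.instances) >= max_instances:
                    return package
                # Relevant instances are analyzed in turn for further content.
                queue.append(neighbor)
    return package


# Toy example: a five-node graph and a single keyword classifier.
graph = {"seed": ["a", "b"], "a": ["c"], "b": ["d"], "c": [], "d": []}
contents = {
    "a": "protein folding study",
    "b": "cooking recipes",
    "c": "protein structure data",
    "d": "protein in diet",
}
template = DomainTemplate("protein science", [lambda text: "protein" in text])
package = collect(graph, contents, "seed", template)
```

In this toy run, node `d` is never reached even though its content would satisfy the classifier, because it is linked only from `b`, which was judged irrelevant and therefore not analyzed further. Whether irrelevant instances should still be traversed is a design choice the abstract leaves open.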