Strategies for an objective associated with an offering set are obtained. Each strategy assigns, to users of a user population, respective selection probabilities of receiving content associated with the offering set. Strategy optimization iterations are performed with respect to a sub-sample of the population and a subset of the strategies. In a given iteration, weights assigned to the strategies are used to determine aggregated selection probabilities for users, content pertaining to the offering set is presented to users selected based on the aggregated probabilities, and the weights are adjusted based on feedback metrics and an exploration-exploitation tradeoff parameter. Based on the weights updated in the iterations, content associated with the offering set is presented to users who were not in the sub-sample.
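
The iteration described above can be illustrated with a minimal sketch. The abstract does not specify an update rule, so the exponential-weights update used here is an assumption, as are all names (`aggregate_probabilities`, `run_iteration`, `tradeoff`, `feedback`) and the toy data. Each strategy is modeled as a mapping from user to selection probability, the feedback metric is taken to be a per-user value in [0, 1], and `tradeoff` is assumed to act as a learning rate: smaller values keep the weights closer to uniform (more exploration), while larger values favor the strategies with the best feedback (more exploitation).

```python
import math
import random


def aggregate_probabilities(weights, strategies, users):
    """Weight-averaged selection probability for each user."""
    total = sum(weights)
    return {
        user: sum(w * s[user] for w, s in zip(weights, strategies)) / total
        for user in users
    }


def run_iteration(weights, strategies, sample, feedback, tradeoff, rng):
    """One optimization iteration over a sub-sample; returns updated weights."""
    probs = aggregate_probabilities(weights, strategies, sample)
    # Present content to each sampled user with its aggregated probability.
    presented = [u for u in sample if rng.random() < probs[u]]
    if not presented:
        return list(weights)  # no feedback collected, so nothing to learn
    updated = []
    for w, s in zip(weights, strategies):
        # Assumed scoring rule: credit a strategy by the feedback of the
        # presented users, scaled by the probability it assigned to them.
        score = sum(feedback(u) * s[u] for u in presented) / len(presented)
        updated.append(w * math.exp(tradeoff * score))
    total = sum(updated)
    return [w / total for w in updated]


# Hypothetical toy data: two strategies, four sampled users, one held-out user.
rng = random.Random(0)
strategies = [
    {"a": 0.9, "b": 0.9, "c": 0.1, "d": 0.1, "e": 0.9},
    {"a": 0.1, "b": 0.1, "c": 0.9, "d": 0.9, "e": 0.1},
]
responds = {"a": 1.0, "b": 1.0, "c": 0.0, "d": 0.0}  # feedback metric per user
weights = [0.5, 0.5]
for _ in range(20):
    weights = run_iteration(
        weights, strategies, ["a", "b", "c", "d"], responds.get,
        tradeoff=1.0, rng=rng,
    )

# User "e" was not in the sub-sample; the learned weights set its probability.
final_probs = aggregate_probabilities(weights, strategies, ["e"])
```

In this sketch every supplied strategy takes part in each iteration. Restricting an iteration to a subset of the strategies, as the abstract describes, would amount to passing only that subset and its weights to `run_iteration`.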