Patent attributes
Technologies are provided for generating optimal bidding policies for auctions having unknown dynamics. In some embodiments, a computing system can configure multiple multi-armed bandit (MAB) models that define candidate directed content for a sequence of pages. A particular MAB model of the multiple MAB models defines candidate directed content for a particular page in the sequence, where each arm of the particular MAB model corresponds to a candidate impression on that page. The computing system can then determine a solution to an optimization problem with respect to an objective function based on an expected long-term reward for a defined impression on a first page, a defined impression on a second page, and a defined impression on a third page in the sequence of pages. The solution yields respective directed content for presentation on the first, second, and third pages.
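The arrangement described above, with one MAB model per page and one arm per candidate impression, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the names `PageBandit` and `solve`, the running-mean reward estimate, the discounted-sum objective, and the exhaustive search over arm combinations are all hypothetical stand-ins for whatever the claimed embodiments actually use.

```python
from itertools import product


class PageBandit:
    """Hypothetical MAB model for one page; each arm is a candidate impression."""

    def __init__(self, arm_names):
        self.arm_names = list(arm_names)
        self.counts = [0] * len(self.arm_names)
        self.means = [0.0] * len(self.arm_names)

    def update(self, arm, reward):
        # Incremental running mean of the rewards observed for this arm.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def solve(bandits, discount=0.9):
    """Pick one arm per page to maximize an assumed discounted sum of estimated rewards.

    The discounted sum is an illustrative stand-in for an objective based on
    expected long-term reward; the search is exhaustive over all combinations.
    """
    best_value, best_choice = float("-inf"), None
    for choice in product(*(range(len(b.arm_names)) for b in bandits)):
        value = sum(
            discount ** i * b.means[a]
            for i, (b, a) in enumerate(zip(bandits, choice))
        )
        if value > best_value:
            best_value, best_choice = value, choice
    impressions = [b.arm_names[a] for b, a in zip(bandits, best_choice)]
    return impressions, best_value
```

With this particular objective the pages decouple, so the exhaustive search gives the same answer as taking the best arm on each page independently. The joint search is kept only to show where an objective that couples impressions across pages would plug in.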