An approach for spending allocation is provided. The approach is executed by one or more processors and provides one or more monetary output values in response to a request to determine spending allocation in a digital marketing channel. The approach fits one or more models to train a business environment simulator. The approach generates a supervised learning policy and evolves it into a distribution estimator policy by adjusting the network weights of the supervised learning policy. The approach then generates an optimized policy by evolving the distribution estimator policy through interaction with the business environment simulator. It determines a profit uplift of the optimized policy by comparing the optimized policy with the supervised learning policy. In response to the optimized policy outperforming the supervised learning policy, the approach deploys the optimized policy in a live environment.
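The sequence of steps in the abstract (fit a simulator, imitate past decisions, widen the policy into a distribution, evolve it against the simulator, measure uplift, then gate deployment) can be illustrated with a minimal, self-contained Python sketch. Everything concrete in it is a hypothetical assumption, not the patented method. The synthetic history, the log-response revenue model standing in for the business environment simulator, and the linear spend rule standing in for a network are all illustrative choices. So are the extra spread weight `s` that turns the supervised policy into a distribution estimator and the simple (1+1) evolution strategy with a fixed step size used for the evolution step.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical synthetic history for one channel: (demand signal d, spend x, revenue r).
# Revenue is generated as r = 3 * d * log(1 + x) purely for illustration.
HISTORY: List[Tuple[float, float, float]] = [
    (d, 4.0 * d, 3.0 * d * math.log1p(4.0 * d)) for d in (1.0, 2.0, 3.0)
]


def fit_simulator(history: List[Tuple[float, float, float]]) -> Callable[[float, float], float]:
    """Fit revenue ~ b * d * log(1 + x) by least squares and return a profit simulator."""
    feats = [d * math.log1p(x) for d, x, _ in history]
    b = sum(f * r for f, (_, _, r) in zip(feats, history)) / sum(f * f for f in feats)
    # Simulated profit = modelled revenue minus spend.
    return lambda d, x: b * d * math.log1p(x) - x


def fit_supervised_policy(history: List[Tuple[float, float, float]]) -> List[float]:
    """Least-squares line spend = w0 + w1 * d that imitates historical spend decisions."""
    n = len(history)
    mean_d = sum(d for d, _, _ in history) / n
    mean_x = sum(x for _, x, _ in history) / n
    cov = sum((d - mean_d) * (x - mean_x) for d, x, _ in history)
    var = sum((d - mean_d) ** 2 for d, _, _ in history)
    w1 = cov / var
    return [mean_x - w1 * mean_d, w1]


def to_distribution_estimator(weights: Sequence[float]) -> List[float]:
    """Append a spread weight s (std = |s|); s = 0 reproduces the supervised policy exactly."""
    return list(weights) + [0.0]


def expected_profit(weights: Sequence[float], simulator, demands, noise) -> float:
    """Monte Carlo profit estimate for spend ~ max(0, N(w0 + w1*d, s^2)).

    A fixed noise sample (common random numbers) keeps the estimate deterministic.
    """
    w0, w1, s = weights
    total = 0.0
    for d in demands:
        for z in noise:
            total += simulator(d, max(0.0, w0 + w1 * d + abs(s) * z))
    return total / (len(demands) * len(noise))


def evolve(weights: Sequence[float], fitness, iters: int = 300, step: float = 0.1, seed: int = 0):
    """(1+1) evolution strategy: keep a Gaussian perturbation only if it improves fitness."""
    rng = random.Random(seed)
    best, best_fit = list(weights), fitness(weights)
    for _ in range(iters):
        cand = [w + rng.gauss(0.0, step) for w in best]
        cand_fit = fitness(cand)
        if cand_fit > best_fit:
            best, best_fit = cand, cand_fit
    return best


simulator = fit_simulator(HISTORY)
supervised = fit_supervised_policy(HISTORY)
demands = [d for d, _, _ in HISTORY]
noise_rng = random.Random(42)
noise = [noise_rng.gauss(0.0, 1.0) for _ in range(64)]


def fitness(weights: Sequence[float]) -> float:
    """Expected simulated profit of a distribution-estimator policy."""
    return expected_profit(weights, simulator, demands, noise)


optimized = evolve(to_distribution_estimator(supervised), fitness)
# Profit uplift: optimized policy versus the supervised policy, both scored on the simulator.
uplift = fitness(optimized) - fitness(to_distribution_estimator(supervised))
deploy = uplift > 0  # deploy only if the optimized policy outperforms the supervised one
print(f"supervised={supervised} optimized={[round(w, 2) for w in optimized]} "
      f"uplift={uplift:.3f} deploy={deploy}")
```

Because the spread weight starts at zero, the distribution estimator initially scores exactly like the supervised policy. The evolution step only accepts strict improvements, so in this sketch the simulated uplift can never be negative, and the deployment gate reduces to checking whether any improvement was found.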