Patent attributes
Systems and methods are described for training a machine learning model to make a series of sequential decisions, in which the results of previous decisions are known prior to the next decision in the sequence being made. A safe reinforcement learning model estimates the results of choosing various options for a first decision in the sequence, and further estimates the amount of information that will be gained by choosing each of the options. The estimated information gain associated with each option is then used to forecast how the remaining decisions in the sequence would be improved by using the gained information to improve the prediction model and make better decisions. The safe reinforcement learning model further incorporates decision constraints provided by subject matter experts, which may set requirements for the selection such as a minimum required result and allow the safe reinforcement learning model to explore options within those constraints.