Systems and methods for controlling drilling operations are provided. A controller for a drilling system may provide drilling parameters, such as weight-on-bit and rotation rate, to the drilling system based on a machine-learned reward policy and a model-based prediction. The machine-learned reward policy may be generated during drilling operations and used to modify the values recommended by the model-based prediction for subsequent drilling operations, so as to achieve a desired rate-of-penetration.
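
The interaction between the model-based prediction and the learned reward policy can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the disclosure: the linear model ROP = K_MODEL * wob * rpm, the coefficient value, the fixed rotation rate, the set of multiplicative weight-on-bit adjustments, the reward defined as the negative error against the desired rate-of-penetration, and all names (`model_based_prediction`, `RewardPolicy`, `control_step`). The policy shown is a simple tabular one that tries each adjustment once and then acts greedily; an actual controller could use any learned policy.

```python
from dataclasses import dataclass, field

K_MODEL = 0.01  # assumed coefficient of the illustrative model ROP = K_MODEL * wob * rpm


@dataclass
class DrillingParams:
    wob: float  # weight-on-bit (units are an assumption)
    rpm: float  # rotation rate


def model_based_prediction(target_rop: float, rpm: float = 100.0) -> DrillingParams:
    """Invert the assumed model for weight-on-bit at a fixed rotation rate."""
    return DrillingParams(wob=target_rop / (K_MODEL * rpm), rpm=rpm)


@dataclass
class RewardPolicy:
    """Hypothetical tabular policy over multiplicative weight-on-bit adjustments."""
    scales: tuple = (0.8, 1.0, 1.25)
    value: dict = field(default_factory=dict)  # running mean reward per scale
    count: dict = field(default_factory=dict)  # number of trials per scale

    def choose(self) -> float:
        # Try every untested scale once, then pick the best mean reward.
        for s in self.scales:
            if s not in self.count:
                return s
        return max(self.scales, key=lambda s: self.value[s])

    def update(self, scale: float, reward: float) -> None:
        # Incremental update of the mean reward observed for this scale.
        n = self.count.get(scale, 0) + 1
        mean = self.value.get(scale, 0.0)
        self.count[scale] = n
        self.value[scale] = mean + (reward - mean) / n


def control_step(policy: RewardPolicy, target_rop: float, observe_rop):
    """One control cycle: recommend, adjust, apply, observe, learn."""
    base = model_based_prediction(target_rop)
    scale = policy.choose()
    applied = DrillingParams(wob=base.wob * scale, rpm=base.rpm)
    rop = observe_rop(applied)  # measured rate-of-penetration
    policy.update(scale, -abs(rop - target_rop))  # reward: closeness to target
    return applied, rop
```

In this sketch, `observe_rop` stands in for the measurement taken while drilling. If the formation responds less strongly than the assumed model predicts, the policy learns during the first few cycles to scale the recommended weight-on-bit upward, and later cycles apply that correction to the model's recommendation.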