A computation graph of a machine learning model is accessed from memory and a constraint solver is used to compute a partition of the computation graph into ordered stages of an execution pipeline. In use, when inference or training of the machine learning model takes place by executing the pipeline, execution cost of the stages are balanced according to the computed partition.