Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements comprising a portion of a neural network accelerator performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Each compute element is enabled to execute instructions in accordance with an ISA. The ISA is enhanced in accordance with improvements with respect to deep learning acceleration.