Disclosed are systems and methods for performing neural architecture search (“NAS”) that automatically optimizes the number of channels allocated to each layer of a deep neural network. Some implementations use pairwise slimming that includes a global optimization step. In some implementations, channel path selection during training may also be biased toward a region of interest.