Compressing a machine learning network, such as a neural network, includes replacing one layer in the neural network with compressed layers to produce the compressed network. The compressed network may be fine-tuned by updating weight values in the compressed layer(s).