An example device for coding (encoding or decoding) video data includes: a memory configured to store video data; and one or more processors implemented in circuitry and configured to: partition a coding unit (CU) of video data into sub-blocks, the sub-blocks being arranged into a number of rows and a number of columns, the number of rows being greater than 1 and the number of columns being greater than 1; form intra-prediction blocks for each of the sub-blocks; and code the CU using the intra-prediction blocks.
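The partition, predict, and code steps can be illustrated with a minimal sketch. It makes several assumptions that the text above does not specify. Sub-blocks are equal-sized and processed in raster order. DC prediction, a simple stand-in for whatever intra mode a real coder would select, uses only already reconstructed samples inside the CU. Residuals are kept losslessly, with no transform or quantization. The names `partition_cu`, `dc_predict`, `encode_cu`, and `decode_cu` are hypothetical and are not taken from any codec API.

```python
from dataclasses import dataclass
from typing import List

Block = List[List[int]]


@dataclass(frozen=True)
class SubBlock:
    row: int     # row index of the sub-block in the grid
    col: int     # column index of the sub-block in the grid
    y: int       # top-left sample row within the CU
    x: int       # top-left sample column within the CU
    height: int
    width: int


def partition_cu(cu_height: int, cu_width: int, num_rows: int, num_cols: int) -> List[SubBlock]:
    """Split a CU into a num_rows x num_cols grid of equal sub-blocks, in raster order."""
    if num_rows < 2 or num_cols < 2:
        raise ValueError("the grid must have more than one row and more than one column")
    if cu_height % num_rows or cu_width % num_cols:
        raise ValueError("assumed: CU dimensions divide evenly into the grid")
    h, w = cu_height // num_rows, cu_width // num_cols
    return [SubBlock(r, c, r * h, c * w, h, w)
            for r in range(num_rows) for c in range(num_cols)]


def dc_predict(recon: Block, sb: SubBlock, bit_depth: int = 8) -> Block:
    """DC intra prediction from reconstructed samples above and to the left of sb.

    Simplification (assumed): only samples inside the CU are used; when no
    neighbors are available, the mid-level value 1 << (bit_depth - 1) is predicted.
    """
    refs: List[int] = []
    if sb.y > 0:
        refs += [recon[sb.y - 1][x] for x in range(sb.x, sb.x + sb.width)]
    if sb.x > 0:
        refs += [recon[y][sb.x - 1] for y in range(sb.y, sb.y + sb.height)]
    dc = (sum(refs) + len(refs) // 2) // len(refs) if refs else 1 << (bit_depth - 1)
    return [[dc] * sb.width for _ in range(sb.height)]


def _reconstruct(recon: Block, sb: SubBlock, pred: Block, res: Block) -> None:
    # Write prediction + residual back so later sub-blocks can predict from it.
    for i in range(sb.height):
        for j in range(sb.width):
            recon[sb.y + i][sb.x + j] = pred[i][j] + res[i][j]


def encode_cu(cu: Block, num_rows: int, num_cols: int) -> List[Block]:
    """Return one residual block per sub-block (lossless: no transform or quantization)."""
    height, width = len(cu), len(cu[0])
    recon = [[0] * width for _ in range(height)]
    residuals = []
    for sb in partition_cu(height, width, num_rows, num_cols):
        pred = dc_predict(recon, sb)
        res = [[cu[sb.y + i][sb.x + j] - pred[i][j] for j in range(sb.width)]
               for i in range(sb.height)]
        _reconstruct(recon, sb, pred, res)
        residuals.append(res)
    return residuals


def decode_cu(residuals: List[Block], height: int, width: int,
              num_rows: int, num_cols: int) -> Block:
    """Rebuild the CU by repeating the encoder's predictions in the same order."""
    recon = [[0] * width for _ in range(height)]
    for sb, res in zip(partition_cu(height, width, num_rows, num_cols), residuals):
        _reconstruct(recon, sb, dc_predict(recon, sb), res)
    return recon
```

For an 8x8 CU split into a 2x2 grid, `decode_cu(encode_cu(cu, 2, 2), 8, 8, 2, 2)` returns the original samples, because the decoder forms the same intra-prediction blocks from the same reconstructed neighbors. In a practical coder the residuals would be transformed and quantized, so the reconstruction would only approximate the input.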