A memory holds D rows of N words and receives an address having log2 D bits and an extra bit. Each of N processing units (PU) of index J has first and second registers, an accumulator, an arithmetic unit that performs an operation thereon to accumulate a result, and multiplexing logic receiving memory word J, and for PUs 0 to (N/2)−1 also memory word J+(N/2). In a first mode, the multiplexing logic of PUs 0 to N−1 selects word J to output to the first register. In a second mode: when the extra bit is a zero, the multiplexing logic of PUs 0 to (N/2)−1 selects word J to output to the first register, and when the extra bit is a one, the multiplexing logic of PUs 0 through (N/2)−1 selects word J+(N/2) to output to the first register.