Patent attributes
Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, energy efficiency, and cost. In a first embodiment, a scaled array of processing elements is implementable with varying dimensions of the processing elements to enable varying price/performance systems. In a second embodiment, an array of clusters communicates via high-speed serial channels. The array and the channels are implemented on a Printed Circuit Board (PCB). Each cluster comprises respective processing and memory elements. Each cluster is implemented via a plurality of 3D-stacked dice, 2.5D-stacked dice, or both in a Ball Grid Array (BGA). A processing portion of the cluster is implemented via one or more Processing Element (PE) dice of the stacked dice. A memory portion of the cluster is implemented via one or more High Bandwidth Memory (HBM) dice of the stacked dice.