Patent 11886934 was granted and assigned to Graphcore Limited on January, 2024 by the United States Patent and Trademark Office.
A data processing system comprising a plurality of processing nodes, each comprising at least one memory configured to store an array of data items, wherein each of the plurality of processing nodes is configured to execute compute instructions during a compute phase and following a precompiled synchronisation barrier, enter at least one exchange phase. During the at least one exchange phase, a series of collective operations are carried out. Each processing node is configured to perform a reduce scatter collective in at least one first dimension. Using the results of the reduce scatter collective, each processing node performs an allreduce in a second dimension. The processing nodes then perform an all-gather collective in the at least one first dimension using the results of the allreduce.