Multiprocessor systems and methods are disclosed. One embodiment may comprise a plurality of processor cores. A given processor core may be operative to generate a request for desired data in response to a cache miss at a local cache. A shared cache structure may provide at least one speculative data fill and a coherent data fill of the desired data to at least one of the plurality of processor cores in response to a request from the at least one processor core. A processor scoreboard arbitrates the requests for the desired data. A speculative data fill of the desired data is provided to the at least one processor core. The coherent data fill of the desired data may be provided to the at least one processor core in a determined order.