7A is a block diagram of the arbiter within the L1 cache, SM, and PE of FIG. 3A, according to one embodiment of the present invention;

FIG. 7B is a flow diagram of method steps for the L1 cache arbitration, according to one embodiment of the present invention;

FIG. 8A is a block diagram of the replay unit within the SM and the arbiter and PRT within the L1 cache, according to one embodiment of the pres