on and data storage units, receives instructions and data, and executes instructions, wherein there are N² PEs, N communicating ALU trees, and N root tree processors for an N array structure, where N is a positive integer; and wherein each communicating ALU tree connects to N PEs at leaf nodes of the tree and one root tree processor which connects to a root of the tree providing results to a Host int