Affecting the execution of the parallel program may comprise, for example, providing load balancing functionality to reduce the number of processor cycles wasted while waiting for all tasks of a group, i.e. a job, to return a synchronization operation call.
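
As a purely illustrative sketch, and not a limitation of the embodiments, the effect of such load balancing on wasted cycles may be modelled as follows. The model assumes that every task of a job idles from the moment it reaches the synchronization operation until the slowest task of the job reaches it, and that a fixed pool of processor capacity can be divided among the tasks in arbitrary fractions. The function names `wasted_cycles`, `proportional_shares` and `arrival_times`, and the workload figures, are hypothetical and do not appear in the description above.

```python
from fractions import Fraction


def wasted_cycles(arrivals):
    """Cycles spent idle at the synchronization point.

    Each task waits from its own arrival until the latest arrival in the job.
    """
    latest = max(arrivals)
    return sum(latest - t for t in arrivals)


def proportional_shares(work):
    """Assumed balancing policy: capacity share proportional to remaining work."""
    total = sum(work)
    return [Fraction(w, total) for w in work]


def arrival_times(work, shares):
    """Time for each task to reach the synchronization point.

    The pool is assumed to hold one processor-equivalent per task, so an
    equal share of 1/n reproduces the unbalanced case (time == work).
    """
    capacity = len(work)
    return [Fraction(w) / (s * capacity) for w, s in zip(work, shares)]


if __name__ == "__main__":
    work = [100, 200, 300, 400]  # hypothetical cycles of work per task

    # Without balancing, each task runs at unit speed and arrives at time == work.
    print(wasted_cycles(work))  # 600

    # With proportional shares, all tasks arrive together and nothing is wasted.
    balanced = arrival_times(work, proportional_shares(work))
    print(wasted_cycles(balanced))  # 0
```

In this toy model the job still completes at cycle 250 rather than 400, because capacity that would otherwise idle at the synchronization point is given to the slower tasks. A real system may only be able to approximate such shares, for example through task priorities.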