The joint processing of audio and video images captured at frame rate enables applications such as visual identification of noise sources, beamforming, and noise suppression in video conferencing, provided that the difference in position between the audio and video cameras can be accounted for.