The joint processing of and captured frame-rate audio and video images enables applications such as visual identification of noise sources, beamforming and noise-suppression in video conferencing and others, by accounting for the spatial differences in the location of the audio and the video cameras.