As a matter of fact, I think it is now fairly standard practice to use MapReduce, potentially accelerated by a column-oriented database, for this kind of data analysis, particularly if the data is already in the 'cloud', as it is in Yahoo's and Google's data centers.
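For anyone unfamiliar with the model, here is a minimal single-process sketch of the MapReduce pattern in Python. It assumes a hypothetical log format of `"<user> <url>"` per record and counts page views per URL. A real deployment (e.g. Hadoop) distributes the map, shuffle, and reduce steps across many machines and reads from distributed or column-oriented storage, none of which is modeled here.

```python
from collections import defaultdict
from typing import Iterable, Iterator, Tuple


def map_phase(record: str) -> Iterator[Tuple[str, int]]:
    # Hypothetical record format "<user> <url>": emit (url, 1) per page view.
    _user, url = record.split()
    yield url, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> dict[str, list[int]]:
    # Group all emitted values by key, as the framework would between phases.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> Tuple[str, int]:
    # Combine every value seen for one key into a single result.
    return key, sum(values)


def map_reduce(records: Iterable[str]) -> dict[str, int]:
    mapped = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    logs = ["alice /home", "bob /about", "alice /about", "carol /home", "bob /home"]
    print(map_reduce(logs))
```

Because each `map_phase` call looks at one record and each `reduce_phase` call looks at one key, both steps can be parallelized independently, which is what makes the model attractive when the data already lives across a cluster.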