The system of claim 1, wherein the cluster estimation tool estimates the first number of clusters based on the equation ##EQU2##, where CE is the estimated first number of clusters, C is a clustering constant, N is a number of records in the database, F is a number of fields in the database, PS is a percentage sampling rate of records in the database, PN is a percentage of overall data in the datab