"This invention teaches how one can learn the location of the largest subsets and/or one or more significant subsets within a collection of documents where the distinguishing characteristic is a part of the semantic content of the document." . . . .