. . "This step segments a series of characters into tokens. (words, symbols, punctuation, etc.) Rules are applied in an attempt to normalize such things as phone numbers, email addresses, hyphenated words, etc." . . . .