Various methods are available for collecting the data on which a language model is built. For example, a large amount of natural human speech may be recorded and transcribed; a corpus may be collected from documents such as newspaper and magazine articles and from recording media such as tapes, videos, CDs, DVDs, and the like; and sentences from mail and chat may be c . . .