Conversion engine 22 analyzes the data from flat file representation 38 and, based on factors such as position, inter-character spacing, font characteristics, related geometric shapes, color, and the like, groups the characters of flat file representation 38 into words and groups of words in GPML representation 40.
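One way such grouping could work is sketched below. It is a minimal illustration under stated assumptions, not the method of conversion engine 22. The `Glyph` record, the functions `group_into_lines`, `split_into_words` and `group_words`, and the thresholds `baseline_tol` and `gap_ratio` are all hypothetical. The sketch uses only three of the factors listed above: position (a shared baseline), inter-character spacing (a gap larger than a fraction of the font size starts a new word), and font characteristics (a change of font or size starts a new word). Geometric shapes and color are ignored, and "groups of words" are modeled simply as text lines.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Glyph:
    """One positioned character from a flat-file page (hypothetical record)."""
    text: str
    x: float      # left edge
    y: float      # baseline; assumed to grow down the page
    width: float  # advance width
    font: str
    size: float   # font size, in the same units as x and y


def group_into_lines(glyphs: List[Glyph], baseline_tol: float = 0.2) -> List[List[Glyph]]:
    """Cluster glyphs whose baselines lie within baseline_tol * font size."""
    lines: List[List[Glyph]] = []
    for g in sorted(glyphs, key=lambda g: g.y):
        if lines and abs(g.y - lines[-1][0].y) <= baseline_tol * g.size:
            lines[-1].append(g)
        else:
            lines.append([g])
    # Within each line, order glyphs left to right.
    return [sorted(line, key=lambda g: g.x) for line in lines]


def split_into_words(line: List[Glyph], gap_ratio: float = 0.25) -> List[str]:
    """Start a new word on a large horizontal gap or on a font change."""
    words: List[str] = []
    current = ""
    prev: Optional[Glyph] = None
    for g in line:
        if not g.text.strip():
            continue  # ignore explicit space glyphs; word breaks come from geometry
        if prev is not None:
            gap = g.x - (prev.x + prev.width)
            if gap > gap_ratio * prev.size or g.font != prev.font or g.size != prev.size:
                words.append(current)
                current = ""
        current += g.text
        prev = g
    if current:
        words.append(current)
    return words


def group_words(glyphs: List[Glyph]) -> List[List[str]]:
    """Return one list of words per detected line."""
    return [split_into_words(line) for line in group_into_lines(glyphs)]
```

For example, glyphs spelling "Hi" and "there" on one baseline, separated by a gap wider than a quarter of the font size, with "ok" on a lower baseline, yield `[["Hi", "there"], ["ok"]]`. Tying the gap threshold to font size, rather than to a fixed distance, keeps the same rule usable for both small body text and large headings.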