Additionally, in this embodiment, it controls, for example, the following operations: input text data is converted into voice data; the obtained voice data is associated with image data appropriate for the text data; and a file including the image data and the voice data (hereinafter also referred to as a "voice-attached file") is created.
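The three operations described above (text-to-voice conversion, association with an image, and file creation) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the embodiment: the function names, the keyword-based image selection, the `VAF1` tag, and the length-prefixed layout are all hypothetical, and `synthesize_voice` is only a stand-in that returns the UTF-8 bytes of the text where an actual speech synthesizer would be used.

```python
import struct
from typing import Callable, Dict, Tuple

# Hypothetical 4-byte tag marking the container; not defined by the source.
MAGIC = b"VAF1"


def synthesize_voice(text: str) -> bytes:
    """Stand-in for a text-to-speech engine (assumption).

    Returns the UTF-8 bytes of the text instead of real audio samples.
    """
    return text.encode("utf-8")


def select_image(text: str, catalog: Dict[str, bytes], default: bytes) -> bytes:
    """Return the first catalog image whose keyword occurs in the text.

    Keyword matching is an assumed selection rule; the source only says
    the image is "appropriate for the text data".
    """
    lowered = text.lower()
    for keyword, image in catalog.items():
        if keyword in lowered:
            return image
    return default


def create_voice_attached_file(
    text: str,
    catalog: Dict[str, bytes],
    default_image: bytes,
    tts: Callable[[str], bytes] = synthesize_voice,
) -> bytes:
    """Convert text to voice, pair it with an image, and pack both."""
    voice = tts(text)
    image = select_image(text, catalog, default_image)
    # Header: tag, then big-endian lengths of the image and voice payloads.
    header = MAGIC + struct.pack(">II", len(image), len(voice))
    return header + image + voice


def split_voice_attached_file(blob: bytes) -> Tuple[bytes, bytes]:
    """Recover (image, voice) from a blob built by the function above."""
    if blob[:4] != MAGIC:
        raise ValueError("not a voice-attached file in the assumed layout")
    image_len, voice_len = struct.unpack(">II", blob[4:12])
    image = blob[12:12 + image_len]
    voice = blob[12 + image_len:12 + image_len + voice_len]
    return image, voice
```

In this sketch the image and the voice are kept together in one byte sequence, so a reader of the file can recover both from the header lengths alone. A real implementation would presumably use an established container format and a real synthesizer in place of these placeholders.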