Drawing a suspect's face based solely on eyewitness descriptions is a difficult task. Although state-of-the-art methods exist for generating images from text, there is little research on generating face images from text and almost none on generating sketches from text; as a result, no dataset was available for this task. We developed a text-to-sketch dataset derived from the CelebA dataset, which comprises approximately 200,000 celebrity images, thereby enabling investigation of the novel task of generating police sketches from textual descriptions (an illustrative photo-to-sketch conversion is sketched below). Furthermore, we demonstrated that applying AttnGAN to sketch generation effectively captures the facial features described in the text. Through experiments with various recurrent neural network types and embedding sizes, we identified the optimal configuration for AttnGAN and its variants, and we reported commonly used metrics, namely the Inception Score and the Fréchet Inception Distance (FID, recalled below), for the two attention-based state-of-the-art models we obtained.

However, we also identified areas for improvement. Experiments on a new dataset of 200 sketch images from Beijing Normal University revealed that the model struggles with longer sentences and with unfamiliar terms in descriptions; its failure to capture features from such text reduces image diversity and realism, degrading the overall performance of the model. Future work could explore alternative models such as StackGAN, conditional GAN, DCGAN, and StyleGAN, which are known for their face image generation capabilities. Simplifying the architecture while maintaining performance could also enable deployment on mobile devices for real-world use.
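To make the dataset construction concrete, the following is a minimal sketch of one plausible photo-to-sketch conversion using the classic "color dodge" pencil-sketch technique in OpenCV. The choice of OpenCV and the file paths shown are illustrative assumptions, not necessarily the exact pipeline used to build our dataset.

```python
import cv2
import numpy as np

def photo_to_pencil_sketch(image_path: str) -> np.ndarray:
    """Convert a face photo to a pencil-sketch-style image using the
    color-dodge technique (illustrative; not the paper's exact pipeline)."""
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    inverted = 255 - gray
    blurred = cv2.GaussianBlur(inverted, (21, 21), 0)
    # Color dodge: divide the grayscale image by the inverted blur,
    # which washes out flat regions while preserving strong edges.
    return cv2.divide(gray, 255 - blurred, scale=256)

# Hypothetical usage on a CelebA image:
# sketch = photo_to_pencil_sketch("celeba/img_align_celeba/000001.jpg")
# cv2.imwrite("sketches/000001.png", sketch)
```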
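For reference, the FID reported above compares the statistics of real and generated images in an Inception feature space. With means $\mu_r, \mu_g$ and covariances $\Sigma_r, \Sigma_g$ of the Inception activations for real and generated images, the standard formulation is

\[
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right),
\]

where lower values indicate that the feature statistics of the generated images are closer to those of the real images.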