Ravisankar, Priyadharsini
Unknown Affiliation

Published: 1 document
Articles


FaceSynth: text-to-face generation using CLIP and its variants with generative adversarial networks
Ravisankar, Priyadharsini; Dhanvanth, Shruthi; Jenane Padmanabhan, Vaishnave
IAES International Journal of Artificial Intelligence (IJ-AI) Vol 14, No 5: October 2025
Publisher: Institute of Advanced Engineering and Science

DOI: 10.11591/ijai.v14.i5.pp3588-3598

Abstract

In recent years, there have been massive developments in the field of generative AI, especially in generative adversarial networks (GANs). GANs generate original images that have not been seen during training, and the architecture has undergone several advances such as StyleGAN, StyleGAN2, and StyleGAN2 with adaptive discriminator augmentation (StyleGAN2-ADA). Contrastive language-image pre-training (CLIP), by OpenAI, is a vision-language model trained to associate texts with images. Recently, new CLIP variants were developed, such as metadata-curated language-image pre-training (MetaCLIP), released by Facebook and trained on a larger dataset, and Multilingual-CLIP, which adapts CLIP to multiple languages. We compare CLIP and its variants in text-to-face synthesis using a custom StyleGAN2-ADA model and a pre-trained StyleGAN2 model. Our training-free algorithm starts with an initial image latent code that is iteratively manipulated to match a given text description. It achieves this by minimizing the distance between the text and image embeddings in the multi-modal embedding space of the CLIP models. An examination of CLIP and its variants showed that MetaCLIP outperformed its competitors in LPIPS similarity and in the semantic closeness of the synthesized image to the given prompt. CLIP produced the most realistic images, achieving the best FID score, while Multilingual-CLIP offered a choice of input text language and generated images of reasonable quality.
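
The training-free loop the abstract describes can be sketched in a few lines of PyTorch. This is a minimal illustration, not the authors' implementation: it assumes a pre-trained StyleGAN2 generator `G` that maps a latent code to an RGB image in [-1, 1] and OpenAI's `clip` package; the function name `generate_face`, the hyperparameters, and the choice of the ViT-B/32 encoder are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F
import clip  # OpenAI's CLIP package: pip install git+https://github.com/openai/CLIP.git

def generate_face(G, prompt, latent_dim=512, steps=300, lr=0.05, device="cuda"):
    """Optimize a StyleGAN2 latent code so the generated face matches `prompt`."""
    model, _ = clip.load("ViT-B/32", device=device)
    model = model.float()  # avoid fp16/fp32 dtype mismatches during backprop
    text = clip.tokenize([prompt]).to(device)
    with torch.no_grad():
        text_emb = F.normalize(model.encode_text(text), dim=-1)

    # Training-free: only the latent code is optimized, never the networks.
    z = torch.randn(1, latent_dim, device=device, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)

    # CLIP's published input normalization constants.
    mean = torch.tensor([0.48145466, 0.4578275, 0.40821073],
                        device=device).view(1, 3, 1, 1)
    std = torch.tensor([0.26862954, 0.26130258, 0.27577711],
                       device=device).view(1, 3, 1, 1)

    for _ in range(steps):
        img = G(z)  # assumed shape (1, 3, H, W), RGB in [-1, 1]
        img = F.interpolate((img + 1) / 2, size=(224, 224),
                            mode="bilinear", align_corners=False)
        img = (img - mean) / std
        img_emb = F.normalize(model.encode_image(img), dim=-1)
        loss = 1.0 - (img_emb * text_emb).sum()  # cosine distance in CLIP's joint space
        opt.zero_grad()
        loss.backward()
        opt.step()

    with torch.no_grad():
        return G(z)
```

Because the loss is simply a distance in the joint text-image embedding space, the same loop works for any of the compared encoders: swapping CLIP for MetaCLIP or Multilingual-CLIP changes the loss landscape but not the procedure.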