Bulletin of Electrical Engineering and Informatics
Vol 15, No 2: April 2026

DeepFloyd-IF via diffusion and U-Net based cross-model attention for semantic coherence

Veilumuthu, Kowsalya
Chandrasekar, Divya
Parvathi, Sakthidevi Shunmugalingam



Article Info

Publish Date
01 Apr 2026

Abstract

Text-to-image synthesis is an increasingly challenging problem in artificial intelligence, with impact on gaming, advertising, and multimedia. The practical use of current text-to-image models is limited by a trade-off between semantic coherence and visual quality. To address this limitation, this work presents stable diffusion cross-modal attention with multi-head attention (SD-CMA-MHA), a framework for the DeepFloyd-IF task. The framework combines stable diffusion with U-Net-based cross-modal attention and multi-head attention (MHA) to improve DeepFloyd-IF, a standard for high-quality image synthesis. This combination allows the model to capture subtle semantic relationships between text and images while dynamically focusing on relevant input features. Experiments on the LAION-1.2B and MS-COCO datasets show that the model achieves 80% generation accuracy, 70% text-image alignment similarity, and reduced divergence from real images, outperforming previous methods. These results indicate that SD-CMA-MHA improves semantic alignment and fidelity. By enabling more reliable and context-aware visual generation, this work bridges the gap between text and visual modalities and has implications for creative industries, education, and human-computer interaction.
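
The abstract does not give implementation details, so the following is only a minimal sketch of generic multi-head cross-modal attention, not the authors' SD-CMA-MHA code. It assumes that image-side features (for example, flattened U-Net feature-map tokens) act as queries and that text-encoder embeddings act as keys and values. The function name `cross_modal_mha`, the weight matrices, and all dimensions are hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_modal_mha(image_tokens, text_tokens, w_q, w_k, w_v, w_o, num_heads):
    """Scaled dot-product cross-attention: image tokens attend to text tokens.

    image_tokens: (n_img, d_model)  hypothetical U-Net feature tokens (queries)
    text_tokens:  (n_txt, d_text)   hypothetical text embeddings (keys/values)
    w_q: (d_model, d_model), w_k and w_v: (d_text, d_model), w_o: (d_model, d_model)
    """
    n_img, d_model = image_tokens.shape
    n_txt = text_tokens.shape[0]
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads

    # Project each modality into the shared attention space.
    q = image_tokens @ w_q
    k = text_tokens @ w_k
    v = text_tokens @ w_v

    # Split into heads: (num_heads, n_tokens, d_head).
    q = q.reshape(n_img, num_heads, d_head).transpose(1, 0, 2)
    k = k.reshape(n_txt, num_heads, d_head).transpose(1, 0, 2)
    v = v.reshape(n_txt, num_heads, d_head).transpose(1, 0, 2)

    # Each image token gets a distribution over text tokens, per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)      # (num_heads, n_img, n_txt)

    # Weighted sum of text values, then merge heads and project out.
    context = weights @ v                   # (num_heads, n_img, d_head)
    merged = context.transpose(1, 0, 2).reshape(n_img, d_model)
    return merged @ w_o, weights


# Toy usage with random inputs (sizes are arbitrary assumptions).
rng = np.random.default_rng(0)
image_tokens = rng.normal(size=(16, 32))   # 16 image tokens, d_model = 32
text_tokens = rng.normal(size=(7, 24))     # 7 text tokens, d_text = 24
w_q = rng.normal(size=(32, 32))
w_k = rng.normal(size=(24, 32))
w_v = rng.normal(size=(24, 32))
w_o = rng.normal(size=(32, 32))
out, attn = cross_modal_mha(image_tokens, text_tokens, w_q, w_k, w_v, w_o, num_heads=4)
print(out.shape, attn.shape)
```

In this sketch each head produces its own attention map over the text tokens, which is one plausible reading of "dynamically focusing on relevant input features"; how the paper actually integrates these maps into the diffusion U-Net is not stated in the abstract.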

Copyrights © 2026






Journal Info

Abbrev

EEI

Subject

Electrical & Electronics Engineering

Description

Bulletin of Electrical Engineering and Informatics (Buletin Teknik Elektro dan Informatika) ISSN: 2089-3191, e-ISSN: 2302-9285 is open to submission from scholars and experts in the wide areas of electrical, electronics, instrumentation, control, telecommunication and computer engineering from the ...