Bulletin of Electrical Engineering and Informatics

Vol 15, No 2: April 2026

Veilumuthu, Kowsalya
Chandrasekar, Divya
Parvathi, Sakthidevi Shunmugalingam

Publish Date
01 Apr 2026

Text-to-image synthesis is an increasingly challenging problem in artificial intelligence, with applications in gaming, advertising, and multimedia. The practical use of current text-to-image models is limited by a trade-off between semantic coherence and visual quality. To address this, this work presents stable diffusion cross-modal attention with multi-head attention (SD-CMA-MHA), a framework that extends DeepFloyd-IF. It combines stable diffusion with U-Net-based cross-modal attention and multi-head attention (MHA) to improve DeepFloyd-IF, a standard model for high-quality image synthesis. This allows the model to capture subtle semantic relationships between text and images while dynamically focusing on relevant input features. Experiments on the LAION-1.2B and MS-COCO datasets show that the model achieves 80% generation accuracy, 70% text-image alignment similarity, and reduced divergence from real images, outperforming previous methods. These results indicate that SD-CMA-MHA improves both semantic alignment and fidelity. By enabling more reliable, context-aware visual generation, this work helps bridge the gap between text and visual modalities and has implications for creative industries, education, and human-computer interaction.
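The abstract does not give the exact layout of the cross-modal attention block, so the following is only a minimal sketch of the general technique it names: multi-head cross-attention in which flattened U-Net image features act as queries and text-encoder token embeddings act as keys and values. The function name `multi_head_cross_attention`, the weight matrices, the dimensions, and the head count are all hypothetical illustrations, not the authors' configuration, and the residual connection and normalization a real U-Net block would add are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max along the axis for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_cross_attention(image_feats, text_feats, w_q, w_k, w_v, w_o, num_heads):
    """Hypothetical cross-modal multi-head attention (illustrative only).

    image_feats: (N_img, d_model) flattened U-Net feature map, used as queries.
    text_feats:  (N_txt, d_text) text token embeddings, used as keys and values.
    w_q, w_o: (d_model, d_model); w_k, w_v: (d_text, d_model).
    """
    d_model = w_q.shape[1]
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads

    # Project image features to queries and text tokens to keys/values.
    q = image_feats @ w_q
    k = text_feats @ w_k
    v = text_feats @ w_v

    # Split the model dimension into heads: (num_heads, tokens, d_head).
    def split(x):
        return x.reshape(x.shape[0], num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(q), split(k), split(v)

    # Scaled dot-product attention per head: every image location attends to all text tokens.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, N_img, N_txt)
    weights = softmax(scores, axis=-1)
    heads = weights @ v                                   # (heads, N_img, d_head)

    # Concatenate the heads and apply the output projection.
    concat = heads.transpose(1, 0, 2).reshape(image_feats.shape[0], d_model)
    return concat @ w_o, weights

# Usage with random stand-in data (hypothetical sizes).
rng = np.random.default_rng(0)
d_model, d_text, num_heads = 64, 32, 4
img = rng.standard_normal((16, d_model))  # e.g. a 4x4 feature map, flattened
txt = rng.standard_normal((7, d_text))    # e.g. 7 prompt tokens
w_q = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
w_k = rng.standard_normal((d_text, d_model)) / np.sqrt(d_text)
w_v = rng.standard_normal((d_text, d_model)) / np.sqrt(d_text)
w_o = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)

out, attn = multi_head_cross_attention(img, txt, w_q, w_k, w_v, w_o, num_heads)
print(out.shape, attn.shape)  # (16, 64) (4, 16, 7)
```

Each row of `attn` is a probability distribution over the prompt tokens, which is what lets each spatial location "dynamically focus" on the words most relevant to it; splitting into several heads lets different heads attend to different text-image relationships in parallel.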

Journal Info

Abbrev

EEI

Publisher

Universitas Ahmad Dahlan

Subject

Electrical & Electronics Engineering

Description

Bulletin of Electrical Engineering and Informatics (Buletin Teknik Elektro dan Informatika) ISSN: 2089-3191, e-ISSN: 2302-9285 is open to submission from scholars and experts in the wide areas of electrical, electronics, instrumentation, control, telecommunication and computer engineering from the ...

DeepFloyd-IF via diffusion and U-Net based cross-model attention for semantic coherence