ISTEK
Vol. 14 No. 2 (2025)

Transforming Story Ideas from Images to Text Using Convolutional Neural Networks (CNN) and Generative Pre-trained Transformer (GPT-2)

Rizqullah, Moh Hasbi
Nurlatifah, Eva
Budiawan Zulfikar, Wildan



Article Info

Publish Date
29 Dec 2025

Abstract

The gap between rich visual inspiration and the difficulty of articulating it creatively (writer’s block) remains a major obstacle in the writing process. This study aims to bridge this gap by designing a two-stage deep learning system that provides automated narrative stimuli. The proposed method implements a custom Convolutional Neural Network (CNN) architecture to detect seven classes of natural objects from 4,362 images. The detected object class is then used as the prompt for a fine-tuned Generative Pre-trained Transformer (GPT-2) model, which generates a poetic narrative. Experimental results indicate that the CNN module achieved a peak classification accuracy of 61.96%. Confusion matrix analysis indicates that this limitation stems from high inter-class visual ambiguity rather than from overfitting. Although the GPT-2 module can generate narratives with a BERTScore F1 of up to 0.6455, the primary finding of this study is that overall narrative quality depends heavily on the accuracy of the CNN output, which acts as a critical bottleneck in the system.
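
The two-stage flow and the confusion matrix analysis described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the class names, the 3x3 toy matrix (the study uses seven classes), the prompt wording, and the helper names `accuracy`, `most_confused`, and `build_prompt`. The sketch only shows how overall accuracy and the most frequently confused class pair can be read off a confusion matrix, and how a predicted label might be turned into a text prompt for the language model stage.

```python
from typing import List, Tuple

# Hypothetical class names; the abstract does not list the seven classes.
CLASSES = ["mountain", "forest", "river"]

# Toy confusion matrix (rows = true class, columns = predicted class).
# These counts are invented for illustration, not results from the study.
TOY_CM = [
    [8, 2, 0],
    [3, 6, 1],
    [0, 1, 9],
]


def accuracy(cm: List[List[int]]) -> float:
    """Fraction of samples on the diagonal (correctly classified)."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def most_confused(cm: List[List[int]], top_k: int = 1) -> List[Tuple[int, int, int]]:
    """Return the top_k off-diagonal cells as (true_idx, pred_idx, count)."""
    cells = [
        (cm[i][j], i, j)
        for i in range(len(cm))
        for j in range(len(cm))
        if i != j
    ]
    cells.sort(reverse=True)  # largest misclassification counts first
    return [(i, j, n) for n, i, j in cells[:top_k]]


def build_prompt(label: str) -> str:
    """Turn a predicted class label into a prompt (wording is an assumption)."""
    return f"Write a short poem about a {label}:"


if __name__ == "__main__":
    print(f"accuracy = {accuracy(TOY_CM):.4f}")
    for true_idx, pred_idx, count in most_confused(TOY_CM):
        print(f"{CLASSES[true_idx]} misread as {CLASSES[pred_idx]}: {count} times")
    # A wrong label here yields a prompt about the wrong object, which is
    # why the classifier acts as the bottleneck for narrative quality.
    print(build_prompt(CLASSES[1]))
```

In a full system the prompt string would be passed to the fine-tuned GPT-2 model; that step is omitted here because the abstract gives no details on the generation settings.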

Copyrights © 2025






Journal Info

Abbrev

istek

Subject

Agriculture, Biological Sciences & Forestry; Biochemistry, Genetics & Molecular Biology; Chemical Engineering, Chemistry & Bioengineering; Computer Science & IT; Electrical & Electronics Engineering

Description

ISTEK is a scientific journal that publishes research findings and theoretical studies focusing on the interaction between science, technology, and Islamic values. The journal aims to provide a platform for academics, researchers, and practitioners to share their discoveries and innovations that ...