Generating effective user stories is essential yet time-consuming in software development, especially in large-scale Agile projects. This study evaluates the performance of three Large Language Models (LLMs) in automatically generating user stories: ChatGPT-4.0, DeepSeek, and Gemini 2.5. The objective is to compare their accuracy and precision to determine the most suitable model for automating requirements documentation. Each model generated user stories from seven test prompts spanning various industry domains, and the outputs were evaluated with the BLEU-4, ROUGE-L F1, and METEOR metrics. Results show that while all models produced structurally valid outputs, Gemini 2.5 achieved the highest average score (0.386), surpassing DeepSeek (0.355) and ChatGPT-4.0 (0.348). Gemini 2.5 demonstrated superior consistency, clarity, and semantic completeness. This research contributes a performance benchmark for LLMs in software requirement generation and highlights the practical benefits of LLM-based automation over manual methods, including speed, consistency, and adaptability. Gemini 2.5 is recommended as the optimal model for generating user stories in software engineering contexts.
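To illustrate the kind of scoring used above, the BLEU-4 metric can be sketched in pure Python. This is a minimal sketch, not the paper's actual evaluation pipeline: the `bleu4` function name, whitespace tokenization, and add-one smoothing are assumptions for illustration (standard toolkits such as NLTK or sacrebleu offer more refined smoothing options).

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu4(candidate, reference):
    """Sentence-level BLEU-4 sketch: geometric mean of modified
    1- to 4-gram precisions, times a brevity penalty.
    Uses naive whitespace tokenization and add-one smoothing
    (assumptions; not the paper's exact setup)."""
    cand = candidate.split()
    ref = reference.split()
    log_prec_sum = 0.0
    for n in range(1, 5):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # clipped overlap: each candidate n-gram counts at most as
        # often as it appears in the reference
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        # add-one smoothing so one empty n-gram order does not zero the score
        log_prec_sum += math.log((overlap + 1) / (total + 1))
    # brevity penalty discourages overly short candidates
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(log_prec_sum / 4)
```

A candidate identical to its reference scores 1.0, while unrelated text scores near 0; ROUGE-L F1 and METEOR complement this surface-overlap view with longest-common-subsequence and synonym-aware matching, respectively.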
Copyright © 2025