Automated code repair is an important task in software development for reducing bugs efficiently. This research develops and evaluates a chain-of-thought (CoT) prompting approach to improve the ability of large language models (LLMs) on automated program repair (APR) tasks. CoT prompting guides an LLM to generate step-by-step explanations before producing its final answer, and is therefore expected to improve the accuracy and quality of code repair. Using the QuixBugs dataset, this research evaluates the performance of several LLMs, including DeepSeek-V3 and GPT-4o, under two prompting methods: standard prompting and CoT prompting. The evaluation is based on the average number of plausible patches generated and the estimated token-usage cost. The results show that CoT prompting improves performance over standard prompting for most models. DeepSeek-V3 achieved the highest performance, with an average of 36.6 plausible patches at the lowest cost of $0.006. GPT-4o also performed competitively, with an average of 35.8 plausible patches at a cost of $0.226. These results confirm that CoT prompting is an effective technique for improving LLM reasoning ability on APR tasks.
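To illustrate the difference between the two prompting methods, the following is a minimal sketch in Python, assuming a simple prompt-template style. The buggy `gcd` function mirrors a QuixBugs-style defect, but the prompt wording and helper names are illustrative and are not the authors' exact prompts.

```python
# Minimal sketch (illustrative, not the authors' exact prompts): contrasting a
# standard repair prompt with a chain-of-thought (CoT) repair prompt for one
# QuixBugs-style buggy function.

BUGGY_CODE = '''
def gcd(a, b):
    if b == 0:
        return a
    else:
        return gcd(a % b, b)   # bug: recursive arguments are swapped
'''

def standard_prompt(code: str) -> str:
    # Standard prompting: ask directly for the fixed function,
    # with no intermediate reasoning requested.
    return (
        "Fix the bug in the following Python function and return only "
        f"the corrected code:\n{code}"
    )

def cot_prompt(code: str) -> str:
    # CoT prompting: ask the model to reason step by step about the defect
    # before emitting the patch, which is the core idea of chain-of-thought.
    return (
        "You are repairing a buggy Python function.\n"
        "First, reason step by step: describe what the function should do, "
        "locate the faulty line, and explain why it is wrong.\n"
        "Then output the corrected function.\n"
        f"{code}"
    )

if __name__ == "__main__":
    print(standard_prompt(BUGGY_CODE))
    print(cot_prompt(BUGGY_CODE))
```

In practice, each prompt would be sent to the model under evaluation (e.g., DeepSeek-V3 or GPT-4o), and the returned patch would be checked against the QuixBugs test cases to decide whether it is plausible.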