Found 2 Documents

Evaluating Text Quality of GPT Engine Davinci-003 and GPT Engine Davinci Generation Using BLEU Score
Heryanto, Yayan; Triayudi, Agung
SAGA: Journal of Technology and Information System, Vol. 1 No. 4 (2023): November 2023
Publisher: CV. Media Digital Publikasi Indonesia

DOI: 10.58905/saga.v1i4.213

Abstract

Text generation with Transformer-based language models such as GPT (Generative Pre-trained Transformer) has seen significant progress in natural language processing. In this study, we evaluate text quality using the BLEU (Bilingual Evaluation Understudy) score for two prominent GPT engines: Davinci-003 and Davinci. We generated questions and answers related to Python from internet sources as input data. The BLEU score comparison revealed that Davinci-003 achieved a higher score of 0.035, while Davinci attained a score of 0.021. Response times also differed: Davinci demonstrated an average response time of 4.20 seconds, while Davinci-003 exhibited a slightly longer average of 6.59 seconds. The decision of whether to use Davinci-003 or Davinci for chatbot development should be based on the specific requirements of the project. If text quality is paramount, Davinci-003 is the superior choice due to its higher BLEU score; if faster response times are of greater importance, Davinci may be the more suitable option. Ultimately, the selection should align with the unique needs and objectives of the chatbot development project.
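
Neither abstract states which BLEU implementation was used. The following is a minimal sketch of how such sentence-level scores might be computed, assuming NLTK's sentence_bleu with default 4-gram weights and smoothing; the reference and candidate strings are hypothetical, not taken from the study's data.

# A minimal BLEU-scoring sketch, assuming NLTK's sentence_bleu.
# The example strings are hypothetical; the paper does not publish its data.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Reference answer collected from an internet source (hypothetical).
reference = "use the append method to add an item to a python list".split()
# Candidate answer generated by the model under evaluation (hypothetical).
candidate = "you can add an item to a python list with append".split()

# Smoothing avoids zero scores when a higher-order n-gram never matches,
# which is common for short answers scored against a single reference.
smooth = SmoothingFunction().method1
score = sentence_bleu([reference], candidate, smoothing_function=smooth)
print(f"BLEU: {score:.3f}")

Sentence-level BLEU against a single reference is known to produce low absolute values, which is consistent with the scores of 0.035 and 0.021 reported above.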
Evaluasi Responsivitas dan Akurasi: Perbandingan Kinerja ChatGPT dan Google BARD dalam Menjawab Pertanyaan seputar Python (Evaluating Responsiveness and Accuracy: A Performance Comparison of ChatGPT and Google BARD in Answering Python-Related Questions)
Heryanto, Yayan; Fauziah, F; Farahdinna, Frenda; Wijanarko, Sigit
Jurasik (Jurnal Riset Sistem Informasi dan Teknik Informatika), Vol 9, No 1 (2024): February 2024 Edition
Publisher: STIKOM Tunas Bangsa Pematangsiantar

DOI: 10.30645/jurasik.v9i1.731

Abstract

This research aims to evaluate the responsiveness and accuracy of two natural language processing systems, ChatGPT and Google BARD, in answering questions related to the Python programming language. The evaluation uses the BLEU score metric as an indicator of the accuracy of the answers generated by both systems. The research involves experiments with various Python-related questions to measure how closely the generated answers align with expected reference answers. The results indicate that the average BLEU score for ChatGPT is 0.0088, while the average BLEU score for Google BARD is 0.0073. Additionally, the average response time for ChatGPT is recorded at 12.05 seconds, whereas Google BARD has a response time of 18.38 seconds. Although the difference in accuracy is small, ChatGPT shows a slightly higher BLEU score and a faster response time than Google BARD. The research concludes that, in the context of answering questions about the Python programming language, ChatGPT performs slightly better than Google BARD in terms of both answer accuracy and response time.
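
Both studies report average response times alongside BLEU, but neither abstract describes how timing was captured. Below is a minimal sketch of one plausible measurement loop, where ask_model is a hypothetical stand-in for whichever chat API was queried; it is not the authors' code.

import time
import statistics

def ask_model(question: str) -> str:
    # Hypothetical stand-in for a call to ChatGPT or Google BARD;
    # neither paper specifies the API or client library used.
    raise NotImplementedError

def average_response_time(questions: list[str]) -> float:
    # Time each question-answer round trip and average over the set,
    # mirroring the per-system averages reported in the abstracts.
    elapsed = []
    for question in questions:
        start = time.perf_counter()
        ask_model(question)
        elapsed.append(time.perf_counter() - start)
    return statistics.mean(elapsed)

Note that wall-clock timing of a hosted API includes network latency and server load, so absolute response times are environment-dependent.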