Garuda - Garba Rujukan Digital

Jurnal Bumigora Information Technology (BITe)

Vol. 7 No. 1 (2025)

Darmawan, Irwan
Ramadhani, Nilam
Nazir Arifin, Mohammad
Ubaidi
Puspa Dewi, Nindian
Innuddin, Muhammad

Publish Date
30 Jun 2025

Background: This study investigates the Proximal Policy Optimization (PPO) algorithm in two text-based case studies: aligning large language models (LLMs) with human preferences, and dynamic pricing based on customer reviews. In the LLM case, PPO combined with preference-based learning significantly improves alignment, BLEU, and human-likeness scores.

Objective: This research aims to evaluate PPO’s effectiveness in text-based decision-making through these two cases.

Methods: The study uses reinforcement learning experiments built on PPO. In the LLM case, PPO is integrated with preference learning to improve alignment, BLEU, and human-likeness of the output. In the economic case, PPO produces adaptive pricing strategies with high accuracy (low Mean Absolute Error, MAE) and the best cumulative rewards, outperforming the A3C and DDPG algorithms. Cross-validation and ablation studies assess PPO’s generalization capability and the contribution of reward components, clipping, and exploration strategies.

Results: The findings demonstrate that PPO performs well across both domains and offers a stable and efficient solution for text-based tasks.

Conclusion: The findings confirm PPO’s flexibility for various NLP applications and intelligent decision-making systems.
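For readers unfamiliar with the two quantities the abstract names, the clipping component of PPO and the Mean Absolute Error, the following is a minimal sketch of the standard textbook definitions. It is not the authors' implementation: the function names, the `clip_eps=0.2` default, and the sample inputs are assumptions made for illustration only.

```python
import math


def ppo_clip_objective(new_logps, old_logps, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective of PPO (a quantity to maximize).

    new_logps / old_logps: log-probabilities of the taken actions under the
    current and the data-collecting policy. clip_eps is an assumed default.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        ratio = math.exp(new_lp - old_lp)  # probability ratio pi_new / pi_old
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Taking the minimum removes any gain from moving the ratio
        # outside the interval [1 - clip_eps, 1 + clip_eps].
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


def mean_absolute_error(predicted, actual):
    """Average absolute gap between predicted and actual values (e.g. prices)."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to exercise the functions.
    print(ppo_clip_objective([math.log(2.0)], [0.0], [1.0]))
    print(mean_absolute_error([10.0, 12.0], [11.0, 15.0]))
```

An ablation of clipping, as mentioned in the abstract, would correspond to replacing the `min(...)` term with the unclipped `ratio * adv`.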

Journal Info

Jurnal Bumigora Information Technology (BITe)

Abbrev

bite

Publisher

Universitas Bumigora

Subject

Computer Science & IT; Control & Systems Engineering; Decision Sciences, Operations Research & Management; Electrical & Electronics Engineering

Description

Jurnal Bumigora Information Technology (BITe) is a journal of Universitas Bumigora, managed by the Department of Computer Science. It is intended as a publication venue for academics, researchers, and practitioners who wish to publish research in the field of ...

Article Info

Abstract

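Optimizing Language Models and Text-Based Economic Systems with Proximal Policy Optimization: A Case Study in Modern NLP (original title: Optimalisasi Model Bahasa dan Sistem Ekonomi Berbasis Teks dengan Proximal Policy Optimization: Studi Kasus dalam NLP Modern)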
