The limited availability of quality datasets is a fundamental challenge in developing machine learning models that automatically classify business descriptions into the Indonesia Standard Industrial Classification (KBLI). This research develops a synthetic KBLI dataset using generative AI, specifically the ChatGPT chatbot with a one-shot prompting technique. The technique is used to generate business descriptions from five-digit KBLI codes in order to address both the scarcity of labeled data and the variability of existing business descriptions. Manual validation of the dataset generated through prompt engineering shows that 93.25% of the business descriptions align with the established KBLI standards. The average number of business descriptions per category indicates a fairly uniform distribution, ensuring sufficient representation for each five-digit code. This research contributes a dataset for training machine learning models that automatically classify business descriptions into five-digit KBLI categories.
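As an illustration of the one-shot prompting idea described above, the following is a minimal sketch of how such a prompt might be assembled: one worked example is shown, followed by the target five-digit code. The prompt wording, the `OneShotExample` type, the `build_one_shot_prompt` function, and the placeholder code and titles are all assumptions made for illustration; they are not the prompt or codes used in this research, and no chatbot is called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OneShotExample:
    """The single worked example shown to the model (hypothetical content)."""
    code: str
    title: str
    description: str


def build_one_shot_prompt(
    example: OneShotExample,
    target_code: str,
    target_title: str,
    n_descriptions: int = 5,
) -> str:
    """Assemble a one-shot prompt: exactly one example, then the new task."""
    # Only five-digit codes are accepted, matching the granularity in the text.
    if not (len(target_code) == 5 and target_code.isascii() and target_code.isdigit()):
        raise ValueError("KBLI code must be exactly five digits")
    if n_descriptions < 1:
        raise ValueError("n_descriptions must be at least 1")

    lines = [
        "You write short business descriptions for KBLI categories.",
        "",
        "Example",
        f"KBLI code: {example.code}",
        f"Category title: {example.title}",
        f"Business description: {example.description}",
        "",
        "Task",
        f"KBLI code: {target_code}",
        f"Category title: {target_title}",
        f"Write {n_descriptions} distinct business descriptions for this category.",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder values only; real codes and titles would come from the KBLI standard.
    demo = OneShotExample("00000", "<example category title>", "<example description>")
    print(build_one_shot_prompt(demo, "11111", "<target category title>"))
```

The returned string would then be submitted to the chatbot, and each generated description checked manually against the KBLI standard, as the abstract describes.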
Copyright © 2025