Fahira, Fani
Unknown Affiliation

Published : 2 Documents

Exploring a Large Language Model on the ChatGPT Platform for Indonesian Text Preprocessing Tasks
Suhaeni, Cici; Kamila, Sabrina Adnin; Fahira, Fani; Yusran, Muhammad; Alfa Dito, Gerry
Indonesian Journal of Statistics and Applications Vol 9 No 1 (2025)
Publisher : Statistics and Data Science Program Study, IPB University, in collaboration with the Forum Pendidikan Tinggi Statistika Indonesia (FORSTAT) and the Ikatan Statistisi Indonesia (ISI)

DOI: 10.29244/ijsa.v9i1p100-116

Abstract

Preprocessing is a crucial step in Natural Language Processing, especially for informal text in languages such as Indonesian, which contains complex morphology, slang, abbreviations, and non-standard expressions. Traditional rule-based tools such as regex, IndoNLP, and Sastrawi are commonly used but often fall short in handling noisy, user-generated text. This study explores the capability of a Large Language Model, specifically ChatGPT-o3, to perform four Indonesian text preprocessing tasks (text cleaning, normalization, stopword removal, and stemming/lemmatization) and compares it with conventional rule-based approaches. Both approaches were applied to and evaluated on two datasets: a small example dataset of five manually constructed sentences and a real-world dataset of 100 tweets about the Indonesian “Makan Bergizi Gratis” program. Results show that ChatGPT-o3 performs equally well in text cleaning and significantly better in normalization. However, rule-based methods such as IndoNLP and Sastrawi still outperform ChatGPT-o3 in stopword removal and stemming. These findings indicate that while ChatGPT-o3 demonstrates strong contextual understanding and linguistic flexibility, it may underperform in rigid, token-based operations without fine-tuning. This study provides initial insights into using Large Language Models as an alternative preprocessing engine for Indonesian text and highlights the need for hybrid approaches or improved prompt design in future applications.
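To make the rule-based side of this comparison concrete, the following is a minimal sketch of three of the four tasks named in the abstract (cleaning, normalization, stopword removal) using only Python's standard library. It is not the authors' pipeline: the SLANG dictionary, the STOPWORDS set, and the function names are hypothetical and chosen for illustration only. Stemming is left out because it would normally be delegated to a dedicated stemmer such as Sastrawi.

```python
import re

# Hypothetical slang dictionary and stopword set, for illustration only.
# Real rule-based tools ship far larger, curated resources.
SLANG = {"yg": "yang", "gk": "tidak", "bgt": "banget"}
STOPWORDS = {"yang", "dan", "di", "itu", "ini"}


def clean(text):
    """Text cleaning: lowercase, then drop URLs, mentions, hashtags, non-letters."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # URLs
    text = re.sub(r"[@#]\w+", " ", text)       # mentions and hashtags
    text = re.sub(r"[^a-z\s]", " ", text)      # digits, punctuation, emoji
    return re.sub(r"\s+", " ", text).strip()   # collapse whitespace


def normalize(tokens):
    """Normalization: map known slang or abbreviations to standard forms."""
    return [SLANG.get(token, token) for token in tokens]


def remove_stopwords(tokens):
    """Stopword removal: exact-match lookup against a fixed set."""
    return [token for token in tokens if token not in STOPWORDS]


def preprocess(text):
    """Run cleaning, normalization, and stopword removal in sequence."""
    return remove_stopwords(normalize(clean(text).split()))


tokens = preprocess("Program MBG yg bagus bgt!! https://t.co/abc @user #MBG")
print(tokens)
```

The dictionary lookup in normalize only handles slang that has been listed in advance, which is one plausible reason a context-aware model could do better on that task, while the exact-match set lookup in remove_stopwords is the kind of rigid, token-based operation where the abstract reports rule-based tools still lead.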

Identification of Latent Dimensions of Digital Readiness and Typology of Districts/Cities in Indonesia Using PCA and K-Means Clustering
Sari, Jefita Resti; Fahira, Fani; Zahra, Latifah; Fitrianto, Anwar; Alifviansyah, Kevin
Journal of Applied Informatics and Computing Vol. 9 No. 6 (2025): December 2025
Publisher : Politeknik Negeri Batam

DOI: 10.30871/jaic.v9i6.11487

Abstract

Digital transformation is a key agenda in Indonesia’s national development that requires balanced readiness across regions. However, the level of digital readiness among districts and cities still varies widely, highlighting the need for a typology that can comprehensively describe existing disparities. This study aims to identify the latent dimensions of digital readiness and to develop a regional typology of Indonesian districts/cities using Principal Component Analysis (PCA) and K-Means clustering. The data were obtained from the 2024 Indonesian Digital Society Index (IMDI), which consists of four pillars—Infrastructure and Ecosystem, Digital Skills, Empowerment, and Employment—with ten sub-pillars. PCA reduced these correlated indicators into two main latent components, namely Digital Capacity and Participation and Digital Infrastructure Foundation, which together explain 70.4% of the total variance. Cluster validation using the Silhouette Score and Davies–Bouldin Index (DBI) showed that K = 2 yielded the best internal validity (Silhouette = 0.402; DBI = 0.906), but a three-cluster configuration (K = 3) was adopted to obtain a more interpretable typology of high-, medium-, and low-readiness regions (Silhouette = 0.346; DBI = 1.007). Spatial mapping reveals that high-readiness districts are concentrated in Java, Bali, and parts of Sumatra, whereas low-readiness areas dominate eastern Indonesia. These findings confirm persistent digital inequality across regions and provide a quantitative basis for targeted policy interventions, including infrastructure development, digital literacy programs, and innovation ecosystem strengthening, to support an inclusive digital transformation in Indonesia.
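The two internal validity measures named in this abstract can be illustrated with a small standard-library sketch. This is not the authors' code, and the eight toy points below are invented for the example; the study itself clustered districts/cities on components derived from the 2024 IMDI sub-pillars. The function names are hypothetical. In practice a library such as scikit-learn provides equivalent metrics (silhouette_score and davies_bouldin_score in sklearn.metrics).

```python
from math import dist
from statistics import mean


def group(points, labels):
    """Collect points into a dict keyed by cluster label."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    return clusters


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(mean(coords) for coords in zip(*points))


def silhouette(points, labels):
    """Mean silhouette coefficient; values closer to 1 mean better separation."""
    clusters = group(points, labels)
    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(mean(dist(point, q) for q in other)
                for key, other in clusters.items() if key != label)
        scores.append((b - a) / max(a, b))
    return mean(scores)


def davies_bouldin(points, labels):
    """Davies-Bouldin Index; lower values mean more compact, separated clusters."""
    clusters = group(points, labels)
    centers = {k: centroid(v) for k, v in clusters.items()}
    scatter = {k: mean(dist(p, centers[k]) for p in v)
               for k, v in clusters.items()}
    worst = [max((scatter[i] + scatter[j]) / dist(centers[i], centers[j])
                 for j in clusters if j != i)
             for i in clusters]
    return mean(worst)


# Invented toy data: two well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
print(silhouette(points, labels), davies_bouldin(points, labels))
```

Read this way, the figures in the abstract are consistent with each other: K = 2 has both the higher Silhouette (0.402 versus 0.346) and the lower DBI (0.906 versus 1.007), so choosing K = 3 trades some internal validity for a typology that is easier to interpret.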