Anwar Fitrianto
Statistics and Data Science, School of Data Science, Mathematics, and Informatics, IPB University, Indonesia

Published: 2 Documents

Articles (2 documents found)
EFFECTIVENESS OF DIMENSIONALITY REDUCTION METHODS ON DATA WITH NON-LINEAR RELATIONSHIPS Lukmanul Hakim; Asep Saefuddin; Kusman Sadik; Anwar Fitrianto; Bagus Sartono
BAREKENG: Jurnal Ilmu Matematika dan Terapan Vol 20 No 3 (2026): BAREKENG: Journal of Mathematics and Its Application
Publisher : PATTIMURA UNIVERSITY

DOI: 10.30598/barekengvol20iss3pp2507-2522

Abstract

The phenomenon of big data presents distinct challenges in the analysis process, especially when the data contain a very large number of variables. High complexity, potential redundancy, and the risk of overfitting are major issues that must be addressed through dimensionality reduction techniques. Principal Component Analysis (PCA) is a common method that is effective for data with linear relationships but has limitations in identifying nonlinear patterns. This research aims to improve classification performance by introducing autoencoders to handle nonlinear relationships, noise, missing values, outliers, and data measured on varying scales. The study employs a quantitative approach through analysis of simulated data and empirical data in the form of the Village Development Index from the Central Statistics Agency, which contains variables with various measurement scales. Both dimensionality reduction methods, PCA and neural network-based autoencoders, are tested across a range of data scenarios. The evaluation is based on their effectiveness in preserving data structure, as well as the Mean Squared Error (MSE) of the reconstruction. The results indicate that PCA excels in computational efficiency and accuracy for data with linear relationships. In contrast, the autoencoder performs better at detecting nonlinear patterns, achieving lower MSE values with stable MSE standard deviations. Additionally, the autoencoder proves more robust to missing values and outliers than PCA. The choice of dimensionality reduction method therefore depends strongly on the characteristics of the data being analyzed. Autoencoders are a superior alternative for complex, nonlinear data, although they require model parameter tuning. Further research is recommended to explore the influence of autoencoder network architecture and training strategies on dimensionality reduction performance.
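The reconstruction-MSE criterion used in this evaluation can be illustrated with a minimal NumPy sketch of the PCA side of the comparison. The synthetic linear and nonlinear datasets and the rank-1 reconstruction below are illustrative assumptions, not the paper's actual simulation design; they simply show why PCA's reconstruction error grows on data with a nonlinear low-dimensional structure.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Linear structure: 3-D points lying near a 1-D subspace, plus small noise.
t = rng.normal(size=(n, 1))
linear = t @ np.array([[1.0, 2.0, -1.0]]) + 0.05 * rng.normal(size=(n, 3))

# Nonlinear structure: a 1-D curve embedded in 3-D, plus the same noise level.
s = rng.uniform(-2, 2, size=n)
nonlinear = np.column_stack([s, s**2, np.sin(s)]) + 0.05 * rng.normal(size=(n, 3))

def pca_reconstruction_mse(X, k):
    """Project X onto its top-k principal components and return reconstruction MSE."""
    Xc = X - X.mean(axis=0)                  # center, as PCA requires
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    X_hat = Xc @ Vt[:k].T @ Vt[:k]           # rank-k reconstruction
    return float(np.mean((Xc - X_hat) ** 2))

mse_lin = pca_reconstruction_mse(linear, k=1)
mse_non = pca_reconstruction_mse(nonlinear, k=1)
print(f"linear data MSE:    {mse_lin:.4f}")
print(f"nonlinear data MSE: {mse_non:.4f}")
```

On the linear data a single component recovers nearly everything, leaving only noise; on the curved data the best rank-1 subspace cannot follow the manifold, so the MSE is much larger. An autoencoder with nonlinear activations can, in principle, learn the curved coordinate and close that gap.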
EVALUATING ROBERTA AND GPT-BASED MODELS FOR SDG MULTICLASS TEXT CLASSIFICATION ACROSS DIFFERENT DOCUMENT LENGTHS Uswatun Hasanah; Agus Mohamad Soleh; Cici Suhaeni; Anwar Fitrianto
BAREKENG: Jurnal Ilmu Matematika dan Terapan Vol 20 No 3 (2026): BAREKENG: Journal of Mathematics and Its Application
Publisher : PATTIMURA UNIVERSITY

DOI: 10.30598/barekengvol20iss3pp2645-2664

Abstract

Multiclass text classification remains a difficult task, primarily due to semantic ambiguity and differences in input length. This study evaluates RoBERTa and GPT-based models for multiclass text classification, focusing on how prompting strategies and document length affect accuracy and robustness. Experiments were conducted using the OSDG Community Dataset, which contains approximately 15,000 labeled samples. The dataset was partitioned into four subsets based on input length: short, medium, long, and all combined. Three GPT variants (zero-shot, few-shot, and fine-tuned) were compared against a RoBERTa baseline. Fine-tuning was implemented via OpenAI’s supervised API with prompt-response formatting. Performance was assessed through F1-score, precision, recall, and balanced accuracy. Fine-tuned GPT achieved the strongest results in all settings, with a macro F1-score of 0.9204 on the all-combined dataset, representing a 4.61% improvement over RoBERTa. Consistent gains were also observed across short (8.63%), medium (3.83%), and long (20.31%) texts. The largest improvement occurred on long documents, while medium-length inputs provided the most stable performance across models. These findings highlight the effectiveness of task-specific fine-tuning in enhancing GPT’s capability to classify SDG-related texts across diverse input lengths.
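The macro F1-score reported above is the unweighted mean of per-class F1 scores, which is why it is a natural headline metric for a multiclass task with uneven class difficulty. A minimal self-contained sketch (the three-class toy labels are hypothetical, not drawn from the OSDG dataset):

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over all classes seen in the data."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

# Hypothetical SDG-style labels for illustration only.
y_true = ["sdg1", "sdg2", "sdg3", "sdg1", "sdg2", "sdg3"]
y_pred = ["sdg1", "sdg2", "sdg1", "sdg1", "sdg2", "sdg3"]
print(f"macro F1 = {macro_f1(y_true, y_pred):.4f}")
```

Because every class contributes equally regardless of its frequency, a model that neglects a rare SDG category is penalized here more than it would be under micro-averaged F1 or plain accuracy.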