Improving Classification Model Performances using an Active Learning Method to Detect Hate Speech in Twitter: Peningkatan Kinerja Model Klasifikasi dengan Pembelajaran Aktif dalam Mendeteksi Ujaran Kebencian di Twitter
Muhammad Ilham Abidin; Khairil Anwar Notodiputro; Bagus Sartono
Indonesian Journal of Statistics and Applications Vol 5 No 1 (2021)
Publisher : Departemen Statistika, IPB University dengan Forum Perguruan Tinggi Statistika (FORSTAT)

DOI: 10.29244/ijsa.v5i1p26-38

Abstract

Police efforts to address hate speech on social media such as Twitter cannot rely solely on manual checks. It is therefore necessary to use statistical modelling, such as a classification model, to detect hate speech automatically. Classification is a type of predictive modelling that produces predictions based on labelled data. The available data are usually unlabelled, so labelling must be done beforehand. Data labelling is time-consuming, costly, and often fails to produce correct labels. This research aims to improve the performance of classification models by adding a small amount of data through the so-called active learning method. The results showed no significant difference between the performance of the logistic regression and naive Bayes classification models in detecting hate speech. However, adding data through the active learning method substantially improved the performance of logistic regression in detecting hate speech compared with adding data by simple random sampling. The performance of classification models in detecting hate speech on Twitter could therefore be improved by using an active learning method.
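The abstract does not say which query strategy the authors used. As a minimal sketch, assuming uncertainty sampling (one common active learning strategy), the next tweets to label could be chosen as those whose predicted probability is closest to 0.5. The function name and the probabilities below are hypothetical.

```python
def uncertainty_sample(probabilities, k):
    """Return indices of the k unlabelled items whose predicted
    probability of the positive class is closest to 0.5."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: abs(probabilities[i] - 0.5))
    return ranked[:k]


# Hypothetical predicted probabilities of "hate speech" for six unlabelled tweets.
pool_probs = [0.97, 0.52, 0.10, 0.45, 0.80, 0.03]

# The two tweets the current model is least sure about are sent for labelling.
to_label = uncertainty_sample(pool_probs, k=2)
print(to_label)
```

Simple random sampling, the baseline in the abstract, would instead pick k indices at random regardless of the model's confidence.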
Determinant Factors of Working Children based on Conditional Logistics Regression for Matched Pairs Data: Determinan Anak Bekerja Berdasarkan Model Regresi Logistik Bersyarat untuk Data Berpasangan
Rizky Zulkarnain; Tri Listianingrum; Khairil Anwar Notodiputro
Indonesian Journal of Statistics and Applications Vol 5 No 1 (2021)
Publisher : Departemen Statistika, IPB University dengan Forum Perguruan Tinggi Statistika (FORSTAT)

DOI: 10.29244/ijsa.v5i1p161-172

Abstract

Working children may create problems, since child labour relates to human rights as well as to children's development, especially their access to sufficient education. This paper discusses the determinant factors of working children using conditional logistic regression for matched pairs data. Matching is employed to adjust for confounding factors and to avoid bias. Three confounding factors are considered in this paper: residential area, gender, and income of the household head. The results showed that the conditional regression model outperformed the standard regression model. The number of household members, the marital status of the head of household, the age of the head of household, the educational attainment of the head of household, and the work status of the head of household were the determinant factors of working children.
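For 1:1 matched pairs, each pair contributes sigmoid(beta * (x_case - x_control)) to the conditional likelihood, so the matching factors cancel out and need no coefficients of their own. The sketch below fits a single coefficient by Newton-Raphson. It is an illustration under that one-covariate assumption, not the authors' code, and the pair counts are invented.

```python
import math


def fit_matched_pairs(diffs, iterations=25):
    """Newton-Raphson fit of a one-covariate conditional logistic model.

    diffs holds x_case - x_control for each 1:1 matched pair.
    """
    beta = 0.0
    for _ in range(iterations):
        grad = 0.0
        curvature = 0.0
        for d in diffs:
            p = 1.0 / (1.0 + math.exp(-beta * d))
            grad += d * (1.0 - p)               # score contribution
            curvature += d * d * p * (1.0 - p)  # negative second derivative
        if curvature == 0.0:                    # no discordant pairs
            break
        beta += grad / curvature
    return beta


# Hypothetical binary exposure: 30 pairs where only the case is exposed,
# 10 where only the control is exposed, 20 concordant pairs (difference 0).
diffs = [1] * 30 + [-1] * 10 + [0] * 20
beta_hat = fit_matched_pairs(diffs)
odds_ratio = math.exp(beta_hat)
print(round(odds_ratio, 4))
```

With a single binary exposure the estimate reduces to the ratio of discordant pair counts (30 / 10 here), and concordant pairs carry no information.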
A Conditional Logistic Regression Model for Analyzing Unemployment Rates in West Java: Model Regresi Logistik Bersyarat untuk Analisis Tingkat Pengangguran di Provinsi Jawa Barat
Dwi Jayanti; Septian P Palupi; Khairil Anwar Notodiputro
Indonesian Journal of Statistics and Applications Vol 5 No 1 (2021)
Publisher : Departemen Statistika, IPB University dengan Forum Perguruan Tinggi Statistika (FORSTAT)

DOI: 10.29244/ijsa.v5i1p195-204

Abstract

Unemployment is a critical problem faced by developing countries. It is a complex problem that creates other social and economic problems such as poverty, economic gaps, and crime. This paper discusses the determinant factors of unemployment rates based on empirical data using the conditional logistic regression model. The model was used to analyze matched pair data with gender, age, and residence as matching factors. The results showed that household status, marital status, and level of education were the determinant factors of a person being unemployed in West Java. They also showed that conditional logistic regression outperformed standard logistic regression for analyzing the causes of unemployment.
Comparison of GMERF and GLMM Tree Models on Poverty Household Data with Imbalanced Categories
Bukhari, Ari Shobri; Notodiputro, Khairil Anwar; Indahwati, Indahwati; Fitrianto, Anwar
Inferensi Vol 8, No 2 (2025)
Publisher : Department of Statistics ITS

DOI: 10.12962/j27213862.v8i2.21901

Abstract

Decision tree and forest methods have become popular approaches in data science and continue to evolve. One of these developments is the combination of decision trees with Generalized Linear Mixed Models (GLMM), resulting in the GLMM Tree, which is applicable to multilevel and longitudinal data. Another model, Generalized Mixed Effect Random Forest (GMERF), extends the concept of decision forests with GLMM, effectively handling complex data structures with non-linear interactions. This study compares the performance of GLMM Tree and GMERF models in classifying poor households in South Sulawesi Province, characterized by imbalanced categories. GLMM Tree provides a simple, interpretable classification through tree diagrams, while GMERF highlights variable importance. Initial tests show all three models (GLMM, GLMM Tree, and GMERF) achieve high accuracy and specificity but exhibit low sensitivity. By applying oversampling, sensitivity and AUC are significantly improved, though this is accompanied by a decline in accuracy and specificity, revealing a trade-off. The study concludes that while GLMM, GLMM Tree and GMERF have their strengths, using them together offers a more comprehensive understanding of poverty classification. Handling imbalanced data with oversampling is effective in increasing sensitivity, but careful consideration is needed due to its impact on overall accuracy.
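The abstract does not name the oversampling variant used. A minimal sketch, assuming plain random oversampling (duplicating minority-class rows with replacement until every class matches the largest one), could look like this. The function name and toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        members = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(members))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical training set: 2 poor households (label 1) and 8 non-poor (label 0).
rows = [[i] for i in range(10)]
labels = [1, 1] + [0] * 8
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Oversampling is normally applied to the training split only, so that sensitivity and specificity are still measured on data with the original class balance.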
Choosing the Right Tool: Practical Considerations for GLMM and GEE in Longitudinal Studies, with a Focus on Data Challenges
Sihombing, Pardomuan Robinson; Erfiani, Erfiani; Notodiputro, Khairil Anwar; Kurnia, Anang
ZERO: Jurnal Sains, Matematika dan Terapan Vol 9, No 1 (2025): Zero: Jurnal Sains Matematika dan Terapan
Publisher : UIN Sumatera Utara

DOI: 10.30829/zero.v9i1.24602

Abstract

This study systematically reviews comparative issues between GLMM and GEE for longitudinal data. The review discusses the competing arguments regarding the practical strengths and weaknesses of the two approaches. Empirical evidence demonstrates that GLMM generally provides subject-specific estimates and performs better than GEE in handling hierarchical structure and individual-level variance. In contrast, GEE provides robust population-level findings, which are crucial for policy. The choice of method depends on the data structure and the scope of inference. GLMM is consistently better for characterizing individuals, for example in studies where the random effects are assumed to be drawn from a complex distribution. GEE performs best on large datasets, yielding robust population-level estimates even when the working correlation is misspecified. Finally, the results provide practical recommendations to help researchers from various domains select sound, context-appropriate statistical models for longitudinal studies.
MULTILEVEL REGRESSIONS FOR MODELING MEAN SCORES OF NATIONAL EXAMINATIONS
Nurfadilah, Khalilah; Aidi, Muhammad Nur; Notodiputro, Khairil A.; Susetyo, Budi
BAREKENG: Jurnal Ilmu Matematika dan Terapan Vol 18 No 1 (2024): BAREKENG: Journal of Mathematics and Its Application
Publisher : PATTIMURA UNIVERSITY

DOI: 10.30598/barekengvol18iss1pp0323-0332

Abstract

The National Exam, known as UN, is the final evaluation used to determine whether national graduate competency standards have been achieved in schools. The determinants of achieving these standards cannot be separated from the roles of schools and local governments, a structure known as nested. In statistics, this phenomenon can be described with a multilevel model, where level 1 is the school and level 2 is the district in which the school is located. Several multilevel models were used to describe the phenomenon. The results show that the two-level regression model without interaction was selected as the best model. The variables that significantly affect average UN scores at level 1 are school status and the ratio of laboratories to students, while the significant variable at level 2 is the per capita expenditure of the district/city. Based on this study, the steps educational institutions take toward achieving the graduation standard can be better targeted.
Stacking Ensemble RNN-LSTM Models for Forecasting the IDR/USD Exchange Rate with Nonlinear Volatility
Pratiwi, Windy Ayu; Sumertajaya, I Made; Notodiputro, Khairil Anwar
Jurnal Teknik Informatika (Jutif) Vol. 6 No. 4 (2025): JUTIF Volume 6, Number 4, Agustus 2025
Publisher : Informatika, Universitas Jenderal Soedirman

DOI: 10.52436/1.jutif.2025.6.4.5057

Abstract

Predicting exchange rates with high volatility and nonlinear patterns presents a critical challenge in financial analysis. Deep learning models such as RNN and LSTM are widely used for their ability to capture temporal dependencies, yet each has limitations when applied individually. This study aims to enhance the prediction accuracy of the Indonesian Rupiah (IDR) to US Dollar (USD) exchange rate by implementing a stacking ensemble approach that combines RNN and LSTM models. The dataset consists of 522 weekly observations from January 2015 to December 2024, sourced from the official website of Bank Indonesia (bi.go.id). In the proposed framework, RNN and LSTM serve as base learners, while linear regression acts as the meta-learner. Model performance is evaluated using RMSE, MAPE, and MSE. The results indicate that the stacking ensemble consistently outperforms the individual models, achieving an RMSE of 117.91, a MAPE of 0.01, and an MSE of 13,901.67. The model effectively captures historical patterns and delivers stable and accurate predictions. In conclusion, the stacking ensemble approach developed in this study contributes to the advancement of ensemble learning techniques in computer science and offers practical value for financial decision-makers, particularly in managing complex and dynamic exchange rate scenarios.
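The three evaluation metrics named in the abstract can be sketched as follows. The helper names and the sample rates are hypothetical, and the abstract does not state whether its MAPE is a fraction or a percentage, so the sketch returns a fraction. The last line checks that the reported RMSE is the square root of the reported MSE.

```python
import math


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the data."""
    return math.sqrt(mse(actual, predicted))


def mape(actual, predicted):
    """Mean absolute percentage error as a fraction (multiply by 100 for percent)."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical weekly IDR/USD rates and forecasts, for illustration only.
actual = [15000.0, 15100.0, 15200.0]
predicted = [15100.0, 15000.0, 15200.0]
print(mse(actual, predicted), rmse(actual, predicted), mape(actual, predicted))

# Consistency check on the figures reported in the abstract.
rmse_from_reported_mse = round(math.sqrt(13901.67), 2)
print(rmse_from_reported_mse)
```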
Comparison of ARIMA, LSTM, and Ensemble Averaging Models for Short-Term and Long-Term Forecasting of Non-Stationary Time Series Data
Pratiwi, Windy Ayu; Sumertajaya, I Made; Notodiputro, Khairil Anwar
Inferensi Vol 8, No 3 (2025)
Publisher : Department of Statistics ITS

DOI: 10.12962/j27213862.v8i3.22643

Abstract

This study aims to forecast the highest weekly selling rate of the Indonesian Rupiah (IDR) against the US Dollar (USD) and identify the most accurate model among ARIMA, LSTM, and Ensemble Averaging. The evaluation results indicate that ARIMA achieves an accuracy of 97.75%, demonstrating strong performance in short-term forecasting, while LSTM achieves 99.98% accuracy, excelling in capturing complex and dynamic patterns in long-term predictions. The Ensemble Averaging approach attains the highest accuracy of 99.99%, proving to be the optimal solution by combining ARIMA’s stability with LSTM’s adaptability, resulting in more precise and stable predictions. The findings of this study highlight that the ensemble approach is more effective than individual models, as it balances accuracy and prediction stability across various forecasting scenarios. This method serves as a reliable tool for addressing market volatility and contributes significantly to the advancement of financial and economic forecasting techniques that are more adaptive and accurate.
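Ensemble averaging combines forecasts by taking their element-wise mean. The sketch below assumes equal weights for the ARIMA and LSTM forecasts, which the abstract does not confirm, and uses invented forecast values.

```python
def ensemble_average(*forecasts):
    """Element-wise mean of equally long forecast sequences."""
    horizon = len(forecasts[0])
    if any(len(f) != horizon for f in forecasts):
        raise ValueError("all forecasts must cover the same horizon")
    return [sum(values) / len(forecasts) for values in zip(*forecasts)]


# Hypothetical three-week forecasts from each model.
arima_forecast = [15000.0, 15050.0, 15100.0]
lstm_forecast = [15020.0, 15030.0, 15160.0]

combined = ensemble_average(arima_forecast, lstm_forecast)
print(combined)
```

Averaging tends to cancel errors that the two models make in opposite directions, which is one plausible reading of the stability gain the abstract describes.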