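Partial Least Squares Discriminant Analysis Method for Classifying Loyalty Segments of Growing-Up Milk Consumers: Metode Analisis Diskriminan Kuadrat Terkecil Parsial untuk Klasifikasi Segmen Loyalitas Konsumen Susu Pertumbuhan Herdina Kuswari; Farit Mochamad Afendi; Khairil Anwar Notodiputro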
Indonesian Journal of Statistics and Applications Vol 4 No 2 (2020)
Publisher : Departemen Statistika, IPB University dengan Forum Perguruan Tinggi Statistika (FORSTAT)

DOI: 10.29244/ijsa.v4i2.586

Abstract

Consumer segmentation is the process of dividing consumers into different segments based on their characteristics, making it easier for companies to develop marketing strategies. The segmentation is carried out based on consumer loyalty using the RFM (Recency, Frequency, Monetary) approach. A total of 7753 members of a nutritional product loyalty program are considered in the analysis. A partial least squares discriminant analysis classification model is built using the results of the consumer segmentation as the response variable. The model is not good enough, as the AUC (Area Under Curve) value of the ROC (Relative Operating Characteristic) curve is quite low for each segment. The explanatory variables that contribute most to the model are X5, X9, and X2, with VIP (Variable Importance in the Projection) values greater than 1.
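As a rough illustration of the RFM approach mentioned in this abstract, the sketch below derives recency, frequency, and monetary values per member from a list of transactions. The record layout, the member IDs, and the reference date are hypothetical and are not taken from the paper, which does not describe its data format here.

```python
from collections import defaultdict
from datetime import date

def rfm_table(transactions, today):
    """Return {member_id: (recency_days, frequency, monetary)}.

    `transactions` is an iterable of (member_id, purchase_date, amount);
    this layout is an assumption made for illustration only.
    """
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for member_id, purchase_date, amount in transactions:
        # Track the most recent purchase date per member.
        if member_id not in last_purchase or purchase_date > last_purchase[member_id]:
            last_purchase[member_id] = purchase_date
        frequency[member_id] += 1
        monetary[member_id] += amount
    return {
        m: ((today - last_purchase[m]).days, frequency[m], monetary[m])
        for m in last_purchase
    }

# Hypothetical toy data.
transactions = [
    ("A", date(2020, 1, 10), 50.0),
    ("A", date(2020, 3, 1), 30.0),
    ("B", date(2020, 2, 15), 20.0),
]
table = rfm_table(transactions, today=date(2020, 3, 31))
```

Segments would then typically be formed by scoring or clustering these three values; how the authors did so is not stated in the abstract.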
COMPARISON OF K-MEANS CLUSTERING METHOD AND K-MEDOIDS ON TWITTER DATA Cahyani Oktarina; Khairil Anwar Notodiputro; Indahwati Indahwati
Indonesian Journal of Statistics and Applications Vol 4 No 1 (2020)
Publisher : Departemen Statistika, IPB University dengan Forum Perguruan Tinggi Statistika (FORSTAT)

DOI: 10.29244/ijsa.v4i1.599

Abstract

The presidential election is a political event that takes place in Indonesia once every five years. Public satisfaction and dissatisfaction with political issues have led to an increase in the number of political opinion tweets. The purpose of this study is to examine the performance of the k-means and k-medoids methods on Twitter data, specifically tweets about the 2019 presidential election. The data used in this study are primary data taken from Muhyi's research, which are then subjected to text mining. Because the data had been processed by Muhyi to analyze the electability of the 2019 presidential candidate pairs, additional preprocessing was carried out for this study to analyze whether tweets tend to side with candidate pair one or two. The preprocessing in this research differs from that of the previous research in that duplicate data are removed and the text is normalized. The results indicate that the optimal numbers of clusters produced by the k-means and k-medoids methods are different.
Improving Classification Model Performances using an Active Learning Method to Detect Hate Speech in Twitter: Peningkatan Kinerja Model Klasifikasi dengan Pembelajaran Aktif dalam Mendeteksi Ujaran Kebencian di Twitter Muhammad Ilham Abidin; Khairil Anwar Notodiputro; Bagus Sartono
Indonesian Journal of Statistics and Applications Vol 5 No 1 (2021)
Publisher : Departemen Statistika, IPB University dengan Forum Perguruan Tinggi Statistika (FORSTAT)

DOI: 10.29244/ijsa.v5i1p26-38

Abstract

Police efforts to address hate speech on social media such as Twitter will not be sufficient if they rely solely on manual checks. Therefore, it is necessary to use statistical modelling, such as a classification model, to detect hate speech automatically. Classification is a type of predictive modelling that produces predictions based on labelled data. The available data are usually unlabelled, which means that the labelling process needs to be done beforehand. Data labelling is time-consuming, costly, and often fails to produce correct labels. This research aims to improve the performance of classification models by adding a small amount of data through the so-called active learning method. The results showed no significant difference between the performance of logistic regression and naïve Bayes classification models in detecting hate speech. However, the results also showed that adding data through the active learning method substantially improved the performance of logistic regression in detecting hate speech compared with adding data by simple random sampling. Therefore, the performance of classification models in detecting hate speech on Twitter can be improved by using an active learning method.
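The contrast between active learning and simple random sampling can be sketched as follows. The abstract does not say which query strategy was used; uncertainty sampling is assumed here purely for illustration, and the function name and probabilities are hypothetical.

```python
def select_most_uncertain(probabilities, k):
    """Return indices of the k items whose predicted probability is closest to 0.5."""
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: abs(probabilities[i] - 0.5),
    )
    return ranked[:k]

# Hypothetical predicted probabilities of "hate speech" for five unlabelled tweets.
probs = [0.95, 0.52, 0.10, 0.45, 0.70]

# The two tweets the current model is least sure about are sent for labelling.
to_label = select_most_uncertain(probs, k=2)
```

Under simple random sampling, by contrast, the two tweets to label would be drawn without regard to the model's current predictions.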
Determinant Factors of Working Children based on Conditional Logistics Regression for Matched Pairs Data: Determinan Anak Bekerja Berdasarkan Model Regresi Logistik Bersyarat untuk Data Berpasangan Rizky Zulkarnain; Tri Listianingrum; Khairil Anwar Notodiputro
Indonesian Journal of Statistics and Applications Vol 5 No 1 (2021)
Publisher : Departemen Statistika, IPB University dengan Forum Perguruan Tinggi Statistika (FORSTAT)

DOI: 10.29244/ijsa.v5i1p161-172

Abstract

Child labour is a problem because it concerns human rights as well as children's development, especially their access to sufficient education. This paper discusses the determinant factors of working children using conditional logistic regression for matched pairs data. Matching is employed to adjust for confounding factors and to avoid bias. Three confounding factors are considered in this paper: residential area, gender, and income of the household head. The results showed that the conditional regression model outperformed the standard regression model. The number of household members, the marital status, age, and educational attainment of the head of household, and the work status of the head of household were the determinant factors of working children.
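For 1:1 matched pairs, the conditional likelihood that underlies conditional logistic regression can be written as below. The notation is introduced here for illustration and is not taken from the paper: $x_{i1}$ and $x_{i0}$ denote the covariate vectors of the case and the control in pair $i$, and $\beta$ is the coefficient vector.

```latex
L(\beta)
  = \prod_{i=1}^{n}
    \frac{\exp(\beta^\top x_{i1})}
         {\exp(\beta^\top x_{i1}) + \exp(\beta^\top x_{i0})}
  = \prod_{i=1}^{n}
    \frac{1}{1 + \exp\{-\beta^\top (x_{i1} - x_{i0})\}}
```

Pair-specific intercepts cancel out of this expression, so any variable that is constant within a pair, including the matching factors, has no coefficient to estimate.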
A Conditional Logistic Regression Model for Analyzing Unemployment Rates in West Java: Model Regresi Logistik Bersyarat untuk Analisis Tingkat Pengangguran di Provinsi Jawa Barat Dwi Jayanti; Septian P Palupi; Khairil Anwar Notodiputro
Indonesian Journal of Statistics and Applications Vol 5 No 1 (2021)
Publisher : Departemen Statistika, IPB University dengan Forum Perguruan Tinggi Statistika (FORSTAT)

DOI: 10.29244/ijsa.v5i1p195-204

Abstract

Unemployment is a critical problem faced by developing countries. It is a complex problem that creates other social and economic problems such as poverty, economic gaps, and crime. This paper discusses the determinant factors of unemployment rates based on empirical data using the conditional logistic regression model. The model was used to analyze matched pair data with gender, age, and residence as matching factors. The results showed that household status, marital status, and level of education were the determinant factors of a person being unemployed in West Java. They also showed that conditional logistic regression outperformed standard logistic regression for analyzing the causes of unemployment.
Analyzing Low Birthweight in Java Based on Logistic Regression Model for Matched Pair Data: Analisis Berat Badan Lahir Rendah di Pulau Jawa Berdasarkan Model Regresi Logistik untuk Data Berpadanan Christiana Anggraeni Putri; Rini Irfani; Khairil Anwar Notodiputro
Indonesian Journal of Statistics and Applications Vol 7 No 2 (2023)
Publisher : Departemen Statistika, IPB University dengan Forum Perguruan Tinggi Statistika (FORSTAT)

DOI: 10.29244/ijsa.v7i2p75-85

Abstract

Low birth weight is one of the leading causes of neonatal death. Studies of low birth weight generally use logistic regression without considering confounding variables, which can distort the actual relationship between the explanatory variables and the response. This paper aims to identify the determinants of low birth weight in Java based on a logistic regression model for a conditional study design, in which each case is matched with one control on the mother's education level. The results of the analysis showed that matched logistic regression can be used to correct bias due to the influence of a confounding variable. The modeling results show that the frequency of pregnancy examinations and the parity of children significantly affect the risk of low birth weight on Java Island.
Comparison between Statistical Approaches and Data Mining Algorithms for Outlier Detection Annisa Putri Utami; Anwar Fitrianto; Khairil Anwar Notodiputro
CAUCHY: Jurnal Matematika Murni dan Aplikasi Vol 9, No 1 (2024): CAUCHY: JURNAL MATEMATIKA MURNI DAN APLIKASI
Publisher : Mathematics Department, Universitas Islam Negeri Maulana Malik Ibrahim Malang

DOI: 10.18860/ca.v9i1.25450

Abstract

Outliers are observations that differ greatly from most other observations. The presence of outliers in data can harm one study yet carry important information for another, so identifying outliers before conducting data analysis is crucial. Outlier detection techniques were pioneered by researchers in statistics. However, because rapid technological advances have made it easy to collect extensive data, the development of outlier detection techniques is now driven mainly by researchers in computer science (data mining) using computing facilities. This research examines the results of simulation studies that compare statistical approaches and data mining algorithms for identifying multiple outliers in various predetermined data scenarios. Across the scenarios examined, the outlier detection methods based on a statistical approach are generally better than those based on data mining. A suggestion for further research is to improve the data mining methods by giving more attention to statistical analysis, in addition to computing time, so that outlier detection becomes both faster and more precise.
Comparison of GMERF and GLMM Tree Models on Poverty Household Data with Imbalanced Categories Bukhari, Ari Shobri; Notodiputro, Khairil Anwar; Indahwati, Indahwati; Fitrianto, Anwar
Inferensi Vol 8, No 2 (2025)
Publisher : Department of Statistics ITS

DOI: 10.12962/j27213862.v8i2.21901

Abstract

Decision tree and forest methods have become popular approaches in data science and continue to evolve. One of these developments is the combination of decision trees with Generalized Linear Mixed Models (GLMM), resulting in the GLMM Tree, which is applicable to multilevel and longitudinal data. Another model, Generalized Mixed Effect Random Forest (GMERF), extends the concept of decision forests with GLMM, effectively handling complex data structures with non-linear interactions. This study compares the performance of GLMM Tree and GMERF models in classifying poor households in South Sulawesi Province, characterized by imbalanced categories. GLMM Tree provides a simple, interpretable classification through tree diagrams, while GMERF highlights variable importance. Initial tests show all three models (GLMM, GLMM Tree, and GMERF) achieve high accuracy and specificity but exhibit low sensitivity. By applying oversampling, sensitivity and AUC are significantly improved, though this is accompanied by a decline in accuracy and specificity, revealing a trade-off. The study concludes that while GLMM, GLMM Tree and GMERF have their strengths, using them together offers a more comprehensive understanding of poverty classification. Handling imbalanced data with oversampling is effective in increasing sensitivity, but careful consideration is needed due to its impact on overall accuracy.
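The oversampling step mentioned in this abstract can be sketched as follows. The abstract does not specify the oversampling technique; plain random oversampling (duplicating minority-class rows) is assumed here, and the function name and toy data are hypothetical.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Rows belonging to this class in the original data.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels

# Hypothetical toy data: four non-poor households and one poor household.
rows = [[1.0], [2.0], [3.0], [4.0], [9.0]]
labels = ["non-poor", "non-poor", "non-poor", "non-poor", "poor"]
new_rows, new_labels = random_oversample(rows, labels)
```

Oversampling is normally applied to the training split only, so that the evaluation data keep the original class proportions.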
Choosing the Right Tool: Practical Considerations for GLMM and GEE in Longitudinal Studies, with a Focus on Data Challenges Sihombing, Pardomuan Robinson; Erfiani, Erfiani; Notodiputro, Khairil Anwar; Kurnia, Anang
ZERO: Jurnal Sains, Matematika dan Terapan Vol 9, No 1 (2025): Zero: Jurnal Sains Matematika dan Terapan
Publisher : UIN Sumatera Utara

DOI: 10.30829/zero.v9i1.24602

Abstract

This research systematically reviews the comparative issues between GLMM and GEE for longitudinal data. The review discusses the competing arguments regarding the practical strengths and weaknesses of the two approaches. Empirical evidence demonstrates that GLMM generally provides subject-specific estimates and performs better than GEE in capturing hierarchical and individual-level variance. In contrast, GEE provides robust population-level findings, which are crucial for policy. The choice of method depends on the data structure and the scope of inference. GLMM is consistently better when characterizing individuals, for example, in studies where random effects are assumed to be drawn from a complex distribution. GEE performs best in large datasets, yielding robust population-level estimates even when the working correlation is misspecified. Finally, the results provide practical recommendations to help researchers from various domains select sound, context-appropriate statistical models for longitudinal studies.
MULTILEVEL REGRESSIONS FOR MODELING MEAN SCORES OF NATIONAL EXAMINATIONS Nurfadilah, Khalilah; Aidi, Muhammad Nur; Notodiputro, Khairil A.; Susetyo, Budi
BAREKENG: Jurnal Ilmu Matematika dan Terapan Vol 18 No 1 (2024): BAREKENG: Journal of Mathematics and Its Application
Publisher : PATTIMURA UNIVERSITY

DOI: 10.30598/barekengvol18iss1pp0323-0332

Abstract

The National Exam, known as the UN, is the final evaluation used to determine whether schools have achieved the national graduate competency standards. Achievement of these standards cannot be separated from the role of schools and local governments, a relationship known as nesting. In statistics, this phenomenon can be described with a multilevel model, where level 1 is the school and level 2 is the district in which the school is located. Several multilevel models are used to describe the phenomenon. The results show that the two-level regression model without interaction is selected as the best model. The variables that significantly affect the average UN scores at level 1 are school status and the ratio of laboratories to students, while the significant variable at level 2 is the per capita expenditure of the district/city. The findings of this study can help educational institutions target their efforts toward achieving the graduation standards.
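A two-level model without cross-level interaction of the kind described in this abstract can be written as below. This is a sketch under assumptions: a random-intercept form is assumed, and the symbols are introduced here for illustration rather than taken from the paper. $y_{ij}$ is the mean UN score of school $i$ in district $j$, $x_{1ij}$ is school status, $x_{2ij}$ is the laboratory-to-student ratio, and $z_j$ is the per capita expenditure of district $j$.

```latex
y_{ij} = \beta_0 + \beta_1 x_{1ij} + \beta_2 x_{2ij} + \gamma z_j + u_j + e_{ij},
\qquad
u_j \sim N(0, \sigma_u^2), \quad
e_{ij} \sim N(0, \sigma_e^2)
```

The district-level random effect $u_j$ captures the nesting of schools within districts, while $e_{ij}$ is the school-level residual.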