

Survival Analysis of Students Not Graduated on Time Using Cox Proportional Hazard Regression Method and Random Survival Forest Method Arib, Muhammad Arib Alwansyah
Journal of Statistics and Data Science Vol. 2 No. 1 (2023)
Publisher : UNIB Press

DOI: 10.33369/jsds.v2i1.24312

Abstract

Higher education institutions educate the nation's next generation in both academic and non-academic fields. Every college tries to maximize student graduation in both quantity and quality. The undergraduate program is designed to be completed in 8 semesters, although it can be finished in fewer than 8 semesters and may take a maximum of 14 semesters. Many factors, both internal and external, are thought to affect the length of study, so research is needed to determine which of them have a significant effect. Survival analysis using Cox proportional hazards regression and random survival forest can be used to identify these factors. Under Cox proportional hazards regression, the factor that affects the length of study is GPA, while under the random survival forest method the influential factors are GPA, gender, and part time. Based on a comparison using the C-index, random survival forest is the more suitable method for these data because its C-index error value is 26.9%, which is smaller than the 27.8% obtained for the Cox proportional hazards model.
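
The C-index comparison mentioned in this abstract can be illustrated with a minimal sketch. It assumes Harrell's definition of the concordance index and assumes that the reported "error value" is 1 minus the C-index; the abstract does not state either point, and the function name and toy data below are hypothetical.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data (simplified sketch).

    times: observed time (e.g. semesters until the event or censoring)
    events: 1 if the event was observed, 0 if the record is censored
    risk_scores: a higher score means the event is predicted to occur sooner
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Simplification: pairs with tied times are skipped. A pair is
        # comparable only if the shorter time is an observed event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: five students, the last one censored.
times = [8, 9, 10, 12, 14]
events = [1, 1, 1, 1, 0]
scores = [0.9, 0.7, 0.8, 0.3, 0.1]

c_index = concordance_index(times, events, scores)
c_index_error = 1.0 - c_index  # assumed meaning of "C-index error value"
```

Under this reading, a lower error value means the model orders more comparable pairs correctly, which is the sense in which 26.9% would be preferred over 27.8%.
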
SURVIVAL ANALYSIS ON DATA OF STUDENTS NOT GRADUATING ON TIME USING WEIBULL REGRESSION, COX PROPORTIONAL HAZARDS REGRESSION, AND RANDOM SURVIVAL FOREST METHODS Rachmawati, Ramya; Afandi, Nur; Alwansyah, Muhammad Arib
BAREKENG: Jurnal Ilmu Matematika dan Terapan Vol 19 No 3 (2025): BAREKENG: Journal of Mathematics and Its Application
Publisher : PATTIMURA UNIVERSITY

DOI: 10.30598/barekengvol19iss3pp2111-2126

Abstract

This article presents a comprehensive study of the factors that influence the length of study of undergraduate students at FMIPA UNIB in the 2018 and 2019 cohorts. This study is essential because observations show that many students study for more than 8 semesters. The purpose of this study is to determine the factors, both internal and external, that significantly influence the length of study of undergraduate students. Survival analysis is an appropriate method for identifying these factors because ordinary regression analysis cannot adequately model survival data. Therefore, Weibull regression, Cox proportional hazards regression, and Random Survival Forest are used. This study does not compare the methods because they are independent of each other but share the same goal, namely to determine the factors that influence the length of study. The data are length-of-study records for the 2018 and 2019 cohorts, sourced from the academic subsection of FMIPA UNIB, with the variables GPA, gender, region of origin, university entry route, parents' occupation, type of study program, and length of study. The results showed that GPA and type of study program significantly influenced the length of study in the Weibull regression analysis. In Cox proportional hazards regression, GPA was the influential factor, while under the Random Survival Forest method all factors significantly influenced the length of study, each with its own level of importance.
The Disparity of Maternal and Neonatal Death Modeling in Sumatra Region Using Geographically Weighted Bivariate Negative Binomial Regression Bayubuana, Muhammad Gabdika Bayubuana; Nugroho, Sigit; Rini, Dyah Setyo; Alwansyah, Muhammad Arib
Journal of Statistics and Data Science Vol. 3 No. 2 (2024)
Publisher : UNIB Press

DOI: 10.33369/jsds.v3i2.41285

Abstract

The Sumatra region ranked second highest in Indonesia for Maternal Mortality Rate (MMR) and Neonatal Mortality Rate (NMR) in 2020. Many factors are thought to have influenced both rates, directly and indirectly, so an analysis is needed to identify them. The methods that can be used to determine these factors are Bivariate Negative Binomial Regression (BNBR) and Geographically Weighted Bivariate Negative Binomial Regression (GWBNBR). The results show that the Deviance Information Criterion (DIC) of GWBNBR is smaller than that of BNBR, so GWBNBR is better than BNBR at modeling MMR and NMR in the Sumatra region in 2020.
A Panel Data Spatial Regression Approach for Modeling Poverty Data In Southern Sumatra Hidayati, Nurul; Karuna, Elisabeth Evelin; Alwansyah, Muhammad Arib
Journal of Statistics and Data Science Vol. 3 No. 2 (2024)
Publisher : UNIB Press

DOI: 10.33369/jsds.v3i2.41288

Abstract

This research uses spatial panel data regression to model poverty in the Southern Sumatra region. Panel data from districts/cities in South Sumatra, Jambi, Lampung, Bengkulu, and Bangka Belitung for the 2015-2021 period were used in the analysis. The spatial panel models used in this study are the panel spatial autoregressive (SAR) model and the panel spatial error model (SEM). The results show that the spatial panel data approach explains variations in poverty levels better than non-spatial models. A significant spatial spillover effect was found, in which the poverty level of an area is influenced by the conditions of its neighboring areas. The best model for the poverty percentage data in the Southern Sumatra region is the Spatial Autoregressive Fixed Effect (SAR-FE) model, which has the smallest AIC and BIC values. Factors such as average years of schooling and life expectancy have a significant influence on the poverty percentage in the SAR-FE model.
MODELING THE MANY EARTHQUAKES IN SUMATRA USING POISSON HIDDEN MARKOV MODELS AND EXPECTATION MAXIMIZATION ALGORITHM Alwansyah, Muhammad Arib; Rachmawati, Ramya
BAREKENG: Jurnal Ilmu Matematika dan Terapan Vol 18 No 1 (2024): BAREKENG: Journal of Mathematics and Its Application
Publisher : PATTIMURA UNIVERSITY

DOI: 10.30598/barekengvol18iss1pp0163-10135

Abstract

Sumatra Island is prone to earthquakes because it is located at the confluence of three plates, namely the Indo-Australian plate, the Eurasian plate, and the Philippine plate. In general, the number of earthquake events follows the Poisson distribution, but in some cases the counts are overdispersed relative to it. Poisson Hidden Markov Models (PHMMs) are used to overcome this overdispersion, and the Expectation-Maximization (EM) algorithm is applied to each model to estimate its parameters. From the fitted models, the best model is selected as the one with the smallest Akaike Information Criterion (AIC) value. The data are secondary data on earthquake events on the island of Sumatra from January 2000 to December 2022 with a depth of ≤ 70 km and a magnitude of ≥ 4.4 Mw. The model with m = 3 is the best model, with an AIC value of 1503.286. This model gives an estimated average of 5.7633 ≈ 6 earthquake events per month.
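
The AIC-based selection described in this abstract can be sketched as follows. This is an illustration, not the authors' implementation: it only evaluates the log-likelihood of a Poisson hidden Markov model for given parameters using the scaled forward algorithm (the EM estimation step is not shown), and it assumes the parameter count is m Poisson means plus m(m-1) free transition probabilities. All function names are hypothetical.

```python
import math


def poisson_logpmf(k, lam):
    """Log of the Poisson probability mass function."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def phmm_loglik(counts, lambdas, trans, init):
    """Log-likelihood of a Poisson HMM via the scaled forward algorithm.

    counts: observed counts per period (e.g. earthquakes per month)
    lambdas: Poisson mean of each hidden state
    trans: m x m transition matrix, trans[i][j] = P(next = j | current = i)
    init: initial state distribution
    """
    m = len(lambdas)
    loglik = 0.0
    alpha = list(init)
    for t, k in enumerate(counts):
        if t > 0:
            # Propagate state probabilities through the transition matrix.
            alpha = [sum(alpha[i] * trans[i][j] for i in range(m)) for j in range(m)]
        # Weight each state by the Poisson probability of the observed count.
        alpha = [alpha[j] * math.exp(poisson_logpmf(k, lambdas[j])) for j in range(m)]
        # Rescale at every step to avoid numerical underflow.
        scale = sum(alpha)
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik


def aic(loglik, m):
    """AIC under the assumed parameter count m + m*(m-1) = m**2."""
    n_params = m + m * (m - 1)
    return -2.0 * loglik + 2.0 * n_params
```

In a workflow like the one the abstract outlines, one would fit models for several values of m, compute `aic` for each, and keep the model with the smallest value.
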
Sentiment Analysis of Twitter User’s Perceptions of the Campus Merdeka Using Naïve Bayes Classifier and Support Vector Machine Methods Salsabilla, Intan; Alwansyah, Muhammad Arib; Nugroho, Sigit; Agwil, Winalia
Journal of Statistics and Data Science Vol. 2 No. 2 (2023)
Publisher : UNIB Press

DOI: 10.33369/jsds.v2i2.30577

Abstract

The Campus Merdeka program is being implemented by the government to realize autonomous and flexible learning in tertiary institutions and to create a learning culture that is innovative, unrestrictive, and suited to the needs of students. The program offers added value and has drawn a variety of public responses, both directly and on social media platforms, one of which is Twitter. This research therefore examines the community's response to the Campus Merdeka program on Twitter. Tweets responding to the program are classified into two categories, positive and negative. The methods used are the Naïve Bayes Classifier (NBC) and the Support Vector Machine (SVM) with a polynomial kernel of degree 2. The highest accuracy obtained is 73.5%, with a parameter value of 0.5 and a constant value of 0.5, using 309 documents as training data and 132 documents as test data. The accuracy is 65.9% for the Naïve Bayes Classifier and 73.5% for the Support Vector Machine.
Handling Missing Data in Bivariate Gamma Generation Data Using the Random Forest Method Arib, Muhammad Arib Alwansyah; Ramya, Ramya Rachmawati
J-KOMA : Jurnal Ilmu Komputer dan Aplikasi Vol 8 No 02 (2025): J-KOMA : Jurnal Ilmu Komputer dan Aplikasi
Publisher : Universitas Negeri Jakarta

DOI: 10.21009/JKOMA.082.02

Abstract

Missing data is a common problem in data analysis that can reduce the quality and accuracy of study results if not handled properly. This study aims to evaluate the performance of the Random Forest (RF) imputation method at various levels of missing value proportions, namely 5%, 10%, 15%, and 20%. The data used are Bivariate Gamma data of 200 observations with two variables, generated using RStudio software. Evaluation of imputation performance is carried out by considering the correlation value between the imputed data and the original data, the p-value as an indicator of the significance of the difference, and the error measures Mean Absolute Percentage Error (MAPE) and Root Mean Square Error (RMSE).
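
The two error measures named in this abstract can be written out as a short sketch. It assumes they are computed between the original values and the imputed values at the positions that were made missing, which the abstract does not specify; the function names are hypothetical.

```python
import math


def mape(actual, imputed):
    """Mean Absolute Percentage Error in percent.

    Requires nonzero actual values, which holds for Gamma-distributed data.
    """
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, imputed)) / len(actual)


def rmse(actual, imputed):
    """Root Mean Square Error between actual and imputed values."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, imputed)) / len(actual))


# Hypothetical example: two masked cells and their imputed replacements.
actual_values = [2.0, 4.0]
imputed_values = [1.0, 5.0]
example_mape = mape(actual_values, imputed_values)
example_rmse = rmse(actual_values, imputed_values)
```

Smaller values of both measures indicate imputed values that lie closer to the originals.
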
Forest Fire Clustering in Indonesia Using the Clustering Large Applications (CLARA) Method Arib, Muhammad Arib Alwansyah; Ridya, Ridya Destriani; Sigit, Sigit Nugroho; Nurul, Nurul Hidayati
J-KOMA : Jurnal Ilmu Komputer dan Aplikasi Vol 8 No 02 (2025): J-KOMA : Jurnal Ilmu Komputer dan Aplikasi
Publisher : Universitas Negeri Jakarta

DOI: 10.21009/JKOMA.082.03

Abstract

Clustering is the process of grouping objects into classes whose members are similar. One clustering method that handles large amounts of data is Clustering Large Applications (CLARA). This research aims to identify groups of forest fires in Indonesia using the CLARA method and to determine the characteristics and locations of forest fire hot spots in Indonesia. The data are 3,265 hot spot events obtained from the NASA LANCE-FIRMS MODIS Active Fire website. The variables used to group forest fire events are latitude, longitude, brightness, fire radiative power (FRP), and confidence. Using the Silhouette index and Dunn index values to determine the optimum number of clusters, the 3,265 hot spots were grouped into 2 clusters.
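
As an illustration of how a silhouette index can guide the choice of the number of clusters, here is a minimal sketch. It assumes Euclidean distance and the usual mean silhouette width; it does not implement CLARA itself, and the function name and toy points are hypothetical.

```python
import math


def silhouette_index(points, labels):
    """Mean silhouette width over all points, using Euclidean distance."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            # Convention: a singleton cluster contributes a silhouette of 0.
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Hypothetical toy data: two well-separated pairs of points.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good_score = silhouette_index(toy_points, [0, 0, 1, 1])
bad_score = silhouette_index(toy_points, [0, 1, 0, 1])
```

A value close to 1 indicates compact, well-separated clusters, so a candidate number of clusters with a higher mean silhouette is generally preferred.
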
Bahasa Inggris Arib, Muhammad Arib Alwansyah; Viola, Viola Oktamelisa; Sigit, Sigit Nugroho; Etis, Etis Sunandi
J-KOMA : Jurnal Ilmu Komputer dan Aplikasi Vol 8 No 02 (2025): J-KOMA : Jurnal Ilmu Komputer dan Aplikasi
Publisher : Universitas Negeri Jakarta

DOI: 10.21009/JKOMA.082.04

Abstract

Clustering is a data grouping method that identifies groups formed by combining elements with the same characteristics. One clustering method that can be used is K-Medoids, also known as Partitioning Around Medoids (PAM). This study aims to group the regencies/cities in the Sumatra Region based on the percentage of poverty using the K-Medoids method and to determine the characteristics of the resulting groups. The data are poverty data for 154 districts/cities in the Sumatra Region, with the variables expected length of schooling, average length of schooling, open unemployment rate, and percentage of poor population. The results show that the districts/cities in the Sumatra Region form 2 optimum clusters, as indicated by the silhouette index and Davies-Bouldin index values.
Handling Missing Data in Generated Bivariate Gamma Data (Pananganan Data Hilang pada Data Bangkitan Bivariate Gamma) Arib, Muhammad Arib Alwansyah; Khaola, Khaola Rachma Adzima; Rido, Muhammad Rido Wujudi
Diophantine Journal of Mathematics and Its Applications Vol. 4 No. 2 (2025): Vol. 4 No. 2 (2025)
Publisher : UNIB Press

DOI: 10.33369/diophantine.v4i2.46691

Abstract

Missing data is a problem in data processing that can reduce the quality of analysis results if not addressed. This study aims to evaluate the performance of two imputation methods, namely Random Forest (RF) imputation and Classification and Regression Tree (CART), at various proportions of missing values, namely 5%, 10%, 15%, and 20%. The data used in this study are Bivariate Gamma data of 200 observations with two variables, generated using RStudio software. The evaluation was based on the correlation between the imputed data and the original data, as well as the error measures Mean Absolute Percentage Error (MAPE) and Root Mean Square Error (RMSE). The results showed that at the 5% and 10% missing value levels, the CART method produced the smallest MAPE and RMSE values and was therefore the better method, although its difference from the RF method was not significant at the 10% level. At 15% and 20% missing values, the RF method demonstrated superior performance, with smaller MAPE and RMSE values than CART. Overall, the CART method is more suitable for a low proportion of missing values, while the RF method provides more stable performance at a high proportion of missing values. The results of this study provide recommendations for selecting an appropriate imputation method based on the level of missing data.