Location
Kota Semarang,
Jawa Tengah
INDONESIA
Media Statistika
Published by Universitas Diponegoro
EISSN : 2477-0647
Core Subject : Science
Articles 11 Documents
ENSEMBLE-BASED LOGISTIC REGRESSION ON HIGH-DIMENSIONAL DATA: A SIMULATION STUDY Widhianingsih, Tintrim Dwi Ary; Kuswanto, Heri; Prastyo, Dedy Dwi
MEDIA STATISTIKA Vol 17, No 1 (2024): Media Statistika
Publisher : Department of Statistics, Faculty of Science and Mathematics, Universitas Diponegoro

DOI: 10.14710/medstat.17.1.13-24

Abstract

Dramatic growth in computing power has ushered in the big data era, which has driven up data sizes in various fields. Apart from huge sample sizes, high-dimensional data arise, in which the number of features exceeds the number of samples. High computing power encourages the use of modern approaches for this type of dataset, yet in practice the common logistic regression method is still applied because of its simplicity and explainability. Applying logistic regression to high-dimensional data gives rise to multicollinearity, overfitting, and computational complexity issues. Logistic Regression Ensemble (Lorens) and Ensemble Logistic Regression (ELR) are logistic-regression-based alternatives proposed to solve these problems. Lorens adopts the ensemble concept with mutually exclusive feature partitions to form several subsets of the data, while ELR incorporates feature selection into the algorithm by drawing a subset of features based on a probability ranking value. This paper examines the effectiveness of Lorens and ELR for high-dimensional data classification through a simulation study under three scenarios: varying feature size, imbalanced high-dimensional data, and multicollinearity. The simulation study reveals that ELR outperforms Lorens and performs more stably across different feature sizes and imbalanced data settings. On the other hand, Lorens performs more reliably than ELR in the simulation with multicollinearity.
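The feature-partitioning idea described for Lorens can be illustrated with a minimal Python sketch. It only shows how feature indices might be split into mutually exclusive subsets and how per-subset probabilities could be combined. The function names, the random shuffling, the plain averaging rule, and the 0.5 threshold are assumptions made for illustration, not details taken from the paper, and fitting the logistic models themselves is omitted.

```python
import random

def partition_features(n_features, n_parts, seed=0):
    """Randomly split feature indices into mutually exclusive subsets."""
    indices = list(range(n_features))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin so subset sizes differ by at most one.
    return [sorted(indices[k::n_parts]) for k in range(n_parts)]

def ensemble_predict(probabilities, threshold=0.5):
    """Average per-subset predicted probabilities and apply a threshold."""
    mean_prob = sum(probabilities) / len(probabilities)
    return mean_prob, int(mean_prob >= threshold)

parts = partition_features(n_features=10, n_parts=3)
# Hypothetical probabilities from three logistic models, one per feature subset.
mean_prob, label = ensemble_predict([0.62, 0.48, 0.71])
```

Each feature index appears in exactly one subset, so every base model sees far fewer features than the full dataset has.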
A-OPTIMAL DESIGN IN NON-LINEAR MODELS TO INCREASE SILICON DIOXIDE PURITY LEVELS Weisha, Ghea; Erfiani, Erfiani; Irzaman, Irzaman; Syafitri, Utami Dyah
MEDIA STATISTIKA Vol 17, No 1 (2024): Media Statistika
Publisher : Department of Statistics, Faculty of Science and Mathematics, Universitas Diponegoro

DOI: 10.14710/medstat.17.1.36-44

Abstract

Silica is the most abundant mineral found on earth and is widely used in industry. Silica used in industry is usually silicon dioxide with a purity ≥ 95%, and it is often sold at a higher cost. To obtain silica at a lower cost, it can be extracted from biomass such as rice husk. The purity of silica extracted from biomass tends to be lower than that of mineral silica. The purity of low-purity silica can be increased by adjusting the temperature and the rate of temperature rise. This research aims to obtain the best design for determining the purity of silicon dioxide. The design was generated based on the A-optimality criterion using the DETMAX algorithm. The A-optimality criterion minimizes the trace of the variance-covariance matrix of the parameter estimates. The best design points obtained with the A-optimal design consist of three temperature groups: the minimum temperature of 800°C, the middle temperature of 850°C, and the maximum temperature of 900°C, with varying rates of temperature rise. Points were repeated at 850°C, with rates of temperature rise of 1.67°C/min and 3.34°C/min.
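The A-optimality criterion can be made concrete with a small Python sketch. As a deliberate simplification, it uses a straight-line model y = b0 + b1*x on a coded scale rather than the non-linear model of the paper, and it compares two hand-picked designs instead of running DETMAX. The function name and both candidate designs are hypothetical.

```python
def a_criterion(x_points):
    """Trace of (X'X)^{-1} for the straight-line model y = b0 + b1*x."""
    n = len(x_points)
    sx = sum(x_points)
    sxx = sum(x * x for x in x_points)
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("singular design: at least two distinct points are needed")
    # The inverse of [[n, sx], [sx, sxx]] is [[sxx, -sx], [-sx, n]] / det,
    # so its trace is (sxx + n) / det.
    return (sxx + n) / det

# Hypothetical four-run designs on a coded scale from -1 to 1.
design_a = [-1.0, -1.0, 1.0, 1.0]
design_b = [-1.0, 0.0, 0.0, 1.0]
best = min([design_a, design_b], key=a_criterion)
```

A smaller trace means a smaller total variance of the parameter estimates, so the design with the lower criterion value is preferred under A-optimality.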
CONWAY-MAXWELL POISSON REGRESSION MODELING OF INFANT MORTALITY IN SOUTH SULAWESI Oktaviana, Oktaviana; Sanusi, Wahidah; Aswi, Aswi; Sukarna, Sukarna; Folorunso, Serifat Adedamola
MEDIA STATISTIKA Vol 17, No 1 (2024): Media Statistika
Publisher : Department of Statistics, Faculty of Science and Mathematics, Universitas Diponegoro

DOI: 10.14710/medstat.17.1.45-56

Abstract

Overdispersion is a common problem in count data that can lead to inaccurate parameter estimates in Poisson regression models. Quasi-Poisson and negative binomial regressions are often used to address overdispersion but have limitations, especially with small samples. The Conway-Maxwell Poisson (CMP) regression model, an extension of the Poisson distribution, effectively addresses both overdispersion and underdispersion, even with limited data, due to additional parameters that better control data dispersion. The Infant Mortality Rate (IMR) is a critical public health indicator, reflecting healthcare quality and broader social, economic, and environmental factors. Accurate IMR estimation is essential for evaluating health policies. This study aims to (1) identify overdispersion in IMR data from South Sulawesi, (2) model IMR using CMP regression, and (3) identify factors influencing IMR. The dataset includes IMR, Low Birth Weight (LBW), diarrhea, asphyxia, pneumonia, and exclusive breastfeeding. Analysis showed significant overdispersion with a ratio of 4.639, making CMP the optimal model with an AIC of 186.845. Significant factors identified were LBW, asphyxia, pneumonia, and exclusive breastfeeding. These findings advance statistical methodologies for count data analysis and offer a more accurate approach to evaluating public health policies, supporting efforts to reduce infant mortality in South Sulawesi Province.
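For readers unfamiliar with the Conway-Maxwell Poisson distribution, the following Python sketch evaluates its probability mass function, P(Y = y) = lam^y / (y!)^nu / Z(lam, nu), with the normalizing constant approximated by a truncated sum. It also includes a simple variance-to-mean ratio as one possible dispersion check. The abstract does not say how its ratio of 4.639 was computed, so that helper, the truncation length, and all names are assumptions.

```python
import math

def cmp_pmf(y, lam, nu, max_terms=200):
    """P(Y = y) for a Conway-Maxwell Poisson variable with rate lam and dispersion nu."""
    def log_term(j):
        # log of lam^j / (j!)^nu, computed in log space for numerical stability.
        return j * math.log(lam) - nu * math.lgamma(j + 1)
    # Truncated normalizing constant Z(lam, nu) = sum over j of lam^j / (j!)^nu.
    z = sum(math.exp(log_term(j)) for j in range(max_terms))
    return math.exp(log_term(y)) / z

def dispersion_ratio(counts):
    """Sample variance divided by sample mean; values above 1 suggest overdispersion."""
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return variance / mean
```

With nu = 1 the distribution reduces to the ordinary Poisson, nu < 1 allows overdispersion, and nu > 1 allows underdispersion. The truncated sum is only adequate when the terms decay quickly, which may not hold for small nu combined with large lam.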
ANALYSIS OF MULTI-OBJECTIVE LINEAR ROBUST OPTIMIZATION MODEL WITH LEXICOGRAPHICAL METHOD Azis, Chusnul Chatimah; Chaerani, Diah; Rusyaman, Endang
MEDIA STATISTIKA Vol 17, No 1 (2024): Media Statistika
Publisher : Department of Statistics, Faculty of Science and Mathematics, Universitas Diponegoro

DOI: 10.14710/medstat.17.1.57-68

Abstract

Robust multi-objective linear optimization problems are a class of optimization problems with uncertain data parameters. In decision-making, they aim to obtain the best results under given circumstances, and various solution methods can be chosen to handle the multiple objectives. This research aims to formulate a multi-objective Robust Optimization (RO) model using the Lexicographic Method and then to analyze the existence and uniqueness of the solution. Furthermore, a gap analysis on the topic was carried out using a Systematic Literature Review (SLR) approach with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) method. Alongside the SLR results, the analysis shows that the Lexicographic Method is effective in handling data uncertainty with the objective functions sorted by priority. The robust formulation with polyhedral uncertainty sets ensures the flexibility and adaptability of the model. Convexity analysis and application of the Karush-Kuhn-Tucker (KKT) method prove that the resulting solution exists and is unique.
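The lexicographic idea of optimizing objectives in priority order can be sketched in Python over a small finite candidate set. This is an illustration only: the paper works with a continuous linear model, whereas the sketch enumerates hypothetical candidate points. The worst-case cost is taken over the vertices of a polyhedral uncertainty set, which is sufficient because a linear function attains its maximum over a polytope at a vertex. All names, points, and coefficients are made up for the example.

```python
def worst_case(x, scenarios):
    """Worst-case linear cost of x over the vertices of a polyhedral uncertainty set."""
    return max(sum(c_i * x_i for c_i, x_i in zip(c, x)) for c in scenarios)

def lexicographic_min(candidates, objectives, tol=1e-9):
    """Minimize the objectives in priority order over a finite candidate set."""
    remaining = list(candidates)
    for objective in objectives:
        best = min(objective(x) for x in remaining)
        # Keep only the candidates that are optimal at this priority level.
        remaining = [x for x in remaining if objective(x) <= best + tol]
    return remaining

# Hypothetical candidate decisions and uncertain cost vectors (uncertainty-set vertices).
points = [(0, 4), (1, 3), (2, 2), (4, 0)]
first_priority = lambda x: worst_case(x, [(1.0, 1.0), (1.2, 0.8)])
second_priority = lambda x: worst_case(x, [(2.0, 1.0)])
solution = lexicographic_min(points, [first_priority, second_priority])
```

The second objective only breaks ties among candidates that are already optimal for the first, which is what distinguishes the lexicographic method from a weighted sum.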
GENE MARKERS IDENTIFICATION OF ACUTE MYOCARDIAL INFARCTION DISEASE BASED ON GENOMIC PROFILING THROUGH EXTREME GRADIENT BOOSTING (XGBoost) Fajriyah, Rohmatul; Isnandar, Havidzah Asri; Arifuddin, Adhar
MEDIA STATISTIKA Vol 17, No 1 (2024): Media Statistika
Publisher : Department of Statistics, Faculty of Science and Mathematics, Universitas Diponegoro

DOI: 10.14710/medstat.17.1.69-80

Abstract

One disease that can cause death is Acute Myocardial Infarction (AMI). AMI, also known as a heart attack, is a condition that causes permanent damage to heart muscle tissue due to prolonged ischemia, or lack of blood flow, which occurs when blockage of the epicardial coronary arteries results in blood clots and limits the blood supply to the myocardium. The number of young AMI patients has been increasing over the years. One way to support early diagnosis is to provide information on biomarkers related to the disease through bioinformatics data analysis. This research examined the genomic profiles of AMI patients without recurrent events and of normal controls using the XGBoost method, chosen for its scalability and efficiency. Based on a grid search for hyperparameter tuning, the XGBoost method gives a classification accuracy of 88.89%, an AUC of 90, and a kappa of 0.7805. These results indicate that the XGBoost method can classify patients suffering from AMI well. This research identified three genes that contribute the most to classifying AMI patients, namely calponin 2, ribosomal protein S11, and myotropin. Based on the heatmap visualization, the three genes are markers of the class without recurrent events.
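The accuracy and kappa figures quoted above are standard summaries of a confusion matrix. As a minimal Python sketch, the function below computes both from a square matrix of counts. The function name and the example matrix are hypothetical and are not the study's data.

```python
def accuracy_and_kappa(confusion):
    """Accuracy and Cohen's kappa from a square confusion matrix (rows = actual class)."""
    n = sum(sum(row) for row in confusion)
    k = len(confusion)
    observed = sum(confusion[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance, derived from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

# Hypothetical two-class confusion matrix, for illustration only.
acc, kappa = accuracy_and_kappa([[20, 5], [10, 15]])
```

Kappa discounts the agreement that would be expected by chance, so it is lower than accuracy whenever chance agreement is above zero.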
STACKING ENSEMBLE APPROACH IN STATISTICAL DOWNSCALING USING CMIP6-DCPP FOR RAINFALL ESTIMATION IN RIAU Mahkya, Dani Al; Djuraidah, Anik; Wigena, Aji Hamim; Sartono, Bagus
MEDIA STATISTIKA Vol 17, No 1 (2024): Media Statistika
Publisher : Department of Statistics, Faculty of Science and Mathematics, Universitas Diponegoro

DOI: 10.14710/medstat.17.1.1-12

Abstract

Rainfall modeling and prediction are important tasks. Rainfall is closely related to, and plays a role in, various aspects of the environment. One phenomenon associated with rainfall is forest and land fires. Riau is one of the provinces in Indonesia with a high potential for forest and land fires because it has a large area of peatland. One approach to estimating rainfall is statistical downscaling, which forms a functional relationship between global and local data. This research uses CMIP6-DCPP output data to estimate rainfall at 10 observation stations in Riau. The proposed model is a Stacking Ensemble with PC Regression and LASSO Regression as base models and Multiple Linear Regression as the meta model. This research aims to determine the best CMIP6-DCPP model for estimating rainfall in Riau and to increase the accuracy of rainfall estimates using the Stacking Ensemble approach.
MULTICLASS CLASSIFICATION OF MARKETPLACE PRODUCTS WITH MACHINE LEARNING Aditama, Farhan Satria; Krismawati, Dewi; Pramana, Setia
MEDIA STATISTIKA Vol 17, No 1 (2024): Media Statistika
Publisher : Department of Statistics, Faculty of Science and Mathematics, Universitas Diponegoro

DOI: 10.14710/medstat.17.1.25-35

Abstract

The use of marketplace data and machine learning in the collection of commodity data can provide an opportunity for Statistics Indonesia to complete the commodity directories for various surveys. This research adopts machine learning to train a product classification model on existing datasets and to predict which KBKI category new data fall into. The dataset contains more than 32,000 products from 26 classes, consisting of product data from the two biggest marketplaces in Indonesia. Algorithms used for classification include Random Forests (RF), Support Vector Machines (SVM), and Multinomial Naive Bayes (MNB). Results indicate that MNB is the most effective algorithm when considering the trade-off between accuracy and processing time. MNB achieved the highest micro-average F1 scores, with 91.8% for Tokopedia and 95.4% for Shopee, and had the fastest execution time, approximately 5 seconds.
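The micro-average F1 score pools true positives, false positives, and false negatives over all classes before computing F1. The Python sketch below shows this for single-label multiclass predictions. The function name and any labels passed to it are hypothetical.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 for single-label multiclass predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    tp = fp = fn = 0
    for label in set(y_true) | set(y_pred):
        for truth, pred in zip(y_true, y_pred):
            if pred == label and truth == label:
                tp += 1  # correctly assigned to this class
            elif pred == label:
                fp += 1  # assigned to this class but belongs elsewhere
            elif truth == label:
                fn += 1  # belongs to this class but assigned elsewhere
    return 2 * tp / (2 * tp + fp + fn)
```

When every item receives exactly one label, each misclassification counts once as a false positive and once as a false negative, so the micro-average F1 equals plain accuracy.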
ANALYSIS OF MULTILEVEL STRUCTURAL EQUATION MODELING WITH GENERALIZED STRUCTURED COMPONENT ANALYSIS METHOD Amanah, Fitri; Abdurakhman, Abdurakhman
MEDIA STATISTIKA Vol 17, No 1 (2024): Media Statistika
Publisher : Department of Statistics, Faculty of Science and Mathematics, Universitas Diponegoro

DOI: 10.14710/medstat.17.1.81-92

Abstract

Generalized Structured Component Analysis (GSCA) is a component-based SEM. One of the developments of GSCA is the GSCA method for multilevel data, known as multilevel GSCA. Multilevel data are data that have a nested or grouped structure. This study aims to apply multilevel GSCA to data on factors that affect poverty. The data used are on Indonesia's health, education, and poverty in 2023. The result is that all indicators are significant to the latent variables. The structural model shows that the quality of health has a negative and significant effect on poverty, education has a negative and significant effect on poverty, and the quality of health has a positive and significant effect on education. The between-group results show that health quality has a positive and significant effect on education in all regions; health quality has a negative and significant effect on poverty in Bali & Nusa Tenggara, Sulawesi, and Maluku & Papua; and education has a negative and significant effect on poverty in Sumatra, Java, and Maluku & Papua. The overall goodness of fit value (FIT) is 0.622, meaning the model can explain 62.2% of the data variation.
IMPLEMENTATION OF PROPHET IN AMERICAN ELECTRICITY FORECASTING WITH AND WITHOUT PARAMETER TUNING Sulandari, Winita; Yudhanto, Yudho; Hapsari, Riskhia; Wijayanti, Monica Dini; Pardede, Hilman Ferdinandus
MEDIA STATISTIKA Vol 17, No 1 (2024): Media Statistika
Publisher : Department of Statistics, Faculty of Science and Mathematics, Universitas Diponegoro

DOI: 10.14710/medstat.17.1.93-104

Abstract

Prophet is a machine learning approximation method that accommodates trends, seasonality, and holiday impacts in time series data. Generally, the performance of machine learning models can be improved by hyperparameter tuning. This study investigates whether hyperparameter tuning can improve the model's performance. To show its effectiveness, the Prophet model constructed with parameter tuning is compared to the one with fixed parameter values (namely, the default model) for both the original series and the Box-Cox transformed series in terms of mean absolute percentage error (MAPE). Experimental results on the twenty-four daily electricity load time series from American Electric Power (AEP) show that parameter tuning reduces the MAPE of the default model by about 3-8% for training data. However, there is no guarantee for testing data. Although in some cases parameter tuning can reduce the MAPE of the default model by up to 38%, in other cases it actually increases the MAPE of the default model by almost 15%. The experiments on testing data also show that models built from transformed data do not necessarily produce more accurate forecasts than those built from the original data.
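The two quantities used in this comparison, MAPE and the Box-Cox transform, have simple closed forms. The Python sketch below states them directly. The function names are hypothetical, and the sketch does not involve Prophet itself.

```python
import math

def mape(actual, forecast):
    """Mean absolute percentage error, in percent; actual values must be nonzero."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and of equal length")
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def box_cox(x, lam):
    """Box-Cox transform of a positive value x with parameter lam."""
    if x <= 0:
        raise ValueError("the Box-Cox transform requires x > 0")
    # lam = 0 is the limiting case and corresponds to the natural logarithm.
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam
```

Forecasts made on a transformed series have to be mapped back to the original scale before MAPE is computed; otherwise errors from the two series are not comparable.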
SPATIAL PANEL MODELING OF PROVINCIAL INFLATION IN INDONESIA TO MITIGATE ECONOMIC IMPACTS OF HEALTH CRISES Astuti, Ani Budi; Pramoedyo, Henny; Astutik, Suci; Setiarini, An Nisa Dwi
MEDIA STATISTIKA Vol 17, No 1 (2024): Media Statistika
Publisher : Department of Statistics, Faculty of Science and Mathematics, Universitas Diponegoro

DOI: 10.14710/medstat.17.1.105-116

Abstract

Probabilistic statistical modeling simplifies complex issues, including economic and health challenges, by applying inductive statistics. Spatial panel modeling, using Queen Contiguity weighting, has proven to be essential for analyzing inflation expenditure patterns during health crises, such as COVID-19 in Indonesia. This study highlights the impact of inflation on national economic stability and explores the inter-provincial relationships that influence inflation dynamics across expenditure groups. The purpose of this study is to develop a spatial panel model to address this gap, offering insights for policy and recovery strategies. The results reveal significant spatial interdependence in provincial inflation data, underscoring the role of spatial factors in economic analysis. Two models are identified: Spatial Autoregressive Model with Random Effects (SAR-RE) before the crisis and Spatial Error Model with Random Effects (SEM-RE) during the crisis. Transportation facilities consistently affect inflation, demonstrating the effectiveness of spatial panel modeling in guiding policies for economic stability and recovery.
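Queen contiguity treats two areas as neighbors when they share an edge or a corner. As a minimal Python sketch, the function below builds such a weight matrix for cells of a regular grid. Real provinces are irregular polygons, so the grid is a stand-in, and the row standardization is a common convention assumed here rather than something stated in the abstract. The function name is hypothetical.

```python
def queen_weights(n_rows, n_cols):
    """Row-standardized queen contiguity weights for the cells of a regular grid."""
    n = n_rows * n_cols
    weights = [[0.0] * n for _ in range(n)]
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            # Neighbors share an edge or a corner with cell (r, c).
            neighbors = [
                (r + dr) * n_cols + (c + dc)
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
                and 0 <= r + dr < n_rows
                and 0 <= c + dc < n_cols
            ]
            for j in neighbors:
                weights[i][j] = 1.0 / len(neighbors)
    return weights

w = queen_weights(3, 3)
```

In spatial lag and spatial error models, a matrix of this kind defines which observations are allowed to influence one another.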
