Contact Name
Dr. Muhammad Ahsan
Contact Email
muh.ahsan@its.ac.id
Phone
+6281331551312
Journal Mail Official
inferensi.statistika@its.ac.id
Editorial Address
Department of Statistics, Faculty of Science and Data Analytics, Institut Teknologi Sepuluh Nopember (ITS), Kampus ITS Keputih Sukolilo, Surabaya, Indonesia 60111
Location
Kota Surabaya,
Jawa Timur
INDONESIA
Inferensi
ISSN : 0216-308X     EISSN : 2721-3862     DOI : http://dx.doi.org/10.12962/j27213862
The aim of Inferensi is to publish original articles concerning statistical theories and novel applications in diverse research fields related to statistics and data science. The objective of papers should be to contribute to the understanding of statistical methodology and/or to develop and improve statistical methods; any mathematical theory, and any approach in data science, should be directed towards these aims. The kinds of contribution considered include descriptions of new methods of collecting or analysing data, with the underlying theory, an indication of the scope of application and preferably a real example. Also considered are comparisons, critical evaluations and new applications of existing methods, contributions to probability theory which have a clear practical bearing (including the formulation and analysis of stochastic models), statistical computation or simulation where original methodology is involved, and original contributions to the foundations of statistical science. The journal also occasionally publishes review and expository articles on specific topics, which are expected to bring valuable information to researchers interested in the selected fields. The journal contributes to broadening the coverage of statistics and data analysis by publishing articles based on innovative ideas. The journal is also unique in combining traditional statistical science and the relatively new field of data science. All articles are refereed by experts.
Articles 147 Documents
The Continuum Regression Analysis with Preprocessed Variable Selection LASSO and SIR-LASSO
Suruddin, Adzkar Adlu Hasyr; Erfiani, Erfiani; Sumertajaya, I Made
Inferensi Vol 8, No 1 (2025)
Publisher : Department of Statistics ITS

DOI: 10.12962/j27213862.v8i1.21658

Abstract

Analyzing high-dimensional data is a considerable challenge in statistics and data science. Issues like multicollinearity and outliers often arise, leading to unstable coefficients and diminished model effectiveness. Continuum regression is a useful method for calibration models because it effectively handles multicollinearity and reduces the number of dimensions in the data. This method condenses data into autonomous latent variables, resulting in a more stable, precise, and reliable model. It is possible to use the dimensionality reduction method without losing any important information from the original data. This makes it a useful tool for making calibration models work better. In the initial phase, minimizing dimensions via variable selection is crucial. The study aims to build and test the Continuum Regression calibration model using LASSO and SIR-LASSO variable selection preprocessing methods. SIR-LASSO is a method that integrates SIR with the variable selection capabilities of LASSO. This technique aims to handle high-dimensional data by identifying relevant low-dimensional structures. LASSO improves variable selection by applying a penalty to regression coefficients, reducing the impact of less significant or redundant variables. The integration improves SIR's efficacy in assessing high-dimensional data while also enhancing model stability and interpretability. This approach seeks to address the issues of multicollinearity and model instability. We conducted simulations using both low-dimensional and high-dimensional datasets to assess the efficacy of CR LASSO and CR SIR-LASSO. RStudio version 4.1.3 was used for the analysis. The "MASS" package was used to create data with a multivariate normal distribution. The "glmnet" package was used for LASSO variable selection, and the "LassoSIR" package was used for SIR-LASSO variable selection. 
In the simulations, LASSO outperforms SIR-LASSO in variable selection, yielding the lowest RMSEP value in every scenario. SIR-LASSO, by contrast, becomes less stable as the number of dimensions increases, which suggests that it is sensitive to large changes in the variables. CR LASSO generally predicts better than CR SIR-LASSO, as shown by its lower median RMSEP values across a range of sample sizes and scenarios. The RMSEP distributions for LASSO are consistently tighter, which means that its performance is more stable and reliable than that of SIR-LASSO, whose RMSEP values show more outliers and more variation. Even with a growing sample size, LASSO maintains its advantage, particularly when setting the value at 0.5. SIR-LASSO, although occasionally competitive, generally yields more variable results, particularly with larger sample sizes. Overall, LASSO appears to be the more reliable option for a CR model with preprocessed variable selection.
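The comparison in this abstract rests on RMSEP values summarized across simulation replicates. A minimal Python sketch of such a summary follows, assuming RMSEP is the square root of the mean squared difference between held-out responses and predictions. The function names and the replicate values are hypothetical and are not taken from the study.

```python
import math
import statistics

def rmsep(y_true, y_pred):
    """Root mean squared error of prediction over held-out observations."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def summarize(rmsep_values):
    """Median and interquartile range of RMSEP across replicates."""
    q1, q2, q3 = statistics.quantiles(rmsep_values, n=4)
    return {"median": q2, "iqr": q3 - q1}

# Hypothetical RMSEP values from five replicates per method (illustration only).
cr_lasso = [0.52, 0.49, 0.55, 0.50, 0.53]
cr_sir_lasso = [0.61, 0.48, 0.90, 0.57, 0.72]

print("CR LASSO:", summarize(cr_lasso))
print("CR SIR-LASSO:", summarize(cr_sir_lasso))
```

A lower median points to better typical accuracy, while a smaller interquartile range corresponds to the "tighter" distribution described in the abstract.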
Prediction and Analysis of The Number of ARI Cases based on PM2.5 Concentration with Co-Kriging Approach
Chamidah, Nur; Andriani, Putu Eka; Fitri, Marfa Audilla; Fajrina, Sofia Andika Nur; Suryono, Alda Fuadiyah; Alexandra, Victoria Anggia
Inferensi Vol 8, No 1 (2025)
Publisher : Department of Statistics ITS

DOI: 10.12962/j27213862.v8i1.20512

Abstract

Air quality significantly impacts global environmental health, influencing both human well-being and climate change. According to the World Health Organization (WHO), air pollution is one of the most substantial environmental threats to human health, with Indonesia experiencing particularly severe air quality issues. The World Air Quality Report ranks Indonesia 14th globally and 1st in Southeast Asia for poor air quality, with a notable increase in PM2.5 concentrations to 37.1 µg/m³ in 2023. Major sources of pollution include coal-fired power plants, motor vehicles, forest fires, and agricultural activities. In urban areas like Surabaya, PM2.5 levels have risen, contributing to high incidences of Acute Respiratory Infections (ARI). Spatial analysis reveals a correlation between PM2.5 levels and ARI cases, with spatial regression and co-kriging methods offering accurate estimation models. This study utilizes co-kriging, incorporating PM2.5 data from nine districts in Surabaya, to estimate ARI cases. The Exponential semivariogram model provided the most accurate predictions, with a MAPE value of 5.11%. The highest estimated ARI cases were in the Kenjeran district, highlighting the need for targeted interventions. Future research should expand observation points and consider additional influencing factors such as weather, population density, and socioeconomic conditions to enhance prediction accuracy and support effective public health strategies.
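The two quantities named in this abstract, the exponential semivariogram and MAPE, can be sketched in a few lines of Python. The parameterization below (nugget plus partial sill times 1 - exp(-h / range)) is one common convention and is an assumption here, since the abstract does not state which form or which parameter values were used. All names are hypothetical.

```python
import math

def exponential_semivariogram(h, nugget, partial_sill, range_param):
    """gamma(h) = nugget + partial_sill * (1 - exp(-h / range_param)) for h > 0."""
    if h < 0 or range_param <= 0:
        raise ValueError("h must be >= 0 and range_param must be > 0")
    if h == 0:
        return 0.0  # by convention the semivariogram is zero at lag zero
    return nugget + partial_sill * (1.0 - math.exp(-h / range_param))

def mape(actual, predicted):
    """Mean absolute percentage error in percent; actual values must be non-zero."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)

# Illustrative values only, not data from the study.
print(exponential_semivariogram(2.0, nugget=0.1, partial_sill=1.0, range_param=2.0))
print(mape([100.0, 200.0], [110.0, 190.0]))
```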
Forecasting Futures Gold Prices Using Pulse Function Intervention Analysis Approach
Miranda, Ariadna Sopia; Andriani, Putu Eka; Sediono, Sediono; Syahzaqi, Idrus
Inferensi Vol 8, No 1 (2025)
Publisher : Department of Statistics ITS

DOI: 10.12962/j27213862.v8i1.21979

Abstract

Gold is a precious metal that plays an important role in global trade and is often used as a financial standard in various countries. In 2024, gold prices surged sharply due to global macroeconomic factors, such as economic uncertainty, positioning gold as a safe haven for investors. Accurate predictions of future gold prices are crucial for helping investors make informed decisions and adapt to market changes. In line with Sustainable Development Goal (SDG) 8 on Decent Work and Economic Growth, this study uses the pulse function intervention analysis approach to predict gold prices by identifying patterns of change in the pre-intervention and post-intervention periods. This study aims to make a significant contribution to the use of comprehensive and relevant predictive tools by considering the effects of interventions, supporting investor decision-making, and contributing to economic growth. The best model was ARIMA(0,2,1) with intervention orders b=0, r=2, and s=0. The predictions align closely with the actual data, yielding a MAPE value of 1.289%. Additionally, this model produces the smallest AIC value of 1125.1, an SBC value of 1135.86, and an MSE value of 1403.11, demonstrating excellent predictive capability.
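The intervention orders b=0, r=2, s=0 reported in this abstract describe how a one-period pulse propagates through time. A minimal Python sketch follows, assuming the usual convention in which b is the delay, s the number of extra numerator lags and r the order of the decay polynomial, so that the effect obeys m_t = ω0·P_(t-b) + δ1·m_(t-1) + δ2·m_(t-2). The function name and all parameter values are hypothetical; the abstract does not report the estimated coefficients.

```python
def pulse_response(n, pulse_time, omega0, deltas, b=0):
    """Effect of a pulse at `pulse_time` under a transfer function with s = 0.

    m_t = omega0 * P_{t-b} + sum_k deltas[k-1] * m_{t-k}
    """
    pulse = [1.0 if t == pulse_time else 0.0 for t in range(n)]
    effect = []
    for t in range(n):
        # Direct impact of the (possibly delayed) pulse.
        value = omega0 * pulse[t - b] if t - b >= 0 else 0.0
        # Carry-over from previous periods (r = len(deltas) decay terms).
        for k, delta in enumerate(deltas, start=1):
            if t - k >= 0:
                value += delta * effect[t - k]
        effect.append(value)
    return effect

# Illustrative parameters only: impact of 10 units, decay terms 0.5 and 0.2.
print(pulse_response(n=7, pulse_time=3, omega0=10.0, deltas=(0.5, 0.2)))
```

In a full intervention model this effect series would be added to the ARIMA(0,2,1) noise component; the sketch shows only the deterministic pulse response.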
Comparison of Ensemble Learning Methods in Classifying Unbalanced Data on the Bank Marketing Dataset
Hasnataeni, Yunia; Sadik, Kusman; Soleh, Agus M; Astari, Reka Agustia
Inferensi Vol 8, No 1 (2025)
Publisher : Department of Statistics ITS

DOI: 10.12962/j27213862.v8i1.20569

Abstract

The banking industry is experiencing rapid growth, particularly in telemarketing strategies to increase product and service sales. Despite their widespread use, these strategies still need higher success rates, and the resulting data are imbalanced, with fewer customers accepting offers than rejecting them. This study evaluates machine learning algorithms, including Random Forest, Gradient Boosting, Extra Trees, and AdaBoost, with and without handling imbalanced data using the Random Over-Sampling Examples (ROSE) method. The evaluation covers accuracy, precision, recall, F1-score, and the area under the ROC curve (AUC). Results indicate that Random Forest and AdaBoost consistently perform well, with Random Forest maintaining a high accuracy of 91.00% after handling imbalanced data. Gradient Boosting and Extra Trees improve in precision after oversampling. All models exhibit high AUC values, close to 0.94, demonstrating excellent discrimination between the positive and negative classes. The study concludes that addressing data imbalance enhances model performance, making these models suitable for effective telemarketing strategies in the banking sector.
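The evaluation measures listed in this abstract all derive from the confusion matrix. A minimal Python sketch follows, with hypothetical counts chosen only to show how accuracy can stay high on imbalanced data while recall is low; none of these numbers come from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical imbalanced case: 100 positives among 1000 customers.
print(classification_metrics(tp=40, fp=10, fn=60, tn=890))
```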
A Nonparametric Regression Approach Address Poverty Problems in East Nusa Tenggara Province
Adrianingsih, Narita Yuri; Mungkabel, Mariana; Dani, Andrea Tri Rian; Ni'matuzzahroh, Ludia
Inferensi Vol 8, No 1 (2025)
Publisher : Department of Statistics ITS

DOI: 10.12962/j27213862.v8i1.20508

Abstract

The government is focused on reducing poverty, which is still a significant issue. Because the shape of the regression curve is unknown and truncated spline nonparametric regression offers a high degree of flexibility, this study uses that approach to determine which factors influence poverty, particularly in the East Nusa Tenggara area. The goal of this study is to develop a nonparametric regression model. The average length of schooling, life expectancy, percentage of the illiterate population aged 15 and over, labor force participation rate, percentage of households based on the information source, and population density affect poverty in the East Nusa Tenggara area. With a minimum GCV of 39.57, one knot point was found to be optimal. To some extent, the characteristics that influenced poverty were life expectancy, labor force participation rate, percentage of households with a proper light source, and population density. The best model met these criteria with an R² of 81.28%. The findings suggest that targeted interventions to improve these factors can significantly reduce poverty in East Nusa Tenggara.
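The one-knot truncated spline and the GCV criterion mentioned in this abstract can be sketched as follows. This is an illustration under simplifying assumptions: a single predictor, a linear (degree-one) spline, and the standard form GCV = (RSS/n) / (1 - trace(A)/n)², where A is the hat matrix. The study itself uses several predictors; the function names and numbers below are hypothetical.

```python
def truncated_linear_spline(x, knot, b0, b1, b2):
    """f(x) = b0 + b1*x + b2*(x - knot)_+ where (z)_+ = max(z, 0)."""
    return b0 + b1 * x + b2 * max(x - knot, 0.0)

def gcv(rss, n, trace_hat):
    """Generalized cross-validation score for a linear smoother.

    rss: residual sum of squares, n: sample size,
    trace_hat: trace of the hat matrix (effective number of parameters).
    """
    if n <= 0 or trace_hat >= n:
        raise ValueError("need n > 0 and trace_hat < n")
    return (rss / n) / (1.0 - trace_hat / n) ** 2

# The slope changes from b1 to b1 + b2 once x passes the knot.
print(truncated_linear_spline(1.0, knot=2.0, b0=1.0, b1=0.5, b2=-2.0))
print(truncated_linear_spline(3.0, knot=2.0, b0=1.0, b1=0.5, b2=-2.0))
print(gcv(rss=100.0, n=20, trace_hat=4.0))
```

In practice the GCV score would be computed for each candidate knot location, and the location with the smallest score would be selected.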
Implementation of Clustering Time Series with DTW to Clustering and Forecasting Rice Prices Each Provinces in Indonesia
Tsabitah, Dhiya; Angraini, Yenni; Sumertajaya, I Made
Inferensi Vol 8, No 1 (2025)
Publisher : Department of Statistics ITS

DOI: 10.12962/j27213862.v8i1.21952

Abstract

Indonesia faces a significant imbalance between domestic supply and demand, leading to escalating rice prices and pronounced regional disparities. To elucidate underlying price patterns and forecast future trends, this study employed hierarchical time-series clustering with Dynamic Time Warping (DTW) and ARIMA modelling at both individual and cluster levels. Comprehensive analysis, incorporating visualization and threshold comparisons, identified Central Kalimantan as an outlier. Individual ARIMA models demonstrated exceptional performance, with MAPE values below 10%. The cophenetic correlation coefficient of the time-series clustering reached 0.68 for Ward linkage. Two clustering approaches were explored: (1) ignoring the outlier province, and (2) excluding Central Kalimantan and incorporating it into a separate cluster. The criteria for the optimal number of clusters (Elbow, Silhouette, Calinski-Harabasz, and Davies-Bouldin) yielded 6-7 clusters for the former approach and 3-5 clusters for the latter. Comparative analysis of individual and cluster forecasts, coupled with paired t-tests, revealed that Ward linkage in the second approach produced the most favorable results, with 27 of 34 provinces exhibiting cluster MAPE values less than or equal to their individual MAPE. This finding underscores the efficacy of cluster-based modeling in generating accurate and representative estimates for a substantial portion of provinces. A 12-period rice price forecast indicates a prevailing trend of rising prices in most regions of Indonesia.
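The DTW dissimilarity that feeds the hierarchical clustering in this abstract can be computed with a short dynamic program. A minimal Python sketch follows, assuming the classic unconstrained formulation with absolute difference as the local cost. The abstract does not say which DTW variant or window was used, and the series below are made up.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("sequences must be non-empty")
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # stretch b
                cost[i][j - 1],      # stretch a
                cost[i - 1][j - 1],  # advance both
            )
    return cost[n][m]

# A series and a time-stretched copy of it align perfectly under DTW.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
print(dtw_distance([1, 2, 3], [2, 3, 4]))
```

Pairwise distances of this kind form the dissimilarity matrix to which a linkage method such as Ward would then be applied.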
Comparison of GMERF and GLMM Tree Models on Poverty Household Data with Imbalanced Categories
Bukhari, Ari Shobri; Notodiputro, Khairil Anwar; Indahwati, Indahwati; Fitrianto, Anwar
Inferensi Vol 8, No 2 (2025)
Publisher : Department of Statistics ITS

DOI: 10.12962/j27213862.v8i2.21901

Abstract

Decision tree and forest methods have become popular approaches in data science and continue to evolve. One of these developments is the combination of decision trees with Generalized Linear Mixed Models (GLMM), resulting in the GLMM Tree, which is applicable to multilevel and longitudinal data. Another model, Generalized Mixed Effect Random Forest (GMERF), extends the concept of decision forests with GLMM, effectively handling complex data structures with non-linear interactions. This study compares the performance of GLMM Tree and GMERF models in classifying poor households in South Sulawesi Province, characterized by imbalanced categories. GLMM Tree provides a simple, interpretable classification through tree diagrams, while GMERF highlights variable importance. Initial tests show all three models (GLMM, GLMM Tree, and GMERF) achieve high accuracy and specificity but exhibit low sensitivity. By applying oversampling, sensitivity and AUC are significantly improved, though this is accompanied by a decline in accuracy and specificity, revealing a trade-off. The study concludes that while GLMM, GLMM Tree and GMERF have their strengths, using them together offers a more comprehensive understanding of poverty classification. Handling imbalanced data with oversampling is effective in increasing sensitivity, but careful consideration is needed due to its impact on overall accuracy.
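The oversampling step described in this abstract can be illustrated with plain random oversampling, in which minority-class rows are duplicated at random until every class matches the largest one. This is an assumption made for illustration: the abstract does not specify which oversampling technique was applied, and the function name and toy data are hypothetical.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until all classes are equal in size."""
    if len(rows) != len(labels):
        raise ValueError("rows and labels must have the same length")
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == label]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))  # sample with replacement
            out_labels.append(label)
    return out_rows, out_labels

# Toy data: 8 non-poor households (0) and 2 poor households (1).
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2
new_rows, new_labels = random_oversample(rows, labels)
print(Counter(new_labels))
```

Oversampling should be applied to the training split only, so that duplicated rows do not leak into the data used to measure sensitivity and specificity.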
Variables Selection Affecting Indonesian Human Development Index Using LASSO
Sunandi, Etis; Siswantining, Titin
Inferensi Vol 8, No 2 (2025)
Publisher : Department of Statistics ITS

DOI: 10.12962/j27213862.v8i2.22891

Abstract

According to Statistics Indonesia, the Human Development Index (HDI) is a measure that reflects the level of human development achievement in a region, based on three basic dimensions: a long and healthy life, knowledge, and a decent standard of living. Many factors are suspected to influence HDI in Indonesia. On the other hand, parameter estimation in regression analysis using the Least Squares Method runs into problems if the number of independent variables is greater than the number of observations. One method that can be used to overcome this problem is the Least Absolute Shrinkage and Selection Operator (LASSO). The purpose of this study is to select the variables that affect Indonesia's Human Development Index (HDI) in 2023 using LASSO. The LASSO method is known for selecting independent variables while overcoming multicollinearity problems. The ridge regression model is used as a comparison model. The results showed that LASSO performs better than ridge regression: the Mean Squared Error of Prediction (MSEP) of LASSO (0.34) is smaller than that of ridge regression (3.61). In addition, the R-squared value of LASSO is higher, at 97.6%.
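The difference between LASSO and ridge regression described in this abstract is easiest to see in the special case of an orthonormal design, where both estimators have closed forms in terms of the least-squares coefficient z. A minimal Python sketch follows, assuming the objectives ½‖y − Xb‖² + λ‖b‖₁ for LASSO and ½‖y − Xb‖² + (λ/2)‖b‖² for ridge. The function names and coefficient values are hypothetical, and real data rarely have an orthonormal design.

```python
def soft_threshold(z, lam):
    """LASSO solution for one coefficient under an orthonormal design."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0  # small coefficients are set exactly to zero

def ridge_shrink(z, lam):
    """Ridge solution for one coefficient under an orthonormal design."""
    return z / (1.0 + lam)

# Illustrative least-squares coefficients, not values from the study.
ols = [2.0, -0.3, 0.05, -1.2]
lam = 0.5
print([soft_threshold(z, lam) for z in ols])  # some entries become exactly 0
print([ridge_shrink(z, lam) for z in ols])    # all entries shrink, none vanish
```

The exact zeros produced by soft thresholding are what make LASSO a variable selection method, whereas ridge regression only shrinks coefficients toward zero.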
Implementing Markov Switching Regression Using Best Subset Approach For BSI Stock Price Prediction Analysis
Nurdiansyah, Denny; Ma'ady, Mochamad Nizar Palefi; Wijayanti, Lulud; Novitasari, Diah Ayu; Rohmawati, Siti
Inferensi Vol 8, No 2 (2025)
Publisher : Department of Statistics ITS

DOI: 10.12962/j27213862.v8i2.21030

Abstract

Stocks are evidence of ownership of the capital or funds of a company or institution and are represented by a document that states the par value, the company name, and the rights and obligations of each owner. Because so many factors affect the rise and fall of stock prices, investors should pay attention to these factors to avoid losses when buying and selling stocks. The rise and fall of stock prices can be analyzed with Markov switching regression by trying all possible combinations of factors to find the best subset. Public holdings will continue to increase due to nation-building and the appreciation of the Bank Syariah Indonesia (BRIS) stock price. This study aims to determine what drives increases and decreases in the closing price of BSI stock. The modeling used in this study is Markov switching regression with the best subset approach. The data used in this study are secondary daily data on the closing price of Bank Syariah Indonesia shares, Inflation, BI Rate, Selling Exchange Rate, Money Supply, and Gross Domestic Product (GDP). Data are obtained from the official BPS website. The results show that Markov switching regression modeling can identify the regimes as "bull" and "bear" periods. State 2 indicates an uptrend or "bullish" period, and state 1 indicates a downtrend or "bearish" period. The best subset approach selects the model with the lowest SSE value. The study concludes that the statistical modeling of BSI stock's closing prices during "bull" and "bear" periods yields three significant predictors: BI Rate, Selling Exchange Rate, and Money Supply.
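The best subset search described in this abstract amounts to enumerating every non-empty combination of the candidate predictors and keeping the one with the lowest SSE. A minimal Python sketch of that enumeration follows. Fitting the Markov switching regression itself is out of scope here, so the scoring function `sse_of` is a hypothetical caller-supplied stand-in, and the toy scorer below is not a real model fit.

```python
from itertools import combinations

# Candidate predictors named in the abstract.
PREDICTORS = ["Inflation", "BI Rate", "Selling Exchange Rate", "Money Supply", "GDP"]

def all_subsets(candidates):
    """Yield every non-empty subset of the candidate predictors."""
    for size in range(1, len(candidates) + 1):
        yield from combinations(candidates, size)

def best_subset(candidates, sse_of):
    """Return the subset with the lowest SSE according to the supplied scorer."""
    return min(all_subsets(candidates), key=sse_of)

def toy_sse(subset):
    """Placeholder scorer for illustration only; a real one would fit a model."""
    missing = sum(name not in subset for name in ("BI Rate", "Money Supply"))
    return missing + abs(len(subset) - 2)

print(len(list(all_subsets(PREDICTORS))))  # 2**5 - 1 candidate models
print(best_subset(PREDICTORS, toy_sse))
```

With five candidates the search covers 31 models, which is cheap; the count doubles with each added predictor, so exhaustive search becomes costly for larger candidate sets.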
Spatio-Temporal Kriging for Monthly Precipitation Interpolation in East Kalimantan
Jannah, Friendtika Miftaqul; Fitriani, Rahma; Pramoedyo, Henny
Inferensi Vol 8, No 2 (2025)
Publisher : Department of Statistics ITS

DOI: 10.12962/j27213862.v8i2.22195

Abstract

Precipitation is one of the factors that can lead to various disasters, such as droughts and floods. Ordinary interpolation methods, such as spatial kriging, cannot accommodate the time element, which is crucial for addressing precipitation-related disasters. Therefore, this study applies spatio-temporal kriging, which incorporates both spatial and temporal elements. The aim of this study is to develop a spatio-temporal kriging model for precipitation, serving as a basis for interpolating precipitation at unobserved points over various time intervals within the study domain. This model is expected to be an effective tool for disaster mitigation and water conservation strategies. The data used in this study comprise total monthly precipitation recorded at seven precipitation observation posts in East Kalimantan from 2021 to 2023. The findings indicate that the spatio-temporal ordinary kriging model is the most suitable approach, with the best semivariogram model identified as the simple sum-metric. The spatial semivariogram follows an exponential model, while the temporal and joint semivariograms follow Gaussian models. The chosen model yields an RMSE of 2493.687. The interpolation results reveal that West Kutai falls within the medium to high precipitation category, making it the district with the highest flood risk.
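The sum-metric structure named in this abstract combines a spatial, a temporal and a joint semivariogram, with the joint term evaluated at a combined lag that rescales time by an anisotropy factor. A minimal Python sketch follows, using the exponential and Gaussian shapes the abstract reports. It omits nugget terms, so it illustrates the general sum-metric form and not the exact "simple" variant fitted in the study. All parameter values and names are hypothetical.

```python
import math

def exponential(h, sill, range_param):
    """Exponential semivariogram without nugget."""
    return sill * (1.0 - math.exp(-h / range_param))

def gaussian(h, sill, range_param):
    """Gaussian semivariogram without nugget."""
    return sill * (1.0 - math.exp(-((h / range_param) ** 2)))

def sum_metric(h, u, space, time, joint, anisotropy):
    """gamma(h, u) = gamma_s(h) + gamma_t(u) + gamma_joint(sqrt(h^2 + (k*u)^2)).

    h: spatial lag, u: temporal lag, anisotropy k: space units per time unit.
    space, time, joint: (sill, range) tuples for each component.
    """
    joint_lag = math.sqrt(h ** 2 + (anisotropy * u) ** 2)
    return (
        exponential(h, *space)
        + gaussian(u, *time)
        + gaussian(joint_lag, *joint)
    )

# Illustrative parameters only, e.g. lags in kilometres and months.
params = dict(space=(1.0, 50.0), time=(0.5, 3.0), joint=(0.8, 80.0), anisotropy=20.0)
print(sum_metric(0.0, 0.0, **params))
print(sum_metric(25.0, 1.0, **params))
```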