Articles
Search results for issue "Vol 15, No 2 (2022): Media Statistika": 10 Documents
COMPARISON OF SMOTE RANDOM FOREST AND SMOTE K-NEAREST NEIGHBORS CLASSIFICATION ANALYSIS ON IMBALANCED DATA
Jus Prasetya;
Abdurakhman Abdurakhman
MEDIA STATISTIKA Vol 15, No 2 (2022): Media Statistika
Publisher : Department of Statistics, Faculty of Science and Mathematics, Universitas Diponegoro
DOI: 10.14710/medstat.15.2.198-208
In machine learning, classification analysis aims to minimize misclassification and maximize prediction accuracy. The main characteristic of classification on imbalanced data is that one class has significantly more samples than the other classes. SMOTE studies the minority class data and extrapolates from it to produce new synthetic samples. Random forest is a classification method consisting of a combination of mutually independent classification trees. K-Nearest Neighbors is a classification method that labels a new sample based on its nearest neighbors. SMOTE generates synthetic data in the minority class, namely class 1 (cervical cancer), to 585 observations (samples), bringing the total to 1208 samples. SMOTE random forest achieved an accuracy of 96.28%, sensitivity of 99.17%, specificity of 93.44%, precision of 93.70%, and AUC of 96.30%. SMOTE K-Nearest Neighbors achieved an accuracy of 87.60%, sensitivity of 77.50%, specificity of 97.54%, precision of 96.88%, and AUC of 82.27%. SMOTE random forest produces a perfect classification model and SMOTE K-Nearest Neighbors produces a good classification model, while random forest and K-Nearest Neighbors classification on imbalanced data result in failed classification models.
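The accuracy, sensitivity, specificity, and precision figures all derive from a confusion matrix. The following is a minimal sketch of that arithmetic. The counts are hypothetical: the abstract does not give the confusion matrix, and these values were chosen only because they reproduce the SMOTE random forest percentages above, assuming a 242-sample test set with cervical cancer as the positive class.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity, specificity and precision as fractions."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true positive rate (recall)
        "specificity": tn / (tn + fp),  # true negative rate
        "precision": tp / (tp + fp),    # share of predicted positives that are correct
    }


# Hypothetical counts (an assumption, not taken from the paper).
rf_metrics = classification_metrics(tp=119, fn=1, tn=114, fp=8)
for name, value in rf_metrics.items():
    print(f"{name}: {value:.2%}")
```

On imbalanced data, accuracy alone can look high while sensitivity collapses, which is why the abstract reports all four measures.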
MANAGING HEART RELATED DISEASE RISKS IN BPJS KESEHATAN USING COLLECTIVE RISK MODELS
Gede Ary Prabha Yogesswara;
Danang Teguh Qoyyimi;
Abdurakhman Abdurakhman
MEDIA STATISTIKA Vol 15, No 2 (2022): Media Statistika
Publisher : Department of Statistics, Faculty of Science and Mathematics, Universitas Diponegoro
DOI: 10.14710/medstat.15.2.175-185
BPJS Kesehatan is a legal entity established to administer the health service program using an insurance system. Heart-related diseases have the largest coverage cost in Indonesia. This cost can be calculated using the collective risk model as an approximation of the aggregate loss model. This model is a compound distribution of claim frequency and claim severity, where claim frequency is the primary distribution. The Poisson distribution can be used to model the heart disease claim frequency, whereas the heart disease claim severity follows a lognormal distribution. The resulting model explains the aggregate loss of heart disease claims well.
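In a collective risk model, the aggregate loss is S = X_1 + ... + X_N, where N is the random number of claims and the X_i are the claim sizes. The sketch below simulates S with a Poisson frequency and lognormal severity and compares the simulated mean with the known result E[S] = E[N] E[X]. The parameter values LAM, MU, and SIGMA are hypothetical and are not the estimates from the paper.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_aggregate_loss(lam, mu, sigma, n_sims=20000, seed=42):
    """Simulate n_sims aggregate losses: Poisson claim count, lognormal claim size."""
    rng = random.Random(seed)
    losses = []
    for _ in range(n_sims):
        n_claims = sample_poisson(lam, rng)
        losses.append(sum(rng.lognormvariate(mu, sigma) for _ in range(n_claims)))
    return losses


# Hypothetical parameters (assumptions for illustration only).
LAM, MU, SIGMA = 3.0, 0.0, 0.5

losses = simulate_aggregate_loss(LAM, MU, SIGMA)
simulated_mean = sum(losses) / len(losses)
# E[S] = E[N] * E[X], with E[X] = exp(mu + sigma^2 / 2) for a lognormal severity.
theoretical_mean = LAM * math.exp(MU + SIGMA**2 / 2)
print(f"simulated mean: {simulated_mean:.3f}, theoretical mean: {theoretical_mean:.3f}")
```

Knuth's method is adequate for a small claim rate like the one assumed here; a large rate would call for a different sampler.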
SPRATAMA MODEL FOR INDONESIAN PARAPHRASE DETECTION USING BIDIRECTIONAL LONG SHORT-TERM MEMORY AND BIDIRECTIONAL GATED RECURRENT UNIT
Titin Siswantining;
Stanley Pratama;
Devvi Sarwinda
MEDIA STATISTIKA Vol 15, No 2 (2022): Media Statistika
Publisher : Department of Statistics, Faculty of Science and Mathematics, Universitas Diponegoro
DOI: 10.14710/medstat.15.2.129-138
Paraphrasing is a way to rewrite sentences in other words with the same intent or purpose. Automatic paraphrase detection can be done using Natural Language Sentence Matching (NLSM), which is part of Natural Language Processing (NLP). NLP is a computational technique for processing text in general, while NLSM is used specifically to find the relationship between two sentences. With the development of Neural Networks (NN), NLP can now be done more easily by computers. Many more models for detecting paraphrases have been developed for English than for Indonesian, which has less training data. This study proposes the SPratama Model, which performs paraphrase detection for Indonesian using Recurrent Neural Networks (RNN), namely Bidirectional Long Short-Term Memory (BiLSTM) and Bidirectional Gated Recurrent Unit (BiGRU). The data used is "Quora Question Pairs", taken from Kaggle and translated into Indonesian using Google Translate. The results of this study indicate that the proposed model has an accuracy of around 80% for the detection of paraphrased sentences.
ESTIMATING AND FORECASTING COVID-19 CASES IN SULAWESI ISLAND USING GENERALIZED SPACE-TIME AUTOREGRESSIVE INTEGRATED MOVING AVERAGE MODEL
Sukarna Sukarna;
Nurul Fadilah Syahrul;
Wahidah Sanusi;
Aswi Aswi;
Muhammad Abdy;
Irwan Irwan
MEDIA STATISTIKA Vol 15, No 2 (2022): Media Statistika
Publisher : Department of Statistics, Faculty of Science and Mathematics, Universitas Diponegoro
DOI: 10.14710/medstat.15.2.186-197
A range of spatio-temporal models has been used to model Covid-19 cases. However, there is only a small amount of literature on estimating and forecasting Covid-19 cases using the Generalized Space-Time Autoregressive Integrated Moving Average (GSTARIMA) model. This model extends the GSTARMA model to non-stationary data. This paper aims to estimate and forecast the daily number of Covid-19 cases in Sulawesi Island using GSTARIMA models. We compared two models, namely GSTARI and GSTIMA, based on the root mean square error (RMSE). Data on the daily number of Covid-19 cases (from April 10, 2020, to May 07, 2021) were used. The location weight used is the inverse distance weight based on the distance between airports in the capital cities of each province. The appropriate models obtained based on the data are the GSTARIMA (1;0;1;1) model and the GSTARIMA (1;1;1;0) model. The results showed that the forecast for the number of new Covid-19 cases is accurate and reliable only for the short term.
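An inverse distance weight matrix gives closer locations more influence on each other. The sketch below uses a common construction: zero weight on the diagonal and each row normalized to sum to one. The paper may normalize differently, and the three-location distance matrix is hypothetical, not the airport distances used in the study.

```python
def inverse_distance_weights(distances):
    """Build a row-normalized inverse distance weight matrix with a zero diagonal."""
    n = len(distances)
    weights = []
    for i in range(n):
        # A location gets no weight on itself; others are weighted by 1 / distance.
        inverse = [0.0 if i == j else 1.0 / distances[i][j] for j in range(n)]
        row_total = sum(inverse)
        weights.append([value / row_total for value in inverse])
    return weights


# Hypothetical symmetric distances in km between three locations (assumption).
distances = [
    [0.0, 100.0, 200.0],
    [100.0, 0.0, 400.0],
    [200.0, 400.0, 0.0],
]
weights = inverse_distance_weights(distances)
for row in weights:
    print([round(value, 3) for value in row])
```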
COLLABORATIVE FILTERING APPROACH: SKINCARE PRODUCT RECOMMENDATION USING SINGULAR VALUE DECOMPOSITION (SVD)
Farhatun Nissa;
Arum Handini Primandari;
Achmad Kurniansyah Thalib
MEDIA STATISTIKA Vol 15, No 2 (2022): Media Statistika
Publisher : Department of Statistics, Faculty of Science and Mathematics, Universitas Diponegoro
DOI: 10.14710/medstat.15.2.139-150
A recommendation system provides recommendations for something, be it goods, songs, or movies. The term system is not limited to a service system but also covers a model that can provide recommendations. With recent technological advances, many companies offer various skincare products because current generations are increasingly aware of self-care. With so many choices, someone may be unsure which product to buy. Therefore, we need a system that can provide product recommendations and run on any platform we use. The most common method for recommendation systems is Collaborative Filtering (CF), which relies on past user and item data. The singular value decomposition (SVD) method uses a matrix factorization technique that predicts the user's rating based on historical ratings. Model accuracy is measured by an average RMSE of 1.01276, obtained with the best parameters. The results focus on showing skincare product recommendations to users, sorted by predicted rating.
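One widely used form of SVD-style matrix factorization predicts a rating as a global mean plus user and item biases plus the dot product of latent factor vectors, fitted by stochastic gradient descent. The sketch below follows that form as an assumption. The abstract does not state the exact model or library used, and the users, products, ratings, and hyperparameters are all hypothetical. It reports training RMSE, not the cross-validated average the paper gives.

```python
import math
import random


def train_mf(ratings, n_factors=2, n_epochs=200, lr=0.01, reg=0.02, seed=0):
    """Fit r_hat = mu + b_u + b_i + p_u . q_i by stochastic gradient descent."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    b_user = {u: 0.0 for u in users}
    b_item = {i: 0.0 for i in items}
    p = {u: [rng.gauss(0.0, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.gauss(0.0, 0.1) for _ in range(n_factors)] for i in items}

    def predict(user, item):
        dot = sum(a * b for a, b in zip(p[user], q[item]))
        return mu + b_user[user] + b_item[item] + dot

    for _ in range(n_epochs):
        for user, item, rating in ratings:
            err = rating - predict(user, item)
            # Regularized gradient steps on the biases and latent factors.
            b_user[user] += lr * (err - reg * b_user[user])
            b_item[item] += lr * (err - reg * b_item[item])
            for f in range(n_factors):
                p_uf, q_if = p[user][f], q[item][f]
                p[user][f] += lr * (err * q_if - reg * p_uf)
                q[item][f] += lr * (err * p_uf - reg * q_if)
    return predict, mu, items


def rmse(ratings, predict):
    """Root mean square error of predict(user, item) over known ratings."""
    return math.sqrt(sum((r - predict(u, i)) ** 2 for u, i, r in ratings) / len(ratings))


# Hypothetical (user, product, rating) triples on a 1-5 scale.
ratings = [
    ("ana", "serum", 5), ("ana", "toner", 3), ("ana", "cleanser", 4),
    ("budi", "serum", 4), ("budi", "toner", 2), ("budi", "sunscreen", 3),
    ("citra", "toner", 5), ("citra", "cleanser", 2), ("citra", "sunscreen", 4),
]
predict, mu, items = train_mf(ratings)
baseline_rmse = rmse(ratings, lambda user, item: mu)  # always predict the mean
trained_rmse = rmse(ratings, predict)


def recommend(user):
    """Return products the user has not rated, sorted by predicted rating."""
    rated = {i for u, i, _ in ratings if u == user}
    unseen = [i for i in items if i not in rated]
    return sorted(unseen, key=lambda item: predict(user, item), reverse=True)


print(f"baseline RMSE: {baseline_rmse:.3f}, trained RMSE: {trained_rmse:.3f}")
print(recommend("ana"))
```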
THE GGE BIPLOT ON RCIM MODEL FOR ASSESSING THE GENOTYPE-ENVIRONMENT INTERACTION WITH SIMULATING OUTLIERS: ROBUSTNESS IN R-SQUARED PROCRUSTES
Alfian Futuhul Hadi;
Halimatus Sa'diyah;
Dimas Bagus Cahyaningrat Wicaksono
MEDIA STATISTIKA Vol 15, No 2 (2022): Media Statistika
Publisher : Department of Statistics, Faculty of Science and Mathematics, Universitas Diponegoro
DOI: 10.14710/medstat.15.2.209-219
Genotype by environment interaction (GEI) analysis is usually done using the Additive Main Effects and Multiplicative Interaction (AMMI) model with Biplot features, and the Row Column Interaction Model (RCIM) has recently emerged as an alternative. In the Biplot of genotype (G) and genotype by environment (GE) interactions, known as the GGE Biplot, the main effect of environment (E) is removed, while the main effect of G and the interaction effect of GE are kept and combined. Continuing our recent research on the robustness of the GGE Biplot in RCIM models, this paper aims to develop the GGE Biplot using the RCIM model to analyze the GEI with outlying observations. We applied the RCIM model with Asymptotic Laplace Distribution (ALD) to simulated data with scattered and single-environment outliers to evaluate the robustness of the GGE Biplot. The robustness was evaluated using the R-squared statistic of the Procrustes analysis. The results show that the GGE Biplot of RCIM with the ALD family function provides better robustness than the Gaussian. A noticeable superiority of the GGE Biplot with RCIM ALD appeared as the percentage of single-environment outliers reaches the number of rows of the data matrix.
MODELING THE CONTRIBUTION OF THE MANUFACTURING SECTOR TO THE GROSS DOMESTIC PRODUCT OF KENYA USING TIME SERIES ANALYSIS
Maurice Wanyonyi
MEDIA STATISTIKA Vol 15, No 2 (2022): Media Statistika
Publisher : Department of Statistics, Faculty of Science and Mathematics, Universitas Diponegoro
DOI: 10.14710/medstat.15.2.117-128
The manufacturing sector is considered a pivotal contributor to economic growth around the globe. Kenya relies on the manufacturing sector to generate revenue and ultimately enhance the growth of the economy. Despite the key role this sector plays in the economy, the inflation rate has adversely affected its performance. The purpose of the study was to develop an Autoregressive Integrated Moving Average (ARIMA) time series model to forecast the inflation rate in Kenya. The analysis used secondary data from the Kenya National Bureau of Statistics, and the model was fitted to the data using R. The ARIMA model with an information criterion of 576.24 was identified as the best model. Based on the forecasts, it was established that there will be a slight shift in inflation in the coming years. Therefore, the government should use wage and price controls to fight inflation but put in place policies to prevent recession and job loss in the country. The government should also employ contractionary monetary policy to fight inflation by reducing the money supply in the economy through decreased bond prices and increased interest rates. Implementing these recommendations might help reduce the rate of inflation in the country.
APPLICATION OF BIPLOT ANALYSIS WITH ROBUST SINGULAR VALUE DECOMPOSITION TO POVERTY DATA IN SULAWESI ISLAND
Febriyana Taki;
Lailany Yahya;
Muhammad Rezky Friesta Payu
MEDIA STATISTIKA Vol 15, No 2 (2022): Media Statistika
Publisher : Department of Statistics, Faculty of Science and Mathematics, Universitas Diponegoro
DOI: 10.14710/medstat.15.2.220-230
Poverty is defined as the inability of an individual to meet basic needs for a decent life. According to BPS data in 2020, Sulawesi Island ranks fifth as the poorest island in Indonesia. This study aims to map the areas and indicators of poverty in Sulawesi Island using Biplot Analysis with the Robust Singular Value Decomposition approach for research data containing outliers. Based on the results of the study, there are five objects that are outliers, and the information provided by the biplot amounts to 98.45%. Districts/cities that have similar characteristics are divided into 4 groups. The indicator of poverty that has the most diversity is the School Old Expectations Numbers (Var 4) and the one with the least diversity is Poor Households Using Clean Water (Var 8). The indicators of poverty that are positively correlated are Literacy Numbers (Var 1) and Non-Working Poor Population (Var 5), while those that are negatively correlated are Non-Working Poor Population (Var 5) and Poor Households Using Clean Water (Var 8). There are 19 districts/cities that have literacy values above the average of all districts/cities and 11 districts/cities that have a per capita expenditure value below the average of all districts/cities.
IMPLEMENTATION OF STOCHASTIC MODEL FOR RISK ASSESSMENT ON INDONESIAN STOCK EXCHANGE
Di Asih I Maruddani;
Trimono Trimono;
Mas'ad Mas'ad
MEDIA STATISTIKA Vol 15, No 2 (2022): Media Statistika
Publisher : Department of Statistics, Faculty of Science and Mathematics, Universitas Diponegoro
DOI: 10.14710/medstat.15.2.151-162
Financial assets have become an alternative choice for investors in Indonesia seeking maximum profits. The Indonesia Stock Exchange is the official capital market in Indonesia and a place for trading financial assets. Stocks are listed as the financial asset most preferred by investors. In reality, stock investment is not risk-free. The main risk that investors face is the risk of loss, which can occur at any time. This study therefore aims to assess risk on the Indonesian stock market. The evaluation starts with stock price index prediction using stochastic models (the Geometric Brownian Motion model and Jump Diffusion). The results of those processes are then used to predict loss risk through the Adjusted Expected Shortfall model. Using historical prices of the JKSE index from 01/08/21 to 31/08/22, Jump Diffusion is the best model for predicting the JKSE index, with a MAPE of 1.08%. At the 95% confidence level and a 1-day holding period, the expected loss risk using the Adjusted Expected Shortfall model on 09/01/2022 is -0.02978.
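The workflow described above (simulate an index path, score the prediction with MAPE, then estimate tail loss) can be sketched in simplified form. As stated assumptions, this sketch uses plain Geometric Brownian Motion instead of Jump Diffusion and the ordinary historical Expected Shortfall instead of the Adjusted Expected Shortfall model. The starting level, drift, and volatility are hypothetical.

```python
import math
import random


def simulate_gbm(s0, mu, sigma, n_steps, dt=1.0, seed=0):
    """Simulate one GBM path: S_next = S * exp((mu - sigma^2/2) dt + sigma sqrt(dt) Z)."""
    rng = random.Random(seed)
    path = [s0]
    for _ in range(n_steps):
        shock = rng.gauss(0.0, 1.0)
        growth = (mu - 0.5 * sigma**2) * dt + sigma * math.sqrt(dt) * shock
        path.append(path[-1] * math.exp(growth))
    return path


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def expected_shortfall(returns, level=0.95):
    """Historical expected shortfall: mean of the worst (1 - level) share of returns."""
    ordered = sorted(returns)
    tail_size = max(1, round(len(ordered) * (1.0 - level)))
    tail = ordered[:tail_size]
    return sum(tail) / len(tail)


# Hypothetical daily parameters (assumptions, not estimates from the paper).
path = simulate_gbm(s0=7000.0, mu=0.0003, sigma=0.01, n_steps=250)
log_returns = [math.log(b / a) for a, b in zip(path, path[1:])]
print(f"1-day 95% expected shortfall: {expected_shortfall(log_returns, level=0.95):.5f}")
```

A negative expected shortfall on returns, like the -0.02978 reported above, is read as the average loss on the worst days in the tail.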
MEASUREMENT OF SUPPORT VECTOR REGRESSION PERFORMANCE WITH CLUSTER ANALYSIS FOR STOCK PRICE MODELING
Izza Dinikal Arsy;
Dedi Rosadi
MEDIA STATISTIKA Vol 15, No 2 (2022): Media Statistika
Publisher : Department of Statistics, Faculty of Science and Mathematics, Universitas Diponegoro
DOI: 10.14710/medstat.15.2.163-174
Risk-averse investors seek out stock investments with minimum risk. One step that can be taken is to develop a model of stock prices and predict their fluctuations in the coming months. Many studies on the modeling of stock movements have used the ARCH/GARCH method, but this method requires some assumptions. This paper discusses the performance of stock modeling using Support Vector Regression (SVR). The performance is measured using the root mean square error (RMSE) in two stock clusters based on volatility, namely stocks with large volatility and stocks with small volatility. This case study uses daily closing price data for 10 LQ-45 index shares from October 12, 2018 to October 11, 2019. In conclusion, SVR's performance on stocks with high volatility produces an RMSE that is considerably higher than on stocks with low volatility.
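The grouping and evaluation step can be sketched as follows. Two assumptions apply: volatility is taken to be the standard deviation of daily log returns, and a median split stands in for the cluster analysis used in the paper, whose method the abstract does not specify. The tickers and prices are hypothetical.

```python
import math
import statistics


def volatility(prices):
    """Sample standard deviation of daily log returns."""
    log_returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    return statistics.stdev(log_returns)


def rmse(actual, predicted):
    """Root mean square error between two equal-length series."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def split_by_volatility(price_series):
    """Split tickers into (low, high) volatility groups at the median volatility."""
    vols = {name: volatility(prices) for name, prices in price_series.items()}
    cutoff = statistics.median(vols.values())
    low = sorted(name for name, vol in vols.items() if vol <= cutoff)
    high = sorted(name for name, vol in vols.items() if vol > cutoff)
    return low, high


# Hypothetical closing prices for four made-up tickers.
price_series = {
    "AAA": [100.0, 101.0, 100.0, 101.0, 100.0],
    "BBB": [100.0, 110.0, 95.0, 112.0, 90.0],
    "CCC": [50.0, 50.2, 50.1, 50.3, 50.2],
    "DDD": [20.0, 23.0, 19.0, 24.0, 18.0],
}
low, high = split_by_volatility(price_series)
print("low volatility:", low)
print("high volatility:", high)
```

With model predictions in hand, `rmse` would be computed per stock and then compared between the two groups.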