Contact Name
Wardhani Utami Dewi
Contact Email
dewiutamiwardhani@gmail.com
Phone
+62895379324824
Journal Mail Official
scncstatistics@gmail.com
Editorial Address
Jl. Ki Hajar Dewantara No.116, Iringmulyo, Metro Timur, Kota Metro, Lampung 34111
Location
Kota Metro,
Lampung
INDONESIA
Sciencestatistics: Journal of Statistics, Probability, and Its Application
ISSN: 2964-2884 | E-ISSN: 2963-9875 | DOI: https://doi.org/10.24127
Core Subjects: Science, Education
Sciencestatistics: Journal of Statistics, Probability, and Its Application is an Open Access journal covering statistical inference, experimental design and analysis, survey methods and analysis, operations research, data mining, statistical modeling, statistical updating, time series and econometrics, multivariate analysis, statistics education, simulation and modeling, numerical analysis, algebra, combinatorics, and applied mathematics.
Articles: 29 documents
Text Mining of Jurnal Teknologi Pendidikan Using a Term Frequency Matrix Wahyu Ananda Putri Aulia, Dea
Sciencestatistics: Journal of Statistics, Probability, and Its Application Vol. 3 No. 1 (2025): JANUARY
Publisher : Universitas Muhammadiyah Metro

DOI: 10.24127/sciencestatistics.v3i1.5989

Abstract

This research aims to analyze the words that appear most frequently in several articles published by Jurnal Teknologi Pendidikan (the Journal of Educational Technology) using a term frequency matrix approach. Through text mining techniques, data from the article abstracts are processed to identify keywords and to understand research patterns in the field of educational technology. The analysis covers the last three editions of the journal, namely Vol. 7 No. 3, Vol. 7 No. 4, and Vol. 8 No. 1, using mathematical methods based on word frequency. The data are processed through preprocessing, tokenization, and construction of a term frequency matrix, followed by descriptive statistical analysis and distribution modeling. The results show that words such as "learning", "students", "media", and "education" occur with high frequency, indicating that the research focuses mainly on learning and the integration of technology in education. These findings are expected to give researchers and practitioners deeper insight into research trends and help them design innovations in the development of educational technology.
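As an illustration of the term-frequency-matrix step described in this abstract, the minimal Python sketch below builds such a matrix from a few made-up abstracts with scikit-learn; the texts and vectorizer settings are assumptions for illustration, not the article's actual pipeline.

```python
# Minimal sketch, assuming a few made-up abstracts (not the journal's actual
# texts): build a term frequency matrix with scikit-learn and rank the terms.
from sklearn.feature_extraction.text import CountVectorizer
import pandas as pd

abstracts = [
    "learning media improves student engagement in education",
    "mobile learning application supports students in higher education",
    "interactive media and technology improve student learning outcomes",
]

vectorizer = CountVectorizer(lowercase=True, stop_words="english")  # preprocessing + tokenization
counts = vectorizer.fit_transform(abstracts)                        # documents x terms

# Term frequency matrix as a DataFrame, then total frequency per term.
tf = pd.DataFrame(counts.toarray(), columns=vectorizer.get_feature_names_out())
print(tf.sum().sort_values(ascending=False).head(10))
```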
Fixed Effect Model Estimation in Panel Data Regression Analysis Using the Least Square Dummy Variable (LSDV) Method Junia Rahma Nur Imani; Khoirin Nisa; Dorrah Aziz; Nusyirwan
Sciencestatistics: Journal of Statistics, Probability, and Its Application Vol. 3 No. 1 (2025): JANUARY
Publisher : Universitas Muhammadiyah Metro

DOI: 10.24127/sciencestatistics.v3i1.7525

Abstract

Panel data are a combination of cross-sectional and time-series data. One panel data regression model is the fixed effect model, which assumes that the intercept differs across individuals while the slope coefficients are constant. Estimation uses dummy variables to capture the intercept differences between individuals. This study aims to estimate the fixed effect model in panel data regression analysis with the least square dummy variable (LSDV) method and to apply it to provincial minimum wage data in Indonesia for 2014-2017. Based on the LSDV estimation of the fixed effect model on the provincial minimum wage data, the following model is obtained: Ŷ = 5.248452 + α_k D_k + 0.007415 X1 + 0.002882 X2 + 1.63E-07 X3, where Y is the provincial minimum wage, X1 the consumer price index, X2 the labor force participation rate, X3 the gross regional domestic product, and D_k the dummy variable for province k, k = 1, 2, ..., 33.
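For readers who want to see the LSDV idea in code, the sketch below runs an OLS regression with province dummies on a synthetic panel; the data, coefficient values, and column names are assumptions, not the study's dataset.

```python
# Minimal LSDV sketch on a synthetic panel (33 "provinces" x 4 years); the data
# and coefficient values are assumptions, not the study's dataset.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
provinces = [f"prov_{k}" for k in range(1, 34)]
df = pd.DataFrame([(p, t) for p in provinces for t in (2014, 2015, 2016, 2017)],
                  columns=["province", "year"])
df["cpi"] = rng.normal(120, 10, len(df))      # consumer price index
df["lfpr"] = rng.normal(65, 5, len(df))       # labor force participation rate
df["grdp"] = rng.normal(3e5, 5e4, len(df))    # gross regional domestic product
effects = {p: rng.normal(0, 0.5) for p in provinces}
df["wage"] = (5.0 + 0.007 * df["cpi"] + 0.003 * df["lfpr"] + 1.6e-7 * df["grdp"]
              + df["province"].map(effects) + rng.normal(0, 0.1, len(df)))

# Least Square Dummy Variable: C(province) expands into individual dummies, so
# each province gets its own intercept while the slope coefficients stay common.
model = smf.ols("wage ~ cpi + lfpr + grdp + C(province)", data=df).fit()
print(model.params[["cpi", "lfpr", "grdp"]])
```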
Application of the Generalized Space Time Autoregressive (GSTAR) Model to Inflation Data from Several Cities Ulfa Putri Rahmani; Khoirin Nisa; Nurmaita Hamsyiah
Sciencestatistics: Journal of Statistics, Probability, and Its Application Vol. 3 No. 1 (2025): JANUARY
Publisher : Universitas Muhammadiyah Metro

DOI: 10.24127/sciencestatistics.v3i1.7526

Abstract

Models commonly used for space-time data are the Vector Autoregressive (VAR), Space Time Autoregressive (STAR), and Generalized Space Time Autoregressive (GSTAR) models. For locations with different (heterogeneous) characteristics, the GSTAR model is preferable to the STAR model. The aim of this research is to apply the GSTAR model to time series data from three different locations. The data used are inflation figures for Palembang, Bandar Lampung, and DKI Jakarta from January 2012 to June 2019. The location weights used are inverse-distance weights and normalized cross-correlation weights. Parameter estimation is carried out with the Generalized Least Square (GLS) method. The analysis shows that the best model is the GSTAR(1;1) model with inverse-distance location weights, since it has the smallest average RMSE, namely 0.467767.
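As a small illustration of one ingredient of that procedure, the Python sketch below constructs a row-normalized inverse-distance weight matrix for three locations; the distances are rough placeholders, not the values used in the article.

```python
# Minimal sketch of the inverse-distance location weights used in GSTAR; the
# pairwise distances below are rough placeholders, not the article's values.
import numpy as np

# Assumed distances (km) between Palembang, Bandar Lampung, and DKI Jakarta.
d = np.array([[0.0, 230.0, 430.0],
              [230.0, 0.0, 220.0],
              [430.0, 220.0, 0.0]])

inv = np.zeros_like(d)
mask = d > 0
inv[mask] = 1.0 / d[mask]                   # inverse distance, zero on the diagonal
W = inv / inv.sum(axis=1, keepdims=True)    # row-normalise so each row sums to one
print(np.round(W, 3))
```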
Tweedie Distribution: A Statistical Solution for Unusually Dispersed Data Zainol Mustafa
Sciencestatistics: Journal of Statistics, Probability, and Its Application Vol. 3 No. 1 (2025): JANUARY
Publisher : Universitas Muhammadiyah Metro

DOI: 10.24127/sciencestatistics.v3i1.8003

Abstract

The Tweedie distribution has emerged as an effective statistical approach for modeling data with unusual dispersion characteristics, especially data with mixed discrete and continuous components. In this study, the Tweedie distribution is applied to insurance claims data to model claim patterns that contain many zero values together with large, continuous claims. With parameter estimation using the iteratively reweighted least squares (IRLS) algorithm in R, the results show that the Tweedie distribution handles higher variability (overdispersion) accurately. The estimated power parameter of 1.7 indicates that the Tweedie distribution combines the Poisson and Gamma distributions, which is effective for modeling claims data with high dispersion. The study also shows that the Tweedie distribution provides better and more realistic predictions than traditional distributions such as the Poisson or Gamma, which cannot handle data with mixed characteristics and overdispersion well. These findings contribute to insurance claims modeling and open up potential for wider application in other fields that face data with high variability and mixed patterns.
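The article fits the model in R; as a rough Python counterpart, the sketch below fits a Tweedie GLM with variance power 1.7 by IRLS in statsmodels on simulated compound Poisson-Gamma claims. The simulated data and covariates are assumptions for illustration only.

```python
# Minimal sketch, assuming simulated compound Poisson-Gamma claims (the article
# itself uses R): a Tweedie GLM with variance power 1.7 fitted by IRLS.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 500
X = sm.add_constant(rng.normal(size=(n, 2)))   # intercept + two hypothetical rating factors

# Zero-inflated, right-skewed claims: a Poisson count of Gamma-distributed severities.
counts = rng.poisson(0.4, n)
claims = np.array([rng.gamma(2.0, 300.0, k).sum() for k in counts])

# var_power = 1.7 places the model between Poisson (1) and Gamma (2), the
# compound Poisson-Gamma region suited to claims with exact zeros; GLM.fit()
# uses iteratively reweighted least squares, as in the study.
model = sm.GLM(claims, X, family=sm.families.Tweedie(var_power=1.7)).fit()
print(model.params, model.scale)
```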
Simulation and Analysis of Gamma Distribution in Assessing Delay Rate Completion of the Curriculum in Schools Sari, Reni Permata; Muhammad Ihsan Dacholfany; Amir Khushk; Wardhani Utami Dewi
Sciencestatistics: Journal of Statistics, Probability, and Its Application Vol. 3 No. 1 (2025): JANUARY
Publisher : Universitas Muhammadiyah Metro

DOI: 10.24127/sciencestatistics.v3i1.8158

Abstract

Completing the curriculum on time is one of the important indicators of the success of the learning process. However, factors such as material difficulty and external distractions often cause delays in curriculum completion. This study aims to model curriculum completion delays using the Gamma distribution, with SMP Negeri 1 Melinting, East Lampung, as the research site. Primary data were obtained from the school, while secondary data come from related literature. The study uses Monte Carlo simulation based on the Gamma distribution, parameterized by the mean delay and the degree of variance. The results show an average delay of about 2.4 weeks, with the Gamma distribution matching the actual data according to the Kolmogorov-Smirnov test. These findings suggest that the Gamma distribution can be an effective prediction tool for modeling curriculum completion delays. Managerial recommendations include preparing flexible schedules and using simulation models for risk mitigation. The research helps education managers design better time and resource management strategies.
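To illustrate the simulation-plus-goodness-of-fit workflow described here, the Python sketch below draws Monte Carlo delays from a Gamma distribution with a mean of about 2.4 weeks and checks the fit with a Kolmogorov-Smirnov test; the shape and scale values and the "observed" sample are assumed, not the study's fitted parameters.

```python
# Minimal sketch, assuming shape 2.0 and scale 1.2 (mean delay of 2.4 weeks);
# these are illustrative values, not the study's fitted parameters.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
shape, scale = 2.0, 1.2

simulated = rng.gamma(shape, scale, 10_000)       # Monte Carlo sample of delays (weeks)
print("mean simulated delay:", round(simulated.mean(), 2))

# Hypothetical "observed" delays; in the study these come from school records.
observed = rng.gamma(shape, scale, 60)
ks = stats.kstest(observed, "gamma", args=(shape, 0, scale))   # args = (a, loc, scale)
print("KS statistic:", round(ks.statistic, 3), "p-value:", round(ks.pvalue, 3))
```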
Growth Model Study Using a Comparison of Gompertz, Logistic, and Weibull Models Suciati, Indah; Vina Nurmadani; Yoga Aji Sukma; Linda Rassiyanti
Sciencestatistics: Journal of Statistics, Probability, and Its Application Vol. 3 No. 2 (2025): JULY
Publisher : Universitas Muhammadiyah Metro

DOI: 10.24127/sciencestatistics.v3i2.9423

Abstract

Coronavirus Disease 2019 (COVID-19) has been a concern for the whole world, including Indonesia. Its very rapid transmission has had a wide impact on communities around the world, and especially in Indonesia. To describe the continuing rapid increase in COVID-19 cases, a growth model can be used. A growth model is a non-linear regression model used to describe growth behavior; such models can take exponential, sigmoidal, or S-shaped curve forms. The purpose of this study was to determine the growth curve of positive COVID-19 cases in Indonesia using the Gompertz, Logistic, and Weibull models. The models are then evaluated using the coefficient of determination, so that the model that most accurately predicts the growth of positive COVID-19 cases in Indonesia can be identified. The best model is the Gompertz model, with a coefficient of determination of 0.99064.
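As a sketch of how such a comparison can be carried out, the Python example below fits Gompertz, Logistic, and Weibull curves to a synthetic cumulative-case series with scipy and ranks them by R²; the data, parameterizations, and starting values are illustrative assumptions, not the article's.

```python
# Minimal sketch with synthetic data (not the article's COVID-19 series):
# fit Gompertz, Logistic, and Weibull growth curves and compare them by R^2.
import numpy as np
from scipy.optimize import curve_fit

def gompertz(t, a, b, c):
    return a * np.exp(-b * np.exp(-c * t))

def logistic(t, a, b, c):
    return a / (1 + b * np.exp(-c * t))

def weibull(t, a, b, c):
    return a * (1 - np.exp(-b * t ** c))

def r_squared(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1 - ss_res / ss_tot

# Synthetic cumulative-case curve with noise, standing in for the real series.
t = np.arange(1.0, 101.0)
y = gompertz(t, 100_000, 8.0, 0.08) + np.random.default_rng(1).normal(0, 500, t.size)

starts = {"Gompertz": (gompertz, (1e5, 5.0, 0.1)),
          "Logistic": (logistic, (1e5, 50.0, 0.1)),
          "Weibull":  (weibull,  (1e5, 0.005, 1.5))}
for name, (f, p0) in starts.items():
    popt, _ = curve_fit(f, t, y, p0=p0, maxfev=20_000)   # illustrative starting values
    print(f"{name:>8}: R^2 = {r_squared(y, f(t, *popt)):.5f}")
```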
Prediction of Concrete Compressive Strength using SARIMA Method for Efficient Construction Planning Baihaqi, Rizqi Alif; Eva Rolia
Sciencestatistics: Journal of Statistics, Probability, and Its Application Vol. 3 No. 2 (2025): JULY
Publisher : Universitas Muhammadiyah Metro

DOI: 10.24127/sciencestatistics.v3i2.9613

Abstract

Comparison of the Huber and Tukey's Biweight Robust Estimators under Various Outlier Schemes in Linear Regression Linda Rassiyanti; Indah Suciati; Vina Nurmadani; Yoga Aji Sukma
Sciencestatistics: Journal of Statistics, Probability, and Its Application Vol. 3 No. 2 (2025): JULY
Publisher : Universitas Muhammadiyah Metro

DOI: 10.24127/sciencestatistics.v3i2.9630

Abstract

Linear regression, commonly estimated with Ordinary Least Squares (OLS), is sensitive to outliers, which can lead to biased and inefficient parameter estimates. Robust regression was developed to overcome this weakness of OLS by reducing sensitivity to outliers. Two commonly used loss functions in robust regression are the Huber loss and Tukey's biweight loss. This study compares the performance of these two robust regression methods in handling various outlier scenarios. Simulated data were generated with intercept and slope parameters of 3 and 2, respectively, and outliers were systematically introduced into the X variable, the Y variable, or both, in proportions of 10%, 20%, and 30%. The results indicate that Tukey's biweight provides more stable parameter estimates under extreme outlier conditions, especially when outliers occur in the Y variable or in both X and Y, whereas the Huber loss tends to yield a lower Mean Squared Error (MSE) in certain conditions, reflecting the classic trade-off between bias and variance. Therefore, Tukey's biweight is more suitable for extreme outliers, while the Huber loss is more efficient under mild to moderate outlier conditions.
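A minimal Python sketch of this kind of comparison is shown below, using statsmodels' RLM with the Huber and Tukey biweight norms on a simulated sample that matches the stated intercept and slope (3 and 2); the noise level, sample size, and 20% Y-outlier scheme are assumptions chosen for illustration.

```python
# Minimal sketch, assuming a simplified version of the simulation scheme
# (true intercept 3, slope 2, 20% outliers added to Y): OLS versus robust
# M-estimation with the Huber and Tukey biweight norms in statsmodels.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 100
x = rng.uniform(0, 10, n)
y = 3 + 2 * x + rng.normal(0, 1, n)                         # clean data
y[rng.choice(n, size=int(0.2 * n), replace=False)] += 25    # contaminate 20% of Y

X = sm.add_constant(x)
fits = {"OLS": sm.OLS(y, X).fit(),
        "Huber": sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit(),
        "Tukey biweight": sm.RLM(y, X, M=sm.robust.norms.TukeyBiweight()).fit()}
for name, res in fits.items():
    print(f"{name:>15}: intercept = {res.params[0]:.3f}, slope = {res.params[1]:.3f}")
```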
Optimizing Breast Cancer Prediction by Applying Machine Learning Vina Nurmadani; Indah Suciati; Yoga Aji Sukma; Linda Rassiyanti
Sciencestatistics: Journal of Statistics, Probability, and Its Application Vol. 3 No. 2 (2025): JULY
Publisher : Universitas Muhammadiyah Metro

DOI: 10.24127/sciencestatistics.v3i2.9667

Abstract

In 2015, breast cancer ranked among the most prevalent and fatal cancers affecting women globally. Artificial intelligence is urgently needed to help medical professionals make more accurate decisions, reduce overdiagnosis, and streamline the diagnostic process. This study implements and compares selected machine learning algorithms, focusing on SVM, XGBoost, and ANN with various parameter combinations, on a breast cancer dataset. Performance metrics such as accuracy, precision, recall, and F1-score were used to evaluate and compare the algorithms. The results show that the best model for predicting breast cancer, which can help medical professionals identify the disease so that it can be treated quickly and accurately, is the SVM method using 8 parameters without the mitosis parameter: Clump Thickness, Cell Size Uniformity, Cell Shape Uniformity, Marginal Adhesion, Single Epithelial Cell Size, Bare Nuclei, Bland Chromatin, and Normal Nuclei, with an accuracy of 0.96 and a sensitivity of 0.98.
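To illustrate the evaluation workflow (not the article's exact setup, which uses a nine-feature dataset with the mitosis feature removed), the Python sketch below trains an SVM on scikit-learn's built-in breast cancer dataset and reports accuracy, precision, recall, and F1; the kernel and C value are assumed hyperparameters.

```python
# Minimal sketch, assuming scikit-learn's built-in Wisconsin diagnostic dataset
# as a stand-in (the article uses nine features with the mitosis feature
# removed): an SVM evaluated by accuracy, precision, recall, and F1.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))   # assumed hyperparameters
clf.fit(X_tr, y_tr)
y_hat = clf.predict(X_te)

print("accuracy :", accuracy_score(y_te, y_hat))
print("precision:", precision_score(y_te, y_hat))
print("recall   :", recall_score(y_te, y_hat))
print("F1 score :", f1_score(y_te, y_hat))
```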
