Articles

Digital Newsworthiness Scores Model Using a Combination of Unsupervised and Supervised Learning Approaches: Pemodelan Skor Kelayakan Berita Digital dengan Pendekatan Kombinasi Unsupervised dan Supervised Learning Citra, Reza Felix; Wigena, Aji Hamim; Sartono, Bagus
Indonesian Journal of Statistics and Applications Vol 9 No 1 (2025)
Publisher : Statistics and Data Science Study Program, IPB University, in collaboration with the Forum Pendidikan Tinggi Statistika Indonesia (FORSTAT) and the Ikatan Statistisi Indonesia (ISI)

DOI: 10.29244/ijsa.v9i1p86-99

Abstract

The rapid evolution of digital technology has transformed the media landscape, making news more accessible while also introducing challenges related to content quality and accuracy. The rise of misinformation and fake news has diminished public trust in traditional media. This study proposes a method for evaluating the quality and potential impact of news articles prior to publication. By adapting credit risk scoring principles, a model was built to predict the suitability of news content based on factors such as title length, number of images, news category, and publication timing. A target variable was first formed using three clustering methods: K-Means, K-Modes, and K-Medoids. The results indicated that K-Means outperformed the other methods, so its outcomes were used to determine publication suitability. Subsequently, stepwise logistic regression was applied to implement the credit risk scoring approach, allowing for variable selection and assessment of variable importance. Ultimately, ten variables were identified to generate a newsworthiness score, with minimum and maximum scores of 997 and 1407, respectively. The average scores for articles deemed publishable and not publishable were 1137 and 1110, respectively. A cutoff score of 1123 was established based on these averages, categorizing 6708 articles (57.9%) as suitable for publication. These findings aim to assist media organizations in refining their content curation processes, thereby enhancing the overall quality of news consumption.
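The cutoff step in the abstract above can be sketched as follows. The abstract only says the cutoff was "based on these averages"; taking the midpoint of the two group means, rounded down, is an assumption that happens to reproduce 1123, and the function names and the "score at or above the cutoff" rule are hypothetical.

```python
def cutoff_from_group_means(mean_publishable, mean_not_publishable):
    """Midpoint of the two group means, rounded down (assumed rule)."""
    return (mean_publishable + mean_not_publishable) // 2


def is_publishable(score, cutoff):
    """Assumed decision rule: scores at or above the cutoff are publishable."""
    return score >= cutoff


# Group means reported in the abstract.
cutoff = cutoff_from_group_means(1137, 1110)
```

With the reported means, the midpoint is 1123.5, which rounds down to the cutoff given in the abstract.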
Identifying Poverty Vulnerability Patterns in Indonesia using Cheng and Church’s Algorithm Afnan, Irsyifa Mayzela; Wijayanto, Hari; Wigena, Aji Hamim
JTAM (Jurnal Teori dan Aplikasi Matematika) Vol 8, No 4 (2024): October
Publisher : Universitas Muhammadiyah Mataram

DOI: 10.31764/jtam.v8i4.25790

Abstract

Poverty remains a significant issue in developing countries, including Indonesia, where in 2022 the number of people living in poverty reached 26.36 million, a poverty rate of 9.57%. The Central Statistics Agency (BPS) measures poverty using a basic needs approach, under which poverty is defined as the inability to meet essential food and non-food needs, measured through expenditure. Individuals are considered poor if their average monthly per capita expenditure is below the poverty line. Research on poverty has evolved toward a more multidimensional understanding, reflected in the Multidimensional Poverty Index (MPI), which identifies deprivation across three key dimensions: health, education, and living standards. This study aims to identify patterns of poverty vulnerability by applying the Cheng and Church (CC) algorithm through a biclustering approach using data from BPS. This quantitative method utilizes 13 multidimensional poverty indicators across 34 provinces. The CC algorithm begins by setting a threshold, followed by removing rows and columns with the largest residuals, adding qualifying rows and columns, and substituting elements to prevent overlap. The quality of each bicluster is then evaluated based on the Mean Squared Residue (MSR) value until optimal groups are formed. The results indicate that a threshold of δ = 0.01 generates seven biclusters with the lowest mean squared residue (0.0065), signifying optimal bicluster quality. Further validation using the Liu and Wang index reveals less than 50% similarity with the results for other thresholds, reinforcing the uniqueness of these findings. MSR serves as a measure of homogeneity within the bicluster, analogous to how uniform the level of poverty is within a region. If families have similar expenditures and are below the poverty line, they face similar challenges, resulting in a low MSR value. In contrast, the Liu and Wang index compares regions with different poverty alleviation strategies. These findings provide valuable insights for policymakers. For example, bicluster 7 indicates that specific interventions are needed in Papua and West Kalimantan, which face local challenges such as reliance on agriculture, low education levels, and limited access to sanitation and clean water.
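The mean squared residue used to score a bicluster can be sketched as below. This follows the standard Cheng and Church definition (each entry minus its row mean, minus its column mean, plus the overall mean, then squared and averaged). The function name and the toy matrices are illustrative and are not taken from the paper's data.

```python
import numpy as np


def mean_squared_residue(submatrix):
    """Cheng and Church mean squared residue of one bicluster (rows x columns)."""
    a = np.asarray(submatrix, dtype=float)
    row_means = a.mean(axis=1, keepdims=True)   # mean of each row
    col_means = a.mean(axis=0, keepdims=True)   # mean of each column
    overall_mean = a.mean()                     # mean of the whole bicluster
    residue = a - row_means - col_means + overall_mean
    return float((residue ** 2).mean())


# A purely additive pattern (row effect + column effect) is perfectly coherent.
additive = [[1, 2, 3], [2, 3, 4], [4, 5, 6]]
# A checkerboard pattern has no shared row/column trend.
checkerboard = [[1, 0], [0, 1]]
msr_additive = mean_squared_residue(additive)
msr_checkerboard = mean_squared_residue(checkerboard)
```

A bicluster is accepted by the CC algorithm when this value does not exceed the chosen threshold δ, which is why lower values indicate more homogeneous groups.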
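Mendeteksi Unsur Depresi pada Unggahan Media Sosial Menggunakan Metode Machine Learning dengan Optimasi Berbasis Inspirasi Alam (Detecting Signs of Depression in Social Media Posts Using Machine Learning Methods with Nature-Inspired Optimization) Santoso, Zein Rizky; Wigena, Aji Hamim; Kurnia, Anang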
ESTIMASI: Journal of Statistics and Its Application Vol. 6, No. 2, Juli, 2025 : Estimasi
Publisher : Hasanuddin University

DOI: 10.20956/ejsa.v6i2.45516

Abstract

Social media has now become an inseparable part of everyday life, including in expressing emotions and mental states. One popular platform is X (formerly Twitter), where many users indirectly share signs of depression. This study develops a classification model to detect indications of depression in social media posts, using machine learning algorithms and feature selection techniques based on nature-inspired algorithms. The classification algorithms used include Naïve Bayes, k-Nearest Neighbors (k-NN), Decision Tree, Random Forest, and XGBoost. Each algorithm is combined with feature selection techniques using Particle Swarm Optimization (PSO), Bat Algorithm (BA), and Flamingo Search Algorithm (FSA). The models are evaluated based on accuracy, precision, recall, F1-score, and the number of features used. The results show that the combination of the Random Forest method with FSA-based feature selection (RF-FSA) delivers the best performance, with an accuracy of 82.2%, balanced precision and recall, and efficient feature usage. Another strong alternative is XGBoost with FSA (XGB-FSA), although it requires more features and longer computational time. This study demonstrates that selecting the right feature selection algorithm, particularly FSA, can significantly improve both the accuracy and efficiency of depression text classification models. The resulting model is expected to serve as a useful tool for early detection of depression symptoms from social media posts, allowing for quicker and more targeted interventions.
D-OPTIMAL DESIGNS FOR SPLIT-PLOT MIXTURE PROCESS VARIABLE DESIGNS OF THE STEEL SLAG EXPERIMENT Arina, Faula; Wigena, Aji Hamim; Sumertajaya, I Made; Syafitri, Utami
BAREKENG: Jurnal Ilmu Matematika dan Terapan Vol 16 No 1 (2022): BAREKENG: Jurnal Ilmu Matematika dan Terapan
Publisher : PATTIMURA UNIVERSITY

DOI: 10.30598/barekengvol16iss1pp303-312

Abstract

The steel slag concrete experiment followed a mixture process variable (MPV) design. In this study, the concrete was composed of five mixture components (cement, fine aggregate, coarse aggregate, the percentage of steel slag replacing the fine aggregate, and water), and the process variable was the size of the steel slag. Due to the constraints on the components, the experimental region was not a simplex. The standard MPV design for a quadratic model requires a large number of experimental runs. In this paper, a D-optimal design with a split-plot MPV approach is proposed. The five mixture components were assigned as the subplot factors and the process variable was assigned as the whole-plot factor. A modified point exchange algorithm was developed to generate the D-optimal design. In addition, the paper investigates a related issue, namely the estimation of the covariance matrix in the MPV split-plot design. The final design consisted of 18 whole plots, each of size 2, giving 36 observations.
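The quantity a point exchange algorithm tries to maximize can be sketched as below. This is a minimal illustration of the usual D-criterion for a split-plot design, log det(X' V⁻¹ X) with V = I + η Z Z', and not the authors' modified algorithm. The variance ratio `eta`, the function name, and the toy model matrix are assumptions made for illustration.

```python
import numpy as np


def split_plot_log_d_criterion(X, whole_plot_ids, eta=1.0):
    """log det of the information matrix X' V^-1 X for a split-plot design.

    X              : model matrix (runs x model terms)
    whole_plot_ids : whole-plot label of each run
    eta            : assumed ratio of whole-plot to subplot error variance
    """
    X = np.asarray(X, dtype=float)
    ids = np.asarray(whole_plot_ids)
    # Z[i, k] = 1 when run i belongs to whole plot k.
    Z = (ids[:, None] == np.unique(ids)[None, :]).astype(float)
    V = np.eye(X.shape[0]) + eta * (Z @ Z.T)
    M = X.T @ np.linalg.solve(V, X)
    sign, logdet = np.linalg.slogdet(M)
    if sign <= 0:
        raise ValueError("information matrix is singular for this design")
    return float(logdet)


# Toy example: two runs, each in its own whole plot.
X_toy = [[1.0, 0.0], [0.0, 1.0]]
crit_no_whole_plot_error = split_plot_log_d_criterion(X_toy, [0, 1], eta=0.0)
crit_with_whole_plot_error = split_plot_log_d_criterion(X_toy, [0, 1], eta=1.0)
```

An exchange algorithm would repeatedly swap candidate points into the design and keep a swap whenever this criterion increases.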
THE PROMINENCE OF VECTOR AUTOREGRESSIVE MODEL IN MULTIVARIATE TIME SERIES FORECASTING MODELS WITH STATIONARY PROBLEMS Rohaeti, Embay; Sumertajaya, I Made; Wigena, Aji Hamim; Sadik, Kusman
BAREKENG: Jurnal Ilmu Matematika dan Terapan Vol 16 No 4 (2022): BAREKENG: Journal of Mathematics and Its Applications
Publisher : PATTIMURA UNIVERSITY

DOI: 10.30598/barekengvol16iss4pp1313-1324

Abstract

One of the problems in modelling multivariate time series is stationarity. Stationarity tests do not always find all variables to be stationary; a mix of stationary and non-stationary variables is possible. When stationarity problems are found in multivariate time series modelling, it is necessary to evaluate the model's performance under various stationarity conditions to obtain the best forecasting model. This study aims to identify a superior multivariate time series forecasting model based on the goodness of the model under various stationarity conditions. The model's performance is first evaluated through simulation, and the models are then applied to actual data with a stationarity problem, namely Bogor City inflation data. The best model in the simulation is chosen based on the stability of RMSE and MAD over 100 replications. The results show that the VAR model is the best under various stationarity conditions. Meanwhile, the best model for the actual data is chosen based on evaluation over 4 folds of model fitting power and model forecasting power. Modelling the Bogor City inflation data, which has the mixed stationarity problem, resulted in VAR(1) as the best model. This means the VAR model is good enough to be used as a forecasting model under mixed stationarity conditions. Thus, based on the goodness of the model in two modelling scenarios under various stationarity conditions, the VAR model was found to be superior overall to the VARD and VECM models.
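The two accuracy measures named in the abstract above can be sketched as follows. The abstract does not define MAD; it is read here as the mean absolute deviation of the forecast errors, which is an assumption. The function names and the toy numbers are illustrative.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error of the forecasts."""
    errors = [a - f for a, f in zip(actual, forecast)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mad(actual, forecast):
    """Mean absolute deviation of the forecast errors (assumed definition)."""
    errors = [a - f for a, f in zip(actual, forecast)]
    return sum(abs(e) for e in errors) / len(errors)


# Toy series: forecast errors are -3 and -4.
rmse_value = rmse([0.0, 0.0], [3.0, 4.0])
mad_value = mad([0.0, 0.0], [3.0, 4.0])
```

In a simulation study, these would be computed once per replication and the spread across replications compared between models.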
APPLICATION OF PENALIZED SPLINE-SPATIAL AUTOREGRESSIVE MODEL TO HIV CASE DATA IN INDONESIA Pigitha, Nindi; Djuraidah, Anik; Wigena, Aji Hamim
BAREKENG: Jurnal Ilmu Matematika dan Terapan Vol 17 No 1 (2023): BAREKENG: Journal of Mathematics and Its Applications
Publisher : PATTIMURA UNIVERSITY

DOI: 10.30598/barekengvol17iss1pp0527-0534

Abstract

Spatial regression analysis is a statistical method for modeling data while accounting for spatial effects. Spatial models generally use a parametric approach by assuming a linear relationship between explanatory and response variables. Nonparametric regression is better suited to data with a nonlinear relationship because it does not require the linearity assumption. One nonparametric regression method is penalized spline regression (P-Spline). The P-Spline has a simple mathematical relationship with the linear mixed model, which allows the P-Spline to be combined with other statistical models. PS-SAR is a combination of the P-Spline and the SAR spatial model, so it can analyze spatial data with a semiparametric approach. Based on data from monitoring the development of the HIV situation in 2018, the number of HIV cases in Indonesia shows a clustered pattern that indicates spatial dependence. In addition, the relationship between the number of positive cases and the explanatory factors tends to be nonlinear. Therefore, this study aims to apply the PS-SAR model to HIV case data in Indonesia. The resulting model is evaluated based on the estimate of the spatial autoregressive coefficient, MSE, MAPE, and pseudo-R². Based on the results, the PS-SAR model has a spatial autoregressive coefficient similar to that of the SAR model and has smaller MSE and MAPE than the SAR model.
PRE-PROCESSING DATA ON MULTICLASS CLASSIFICATION OF ANEMIA AND IRON DEFICIENCY WITH THE XGBOOST METHOD Nurrahman, Fathu; Wijayanto, Hari; Wigena, Aji Hamim; Nurjanah, Nunung
BAREKENG: Jurnal Ilmu Matematika dan Terapan Vol 17 No 2 (2023): BAREKENG: Journal of Mathematics and Its Applications
Publisher : PATTIMURA UNIVERSITY

DOI: 10.30598/barekengvol17iss2pp0767-0774

Abstract

Anemia and iron deficiency are health problems in Indonesia and globally. In multiclass classification, data problems often occur, such as missing data, too many variables, and imbalanced data. The data were therefore pre-processed using MissForest imputation, Boruta feature selection, and SMOTE to help improve the performance of the classification model in predicting a particular class. After pre-processing, classification modeling was carried out using the XGBoost algorithm. It was found that pre-processing the data improved the performance of the model in multiclass classification of anemia and iron deficiency in women in Indonesia, with an accuracy of 0.815 and an AUC of 0.9693.
SMALL AREA ESTIMATION WITH HIERARCHICAL BAYES FOR CROSS-SECTIONAL AND TIME SERIES SKEWED DATA Yuniarty, Titin; Indahwati, Indahwati; Wigena, Aji Hamim
BAREKENG: Jurnal Ilmu Matematika dan Terapan Vol 18 No 1 (2024): BAREKENG: Journal of Mathematics and Its Application
Publisher : PATTIMURA UNIVERSITY

DOI: 10.30598/barekengvol18iss1pp0493-0506

Abstract

Small Area Estimation (SAE) is a model-based method for estimating small area parameters that uses the Linear Mixed Model (LMM) as its basis. It is conventionally solved with Empirical Best Linear Unbiased Prediction (EBLUP). The main requirement for the LMM to produce high-precision estimates is the normality assumption. The observation units are food crop farmer households in Sulawesi Tenggara Province, used to estimate food and non-food per capita expenditure at the district/city level with SAE; these data are positively skewed. Applying EBLUP to positively skewed data results in less accurate estimates, while transformation can potentially result in biased estimates. Therefore, the problem of skewed data at the small area level was addressed in this research with Hierarchical Bayes (HB) on combined cross-sectional and time series data under a skew-normal distribution assumption. The results show that the skew-normal SAE HB model significantly reduced the Relative Root Mean Squared Error (RRMSE) compared with direct estimation. This indicates that SAE modeling is able to provide a shrinkage effect on the direct estimation results. However, the interpretation differs slightly between direct estimation and the skew-normal SAE HB model. This is possibly because the modeling assumed that the autocorrelation coefficient is equal to 1, known as the random walk effect. In reality, however, Susenas is not panel data, so the units of observation may differ between time periods. Therefore, further research should compare this model with a skew-normal or other skewed-distribution model in which the autocorrelation coefficient is unknown and estimated within the model.
Biclustering-Based Analysis to Identify Fruit Production Potential in Indonesia Using Plaid Model Algorithm Alwani, Nadira Nisa; Sumertajaya, I Made; Wigena, Aji Hamim
Scientific Journal of Informatics Vol. 12 No. 3: August 2025
Publisher : Universitas Negeri Semarang

DOI: 10.15294/sji.v12i3.25054

Abstract

Purpose: The application of biclustering using the plaid model aims to simultaneously identify mapping or grouping patterns of provinces and fruit types in Indonesia. The performance of the plaid model algorithm is evaluated to assess its capability to discover and generate optimal biclusters that represent the relationship between regions and fruit types with similar production characteristics. Methods: The plaid model algorithm produces optimal biclusters by configuring parameter scenarios such as model selection, the number of layers, and the threshold values for rows and columns. The average Mean Squared Residue (MSR) value and the number of biclusters that provide the most relevant information are used to determine the optimal parameter selection. Result: The plaid model algorithm effectively grouped provinces and fruit varieties into multiple biclusters. The row-constant model was chosen based on its average MSR value of 2.0537; it formed five overlapping biclusters across provinces and fruit types. Several provinces, such as Central Java and West Java, demonstrated a high potential for rose apples, breadfruit, and salak. Other provinces showed comparatively moderate levels of production. Novelty: This study presents a novel way to apply the plaid model biclustering algorithm to data on fruit varieties in various Indonesian provinces. Rarely used in horticulture, this method offers an alternative perspective on structured commodity mapping, especially when identifying specific patterns between fruit varieties and geographic distribution.
The Impact of Using A Linear Model for the Ordinal Response of Mixture Experiments Syafitri, Utami Dyah; Erfiani, Erfiani; Soleh, Agus M; Wigena, Aji Hamim
ZERO: Jurnal Sains, Matematika dan Terapan Vol 9, No 2 (2025): Zero: Jurnal Sains Matematika dan Terapan
Publisher : UIN Sumatera Utara

DOI: 10.30829/zero.v9i2.25760

Abstract

In a sensory test, the response is measured on a Likert scale, which is an ordinal scale. The ordinal response can be analyzed using a linear model approach; however, this approach can be misleading. This research aims to compare three different methods for an ordinal response: the average score, the second-order Scheffé model, and the ordinal logistic model. The case study focused on the response to the taste of cookies resulting from a mixture experiment. The mixture experiment is a type of experimental design commonly used for product formulation. The research involved three ingredients with different lower bounds. The D-optimal design, which is also the {3,2} simplex-lattice design, was chosen for the experiment. All three methods yielded the same results for the optimum composition; however, the ordinal model provided more information about the data's characteristics. The optimal formulation of the three ingredients was 10%, 20%, and 70%.
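The {3,2} simplex-lattice design mentioned above can be enumerated with a short sketch. The lattice construction is standard (each proportion takes values 0, 1/m, ..., 1 and the proportions sum to 1). The mapping from pseudo-components to actual proportions, x_i = L_i + (1 - ΣL) z_i, is the usual way to handle lower bounds, but the bounds used below are hypothetical because the abstract does not report them.

```python
from fractions import Fraction
from itertools import product


def simplex_lattice(q, m):
    """All points of the {q, m} simplex-lattice design, as exact fractions."""
    return [
        tuple(Fraction(k, m) for k in combo)
        for combo in product(range(m + 1), repeat=q)
        if sum(combo) == m
    ]


def to_actual_proportions(pseudo_point, lower_bounds):
    """Map a pseudo-component point to actual proportions given lower bounds."""
    scale = 1 - sum(lower_bounds)
    return tuple(lb + scale * z for lb, z in zip(lower_bounds, pseudo_point))


design = simplex_lattice(3, 2)  # three vertices plus three edge midpoints

# Hypothetical lower bounds, for illustration only.
bounds = (Fraction(1, 10), Fraction(2, 10), Fraction(3, 10))
actual_design = [to_actual_proportions(point, bounds) for point in design]
```

The six resulting points are exactly enough to estimate the six coefficients of a second-order Scheffé model with three components.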