Articles

Found 29 Documents

Performance Comparison of Random Forest (RF) and Classification and Regression Trees (CART) for Hotel Star Rating Prediction
Utami, Annisaa; Permadi, Dimas Fanny Hebrasianto; Rosita, Yesy Diah; Unjung, Jumanto
Scientific Journal of Informatics Vol. 11 No. 3: August 2024
Publisher : Universitas Negeri Semarang

DOI: 10.15294/sji.v11i3.11068

Abstract

Purpose: This study evaluates the effectiveness of Random Forest (RF) compared with Classification and Regression Trees (CART) in predicting hotel star ratings. The objective is to identify which algorithm provides the more reliable and accurate classification from diverse hotel attributes, in accordance with the standard star categories for hotels. Accurate star ratings matter because they guide consumer choices and strengthen competitive positioning in the hospitality industry. Method: This study used a comprehensive dataset of hotels in Banyumas Regency, covering location, facilities, room size, room type, room price, and customer reviews, to train both the RF and CART algorithms. Both algorithms were evaluated using accuracy, precision, recall, and F1 score. Both also received the same preprocessing, and hyperparameter tuning was performed to improve the efficacy of each model. Result: RF achieved better overall accuracy and robustness than CART across all tests conducted. RF also outperformed CART in per-class classification, with higher precision and recall across multiple star rating categories, indicating better generalization and consistency. The RF classifier surpassed the CART classifier in both accuracy and F1-score across all random states and test sizes, with a highest score of 0.9932 at a random state of 100 and a test size of 0.4. The most reliable results were obtained using RF with a random state of 42 and a test size of 0.2: an accuracy of 0.9909, precision of 1.0, recall of 1.0, and F1 score of 1.0. Under the same settings, CART achieved 0.9818, 1.0, 1.0, and 1.0, respectively. This consistent performance across random states and test sizes illustrates the robustness and suitability of RF for classification tasks compared to CART. Novelty: This study offers new insights into applying machine learning to hotel star rating prediction using the RF and CART algorithms. The newly collected hotel dataset is a further contribution. A detailed comparative analysis is also provided, contributing to the existing literature by showing the effectiveness of RF over CART for this specific application. Future studies could explore the integration of additional machine learning methods to further enhance prediction accuracy and operational efficiency in the hospitality industry.
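
The comparison described in this abstract, training RF and CART on identical splits while varying the random state and test size, can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' code: the data is a synthetic stand-in generated with scikit-learn (the Banyumas hotel dataset is not reproduced here), the five classes are a hypothetical 1 to 5 star scale, and `compare` is a helper name introduced for this example. scikit-learn's `DecisionTreeClassifier` is used as the CART implementation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the hotel data (assumption: 5 classes for 1-5 stars).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=6,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)

def compare(random_state, test_size):
    """Train RF and CART on the same split and return their test scores."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y
    )
    models = {
        "RF": RandomForestClassifier(n_estimators=100, random_state=random_state),
        "CART": DecisionTreeClassifier(random_state=random_state),
    }
    scores = {}
    for name, model in models.items():
        pred = model.fit(X_tr, y_tr).predict(X_te)
        scores[name] = {
            "accuracy": accuracy_score(y_te, pred),
            "f1_macro": f1_score(y_te, pred, average="macro"),
        }
    return scores

# Repeat the comparison over the random states and test sizes named in the abstract.
results = {
    (rs, ts): compare(rs, ts)
    for rs in (42, 100)
    for ts in (0.2, 0.4)
}
for setting, scores in results.items():
    print(setting, scores)
```

Holding the split fixed for both models in each run means that any difference in scores comes from the algorithms rather than from the partition of the data.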
A Performance Comparison of Data Balancing Model to Improve Credit Risk Prediction in P2P Lending
Pertiwi, Dwika Ananda Agustina; Ahmad, Kamilah; Unjung, Jumanto; Muslim, Much Aziz
Scientific Journal of Informatics Vol. 11 No. 4: November 2024
Publisher : Universitas Negeri Semarang

DOI: 10.15294/sji.v11i4.14018

Abstract

Purpose: Imbalanced datasets often degrade the performance of classification models, including credit risk prediction in P2P lending. Several data balancing models have been applied in the existing literature to overcome this problem, but existing research evaluates them only through the performance of the classification model. In addition to measuring classification performance, this study therefore examines the contribution of the data balancing models themselves: Random Oversampling (ROS), Random Undersampling (RUS), and the Synthetic Minority Oversampling Technique (SMOTE). Methods: This research uses the Lending Club dataset, which has an imbalance ratio (IR) of 4.098, two classifiers (LightGBM and XGBoost), and 10-fold cross-validation to assess ROS, RUS, and SMOTE. The models are evaluated using accuracy, recall, precision, and F1-score. Result: SMOTE performs best as a data balancing model in P2P lending: the LightGBM+SMOTE model reaches an accuracy of 92.56% and the XGBoost+SMOTE model 92.32%, both better than the other models. Novelty: This research concludes that SMOTE is the superior data balancing model for improving credit risk prediction in P2P lending. It also finds that the larger the training sample, the better the classification model performs at predicting credit risk in P2P lending.
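
To make the imbalance ratio and the two random resampling strategies concrete, here is a minimal standard-library sketch. It assumes the common definition of IR as the majority class count divided by the minority class count; the toy labels (41 majority, 10 minority) and the function names are hypothetical and are not taken from the Lending Club data. SMOTE is not shown: unlike ROS, it synthesizes new minority samples by interpolating between neighbouring points rather than duplicating existing ones.

```python
import random
from collections import Counter

def imbalance_ratio(labels):
    """Majority class count divided by minority class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

def random_oversample(rows, labels, seed=0):
    """ROS: duplicate randomly chosen minority rows until classes are equal."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels

def random_undersample(rows, labels, seed=0):
    """RUS: keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, l in zip(rows, labels) if l == cls]
        for row in rng.sample(pool, target):
            out_rows.append(row)
            out_labels.append(cls)
    return out_rows, out_labels

# Hypothetical toy data: 41 majority rows (label 0) and 10 minority rows (label 1).
rows = [[i] for i in range(51)]
labels = [0] * 41 + [1] * 10

print("IR before:", imbalance_ratio(labels))
ros_rows, ros_labels = random_oversample(rows, labels)
rus_rows, rus_labels = random_undersample(rows, labels)
print("after ROS:", Counter(ros_labels))
print("after RUS:", Counter(rus_labels))
```

In practice, resampling is applied to the training folds only, so that the evaluation folds keep the original class distribution.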
Evaluation of Ridge Classifier and Logistic Regression for Customer Churn Prediction on Imbalanced Telecommunication Data
Rofik, Rofik; Unjung, Jumanto
Scientific Journal of Informatics Vol. 12 No. 2: May 2025
Publisher : Universitas Negeri Semarang

DOI: 10.15294/sji.v12i2.24620

Abstract

Purpose: Customer churn is a crucial issue for companies, especially in the telecommunications sector, because it directly affects revenue and the cost of acquiring new customers. This research builds a customer churn prediction model by comparing the performance of Logistic Regression and the Ridge Classifier while considering the effect of data balancing. Methods: This study developed a churn classification model by comparing the Logistic Regression and Ridge Classifier algorithms in three scenarios: without data balancing, balancing using SMOTE, and balancing using GAN. The dataset used was Telco Customer Churn from Kaggle. The models were evaluated using a confusion matrix with accuracy, precision, recall, and F1-score metrics, with accuracy as the primary metric. Result: Data balancing using SMOTE and GAN does not improve model accuracy. The highest accuracy was achieved by the Ridge Classifier without data balancing, at 82.47%, followed by Logistic Regression at 82.25%. However, recall and F1-score improved when using SMOTE. The highest recall was achieved in the SMOTE 50:50 scenario, by the Ridge Classifier at 75.34% and Logistic Regression at 75.07%. The highest F1-score was achieved in the SMOTE 50:30 scenario, by the Ridge Classifier at 64.76%, followed by Logistic Regression at 64.68%. Precision, meanwhile, tends to decrease after data balancing. Novelty: The uniqueness of this study lies in comparing the Ridge Classifier and Logistic Regression under data balancing scenarios using SMOTE and GAN, which has not been widely discussed in previous studies. The main finding is that the highest accuracy is achieved when the Ridge Classifier uses the original data, without SMOTE or GAN balancing. Data balancing using SMOTE, however, significantly improves recall and F1-score.
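
The pattern reported here, where balancing lowers accuracy and precision while raising recall and F1-score, follows directly from how the four metrics are computed from a confusion matrix. The sketch below illustrates this with two hypothetical confusion matrices for a 1,000-customer test set; the counts are invented for illustration and are not results from the study.

```python
def metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts: 260 churners and 740 non-churners in the test set.
# Without balancing, the model misses many churners but raises few false alarms.
baseline = metrics(tp=140, fp=60, fn=120, tn=680)
# After balancing, it catches more churners at the cost of more false alarms.
balanced = metrics(tp=195, fp=150, fn=65, tn=590)

for name, result in (("baseline", baseline), ("balanced", balanced)):
    print(name, {k: round(v, 4) for k, v in result.items()})
```

When missing a churner costs more than a false alarm, recall or F1-score may be a more useful selection criterion than accuracy.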
Enhancing Abusive Language Detection on Twitter Using Stacking Ensemble Learning
Utami, Putri; Tanga, Yulizchia Malica Pinkan; Unjung, Jumanto; Muslim, Much Aziz
Journal of Information System Exploration and Research Vol. 3 No. 2 (2025): July 2025
Publisher : SHM Publisher

DOI: 10.52465/joiser.v3i2.594

Abstract

Detecting abusive language on Twitter is an important step in reducing the prevalence of negative content and harassment. This study aims to improve the accuracy and effectiveness of abusive language detection on Twitter by addressing the limitations of the single-model approaches commonly used in earlier work. A stacking method is employed that combines Term Frequency-Inverse Document Frequency (TF-IDF) for feature extraction with the Naive Bayes and XGBoost algorithms as classification models. Naive Bayes is known for its simplicity in text classification, while XGBoost excels at processing complex data and achieving high accuracy. Combining the two models is expected to improve performance in detecting abusive language. The results show that the proposed model outperforms the methods in previous studies, with an accuracy of 91.91% and an AUC of 96.76%. These findings demonstrate the effectiveness of the stacking approach in reducing classification errors in abusive language detection. Further research could explore larger datasets or more complex models to improve detection accuracy.
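
A stacking pipeline of the kind described above could be assembled along these lines with scikit-learn. This is a hedged sketch, not the authors' implementation: scikit-learn's `GradientBoostingClassifier` stands in for XGBoost so that the example has a single dependency, the logistic regression meta-learner is an assumption (the abstract does not name one), and the twelve short texts are an invented toy corpus.

```python
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented toy corpus: 1 = abusive, 0 = not abusive.
texts = [
    "you are a stupid fool", "shut up you idiot", "what a useless fool",
    "you idiot nobody likes you", "stupid and useless as always", "shut up fool",
    "see you at the meeting tomorrow", "thanks for the helpful answer",
    "the weather is nice today", "great match last night",
    "looking forward to the meeting", "thanks and have a nice day",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Base learners: Naive Bayes plus a gradient-boosted model (stand-in for XGBoost).
stack = StackingClassifier(
    estimators=[
        ("nb", MultinomialNB()),
        ("gb", GradientBoostingClassifier(n_estimators=20, random_state=0)),
    ],
    final_estimator=LogisticRegression(),  # assumed meta-learner
    cv=3,  # out-of-fold predictions from the base learners train the meta-learner
)

# TF-IDF turns each text into a sparse feature vector before stacking.
model = make_pipeline(TfidfVectorizer(), stack)
model.fit(texts, labels)

predictions = model.predict(["you are a fool", "see you at the meeting"])
print(predictions)
```

The meta-learner is trained on cross-validated predictions of the base models, which is what lets stacking correct errors that either base model makes on its own.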
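Utilization of IoT Technology for Energy Optimization and Sustainable Facility Management Using Fuzzy Logic (Pemanfaatan Teknologi IoT untuk Optimalisasi Energi dan Manajemen Fasilitas Berkelanjutan Menggunakan Fuzzy Logic)
Ceorido Ghalib Wibowo; Jumanto Unjung; Dwika Ananda Agustina Pertiwi; Much. Aziz Muslim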
Prosiding SISFOTEK Vol 9 No 1 (2025): SISFOTEK IX 2025
Publisher : Ikatan Ahli Informatika Indonesia

Abstract

The development of the smart airport concept requires the integration of digital technologies to enhance operational efficiency, environmental sustainability, and passenger comfort. One of the key technologies driving this transformation is the Internet of Things (IoT), which enables real-time data collection and analysis from interconnected devices and sensors. This study explores the utilization of IoT for energy optimization and sustainable facility management in smart airports, focusing on systems such as energy consumption monitoring, automated lighting control, sensor-based temperature regulation, and efficient equipment management. The research employs a literature review and data-driven system analysis within the operational context of modern airports. The results indicate that IoT-based implementations can reduce energy consumption by up to 25–30% through dynamic and predictive control of resource usage. Furthermore, IoT contributes to sustainability by reducing carbon emissions and extending infrastructure lifespan. Therefore, IoT technology serves as a fundamental component in realizing airports that are efficient, environmentally friendly, and future-oriented.
Comparison of Machine Learning and Deep Learning Accuracy in Detecting SQL Injection Attacks (Perbandingan Akurasi Machine Learning dan Deep Learning dalam Deteksi Serangan SQL Injection)
Franki SW; Jumanto Unjung; DAA Pertiwi; Much. Aziz Muslim
Prosiding SISFOTEK Vol 9 No 1 (2025): SISFOTEK IX 2025
Publisher : Ikatan Ahli Informatika Indonesia

Abstract

SQL Injection (SQLi) attacks are among the most common threats to web application security, potentially leading to data breaches and unauthorized manipulation of database systems. The limitations of traditional detection mechanisms, such as Web Application Firewalls (WAF), highlight the need for intelligent approaches capable of adapting to emerging attack patterns. This study aims to develop an effective, accurate, and adaptive SQL Injection detection model by comparing the performance of the Random Forest algorithm as a representation of traditional Machine Learning and the Multilayer Perceptron (MLP) as a representation of Deep Learning. The evaluation focuses on classification accuracy, processing speed, and implementation simplicity using an identical SQL Injection attack dataset. The results of this study are expected to provide recommendations for an optimal detection model to enhance web application security and strengthen defense systems against code injection-based cyber threats.
Malaria Disease Detection System in Humans Using Convolutional Neural Network (CNN)
Yana, Natasya Siska Fitri; Shabaha, Achmad Rozin; Unjung, Jumanto
Journal of Electronics Technology Exploration Vol. 3 No. 2 (2025): December 2025
Publisher : SHM Publisher

DOI: 10.52465/joetex.v3i2.646

Abstract

Malaria is a deadly disease caused by the Plasmodium parasite. Detection is performed by trained microscopists who analyze microscopic images of blood smears. This analysis can be done automatically using modern deep learning techniques, and accurate, efficient automated models can significantly reduce the need for skilled labor. In this article, we propose a fully automated convolutional neural network (CNN)-based model for diagnosing malaria from microscopic images of blood smears. Various techniques, including knowledge distillation, data augmentation, an autoencoder, and feature extraction with the CNN model, are used to optimize the model and improve its accuracy and inference performance. Our deep learning model detects malaria parasites in microscopic images with 95% accuracy, using more than 27,600 images. This shows that the model provides more accurate predictions than malaria detection models based on other algorithms in previous studies, which reached an accuracy of 90%. By using a CNN, this article contributes to the development of effective malaria detection methods.
Optimization of SVM and Gradient Boosting Models Using GridSearchCV in Detecting Fake Job Postings
Rofik Rofik; Roshan Aland Hakim; Jumanto Unjung; Budi Prasetiyo; Much Aziz Muslim
MATRIK : Jurnal Manajemen, Teknik Informatika dan Rekayasa Komputer Vol. 23 No. 2 (2024)
Publisher : Universitas Bumigora

DOI: 10.30812/matrik.v23i2.3566

Abstract

Online job searching is one of the most efficient ways to find work, and it is widely used by people worldwide because job recruitment information is transferred automatically. This easy and fast transfer of recruitment information has also led to a rise in fake job vacancy fraud. Several studies have been conducted to predict fake job vacancies, focusing on improving accuracy. However, the main problem in prediction is poorly chosen parameters, which keep the classification algorithm from working optimally. This research aimed to increase the accuracy of fake job vacancy predictions by tuning parameters using GridSearchCV. The methods used were SVM and Gradient Boosting, with parameter adjustments to improve the parameter combination and align it with the characteristics of the prediction model. The research process was divided into preprocessing, feature extraction, data separation, and modeling stages. The model was tested using the EMSCAD dataset. The SVM algorithm achieved the highest accuracy, 98.88%, while Gradient Boosting produced an accuracy of 98.08%. This research showed that optimizing the SVM model with GridSearchCV can increase accuracy in predicting fake job recruitment.
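
The parameter tuning step can be illustrated with scikit-learn's `GridSearchCV` wrapped around an SVM. The sketch below is an assumption-laden illustration rather than the study's setup: it uses synthetic numeric data instead of text features extracted from EMSCAD, and the grid values for `C`, `kernel`, and `gamma` are arbitrary choices for demonstration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic binary data standing in for the extracted job-posting features.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Scaling lives inside the pipeline so each CV fold is scaled on its own training part.
pipeline = make_pipeline(StandardScaler(), SVC())

# Illustrative grid; every combination is evaluated with 5-fold cross-validation.
param_grid = {
    "svc__C": [0.1, 1, 10],
    "svc__kernel": ["linear", "rbf"],
    "svc__gamma": ["scale", "auto"],
}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_tr, y_tr)

print("best parameters:", search.best_params_)
print("best CV accuracy:", search.best_score_)
print("held-out accuracy:", search.score(X_te, y_te))
```

Reporting the score on a held-out split that the search never saw avoids the optimistic bias of quoting the best cross-validation score alone.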
Classification of Pancreatic Cancer Diagnosis with CatBoost Using Urine Biomarker Combination
Tanga, Yulizchia Malica Pinkan; Utami, Putri; Darmawan, Aditya Yoga; Unjung, Jumanto
Journal of Electronics Technology Exploration Vol. 4 No. 1 (2026): June 2026
Publisher : SHM Publisher

DOI: 10.52465/joetex.v4i1.651

Abstract

Pancreatic cancer, the uncontrolled growth of cells in the pancreas, is one of the most aggressive types of cancer and has a high mortality rate. This research focuses on improving early diagnosis of pancreatic cancer using CatBoost. Urine biomarker datasets were collected and pre-processed with label encoding, standard scaling, and balancing via the Synthetic Minority Oversampling Technique (SMOTE). The CatBoost model achieved an accuracy of 98.89%, specificity of 99.35%, sensitivity of 98.71%, and Area Under the Curve (AUC) of 0.9951. These results show that the CatBoost model significantly outperforms the diagnosis models in previous studies, overcoming the challenges of early detection and classification of pancreatic cancer. This study shows that CatBoost is effective for diagnosing pancreatic cancer and suggests that future research explore other models on larger and more diverse datasets.
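
Two of the preprocessing steps named in this abstract, label encoding and standard scaling, can be sketched with the Python standard library alone. The column names and values below are hypothetical and do not come from the urine biomarker dataset; the function names are introduced only for this example, and the SMOTE balancing step is not shown.

```python
from statistics import mean, pstdev

def label_encode(values):
    """Map each distinct category to an integer, in sorted order."""
    classes = sorted(set(values))
    mapping = {cls: index for index, cls in enumerate(classes)}
    return [mapping[v] for v in values], mapping

def standard_scale(column):
    """Rescale a numeric column to zero mean and unit (population) variance."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]  # a constant column carries no information
    return [(x - mu) / sigma for x in column]

# Hypothetical columns: a categorical attribute and one biomarker measurement.
sex = ["F", "M", "M", "F"]
biomarker = [1.0, 2.0, 3.0, 4.0]

encoded, mapping = label_encode(sex)
scaled = standard_scale(biomarker)
print(encoded, mapping)
print(scaled)
```

Scaling parameters such as the mean and standard deviation are normally estimated on the training split only and then reused on the test split, so that no test information leaks into training.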