Journal : Scientific Journal of Informatics

Sentiment Analysis on Social Media Using TF-IDF Vectorization and H2O Gradient Boosting for Student Anxiety Detection
Ningsih, Maylinna Rahayu; Unjung, Jumanto
Scientific Journal of Informatics Vol. 11 No. 4: November 2024
Publisher : Universitas Negeri Semarang

DOI: 10.15294/sji.v12i1.20582

Abstract

Purpose: Mental health issues are now a concern for many people. Excessive and prolonged anxiety is at the forefront of psychological disorders, with impacts ranging from stress to suicide. Social media platforms have become a medium for expressing opinions, sharing information, and even expressing daily emotions, and many studies have shown a correlation between emotional statements on social media and mental disorders. This research conducts sentiment analysis of anxiety on social media using H2O Gradient Boosting with TF-IDF vectorization under varying max-feature settings. Methods: This research uses 6980 social media posts. The method consists of exploratory data analysis, data preprocessing, TF-IDF vectorization with max-feature experiments of 100, 250, 500, 1000, and 2000, an H2O Gradient Boosting model, cross-validation, and model performance evaluation. Result: The model performed well with TF-IDF max features = 250, reaching an accuracy of 99%, a specificity of 99.57%, and an error rate of 0.0106. Novelty: The H2O Gradient Boosting model performed well in classifying anxiety sentiment.
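The effect of capping TF-IDF at a maximum number of features can be sketched in plain Python. This is a minimal illustration, not the authors' pipeline: the function name `tfidf_top_features`, the whitespace tokenizer, the smoothed IDF formula, the L2 normalization, and the toy posts are all assumptions, since the abstract does not state which library or settings were used.

```python
import math
from collections import Counter


def tfidf_top_features(docs, max_features):
    """Return (vocabulary, matrix), keeping only the most frequent terms.

    Hypothetical helper for illustration; not taken from the paper.
    """
    tokenized = [doc.lower().split() for doc in docs]
    # Rank terms by total corpus frequency; ties are broken alphabetically.
    totals = Counter(tok for toks in tokenized for tok in toks)
    ranked = sorted(totals, key=lambda t: (-totals[t], t))
    vocab = sorted(ranked[:max_features])
    n_docs = len(docs)
    # Document frequency: number of posts that contain each kept term.
    df = {t: sum(1 for toks in tokenized if t in toks) for t in vocab}
    # Smoothed IDF (an assumed convention): ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    matrix = []
    for toks in tokenized:
        counts = Counter(toks)
        row = [counts[t] * idf[t] for t in vocab]
        # L2-normalize each row; rows with no kept terms stay all zero.
        norm = math.sqrt(sum(v * v for v in row))
        matrix.append([v / norm if norm else 0.0 for v in row])
    return vocab, matrix


# Toy posts, invented for this example.
posts = [
    "i feel anxious about exams",
    "exams make me feel anxious and tired",
    "great day with friends",
]
vocab, matrix = tfidf_top_features(posts, max_features=3)
print(vocab)
print(matrix)
```

With a small cap, posts that contain none of the retained terms become all-zero vectors, which is one reason the max-feature setting is worth tuning as the study does.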
Sign Language Detection System Using YOLOv5 Algorithm to Promote Communication Equality People with Disabilities
Ningsih, Maylinna Rahayu; Nurriski, Yopi Julia; Sanjani, Fathimah Az Zahra; Hakim, M. Faris Al; Unjung, Jumanto; Muslim, Much Aziz
Scientific Journal of Informatics Vol. 11 No. 2: May 2024
Publisher : Universitas Negeri Semarang

DOI: 10.15294/sji.v11i2.6007

Abstract

Purpose: Communication is an important asset in human interaction, but not everyone has equal access to it. Some people have limitations such as hearing or speech impairments, which require a different communicative approach, namely sign language. These limitations often create accessibility gaps in various sectors, including education and employment, which relate to Sustainable Development Goals (SDGs) 4, 8, and 10. This research responds to these challenges by proposing a BISINDO sign language detection system using YOLOv5-NAS-S. The research aims to develop a sign language detection model that is accurate and fast, meets the communicative needs of people with disabilities, and supports the SDGs in reducing the accessibility gap. Methods: The research adopted a transfer learning approach with YOLOv5-NAS-S using diverse BISINDO sign language data. Data pre-processing involved SuperGradients and Roboflow augmentation, while model training was conducted with the SuperGradients Trainer. Result: The model achieves a mAP of 97.2% and a recall of 99.6%, which indicates a solid ability to separate sign language image classes. The model not only identifies sign language classes but can also make consistent predictions under complex conditions. Novelty: The YOLOv5-NAS-S algorithm shows significant advantages compared to previous studies. This performance is expected to contribute to efforts to create a more inclusive society, in accordance with the SDGs. Suggestions for further research include predictive and real-time integration and investigation of possible practical applications in various industries.
Performance Comparison of Random Forest (RF) and Classification and Regression Trees (CART) for Hotel Star Rating Prediction
Utami, Annisaa; Permadi, Dimas Fanny Hebrasianto; Rosita, Yesy Diah; Unjung, Jumanto
Scientific Journal of Informatics Vol. 11 No. 3: August 2024
Publisher : Universitas Negeri Semarang

DOI: 10.15294/sji.v11i3.11068

Abstract

Purpose: This study evaluates the effectiveness of Random Forest (RF) compared with Classification and Regression Trees (CART) in predicting hotel star ratings. The objective is to identify the algorithm that provides the most reliable and accurate classification outcomes based on diverse hotel attributes, in accordance with the standard categorization of star hotels. This is necessary because accurate star ratings play an important role in guiding consumer choices and enhancing competitive positioning in the hospitality industry. Method: This study used a comprehensive dataset of hotels in Banyumas Regency, covering location, facilities, room size, room type, room price, and customer reviews, to train both the RF and CART algorithms. Both algorithms are evaluated using accuracy, precision, recall, and F1 score. Both algorithms also received the same preprocessing, and hyperparameter tuning was performed to improve the efficacy of each model. Result: RF achieved better overall accuracy and robustness than CART across all tests conducted. RF also outperformed CART in classification effectiveness among classes, including higher precision and recall scores across multiple star-rating categories, signifying better generalization and consistency in classification tasks. The RF classifier consistently surpassed the CART classifier in both accuracy and F1 score across all random states and test sizes, with a highest score of 0.9932 at a random state of 100 and a test size of 0.4. The most reliable results were obtained using RF with a random state of 42 and a test size of 0.2, resulting in an accuracy of 0.9909, precision of 1.0, recall of 1.0, and F1 score of 1.0. Under the same settings, CART shows values of 0.9818, 1.0, 1.0, and 1.0, respectively. This consistent performance, regardless of fluctuations, illustrates the robustness and suitability of RF for classification tasks compared to CART. Novelty: This study offers new insights into the application of machine learning to hotel star rating prediction using the RF and CART algorithms. The hotel dataset collected for this study is also new. A detailed comparative analysis is provided, contributing to the existing literature by showing the effectiveness of RF over CART for this specific application. Future studies could explore the integration of additional machine learning methods to further enhance prediction accuracy and operational efficiency in the hospitality industry.
A Performance Comparison of Data Balancing Model to Improve Credit Risk Prediction in P2P Lending
Pertiwi, Dwika Ananda Agustina; Ahmad, Kamilah; Unjung, Jumanto; Muslim, Much Aziz
Scientific Journal of Informatics Vol. 11 No. 4: November 2024
Publisher : Universitas Negeri Semarang

DOI: 10.15294/sji.v11i4.14018

Abstract

Purpose: Imbalanced datasets often degrade the performance of classification models, including models for credit risk prediction in P2P lending. To overcome this problem, several data balancing models have been applied in the existing literature. However, existing research evaluates performance only in terms of the classification model. This study therefore measures, in addition to classification model performance, the contribution of the data balancing models themselves: Random Oversampling (ROS), Random Undersampling (RUS), and Synthetic Minority Oversampling Technique (SMOTE). Methods: This research uses the Lending Club dataset with an imbalance ratio (IR) of 4.098, two classifiers (LightGBM and XGBoost), and 10-fold cross-validation to assess the performance of the ROS, RUS, and SMOTE data balancing models. The models are then evaluated using accuracy, recall, precision, and F1-score. Result: SMOTE shows superior performance as a data balancing model in P2P lending, with an accuracy of 92.56% for the LightGBM+SMOTE model and 92.32% for the XGBoost+SMOTE model, both better than the other models. Novelty: This research concludes that SMOTE has superior performance as a data balancing model for improving credit risk prediction in P2P lending. In addition, in this case, the larger the data used as the model training sample, the better the classification model performs in predicting credit risk in P2P lending.
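The imbalance ratio and the two random resampling strategies named above can be sketched with the standard library alone. This is a hedged illustration under stated assumptions: the function names, the seed, and the toy labels are hypothetical, IR is taken here to mean majority count divided by minority count, and SMOTE is omitted because it needs nearest-neighbor interpolation rather than simple copying.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count (assumed definition)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until every class matches the majority."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to the minority class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, y in zip(rows, labels) if y == cls]
        out_rows.extend(rng.sample(pool, target))
        out_labels.extend([cls] * target)
    return out_rows, out_labels


# Toy data, invented for this example: 8 repaid loans (0) and 2 defaults (1).
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2
ir = imbalance_ratio(labels)
ros_rows, ros_labels = random_oversample(rows, labels)
rus_rows, rus_labels = random_undersample(rows, labels)
print(ir, Counter(ros_labels), Counter(rus_labels))
```

In a cross-validated setup, resampling is normally applied to the training folds only, so that the held-out fold keeps the original class distribution.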
Evaluation of Ridge Classifier and Logistic Regression for Customer Churn Prediction on Imbalanced Telecommunication Data
Rofik, Rofik; Unjung, Jumanto
Scientific Journal of Informatics Vol. 12 No. 2: May 2025
Publisher : Universitas Negeri Semarang

DOI: 10.15294/sji.v12i2.24620

Abstract

Purpose: Customer churn is a crucial issue for companies, especially those in the telecommunications sector, as it has a direct impact on revenue and new customer acquisition costs. The purpose of this research is to create a customer churn prediction model by comparing the performance of the Logistic Regression algorithm and the Ridge Classifier, considering the effect of data balancing. Methods: This study developed a churn classification model by comparing the Logistic Regression and Ridge Classifier algorithms in three scenarios: without data balancing, balancing using SMOTE, and balancing using GAN. The dataset used was Telco Customer Churn from Kaggle. Model evaluation was performed using a confusion matrix with accuracy, precision, recall, and F1-score metrics, with a primary focus on accuracy. Result: The results show that data balancing using SMOTE and GAN does not improve model accuracy. The highest accuracy was achieved by the Ridge Classifier without data balancing, at 82.47%, followed by Logistic Regression at 82.25%. However, the recall and F1-score metrics improved when using SMOTE. The highest recall was achieved by Ridge Classifier at 75.34% and Logistic Regression at 75.07% in the SMOTE 50:50 scenario. The highest F1-score was also achieved by Ridge Classifier at 64.76% and Logistic Regression at 64.68% followed by the SMOTE 50:30 scenario. Meanwhile, the precision metric tends to decrease after data balancing. Novelty: The uniqueness of this study lies in comparing the performance of the Ridge Classifier and Logistic Regression in data balancing scenarios using SMOTE and GAN, which has not been widely discussed in previous studies. The main findings show that the highest accuracy is achieved when the Ridge Classifier is trained on the original data, without SMOTE or GAN data balancing. However, data balancing using SMOTE significantly improves the recall and F1-score metrics.
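The pattern reported above, where balancing lowers accuracy and precision but raises recall and F1-score, follows directly from how the four metrics are derived from a confusion matrix. The sketch below uses hypothetical confusion-matrix counts chosen only to reproduce that pattern; they are not the study's results, and the function name `classification_metrics` is an assumption.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for 100 churners and 300 non-churners (not from the paper).
# Without balancing: the model misses half of the churners.
before = classification_metrics(tp=50, fp=20, fn=50, tn=280)
# After balancing: more churners are caught, at the cost of more false alarms.
after = classification_metrics(tp=75, fp=60, fn=25, tn=240)

for name in ("accuracy", "precision", "recall", "f1"):
    print(f"{name}: {before[name]:.4f} -> {after[name]:.4f}")
```

Because accuracy counts the majority class heavily, a model that flags more churners can lose accuracy while still becoming more useful for retention work, which is why the choice of primary metric matters on imbalanced data.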