Word Embedding Feature for Improvement Machine Learning Performance in Sentiment Analysis Disney Plus Hotstar Comments Jasmir, Jasmir; Nurhadi, Nurhadi; Rohaini, Eni; Pahlevi B, M Riza; Pardamean Simanjuntak, Daniel Sintong
Jurnal Ilmiah Teknik Elektro Komputer dan Informatika Vol. 10 No. 2 (2024): June
Publisher : Universitas Ahmad Dahlan

DOI: 10.26555/jiteki.v10i2.28799

Abstract

In this research we apply several machine learning methods and word embedding features to process social media data, specifically comments on the Disney Plus Hotstar application. The word embedding features used include Word2Vec, GloVe, and FastText. Our aim is to evaluate the impact of these features on the classification performance of machine learning methods such as Naive Bayes (NB), K-Nearest Neighbor (KNN), and Random Forest (RF). NB is simple and efficient but highly sensitive to feature selection. KNN is known for weaknesses such as biased k values, overly complex computations, memory limitations, and ignoring irrelevant attributes. RF, in turn, has the weakness that its evaluation values can change significantly with only a slight change in the data. Feature selection in text classification is crucial for enhancing scalability, efficiency, and accuracy. Our test results indicate that KNN achieved the highest accuracy both before and after feature selection. The FastText feature led to the highest performance for KNN, yielding balanced accuracy, precision, recall, and F1-score values.
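The combination of embedding features with KNN described in this abstract can be illustrated with a minimal sketch. It assumes that a comment is represented by the average of its word vectors and that neighbors are ranked by cosine similarity; neither choice is stated in the abstract. The vectors in `TOY_VECTORS`, the comments in `TRAIN`, and all function names are hypothetical stand-ins for a trained Word2Vec, GloVe, or FastText model and the real dataset.

```python
import math
from collections import Counter

# Hypothetical 3-dimensional word vectors. Real embedding models
# produce vectors with far more dimensions.
TOY_VECTORS = {
    "great": [0.9, 0.1, 0.0],
    "love": [0.8, 0.2, 0.1],
    "smooth": [0.7, 0.0, 0.2],
    "bad": [0.1, 0.9, 0.0],
    "crash": [0.0, 0.8, 0.2],
    "slow": [0.2, 0.7, 0.1],
}

def embed(text):
    """Average the vectors of known words; unknown words are skipped."""
    vecs = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vecs:
        return [0.0, 0.0, 0.0]
    return [sum(column) / len(vecs) for column in zip(*vecs)]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0

def knn_predict(train, text, k=3):
    """Majority vote among the k training comments most similar to text."""
    query = embed(text)
    ranked = sorted(train, key=lambda item: cosine(embed(item[0]), query),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Made-up labeled comments, for illustration only.
TRAIN = [
    ("great app love it", "positive"),
    ("smooth and great", "positive"),
    ("love smooth streaming", "positive"),
    ("bad app crash", "negative"),
    ("slow and bad", "negative"),
    ("crash crash slow", "negative"),
]

print(knn_predict(TRAIN, "love this great app"))
```

An odd `k` avoids tied votes in the two-class case.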
Comparative Analysis of Optimizer Effectiveness in GRU and CNN-GRU Models for Airport Traffic Prediction Riyadi, Willy; Jasmir, Jasmir
Jurnal Ilmiah Teknik Elektro Komputer dan Informatika Vol. 10 No. 3 (2024): September
Publisher : Universitas Ahmad Dahlan

DOI: 10.26555/jiteki.v10i3.29659

Abstract

The COVID-19 pandemic has posed significant challenges to airport traffic management, necessitating accurate predictive models. This research evaluates the effectiveness of various optimizers in enhancing airport traffic prediction using Deep Learning models, specifically Gated Recurrent Units (GRU) and Convolutional Neural Network-Gated Recurrent Units (CNN-GRU). We compare the performance of optimizers including RMSprop, Adam, Nadam, AdamW, Adamax, and Lion, and analyze the impact of their parameter tuning on model accuracy. Time series data from airports in the United States, Canada, Chile, and Australia were used, with preprocessing steps like filtering, cleaning, and applying a MinMax Scaler. The data was split into 80% for training and 20% for testing. Our findings reveal that the Adam optimizer paired with the GRU model achieved the lowest Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE) in the USA. The study underscores the importance of selecting and tuning optimizers, with ReduceLROnPlateau used to adjust the learning rate dynamically, preventing overfitting and improving model convergence. However, limitations include dataset imbalance and region-specific results, which may affect the generalizability of the findings. Future research should address these limitations by developing balanced datasets and exploring optimizer performance across a broader range of regions and conditions. This study lays the groundwork for further investigating sustainable and accurate airport traffic prediction models.
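The preprocessing and evaluation steps named in this abstract (MinMax scaling, an 80/20 split, MAE, and MAPE) can be sketched in plain Python. This is an illustration under assumptions: the split is taken chronologically, the scaler bounds are fitted on the training part only, and a naive last-value forecast stands in for the GRU and CNN-GRU models. The `traffic` values and all function names are made up.

```python
def min_max_scale(values, lo, hi):
    """Map values to [0, 1] using the given bounds (a MinMax scaler)."""
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]

def chronological_split(series, train_fraction=0.8):
    """Keep time order: the first part trains, the remainder tests."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]

def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean Absolute Percentage Error; actual values must be non-zero."""
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)

# Made-up daily traffic values, for illustration only.
traffic = [60, 64, 58, 70, 75, 80, 72, 68, 90, 100]

train, test = chronological_split(traffic)

# Fit the scaler bounds on the training data only, so that no
# information from the test period leaks into preprocessing.
lo, hi = min(train), max(train)
train_scaled = min_max_scale(train, lo, hi)

# Naive baseline: repeat the last training value for every test step.
naive_forecast = [train[-1]] * len(test)

print("MAE :", mae(test, naive_forecast))
print("MAPE:", round(mape(test, naive_forecast), 2))
```

Comparing a trained model against a naive baseline like this one helps show whether the reported MAE and MAPE reflect real predictive skill.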
PATTERN CLASSIFICATION SIGN LANGUAGE USING FEATURES DESCRIPTORS AND MACHINE LEARNING Nurhadi, Nurhadi; Winanto, Eko Arip; Said, Rahaini Mohd; Jasmir, Jasmir; Afuan, Lasmedi
Jurnal Teknik Informatika (Jutif) Vol. 5 No. 2 (2024): JUTIF Volume 5, Number 2, April 2024
Publisher : Informatika, Universitas Jenderal Soedirman

DOI: 10.52436/1.jutif.2024.5.2.1228

Abstract

Sign language is a way of communication for the deaf and speech impaired. In Indonesia, the utilization of a standardized language involves the incorporation of American Sign Language (ASL). ASL is employed for various communication needs, ranging from basic alphanumeric fingerspelling (A-Z and numbers) to the more complex SIBI form (comprising gesture vocabulary) in everyday interactions as well as formal contexts. This surge in the digitization of sign language underscores the ongoing advancements in research and development. The challenge in this research lies in recognizing American Sign Language (ASL) with diverse intensities and invariant backgrounds. Therefore, the study focuses on proposing a comparison of suitable segmentation methods for multi-intensity ASL cases. Subsequently, global feature descriptor methods, including Color Histogram, Hu Moments, and Haralick Texture techniques, are applied for feature extraction. The results of the Logistic Regression method are compared with those of the supervised Random Forest to check accuracy and suitability in identifying ASL fingerspelling. The findings show that the predictive value of logistic regression is 48%, with class Y having the highest precision (0.86), class V having the lowest accuracy (0.16), and class L having the highest recall (0.73). The maximum precision in classes B, F, H, I, K, Y, and Z is 1.00, and the lowest in class U is 0.58, while the highest recall is in class G, which is 1.00. The lowest is in class V, while the predictive value from the random forest is 86 percent. Class H has the greatest F1 score (0.99), while class U has the lowest F1 score (0.64). The Random Forest method outperforms the two methods suggested in the paper, according to the comparison.
Comparison and Data Visualization in Thyroid Cancer Disease Prediction Using Machine Learning Algorithms Yudha, M. Zahran; Jasmir, Jasmir; Fachruddin, Fachruddin
MALCOM: Indonesian Journal of Machine Learning and Computer Science Vol. 6 No. 1 (2026): MALCOM January 2026
Publisher : Institut Riset dan Publikasi Indonesia

DOI: 10.57152/malcom.v6i1.2249

Abstract

Thyroid cancer is a common endocrine malignancy requiring accurate early prediction for improved patient outcomes. Comprehensive comparative studies of machine learning algorithms, accompanied by systematic visualization, remain limited. This study compares tree-based algorithms (Decision Trees, Random Forest) and boosting algorithms (Gradient Boosting, XGBoost) for thyroid cancer prediction and develops visualization strategies for clinical interpretation. Four algorithms were evaluated using accuracy (correct prediction proportion), precision (positive predictive value), recall (true positive rate), F1-score (harmonic mean of precision and recall), and AUC-ROC (area under the ROC curve). Visualization techniques, including confusion matrices, ROC curves, and feature importance plots, facilitated the interpretation of the model. XGBoost achieved superior performance with accuracy 95.2%, precision 94.8%, recall 95.6%, F1-score 95.2%, and AUC-ROC 0.978, followed by Random Forest (93.5%, 92.7%, 94.1%, 93.4%, 0.965), Gradient Boosting (91.8%, 90.9%, 92.4%, 91.6%, 0.952), and Decision Trees (87.3%, 86.5%, 88.2%, 87.3%, 0.913). Feature importance analysis identified key predictors. Boosting algorithms, particularly XGBoost, demonstrate superior thyroid cancer prediction across all metrics. Integrated visualization enhances clinical interpretability, providing empirical guidance for implementing machine learning-based diagnostic support systems.
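The five metrics defined in this abstract can be computed directly from confusion-matrix counts and classifier scores. The sketch below is a plain-Python illustration, not the authors' evaluation code. The counts and scores are made up, and `roc_auc` uses the pairwise-ranking definition of AUC (the probability that a random positive case scores higher than a random negative case).

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from binary confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0   # positive predictive value
    recall = tp / (tp + fn) if tp + fn else 0.0      # true positive rate
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)            # harmonic mean
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

def roc_auc(pos_scores, neg_scores):
    """AUC-ROC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Made-up counts and scores, for illustration only.
metrics = classification_metrics(tp=40, fp=10, fn=5, tn=45)
auc = roc_auc(pos_scores=[0.9, 0.8, 0.4], neg_scores=[0.7, 0.3, 0.2])

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
print(f"auc_roc: {auc:.4f}")
```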
Optimasi XGBoost Dengan SHAP Untuk Sistem Skrining Penyakit Jantung [XGBoost Optimization with SHAP for a Heart Disease Screening System] Clara Zuliani Syahputri; Jasmir Jasmir; Fachruddin Fachruddin
Prosiding Seminar Nasional Ilmu Teknik Vol. 2 No. 2 (2025): Desember: Prosiding Seminar Nasional Ilmu Teknik
Publisher : Asosiasi Riset Ilmu Teknik Indonesia

DOI: 10.61132/prosemnasproit.v2i2.147

Abstract

Heart disease is the leading cause of death in Indonesia and globally, necessitating an early screening system that is both accurate and clinically trustworthy. Although XGBoost demonstrates high predictive performance, its black-box nature undermines clinical trust, while low recall risks missed diagnoses, an unacceptable consequence in population screening, especially in middle-income countries with limited healthcare resources. This study aims to develop a sensitive, transparent, and implementation-ready heart disease screening framework through the integration of SHAP-based Explainable AI. The CDC's Indicators of Heart Disease dataset (319,795 samples) was processed according to WHO/CDC standards, followed by class imbalance handling, hyperparameter optimization using RandomizedSearchCV, evaluation based on metrics sensitive to minority classes (AUC, recall, F1-score, AUC-PR), and threshold tuning to maximize recall. The baseline model showed a very low recall of 12.18%. After optimization and threshold tuning at 0.10, the model achieved recall >96% (96.79%) with a G-mean of 0.7477, supported by SHAP interpretation stability and the ability to capture non-linear interactions between advanced age (AgeCategory_WHO) and poor general health (GenHealth). SHAP analysis confirmed the alignment of dominant features with medical evidence, and its visualizations provide transparent explanations for healthcare professionals, indicating its potential implementation as an interpretable clinical decision support system.
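The effect of threshold tuning on recall and G-mean, as reported in this abstract, can be shown with a small sketch. The labels and predicted probabilities below are made up, the function names are hypothetical, and G-mean is taken as the geometric mean of recall and specificity, the usual definition for imbalanced binary problems. Only the threshold value 0.10 comes from the abstract.

```python
import math

def recall_and_specificity(y_true, y_prob, threshold):
    """Classify as positive when the probability reaches the threshold."""
    pairs = list(zip(y_true, y_prob))
    tp = sum(1 for y, p in pairs if y == 1 and p >= threshold)
    fn = sum(1 for y, p in pairs if y == 1 and p < threshold)
    tn = sum(1 for y, p in pairs if y == 0 and p < threshold)
    fp = sum(1 for y, p in pairs if y == 0 and p >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def g_mean(recall, specificity):
    """Geometric mean of recall and specificity."""
    return math.sqrt(recall * specificity)

# Made-up imbalanced sample: 3 positive cases, 7 negative cases.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_prob = [0.60, 0.30, 0.12, 0.40, 0.20, 0.08, 0.05, 0.05, 0.03, 0.02]

for threshold in (0.50, 0.10):
    rec, spec = recall_and_specificity(y_true, y_prob, threshold)
    print(f"threshold={threshold:.2f} recall={rec:.3f} "
          f"specificity={spec:.3f} g_mean={g_mean(rec, spec):.3f}")
```

Lowering the threshold trades specificity for recall, which is why a screening model tuned for recall is usually reported together with a balance measure such as G-mean.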
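Perancangan Alat Deteksi Tingkat Kematangan Buah Mangga Indramayu Berdasarkan Kandungan Gas dan Pengolahan Citra Menggunakan YOLOv11 [Design of a Ripeness Detection Device for Indramayu Mangoes Based on Gas Content and Image Processing Using YOLOv11] Adi Kusuma; Jasmir Jasmir; Willy Riyadi; Ahmad Ahmad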
Prosiding Seminar Nasional Ilmu Teknik Vol. 2 No. 2 (2025): Desember: Prosiding Seminar Nasional Ilmu Teknik
Publisher : Asosiasi Riset Ilmu Teknik Indonesia

DOI: 10.61132/prosemnasproit.v2i2.151

Abstract

Indramayu mango is a seasonal fruit that is highly favored due to its delicious taste and high nutritional content. However, high mango production is often not supported by adequate post-harvest facilities, particularly in terms of fruit ripeness classification. Currently, mango ripeness classification is still performed manually, which tends to be subjective and inconsistent. To address this issue, this study proposes a ripeness detection system for Indramayu mangoes by integrating the TGS2602 gas sensor and the YOLOv11 algorithm based on image processing. The TGS2602 sensor is used to detect ethylene gas emitted by ripe mangoes, while YOLOv11 is employed for visual image analysis of the fruit. This study aims to evaluate the system’s performance in classifying ripe and unripe mangoes, as well as analyze the integration between the gas sensor and the object detection model. The test results show that the TGS2602 sensor can detect increased ethylene gas concentration in ripe mangoes, while YOLOv11 demonstrates high accuracy in detecting mangoes based on visual images, with precision and recall close to 1.0. The system was also tested under various lighting conditions, including dark environments, and still performed well, although with a slight decrease in accuracy under low-light conditions.
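Evolusi Performa Arsitektur Deep Learning melalui Optimasi Bertahap dan Interpretabilitas Grad-CAM untuk Klasifikasi Penyakit Ikan Air Tawar [Performance Evolution of Deep Learning Architectures through Stepwise Optimization and Grad-CAM Interpretability for Freshwater Fish Disease Classification] Sasa Kirana Wulandari; Fachruddin Fachruddin; Jasmir Jasmir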
Prosiding Seminar Nasional Ilmu Teknik Vol. 2 No. 2 (2025): Desember: Prosiding Seminar Nasional Ilmu Teknik
Publisher : Asosiasi Riset Ilmu Teknik Indonesia

DOI: 10.61132/prosemnasproit.v2i2.179

Abstract

Freshwater fish diseases significantly affect aquaculture productivity and economic sustainability, while accurate visual classification remains challenging due to interclass similarity and image variability. This study presents a comparative evaluation of three deep learning architectures—DenseNet201, ResNet50, and EfficientNetV2-S—using a stepwise optimization strategy combined with Gradient-weighted Class Activation Mapping (Grad-CAM) for freshwater fish disease classification. Models were trained through three phases: baseline, optimized, and fine-tuned. Performance was evaluated using accuracy, precision, recall, F1 score, Matthews correlation coefficient (MCC), Cohen’s kappa, and per-class ROC–AUC. Results show consistent performance improvement across all architectures, with EfficientNetV2-S achieving the highest accuracy (97.14%), followed by ResNet50 (96.11%) and DenseNet201 (94.40%). High ROC–AUC values (>0.98) indicate strong discriminative capability. Grad-CAM analysis confirms that all optimized models focus on biologically relevant lesion regions, enhancing model transparency and reliability.
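Two of the less common metrics listed in this abstract, the Matthews correlation coefficient and Cohen's kappa, are easy to state for the binary case. The study itself is multi-class, so the sketch below is a simplification, and the confusion counts are made up for illustration.

```python
import math

def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient for a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def cohen_kappa(tp, fp, fn, tn):
    """Agreement between predictions and labels, corrected for chance."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement: product of marginal totals for each class.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (observed - expected) / (1 - expected) if expected != 1 else 0.0

# Made-up counts, for illustration only.
print("MCC  :", round(mcc(40, 10, 5, 45), 4))
print("kappa:", round(cohen_kappa(40, 10, 5, 45), 4))
```

Both measures stay near zero for a classifier that only guesses the majority class, which plain accuracy does not.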
Comparison of AdaBoost and Random Forest Methods in Osteoporosis Risk Prediction Based on Machine Learning Parlindungan H, Edwardo; Assegaff, Setiawan; Jasmir, Jasmir
Jurnal Teknik Informatika (Jutif) Vol. 7 No. 1 (2026): JUTIF Volume 7, Number 1, February 2026
Publisher : Informatika, Universitas Jenderal Soedirman

DOI: 10.52436/1.jutif.2026.7.1.5297

Abstract

This study aims to determine the most effective ensemble machine learning algorithm for osteoporosis risk prediction in resource-constrained healthcare settings, specifically comparing AdaBoost and Random Forest performance on Southeast Asian population data. We implemented nested 5-fold cross-validation on a dataset of 1,958 records with 15 lifestyle and demographic attributes. Both algorithms underwent hyperparameter optimization, and performance was evaluated using accuracy, precision, recall, F1-score, and clinical utility metrics including cost-effectiveness analysis. AdaBoost achieved superior performance with 86.90% accuracy (95% CI: 84.2-89.6%) and perfect precision (1.00) compared to Random Forest's 84.69% accuracy and 0.92 precision. Statistical significance testing confirmed AdaBoost's advantage (p=0.032). Clinical implementation in three health centers demonstrated 60% reduction in unnecessary referrals. This is the first study to compare these algorithms specifically for Southeast Asian populations with clinical validation and cost-effectiveness analysis, providing a ready-to-deploy model for resource-limited healthcare settings.
K-Means Clustering with Elbow Method and Validity Indices for Classifying Student Academic Achievement Based on Knowledge Scores at SDN 48 Kota Jambi Azmi, M. Fikri; Abidin, Dodo Zaenal; Jasmir, Jasmir
Jurnal Teknik Informatika (Jutif) Vol. 7 No. 1 (2026): JUTIF Volume 7, Number 1, February 2026
Publisher : Informatika, Universitas Jenderal Soedirman

DOI: 10.52436/1.jutif.2026.7.1.5349

Abstract

Student performance evaluation at SDN 48 Kota Jambi has been traditionally conducted manually, which is inefficient and often subjective. This study aims to provide an objective classification of students’ academic achievement using data-driven methods. The research applies the Knowledge Discovery in Databases (KDD) framework, which involves data selection, preprocessing, clustering, and evaluation. The dataset consists of knowledge scores from 152 elementary students across seven subjects, obtained from the Merdeka Curriculum report cards. Data preprocessing included cleaning and normalization to ensure consistency. K-Means clustering was implemented using RapidMiner, with the optimal number of clusters determined through the Elbow Method. Cluster validity was assessed using the Davies–Bouldin Index (1.226) and the Silhouette Coefficient (0.245). The results produced three clusters: high achievers (30.9%), medium achievers (27.0%), and low achievers (42.1%). Centroid analysis indicated that Mathematics and Physical Education were the most discriminative subjects across groups. These findings highlight a substantial proportion of students requiring remedial intervention and support differentiated learning strategies. The contribution of this research lies in applying educational data mining techniques to an elementary school context in Jambi, integrating both quantitative indices and qualitative validation with teachers. The study demonstrates that clustering methods can enhance educational decision-making, providing a basis for adaptive teaching, targeted interventions, and resource allocation in elementary education.
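The Elbow Method mentioned in this abstract can be illustrated with a small pure-Python K-Means. The study used RapidMiner, so this is a stand-in rather than the authors' workflow. The two-subject `scores` are made up, and the deterministic initialization (evenly spaced picks from the sorted points) is a simplification chosen for reproducibility.

```python
def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid_of(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(column) / len(cluster) for column in zip(*cluster))

def kmeans(points, k, iterations=50):
    """Return (centroids, wcss) after Lloyd's algorithm."""
    ordered = sorted(points)
    step = len(ordered) / k
    # Deterministic start: one pick from the middle of each slice.
    centroids = [ordered[int(i * step + step / 2)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist2(p, centroids[c]))
            clusters[nearest].append(p)
        updated = [centroid_of(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    # Within-cluster sum of squares: the quantity plotted in an elbow chart.
    wcss = sum(min(dist2(p, c) for c in centroids) for p in points)
    return centroids, wcss

# Made-up (subject A, subject B) scores forming three loose groups.
scores = [(90, 88), (92, 91), (89, 90),
          (75, 74), (77, 76), (74, 75),
          (60, 58), (62, 61), (59, 60)]

wcss_by_k = {k: kmeans(scores, k)[1] for k in (1, 2, 3, 4)}
for k, value in wcss_by_k.items():
    print(f"k={k} wcss={value:.2f}")
```

The elbow is the value of `k` after which the drop in WCSS becomes small; for this toy data that happens at `k=3`.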
Word Embedding Features to Improve Machine Learning Performance in Sentiment Analysis of the Honor of Kings Game Harris, Abdul; Nugroho, Agus; Novianto, Yudi; Jasmir, Jasmir; Fatma, Dhea
Sistemasi: Jurnal Sistem Informasi Vol 15, No 2 (2026): Sistemasi: Jurnal Sistem Informasi
Publisher : Program Studi Sistem Informasi Fakultas Teknik dan Ilmu Komputer

DOI: 10.32520/stmsi.v15i2.5850

Abstract

The rapid growth of social media has encouraged an increasing number of studies on sentiment analysis to better understand public perceptions and opinions. This study aims to evaluate the performance of three machine learning algorithms—Naïve Bayes, K-Nearest Neighbor (KNN), and Random Forest—in classifying user review sentiments toward the game Honor of Kings. The dataset was collected from the Google Play Store, consisting of 900 reviews. The data then underwent preprocessing steps including cleaning, case folding, tokenization, stopword removal, stemming, and sentiment labeling into positive and negative classes. Furthermore, three word embedding techniques were applied, namely Word2Vec, GloVe, and FastText, each of which was tested across the three machine learning algorithms. The experimental results indicate that the use of word embedding features significantly improves classification accuracy compared to models without embedding features. KNN combined with FastText achieved the best performance, reaching an accuracy of 87.55%, while Random Forest combined with FastText produced the lowest accuracy. FastText demonstrated superior performance due to its ability to represent words through subword information, making it more effective in handling rare vocabulary and large-scale datasets. This study confirms that combining machine learning classification methods with word embedding features plays a crucial role in improving sentiment analysis performance. Future research may focus on hyperparameter optimization, the application of more advanced preprocessing techniques, and dataset expansion to develop more robust models with better generalization capability.
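The first preprocessing steps listed in this abstract (cleaning, case folding, tokenization, and stopword removal) can be sketched with the standard library alone. Stemming and sentiment labeling are left out because they depend on a language-specific stemmer and on the labeling scheme, neither of which the abstract specifies. The `STOPWORDS` set and the sample review are hypothetical.

```python
import re

# Hypothetical, deliberately tiny stopword list. A real pipeline would
# use a full list for the language of the reviews.
STOPWORDS = {"the", "is", "a", "and", "this", "it", "to"}

def preprocess(review):
    """Clean, case-fold, tokenize and filter a single review."""
    text = re.sub(r"https?://\S+", " ", review)   # cleaning: drop URLs
    text = re.sub(r"[^A-Za-z\s]", " ", text)      # cleaning: keep letters only
    text = text.lower()                           # case folding
    tokens = text.split()                         # whitespace tokenization
    return [t for t in tokens if t not in STOPWORDS]

print(preprocess("This game is GREAT!!! 10/10 https://example.com"))
```

The resulting token list is the kind of input that would then be mapped to Word2Vec, GloVe, or FastText vectors.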