Articles

Word Embedding Feature for Improvement Machine Learning Performance in Sentiment Analysis Disney Plus Hotstar Comments
Jasmir, Jasmir; Nurhadi, Nurhadi; Rohaini, Eni; Pahlevi B, M Riza; Pardamean Simanjuntak, Daniel Sintong
Jurnal Ilmiah Teknik Elektro Komputer dan Informatika Vol. 10 No. 2 (2024): June
Publisher : Universitas Ahmad Dahlan

DOI: 10.26555/jiteki.v10i2.28799

Abstract

In this research, we apply several machine learning methods and word embedding features to process social media data, specifically comments on the Disney Plus Hotstar application. The word embedding features used are Word2Vec, GloVe, and FastText. Our aim is to evaluate the impact of these features on the classification performance of Naive Bayes (NB), K-Nearest Neighbor (KNN), and Random Forest (RF). NB is simple and efficient but very sensitive to feature selection. KNN has known weaknesses such as biased k values, overly complex computations, memory limitations, and ignoring irrelevant attributes. RF, in turn, has the weakness that its evaluation values can change significantly with only a slight change in the data. Feature selection in text classification is crucial for enhancing scalability, efficiency, and accuracy. Our testing results indicate that KNN achieved the highest accuracy both before and after feature selection. The FastText feature led to the highest performance for KNN, yielding balanced accuracy, precision, recall, and F1-score values.
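The pipeline this abstract describes (embed each comment, then classify with KNN) can be illustrated with a minimal sketch. Everything below is hypothetical: the two-dimensional `TOY_VECTORS`, the training comments, and the choice of averaging word vectors are assumptions made for illustration, not the authors' setup. Real vectors would come from a trained Word2Vec, GloVe, or FastText model.

```python
import math
from collections import Counter

# Hypothetical 2-d word vectors; real embeddings would come from a trained
# Word2Vec, GloVe, or FastText model and have many more dimensions.
TOY_VECTORS = {
    "great": (0.9, 0.1),
    "love": (0.8, 0.2),
    "good": (0.7, 0.3),
    "bad": (0.1, 0.9),
    "buffering": (0.2, 0.8),
    "crash": (0.0, 1.0),
}


def embed(comment):
    """Average the vectors of known words; unknown words are skipped."""
    vecs = [TOY_VECTORS[w] for w in comment.lower().split() if w in TOY_VECTORS]
    if not vecs:
        return (0.0, 0.0)
    return tuple(sum(dim) / len(vecs) for dim in zip(*vecs))


def knn_predict(train, comment, k=3):
    """Majority vote among the k nearest training comments (Euclidean distance)."""
    query = embed(comment)
    nearest = sorted(train, key=lambda item: math.dist(embed(item[0]), query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labelled comments, invented for this sketch.
TRAIN = [
    ("great app love it", "positive"),
    ("good shows", "positive"),
    ("love the good content", "positive"),
    ("bad buffering", "negative"),
    ("crash crash bad", "negative"),
    ("buffering and crash", "negative"),
]

print(knn_predict(TRAIN, "good app great"))
print(knn_predict(TRAIN, "bad crash"))
```

Averaging is only one way to turn word vectors into a comment vector; the abstract does not say which aggregation was used.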
Comparative Analysis of Optimizer Effectiveness in GRU and CNN-GRU Models for Airport Traffic Prediction
Riyadi, Willy; Jasmir, Jasmir
Jurnal Ilmiah Teknik Elektro Komputer dan Informatika Vol. 10 No. 3 (2024): September
Publisher : Universitas Ahmad Dahlan

DOI: 10.26555/jiteki.v10i3.29659

Abstract

The COVID-19 pandemic has posed significant challenges to airport traffic management, necessitating accurate predictive models. This research evaluates the effectiveness of various optimizers in enhancing airport traffic prediction using Deep Learning models, specifically Gated Recurrent Units (GRU) and Convolutional Neural Network-Gated Recurrent Units (CNN-GRU). We compare the performance of optimizers including RMSprop, Adam, Nadam, AdamW, Adamax, and Lion, and analyze the impact of their parameter tuning on model accuracy. Time series data from airports in the United States, Canada, Chile, and Australia were used, with preprocessing steps like filtering, cleaning, and applying a MinMax Scaler. The data was split into 80% for training and 20% for testing. Our findings reveal that the Adam optimizer paired with the GRU model achieved the lowest Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE) in the USA. The study underscores the importance of selecting and tuning optimizers, with ReduceLROnPlateau used to adjust the learning rate dynamically, preventing overfitting and improving model convergence. However, limitations include dataset imbalance and region-specific results, which may affect the generalizability of the findings. Future research should address these limitations by developing balanced datasets and exploring optimizer performance across a broader range of regions and conditions. This study lays the groundwork for further investigating sustainable and accurate airport traffic prediction models.
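The preprocessing and evaluation steps named in this abstract (MinMax scaling, an 80/20 split, MAE, and MAPE) can be sketched in plain Python. This is an illustration under assumptions: the function names are hypothetical, and the split is taken to be chronological, which the abstract does not state.

```python
def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def chronological_split(series, train_fraction=0.8):
    """Keep the earliest observations for training (assumed; not stated in the abstract)."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent; actual values must be non-zero."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical traffic series, invented for this sketch.
series = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
train, test = chronological_split(series)
print(len(train), len(test))
print(min_max_scale([10, 20, 30]))
print(mae([100, 200], [110, 180]), mape([100, 200], [110, 180]))
```

In practice the scaler's minimum and maximum are usually taken from the training portion only, so that no information from the test period leaks into training.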
PATTERN CLASSIFICATION SIGN LANGUAGE USING FEATURES DESCRIPTORS AND MACHINE LEARNING
Nurhadi, Nurhadi; Winanto, Eko Arip; Said, Rahaini Mohd; Jasmir, Jasmir; Afuan, Lasmedi
Jurnal Teknik Informatika (Jutif) Vol. 5 No. 2 (2024): JUTIF Volume 5, Number 2, April 2024
Publisher : Informatika, Universitas Jenderal Soedirman

DOI: 10.52436/1.jutif.2024.5.2.1228

Abstract

Sign language is a way of communication for the deaf and speech impaired. In Indonesia, the utilization of a standardized language involves the incorporation of American Sign Language (ASL). ASL is employed for various communication needs, ranging from basic alphanumeric fingerspelling (A-Z and numbers) to the more complex SIBI form (comprising gesture vocabulary) in everyday interactions as well as formal contexts. This surge in the digitization of sign language underscores the ongoing advancements in research and development. The challenge in this research lies in the ability to recognize American Sign Language (ASL) with diverse intensities and invariant backgrounds. Therefore, the study's emphasis is on comparing segmentation methods to propose a suitable one for multi-intensity ASL cases. Subsequently, global feature descriptor methods, including Color Histogram, Hu Moments, and Haralick Texture techniques, are applied for feature extraction. The Logistic Regression method is then compared with the supervised Random Forest method for accuracy and suitability in identifying ASL fingerspelling. The findings show that logistic regression has a predictive value of 48%, with class Y having the highest precision (0.86), class V the lowest accuracy (0.16), and class L the highest recall (0.73). Random Forest has a predictive value of 86%: the maximum precision is 1.00 in classes B, F, H, I, K, Y, and Z and the lowest is 0.58 in class U, while the highest recall is 1.00 in class G and the lowest is in class V. Class H has the greatest F1 score (0.99), while class U has the lowest (0.64). According to the comparison, Random Forest performs better than Logistic Regression.
Comparison and Data Visualization in Thyroid Cancer Disease Prediction Using Machine Learning Algorithms
Yudha, M. Zahran; Jasmir, Jasmir; Fachruddin, Fachruddin
MALCOM: Indonesian Journal of Machine Learning and Computer Science Vol. 6 No. 1 (2026): MALCOM January 2026
Publisher : Institut Riset dan Publikasi Indonesia

DOI: 10.57152/malcom.v6i1.2249

Abstract

Thyroid cancer is a common endocrine malignancy requiring accurate early prediction for improved patient outcomes. Comprehensive comparative studies of machine learning algorithms, accompanied by systematic visualization, remain limited. This study compares tree-based algorithms (Decision Trees, Random Forest) and boosting algorithms (Gradient Boosting, XGBoost) for thyroid cancer prediction and develops visualization strategies for clinical interpretation. Four algorithms were evaluated using accuracy (correct prediction proportion), precision (positive predictive value), recall (true positive rate), F1-score (harmonic mean of precision and recall), and AUC-ROC (area under the ROC curve). Visualization techniques, including confusion matrices, ROC curves, and feature importance plots, facilitated the interpretation of the model. XGBoost achieved superior performance with accuracy 95.2%, precision 94.8%, recall 95.6%, F1-score 95.2%, and AUC-ROC 0.978, followed by Random Forest (93.5%, 92.7%, 94.1%, 93.4%, 0.965), Gradient Boosting (91.8%, 90.9%, 92.4%, 91.6%, 0.952), and Decision Trees (87.3%, 86.5%, 88.2%, 87.3%, 0.913). Feature importance analysis identified key predictors. Boosting algorithms, particularly XGBoost, demonstrate superior thyroid cancer prediction across all metrics. Integrated visualization enhances clinical interpretability, providing empirical guidance for implementing machine learning-based diagnostic support systems.
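The four threshold-based metrics defined in this abstract follow directly from a binary confusion matrix. The sketch below is illustrative only: `binary_metrics` is a hypothetical helper, and the labels are invented rather than taken from the thyroid dataset.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for labels coded as 1 (positive) and 0 (negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # positive predictive value
    recall = tp / (tp + fn) if tp + fn else 0.0     # true positive rate
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0  # harmonic mean
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Invented labels: 4 positive cases, 6 negative cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(binary_metrics(y_true, y_pred))
```

AUC-ROC is not covered here because it needs predicted scores rather than hard labels.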
Optimasi XGBoost Dengan SHAP Untuk Sistem Skrining Penyakit Jantung [XGBoost Optimization with SHAP for a Heart Disease Screening System]
Clara Zuliani Syahputri; Jasmir Jasmir; Fachruddin Fachruddin
Prosiding Seminar Nasional Ilmu Teknik Vol. 2 No. 2 (2025): Desember: Prosiding Seminar Nasional Ilmu Teknik
Publisher : Asosiasi Riset Ilmu Teknik Indonesia

DOI: 10.61132/prosemnasproit.v2i2.147

Abstract

Heart disease is the leading cause of death in Indonesia and globally, necessitating an early screening system that is both accurate and clinically trustworthy. Although XGBoost demonstrates high predictive performance, its black-box nature undermines clinical trust, while low recall risks missed diagnoses, an unacceptable consequence in population screening, especially in middle-income countries with limited healthcare resources. This study aims to develop a sensitive, transparent, and implementation-ready heart disease screening framework through the integration of SHAP-based Explainable AI. The CDC's Indicators of Heart Disease dataset (319,795 samples) was processed according to WHO/CDC standards, followed by class imbalance handling, hyperparameter optimization using RandomizedSearchCV, evaluation based on metrics sensitive to minority classes (AUC, recall, F1-score, AUC-PR), and threshold tuning to maximize recall. The baseline model showed a very low recall of 12.18%. After optimization and threshold tuning at 0.10, the model achieved a recall of 96.79% with a G-mean of 0.7477, supported by SHAP interpretation stability and the ability to capture non-linear interactions between advanced age (AgeCategory_WHO) and poor general health (GenHealth). SHAP analysis confirmed the alignment of dominant features with medical evidence, and its visualizations provide transparent explanations for healthcare professionals, indicating its potential implementation as an interpretable clinical decision support system.
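The effect of lowering the decision threshold, as described in this abstract, can be shown on a toy example. The scores and labels below are invented, and `recall_specificity` and `g_mean` are hypothetical helpers; the G-mean is computed with its usual definition, the square root of recall times specificity.

```python
import math


def recall_specificity(y_true, scores, threshold):
    """Recall and specificity when a score >= threshold is predicted positive."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def g_mean(recall, specificity):
    """Geometric mean of recall (sensitivity) and specificity."""
    return math.sqrt(recall * specificity)


# Invented predicted probabilities for 4 diseased and 6 healthy cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.70, 0.30, 0.15, 0.12, 0.40, 0.11, 0.08, 0.05, 0.03, 0.02]

for threshold in (0.50, 0.10):
    rec, spec = recall_specificity(y_true, scores, threshold)
    print(threshold, rec, spec, g_mean(rec, spec))
```

Lowering the threshold raises recall at the cost of specificity, which is why a balance measure such as the G-mean is reported alongside recall.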
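Perancangan Alat Deteksi Tingkat Kematangan Buah Mangga Indramayu Berdasarkan Kandungan Gas dan Pengolahan Citra Menggunakan YOLOv11 [Design of a Ripeness Detection Device for Indramayu Mangoes Based on Gas Content and Image Processing Using YOLOv11]
Adi Kusuma; Jasmir Jasmir; Willy Riyadi; Ahmad Ahmad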
Prosiding Seminar Nasional Ilmu Teknik Vol. 2 No. 2 (2025): Desember: Prosiding Seminar Nasional Ilmu Teknik
Publisher : Asosiasi Riset Ilmu Teknik Indonesia

DOI: 10.61132/prosemnasproit.v2i2.151

Abstract

Indramayu mango is a seasonal fruit that is highly favored due to its delicious taste and high nutritional content. However, high mango production is often not supported by adequate post-harvest facilities, particularly in terms of fruit ripeness classification. Currently, mango ripeness classification is still performed manually, which tends to be subjective and inconsistent. To address this issue, this study proposes a ripeness detection system for Indramayu mangoes by integrating the TGS2602 gas sensor and the YOLOv11 algorithm based on image processing. The TGS2602 sensor is used to detect ethylene gas emitted by ripe mangoes, while YOLOv11 is employed for visual image analysis of the fruit. This study aims to evaluate the system’s performance in classifying ripe and unripe mangoes, as well as analyze the integration between the gas sensor and the object detection model. The test results show that the TGS2602 sensor can detect increased ethylene gas concentration in ripe mangoes, while YOLOv11 demonstrates high accuracy in detecting mangoes based on visual images, with precision and recall close to 1.0. The system was also tested under various lighting conditions, including dark environments, and still performed well, although with a slight decrease in accuracy under low-light conditions.
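Evolusi Performa Arsitektur Deep Learning melalui Optimasi Bertahap dan Interpretabilitas Grad-CAM untuk Klasifikasi Penyakit Ikan Air Tawar [Performance Evolution of Deep Learning Architectures through Stepwise Optimization and Grad-CAM Interpretability for Freshwater Fish Disease Classification]
Sasa Kirana Wulandari; Fachruddin Fachruddin; Jasmir Jasmir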
Prosiding Seminar Nasional Ilmu Teknik Vol. 2 No. 2 (2025): Desember: Prosiding Seminar Nasional Ilmu Teknik
Publisher : Asosiasi Riset Ilmu Teknik Indonesia

DOI: 10.61132/prosemnasproit.v2i2.179

Abstract

Freshwater fish diseases significantly affect aquaculture productivity and economic sustainability, while accurate visual classification remains challenging due to interclass similarity and image variability. This study presents a comparative evaluation of three deep learning architectures—DenseNet201, ResNet50, and EfficientNetV2-S—using a stepwise optimization strategy combined with Gradient-weighted Class Activation Mapping (Grad-CAM) for freshwater fish disease classification. Models were trained through three phases: baseline, optimized, and fine-tuned. Performance was evaluated using accuracy, precision, recall, F1 score, Matthews correlation coefficient (MCC), Cohen’s kappa, and per-class ROC–AUC. Results show consistent performance improvement across all architectures, with EfficientNetV2-S achieving the highest accuracy (97.14%), followed by ResNet50 (96.11%) and DenseNet201 (94.40%). High ROC–AUC values (>0.98) indicate strong discriminative capability. Grad-CAM analysis confirms that all optimized models focus on biologically relevant lesion regions, enhancing model transparency and reliability.
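Two of the less common metrics listed in this abstract, the Matthews correlation coefficient (MCC) and Cohen's kappa, can be computed from confusion-matrix counts. The sketch below covers only the binary case as a simplification (the study itself is multi-class), and the counts are invented for illustration.

```python
import math


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient for a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (observed - expected) / (1 - expected) if expected != 1 else 0.0


# Invented counts: 50 diseased and 50 healthy images.
print(mcc(tp=40, fp=5, fn=10, tn=45))
print(cohen_kappa(tp=40, fp=5, fn=10, tn=45))
```

Both measures account for all four cells of the confusion matrix, so they are less easily inflated by a dominant class than plain accuracy.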
Word Embedding Features to Improve Machine Learning Performance in Sentiment Analysis of the Honor of Kings Game
Harris, Abdul; Nugroho, Agus; Novianto, Yudi; Jasmir, Jasmir; Fatma, Dhea
Sistemasi: Jurnal Sistem Informasi Vol 15, No 2 (2026): Sistemasi: Jurnal Sistem Informasi
Publisher : Program Studi Sistem Informasi Fakultas Teknik dan Ilmu Komputer

DOI: 10.32520/stmsi.v15i2.5850

Abstract

The rapid growth of social media has encouraged an increasing number of studies on sentiment analysis to better understand public perceptions and opinions. This study aims to evaluate the performance of three machine learning algorithms—Naïve Bayes, K-Nearest Neighbor (KNN), and Random Forest—in classifying user review sentiments toward the game Honor of Kings. The dataset was collected from the Google Play Store, consisting of 900 reviews. The data then underwent preprocessing steps including cleaning, case folding, tokenization, stopword removal, stemming, and sentiment labeling into positive and negative classes. Furthermore, three word embedding techniques were applied, namely Word2Vec, GloVe, and FastText, each of which was tested across the three machine learning algorithms. The experimental results indicate that the use of word embedding features significantly improves classification accuracy compared to models without embedding features. KNN combined with FastText achieved the best performance, reaching an accuracy of 87.55%, while Random Forest combined with FastText produced the lowest accuracy. FastText demonstrated superior performance due to its ability to represent words through subword information, making it more effective in handling rare vocabulary and large-scale datasets. This study confirms that combining machine learning classification methods with word embedding features plays a crucial role in improving sentiment analysis performance. Future research may focus on hyperparameter optimization, the application of more advanced preprocessing techniques, and dataset expansion to develop more robust models with better generalization capability.
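The subword information credited to FastText in this abstract refers to character n-grams taken from each word after wrapping it in boundary markers `<` and `>`. The sketch below illustrates that idea; `char_ngrams` and `shared_ngrams` are hypothetical helpers, not FastText's API. The default range of 3 to 6 characters matches FastText's documented defaults, and the sketch omits the whole-word token that FastText also keeps.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word wrapped in '<' and '>' boundary markers."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


def shared_ngrams(word_a, word_b, n_min=3, n_max=6):
    """N-grams two words have in common, which is how a rare form borrows signal."""
    return set(char_ngrams(word_a, n_min, n_max)) & set(char_ngrams(word_b, n_min, n_max))


print(char_ngrams("where", 3, 3))
print(sorted(shared_ngrams("laggy", "lagging", 3, 3)))
```

Because a word's vector is built from the vectors of its n-grams, a rare or misspelled review word still receives a usable representation when it shares n-grams with words seen in training.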
Fitur Information Gain untuk Meningkatkan Nilai Performa Pengklasifikasi Machine Learning pada Analisis Sentimen Komentar Spam Pengguna Youtube [Information Gain Features to Improve Machine Learning Classifier Performance in Sentiment Analysis of YouTube User Spam Comments]
Jasmir, Jasmir; Gunardi, Gunardi; Rohaini, Eni; Naibaho, Ronald; Sukoco, Bambang
Jurnal Teknologi Informasi dan Ilmu Komputer Vol 13 No 2: April 2026
Publisher : Fakultas Ilmu Komputer, Universitas Brawijaya

DOI: 10.25126/jtiik.132

Abstract

The rapid growth of social media has given individuals the opportunity to freely express their opinions, whether positive or negative, toward the content they encounter. The increasing ease of sharing opinions online has resulted in a massive volume of user reviews. However, such a large number of reviews is difficult to analyze manually and may introduce bias in interpretation. To address this issue, sentiment classification is applied to automatically categorize user opinions into positive or negative classes. In this study, three machine learning algorithms were employed: Naïve Bayes (NB), K-Nearest Neighbor (KNN), and Random Forest (RF). The dataset was obtained from the public UCI Machine Learning repository. The main objective of this research is to improve classification performance by utilizing feature selection through the information gain method. Experimental results demonstrate that applying information gain consistently enhances the performance of all evaluated algorithms across multiple metrics, including accuracy, precision, recall, and F1-score. Without feature selection, Naïve Bayes achieved the highest accuracy of 74.33%. However, after applying information gain, KNN outperformed the other algorithms by reaching an accuracy of 81.28% and exhibited balanced results across all evaluation metrics. Random Forest also showed improvement but did not surpass the performance of KNN. Overall, these findings highlight the importance of feature selection in improving both the efficiency and effectiveness of sentiment classification. Furthermore, the use of information gain proves to be a promising approach for large-scale opinion analysis, particularly in handling the high dimensionality of textual data.
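Information gain, the feature selection criterion used in this study, scores a term by how much knowing whether it is present reduces the entropy of the class labels. The sketch below assumes binary presence/absence features and uses invented comments; `entropy` and `information_gain` are hypothetical helpers, not the authors' implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(docs, labels, term):
    """Entropy reduction from splitting documents on presence of `term`."""
    present = [lab for doc, lab in zip(docs, labels) if term in doc.split()]
    absent = [lab for doc, lab in zip(docs, labels) if term not in doc.split()]
    n = len(labels)
    conditional = sum(len(part) / n * entropy(part) for part in (present, absent) if part)
    return entropy(labels) - conditional


# Invented comments for illustration only.
docs = ["free prize click", "click free link", "nice video", "great song"]
labels = ["spam", "spam", "ham", "ham"]

for term in ("free", "video"):
    print(term, information_gain(docs, labels, term))
```

Ranking terms by this score and keeping only the highest-scoring ones is what reduces the dimensionality of the text features before classification.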
An Adaptive Feature-Aware Hybrid Resampling Strategy for Imbalanced Diabetes Classification with Integrated Balanced Index Evaluation
Jasmir, Jasmir; Pahlevi, Riza; Gunardi, Gunardi; Rohaini, Eni; Annisa, Tiko Nur
Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) Vol 10 No 2 (2026): April 2026
Publisher : Ikatan Ahli Informatika Indonesia (IAII)

DOI: 10.29207/resti.v10i2.7418

Abstract

Class imbalance remains a critical challenge in medical data classification, particularly in diabetes prediction, as it significantly degrades minority-class sensitivity. This study proposes an Adaptive Feature-Aware Hybrid Resampling Strategy (AHRS) that dynamically integrates oversampling and undersampling based on Imbalance Ratio (IR) and Feature Importance (FI). Unlike conventional static resampling methods, AHRS iteratively adjusts class distribution while preserving informative feature structures. In addition, this study introduces the Integrated Balanced Index (IBI), a bounded composite metric integrating precision, recall, and specificity to provide a fairer evaluation of classification performance on imbalanced medical datasets. The proposed approach was evaluated using the Pima Indian Diabetes Dataset (768 instances) with K-Nearest Neighbor, Naïve Bayes, and Random Forest classifiers under 5-fold stratified cross-validation. Experimental results demonstrate that AHRS consistently outperforms SMOTE, Random Oversampling, and Tomek Links, achieving accuracy improvements of 5–7% and recall gains of up to 10%. Random Forest combined with AHRS achieved the highest IBI score of 0.90, indicating strong balance between sensitivity and specificity. The findings suggest that adaptive, feature-aware resampling combined with balanced evaluation metrics provides a reliable and interpretable framework for fair medical classification systems and Clinical Decision Support Systems (CDSS).
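The abstract does not give the AHRS procedure or the IBI formula, so neither is reproduced here. As context only, the sketch below shows the Imbalance Ratio (majority count divided by minority count) and static Random Oversampling, one of the baselines AHRS is compared against. The function names and toy data are hypothetical.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Size of the largest class divided by the size of the smallest class."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_oversample(rows, labels, seed=0):
    """Static baseline: duplicate random minority rows until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [r for r, lab in zip(rows, labels) if lab == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Invented feature rows: 6 negative cases and 2 positive cases.
rows = [[1.0], [1.1], [0.9], [1.2], [0.8], [1.0], [3.0], [3.2]]
labels = [0, 0, 0, 0, 0, 0, 1, 1]

print(imbalance_ratio(labels))
new_rows, new_labels = random_oversample(rows, labels)
print(imbalance_ratio(new_labels), len(new_rows))
```

Under cross-validation, resampling is normally applied to the training folds only, so that duplicated rows never appear in the fold used for evaluation.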