Articles

Found 4 Documents

A Hybrid Clustering Model Using K-means++ Initialization and the Grey Wolf Optimization Algorithm
Mukti, Bayu Priya
JUSTIN (Jurnal Sistem dan Teknologi Informasi) Vol 13, No 2 (2025)
Publisher : Jurusan Informatika Universitas Tanjungpura

DOI: 10.26418/justin.v13i2.88211

Abstract

This study develops GWO-KMeans++, a hybrid clustering model that integrates the Grey Wolf Optimizer (GWO) with K-Means++ to address the problem of local optima in centroid initialization. The model was tested on five UCI datasets (Seeds, Wine, Sonar, Bank, Forest) with diverse characteristics, ranging from low-dimensional agricultural data (6 features) to noisy sonar signals (60 features). Performance was measured with the Silhouette Score (SC) and the Davies-Bouldin Index (DB) for cluster counts k = 2–10 and compared against K-Means++ using the Wilcoxon signed-rank test. The results show that GWO-KMeans++ increases SC by 19.71–24.59% (Seeds, k = 5–7), 56.81% (Wine, k = 5), and 210.85% (Sonar, k = 2), and reduces DB by up to 22.19% (Seeds, k = 7) and 28.02% (Wine, k = 5). The statistical tests confirm a significant SC improvement on all datasets (p < 0.05), with p = 0.0039 (Seeds, Wine, Sonar), p = 0.0117 (Bank), and p = 0.0273 (Forest). However, the DB improvement is significant only on Seeds (p = 0.0117) and Wine (p = 0.0078). Cluster visualizations show better-separated data distributions and more accurate centroids, particularly on multidimensional (Wine) and noisy (Sonar) data. The model is stable for k = 3–6 and suits nonlinear data, with applications ranging from bioinformatics to financial fraud detection. Recommended further work includes tuning the GWO parameters, integrating dimensionality reduction (PCA), and testing on big-data datasets.
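A minimal sketch of the hybrid idea, not the authors' implementation. It assumes (none of this is stated in the abstract) that each wolf encodes a full set of k centroids as one flat vector, that the initial pack is seeded with K-Means++, and that fitness is the within-cluster sum of squares (SSE). The position update is the standard GWO alpha/beta/delta scheme. The names `gwo_kmeanspp`, `n_wolves`, and `n_iter` are hypothetical, and the sketch reports SSE rather than the Silhouette and Davies-Bouldin scores used in the paper.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sse(points, centroids):
    """Within-cluster sum of squares: each point is charged to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

def kmeanspp_init(points, k, rng):
    """K-Means++ seeding: each new centroid is drawn with probability proportional to D(x)^2."""
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        d2 = [min(sq_dist(p, c) for c in centroids) for p in points]
        r = rng.random() * sum(d2)
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc >= r:
                centroids.append(list(p))
                break
    return centroids

def gwo_kmeanspp(points, k, n_wolves=10, n_iter=50, seed=0):
    """Hypothetical hybrid: GWO search over centroid sets, with a K-Means++-seeded pack."""
    if n_wolves < 3:
        raise ValueError("GWO needs at least three wolves (alpha, beta, delta)")
    rng = random.Random(seed)
    dim = len(points[0])

    def unflatten(w):
        return [w[i * dim:(i + 1) * dim] for i in range(k)]

    def fitness(w):
        return sse(points, unflatten(w))

    # Each wolf encodes k centroids as one vector of length k * dim.
    wolves = [sum(kmeanspp_init(points, k, rng), []) for _ in range(n_wolves)]
    leaders = sorted(wolves, key=fitness)[:3]  # alpha, beta, delta

    for t in range(n_iter):
        a = 2.0 - 2.0 * t / n_iter  # decreases linearly from 2 towards 0
        new_wolves = []
        for w in wolves:
            new = []
            for j, x in enumerate(w):
                estimate = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    # X_leader - A * |C * X_leader - X|, averaged over the three leaders
                    estimate += leader[j] - A * abs(C * leader[j] - x)
                new.append(estimate / 3.0)
            new_wolves.append(new)
        wolves = new_wolves
        # Elitism: alpha, beta, delta are the three best positions seen so far.
        leaders = sorted(leaders + wolves, key=fitness)[:3]

    return unflatten(leaders[0])

if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    cents = gwo_kmeanspp(pts, k=2)
    print(cents, sse(pts, cents))
```

Because the leaders are kept elitist and the pack starts from K-Means++ seeds, the returned centroids are never worse (in SSE) than the best K-Means++ seeding; whether the paper adds a Lloyd refinement step afterwards is not stated in the abstract.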
Leveraging TF-IDF and Random Forest to Uncover Genre Patterns in Google Books Metadata
Putri, Nadya Awalia; Mukti, Bayu Priya
International Journal for Applied Information Management Vol. 5 No. 4 (2025): Regular Issue: December 2025
Publisher : Bright Institute

DOI: 10.47738/ijaim.v5i4.112

Abstract

This paper presents a machine learning approach for classifying books into genres based on their descriptions. Text descriptions were converted into numerical features with Term Frequency-Inverse Document Frequency (TF-IDF) and classified by a Random Forest into six genres: Fiction, Literary Criticism, Education, Social Science, Biography & Autobiography, and Unknown Genre. The model was trained and evaluated on a dataset sourced from Google Books, which was preprocessed by removing records with missing data and cleaning the descriptions of punctuation, numbers, and stopwords. Five-fold cross-validation yielded an average accuracy of 64.22%, and the final model achieved 62.71% accuracy on the test set, with the highest recall for the "Fiction" genre. The Random Forest classifier was most effective on well-represented genres such as "Fiction" and "Unknown Genre", whereas genres with fewer samples, such as "Social Science" and "Biography & Autobiography", performed poorly, reflecting the challenges of class imbalance and data sparsity. The confusion matrix and classification report confirmed these discrepancies, with some genres misclassified more often than others. This research demonstrates the feasibility of automated book genre classification, with potential to improve book recommendation systems and user experience. Its limitations, chiefly data sparsity and genre imbalance, indicate that the model needs further refinement; future research could explore deep learning techniques and a larger dataset to improve genre classification accuracy. Automated genre classification in real-world applications, such as book categorization and personalized recommendations, is a promising direction for the book industry.
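To make the feature step concrete, here is a stdlib-only sketch of the cleaning and TF-IDF stage. It assumes the smoothed IDF, idf(t) = ln((1 + n) / (1 + df(t))) + 1, and L2 row normalisation that scikit-learn's `TfidfVectorizer` uses by default; the paper's exact settings are not given. The stopword list and the function names `clean` and `tfidf` are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "to", "is", "for", "on", "with"}

def clean(text):
    """Lowercase, replace punctuation and digits with spaces, then drop stopwords."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in STOPWORDS]

def tfidf(docs):
    """Return (vocabulary, rows), each row an L2-normalised TF-IDF vector over the vocabulary.

    Uses raw term counts and the smoothed IDF ln((1 + n) / (1 + df)) + 1.
    """
    tokenised = [clean(d) for d in docs]
    n = len(tokenised)
    vocab = sorted({tok for toks in tokenised for tok in toks})
    df = Counter(tok for toks in tokenised for tok in set(toks))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenised:
        tf = Counter(toks)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing an empty row by zero
        rows.append([v / norm for v in vec])
    return vocab, rows
```

The resulting rows are what a classifier such as scikit-learn's `RandomForestClassifier` would be trained on; the forest itself is not reproduced here. Terms that appear in every description (like "wizard" in a two-document toy corpus) receive the minimum IDF of 1, so rarer, more genre-specific words dominate each vector.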
A Comparative Analysis of Machine Learning Classifier of Anemia Diagnosis Based on Complete Blood Count (CBC) Data
Putri, Nadia Awalia; Mukti, Bayu Priya
International Journal of Informatics and Information Systems Vol 8, No 4: Regular Issue: December 2025
Publisher : International Journal of Informatics and Information Systems

DOI: 10.47738/ijiis.v8i4.286

Abstract

Anemia is a prevalent hematological condition that requires accurate and timely diagnosis to ensure effective treatment. This study compares the performance of several machine learning algorithms (Random Forest, Support Vector Machine (SVM), Naive Bayes, and XGBoost) in classifying types of anemia from Complete Blood Count (CBC) data. The dataset includes three diagnostic categories: Healthy, Normocytic hypochromic anemia, and Normocytic normochromic anemia. After preprocessing and normalization, each model was evaluated using accuracy, precision, recall, F1-score, and ROC-AUC. XGBoost achieved the highest overall performance, with 99% accuracy and a perfect AUC of 1.00, followed closely by SVM and Naive Bayes, although Naive Bayes performed worse on individual classes, particularly in identifying normocytic normochromic anemia. These findings suggest that machine learning, especially ensemble-based models, has strong potential to support the clinical diagnosis of anemia from CBC data.
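The per-class metrics named above can be derived directly from true and predicted labels, which is how a weakness on one category (such as normocytic normochromic anemia) shows up even when overall accuracy is high. This is a generic sketch with a hypothetical function name, not the authors' evaluation code; it omits ROC-AUC, which requires predicted probabilities rather than hard labels.

```python
from collections import Counter

def classification_metrics(y_true, y_pred):
    """Return (accuracy, macro F1, per-class report) computed from raw label lists."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = Counter(zip(y_true, y_pred))  # confusion-matrix cells keyed by (true, predicted)
    report = {}
    for c in labels:
        tp = pairs[(c, c)]
        fp = sum(v for (t, p), v in pairs.items() if p == c and t != c)
        fn = sum(v for (t, p), v in pairs.items() if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[c] = {"precision": precision, "recall": recall, "f1": f1, "support": tp + fn}
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    macro_f1 = sum(r["f1"] for r in report.values()) / len(report)
    return accuracy, macro_f1, report
```

For example, calling it with labels such as `"Healthy"`, `"Normocytic hypochromic"`, and `"Normocytic normochromic"` yields one precision/recall/F1 entry per diagnostic category; a low recall on a single entry is the pattern the abstract reports for Naive Bayes.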
Weakly Supervised Sentiment Analysis of Indonesian Rural Tourism Reviews: A TF-IDF Baseline for Melung Tourism Village
Rifa’i, Zanuar; Mukti, Bayu Priya
Edu Komputika Journal Vol. 12 No. 1 (2025): Edu Komputika Journal
Publisher : Universitas Negeri Semarang

DOI: 10.15294/edukom.v12i1.31893

Abstract

This study investigates sentiment classification of Indonesian-language tourist reviews from the rural destination of Melung Tourism Village. A total of 724 user-generated reviews from 546 unique users are preprocessed using Indonesian-specific text cleaning, stopword filtering, and stemming, then weakly labeled through a stemmed positive–negative lexicon. TF-IDF unigram–bigram features are extracted from the preprocessed texts and used to train three classical classifiers: Naive Bayes, linear Support Vector Machine (SVM), and Logistic Regression. To address class imbalance, RandomOverSampler is applied only to the training data, and model evaluation combines stratified 5-fold cross-validation with a held-out test set, using weighted F1-score as the primary metric. Logistic Regression achieves the best performance on the test set (weighted F1 = 0.8799, accuracy = 0.8828), closely followed by SVM, while Naive Bayes lags behind. The results show that, even with a modest, weakly supervised dataset, a carefully designed classical pipeline can yield reliable sentiment indicators to support data-driven management of rural tourism destinations.
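Two steps in this pipeline are easy to get subtly wrong: lexicon-based weak labeling and oversampling restricted to the training split. The sketch below illustrates both under stated assumptions. The positive and negative word lists are hypothetical placeholders (the study's lexicon is not reproduced in the abstract), ties are assumed to yield no label, and `random_oversample` only approximates the default behaviour of imbalanced-learn's `RandomOverSampler` (duplicating minority rows until each class matches the majority count).

```python
import random
from collections import Counter

# Hypothetical stemmed Indonesian lexicons, for illustration only.
POSITIVE = {"indah", "sejuk", "bagus", "ramah", "bersih"}
NEGATIVE = {"kotor", "rusak", "mahal", "macet", "kecewa"}

def weak_label(tokens):
    """Label a stemmed token list by lexicon hit counts; return None on a tie (assumed policy)."""
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return None

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each minority class until it matches the majority count."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out
```

In use, the data would be split into training and test sets first, and `random_oversample` applied to the training portion only, so duplicated reviews never leak into the held-out evaluation.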