NURADILLA, SITI
Unknown Affiliation

Published: 2 Documents
Articles

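Klasifikasi Halaman SEO Berbasis Machine Learning Melalui Mutual Information dan Random Forest Feature Importance [Machine Learning-Based SEO Page Classification Using Mutual Information and Random Forest Feature Importance]
NURADILLA, SITI; SADIK, KUSMAN; SUHAENI, CICI; SOLEH, AGUS M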
MIND (Multimedia Artificial Intelligent Networking Database) Journal Vol 10, No 1 (2025): MIND Journal
Publisher : Institut Teknologi Nasional Bandung

DOI: 10.26760/mindjournal.v10i1.114-129

Abstract

The SEO optimization process involves many interrelated factors, which makes it difficult for SEO teams to identify which pages need further improvement. This study develops a machine learning-based model that is both accurate in classifying pages and efficient in selecting the most informative features. Feature selection is performed using Mutual Information (MI) and Random Forest Feature Importance (RFFI) to identify the factors most important for SEO optimization, followed by modeling with Random Forest and Weighted Voting Ensemble (WVE). The models are evaluated using Accuracy, Precision, Recall, and ROC AUC. The Random Forest model with 20 features selected via RFFI performs best, achieving a ROC AUC of 75.87%, Accuracy of 77.74%, Precision of 60.51%, and Recall of 71.29%. The model effectively distinguishes between pages that require SEO optimization and those that do not.

Keywords: Feature Importance, Random Forest, SEO, Variable Selection, WVE
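
The MI-based ranking step and three of the reported metrics can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the feature names (`missing_meta`, `slow_page`, `noise`), the toy labels, and the predictions are hypothetical, the features are assumed to be discrete, and Random Forest, RFFI, WVE, and ROC AUC are not reproduced here.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """Mutual information (in nats) between two equally long discrete sequences."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    feature_counts = Counter(feature)
    label_counts = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        p_x = feature_counts[x] / n
        p_y = label_counts[y] / n
        mi += p_xy * math.log(p_xy / (p_x * p_y))
    return mi


def top_k_features(columns, labels, k):
    """Rank feature names by mutual information with the label; keep the top k."""
    scores = {name: mutual_information(values, labels) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for binary labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical toy data: 1 = page needs SEO optimization, 0 = it does not.
needs_optimization = [1, 1, 1, 1, 0, 0, 0, 0]
pages = {
    "missing_meta": [1, 1, 1, 1, 0, 0, 0, 0],  # perfectly aligned with the label
    "slow_page": [1, 1, 1, 0, 0, 0, 0, 1],     # partly informative
    "noise": [1, 0, 1, 0, 1, 0, 1, 0],         # independent of the label
}

ranked = top_k_features(pages, needs_optimization, k=2)
predictions = [1, 1, 1, 0, 1, 0, 0, 0]  # hypothetical classifier output
metrics = classification_metrics(needs_optimization, predictions)
print(ranked)
print(metrics)
```

In practice a library such as scikit-learn would likely handle both the selection and the modeling; the sketch only shows what the ranking and the metrics measure.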
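
Pemodelan Topik pada Komentar YouTube Arra: Komparasi LDA dan K-Means Menggunakan Fitur Leksikal dan Semantik [Topic Modeling of YouTube Comments on Arra: A Comparison of LDA and K-Means Using Lexical and Semantic Features]
Nuradilla, Siti; Kamila, Sabrina Adnin; Zahra, Latifah; Suhaeni, Cici; Sartono, Bagus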
Jurnal Informatika: Jurnal Pengembangan IT Vol 10, No 3 (2025)
Publisher : Politeknik Harapan Bersama

DOI: 10.30591/jpit.v10i3.8763

Abstract

YouTube has become a platform for sharing content of all kinds, including positive material and stereotypes that often trigger debate. One notable case is the videos of Arra, a toddler known for her remarkable communication skills, which have drawn significant attention and sparked debate about the mismatch between her age and her cognitive development. The diverse comments on Arra’s videos reflect sharply differing perspectives among netizens, which makes manual analysis highly challenging. Examining the topics netizens discuss helps reveal the dominant issues in these discussions: the public can gain insight, and parents may receive useful input on child-rearing practices. This study applies topic modeling to the comments using two approaches, Latent Dirichlet Allocation (LDA) and K-Means clustering, and compares the coherence of the resulting models to assess how effectively each method, combined with different text representations, identifies key topics. The workflow consists of comment crawling, text preprocessing, and text representation using TF-IDF and GloVe embeddings, after which LDA and K-Means are used to identify the dominant topics. LDA with TF-IDF achieved the highest coherence score, 0.662, although its topics remained difficult to interpret because they overlapped. K-Means with GloVe 100D yielded a slightly lower coherence score of 0.6538 but produced more interpretable topics. K-Means with GloVe 100D is therefore considered the more balanced approach in terms of both coherence and topic readability.
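
One of the combinations described above, TF-IDF representation followed by K-Means clustering, can be sketched with the standard library alone. This is an illustrative sketch under stated assumptions, not the study's pipeline: the four English comments are invented, the smoothed IDF formula and the fixed initial centroids (`init_indices`) are simplifications chosen to keep the example deterministic, and LDA, GloVe embeddings, and coherence scoring are not shown.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalised TF-IDF vectors) for tokenised documents."""
    n_docs = len(docs)
    vocab = sorted({tok for doc in docs for tok in doc})
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = max(len(doc), 1)
        # Smoothed IDF (+1) keeps terms that occur in every document non-zero.
        vec = [
            counts[t] / length * (math.log(n_docs / doc_freq[t]) + 1.0)
            for t in vocab
        ]
        norm = math.sqrt(sum(w * w for w in vec)) or 1.0
        vectors.append([w / norm for w in vec])
    return vocab, vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, init_indices, max_iter=20):
    """Plain K-Means; the initial centroids are the vectors at init_indices."""
    centroids = [list(vectors[i]) for i in init_indices]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical, already preprocessed comments (lowercased and tokenised).
comments = [
    "smart kid talks so fluently".split(),
    "kid talks fluently very smart".split(),
    "parents should let children play".split(),
    "children need play parents relax".split(),
]

vocab, vectors = tfidf_vectors(comments)
labels = kmeans(vectors, init_indices=[0, 2])
print(labels)
```

Fixing the initial centroids is only a device for reproducibility here; a real run would typically use a seeded random or k-means++ initialisation and many more documents.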