NURADILLA, SITI
Unknown Affiliation

Published: 3 Documents
Articles

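Klasifikasi Halaman SEO Berbasis Machine Learning Melalui Mutual Information dan Random Forest Feature Importance (Machine Learning-Based SEO Page Classification Using Mutual Information and Random Forest Feature Importance)
NURADILLA, SITI; SADIK, KUSMAN; SUHAENI, CICI; SOLEH, AGUS M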
MIND (Multimedia Artificial Intelligent Networking Database) Journal Vol 10, No 1 (2025): MIND Journal
Publisher : Institut Teknologi Nasional Bandung

DOI: 10.26760/mindjournal.v10i1.114-129

Abstract

Search engine optimization (SEO) involves many interrelated factors, which makes it difficult for SEO teams to decide which pages need further improvement. This study develops a machine learning model that not only classifies web pages accurately but also efficiently selects the most informative features. Features are selected with Mutual Information (MI) and Random Forest Feature Importance (RFFI) to identify the factors most important for SEO, and the selected features are then modeled with Random Forest and a Weighted Voting Ensemble (WVE). The models are evaluated using Accuracy, Precision, Recall, and ROC AUC. The results show that the Random Forest model with 20 features selected via RFFI performs best, achieving a ROC AUC of 75.87%, Accuracy of 77.74%, Precision of 60.51%, and Recall of 71.29%. The model effectively distinguishes pages that require SEO optimization from those that do not.
Keywords: Feature Importance, Random Forest, SEO, Variable Selection, WVE
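Mutual Information scores a feature by how much knowing its value reduces uncertainty about the page label. The sketch below is minimal and rests on several assumptions. Features are taken to be discrete (already binned), and the label is binary ("needs SEO work" or not). The feature columns are hypothetical toy data, not the study's dataset. Only the MI filter step appears here; RFFI and the WVE model are not reproduced.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information I(X;Y) in nats between two discrete sequences.

    I(X;Y) = sum over (x, y) of p(x,y) * log(p(x,y) / (p(x) * p(y))),
    estimated here from empirical counts.
    """
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) / (p(x) p(y)) simplifies to c * n / (count(x) * count(y))
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def top_k_features(rows, labels, k):
    """Rank feature columns by MI with the label and return the top-k column indices."""
    scores = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        scores.append((mutual_information(column, labels), j))
    # Highest MI first; ties broken by column index for determinism
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores[:k]]


# Hypothetical toy features per page: (has_meta_description, same_value_everywhere, noise)
rows = [(1, 0, 0), (1, 0, 1), (1, 0, 0), (0, 0, 1), (0, 0, 0), (0, 0, 1)]
labels = [1, 1, 1, 0, 0, 0]  # 1 = page needs SEO work (toy labels)

print(top_k_features(rows, labels, 2))  # prints [0, 2]
```

The column that tracks the label perfectly scores ln 2, the constant column scores 0, and the noisy column scores slightly above 0. For continuous features, a library estimator such as scikit-learn's `mutual_info_classif` is the usual choice over this count-based version.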
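Pemodelan Topik pada Komentar YouTube Arra: Komparasi LDA dan K-Means Menggunakan Fitur Leksikal dan Semantik (Topic Modeling of Arra's YouTube Comments: A Comparison of LDA and K-Means Using Lexical and Semantic Features)
Nuradilla, Siti; Kamila, Sabrina Adnin; Zahra, Latifah; Suhaeni, Cici; Sartono, Bagus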
Jurnal Informatika: Jurnal Pengembangan IT Vol 10, No 3 (2025)
Publisher : Politeknik Harapan Bersama

DOI: 10.30591/jpit.v10i3.8763

Abstract

YouTube has become a platform for sharing content, including both positive material and stereotypes that often trigger debate. One noteworthy phenomenon is the videos of Arra, a toddler known for her remarkable communication skills. Her uniqueness has drawn significant attention and sparked debate about the apparent mismatch between her age and her cognitive development. The comments on Arra's videos reflect sharply differing perspectives among netizens, and this diversity makes manual analysis highly challenging. It is therefore important to examine the topics netizens discuss in order to understand the dominant issues emerging in these discussions; such an analysis can give the public insight and offer parents useful input on child-rearing practices. This study applies topic modeling to the comments using two primary approaches, Latent Dirichlet Allocation (LDA) and K-Means clustering. Its main objective is to evaluate how effectively each method, combined with different text representations, identifies the key topics, by comparing the coherence scores of the resulting models. Comments were collected by crawling, preprocessed, and represented using TF-IDF and GloVe embeddings; LDA and K-Means were then used to identify the dominant topics. LDA with TF-IDF achieved the highest coherence score (0.662), although its topics were still difficult to interpret because they overlapped. K-Means with GloVe 100D yielded a slightly lower coherence score (0.6538) but produced more interpretable topics. K-Means with GloVe 100D is therefore considered the more balanced approach in terms of both coherence and topic readability.
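The K-Means side of the comparison can be sketched as TF-IDF weighting followed by Lloyd's K-Means iterations. This minimal pure-Python sketch rests on loud assumptions:

- The tokenized comments are hypothetical toy data.
- TF uses raw counts with idf = log(N/df), one of several common variants.
- Centroids are initialized from the first k documents for determinism.

The study's GloVe representation, its LDA model, and its coherence scoring are not reproduced.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn tokenized documents into L2-normalized TF-IDF vectors over a shared vocabulary."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))  # document frequency
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = [tf[tok] * math.log(n_docs / df[tok]) for tok in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # avoid dividing by zero
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Lloyd's K-Means; returns the cluster index assigned to each vector."""
    centroids = [list(v) for v in vectors[:k]]  # deterministic init: first k documents
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each document joins its nearest centroid
        labels = [min(range(k), key=lambda c: squared_distance(v, centroids[c])) for v in vectors]
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical preprocessed comments (already tokenized and stopword-filtered)
comments = [
    ["smart", "child", "talk"],
    ["parent", "pressure", "stress"],
    ["smart", "child", "clever"],
    ["parent", "pressure", "worry"],
]
vocab, vectors = tfidf_vectors(comments)
print(kmeans(vectors, k=2))  # prints [0, 1, 0, 1]
```

With GloVe, each comment would instead be mapped to a dense vector before clustering. One example is the average of its word vectors. That aggregation is an assumption here, since the abstract does not say how word vectors were combined.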
Household Clustering in West Java Based on Stunting Risk Factors Using K-Modes and K-Prototypes Algorithms
Yusran, Muhammad; Nuradilla, Siti; Putri, Mega Ramatika; Fitrianto, Anwar; Yudhianto, Rachmat Bintang
Journal of Applied Informatics and Computing Vol. 9 No. 6 (2025): December 2025
Publisher : Politeknik Negeri Batam

DOI: 10.30871/jaic.v9i6.11508

Abstract

Stunting remains one of Indonesia’s most persistent public health challenges, with West Java contributing the highest number of cases due to its large population and regional disparities in household welfare. Identifying household groups vulnerable to stunting is essential for designing targeted interventions that integrate nutrition, sanitation, and socio-economic development. This study introduces a data-driven clustering framework using the K-Modes and K-Prototypes algorithms to classify 22,161 households in West Java based on 26 indicators from the March 2024 National Socioeconomic Survey (SUSENAS), covering food security, sanitation, drinking water access, economic conditions, social assistance, and demographics. K-Modes was applied to the categorical data, while K-Prototypes combined numerical and categorical variables; parameters were optimized using a grid search and the Elbow method. Clustering performance was evaluated with the Silhouette Score, the Calinski–Harabasz (CH) Index, and the Davies–Bouldin Index (DBI), followed by a bootstrapped stability analysis using the Adjusted Rand Index (ARI) and Normalized Mutual Information (NMI). K-Prototypes outperformed K-Modes, yielding a higher Silhouette Score (0.6681 vs. 0.2922), a higher CH Index (13,890.6 vs. 3,976.1), and a lower DBI (0.4607 vs. 1.5274), indicating better compactness and separation. Stability testing confirmed that the clustering is robust, with mean ARI = 0.959 and mean NMI = 0.932 across 50 bootstrap replications. The optimal five-cluster structure identified distinct socioeconomic groups, with the highest stunting risk among households with low income, limited housing space, inadequate sanitation, and more children under five.
The findings highlight the effectiveness of K-Prototypes in modeling mixed-type data and support the design of evidence-based, regionally adaptive stunting reduction strategies aligned with Presidential Regulation No. 72/2021 on the Acceleration of Stunting Reduction.
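K-Prototypes handles mixed data by adding two dissimilarities. The first is squared Euclidean distance on the numerical variables. The second is a weighted count of mismatches on the categorical variables, the simple-matching measure that K-Modes uses on its own. The sketch below covers only the assignment step and assumes Huang's formulation with a single weight gamma. The household variables, prototype values, and gamma values are hypothetical; they are not taken from SUSENAS or from the study's grid search.

```python
def matching_dissimilarity(a, b):
    """K-Modes dissimilarity: number of categorical attributes whose values differ."""
    return sum(x != y for x, y in zip(a, b))


def kprototypes_cost(numeric, categorical, prototype, gamma):
    """Squared Euclidean distance on numeric features plus gamma times categorical mismatches."""
    proto_numeric, proto_categorical = prototype
    numeric_part = sum((x - p) ** 2 for x, p in zip(numeric, proto_numeric))
    return numeric_part + gamma * matching_dissimilarity(categorical, proto_categorical)


def assign_clusters(households, prototypes, gamma):
    """Assign each (numeric, categorical) household record to its lowest-cost prototype."""
    return [
        min(range(len(prototypes)),
            key=lambda j: kprototypes_cost(num, cat, prototypes[j], gamma))
        for num, cat in households
    ]


# Hypothetical records: numeric = (monthly income in millions IDR, floor area per person in m^2),
# categorical = (sanitation, drinking water source)
prototypes = [
    ((1.0, 6.0), ("unimproved", "unprotected_well")),
    ((5.0, 20.0), ("improved", "piped")),
]
households = [
    ((1.2, 7.0), ("unimproved", "unprotected_well")),
    ((4.5, 18.0), ("improved", "piped")),
    ((3.0, 12.0), ("improved", "piped")),
]

print(assign_clusters(households, prototypes, gamma=20.0))  # prints [0, 1, 1]
print(assign_clusters(households, prototypes, gamma=5.0))   # prints [0, 1, 0]
```

The two calls show why the weight matters. With a small gamma, the numeric gap dominates and the third household joins the low-income prototype despite matching the other prototype's categories. In practice, numeric variables are often standardized before the squared distance is computed, so that no single variable dominates the cost.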