Journal : Jurnal Algoritma

Tuberculosis (TB) Classification with the Random Forest Method Using the ADASYN-Tomek Links Re-Sampling Technique
Nurhaliza, Nabillah; Sabrina, Puspita Nurul; Ashaury, Herdi
Jurnal Algoritma Vol 22 No 2 (2025): Jurnal Algoritma
Publisher : Institut Teknologi Garut

DOI: 10.33364/algoritma/v.22-2.2458

Abstract

Data imbalance is a common challenge in medical classification, including in the diagnosis of Tuberculosis (TB), where the number of positive cases is significantly lower than that of negative cases. This condition can reduce model performance, particularly in detecting the minority class. This study aims to evaluate the performance of the Random Forest method in classifying imbalanced TB data by applying a combination of the ADASYN and Tomek Links re-sampling techniques. The dataset used was obtained from the Cisarua Public Health Center (Puskesmas), Bogor, consisting of 1,069 patient records with 15 features and one target label. The research process included data preprocessing, one-hot encoding, data splitting, the use of ADASYN to generate synthetic samples for the minority class, and the application of Tomek Links to remove ambiguous data in overlapping class regions. The evaluation employed accuracy, precision, recall, and F1-score metrics using both hold-out and k-fold cross-validation schemes. The results show that the combination of ADASYN and Tomek Links improved the F1-score for the positive class from 0.67 to 0.71 in the hold-out evaluation, and reached 0.9129 in the cross-validation evaluation. These findings indicate that the proposed approach is effective in addressing data imbalance and has the potential to be integrated into clinical decision-support systems in community health centers (Puskesmas) to aid in early detection of TB cases.
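
The Tomek Links cleaning step mentioned in the abstract can be illustrated with a minimal sketch. A Tomek link is a pair of samples from different classes that are each other's nearest neighbour. The function names, the toy data, and the choice to drop only the majority-class member of each link are assumptions made for illustration; the abstract does not say which members the authors removed, and this is not their implementation.

```python
import math


def tomek_links(points, labels):
    """Return index pairs (i, j), i < j, that form Tomek links."""

    def nearest(i):
        # Index of the closest other point by Euclidean distance.
        return min(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )

    links = []
    for i in range(len(points)):
        j = nearest(i)
        # Tomek link: mutual nearest neighbours with different labels.
        if i < j and labels[i] != labels[j] and nearest(j) == i:
            links.append((i, j))
    return links


def remove_majority_in_links(points, labels, majority_label):
    """Drop the majority-class member of every Tomek link (an assumed policy)."""
    drop = {
        k
        for pair in tomek_links(points, labels)
        for k in pair
        if labels[k] == majority_label
    }
    kept = [k for k in range(len(points)) if k not in drop]
    return [points[k] for k in kept], [labels[k] for k in kept]


# Hypothetical toy data: label 0 = TB-negative, label 1 = TB-positive.
toy_points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.05, 1.0), (3.0, 3.0), (3.2, 3.0)]
toy_labels = [0, 0, 0, 1, 1, 1]

clean_points, clean_labels = remove_majority_in_links(toy_points, toy_labels, majority_label=0)
print(tomek_links(toy_points, toy_labels))
print(clean_labels)
```

In this toy set only the points at (1.0, 1.0) and (1.05, 1.0) sit in the overlapping region, so they are the only pair flagged. In a pipeline like the one described, this cleaning would run after ADASYN has added synthetic minority samples.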

Film Revenue Prediction Using Gradient Boosting
Rahmah, Revina Nur; Sabrina, Puspita Nurul; Ramadhan, Edvin
Jurnal Algoritma Vol 22 No 2 (2025): Jurnal Algoritma
Publisher : Institut Teknologi Garut

DOI: 10.33364/algoritma/v.22-2.2613

Abstract

The film industry is highly competitive and high-risk, so the ability to predict revenue before release is crucial for producers, distributors, and investors. This study develops a film revenue prediction model using the Gradient Boosting algorithm and the Equal Frequency Binning (EFB) discretization method on the Earnings attribute. The dataset covers films from 1930–2016 with features such as genre, budget, box office, actors, and director. The process includes data pre-processing, discretization of Earnings into three classes (Low, Medium, High), data splitting with the holdout method (80% training, 20% testing), and model training and evaluation. The results show an accuracy of 96.51% with high precision, recall, and F1-score across all classes, owing to the effectiveness of EFB in balancing the class distribution and the strength of Gradient Boosting in capturing feature interactions. The model proved accurate and can serve as a reference for pre-production investment decisions. Further research is recommended to broaden the data coverage and to consider additional features such as social media sentiment and promotional strategy in order to improve model generalization.
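
Equal-frequency binning of the Earnings attribute into three classes can be sketched as follows. The function name and the earnings figures are hypothetical, and the rank-based cut is only one way to realise the technique; it is not taken from the paper.

```python
def equal_frequency_bins(values, class_names=("Low", "Medium", "High")):
    """Assign each value a class so that the classes hold (nearly) equal counts."""
    n, k = len(values), len(class_names)
    # Rank the positions by value; tied values keep their original order,
    # so ties that straddle a cut point can land in different classes.
    order = sorted(range(n), key=lambda i: values[i])
    binned = [None] * n
    for rank, i in enumerate(order):
        # Integer arithmetic maps ranks 0..n-1 onto bin indices 0..k-1.
        binned[i] = class_names[rank * k // n]
    return binned


# Hypothetical earnings figures (arbitrary units), not data from the study.
earnings = [120, 5, 48, 300, 17, 75]
earnings_classes = equal_frequency_bins(earnings)
print(earnings_classes)  # ['High', 'Low', 'Medium', 'High', 'Low', 'Medium']
```

Because every class receives about the same number of films, the resulting target is balanced by construction, which matches the abstract's remark about EFB balancing the distribution.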

Classification of the Air Pollution Standard Index Using the CatBoost Algorithm with the Random Undersampling Data Balancing Technique
Aditya, Aldy; Umbara, Fajri Rakhmat; Sabrina, Puspita Nurul
Jurnal Algoritma Vol 22 No 2 (2025): Jurnal Algoritma
Publisher : Institut Teknologi Garut

DOI: 10.33364/algoritma/v.22-2.2971

Abstract

Air quality is an important factor that affects public health and the environment. The Air Pollution Standard Index (Indeks Standar Pencemaran Udara, ISPU) is used as an indicator of the level of air pollution in a region. The main challenge in air quality classification is class imbalance, which can affect the modeling results. This study analyzes the performance of the Categorical Boosting (CatBoost) algorithm in ISPU classification, applying the Random Undersampling technique to overcome class imbalance. The dataset was obtained from air quality monitoring in DKI Jakarta for the period 2020–2024, with a total of 5,386 records and 12 attributes. The research stages included data collection, data cleaning, data transformation, data balancing, feature selection using Recursive Feature Elimination (RFE), modeling with CatBoost, and model evaluation using a confusion matrix. Feature selection identified the five most influential features: PM10, PM2.5, SO2, NO2, and max. The CatBoost model built with the best parameters produced an accuracy of 98 percent, precision of 100 percent, recall of 98.91 percent, and an F1-score of 99.44 percent. The authors conclude that CatBoost combined with Random Undersampling is effective in improving ISPU classification performance. The results are expected to support decision-making in efforts to mitigate the impact of air pollution in DKI Jakarta.
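
The data balancing stage can be illustrated with a minimal random undersampling sketch: every class is reduced to the size of the smallest class by sampling without replacement. The function name, the seed, the category labels, and the toy rows are assumptions for illustration only, not the study's data or code.

```python
import random
from collections import Counter, defaultdict


def random_undersample(rows, labels, seed=0):
    """Randomly drop samples so every class matches the smallest class size."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    target = min(len(indices) for indices in by_class.values())
    keep = []
    for indices in by_class.values():
        # Sample without replacement from each class.
        keep.extend(rng.sample(indices, target))
    keep.sort()  # preserve the original row order
    return [rows[i] for i in keep], [labels[i] for i in keep]


# Hypothetical toy records: one pollutant reading per row with a made-up category.
toy_rows = [[30], [35], [40], [42], [45], [50], [70], [80], [90], [150], [160]]
toy_categories = ["good"] * 6 + ["moderate"] * 3 + ["unhealthy"] * 2

balanced_rows, balanced_categories = random_undersample(toy_rows, toy_categories)
print(Counter(balanced_categories))
```

Undersampling discards majority-class records, so in a workflow like the one described it would typically be applied before feature selection and model training.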