Journal : Journal of Computers and Digital Business

DNA Sequence Classification Using Machine Learning Models Based on k-mer Features
Kautsar, Afthar
Journal of Computers and Digital Business Vol. 4 No. 2 (2025)
Publisher : PT. Delitekno Media Madiri

DOI: 10.56427/jcbd.v4i2.762

Abstract

Cell-free DNA (cfDNA) has emerged as a promising biomarker in various clinical applications, particularly in cancer detection, prenatal diagnostics, and disease monitoring. Accurate classification of cfDNA sequences is crucial for improving diagnostic reliability and enabling timely clinical decisions. This study investigates the application of machine learning models—Decision Tree (DT), Support Vector Machine (SVM), and Deep Neural Network (DNN)—for classifying cfDNA sequences using k-mer-based feature extraction, with k set to 3. A total of 3,000 DNA sequences comprising both normal and tumor-derived samples were transformed into numerical feature vectors based on the frequency of 3-mer patterns. The models were trained and evaluated using standard metrics including accuracy, precision, recall, and F1-score. Experimental results demonstrate that the DNN model achieved the highest classification performance, effectively distinguishing between normal and tumor cfDNA. In contrast, the DT and SVM models exhibited relatively lower performance, particularly in identifying normal sequences. The study also addresses challenges such as class imbalance and limitations of simple k-mer representations. These findings highlight the potential of deep learning approaches in improving cfDNA sequence analysis and open avenues for future research using more complex models, larger datasets, and feature engineering techniques to enhance classification accuracy and clinical applicability.
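
The 3-mer feature extraction described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `kmer_vocabulary` and `kmer_counts` are hypothetical, the alphabet is assumed to be A, C, G, T, raw counts are used (the paper may normalize to relative frequencies), and windows containing other characters are assumed to be skipped.

```python
from itertools import product

ALPHABET = "ACGT"  # assumption: only the four standard bases are counted


def kmer_vocabulary(k=3):
    """Return all possible k-mers in a fixed lexicographic order (4**k entries)."""
    return ["".join(p) for p in product(ALPHABET, repeat=k)]


def kmer_counts(sequence, k=3):
    """Map a DNA sequence to a vector of overlapping k-mer counts."""
    vocab = kmer_vocabulary(k)
    index = {kmer: i for i, kmer in enumerate(vocab)}
    counts = [0] * len(vocab)
    seq = sequence.upper()
    # Slide a window of width k over the sequence, one base at a time.
    for start in range(len(seq) - k + 1):
        pos = index.get(seq[start:start + k])
        if pos is not None:  # skip windows with ambiguous bases such as N
            counts[pos] += 1
    return counts


vector = kmer_counts("ACGTACGT")
```

With k = 3 each sequence becomes a 64-dimensional vector, which could then be passed to any of the classifiers named in the abstract.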

Applying Random Forest Algorithm for Phishing URL Identification
Kautsar, Afthar; Aida, Maghfira; Yulistia, Anita
Journal of Computers and Digital Business Vol. 4 No. 3 (2025)
Publisher : PT. Delitekno Media Madiri

DOI: 10.56427/jcbd.v4i3.782

Abstract

Phishing attacks continue to be one of the most pervasive cybersecurity threats, particularly through malicious URLs designed to mimic legitimate websites and steal sensitive user information. To address this challenge, this study employs the Random Forest algorithm for automated phishing URL detection using a publicly available dataset from Kaggle. The dataset contains diverse structural, technical, and popularity-based features that capture behavioral and lexical characteristics of each URL. Following data preprocessing and an 80/20 train–test split, the Random Forest classifier achieved strong predictive performance, attaining an accuracy of 94.94%, a precision of 95.19%, and a recall of 96.94%. The model further demonstrated robust classification capability with an F1-score of 96.06% and an ROC AUC value of 0.985, indicating excellent discrimination between phishing and legitimate URLs. Feature importance analysis shows that factors such as the URL’s presence in Google’s index, page rank metrics, and specific structural patterns significantly influence prediction outcomes. Additionally, performance visualizations including ROC and Precision–Recall curves reinforce the model’s reliability and stability. Overall, the findings suggest that Random Forest provides an effective and efficient solution for phishing URL detection, offering promising potential for integration into real-world cybersecurity systems.
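
The F1-score reported in the abstract above can be cross-checked against the reported precision and recall, since F1 is their harmonic mean. The sketch below uses only the two figures from the abstract; the helper name `f1_score` is chosen for illustration and is not taken from the paper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract (95.19% precision, 96.94% recall).
reported_precision = 0.9519
reported_recall = 0.9694

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.4f}")
```

The result rounds to 96.06%, which agrees with the F1-score stated in the abstract.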

Classification of Korean Drama Popularity Based on Ratings Using Naïve Bayes
Kautsar, Afthar
Journal of Computers and Digital Business Vol. 5 No. 1 (2026)
Publisher : PT. Delitekno Media Madiri

DOI: 10.56427/jcbd.v5i1.814

Abstract

This study aims to classify the popularity of Korean dramas based on ratings obtained from the MyDramaList website. With the rapid growth of digital entertainment platforms, evaluating drama popularity has become increasingly important for understanding audience preferences and supporting decision-making in the content industry. The Naive Bayes algorithm is employed as the classification method due to its computational efficiency and suitability for handling categorical and numerical features. The dataset comprises 351 Korean dramas with attributes including title, year of release, genre, tags, number of episodes, cast information, synopsis, and user ratings. Ratings serve as the primary label for categorizing dramas into three classes: Top Dramas (rating ≥ 8.5), Popular (7.5–8.4), and Less Popular (< 7.5). The classification pipeline involves data preprocessing, feature encoding, and model training using Naive Bayes. Evaluation results yield an overall accuracy of 79%, with per-class performance assessed through precision, recall, and F1-score metrics. Supplementary visualizations, including pie charts, bar charts, and word clouds, are employed to analyze the distribution of dominant genres and tags across popularity categories. The findings indicate that the proposed approach provides a viable baseline for drama popularity classification while revealing content patterns, such as the prevalence of specific genres and thematic tags among top-rated dramas, that may inform content curation strategies on digital platforms.
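
The three rating-based classes described in the abstract above can be expressed as a small labeling function. This is a sketch under one stated assumption: the abstract gives the middle band as 7.5–8.4, so ratings between 8.4 and 8.5 are treated here as Popular (that is, 7.5 ≤ rating < 8.5). The function name `popularity_class` is hypothetical.

```python
def popularity_class(rating):
    """Assign a popularity label from a numeric rating."""
    if rating >= 8.5:
        return "Top Dramas"
    if rating >= 7.5:  # assumption: covers everything below 8.5, including 8.4 to 8.5
        return "Popular"
    return "Less Popular"


labels = [popularity_class(r) for r in (9.1, 8.5, 8.4, 7.5, 7.4)]
```

Labels produced this way would serve as the target variable for the Naive Bayes classifier, with the remaining attributes encoded as features.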