Cell-free DNA (cfDNA) has emerged as a promising biomarker for cancer detection, prenatal diagnostics, and disease monitoring. Accurate classification of cfDNA sequences is crucial for diagnostic reliability and timely clinical decisions. This study investigates three machine learning models, Decision Tree (DT), Support Vector Machine (SVM), and Deep Neural Network (DNN), for classifying cfDNA sequences using k-mer-based feature extraction with k = 3. A total of 3,000 DNA sequences, comprising both normal and tumor-derived samples, were transformed into numerical feature vectors based on the frequency of each 3-mer pattern. The models were trained and evaluated using accuracy, precision, recall, and F1-score. Experimental results show that the DNN achieved the highest classification performance of the three models in distinguishing normal from tumor cfDNA. The DT and SVM models performed worse, particularly in identifying normal sequences. The study also addresses challenges such as class imbalance and the limitations of simple k-mer representations. These findings highlight the potential of deep learning for cfDNA sequence analysis and open avenues for future research using more complex models, larger datasets, and feature engineering techniques to enhance classification accuracy and clinical applicability.
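As an illustration of the 3-mer feature extraction described above, the following is a minimal sketch rather than the study's actual pipeline. It assumes overlapping windows over the alphabet A, C, G, T, skips windows containing ambiguous bases such as N, and normalizes counts to relative frequencies; none of these choices is stated in the abstract. The names `kmer_frequencies`, `KMERS`, and `KMER_INDEX` are hypothetical.

```python
from itertools import product

K = 3              # k-mer length stated in the abstract
ALPHABET = "ACGT"  # assumption: only unambiguous DNA bases are counted

# All 4**3 = 64 possible 3-mers in a fixed lexicographic order,
# so every sequence maps to a vector with the same layout.
KMERS = ["".join(p) for p in product(ALPHABET, repeat=K)]
KMER_INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def kmer_frequencies(sequence):
    """Return a 64-element list of relative 3-mer frequencies.

    Assumption: windows overlap (stride 1) and any window containing
    a character outside ACGT is skipped.
    """
    seq = sequence.upper()
    counts = [0] * len(KMERS)
    total = 0
    for i in range(len(seq) - K + 1):
        idx = KMER_INDEX.get(seq[i:i + K])
        if idx is not None:
            counts[idx] += 1
            total += 1
    if total == 0:
        # Sequence too short or entirely ambiguous: return a zero vector.
        return [0.0] * len(KMERS)
    return [c / total for c in counts]


if __name__ == "__main__":
    vector = kmer_frequencies("ACGTACGT")
    print(len(vector), vector[KMER_INDEX["ACG"]])
```

Each sequence becomes a fixed-length vector of 64 values, which can then be passed to any of the three classifiers. Whether the study used raw counts or normalized frequencies is not specified; normalization is used here so that sequences of different lengths are comparable.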
Copyright © 2025