Wiharja, Salman
Unknown Affiliation

Published : 1 Documents Claim Missing Document
Claim Missing Document
Check
Articles

Found 1 Documents
Search

Comparative Study of the Effect of Datasets and Machine Learning Algorithms for PDF Malware Detection Wiharja, Salman; Pradeka, Deden; Suteddy, Wirmanto
Digital Zone: Jurnal Teknologi Informasi dan Komunikasi Vol. 15 No. 1 (2024): Digital Zone: Jurnal Teknologi Informasi dan Komunikasi
Publisher : Publisher: Fakultas Ilmu Komputer, Institution: Universitas Lancang Kuning

Show Abstract | Download Original | Original Source | Check in Google Scholar | DOI: 10.31849/digitalzone.v15i1.19744

Abstract

This research presents an innovative approach to detecting malicious PDFs through machine learning algorithms, focusing on the expansion of the Evasive-PDFMal2022 dataset. The objective is to enhance the accuracy of detecting malicious PDFs by enriching the dataset, augmenting its representation and diversity, and developing a practical tool—a website—for extracting and detecting malicious PDFs. The methodology involves updating and enlarging the dataset with additional malicious PDFs sourced from CVE and Exploit-db, along with non-malicious PDFs from diverse origins. Features are then extracted using the PDFID tool, and these 20 features serve as the foundation for implementing K-Nearest Neighbor (KNN), Random Forest, and Random Committee algorithms. The outcomes demonstrate that the model trained with the expanded dataset achieves a remarkable 99% accuracy, surpassing the performance of models relying solely on the Evasive-PDFMal2022 dataset. Additionally, this research significantly enhances the representation and diversity of the dataset while delivering a practical solution in the form of a website tailored for the extraction and detection of malicious PDFs.