Salleh, Ahmad Zarif
Unknown Affiliation


Applying Data Mining on Personal Computer for Document Classification
Chai, Ian; Salleh, Ahmad Zarif
JOIV : International Journal on Informatics Visualization Vol 9, No 3 (2025)
Publisher : Society of Visual Informatics

DOI: 10.62527/joiv.9.3.3473

Abstract

A typical user creates documents over many years of computer usage. As people move from one computer to the next, they tend to copy their files to the new machine, because "you never know when we might need to refer to something from the past." The collection therefore keeps growing, eventually reaching hundreds or thousands of documents. It soon exceeds most people's ability to remember what each document is, even if they have kept the files in some order in folders. Many people also fail to anticipate how their folders and subfolders should be arranged as time passes, and by the time they realize it, most find reclassifying everything manually too daunting a task. We therefore sought to solve this problem with a data mining-based approach, specifically multinomial naive Bayes. We developed a document classification program that automatically categorizes the documents stored on a user's personal computer hard drive, eliminating the need for manual classification. The proposed algorithm achieved a score of 0.853 for accuracy, 9,833 for precision, 0.661 for recall, and 0.767 for the F1 metric. With further refinement, for example by balancing the dataset and increasing its size, it should be possible to apply this technique in practical applications that enable automatic document classification on the computers of most users.
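
The abstract names multinomial naive Bayes and reports accuracy, precision, recall, and F1, but gives no implementation details. The following is only a minimal sketch of how such a classifier and such metrics are commonly computed, not the authors' program. The tokenizer, the Laplace smoothing constant `alpha`, the category names, and the toy training documents are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep runs of letters (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


class MultinomialNaiveBayes:
    """Bag-of-words multinomial naive Bayes with Laplace smoothing (assumed alpha=1)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, doc):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            # log P(class) + sum of log P(word | class), computed in log space
            score = math.log(n / n_docs)
            for t in tokens:
                num = self.word_counts[label][t] + self.alpha
                den = self.totals[label] + self.alpha * v
                score += math.log(num / den)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F1 for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy corpus; a real run would read text extracted from files on disk.
train_docs = [
    "invoice payment due amount tax",
    "bank statement balance payment",
    "lecture notes exam assignment",
    "assignment deadline course exam",
]
train_labels = ["finance", "finance", "study", "study"]

clf = MultinomialNaiveBayes().fit(train_docs, train_labels)
print(clf.predict("payment amount for the tax invoice"))
print(clf.predict("exam notes"))
```

Scores are accumulated as log probabilities so that long documents do not underflow to zero, and smoothing keeps a single unseen word from ruling out a class entirely. For more than two categories, the per-class metrics from `precision_recall_f1` would still need to be averaged in some way; the abstract does not say which averaging the reported figures use.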