Jurnal Transformatika
Vol. 19 No. 2 (2022): January 2022

Content Classification based-on Latent Semantic Analysis and Support Vector Machine (LSA-SVM)

Marthasari, Gita Indah
Hayatin, Nur
Yuniarti, Maulidya



Article Info

Publish Date
09 Feb 2022

Abstract

Web pages carry diverse content, which can have a negative impact when it reaches the wrong audience. Almost half of internet users are children. It is therefore important to classify web pages to determine which are suitable for children and which are not. One method that can be used is the Support Vector Machine (SVM) algorithm. SVM is a binary classifier that works by finding the best hyperplane separating the two classes. To obtain better classification accuracy, SVM is combined with the Latent Semantic Analysis (LSA) algorithm. The data used in this study were taken from the DMOZ web data, which have been classified into two categories. The data are first pre-processed, and features are then extracted using LSA, which captures the semantic similarity of the words and text contained in the web pages. The extracted features are then classified using an SVM with an RBF kernel. In testing, the method achieved a classification accuracy of 64%.
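
The pipeline described in the abstract (pre-processing, LSA feature extraction, SVM with an RBF kernel) could be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the use of scikit-learn, the TF-IDF weighting, the English stop-word list, the number of LSA dimensions (`n_topics`), and the SVM settings are all assumptions, and the toy pages stand in for the DMOZ data, which are not reproduced here.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


def build_lsa_svm(n_topics=2):
    """Chain TF-IDF, truncated SVD (LSA) and an RBF-kernel SVM.

    n_topics is a hypothetical setting; the abstract does not state
    how many latent dimensions were used.
    """
    return make_pipeline(
        # Pre-processing (assumed): lowercase text, drop English stop words.
        TfidfVectorizer(lowercase=True, stop_words="english"),
        # LSA: project the term-document matrix onto a few latent dimensions.
        TruncatedSVD(n_components=n_topics, random_state=0),
        # Binary classifier with an RBF kernel, as named in the abstract.
        SVC(kernel="rbf", gamma="scale"),
    )


# Hypothetical toy pages; these are NOT the DMOZ data used in the study.
pages = [
    "fun cartoon games and puzzles for kids",
    "learn counting with songs and colouring games",
    "bedtime stories and cartoon songs for kids",
    "online casino poker and sports betting odds",
    "casino betting tips and poker strategy for adults",
    "graphic violence and horror film reviews for adults",
]
labels = [
    "suitable", "suitable", "suitable",
    "unsuitable", "unsuitable", "unsuitable",
]

model = build_lsa_svm(n_topics=2)
model.fit(pages, labels)

# Classify an unseen page; the result is one of the two labels above.
predictions = model.predict(["colouring games and stories for kids"])
print(predictions[0])
```

On a corpus this small the prediction is not meaningful; the sketch only shows how the three stages connect. In practice, the number of latent dimensions and the SVM parameters would need to be tuned on held-out data.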

Copyrights © 2022






Journal Info

Abbrev

TRANSFORMATIKA

Publisher

Subject

Computer Science & IT

Description

Transformatika is a peer-reviewed journal in Indonesian and English, published in two issues per year (January and July). The aim of Transformatika is to publish high-quality articles on the latest developments in the field of Information Technology. We accept articles within the scope of Information ...