Garuda - Garba Rujukan Digital

Jurnal Transformatika

Vol. 19 No. 2 (2022): January 2022

Marthasari, Gita Indah
Hayatin, Nur
Yuniarti, Maulidya

Publish Date
09 Feb 2022

The diverse content of web pages can have a negative impact when it reaches the wrong audience. Almost half of internet users are children, so it is important to classify web pages to determine which are suitable for children and which are not. One method that can be used is the Support Vector Machine (SVM) algorithm. SVM is a binary classifier that works by finding the best hyperplane to separate two classes. To obtain better classification accuracy, SVM is combined with the Latent Semantic Analysis (LSA) algorithm. The data used in this study were taken from the DMOZ web data, which has been classified into two categories. The data first pass through a pre-processing stage, after which features are extracted using LSA. LSA is used to capture the semantic similarity of the words and text contained in web pages. The extracted features are then classified using an SVM with an RBF kernel. In testing, the method achieved a classification accuracy of 64%.
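The pipeline described in the abstract (pre-processing, LSA feature extraction, RBF-kernel SVM) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the toy corpus, the TF-IDF weighting, the number of LSA components, and the SVM hyperparameters are all assumptions, since the abstract does not specify them.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical stand-in corpus; the study itself used DMOZ web data.
pages = [
    "fun cartoon games and puzzles for kids",
    "learn counting with colorful animal stories",
    "children songs, coloring pages and fairy tales",
    "school science experiments for young students",
    "online casino betting and poker jackpots",
    "violent horror movie with graphic gore scenes",
    "adult gambling odds and sportsbook wagers",
    "graphic crime footage and violent fights",
]
# Assumed label encoding: 1 = suitable for children, 0 = not suitable.
labels = [1, 1, 1, 1, 0, 0, 0, 0]


def build_lsa_svm(n_components=3):
    """Build a text -> TF-IDF -> LSA (truncated SVD) -> RBF SVM pipeline.

    n_components is an assumed value chosen for the toy corpus.
    """
    return make_pipeline(
        # Pre-processing: lowercasing and English stop-word removal (assumed).
        TfidfVectorizer(lowercase=True, stop_words="english"),
        # LSA: project the term-document matrix onto a few latent dimensions.
        TruncatedSVD(n_components=n_components, random_state=0),
        # Binary classifier with an RBF kernel, as named in the abstract.
        SVC(kernel="rbf", gamma="scale", C=1.0),
    )


model = build_lsa_svm()
model.fit(pages, labels)

# Classify two unseen snippets (also hypothetical).
predictions = model.predict(["coloring games for kids", "poker betting jackpots"])
print(predictions)
```

On a real corpus, the number of LSA components and the SVM parameters `C` and `gamma` would normally be tuned with cross-validation, and accuracy would be measured on a held-out test set rather than on a handful of examples.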

Journal Info

Jurnal Transformatika

Abbrev

TRANSFORMATIKA

Publisher

Universitas Semarang

Subject

Computer Science & IT

Description

Transformatika is a peer-reviewed journal in Indonesian and English that publishes two issues per year (January and July). Its aim is to publish high-quality articles on the latest developments in the field of Information Technology. We accept the article with the scope of Information ...

Article Info

Abstract

Content Classification based-on Latent Semantic Analysis and Support Vector Machine (LSA-SVM)
