Indu Kashyap
Manav Rachna International Institute of Research Studies

Journal : Indonesian Journal of Electrical Engineering and Computer Science

Classification based topic extraction using domain-specific vocabulary: a supervised approach
Vandana Kalra; Indu Kashyap; Harmeet Kaur
Indonesian Journal of Electrical Engineering and Computer Science Vol 26, No 1: April 2022
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijeecs.v26.i1.pp442-449

Abstract

Recently, a probabilistic topic modelling approach, latent Dirichlet allocation (LDA), has been widely applied to document classification. However, classical LDA is an unsupervised algorithm that uses a fixed number of topics, incorporates no prior domain knowledge, and produces different results when the order of the documents changes. This article presents a comprehensive framework that avoids both the order effect and the unsupervised probabilistic nature of LDA. First, the framework builds a category-specific vocabulary using a weight-dependent model that extracts distinctive features suitable for supervised classification. Then, it maps each classified cluster of documents from the domain corpus to its relevant topic, which makes the approach more robust to noise. The framework was tested on a collection of benchmark news datasets that vary in sample size, class characteristics, and classification tasks. Compared with conventional classification methods, the proposed framework achieved 95.56% and 95.23% accuracy on two datasets, indicating better classification capability. Furthermore, the topics extracted from the classified clusters are highly relevant to the domain categories.
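
The two stages described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the weight-dependent model, so the sketch assumes a simple stand-in: term frequency within a category multiplied by the log inverse category frequency. The function names (`build_vocab`, `classify`, `extract_topic`), the `top_k` and `n_terms` parameters, and the toy training data are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Lowercase whitespace tokenizer; keeps alphabetic tokens only.
    return [w for w in text.lower().split() if w.isalpha()]

def build_vocab(labelled_docs, top_k=5):
    """Return {category: {term: weight}} with the top_k terms per category.

    Assumed weighting (not the paper's): tf in category * log(C / cf),
    where C is the number of categories and cf the number of categories
    containing the term. Terms found in every category get weight 0.
    """
    term_counts = defaultdict(Counter)
    for text, category in labelled_docs:
        term_counts[category].update(tokenize(text))
    n_categories = len(term_counts)
    category_freq = Counter(t for counts in term_counts.values() for t in counts)
    vocab = {}
    for category, counts in term_counts.items():
        total = sum(counts.values())
        weights = {t: (c / total) * math.log(n_categories / category_freq[t])
                   for t, c in counts.items()}
        # Sort by descending weight, then alphabetically, so ties are deterministic.
        top = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
        vocab[category] = dict(top)
    return vocab

def classify(text, vocab):
    # Score each category by the summed weights of its vocabulary terms in the text.
    tokens = tokenize(text)
    scores = {cat: sum(w.get(t, 0.0) for t in tokens) for cat, w in vocab.items()}
    return max(sorted(scores), key=scores.get)

def extract_topic(cluster_texts, category, vocab, n_terms=3):
    # Rank the category's vocabulary terms by weight * frequency in the cluster.
    counts = Counter(t for text in cluster_texts for t in tokenize(text))
    scored = {t: w * counts[t] for t, w in vocab[category].items() if counts[t]}
    return sorted(scored, key=lambda t: (-scored[t], t))[:n_terms]

# Toy labelled corpus, invented for illustration only.
train = [
    ("the team won the football match", "sport"),
    ("striker scored a goal in the match", "sport"),
    ("the market fell as shares dropped", "business"),
    ("bank shares rose on the market", "business"),
]
vocab = build_vocab(train)
label = classify("shares fell on the market", vocab)
topic = extract_topic(["bank shares rose on the market", "the market fell"],
                      "business", vocab)
```

Because the vocabulary is built from per-category counts and ties are broken alphabetically, the result does not depend on the order in which documents are processed, which is the property the abstract contrasts with classical LDA.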