Journal : Journal of Data Science and Its Applications

Multi Label Topic Classification for Hadith Bukhari in Indonesian Translation using Random Forest
Adhitia Wiraguna; Said Al Faraby; Adiwijaya Adiwijaya
Journal of Data Science and Its Applications Vol 4 No 1 (2021): Journal of Data Science and Its Applications
Publisher : Telkom University

DOI: 10.34818/jdsa.2021.4.70

Abstract

Studying and practicing hadith is obligatory for Muslims, and many kinds of teachings can be drawn from it. To assist Muslims in studying hadith, a multi-label classification system is needed to categorize Sahih Bukhari hadith in Indonesian translation into three topics: prohibition, advice, and information. Of the various classification methods available for building a text classification system, this study uses Random Forest (RF). The simplicity of the RF algorithm and its ability to handle high-dimensional data make it a suitable method for text classification. However, the capability of RF for multi-label classification is not widely known. This study uses two Problem Transformation approaches, Binary Relevance (BR) and Label Powerset (LP), to adapt RF to multi-label classification. The results show that the best Hamming loss, 0.0663, was obtained by the system that used BR without stemming. These results indicate that BR is better than LP at adapting the RF algorithm to multi-label classification of hadith data. This is because BR produces as many classification models as there are labels in the hadith data, whereas the LP transformation makes the data imbalanced.
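The two problem transformations and the Hamming loss metric named in the abstract can be illustrated with a minimal sketch. The label matrix, the predictions, and the function names below are hypothetical toy values chosen for illustration, not data or code from the study, and the Random Forest training step itself is omitted.

```python
from typing import Dict, List, Sequence

# The three topics named in the abstract.
LABELS = ("prohibition", "advice", "information")


def binary_relevance_targets(Y: Sequence[Sequence[int]]) -> Dict[str, List[int]]:
    """BR: split the label matrix into one binary target vector per label.

    One binary classifier would then be trained on each vector.
    """
    return {label: [row[i] for row in Y] for i, label in enumerate(LABELS)}


def label_powerset_targets(Y: Sequence[Sequence[int]]) -> List[int]:
    """LP: map every distinct label combination to a single class id.

    One multiclass classifier would then be trained on these ids.
    """
    class_ids: Dict[tuple, int] = {}
    return [class_ids.setdefault(tuple(row), len(class_ids)) for row in Y]


def hamming_loss(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of individual label slots that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for row_t, row_p in zip(y_true, y_pred)
        for t, p in zip(row_t, row_p)
    )
    return wrong / total


# Hypothetical toy label matrix: rows are hadith, columns follow LABELS.
Y_true = [[1, 0, 1], [0, 1, 1], [1, 0, 1], [0, 0, 1]]
# Hypothetical predictions with two wrong label slots out of twelve.
Y_pred = [[1, 0, 1], [0, 1, 0], [1, 1, 1], [0, 0, 1]]

br = binary_relevance_targets(Y_true)
lp = label_powerset_targets(Y_true)
loss = hamming_loss(Y_true, Y_pred)
```

In this toy case LP yields three classes, two of which contain a single example each. That hints at how combining labels into classes can leave some classes sparsely populated, which is the imbalance the abstract attributes to LP.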
Sentiment Analysis of Beauty Product Reviews Using the K-Nearest Neighbor (KNN) and TF-IDF Methods with Chi-Square Feature Selection
Yusrifa Deta Kirana; Said Al Faraby
Journal of Data Science and Its Applications Vol 4 No 1 (2021): Journal of Data Science and Its Applications
Publisher : Telkom University

DOI: 10.34818/jdsa.2021.4.71

Abstract

The recent rise in the number of beauty products can make consumers, especially women, hesitant when choosing one. Beauty product reviews have become a very valuable source of information, both for consumers deciding whether to purchase a product and for producers improving their products and marketing strategies. Sentiment analysis classifies each beauty product review individually as negative or positive. In this study, sentiment analysis was applied to beauty product review data using the K-Nearest Neighbor (KNN) method, with the aim of finding the best value of k for this case. The dataset is pre-processed with case folding, noise removal, tokenization, stemming, stopword removal, and slang word handling. Feature extraction uses Term Frequency-Inverse Document Frequency (TF-IDF) to calculate the weight of each word in a document, and feature selection uses Chi-Square to select the features needed to increase accuracy. The best accuracy obtained in this study was 71%, using KNN with k = 50 and a feature selection model with 76 features.
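The pipeline described in the abstract (Chi-Square feature selection, TF-IDF weighting, KNN voting) can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the reviews, labels, and function names are hypothetical; the TF-IDF formula is the unsmoothed textbook variant tf × log(N/df); cosine similarity is assumed as the KNN similarity measure; and the toy values of k and the feature count are far smaller than the k = 50 and 76 features reported in the study.

```python
import math
from collections import Counter


def chi_square(docs, labels, term, positive):
    """Chi-square statistic of the 2x2 table: term present/absent vs. class."""
    a = b = c = d = 0
    for doc, label in zip(docs, labels):
        present, is_pos = term in doc, label == positive
        if present and is_pos:
            a += 1
        elif present:
            b += 1
        elif is_pos:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def select_features(docs, labels, positive, n_features):
    """Keep the n_features terms with the highest chi-square score."""
    vocab = {t for doc in docs for t in doc}
    # Ties are broken alphabetically so the selection is deterministic.
    ranked = sorted(vocab, key=lambda t: (-chi_square(docs, labels, t, positive), t))
    return set(ranked[:n_features])


def fit_idf(docs):
    """Inverse document frequency, log(N / df), from the training documents."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return {t: math.log(n / df[t]) for t in df}


def vectorize(doc, idf, keep):
    """TF-IDF weights restricted to the selected features."""
    tf = Counter(t for t in doc if t in keep and t in idf)
    total = sum(tf.values())
    return {t: (count / total) * idf[t] for t, count in tf.items()}


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return 0.0 if norm_u == 0 or norm_v == 0 else dot / (norm_u * norm_v)


def knn_predict(train_vecs, train_labels, query_vec, k):
    """Majority vote among the k training vectors most similar to the query."""
    ranked = sorted(
        zip(train_vecs, train_labels),
        key=lambda pair: cosine(pair[0], query_vec),
        reverse=True,
    )
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]


# Hypothetical, already pre-processed (tokenized) toy reviews.
train_docs = [
    ["love", "this", "serum"],
    ["great", "serum", "love"],
    ["bad", "smell", "hate"],
    ["hate", "this", "cream"],
]
train_labels = ["positive", "positive", "negative", "negative"]

keep = select_features(train_docs, train_labels, "positive", n_features=3)
idf = fit_idf(train_docs)
train_vecs = [vectorize(doc, idf, keep) for doc in train_docs]

prediction = knn_predict(
    train_vecs, train_labels, vectorize(["love", "the", "serum"], idf, keep), k=3
)
```

Searching for the best k, as the study does, would amount to repeating `knn_predict` over held-out reviews for several candidate values of k and comparing accuracy.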