

Text Classification of Indonesian Translated Hadith Using XGBoost Model and Chi-Square Feature Selection
Putri, Dita Julaika; Dwifebri, Mahendra; Adiwijaya, Adiwijaya
Building of Informatics, Technology and Science (BITS) Vol 4 No 4 (2023): March 2023
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v4i4.2944

Abstract

Aside from the Holy Qur'an, the Hadith is a guide to life that every Muslim is expected to follow. Technology for classifying texts and sentences, including categorizing hadiths, has evolved alongside other technological advances. Classification models have also been developed and optimized; for example, the XGBoost algorithm is more optimized than earlier tree-based algorithms. Categorizing hadiths as recommendations, prohibitions, or information can also make it easier for Muslims to study them. This study classified Indonesian translations of hadith texts into these three categories using the XGBoost algorithm, with TF-IDF for feature extraction and Chi-Square for feature selection. The experiments varied the order of the stopword-removal and stemming steps in preprocessing, ran the classification with and without Chi-Square feature selection, and tuned parameter values when building the XGBoost model. The best results obtained were 79% accuracy, 79% precision, 78% recall, and 78% F1-score.
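To illustrate the TF-IDF and Chi-Square steps described above, here is a minimal pure-Python sketch. It assumes raw term frequency times log(N/df) for TF-IDF and a 2x2 document-presence contingency table for Chi-Square, scoring each term by its maximum over the classes; the abstract does not state which variants the authors used. The corpus, labels, and function names (`tfidf`, `chi_square`, `select_features`) are hypothetical, and the tokens are shown in English for readability even though the study worked on Indonesian text. The XGBoost classifier itself is not reproduced.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def chi_square(docs, labels, term, cls):
    """Chi-square statistic for term presence vs. membership in class `cls`."""
    a = b = c = d = 0  # cells of the 2x2 contingency table
    for doc, label in zip(docs, labels):
        has_term = term in doc
        in_class = label == cls
        if has_term and in_class:
            a += 1
        elif has_term:
            b += 1
        elif in_class:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def select_features(docs, labels, k):
    """Keep the k terms with the highest max-over-classes chi-square score."""
    vocab = sorted({t for doc in docs for t in doc})
    classes = sorted(set(labels))
    scores = {t: max(chi_square(docs, labels, t, c) for c in classes) for t in vocab}
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]


# Hypothetical, already-preprocessed corpus (stopwords removed, stemmed).
docs = [
    ["should", "pray", "dawn"],
    ["should", "give", "charity"],
    ["not", "lie"],
    ["not", "steal"],
    ["prophet", "went", "medina"],
    ["prophet", "said", "truth"],
]
labels = ["recommendation", "recommendation", "prohibition", "prohibition",
          "information", "information"]

selected = select_features(docs, labels, k=3)
# Keep only the selected terms in each TF-IDF vector.
vectors = [{t: w for t, w in vec.items() if t in selected} for vec in tfidf(docs)]
```

Here `selected` is `["not", "prophet", "should"]`, the terms that each occur only in one class. The filtered `vectors` would then be passed to a gradient-boosted tree classifier such as XGBoost.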
Sentiment Analysis on Movie Review from Rotten Tomatoes Using Logistic Regression and Information Gain Feature Selection
Abimanyu, Arsenio Jusuf; Dwifebri, Mahendra; Astuti, Widi
Building of Informatics, Technology and Science (BITS) Vol 5 No 1 (2023): June 2023
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v5i1.3595

Abstract

Advances in technology today have had a positive influence on internet use and on the dissemination of the information it contains, including information about cinema. As a result, many movie reviews can be obtained easily. Movie reviews are highly influential across the many channels through which movies are made available, and because information on the internet is so easy to access, the volume and variety of movie reviews have grown. Sentiment analysis is therefore needed. This research uses Logistic Regression as the classification method, chosen for its accurate classification performance. Information Gain was chosen for feature selection because it works well as a filter approach to classification, and TF-IDF was chosen for feature extraction because it can help overcome data imbalance in the dataset. The best model in this research was built without stemming in the preprocessing stage, without Information Gain feature selection, and with tuned Logistic Regression parameters, producing an F1-score of 76.50%.
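The Information Gain filter mentioned above can be sketched as follows. This is a minimal illustration, assuming binary term presence as the feature and IG(t) = H(C) - [P(t) H(C | t) + P(not t) H(C | not t)] in bits; the reviews, labels, and helper names (`entropy`, `information_gain`, `rank_terms`) are hypothetical and not taken from the study.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(docs, labels, term):
    """IG = H(C) - [P(t) H(C | t) + P(not t) H(C | not t)] for term presence."""
    with_term = [y for doc, y in zip(docs, labels) if term in doc]
    without_term = [y for doc, y in zip(docs, labels) if term not in doc]
    n = len(labels)
    conditional = ((len(with_term) / n) * entropy(with_term)
                   + (len(without_term) / n) * entropy(without_term))
    return entropy(labels) - conditional


def rank_terms(docs, labels, k):
    """Return the k terms with the highest information gain (ties alphabetical)."""
    vocab = sorted({t for doc in docs for t in doc})
    return sorted(vocab, key=lambda t: -information_gain(docs, labels, t))[:k]


# Hypothetical tokenized reviews.
docs = [["great", "acting"], ["great", "plot"], ["boring", "plot"], ["boring", "acting"]]
labels = ["pos", "pos", "neg", "neg"]
top = rank_terms(docs, labels, k=2)
```

In this toy data "great" and "boring" each separate the classes perfectly (IG = 1 bit), while "plot" and "acting" appear equally in both classes (IG = 0), so a filter approach would keep the former and drop the latter before training the Logistic Regression model.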
Handling Imbalanced Data Sets Using SMOTE and ADASYN to Improve Classification Performance of Ecoli Data Sets
Halim, Anthony Mas; Dwifebri, Mahendra; Nhita, Fhira
Building of Informatics, Technology and Science (BITS) Vol 5 No 1 (2023): June 2023
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v5i1.3647

Abstract

In this digital era, machine learning is a technology in demand by organizations and individuals alike. In the age of data and digital information, the ability to process data efficiently is essential. As the amount of data grows, machine learning faces various problems, one of which is class imbalance. Class imbalance is a condition in which one class dominates another; for example, the positive class may contain fewer samples than the negative class. The less numerous class is called the minority class, and the class that dominates the dataset is called the majority class. Class imbalance can degrade classification performance, so it must be handled to improve classification results. Classifying imbalanced data with Random Forest gives satisfactory results, as does applying SMOTE and ADASYN as sampling methods, which were chosen because they are popular and easy to implement. The best model in this study applies SMOTE oversampling to a dataset with an imbalance ratio (IR) of 10%, reaching a balanced accuracy of 98.75%; the best ADASYN result is on a dataset with 13% IR, with a balanced accuracy of 99.03%.
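The oversampling idea and the reported metric can be sketched in plain Python. This is a simplified SMOTE, assuming each synthetic point is placed at a random position on the segment between a randomly chosen minority sample and one of its k nearest minority neighbours (Euclidean distance); it is not the exact procedure or library the study used. `balanced_accuracy` is the mean of per-class recall. The function names and toy data are hypothetical. ADASYN, not shown here, differs mainly in generating more synthetic samples for minority points whose neighbourhoods contain more majority-class samples.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Simplified SMOTE: interpolate between a random minority sample and one
    of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        base_idx = rng.randrange(len(minority))
        base = minority[base_idx]
        others = [p for i, p in enumerate(minority) if i != base_idx]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append(tuple(b + gap * (nb - b) for b, nb in zip(base, neighbour)))
    return synthetic


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical 2-D minority class; new points stay inside its triangle.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_points = smote(minority, n_synthetic=10, k=2, seed=1)
```

Balanced accuracy explains why the metric suits this setting: a classifier that always predicts the majority class on labels `[0, 0, 0, 1]` has 75% plain accuracy but only 50% balanced accuracy.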
Classification of Toxic Comments on Social Media Using SVM, Information Gain and TF-IDF (Klasifikasi Komentar Toxic Pada Sosial Media Menggunakan SVM, Information Gain dan TF-IDF)
Ilham Maulana, Muhammad; Muslim Lhaksmana, Kemas; Dwifebri, Mahendra
eProceedings of Engineering Vol. 10 No. 5 (2023): October 2023
Publisher : eProceedings of Engineering


Abstract

Social media is a form of online intermediary for social interaction. Social media applications now come in many forms, and although much that is positive can be gained from them, they also have negative aspects, such as toxic comments. Toxic comments are not easy to detect manually, so this research aims to classify them using machine learning. Several studies on toxic comment classification have already been conducted, some of which used the Support Vector Machine method. In this research, a Support Vector Machine (SVM) is used as the classifier, Information Gain for feature selection, and TF-IDF for feature extraction. The data were collected from tweets posted by Twitter users. These comments were compiled into a single dataset and then classified using the methods mentioned above.
Keywords: Social media, Text classification, Toxic comment, SVM
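The SVM classification step above can be sketched with a linear soft-margin SVM trained by full-batch sub-gradient descent on the hinge loss. This is an assumption for illustration: the abstract does not say which kernel, solver, or library the authors used, and in practice a library implementation such as scikit-learn's `LinearSVC` or `SVC` would normally be used instead. The two-feature vectors stand in for hypothetical TF-IDF weights of one toxic-leaning and one polite term after Information Gain selection; labels are +1 (toxic) and -1 (non-toxic). The names `train_linear_svm` and `predict` are hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimise lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w . x_i + b)))
    with full-batch sub-gradient descent. Labels must be +1 or -1."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 regulariser
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge loss is active for this sample
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 (toxic) if the point lies on the positive side of the hyperplane."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Hypothetical TF-IDF weights: [toxic-term weight, polite-term weight].
X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
y = [1, 1, -1, -1]
w, b = train_linear_svm(X, y)
predictions = [predict(w, b, x) for x in X]
```

The learned weight on the toxic-leaning feature ends up positive and the weight on the polite feature negative, so all four training comments are classified correctly; the hinge loss only pushes on samples whose margin is below 1, which is what makes the solution depend on the points closest to the boundary.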