Articles

Klasifikasi Sentimen pada Dataset Terbatas Menggunakan Random Forest dan Word2Vec [Sentiment Classification on a Limited Dataset Using Random Forest and Word2Vec]
Fitri, Dina Deswara; Agustian, Surya; Pizaini, Pizaini; Sanjaya, Suwanto
Journal of Computer System and Informatics (JoSYC) Vol 6 No 1 (2024): November 2024
Publisher : Forum Kerjasama Pendidikan Tinggi (FKPT)

DOI: 10.47065/josyc.v6i1.6246

Abstract

Sentiment measurement of public opinion on social media is essential for understanding societal views on various issues, including public figures and political events. This research explores the effectiveness of the Random Forest algorithm with Word2Vec-based word representation for sentiment classification on a limited dataset. The case study involves tweets regarding Kaesang Pangarep as the Chairman of PSI, supplemented by external data related to Covid-19 and general topics. The dataset was processed using cleaning techniques, case folding, stopword removal, stemming, and tokenization. Words in the dataset were represented using the Word2Vec model with a Continuous Bag of Words (CBOW) architecture and a vector dimension of 500. Random Forest was employed to classify sentiment into positive, negative, or neutral categories. In the initial phase, the model was trained using 300 samples per label; however, the results showed unsatisfactory performance with an F1-Score of 49.00% and an accuracy of 50.00%. To improve performance, the dataset was expanded by adding 900 samples from Kaesang and 1,080 samples from external topics. The final results indicated an improvement with an F1-Score of 49.89%, an accuracy of 58.29%, precision of 49.16%, and recall of 56.47%. This research confirms that the use of Random Forest with word representation from Word2Vec can enhance sentiment classification performance, even with a limited dataset, and contributes to the development of sentiment analysis techniques in the field of machine learning.
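
The abstract does not say how the per-word Word2Vec vectors were combined into one feature vector per tweet before being passed to Random Forest. A common choice, assumed here, is to average the vectors of the tokens that appear in the vocabulary. The sketch below uses pure Python and a hand-made 3-dimensional embedding table (the paper uses 500 dimensions); `toy_vectors` and `tweet_vector` are illustrative names, not the authors' code.

```python
from typing import Dict, List

# Hypothetical toy embedding table; a trained Word2Vec model would supply these.
toy_vectors: Dict[str, List[float]] = {
    "kaesang": [1.0, 0.0, 2.0],
    "ketua": [0.0, 2.0, 0.0],
    "psi": [2.0, 1.0, 1.0],
}

def tweet_vector(tokens: List[str], vectors: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the vectors of known tokens; return zeros if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]

# "jadi" is out of vocabulary and is simply skipped (an assumption of this sketch).
features = tweet_vector(["kaesang", "jadi", "ketua", "psi"], toy_vectors, dim=3)
print(features)  # [1.0, 1.0, 1.0]
```

Averaging gives every tweet a fixed-length vector regardless of its length, which is what a tree-based classifier needs as input.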
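Pengaruh Agregasi Data pada Klasifikasi Sentimen untuk Dataset Terbatas Menggunakan SGD Classifier [The Effect of Data Aggregation on Sentiment Classification for Limited Datasets Using SGD Classifier]
Fauzan Ray T; Surya Agustian; Febi Yanto; Pizaini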
Computer Science and Information Technology Vol 5 No 3 (2024): Jurnal Computer Science and Information Technology (CoSciTech)
Publisher : Universitas Muhammadiyah Riau

Abstract

Social media, especially Twitter (X), is a rich source of data for sentiment analysis. However, limited datasets are a major challenge for machine learning, especially when fast and accurate sentiment analysis is needed. This research applies data aggregation techniques to expand the training dataset and tests various preprocessing steps, such as cleaning, case folding, normalization, stemming, and lexicon-based methods. The classification method is the Stochastic Gradient Descent Classifier, with text represented by word embeddings generated from the FastText language model. Lexicon-based preprocessing, particularly for emoji and emoticon handling, has a significant impact when data is added, because it captures additional emotion and context that conventional text analysis often overlooks. Experimental results show that data addition and preprocessing optimization improved the F1 score from a baseline of 40% to 52.13%, surpassing the organizer's score of 51.28%. These findings emphasize the importance of data aggregation, preprocessing optimization, and parameter tuning with grid search in improving model performance on text sentiment classification with limited datasets.
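
The lexicon-based handling of emoji and emoticons mentioned above can be sketched as a lookup that replaces each symbol with a word token before the usual cleaning step strips punctuation. The lexicon entries and the function name below are hypothetical; the abstract does not list the lexicon the authors used.

```python
# Hypothetical mini-lexicon; the paper's actual lexicon is not given in the abstract.
EMOTICON_LEXICON = {
    ":)": "senang",
    ":(": "sedih",
    "😂": "tertawa",
    "😡": "marah",
}

def replace_emoticons(text: str, lexicon: dict) -> str:
    """Replace each emoji/emoticon with a word token so the sentiment cue survives cleaning."""
    # Longer keys first, so multi-character emoticons are not partially matched.
    for symbol in sorted(lexicon, key=len, reverse=True):
        text = text.replace(symbol, f" {lexicon[symbol]} ")
    return " ".join(text.split())  # collapse repeated whitespace

cleaned = replace_emoticons("mantap :) tapi antre lama 😡", EMOTICON_LEXICON)
print(cleaned)  # mantap senang tapi antre lama marah
```

Running this step before punctuation removal matters: otherwise `:)` would be deleted and the cue lost.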
Gated Recurrent Unit (GRU) for Sentiment Classification on Imbalanced Data: The COVID-19 Vaccine Program in Twitter
Hadi, Mukhlis; Agustian, Surya
MATICS: Jurnal Ilmu Komputer dan Teknologi Informasi (Journal of Computer Science and Information Technology) Vol 17, No 1 (2025): MATICS
Publisher : Department of Informatics Engineering

DOI: 10.18860/mat.v17i1.27995

Abstract

The initial implementation of the COVID-19 vaccination by the Indonesian government sparked mixed reactions from the public, ranging from strong support to fierce opposition. These differing opinions influenced individuals' decisions to either accept or refuse the vaccination program for themselves or their families. Public sentiment, expressed through posts, comments, or status updates, provides valuable insights into vaccine acceptance or rejection. This study conducts sentiment analysis using deep learning techniques, specifically employing the Gated Recurrent Unit (GRU) method on Twitter data. The dataset consists of three sentiment classes: positive, negative, and neutral. The Word2Vec word embedding model was used as input and trained on a COVID-19 vaccination sentiment dataset collected from Twitter. Because the classes in the tweet data are imbalanced, additional steps are required to improve the classification. The best-performing model achieved an F1-score of 66% and an accuracy of 69%. This classification model effectively addresses the class imbalance problem, delivering competitive results compared to other methods.
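
The abstract does not name the steps taken against class imbalance. One widely used option, shown here purely as an assumption, is to weight each class inversely to its frequency during training, using weight(c) = n_samples / (n_classes * count(c)). The label counts below are toy values; the abstract gives no class distribution.

```python
from collections import Counter
from typing import Dict, List

def balanced_class_weights(labels: List[str]) -> Dict[str, float]:
    """weight(c) = n_samples / (n_classes * count(c)); rarer classes get larger weights."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}

# Toy label distribution (illustrative; not taken from the paper).
labels = ["neutral"] * 6 + ["positive"] * 3 + ["negative"] * 1
weights = balanced_class_weights(labels)
print(weights)  # the single "negative" example gets the largest weight
```

Such weights are typically passed to the loss function so that errors on rare classes cost more. Resampling the training set is an alternative with a similar goal.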
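Pengembangan Aplikasi Pendeteksi Daging Sapi dan Babi Menggunakan Deep Learning Arsitektur EfficientNet-B6 Berbasis Android [Development of an Android-Based Beef and Pork Detection Application Using Deep Learning with the EfficientNet-B6 Architecture]
Pangestu, Yoga; Sanjaya, Suwanto; Jasril; Agustian, Surya; Safaat, Nazruddin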
Jurnal Informatika Ekonomi Bisnis Vol. 7, No. 2 (June 2025)
Publisher : SAFE-Network

DOI: 10.37034/infeb.v7i2.1195

Abstract

The advancement of digital technology has generated a demand for applications that assist the public in ensuring the halal status of food products, particularly in distinguishing between beef and pork. This study aims to develop an Android-based application for detecting beef and pork using Deep Learning methods with the EfficientNet-B6 architecture, employing the eXtreme Programming software development approach. The image classification model utilizes a Convolutional Neural Network architecture integrated into a Python-based server, while the user interface is developed with Java in Android Studio. System testing was conducted using black-box methods on several Android devices, with varying room conditions and meat types. The results show that the application can classify meat with an accuracy of 66.7%, considering room conditions such as light and dark environments, and meat types including fatty and non-fatty. This application provides fast response times and a user-friendly interface. This application is expected to enable users to independently and efficiently verify the halal status of meat, thereby supporting the needs of Muslim consumers in the digital era.
Klasifikasi Sentimen Pada Dataset yang Terbatas Menggunakan Algoritma Convolutional Neural Network [Sentiment Classification on a Limited Dataset Using the Convolutional Neural Network Algorithm]
Saputra, M Ridho; Surya Agustian; Jasril; Novriyanto
Bulletin of Computer Science Research Vol. 5 No. 4 (2025): June 2025
Publisher : Forum Kerjasama Pendidikan Tinggi (FKPT)

DOI: 10.47065/bulletincsr.v5i4.613

Abstract

This study aims to analyze public responses to the appointment of Kaesang Pangarep as the Chairman of the Indonesian Solidarity Party (PSI) using a sentiment classification approach based on the Convolutional Neural Network (CNN) algorithm. The primary dataset consists of 300 Indonesian-language tweets categorized into three sentiment classes: positive, negative, and neutral. The limited size of the training data presents a major challenge, as it can hinder the model's ability to generalize. To address this issue, data augmentation was carried out by incorporating external datasets with Covid-19 and Open Topic themes. The preprocessing stages include text cleaning, normalization, and tokenization. The developed CNN model utilizes a layered architecture and applies regularization techniques such as L2 and dropout to reduce the risk of overfitting. Accuracy, F1-score, precision, and recall were used as performance evaluation metrics. Experimental results show that the best performance was achieved when the Kaesang and Covid-19 datasets were combined, yielding an F1-score of 0.62 on the validation set and 0.51 on the test set. These findings indicate that adding external data can improve classification accuracy, even under limited data conditions. This study contributes to the development of deep learning-based sentiment classification methods for Indonesian-language texts.
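
For a three-class task like this one, the F1-score is usually reported as an average over classes. The abstract does not say which averaging was used, so the sketch below assumes macro averaging (the unweighted mean of per-class F1) and uses toy labels rather than the paper's data.

```python
from typing import List

def macro_f1(y_true: List[str], y_pred: List[str]) -> float:
    """Unweighted mean of per-class F1 scores, with F1 = 2*TP / (2*TP + FP + FN)."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy labels (illustrative only).
y_true = ["pos", "pos", "neg", "neg", "neu", "neu"]
y_pred = ["pos", "neg", "neg", "neg", "neu", "pos"]
score = macro_f1(y_true, y_pred)
print(round(score, 4))  # per-class F1: neg 0.8, neu 2/3, pos 0.5
```

Because every class counts equally, macro F1 penalizes a model that does well only on the most frequent class.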
Analisis Sentimen Ulasan Aplikasi Indodax Pada Google Play Store Dengan Algoritma Random Forest [Sentiment Analysis of Indodax App Reviews on Google Play Store Using the Random Forest Algorithm]
Muhammad Iqbal Maulana; Yusra; Muhammad Fikry; Surya Agustian; Siti Ramadhani
Bulletin of Computer Science Research Vol. 5 No. 4 (2025): June 2025
Publisher : Forum Kerjasama Pendidikan Tinggi (FKPT)

DOI: 10.47065/bulletincsr.v5i4.626

Abstract

Crypto assets have become a global phenomenon, with a significant increase in the number of investors in Indonesia. Indodax, the largest crypto asset trading platform in Indonesia, has contributed to the growth of this ecosystem and received many user reviews on the Google Play Store. With more than 5 million downloads and 100 thousand reviews, sentiment analysis is an important tool for understanding user perceptions of Indodax services. Manual labeling shows that the majority of reviews are positive (3989 reviews), while neutral and negative sentiments account for 477 and 534 reviews respectively. The study tested the Random Forest method, optimized with GridSearchCV hyperparameter tuning, on 4 test scenarios. The best results were obtained in Scenario 4 (three preprocessing stages (cleaning, case folding, and tokenization) + Random Forest with hyperparameter tuning), with Precision 81%, Recall 64%, F1-Score 70%, and Accuracy 89%. The best parameter values were {'criterion': 'entropy', 'max_depth': None, 'max_features': 'sqrt', 'min_samples_leaf': 1, 'min_samples_split': 2, 'n_estimators': 100}. The study shows that every optimized experimental model scores higher than its non-optimized counterpart.
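
The label counts in this abstract are heavily skewed toward positive reviews, which helps explain the gap between the reported accuracy and recall. A short calculation, using only the counts stated above, shows the accuracy a trivial classifier would reach by always predicting the majority class. Whether the paper evaluated on this exact distribution is not stated, so treat the comparison as indicative.

```python
# Label counts reported in the abstract.
label_counts = {"positive": 3989, "neutral": 477, "negative": 534}

total = sum(label_counts.values())
majority_label = max(label_counts, key=label_counts.get)
# Accuracy of a trivial classifier that always predicts the majority class.
baseline_accuracy = label_counts[majority_label] / total

print(total)                        # 5000
print(majority_label)               # positive
print(round(baseline_accuracy, 4))  # 0.7978
```

Against a baseline near 80%, the reported 89% accuracy is a smaller gain than it first appears, which is why the recall and F1 figures are the more informative ones here.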
Perbandingan Performa Random Forest dan Long Short-Term Memory dalam Klasifikasi Teks Multilabel Terjemahan Hadits Bukhari: Comparison of Random Forest and Long Short-Term Memory Performance in Multilabel Text Classification of Bukhari Hadith Translation
Ahmad, Rizmah Zakiah Nur; Harahap, Nazruddin Safaat; Agustian, Surya; Iskandar, Iwan; Sanjaya, Suwanto
MALCOM: Indonesian Journal of Machine Learning and Computer Science Vol. 5 No. 3 (2025): MALCOM July 2025
Publisher : Institut Riset dan Publikasi Indonesia

DOI: 10.57152/malcom.v5i3.2046

Abstract

Hadith is the second main foundation of Islam, guiding Muslims in interpreting Islamic values and putting them into practice in various aspects of life. One of the most respected hadith narrators is Imam Bukhari, known for his precision and rigor in selecting authentic hadiths. This study uses data from Indonesian translations of hadiths from Sahih Bukhari, classified into three main categories: recommendation (anjuran), prohibition (larangan), and information (informasi). To identify the characteristics of each category, text classification was performed using two popular methods, Random Forest (RF) and Long Short-Term Memory (LSTM), which are known to be effective at processing large-scale, complex text data. The aim of this study is to test the difference in performance between the two methods in classifying hadiths whose data has been completed. The evaluation results show that RF achieved the highest accuracy of 89.48%, slightly ahead of LSTM at 88.52%. Both methods recorded the same Hamming Loss of 0.1048 (89.52%). These findings suggest that the completeness and quality of the Bukhari hadith data contribute to improved classification accuracy by giving the models better context and variation.
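
Hamming Loss, the multilabel metric reported in this abstract, is the fraction of individual label slots that are predicted incorrectly. The sketch below computes it in pure Python on toy indicator vectors for the three labels (anjuran, larangan, informasi); the values are illustrative, not data from the paper. It also checks that the parenthesized 89.52% equals 1 minus the reported loss of 0.1048.

```python
from typing import List

def hamming_loss(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Fraction of individual label slots predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(t != p for tr, pr in zip(y_true, y_pred) for t, p in zip(tr, pr))
    return wrong / total

# Columns: [anjuran, larangan, informasi]; toy values, not data from the paper.
y_true = [[1, 0, 1], [0, 1, 0], [0, 0, 1], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]

loss = hamming_loss(y_true, y_pred)
print(loss)                   # 2 wrong slots out of 12
print(round(1 - 0.1048, 4))   # 0.8952
```

Unlike exact-match accuracy, this metric gives partial credit when only some of a hadith's labels are predicted correctly.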
Perbandingan Performa Metode Klasifikasi Teks Multilabel Hadis Terjemahan Bukhari Menggunakan Support Vector Machine dan Long Short Term Memory: Performance Comparison of Multilabel Text Classification Methods on Translated Hadiths of Bukhari Using Support Vector Machine and Long Short Term Memory
Ramadhani, Aulia; Safaat, Nazruddin; Agustian, Surya; Iskandar, Iwan; Sanjaya, Suwanto
MALCOM: Indonesian Journal of Machine Learning and Computer Science Vol. 5 No. 3 (2025): MALCOM July 2025
Publisher : Institut Riset dan Publikasi Indonesia

DOI: 10.57152/malcom.v5i3.2051

Abstract

Hadith is the second source of law in Islam, and one of the best-known hadith collections is Shahih al-Bukhari. To support correct understanding and practice, hadiths need to be classified accurately. Because a single hadith can contain more than one piece of information, a multilabel classification approach is highly relevant. This study aims to contribute to the field of text classification by exploring the optimal combination of methods and parameters for multilabel hadith classification. The results show that Support Vector Machine (SVM) performed best on the Larangan (prohibition) label, with a Macro F1-score of 82.57%, using SVM + TF-IDF with kernel = linear and C (regularization parameter) = 1, without stopword removal and without balancing. Meanwhile, Long Short Term Memory (LSTM) also did best on the Larangan label, with a Macro F1-score of 82.66% using Epoch = 20, Dropout = 0.5, Dense = 128, and Batch Size = 64, without stopword removal and without balancing. This combination also produced the lowest Hamming Loss, 10.452%, which is better than in previous research and indicates that LSTM is more effective overall with proper parameter tuning. This study also contributes to improving data quality by completing the hadith texts (matan) used, resulting in better classification performance.
Klasifikasi Sentimen Menggunakan Metode Multilayer Perceptron dengan Fitur TF-IDF: Sentiment Classification Using Multilayer Perceptron Algorithm with TF-IDF Features
Arasy, Abdurrahman; Agustian, Surya; Handayani, Lestari; Iskandar, Iwan
MALCOM: Indonesian Journal of Machine Learning and Computer Science Vol. 5 No. 3 (2025): MALCOM July 2025
Publisher : Institut Riset dan Publikasi Indonesia

DOI: 10.57152/malcom.v5i3.2052

Abstract

Social media, particularly Twitter (X), has become a major platform for discussion of politics and government policy. A message on Twitter is known as a Tweet and is limited to 280 characters. Although Tweets often consist only of text, they can also include hyperlinks, videos, and other media types that can be used to gauge public opinion. This study aims to classify public sentiment regarding the appointment of Kaesang Pangarep as Chairman of the Indonesian Solidarity Party (PSI) using a Multi-Layer Perceptron (MLP) Classifier with Term Frequency-Inverse Document Frequency (TF-IDF) features, implemented in Python. The data consists of 300 tweets, with 100 tweets per class. The three categories are positive, neutral, and negative. The best method achieved an F1-score of 0.6767 and an accuracy of 0.6667. These results show that the combination of MLP Classifier and TF-IDF can, to a certain extent, overcome the limitations of a small dataset compared with the baseline method. This study also provides insight into optimizing sentiment classification under limited-data conditions, which can be applied to other topics with similar problems.
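
TF-IDF, the feature representation named in this abstract, weights a term by how often it occurs in a document and how rare it is across documents. The sketch below implements one textbook variant (raw count times log(N / document frequency)) in pure Python on toy tokenized tweets. It is an assumption that this matches the paper's setup: library implementations such as scikit-learn's use a smoothed, normalized variant by default.

```python
import math
from collections import Counter
from typing import Dict, List

def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Weight each term by its count in the document times log(N / document frequency)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        counts = Counter(doc)
        weighted.append({t: c * math.log(n_docs / doc_freq[t]) for t, c in counts.items()})
    return weighted

# Toy tokenized tweets (illustrative, not from the dataset).
docs = [["kaesang", "ketua", "psi"], ["kaesang", "bagus"], ["psi", "buruk", "buruk"]]
weights = tf_idf(docs)
print(weights[2]["buruk"])    # 2 * log(3 / 1): frequent here, rare elsewhere
print(weights[0]["kaesang"])  # 1 * log(3 / 2): appears in two of three tweets
```

The resulting sparse weights form the input vector that a classifier such as an MLP would consume.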