Garuda - Garba Rujukan Digital

Sentiment Analysis of Presidential Candidates in 2024: A Comparison of the Performance of Support Vector Machine and Random Forest with N-Gram Method
Ramadhan, Muhammad Rizki; Budiman, Kholiq
Recursive Journal of Informatics Vol. 3 No. 1 (2025): March 2025
Publisher: Universitas Negeri Semarang

DOI: 10.15294/rji.v3i1.8385

Abstract. This paper conducts a sentiment analysis of presidential candidates in Indonesia's 2024 election using Twitter data. Using the "Indonesia Presidential Candidate’s Dataset, 2024" from Kaggle, which contains 8,555 Twitter entries, sentiment was categorized as positive or negative. Preprocessing techniques cleaned and normalized the data, which was then labeled with the VADER lexicon. This study contributes insights into public sentiment towards presidential candidates and the effectiveness of machine learning algorithms for political sentiment analysis.
Purpose: This study aims to analyze public sentiment towards presidential candidates in Indonesia's 2024 election using the N-Gram method, and to compare the performance of the Support Vector Machine and Random Forest algorithms on this task. It seeks to provide insights into the electorate's perceptions and preferences, contributing to a deeper understanding of the political landscape during this crucial period.
Methods/Study design/approach: The study applies Support Vector Machine (SVM) and Random Forest classifiers to the dataset of 8,555 tweets about Indonesia’s 2024 presidential candidates. For feature extraction, SVM is paired with TF-IDF and Random Forest is paired with N-Gram. The data is labeled using the VADER lexicon.
Result/Findings: Random Forest with N-Gram achieved 85% accuracy, outperforming SVM with TF-IDF at 82%.
Novelty/Originality/Value: This study provides insights into sentiment analysis applied to the 2024 Indonesian presidential election, enhancing understanding of public sentiment dynamics. The comparison of SVM with TF-IDF and Random Forest with N-Gram contributes to the field and suggests avenues for future research, such as integrating contextual information or social network analysis for deeper insights into political opinion trends.
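To illustrate the N-Gram features this abstract refers to, here is a minimal sketch in plain Python that counts word unigrams and bigrams. The function names and the example sentence are assumptions made for illustration; the paper's actual tokenization and n-gram range are not stated in the abstract.

```python
from collections import Counter


def word_ngrams(tokens, n):
    """Return the list of contiguous n-token tuples in `tokens`."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text, n_values=(1, 2)):
    """Count word n-grams for each n in `n_values` (unigrams and bigrams by default)."""
    tokens = text.lower().split()
    counts = Counter()
    for n in n_values:
        counts.update(word_ngrams(tokens, n))
    return counts


# Hypothetical example sentence, not taken from the dataset.
features = ngram_counts("good leader good vision")
```

Counts like these would typically be turned into a fixed-length vector before being passed to a classifier such as Random Forest.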

Textual Entailment for Non-Disclosure Agreement Contract Using ALBERT Method
Azmi, Abdillah; Alamsyah, Alamsyah
Recursive Journal of Informatics Vol. 3 No. 1 (2025): March 2025
Publisher: Universitas Negeri Semarang

DOI: 10.15294/rji.v3i1.9730

Purpose: A Non-Disclosure Agreement (NDA) is a type of contract. An NDA binds two or more parties who agree that certain information shared or created by one party is confidential. This type of contract serves to protect sensitive information, maintain patent rights, or control the information shared. Reading and understanding a contract is a repetitive, time-consuming, and labor-intensive process, yet it remains crucial in the business world because a contract binds two or more parties under the law. This makes the task well suited to deep learning. Therefore, this research aims to develop and test a pretrained language model designed for understanding contracts through the Natural Language Inference task.
Method: The model is trained to perform the language inference task of textual entailment using the CNLI (Contract NLI) dataset. An ALBERT-base model tuned to perform textual entailment is used, along with LambdaLR for early stopping and AdamW as the optimizer. The model is trained on the CNLI dataset several times with different hyperparameters.
Result: The ALBERT-base model achieved an accuracy score of 85 and an EM score of up to 85.04 percent. Although this score is not the state of the art on the CNLI benchmark, the trained model outperforms other base-size models built on BERT and BART, such as SpanNLI BERT-base, SCROLLS (BART-base), and Unlimiformer (BART-base).
Value: ALBERT is a model that focuses on memory efficiency and a small parameter count while maintaining performance. It is suitable for tasks that require long-context understanding with minimal hardware requirements. Such a model could be promising for the future of NLP in the legal domain.

Sentiment Analysis of Jobstreet Application Reviews on Google Play Store Using Support Vector Machine Algorithm with Adaptive Synthetic
Shantika, Febryan Surya; Abidin, Zaenal
Recursive Journal of Informatics Vol. 3 No. 2 (2025): September 2025
Publisher: Universitas Negeri Semarang

DOI: 10.15294/rji.v3i2.11891

Abstract. Purpose: This research aims to test the performance of the Support Vector Machine (SVM) classification algorithm combined with Adaptive Synthetic (ADASYN) oversampling for analyzing sentiment in Jobstreet application reviews on the Google Play Store. Sentiment analysis is an important method for understanding market needs and guiding application improvement.
Methods/Study design/approach: The dataset consists of Google Play reviews obtained by scraping, comprising 5,174 reviews with 11 attributes. The process begins with data scraping, data labeling, and data preprocessing, including case folding, tokenizing, filtering, and stemming, implemented in Python. The data is then weighted and split using an 80:20 ratio. ADASYN oversampling is applied to the clean dataset before SVM classification.
Result/Findings: Two scenarios were evaluated with SVM classification. Without ADASYN, SVM produced an accuracy of 89.08%. With the ADASYN sampling approach, SVM produced an accuracy of 89.95%, an increase of 0.87 percentage points.
Novelty/Originality/Value: This study compares two SVM classification scenarios, with and without the ADASYN sampling approach. It contributes to the literature by demonstrating the usability of ADASYN oversampling to optimize SVM classification results for sentiment analysis of Jobstreet application reviews on the Google Play Store.
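The preprocessing steps named in this abstract (case folding, tokenizing, filtering) can be sketched as follows. This is a simplified illustration, not the authors' pipeline: the stopword list is a hypothetical placeholder, the tokenizer assumes plain alphabetic text, and stemming is omitted because it depends on a language-specific library.

```python
import re

# Hypothetical stopword list for illustration; a real pipeline would use a full list.
STOPWORDS = {"the", "is", "a", "an", "and", "to", "of"}


def preprocess(review):
    """Case-fold, tokenize, and filter stopwords from one review string."""
    folded = review.lower()                           # case folding
    tokens = re.findall(r"[a-z]+", folded)            # tokenizing: keep alphabetic runs
    return [t for t in tokens if t not in STOPWORDS]  # filtering


# Hypothetical review text, not taken from the scraped dataset.
tokens = preprocess("The app is GREAT, and easy to use!")
```

The resulting token lists would then be weighted (for example with TF-IDF) before the 80:20 split described above.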

Improving Pantun Generator Performance with Fine Tuning Generative Pre-Trained Transformers
Sodikkun, Achmat; Budiman, Kholiq
Recursive Journal of Informatics Vol. 3 No. 2 (2025): September 2025
Publisher: Universitas Negeri Semarang

DOI: 10.15294/ge6xey51

Purpose: The study aims to address the challenges in generating high-quality pantun, an important element of Indonesian cultural heritage. Traditional methods struggle with limited vocabulary, variation, and consistency in rhyme patterns. This research seeks to enhance the performance of a pantun generator by fine-tuning the Generative Pre-trained Transformer (GPT) model, coupled with post-processing and validation by linguistic experts.
Methods/Study design/approach: The research fine-tunes the GPT model on a dataset of Indonesian pantun. The methodology includes dataset collection, data pre-processing for cleaning and adjustment, and hyperparameter optimization. The effectiveness of the model is evaluated using perplexity and rhyme accuracy metrics. The study also applies post-processing to refine the generated pantun further.
Result/Findings: The study achieved a best perplexity value of 14.64, indicating strong predictive performance by the model. Post-processing improved the rhyme accuracy of the generated pantun to 89%, a substantial improvement over the previous study by Siallagan and Alfina, which achieved only 50%. These results demonstrate that fine-tuning the GPT model, supported by appropriate hyperparameter settings and post-processing techniques, effectively enhances the quality of generated pantun.
Novelty/Originality/Value: This research contributes to the development of generative applications in Indonesian, particularly in the context of cultural preservation. The findings highlight the potential of fine-tuning GPT models to improve language generation tasks and provide valuable insights for creative and educational applications. The validation by experts ensures that the generated pantun adheres to established writing standards.
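For readers unfamiliar with the perplexity metric reported here, the sketch below shows the standard definition: the exponential of the mean negative log-probability the model assigns to each token. The log-probabilities are hypothetical values chosen for illustration; the abstract does not say how the authors computed their figure.

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp of the mean negative natural-log probability per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Hypothetical log-probabilities: a model that is always uniformly unsure
# among 4 tokens has a perplexity of 4.
uniform = [math.log(0.25)] * 8
ppl = perplexity(uniform)
```

Lower values mean the model is, on average, less surprised by the text it is evaluated on.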

Neural Network Optimization Using Hybrid Adaptive Mutation Particle Swarm Optimization and Levenberg-Marquardt in Cases of Cardiovascular Disease
Cahyani, Rima Ayu; Purwinarko, Aji
Recursive Journal of Informatics Vol. 2 No. 2 (2024): September 2024
Publisher: Universitas Negeri Semarang

DOI: 10.15294/192vyt90

Abstract. Cardiovascular disease is a condition generally characterized by the narrowing or blockage of blood vessels, which can lead to heart attacks, chest pain, or strokes. It is the leading cause of death worldwide, accounting for approximately 31% of all deaths, or 17.9 million deaths each year. Deaths caused by cardiovascular disease are projected to continue increasing until 2030, with the number of patients reaching 23.3 million. As deaths from cardiovascular disease become more prevalent, early detection is crucial to reduce mortality rates.
Purpose: Many previous researchers have conducted studies on predicting cardiovascular disease using neural network methods. This study extends these methods by incorporating feature selection and optimization with Hybrid AMPSO-LMA. The research explores the implementation and predictive outcomes of Hybrid AMPSO-LMA in optimizing an MLP for cases of cardiovascular disease.
Methods/Study design/approach: The first step is to download the Heart Disease Dataset from Kaggle.com. The dataset is preprocessed by removing duplicates and transforming the data. Then, data mining is carried out using the MLP algorithm optimized with Hybrid AMPSO-LMA to obtain results and conclusions. The system is built in Python and uses Flask to provide website access in HTML.
Result/Findings: The results demonstrate that the proposed method improves the accuracy of predicting cardiovascular disease. The MLP algorithm alone yields an accuracy of 86.1%; after optimization with Hybrid AMPSO-LMA, the accuracy increases to 86.88%.
Novelty/Originality/Value: This effort will contribute to the development of a more reliable and effective cardiovascular disease prediction system, with the goal of early identification of individuals exhibiting symptoms of cardiovascular disease.

Implementation of Random Forest with Synthetic Minority Oversampling Technique and Particle Swarm Optimization for Predicting Survival of Heart Failure Patients
Zaaidatunni'mah, Untsa; Sugiharti, Endang
Recursive Journal of Informatics Vol. 2 No. 2 (2024): September 2024
Publisher: Universitas Negeri Semarang

DOI: 10.15294/vavw2205

Abstract. Heart failure is caused by a disruption in the heart’s muscle wall, which leaves the heart unable to pump enough blood to meet the body’s demand. The increasing prevalence and mortality rates of heart failure can be reduced through early disease detection using data mining. Data mining aids in discovering and interpreting specific patterns to support decision-making based on processed information, and it has been applied in various fields, including healthcare. One data mining technique used for prediction is classification.
Purpose: This research aims to apply SMOTE and PSO to the Random Forest classification algorithm in predicting the survival of heart failure patients and to determine the resulting accuracy.
Methods/Study design/approach: To predict the survival of heart failure patients, we use the Random Forest classification algorithm and incorporate data imbalance handling with SMOTE and feature selection with PSO on the Heart Failure Clinical Records Dataset. The data mining process consists of three distinct phases.
Result/Findings: Applying SMOTE and PSO to the Heart Failure Clinical Records Dataset in the Random Forest classification process resulted in an accuracy of 93.9%. In contrast, Random Forest classification without SMOTE and PSO resulted in an accuracy of only 88.33%. This indicates that the proposed combination of methods can optimize the performance of the classification algorithm, achieving a higher accuracy than previous research.
Novelty/Originality/Value: Data imbalance and irrelevant features in the Heart Failure Clinical Records Dataset significantly impact the classification process. Therefore, this research uses SMOTE as a data balancing method and PSO as a feature selection technique on the Heart Failure Clinical Records Dataset before classification with the Random Forest algorithm.
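The core idea behind SMOTE, as used in this study for data balancing, is to create synthetic minority-class samples by interpolating between an existing sample and one of its nearest minority-class neighbors. The sketch below is a simplified illustration under stated assumptions: it uses only the single nearest neighbor (standard SMOTE picks randomly among k neighbors), and the data points and function names are hypothetical.

```python
import random


def smote_sample(x, neighbor, rng):
    """Create one synthetic point on the line segment between x and a neighbor."""
    gap = rng.random()  # uniform in [0, 1)
    return [xi + gap * (ni - xi) for xi, ni in zip(x, neighbor)]


def nearest_neighbor(x, others):
    """Return the point in `others` closest to x by squared Euclidean distance."""
    return min(others, key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))


# Hypothetical minority-class samples with two numeric features each.
minority = [[1.0, 2.0], [1.5, 2.5], [8.0, 9.0]]
rng = random.Random(0)  # fixed seed so the sketch is reproducible
x = minority[0]
neighbor = nearest_neighbor(x, minority[1:])
synthetic = smote_sample(x, neighbor, rng)
```

Repeating this for many minority samples raises the minority-class count until the classes are balanced.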

Analysis Of The Use Of Nazief-Adriani Stemming And Porter Stemming In Covid-19 Twitter Sentiment Analysis With Term Frequency-Inverse Document Frequency Weighting Based On K-Nearest Neighbor Algorithm
Fikri, Muhammad; Abidin, Zaenal
Recursive Journal of Informatics Vol. 2 No. 2 (2024): September 2024
Publisher: Universitas Negeri Semarang

DOI: 10.15294/fqc79v89

Abstract. This system was developed to determine the accuracy of sentiment analysis of Twitter data on the COVID-19 issue using the Nazief-Adriani and Porter stemmers with TF-IDF weighting and K-Nearest Neighbor (KNN) classification. The comparison yielded 50.98% for Nazief-Adriani and 48.24% for Porter.
Purpose: This research aims to determine the accuracy of the Nazief-Adriani and Porter stemmer algorithms in text preprocessing using an Indonesian-language Twitter dataset. It involves word weighting using TF-IDF and classification using the K-Nearest Neighbor (KNN) algorithm.
Methods/Study design/approach: The experiment applied the Nazief-Adriani and Porter stemmer algorithms to COVID-19-related data sourced from Twitter. The data underwent text preprocessing, stemming, TF-IDF weighting, and accuracy testing on training and testing data using the KNN algorithm, and the accuracy of both stemmers was calculated from a confusion matrix.
Result/Findings: The Nazief-Adriani stemmer achieved an accuracy of 50.98% in sentiment analysis of Indonesian-language COVID-19-related Twitter data, while the Porter stemmer achieved 48.24%.
Novelty/Originality/Value: Feature selection is crucial in stemmer accuracy testing. Therefore, in this study, feature selection is carried out using the Nazief-Adriani and Porter stemmers, and classification is conducted using the K-Nearest Neighbor (KNN) algorithm.
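Since the accuracies in this abstract are derived from a confusion matrix, the following sketch shows how that calculation works in general: accuracy is the sum of the diagonal (correct predictions) divided by the total number of predictions. The labels below are hypothetical and are not the study's data.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Build a matrix where rows are true labels and columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Accuracy = correct predictions (the diagonal) divided by all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical labels, not the study's data.
y_true = ["pos", "neg", "neg", "pos"]
y_pred = ["pos", "neg", "pos", "neg"]
cm = confusion_matrix(y_true, y_pred, labels=["neg", "pos"])
acc = accuracy(cm)
```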

Implementation of Raita Algorithm in Manado-Indonesia Translation Application with Text Suggestion Using Levenshtein Distance Algorithm
Sekartaji, Novanka Agnes; Arifudin, Riza
Recursive Journal of Informatics Vol. 2 No. 2 (2024): September 2024
Publisher: Universitas Negeri Semarang

DOI: 10.15294/pkvgtg90

Abstract. Manado City is a multidimensional and multicultural city with assets that are considered to have high potential for development as tourist attractions. The tourism assets currently being developed by the Manado City government are cultural, as they hold a particular appeal for tourists. A communication tool in the form of a translation application is therefore needed to facilitate communication between visiting tourists and the native community of North Sulawesi, as well as for newcomers who intend to reside in North Sulawesi, given that the Manado language is the primary means of communication within the community. This research combines the Raita algorithm and the Levenshtein distance algorithm, and uses the confusion matrix method to calculate the accuracy of translation results produced with the Levenshtein-based text suggestion feature. The research begins by collecting a dataset of Manado vocabulary and its Indonesian translations, sourced from literature studies and native respondents from North Sulawesi and checked by a validator to prevent translation errors. The dataset is then preprocessed by converting its entire content to lowercase (case folding) and removing spaces at the start and end of texts with the trim function. Next, both algorithms are implemented, with the Raita algorithm handling translation and the Levenshtein distance algorithm providing text suggestions for typing errors during translation. The confusion matrix calculations for the translation of 100 vocabulary words, including typing errors, indicate that the Levenshtein distance algorithm translates vocabulary correctly even in the presence of typing errors, with an accuracy of 94.17%.
Purpose: To determine how the Levenshtein distance and Raita algorithms are implemented in the Manado-Indonesian translation application, and the resulting accuracy level.
Methods/Study design/approach: A combination of the Raita and Levenshtein distance algorithms is used in the translation application system, along with the confusion matrix method to calculate accuracy.
Result/Findings: The accuracy achieved in the translation process using text suggestions from the Levenshtein distance algorithm is 94.17%.
Novelty/Originality/Value: This research demonstrates that the combination of the Raita and Levenshtein distance algorithms yields optimal results in vocabulary translation and provides accurate outcomes through effective text suggestions. Nearly all the data used was successfully translated by the system, even in the presence of typographical errors.
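The text-suggestion step described in this abstract can be sketched with a standard dynamic-programming Levenshtein distance: the misspelled input is compared against every dictionary entry and the closest one is suggested. This is an illustrative sketch only; the `suggest` helper and the small vocabulary are hypothetical, and the application's actual dictionary and tie-breaking rules are not given in the abstract.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute cost 1)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def suggest(word, vocabulary):
    """Return the vocabulary entry with the smallest edit distance to `word`."""
    query = word.strip().lower()  # trim and case-fold, mirroring the preprocessing
    return min(vocabulary, key=lambda entry: levenshtein(query, entry))


# Hypothetical vocabulary entries for illustration only.
vocabulary = ["makan", "minum", "tidur"]
best = suggest(" Makna ", vocabulary)
```

In a full application, the suggested entry would then be looked up to retrieve its translation.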

Optimizing Random Forest for Predicting Thoracic Surgery Success in Lung Cancer Using Recursive Feature Elimination and GridSearchCV
Putra, Deonisius Germandy Cahaya; Putra, Anggyi Trisnawan
Recursive Journal of Informatics Vol. 2 No. 2 (2024): September 2024
Publisher: Universitas Negeri Semarang

DOI: 10.15294/cax5k765

Abstract. Lung cancer is one of the deadliest forms of cancer, claiming numerous lives annually. Thoracic surgery is a strategy to manage lung cancer patients; however, it poses high risks, including potential nerve damage and fatal complications. Predicting the success rate of thoracic surgery for lung cancer patients can be accomplished using data mining techniques based on classification principles. Medical data mining involves employing mathematical, statistical, and computational methods. In this study, the prediction of thoracic surgery success employs the random forest algorithm with recursive feature elimination for feature selection. The feature selection process yields the top 8 features: 'DGN', 'PRE4', 'PRE5', 'PRE6', 'PRE10', 'PRE14', 'PRE30', and 'AGE'. Hyperparameter tuning using GridSearchCV is then applied to enhance classification accuracy. The results of this method demonstrate a predictive accuracy of 91.41%.
Purpose: The study aims to develop and evaluate a Random Forest model with Recursive Feature Elimination for feature selection and GridSearchCV for hyperparameter tuning, in order to predict the thoracic surgery success rate.
Methods: This study uses the thoracic surgery dataset and applies various data preprocessing techniques. The dataset is then classified using the Random Forest algorithm, with Recursive Feature Elimination used to obtain the best features. GridSearchCV is used for hyperparameter tuning.
Result: The Random Forest algorithm with Recursive Feature Elimination and GridSearchCV hyperparameter tuning achieved an accuracy of 91.41%. This accuracy was obtained with the following parameter values: bootstrap set to False, criterion set to gini, n_estimators equal to 100, max_depth set to None, min_samples_split equal to 4, min_samples_leaf equal to 1, max_features set to auto, n_jobs set to -1, and verbose set to 2, with 10-fold cross-validation.
Novelty: In this study, various dataset preprocessing methods and model configurations are compared and analyzed to find the best model for predicting the success rate of thoracic surgery. The study also employs feature selection to choose the best features for classification, as well as hyperparameter tuning to achieve optimal accuracy and discover the optimal hyperparameter values.
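The exhaustive search that GridSearchCV performs can be illustrated with a small standard-library sketch: every combination of candidate values is scored and the best one is kept. This is a stand-in written with `itertools.product`, not scikit-learn itself; the grid values and the `toy_score` function are hypothetical, with `toy_score` taking the place of the mean cross-validated accuracy that a real search would compute.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Try every parameter combination and return the best one with its score."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid; the values are illustrative, not the study's search space.
param_grid = {
    "n_estimators": [50, 100],
    "min_samples_split": [2, 4],
    "bootstrap": [True, False],
}


def toy_score(params):
    """Stand-in for cross-validated accuracy; peaks at one arbitrary setting."""
    return (-abs(params["n_estimators"] - 100)
            - abs(params["min_samples_split"] - 4)
            - int(params["bootstrap"]))


best_params, best_score = grid_search(param_grid, toy_score)
```

With 2 x 2 x 2 candidate values, the search evaluates 8 combinations; the cost grows multiplicatively with each added parameter, which is why grids are usually kept small.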

Sentiment Analysis on Twitter Social Media Regarding Covid-19 Vaccination with Naive Bayes Classifier (NBC) and Bidirectional Encoder Representations from Transformers (BERT)
Saputra, Angga Riski Dwi; Prasetiyo, Budi
Recursive Journal of Informatics Vol. 2 No. 2 (2024): September 2024
Publisher: Universitas Negeri Semarang

DOI: 10.15294/7h63ma50

Abstract. The Covid-19 vaccine is an important tool to stop the Covid-19 pandemic; however, the public has expressed both support for and opposition to it.
Purpose: These responses were conveyed by the public in many ways, one of which is through social media such as Twitter. Public responses regarding Covid-19 vaccination can be analyzed and categorized as having positive, neutral, or negative sentiment.
Methods: In this study, sentiment analysis of Covid-19 vaccination tweets was carried out using the Naïve Bayes Classifier (NBC) and Bidirectional Encoder Representations from Transformers (BERT) algorithms. The data consists of 29,447 public English-language tweets about Covid-19 vaccination.
Result: Sentiment analysis begins with preprocessing of the dataset for data normalization and data cleaning before classification. Word vectorization was then performed with TF-IDF, and the data was classified using the NBC and BERT algorithms. The classification results show an accuracy of 73% for the NBC algorithm and 83% for the BERT algorithm.
Novelty: A direct comparison between classical models such as NBC and modern deep learning models such as BERT offers new insights into the advantages and disadvantages of both approaches in processing Twitter data. Additionally, this study proposes temporal sentiment analysis, which allows evaluating changes in public sentiment regarding vaccination over time. Another innovation is a hybrid approach to data cleansing that combines traditional methods with the natural language processing capabilities of BERT, which more effectively addresses typical Twitter data issues such as slang and spelling errors. Finally, this research also expands sentiment classification to be multi-label, identifying more specific sentiment categories such as trust, fear, or doubt, which provides a deeper understanding of public opinion.
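As a rough illustration of the classical side of this comparison, the sketch below implements a small multinomial Naive Bayes classifier with add-one (Laplace) smoothing. It is an assumption-laden simplification: it works on raw token counts rather than the TF-IDF weights used in the study, and the training examples and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(documents):
    """Fit a multinomial Naive Bayes model from (tokens, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in documents:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary


def predict_nb(model, tokens):
    """Return the label with the highest log-posterior, using add-one smoothing."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for token in tokens:
            score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training tweets (already tokenized), not from the study's dataset.
train = [
    (["safe", "effective"], "positive"),
    (["effective", "grateful"], "positive"),
    (["scared", "risk"], "negative"),
    (["risk", "doubt"], "negative"),
]
model = train_nb(train)
label_a = predict_nb(model, ["effective", "safe"])
label_b = predict_nb(model, ["risk"])
```

Working in log space avoids numeric underflow when many small probabilities are multiplied together.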


Contact Email: rji@mail.unnes.ac.id
Journal Mail Official: rji@mail.unnes.ac.id
Editorial Address: Sekaran, Kec. Gn. Pati, Kota Semarang, Jawa Tengah 50229
Location: Kota Semarang, Jawa Tengah, Indonesia
