Analysis of Data and Feature Processing on Stroke Prediction using Wide Range Machine Learning Model
Wisesty, Untari Novia; Wirayuda, Tjokorda Agung Budi; Sthevanie, Febryanti; Rismala, Rita
JOIN (Jurnal Online Informatika) Vol 9 No 1 (2024)
Publisher : Department of Informatics, UIN Sunan Gunung Djati Bandung

DOI: 10.15575/join.v9i1.1249

Abstract

Stroke is a disease that causes the death of brain cells, so that the part of the body controlled by the affected area of the brain loses its function. If not treated immediately, it can cause long-term disability, brain damage, and death. In this research, stroke prediction was carried out on the Stroke dataset obtained from Kaggle using a range of machine learning models. Data sampling techniques, namely Random Undersampling (RUS), Random Oversampling (ROS), and SMOTE, were used to handle class imbalance in the dataset. Pearson Correlation and Principal Component Analysis were also used for dimensionality reduction and for analyzing the features most influential in predicting stroke. Pearson Correlation identified the five attributes with the highest Pearson coefficients: age, hypertension, heart disease, blood sugar level, and marital status. Experimental results show that RUS, ROS, and SMOTE improved the testing F1-Score by 43.44%, 34.44%, and 35.55% respectively, compared with experiments conducted without any data sampling. The highest testing F1-Score, 0.83, was achieved by the Support Vector Machine and Gaussian Naïve Bayes models.
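
The two simpler sampling techniques named in this abstract, random undersampling and random oversampling, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `random_undersample` and `random_oversample`, the seed parameter, and the toy data are assumptions made for illustration, and SMOTE (which synthesizes new minority samples by interpolation) is left out.

```python
import random
from collections import Counter


def _group_by_class(X, y):
    """Collect feature rows per class label."""
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(features)
    return by_class


def random_undersample(X, y, seed=0):
    """Randomly drop rows until every class matches the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        for features in rng.sample(rows, target):  # sample without replacement
            X_out.append(features)
            y_out.append(label)
    return X_out, y_out


def random_oversample(X, y, seed=0):
    """Randomly duplicate rows until every class matches the largest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for features in rows + extra:
            X_out.append(features)
            y_out.append(label)
    return X_out, y_out


# Toy imbalanced data (made up): 8 negative rows, 2 positive rows.
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2

X_rus, y_rus = random_undersample(X, y)
X_ros, y_ros = random_oversample(X, y)
print(Counter(y_rus))
print(Counter(y_ros))
```

Resampling of this kind is normally applied to the training split only, so that the test set keeps the original class distribution.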
Sentiment Analysis on a Large Indonesian Product Review Dataset
Romadhony, Ade; Al Faraby, Said; Rismala, Rita; Wisesty, Untari Novia; Arifianto, Anditya
Journal of Information Systems Engineering and Business Intelligence Vol. 10 No. 1 (2024): February
Publisher : Universitas Airlangga

DOI: 10.20473/jisebi.10.1.167-178

Abstract

Background: Publicly available large datasets play an important role in the development of natural language processing and computational linguistics research. However, only a few large Indonesian language datasets are currently accessible for research purposes, including datasets for sentiment analysis, which is considered the most popular task. Objective: The objective of this work is to present sentiment analysis on a large Indonesian product review dataset, employing various features and methods. Two tasks have been implemented: classifying reviews into three classes (positive, negative, neutral), and predicting ratings. Methods: Sentiment analysis was conducted on the FDReview dataset, comprising over 700,000 reviews. The analysis treated sentiment as a classification problem, employing the following methods: Multinomial Naïve Bayes (MNB), Support Vector Machine (SVM), LSTM, and BiLSTM. Result: Among the conventional methods, MNB outperformed SVM in rating prediction, whereas SVM performed better in the review classification task. The BiLSTM method outperformed all other methods in both tasks. This study also includes experiments conducted on balanced and unbalanced small-sized sample datasets. Conclusion: Analysis of the experimental results revealed that the deep learning-based method performed better only in the large dataset setting. Results from the small balanced dataset indicate that conventional machine learning methods exhibit competitive performance compared to deep learning approaches.

Keywords: Indonesian review dataset, Large dataset, Rating prediction, Sentiment analysis
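
To make the conventional baseline concrete, here is a minimal sketch of three-class review classification with Multinomial Naïve Bayes and Laplace smoothing. It is an illustration only, not the pipeline used in the paper: the helper names `train_mnb` and `predict_mnb`, the whitespace tokenizer, and the six English toy reviews are all assumptions, and the FDReview data itself is not used.

```python
import math
from collections import Counter, defaultdict


def train_mnb(docs, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()  # naive whitespace tokenizer (assumption)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {
        "class_counts": class_counts,
        "word_counts": word_counts,
        "vocab": vocab,
        "n_docs": len(labels),
    }


def predict_mnb(model, doc, alpha=1.0):
    """Return the class with the highest log posterior (alpha = Laplace smoothing)."""
    vocab_size = len(model["vocab"])
    # Words never seen in training are ignored.
    tokens = [t for t in doc.lower().split() if t in model["vocab"]]
    best_label, best_score = None, -math.inf
    for label, n_class in model["class_counts"].items():
        counts = model["word_counts"][label]
        total = sum(counts.values())
        score = math.log(n_class / model["n_docs"])  # log prior
        for t in tokens:
            score += math.log((counts[t] + alpha) / (total + alpha * vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy reviews (made up for illustration).
docs = [
    "good product love it",
    "very good quality",
    "bad product broke fast",
    "terrible bad quality",
    "it is okay",
    "okay average product",
]
labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

model = train_mnb(docs, labels)
print(predict_mnb(model, "good quality love"))
print(predict_mnb(model, "terrible bad"))
```

The same counting scheme would apply to rating prediction by using rating values as the class labels instead of sentiment classes.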