Contact Name
Husni Teja Sukmana
Contact Email
husni@bright-journal.org
Phone
+62895422720524
Journal Mail Official
jads@bright-journal.org
Editorial Address
Gedung FST UIN Jakarta, Jl. Lkr. Kampus UIN, Cemp. Putih, Kec. Ciputat Tim., Kota Tangerang Selatan, Banten 15412
Location
Kota Adm. Jakarta Pusat,
DKI Jakarta
INDONESIA
Journal of Applied Data Sciences
Published by Bright Publisher
ISSN : -     EISSN : 27236471     DOI : doi.org/10.47738/jads
One of the current hot topics in science is data: how can datasets be used in scientific and scholarly research in a more reliable, citable and accountable way? Data is of paramount importance to scientific progress, yet most research data remains private. Enhancing the transparency of the processes applied to collect, treat and analyze data will help to render scientific research results reproducible and thus more accountable. The datasets themselves should also be accessible to other researchers, so that research publications, dataset descriptions, and the actual datasets can be linked. The journal Data provides a forum to publish methodical papers on processes applied to data collection, treatment and analysis, as well as for data descriptors publishing descriptions of a linked dataset.
Articles in Vol 5, No 4: DECEMBER 2024 (53 Documents)
A Comprehensive Stacking Ensemble Approach for Stress Level Classification in Higher Education
Fonda, Hendry; Irawan, Yuda; Melyanti, Rika; Wahyuni, Refni; Muhaimin, Abdi
Journal of Applied Data Sciences Vol 5, No 4: DECEMBER 2024
Publisher : Bright Publisher

DOI: 10.47738/jads.v5i4.388

Abstract

This research develops a comprehensive stacking ensemble model for classifying student stress levels in higher education, specifically at Hang Tuah University Pekanbaru. Using a physiological dataset that includes SpO2, heart rate, body temperature, and systolic and diastolic pressure, the study assigns college students to four main categories: anxious, calm, tense, and relaxed. The data, taken from public health centers between 2021 and 2024, were processed using SMOTE to overcome class imbalance, with K-Fold Cross Validation for model validation. The model combines base algorithms such as SVM, Logistic Regression, Multilayer Perceptron, and Random Forest, enhanced by boosting through AdaBoost, with XGBoost as a meta-model. The test results show that the proposed stacking model achieves 95% accuracy with an AUC of 0.95, which indicates excellent classification performance. The model not only excels at detecting more extreme stress conditions such as anxiety, but also reliably classifies conditions that are harder to distinguish, such as tense and relaxed. The study concludes that the stacking ensemble approach significantly improves prediction accuracy and stability compared to traditional models. For future research, it is recommended to explore deep learning-based meta-models such as LSTM and BiLSTM, as well as rotation techniques in stacking, to improve model performance and flexibility. The findings are expected to contribute significantly to the development of more sophisticated and effective stress detection models.
Comparison of MobileNet and VGG16 CNN Architectures for Web-based Starfish Species Identification System
Latumakulita, Luther Alexander; Paat, Frangky J.; Saroyo, Saroyo; Karim, Irwan; Astawa, I Nyoman Gede Arya; Sirait, Hasanuddin
Journal of Applied Data Sciences Vol 5, No 4: DECEMBER 2024
Publisher : Bright Publisher

DOI: 10.47738/jads.v5i4.456

Abstract

Bunaken Marine Park (BMP) is famous for its rich marine biodiversity. BMP is an asset for the marine tourism industry of the Manado city government and the North Sulawesi Province of Indonesia, and it needs to be strengthened. This research aims to build a web-based intelligent system that uses a convolutional neural network (CNN) to identify starfish species, as a first step toward a media center for marine biota identification at BMP. Two CNN architectures, MobileNet and VGG16, were used to produce identification models. In the first stage, the models were trained on 1800 starfish images and evaluated using 5-fold cross-validation. Validation results show that MobileNet is superior to VGG16, achieving a validation accuracy of 100% on every fold, while VGG16 produces validation accuracy in the range of 94% to 100%. In the second stage of model testing, however, VGG16 worked better than MobileNet in identifying 200 new images: the best VGG16 model achieved a testing accuracy of 100%, while MobileNet achieved 99.5%. Nevertheless, stability analysis of the identification models shows that MobileNet has relatively small loss values, ranging from 0.00069325 to 0.00214802, as well as a smaller standard deviation of 0.27 compared to 0.61 for VGG16. These findings indicate that MobileNet is more stable than VGG16 in carrying out identification work; thus, the best MobileNet model was deployed on the web platform, which was built with the Python Flask framework. The proposed system can be used to strengthen the marine tourism industry as a media center for marine biota education using deep learning approaches.
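The 5-fold cross-validation step mentioned in this abstract can be illustrated with a minimal sketch. It only shows how 1800 samples divide into five train/validation splits; the function name `k_fold_indices` is hypothetical, and the abstract does not say whether the authors shuffled or stratified the images, so neither is done here.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) index lists for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


# 1800 images and k=5 are the figures quoted in the abstract.
folds = list(k_fold_indices(1800, 5))
for i, (train_idx, val_idx) in enumerate(folds, start=1):
    print(f"fold {i}: train={len(train_idx)} val={len(val_idx)}")
```

Each image is used for validation exactly once and for training in the other four folds. In practice the indices would usually be shuffled (and often stratified by class) before splitting.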
Enforcement of Community Activity Restrictions Level Prediction in Jakarta Using Long Short-Term Memory Network
Dewangga, Chendra; Hansun, Seng
Journal of Applied Data Sciences Vol 5, No 4: DECEMBER 2024
Publisher : Bright Publisher

DOI: 10.47738/jads.v5i4.318

Abstract

The enforcement of restrictions on community activities (Pemberlakuan Pembatasan Kegiatan Masyarakat, PPKM) is the Indonesian government's strategy for handling the spread of COVID-19. PPKM is divided into four levels, which determine the types of restriction to be implemented in a region. In this study, we aim to build a website that can predict PPKM levels from the daily positive and death cases of COVID-19 recorded in Jakarta, Indonesia. The prediction system uses a Long Short-Term Memory (LSTM) network, with Node.js as the backend of the website. We also introduce a multivariate approach to this regression task by feeding both the daily positive and death case numbers into the LSTM network. Based on evaluation using Mean Absolute Percentage Error (MAPE) and Root Mean Square Error (RMSE), we conclude that the proposed LSTM method predicts death cases accurately, with 0.17% MAPE and 22.68 RMSE, but performs poorly in predicting daily positive cases, with 53.11% MAPE and 27.15 RMSE. This may stem from the multivariate approach used during model development, where more variation in the daily positive cases was detected.
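The two evaluation metrics named in this abstract can be computed as in the sketch below. The helper names `mape` and `rmse` and the sample numbers are illustrative only and are not taken from the paper.

```python
import math


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent (undefined if any actual value is 0)."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Square Error, in the same units as the series."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Made-up values, purely for illustration.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]

print(f"MAPE = {mape(actual, predicted):.2f}%")
print(f"RMSE = {rmse(actual, predicted):.2f}")
```

MAPE is relative to the magnitude of the actual values, while RMSE is expressed in the units of the series, so two series can have similar RMSE values and very different MAPE values.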
Optimizing LSTM with Grid Search and Regularization Techniques to Enhance Accuracy in Human Activity Recognition
Budiarso, Zuly; Listiyono, Hersatoto; Karim, Abdul
Journal of Applied Data Sciences Vol 5, No 4: DECEMBER 2024
Publisher : Bright Publisher

DOI: 10.47738/jads.v5i4.433

Abstract

This study aims to enhance the accuracy of Long Short-Term Memory (LSTM) models for human activity recognition using the UCI Human Activity Recognition (HAR) dataset. The dataset comprises time-series data from accelerometer and gyroscope sensors on smartphones worn by 30 volunteers as they performed everyday activities such as walking, climbing stairs, descending stairs, sitting, standing, and lying down. Optimization was carried out using Grid Search for hyperparameter tuning and L2 regularization to prevent overfitting. The results show that the optimized LSTM model improved accuracy from 92.33% to 94.50%, precision from 93.12% to 94.61%, recall from 92.33% to 94.50%, and F1-score from 92.32% to 94.51% compared to the standard LSTM model. Despite these improvements, the study encountered several challenges, particularly in tuning hyperparameters, which required significant computational resources and time due to the complexity of the search space. Additionally, balancing regularization to prevent both underfitting and overfitting proved to be a delicate process. Further limitations include the model's performance variability with different sensor placements and potential overfitting to specific activity patterns. However, the implementation of hyperparameter optimization and regularization proved effective in improving the model's ability to recognize human activity patterns from complex sensor data. Therefore, this approach holds significant potential for broader applications in sensor-based human activity recognition systems, though further research is needed to address these limitations and generalize the findings.
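Grid search, as used in this abstract for hyperparameter tuning, evaluates every combination in a predefined grid and keeps the best one. The sketch below shows the idea in plain Python. The grid values, the parameter names (`units`, `l2`, `learning_rate`), and the `toy_score` function are all hypothetical stand-ins; in a real run the score would come from training an LSTM and measuring validation accuracy.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Exhaustively score every parameter combination and return the best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical search space; the abstract does not list the values tried.
param_grid = {
    "units": [32, 64, 128],
    "l2": [0.0, 0.001, 0.01],
    "learning_rate": [0.001, 0.01],
}


def toy_score(params):
    # Stand-in for "train the model, return validation accuracy".
    # It peaks at units=64, l2=0.001, learning_rate=0.001 by construction.
    return (
        -abs(params["units"] - 64) / 64
        - abs(params["l2"] - 0.001) * 100
        - abs(params["learning_rate"] - 0.001) * 10
    )


best_params, best_score = grid_search(param_grid, toy_score)
n_combinations = 1
for grid_values in param_grid.values():
    n_combinations *= len(grid_values)
print(best_params, n_combinations)
```

Even this small grid requires 3 x 3 x 2 = 18 full training runs, which illustrates why the abstract reports hyperparameter tuning as computationally expensive.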
How Effective are Different Machine Learning Algorithms in Predicting Legal Outcomes in South Africa?
Khosa, Joe; Mashao, Daniel; Olanipekun, Ayorinde; Harley, Charis
Journal of Applied Data Sciences Vol 5, No 4: DECEMBER 2024
Publisher : Bright Publisher

DOI: 10.47738/jads.v5i4.215

Abstract

This study examines the effectiveness of different machine learning algorithms in predicting legal outcomes in South Africa's judicial system. Considering the advancement of artificial intelligence in the legal sector, this research aims to assess the effectiveness of various machine learning algorithms within the legal domain. Text classification is performed using machine learning algorithms, including Logistic Regression, Random Forest, and K-Nearest Neighbours, with datasets obtained from a state legal firm in South Africa. The datasets undergo thorough data cleansing and pre-processing, including tokenization and lemmatization. The models are evaluated using accuracy metrics. The findings demonstrate that the Logistic Regression model attained an accuracy of 75.05%, whereas the Random Forest algorithm achieved an accuracy of 75.08%. The K-Nearest Neighbours algorithm, on the other hand, performed less well, with an accuracy of 62.76%. This study provides valuable insights for legal professionals by addressing a specific research question about the successful application of machine learning in South Africa's legal sector. The results indicate the possibility of using machine learning to predict the outcomes of criminal legal cases. Additionally, this study highlights the significance of implementing machine learning responsibly and ethically within the legal field. The results enhance our understanding of legal outcome prediction, establishing a foundation for future investigations in this dynamic area of study. A limitation of this study is that the data was obtained from a single law firm in South Africa.
Ensembling Methods for Data Privacy in Data Science
Mahendiran, N; Shivakumar, B L; Maidin, Siti Sarah; Wu, Hao
Journal of Applied Data Sciences Vol 5, No 4: DECEMBER 2024
Publisher : Bright Publisher

DOI: 10.47738/jads.v5i4.341

Abstract

The rapid advancement of technology has unified systems, data storage, applications, and operations, providing continuous services to organizations. However, this integration also introduces new vulnerabilities, particularly the risk of cyber-attacks. Malware and digital piracy pose significant threats to data security, with the potential to compromise sensitive information, leading to severe financial and reputational damage. This study aims to develop an effective method for detecting malware-infected files on storage devices within the Internet of Things (IoT) environment. The proposed approach utilizes a stacked regression ensemble for data pre-processing and the Sea Lion Optimization Algorithm (sLOA) to extract salient features, enhancing the classification process. Using malware data from an intrusion detection dataset, an ensemble classification technique is applied to identify malicious infections. The experimental results demonstrate that the proposed method achieved an accuracy of 98%, a precision of 99.6%, a recall of 96%, and an F-measure of 95% by the final iteration, significantly outperforming existing techniques in addressing cyber-security challenges within IoT systems.
Leveraging K-Nearest Neighbors with SMOTE and Boosting Techniques for Data Imbalance and Accuracy Improvement
Lubis, Adyanata; Irawan, Yuda; Junadhi, Junadhi; Defit, Sarjon
Journal of Applied Data Sciences Vol 5, No 4: DECEMBER 2024
Publisher : Bright Publisher

DOI: 10.47738/jads.v5i4.343

Abstract

This research addresses the issue of low accuracy in sentiment analysis of Israeli products on social media, where the K-NN algorithm initially achieved only 64%. Given the ongoing Israeli-Palestinian conflict, which has garnered widespread international attention and strong opinions, understanding public sentiment towards Israeli products is crucial. To improve accuracy, the study employs SMOTE to handle data imbalance and combines K-NN with boosting algorithms such as AdaBoost and XGBoost, which were selected for their effectiveness in improving model performance on imbalanced and complex datasets. AdaBoost was chosen for its ability to enhance model accuracy by focusing on misclassified instances, while XGBoost was selected for its efficiency and robustness in handling large datasets with multiple features. The research process includes data pre-processing (cleaning, normalization, tokenization, stopword removal, and stemming), labeling using a lexicon-based approach, and feature extraction with CountVectorizer and TF-IDF. SMOTE was applied to oversample the minority class to match the number of instances in the majority class, ensuring balanced representation before model training. A total of 1,145 data items were divided into training and testing data at a ratio of 70:30. Results demonstrate that SMOTE increased K-NN accuracy to 77%. Interestingly, combining K-NN with AdaBoost after SMOTE achieved 72% accuracy, which, although lower than the 77% achieved with SMOTE alone, was higher than the 68% accuracy without SMOTE. This discrepancy can be attributed to the added complexity introduced by AdaBoost, which may not synergize as effectively with SMOTE as XGBoost does, particularly in this dataset's context. In contrast, K-NN with XGBoost after SMOTE reached the highest accuracy of 88%, demonstrating a more effective combination. Boosting without SMOTE resulted in lower accuracies: 68% for KNN+AdaBoost and 64% for KNN+XGBoost.
The combination of K-NN with SMOTE and XGBoost significantly improves model accuracy and reliability for sentiment analysis on social media.
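The oversampling idea behind SMOTE, as described in this abstract, is to create synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours until the classes are balanced. The following is a simplified pure-Python sketch of that idea, not the authors' pipeline; the function name `smote_oversample`, the toy two-dimensional points, and the choice of `k=3` are assumptions made for illustration. A maintained implementation is available as `SMOTE` in the imbalanced-learn library.

```python
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # The k nearest minority neighbours of x (excluding x itself),
        # ranked by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to the neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, neighbour)))
    return synthetic


# Toy feature vectors, purely for illustration: 10 majority vs 4 minority points.
majority = [(float(i), float(i % 3)) for i in range(10)]
minority = [(0.0, 5.0), (1.0, 6.0), (2.0, 5.5), (1.5, 7.0)]

new_points = smote_oversample(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Because every synthetic point lies on a segment between two real minority points, the new samples stay inside the region already occupied by the minority class.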
Deep Learning Based Face Mask Detection System Using MobileNetV2 for Enhanced Health Protocol Compliance
Fadly, Fadly; Kurniawan, Tri Basuki; Dewi, Deshinta Arrova; Zakaria, Mohd Zaki; Hisham, Putri Aisha Athira binti
Journal of Applied Data Sciences Vol 5, No 4: DECEMBER 2024
Publisher : Bright Publisher

DOI: 10.47738/jads.v5i4.476

Abstract

Personal protective equipment (PPE) is crucial in mitigating the spread of infections within the pharmacy industry, manufacturing sectors, and healthcare facilities. Airborne particles and contaminants can be released during the handling of pharmaceuticals, the operation of machinery, or patient care activities. These particles can be transmitted through close contact with an infected individual or by touching contaminated surfaces and then touching one's face (mouth, nose, or eyes). PPE, including face masks, plays a vital role in minimizing the risk of transmission of infectious diseases. Although mandates for wearing face masks might relax as situations improve and vaccination rates increase, staying prepared for potential future outbreaks and the resurgence of infectious diseases remains important. Therefore, an automated system for face mask detection is important for future use. This research proposes real-time face mask detection that identifies whether a person is (i) not wearing a mask or (ii) wearing a mask. It presents a deep-learning approach using a pre-trained model, MobileNet-V2. The model is trained on a dataset of 10,000 images of individuals with and without masks. The results show that the pre-trained MobileNet-V2 model obtained a high accuracy of 98.69% on the testing dataset.
Diagnosing Cardiovascular Diseases using Optimized Machine Learning Algorithms with GridSearchCV
Alemerien, Khalid; Alsarayreh, Saleel; Altarawneh, Enshirah
Journal of Applied Data Sciences Vol 5, No 4: DECEMBER 2024
Publisher : Bright Publisher

DOI: 10.47738/jads.v5i4.280

Abstract

Accurate and timely disease diagnosis is the most important responsibility in the healthcare industry for protecting people's lives. Many lives can be saved if cases are diagnosed accurately and early. Cardiovascular disease (CVD) is one of the most dangerous diseases: it is the leading cause of death worldwide and one of the hardest conditions to diagnose. Globally, about 17.9 million people die because of cardiovascular disease. To assist physicians in this mission, automated solutions based on machine learning and deep learning techniques have been introduced. Machine learning algorithms can diagnose diseases quickly and accurately, which adds great value to the medical industry and saves physicians and patients considerable time. To address this issue, we utilized several supervised machine learning (ML) techniques with the GridSearchCV optimizer. Optimization techniques can enhance the performance and accuracy of the proposed ML-based models. We therefore conducted a comparative analysis to identify the most efficient classification model using two real benchmark datasets from the online Kaggle repository. Seven popular machine learning techniques were utilized: Decision Tree (DT), Support Vector Machine (SVM), Logistic Regression (LR), K-Nearest Neighbor (KNN), Random Forest (RF), XGBoost, and Naïve Bayes (NB). The findings revealed that the Random Forest and XGBoost classifiers yield the highest results on both datasets used in our study, with accuracies of 95.38% and 98.54%, respectively. The rest of the ML algorithms showed lower performance in predicting CVD in terms of accuracy, where DT and RF achieved an accuracy of 98.53% and 98.52%, respectively, on the first dataset. Furthermore, employing the proposed ML-based model in the CVD diagnosis process shows the expected implications for patients and physicians.
In addition, it shows the impact of constructing a real, comprehensive dataset to enhance the performance of the proposed solutions.
Scalable Machine Learning Approaches for Real-Time Anomaly and Outlier Detection in Streaming Environments
Dewi, Deshinta Arrova; Singh, Harprith Kaur Rajinder; Periasamy, Jeyarani; Kurniawan, Tri Basuki; Henderi, Henderi; Hasibuan, M. Said
Journal of Applied Data Sciences Vol 5, No 4: DECEMBER 2024
Publisher : Bright Publisher

DOI: 10.47738/jads.v5i4.444

Abstract

The prevalence of streaming data across various sectors poses significant challenges for real-time anomaly detection due to its volume, velocity, and variability. Traditional data processing methods often need to be improved for such dynamic environments, necessitating robust, scalable, and efficient real-time analysis systems. This study compares two advanced machine learning approaches—LSTM autoencoders and Matrix Profile algorithms—to identify the most effective method for anomaly detection in streaming environments using the NYC taxi dataset. Existing literature on anomaly detection in streaming data highlights various methodologies, including statistical tests, window-based techniques, and machine learning models. Traditional methods like the Generalized ESD test have been adapted for streaming data but often require a full historical dataset to function effectively. In contrast, machine learning approaches, particularly those using LSTM networks, are noted for their ability to learn complex patterns and dependencies, offering promising results in real-time applications. In a comparative analysis, LSTM autoencoders significantly outperformed other methods, achieving an F1-score of 0.22 for anomaly detection, notably higher than other techniques. This model demonstrated superior capability in capturing temporal dependencies and complex data patterns, making it highly effective for the dynamic and varied data in the NYC taxi dataset. The LSTM autoencoder's advanced pattern recognition and anomaly detection capabilities confirm its suitability for complex, high-velocity streaming data environments. Future research should explore the integration of LSTM autoencoders with other machine-learning techniques to enhance further the accuracy, scalability, and efficiency of anomaly detection systems. 
This study advances our understanding of scalable machine-learning approaches and underscores the critical importance of selecting appropriate models based on the specific characteristics and challenges of the data involved.