Contact Name
Husni Teja Sukmana
Contact Email
husni@bright-journal.org
Phone
+62895422720524
Journal Mail Official
jads@bright-journal.org
Editorial Address
Gedung FST UIN Jakarta, Jl. Lkr. Kampus UIN, Cemp. Putih, Kec. Ciputat Tim., Kota Tangerang Selatan, Banten 15412
Location
Kota Adm. Jakarta Pusat,
DKI Jakarta
INDONESIA
Journal of Applied Data Sciences
Published by Bright Publisher
ISSN : -     EISSN : 27236471     DOI : doi.org/10.47738/jads
One of the current hot topics in science is data: how can datasets be used in scientific and scholarly research in a more reliable, citable and accountable way? Data is of paramount importance to scientific progress, yet most research data remains private. Enhancing the transparency of the processes applied to collect, treat and analyze data will help to render scientific research results reproducible and thus more accountable. The datasets themselves should also be accessible to other researchers, so that research publications, dataset descriptions, and the actual datasets can be linked. The journal Data provides a forum to publish methodical papers on processes applied to data collection, treatment and analysis, as well as for data descriptors publishing descriptions of a linked dataset.
Articles 518 Documents
Fake vs Real Image Detection Using Deep Learning Algorithm Fatoni, Fatoni; Kurniawan, Tri Basuki; Dewi, Deshinta Arrova; Zakaria, Mohd Zaki; Muhayeddin, Abdul Muniif Mohd
Journal of Applied Data Sciences Vol 6, No 1: JANUARY 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i1.490

Abstract

The purpose of this research project is to address the growing issues presented by modified visual information by developing a deep learning model for distinguishing between real and fake images. To enhance accuracy, this project evaluates the effectiveness of deep learning algorithms such as Residual Neural Network (ResNet), Visual Geometry Group 16 (VGG16), and Convolutional Neural Network (CNN), together with Error Level Analysis (ELA) as a preprocessing step for the dataset. The CASIA dataset contains 7,492 real images and 5,124 fake images. The images cover a wide range of subjects, including buildings, fruits, animals, and more, providing a comprehensive dataset for model training and validation. This research examined the models' effectiveness through experiments, measuring their training and validation accuracies. The best results for each model were as follows: the CNN reached 94% training accuracy and 92% validation accuracy; VGG16 reached 94% for both training and validation accuracy; and ResNet performed best, with 95% training accuracy and 93% validation accuracy. This project also constructs a system prototype for practical applications, offering an interface for real-world testing. When integrated into the system prototype, only ResNet predicted both fake and real images consistently and effectively, which led to the decision to choose ResNet for integration into the system. Furthermore, the project identified several areas for improvement: first, expanding the model comparison to discover more successful algorithms; next, improving the dataset preprocessing phase by incorporating filtering or denoising techniques; and lastly, refining the system prototype for greater appeal and user-friendliness, which has the potential to attract a larger audience.
A Comprehensive Stacking Ensemble Approach for Stress Level Classification in Higher Education Fonda, Hendry; Irawan, Yuda; Melyanti, Rika; Wahyuni, Refni; Muhaimin, Abdi
Journal of Applied Data Sciences Vol 5, No 4: DECEMBER 2024
Publisher : Bright Publisher

DOI: 10.47738/jads.v5i4.388

Abstract

This research focuses on developing a comprehensive ensemble stacking model for the classification of student stress levels in higher education environments, specifically at Hang Tuah University Pekanbaru. Using a physiological dataset that includes parameters such as SpO2, heart rate, body temperature, and systolic and diastolic pressure, this research categorizes the condition of college students into four main categories: anxious, calm, tense, and relaxed. The data, taken from public health centers in the period 2021 to 2024, was processed using the SMOTE technique to overcome data imbalance and K-Fold Cross Validation for model validation. In model development, a combination of base algorithms such as SVM, Logistic Regression, Multilayer Perceptron, and Random Forest is used, enhanced by boosting techniques through AdaBoost, and XGBoost as a meta-model. The test results show that the proposed stacking model achieves 95% accuracy with an AUC of 0.95, which indicates excellent classification performance. The model not only excels in detecting more extreme stress conditions such as anxiety, but also shows reliable ability in classifying conditions that are harder to distinguish, such as tense and relaxed. The study concludes that the applied stacking ensemble approach significantly improves prediction accuracy and stability compared to traditional models. For future research, it is recommended to explore the use of deep learning-based meta-models such as LSTM and BiLSTM, as well as rotation techniques in stacking, to improve model performance and flexibility. The findings are expected to contribute significantly to the development of more sophisticated and effective stress detection models.
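The abstract above names SMOTE as the remedy for class imbalance. As a rough illustration of the idea only (not the authors' pipeline, and not the API of any SMOTE library), the sketch below synthesizes new minority-class points by interpolating between a sample and one of its nearest minority neighbours. The function name `smote_like`, the parameter `k`, and the sample readings are all hypothetical.

```python
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Hypothetical minority-class readings: (SpO2, heart rate).
minority = [(97.0, 88.0), (96.0, 92.0), (98.0, 90.0)]
new_points = smote_like(minority, n_new=4)
```

Because every synthetic point lies on a segment between two real minority samples, it stays inside the range the minority class already occupies. In a cross-validated setup, oversampling is normally applied to the training folds only, so that synthetic points do not leak into validation data.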
Comparison of MobileNet and VGG16 CNN Architectures for Web-based Starfish Species Identification System Latumakulita, Luther Alexander; Paat, Frangky J.; Saroyo, Saroyo; Karim, Irwan; Astawa, I Nyoman Gede Arya; Sirait, Hasanuddin
Journal of Applied Data Sciences Vol 5, No 4: DECEMBER 2024
Publisher : Bright Publisher

DOI: 10.47738/jads.v5i4.456

Abstract

Bunaken Marine Park (BMP) is famous for its rich marine biodiversity. BMP is an asset for the marine tourism industry of the Manado city government and the North Sulawesi Province of Indonesia, and it needs to be strengthened. This research aims to build a web-based intelligent system that uses a convolutional neural network (CNN) to identify starfish species, as a first step toward a media center for marine biota identification at BMP. Two CNN architectures, namely MobileNet and VGG16, were used to produce identification models. In the first stage, the models were trained on 1800 starfish images and then evaluated using 5-fold cross-validation. Validation results show that MobileNet is superior to the VGG16 architecture, achieving a validation accuracy of 100% for each fold, while VGG16 produces validation accuracy in the range of 94% to 100%. On the other hand, in the second stage of model testing, VGG16 worked better than MobileNet in identifying 200 new images. The best model produced by VGG16 achieved a testing accuracy of 100%, while MobileNet produced 99.5%. However, stability analysis of the identification models produced by both architectures shows that MobileNet has relatively small loss values, ranging from 0.00069325 to 0.00214802, as well as a smaller standard deviation of 0.27 compared to 0.61 for VGG16. These findings indicate that MobileNet is more stable in identification than VGG16; thus, the best model provided by MobileNet is deployed on the web platform, which is built with the Python Flask framework. The proposed system can be used to strengthen the marine tourism industry as a media center for marine biota education using deep learning approaches.
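The 5-fold evaluation and the stability comparison described above can be sketched as follows. This is a minimal illustration, not the authors' code: the splitter uses contiguous folds without shuffling, and the per-fold accuracies are hypothetical placeholders rather than values from the paper.

```python
from statistics import mean, pstdev


def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [
        n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)
    ]
    start = 0
    for size in fold_sizes:
        stop = start + size
        val_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, val_idx
        start = stop


# 1800 images split into 5 folds, as in the abstract.
folds = list(k_fold_indices(1800, 5))

# Hypothetical per-fold validation accuracies (percent) for one model.
fold_accuracy = [94.0, 100.0, 98.0, 96.0, 100.0]
summary = {"mean": mean(fold_accuracy), "std": pstdev(fold_accuracy)}
```

A lower standard deviation across folds is one simple way to express the "stability" that the abstract uses to prefer one architecture over the other.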
Using Machine Learning Approach to Cluster Marine Environmental Features of Lesser Sunda Island Lusiana, Evellin Dewi; Astutik, Suci; Nurjannah, Nurjannah; Sambah, Abu Bakar
Journal of Applied Data Sciences Vol 6, No 1: JANUARY 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i1.478

Abstract

Mapping marine ecosystems is acknowledged as a vital tool for implementing ecosystem services in practical situations. It provides a framework for effective marine spatial planning, enabling the designation of marine protected areas (MPAs) that consider ecological connectivity and habitat requirements. It also helps pinpoint areas of high biodiversity or ecological significance, allowing conservationists to prioritize these regions for protection and management. Numerous studies over decades have produced a vast amount of data that illustrates the features of the marine ecosystem. Therefore, unsupervised learning is a promising technique for mapping marine ecosystems based on their environmental features. This study aims to compare unsupervised learning techniques for analyzing marine environmental features in order to map the marine ecosystem in Lesser Sunda waters. Eleven global environmental variables were accessed from global databases. The Lesser Sunda waters were delineated into groups according to their environmental characteristics using four unsupervised learning techniques: k-means, fuzzy c-means, self-organizing map (SOM), and density-based spatial clustering of applications with noise (DBSCAN). According to the findings, the Lesser Sunda waters can be divided into five to nine clusters, each with distinct environmental features. Moreover, the fuzzy c-means clustering result outperformed the others, with the highest Silhouette (0.2204478) and Calinski-Harabasz (1741.099) indices. As an unsupervised learning technique, fuzzy c-means clustering offered good performance in delineating Lesser Sunda Island marine waters with five clusters. The clustering results are mostly consistent with existing conservation programs, although several areas need collaboration among international and multinational organizations to effectively accomplish marine conservation objectives.
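The Silhouette index used above to rank the clustering methods can be computed directly from pairwise distances. The following is a minimal sketch of the standard mean silhouette coefficient with Euclidean distance; the four sample points and both labelings are hypothetical and unrelated to the study's data.

```python
from math import dist


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention for single-member clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the point's zero distance to itself does not change the sum).
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical 2-D points forming two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(points, [0, 0, 1, 1])
bad = silhouette_score(points, [0, 1, 0, 1])
```

Values close to 1 indicate compact, well-separated clusters, values near 0 indicate overlapping clusters, and negative values suggest points assigned to the wrong cluster.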
Enforcement of Community Activity Restrictions Level Prediction in Jakarta Using Long Short-Term Memory Network Dewangga, Chendra; Hansun, Seng
Journal of Applied Data Sciences Vol 5, No 4: DECEMBER 2024
Publisher : Bright Publisher

DOI: 10.47738/jads.v5i4.318

Abstract

The implementation of restrictions on community activities (Pemberlakuan Pembatasan Kegiatan Masyarakat – PPKM) is a strategy of the Indonesian government for handling the spread of COVID-19. PPKM is divided into four levels, which determine the types of restriction to be implemented in a region. In this study, we aim to build a website that can predict PPKM levels from the daily COVID-19 positive and death cases recorded in Jakarta, Indonesia. The prediction system uses a Long Short-Term Memory (LSTM) network, with Node.js as the backend of the website. We also introduce a multivariate approach for this regression task by feeding both the daily positive and death case numbers into the LSTM network. Based on the test scores obtained through evaluation using Mean Absolute Percentage Error (MAPE) and Root Mean Square Error (RMSE), it was concluded that the proposed LSTM method could accurately predict the death cases, with 0.17% MAPE and 22.68 RMSE, but performed poorly in predicting the daily positive cases, with 53.11% MAPE and 27.15 RMSE. This may stem from the use of the multivariate approach during model development, where more variation in the daily positive cases was detected.
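For reference, the two error measures reported above follow the standard definitions sketched below. The case counts in the example are hypothetical and are not taken from the study.

```python
from math import sqrt


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent.

    Undefined when any actual value is zero, which matters for
    daily counts that can drop to zero.
    """
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


def rmse(actual, predicted):
    """Root Mean Square Error, in the same unit as the data."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared) / len(squared))


# Hypothetical daily case counts and model predictions.
actual = [100, 200, 400]
predicted = [110, 180, 400]
mape_value = mape(actual, predicted)
rmse_value = rmse(actual, predicted)
```

MAPE is relative to the size of each actual value while RMSE is absolute, so the two can rank the same predictions differently, as with the death-case and positive-case results in the abstract.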
Optimizing LSTM with Grid Search and Regularization Techniques to Enhance Accuracy in Human Activity Recognition Budiarso, Zuly; Listiyono, Hersatoto; Karim, Abdul
Journal of Applied Data Sciences Vol 5, No 4: DECEMBER 2024
Publisher : Bright Publisher

DOI: 10.47738/jads.v5i4.433

Abstract

This study aims to enhance the accuracy of Long Short-Term Memory (LSTM) models for human activity recognition using the UCI Human Activity Recognition (HAR) dataset. The dataset comprises time-series data from accelerometer and gyroscope sensors on smartphones worn by 30 volunteers as they performed everyday activities such as walking, climbing stairs, descending stairs, sitting, standing, and lying down. Optimization was carried out using Grid Search for hyperparameter tuning and L2 regularization to prevent overfitting. The results show that the optimized LSTM model improved accuracy from 92.33% to 94.50%, precision from 93.12% to 94.61%, recall from 92.33% to 94.50%, and F1-score from 92.32% to 94.51% compared to the standard LSTM model. Despite these improvements, the study encountered several challenges, particularly in tuning hyperparameters, which required significant computational resources and time due to the complexity of the search space. Additionally, balancing regularization to prevent both underfitting and overfitting proved to be a delicate process. Further limitations include the model's performance variability with different sensor placements and potential overfitting to specific activity patterns. However, the implementation of hyperparameter optimization and regularization proved effective in improving the model's ability to recognize human activity patterns from complex sensor data. Therefore, this approach holds significant potential for broader applications in sensor-based human activity recognition systems, though further research is needed to address these limitations and generalize the findings.
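The combination of grid search and L2 regularization described above can be illustrated on a much smaller model. The sketch below swaps the LSTM for a one-parameter ridge regression so that it runs with the standard library only; the data, the grid values, and all function names are hypothetical, and only the tuning loop mirrors the idea in the abstract.

```python
from itertools import product


def fit_ridge_1d(xs, ys, l2):
    """Closed-form slope for y ~ w * x with an L2 penalty l2 * w**2."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + l2)


def mse(xs, ys, w):
    """Mean squared error of the fitted slope on (xs, ys)."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)


def grid_search(train, val, grid):
    """Try every combination in the grid and keep the lowest validation MSE."""
    best = None
    for combo in product(*grid.values()):
        params = dict(zip(grid, combo))
        w = fit_ridge_1d(train[0], train[1], params["l2"])
        score = mse(val[0], val[1], w)
        if best is None or score < best[0]:
            best = (score, params, w)
    return best


# Hypothetical noisy training data around y = 2x, and clean validation data.
train = ([1.0, 2.0, 3.0], [2.2, 3.9, 6.3])
val = ([4.0, 5.0], [8.0, 10.0])
grid = {"l2": [0.0, 0.1, 0.5, 1.0, 10.0]}

best_score, best_params, best_w = grid_search(train, val, grid)
```

The cost of the search grows with the product of the grid sizes, which is consistent with the computational burden the abstract reports for hyperparameter tuning.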
Object-Level Sentiment Analysis Use a Language Model Le, Thuy Thi; Phan, Tuoi Thi
Journal of Applied Data Sciences Vol 6, No 1: JANUARY 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i1.448

Abstract

Sentiment analysis remains a prominent area of research in the natural language processing (NLP) community and holds significant practical value in domains such as commerce and education. Most existing approaches evaluate sentiments for a single object or product, typically categorizing them as positive or negative. However, when a text involves comparisons between multiple objects, it can be challenging to identify which sentiment or emotion is associated with which object. Few studies have addressed this issue, and those that do often stop at evaluating emotions at the sentence level or for individual words related to aspects or objects. This study proposes an object-level sentiment analysis problem that produces a set of pairs or triples consisting of an object, aspect, and sentiment. Additionally, in texts expressing opinions or comments on a specific aspect, the aspect may be implied through references to the object without being explicitly mentioned. Identifying such implicit aspects is crucial, as it ensures no loss of information and enhances the efficiency of information extraction in object-level sentiment analysis. The integration of implicit aspect identification and object-level sentiment analysis is the primary focus of this research. In recent years, many language models have been developed and effectively applied to various NLP tasks. Therefore, to address the proposed challenges, this study uses deep learning that incorporates language models combined with NLP methods, such as parsing and dependency analysis, to achieve the desired output. Language models and NLP techniques are also used to automatically generate training data for the learning model. The proposed method achieves an accuracy of 90%, making a substantial contribution to the field of NLP.
Fuzzy TOPSIS-Based Group Decision Model for Selecting IT Employees Vania, Abigail; Utama, Ditdit Nugeraha
Journal of Applied Data Sciences Vol 6, No 1: JANUARY 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i1.511

Abstract

In the era of digitalization, the demand for competent IT employees is growing rapidly. However, the IT employee selection process often faces various challenges, such as biased selection criteria, many applicants, and difficulty in objective assessment. These challenges can lead to inaccurate selection decisions and have a negative impact on company performance. This research aims to develop a Group Decision Support Model (GDSM) for IT employee selection using the Fuzzy TOPSIS method to enhance objectivity and reliability in decision-making. The GDSM combines assessments from the HRD and User IT groups by considering the weight of each criterion. The proposed model overcomes bias, uncertainty, and subjectivity in judgments from both groups. The GDSM is constructed with 8 parameters/sub-criteria (2 criteria) from the HRD group and 12 parameters (5 criteria) from the User IT group, derived from interviews and research. The total is thus 20 assessment parameters: coding test, education, certification, computer literacy, openness to experience, conscientiousness, extroversion, agreeableness, neuroticism, verbal, numerical, ability to learn, appearance attitude, work experience, communication skills, time management, job knowledge, motivation to apply, decision making, and service orientation. The methodology involves determining parameters and weights, followed by fuzzification. The GDSM was tested through a limited simulation of IT employee selection, using 11 Computer Science students as respondents to evaluate the model. The result of the model is a ranking of the candidates. The best candidate is Cand. 8, with a closeness coefficient (CC) value of 0.896. The worst candidate is Cand. 3, with a CC of 0.241. The model is considered acceptable because the coded implementation and the manual calculation show no difference in value for any candidate. This study contributes to increasing objectivity in IT employee selection and offers an implementation model for companies that want to improve the effectiveness of the recruitment process.
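To show how a closeness coefficient (CC) like the ones reported above arises, here is a minimal sketch of one commonly used fuzzy TOPSIS formulation: triangular fuzzy ratings, fuzzy weights, normalization by the largest upper value per criterion, and fixed ideal solutions (1, 1, 1) and (0, 0, 0). The abstract does not specify the variant the authors used, so this is an assumption, and the candidates, ratings, and weights are hypothetical.

```python
from math import sqrt


def vertex_distance(a, b):
    """Distance between two triangular fuzzy numbers (vertex method)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / 3)


def fuzzy_topsis(ratings, weights):
    """Return the closeness coefficient of every candidate.

    ratings[name][j] and weights[j] are triangular fuzzy numbers
    (low, mid, high); every criterion is treated as a benefit criterion.
    """
    n_criteria = len(weights)
    # Normalize each criterion by the largest upper bound in its column.
    col_max = [
        max(row[j][2] for row in ratings.values()) for j in range(n_criteria)
    ]
    closeness = {}
    for name, row in ratings.items():
        d_pos = d_neg = 0.0
        for j, rating in enumerate(row):
            weighted = tuple(
                r / col_max[j] * w for r, w in zip(rating, weights[j])
            )
            d_pos += vertex_distance(weighted, (1.0, 1.0, 1.0))
            d_neg += vertex_distance(weighted, (0.0, 0.0, 0.0))
        closeness[name] = d_neg / (d_pos + d_neg)
    return closeness


# Hypothetical ratings on a 1-9 scale for two criteria.
ratings = {
    "A": [(7.0, 9.0, 9.0), (5.0, 7.0, 9.0)],
    "B": [(1.0, 3.0, 5.0), (3.0, 5.0, 7.0)],
}
weights = [(0.7, 0.9, 1.0), (0.3, 0.5, 0.7)]

cc = fuzzy_topsis(ratings, weights)
ranking = sorted(cc, key=cc.get, reverse=True)
```

A CC nearer 1 means the candidate sits closer to the positive ideal and farther from the negative ideal, which is how the best and worst candidates in the abstract are ordered.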
How Effective are Different Machine Learning Algorithms in Predicting Legal Outcomes in South Africa? Khosa, Joe; Mashao, Daniel; Olanipekun, Ayorinde; Harley, Charis
Journal of Applied Data Sciences Vol 5, No 4: DECEMBER 2024
Publisher : Bright Publisher

DOI: 10.47738/jads.v5i4.215

Abstract

This study examines the effectiveness of different machine learning algorithms in predicting legal outcomes in South Africa's judicial system. Considering the advancement of artificial intelligence in the legal sector, this research aims to assess the effectiveness of various machine learning algorithms within the legal domain. Text classification is performed using machine learning algorithms, including Logistic Regression, Random Forest, and K-Nearest Neighbours, with datasets obtained from a state legal firm in South Africa. The datasets undergo careful data cleansing and pre-processing, including tokenization and lemmatization. This study evaluates the models using accuracy metrics. The findings show that the Logistic Regression model attained an accuracy of 75.05%, whereas the Random Forest algorithm achieved an accuracy of 75.08%. The K-Nearest Neighbours algorithm, on the other hand, performed worst, with an accuracy of 62.76%. This study provides valuable insights for legal professionals by addressing a specific research question about the successful application of machine learning in South Africa's legal sector. The results indicate the possibility of using machine learning to predict the outcomes of criminal legal cases. Additionally, this study highlights the significance of implementing machine learning responsibly and ethically within the legal field. The results of this study enhance our understanding of legal outcome prediction and establish a foundation for future investigations in this dynamic area of study. A limitation of this study is that the data was obtained from a single law firm in South Africa.
Improving Classification Accuracy of Local Coconut Fruits with Image Augmentation and Deep Learning Algorithm Convolutional Neural Networks (CNN) Usman, Usman; Yunita, Fitri; Ridha, Muh. Rasyid
Journal of Applied Data Sciences Vol 6, No 1: JANUARY 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i1.389

Abstract

Local coconut varieties must be classified to maintain the quality and genetic diversity of coconuts as the main commodity in Indonesia's largest coconut-producing region. This study introduces a deep learning module for improved classification of coconuts, using color jitter as part of a data augmentation strategy to supplement the existing dataset and well-known CNN-based models such as VGG16 for image analysis, with a focus on the needs of future research. The goal is to improve the classification accuracy of local coconut varieties through deep learning. We investigate both data augmentation and EDA, and we use VGG16-based CNN models to enhance classification performance. We evaluated the models using a confusion matrix and the metrics derived from it: accuracy, precision, recall, and F1-score. Results reveal that the color jitter augmentation model attained a training accuracy of 99.12%, a testing accuracy of 97.33%, and a validation accuracy of 97.33%. Model exploration using VGG16, on the other hand, improved all three: a training accuracy of 99.87%, a testing accuracy of 98.77%, and a validation accuracy of 98.97%, with an average F1-score of 99%. Our research contributes an automatic classification method that can help farmers reduce their workload while promoting economic growth through effective trade across Indonesian regions. Its novelty is in combining image augmentation and CNNs, concerning the VGG16 model, showing better.
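The evaluation metrics named in the abstract above all derive from the confusion matrix. The sketch below uses the standard definitions; the 2x2 matrix is hypothetical and does not come from the study.

```python
def confusion_metrics(matrix):
    """Return overall accuracy and per-class (precision, recall, F1).

    matrix[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n_classes = len(matrix)
    total = sum(sum(row) for row in matrix)
    accuracy = sum(matrix[i][i] for i in range(n_classes)) / total

    per_class = []
    for c in range(n_classes):
        true_pos = matrix[c][c]
        predicted_as_c = sum(matrix[r][c] for r in range(n_classes))
        actually_c = sum(matrix[c])
        precision = true_pos / predicted_as_c if predicted_as_c else 0.0
        recall = true_pos / actually_c if actually_c else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class.append((precision, recall, f1))
    return accuracy, per_class


# Hypothetical two-class confusion matrix (rows: true, columns: predicted).
matrix = [
    [8, 2],
    [1, 9],
]
accuracy, per_class = confusion_metrics(matrix)
```

Reporting per-class precision and recall alongside accuracy helps reveal whether a high overall score hides weak performance on one variety.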