Contact Name
Husni Teja Sukmana
Contact Email
husni@bright-journal.org
Phone
+62895422720524
Journal Mail Official
jads@bright-journal.org
Editorial Address
Gedung FST UIN Jakarta, Jl. Lkr. Kampus UIN, Cemp. Putih, Kec. Ciputat Tim., Kota Tangerang Selatan, Banten 15412
Location
Kota Adm. Jakarta Pusat,
DKI Jakarta
Indonesia
Journal of Applied Data Sciences
Published by Bright Publisher
ISSN : -     EISSN : 2723-6471     DOI : doi.org/10.47738/jads
One of the current hot topics in science is data: how can datasets be used in scientific and scholarly research in a more reliable, citable and accountable way? Data is of paramount importance to scientific progress, yet most research data remains private. Enhancing the transparency of the processes applied to collect, treat and analyze data will help to render scientific research results reproducible and thus more accountable. The datasets themselves should also be accessible to other researchers, so that research publications, dataset descriptions, and the actual datasets can be linked. The journal Data provides a forum to publish methodical papers on processes applied to data collection, treatment and analysis, as well as for data descriptors publishing descriptions of a linked dataset.
Articles 518 Documents
Polarization of Religious Issues in Indonesia’s Social Media Society and Its Impact on Social Conflict
Faizin, Barzan; Fitri, Susanti Ainul; AS, Enjang; Maylawati, Dian Sa'adillah; Rizqullah, Naufal; Ramdhani, Muhammad Ali
Journal of Applied Data Sciences Vol 6, No 1: JANUARY 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i1.447

Abstract

In this new era, people use social media to share information and discuss political, social, and religious issues, leading to arguments for and against. In Twitter’s hashtags and tweets, religious issues frequently trigger heated conversations that cause disputes among citizens and even street movements. This study is intended to reveal the religious issues that often trigger polarization among Twitter users and how they influence horizontal conflict in society, as happened during the election period in 2019. This research applied mixed methods with social media analytics to uncover religious issues in Indonesia's social media society. Data were collected by crawling Indonesian Twitter users’ tweets containing hashtags on religious issues, which served as the basis for further analysis. The research findings show that the top eight religious issues widely discussed, based on 23,433 Twitter users’ tweets, are the hashtags (#) salafi, wahabi, intoleransi (intolerance), taliban, anti-Pancasila, politisasi agama (politicization of religion), politik identitas (identity politics), and radikalisme (radicalism). In social conversation networks, these issues are related to each other and to other issues such as political figures, the three presidential candidates, the general election, and the Republic of Indonesia presidential election in 2024. Concerning these issues, Twitter users believe that the issues, positive or negative, do not influence their religious and political stance. However, to a certain extent, they believe that religious issues impact social discourses regarding horizontal conflicts. This is reflected in the responses: 44% indicated that the debate over religious matters had little influence on their opinion of these topics, and 64.5% agreed that religious concerns can cause social strife. Finally, it is hoped that further studies will elaborate on how religious issues on Twitter and other social media directly impact social harmony in everyday life.
Ensembling Methods for Data Privacy in Data Science
Mahendiran, N; Shivakumar, B L; Maidin, Siti Sarah; Wu, Hao
Journal of Applied Data Sciences Vol 5, No 4: DECEMBER 2024
Publisher : Bright Publisher

DOI: 10.47738/jads.v5i4.341

Abstract

The rapid advancement of technology has unified systems, data storage, applications, and operations, providing continuous services to organizations. However, this integration also introduces new vulnerabilities, particularly the risk of cyber-attacks. Malware and digital piracy pose significant threats to data security, with the potential to compromise sensitive information, leading to severe financial and reputational damage. This study aims to develop an effective method for detecting malware-infected files on storage devices within the Internet of Things (IoT) environment. The proposed approach utilizes a stacked regression ensemble for data pre-processing and the Sea Lion Optimization Algorithm (sLOA) to extract salient features, enhancing the classification process. Using malware data from an intrusion detection dataset, an ensemble classification technique is applied to identify malicious infections. The experimental results demonstrate that the proposed method achieved an accuracy of 98%, a precision of 99.6%, a recall of 96%, and an F-measure of 95% by the final iteration, significantly outperforming existing techniques in addressing cyber-security challenges within IoT systems.
Current and Future Trends for Sustainable Software Development: Software Security in Agile and Hybrid Agile through Bibliometric Analysis
Maidin, Siti Sarah; Yahya, Norzariyah; Fauzi, Muhammad Ashraf bin Fauri; Bakar, Normi Sham Awang Abu
Journal of Applied Data Sciences Vol 6, No 1: JANUARY 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i1.473

Abstract

The industrial growth of the digitalized era has given rise to growing concern in software development. The present research investigates the prevailing and projected patterns in sustainable software development, especially those related to process innovation, with a particular emphasis on software security within Agile and Hybrid Agile approaches, employing bibliometric analysis. A comprehensive understanding of the security concerns of both agile and hybrid agile is crucial and still needs to be developed. A thorough comprehension of the hybrid agile model landscape is expected to uncover various themes encompassing its implementation. The analysis aims to provide a comprehensive overview of the current and future state of software security for agile and hybrid agile. The study employed a bibliometric approach to gather a total of 1593 journals from the Web of Science (WOS) database. This study utilizes co-citation and co-word analysis techniques to identify the most significant articles, delineate the fundamental framework, and provide a prognosis for future development. The present investigation discovered four distinct co-citation clusters and three distinct co-word clusters. This study offers valuable insights regarding software security in agile and hybrid agile. The continuing evolution of the software ecosystem necessitates prioritizing the bridging of the gap between agility and security. This paper provides a detailed roadmap for scholars and practitioners who are navigating this intersection.
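The co-word analysis mentioned in the abstract rests on counting how often two keywords appear in the same publication. The following is a minimal sketch of that counting step only, not the study's pipeline; the function name `coword_counts` and the keyword lists are hypothetical and are not taken from the Web of Science data.

```python
from collections import Counter
from itertools import combinations


def coword_counts(keyword_lists):
    """Count how often each unordered keyword pair appears in the same record."""
    pairs = Counter()
    for keywords in keyword_lists:
        # Lowercase, deduplicate and sort so each pair has one canonical form.
        unique = sorted({k.lower() for k in keywords})
        pairs.update(combinations(unique, 2))
    return pairs


# Hypothetical author keywords, invented for illustration.
records = [
    ["agile", "software security", "devsecops"],
    ["Agile", "software security"],
    ["hybrid agile", "software security"],
]
counts = coword_counts(records)
print(counts.most_common(3))
```

Pairs with high counts are the raw material for the clusters a bibliometric tool would then draw; the clustering itself is outside this sketch.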
Leveraging K-Nearest Neighbors with SMOTE and Boosting Techniques for Data Imbalance and Accuracy Improvement
Lubis, Adyanata; Irawan, Yuda; Junadhi, Junadhi; Defit, Sarjon
Journal of Applied Data Sciences Vol 5, No 4: DECEMBER 2024
Publisher : Bright Publisher

DOI: 10.47738/jads.v5i4.343

Abstract

This research addresses the issue of low accuracy in sentiment analysis of Israeli products on social media, where the K-NN algorithm initially achieved only 64%. Given the ongoing Israeli-Palestinian conflict, which has garnered widespread international attention and strong opinions, understanding public sentiment towards Israeli products is crucial. To improve accuracy, the study employs SMOTE to handle data imbalance and combines K-NN with boosting algorithms like AdaBoost and XGBoost, which were selected for their effectiveness in improving model performance on imbalanced and complex datasets. AdaBoost was chosen for its ability to enhance model accuracy by focusing on misclassified instances, while XGBoost was selected for its efficiency and robustness in handling large datasets with multiple features. The research process includes data pre-processing (cleaning, normalization, tokenization, stopword removal, and stemming), labeling using a Lexicon-Based approach, and feature extraction with CountVectorizer and TF-IDF. SMOTE was applied to oversample the minority class to match the number of instances in the majority class, ensuring balanced representation before model training. A total of 1,145 data records were divided into training and testing data with a ratio of 70:30. Results demonstrate that SMOTE increased K-NN accuracy to 77%. Interestingly, combining K-NN with AdaBoost after SMOTE achieved 72% accuracy, which, although lower than the 77% achieved with SMOTE alone, was higher than the 68% accuracy without SMOTE. This discrepancy can be attributed to the added complexity introduced by AdaBoost, which may not synergize as effectively with SMOTE as XGBoost does, particularly in this dataset's context. In contrast, K-NN with XGBoost after SMOTE reached the highest accuracy of 88%, demonstrating a more effective combination. Boosting without SMOTE resulted in lower accuracies: 68% for KNN+AdaBoost and 64% for KNN+XGBoost. The combination of K-NN with SMOTE and XGBoost significantly improves model accuracy and reliability for sentiment analysis on social media.
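The oversampling step described in the abstract can be illustrated with a small pure-Python sketch of the SMOTE idea: each synthetic point is placed on the line segment between a minority sample and one of its nearest minority neighbours. This is a simplified illustration under stated assumptions, not the authors' code or a library implementation; the function `smote`, its parameters, and the 2-D points are hypothetical (the study worked with text features).

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point, excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # relative position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Hypothetical 2-D feature vectors, invented for illustration.
majority = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.1, 0.4), (0.4, 0.2)]
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1)]

new_points = smote(minority, len(majority) - len(minority))
minority_balanced = minority + new_points
print(len(majority), len(minority_balanced))
```

In practice the oversampling would be applied to the training split only, so that synthetic points do not leak into the test data.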
Deep Learning Based Face Mask Detection System Using MobileNetV2 for Enhanced Health Protocol Compliance
Fadly, Fadly; Kurniawan, Tri Basuki; Dewi, Deshinta Arrova; Zakaria, Mohd Zaki; Hisham, Putri Aisha Athira binti
Journal of Applied Data Sciences Vol 5, No 4: DECEMBER 2024
Publisher : Bright Publisher

DOI: 10.47738/jads.v5i4.476

Abstract

Personal protective equipment (PPE) is crucial in mitigating the spread of infections within the pharmacy industry, manufacturing sectors, and healthcare facilities. Airborne particles and contaminants can be released during the handling of pharmaceuticals, the operation of machinery, or patient care activities. These particles can be transmitted through close contact with an infected individual or by touching contaminated surfaces and then touching one's face (mouth, nose, or eyes). PPE, including face masks, plays a vital role in minimizing the risk of transmission of infectious diseases. Although mandates for wearing face masks might relax as situations improve and vaccination rates increase, staying prepared for potential future outbreaks and the resurgence of infectious diseases remains important. Therefore, an automated system for face mask detection is important for future use. This research proposes real-time face mask detection by identifying who is (i) not wearing a mask and (ii) wearing a mask. This research presents a deep-learning approach using a pre-trained model, MobileNet-V2. The model is trained on a dataset of 10,000 images of individuals with and without masks. The results show that the pre-trained MobileNet-V2 model obtained a high accuracy of 98.69% on the testing dataset.
Gamification Effect of Team Games Tournament in Game-Based Learning on Student Motivation
Wijaya, Anugerah Bagus; Nida, Faridatun; Zettira, Salsa Billa Zulmi; Suliswaningsih, Suliswaningsih; Afiana, Fiby Nur; Rifai, Zanuar
Journal of Applied Data Sciences Vol 6, No 1: JANUARY 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i1.450

Abstract

This study examines the impact of gamification elements, specifically the duration of use and level of collaboration, on student motivation in online learning environments. Using the Team Games Tournament model, which combines elements of both competition and collaboration, a web-based Game-Based Learning application was developed to enhance student motivation. The study employed a motivation survey based on the Attention, Relevance, Confidence, Satisfaction (ARCS) model, which was administered to participants before and after using the application. In addition to the survey, interaction data, such as the duration of application use, frequency of participation, points earned, and the level of collaboration, were collected to assess the relationship between these factors and student motivation. The study involved 20 fifth-semester students (12 male, 8 female) enrolled in a digital games course, many of whom had prior gaming experience, which could influence their response to the gamified learning experience. The data collected were analyzed using Decision Tree algorithms, Pearson correlation, and simple linear regression to understand the impact of various gamification elements on motivation. The results showed that both the duration of application use and the level of collaboration were significant factors in increasing student motivation. Specifically, motivation increased by an average of 0.72 points for every 10 minutes of application use, as measured by the difference between pre-test and post-test survey scores. These findings underscore the importance of balancing competitive and collaborative elements within game-based learning environments. By incorporating features that promote collaboration and encouraging sustained application use, educators can significantly enhance student engagement and motivation. The study provides valuable insights for the development of future game-based learning applications, highlighting the need for optimal design in terms of collaboration and duration to create an effective and engaging digital learning experience.
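The abstract's reading of "points of motivation per 10 minutes of use" corresponds to the slope of a simple linear regression. Below is a minimal sketch of how a Pearson correlation and a least-squares slope are computed. The helper `pearson_and_fit` and all data values are hypothetical and invented for illustration; they are not the study's measurements.

```python
def pearson_and_fit(x, y):
    """Return Pearson's r plus the least-squares slope and intercept of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r = sxy / (sxx * syy) ** 0.5
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return r, slope, intercept


# Hypothetical data: minutes of application use and post-test minus pre-test score.
minutes = [10, 20, 30, 40, 50]
gain = [0.9, 1.4, 2.3, 2.9, 3.7]

r, slope, intercept = pearson_and_fit(minutes, gain)
print(f"r = {r:.3f}, gain per 10 minutes = {10 * slope:.2f}")
```

Multiplying the per-minute slope by 10 gives the change per 10 minutes of use, which is the form in which the abstract reports its result.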
Diagnosing Cardiovascular Diseases using Optimized Machine Learning Algorithms with GridSearchCV
Alemerien, Khalid; Alsarayreh, Saleel; Altarawneh, Enshirah
Journal of Applied Data Sciences Vol 5, No 4: DECEMBER 2024
Publisher : Bright Publisher

DOI: 10.47738/jads.v5i4.280

Abstract

Accurate and timely disease diagnosis is the most important responsibility of the healthcare industry in protecting people's lives. Many lives can be saved if cases are diagnosed accurately and early. Cardiovascular disease (CVD) is one of the most dangerous diseases: it is the leading cause of death worldwide and one of the hardest conditions to diagnose. Globally, about 17.9 million people die from cardiovascular disease. To assist physicians in this mission, automated solutions based on machine learning and deep learning techniques have been introduced. Machine learning algorithms can diagnose diseases quickly and accurately, which adds huge value to the medical industry and saves physicians and patients considerable time. To address this issue, we utilized several supervised machine learning (ML) techniques with the GridSearchCV optimizer. Using optimization techniques can enhance the performance and accuracy of the proposed ML-based models. We therefore conducted a comparative analysis to identify the most efficient classification model using two real benchmark datasets from the online Kaggle repository. Seven popular machine learning techniques were utilized: Decision Tree (DT), Support Vector Machine (SVM), Logistic Regression (LR), K-Nearest Neighbor (KNN), Random Forest (RF), XGBoost, and Naïve Bayes (NB). The findings revealed that the Random Forest and XGBoost classifiers yielded the highest results on both datasets used in our study, with accuracies of 95.38% and 98.54%, respectively. The remaining ML algorithms performed less well in predicting CVD in terms of accuracy, where DT and RF achieved an accuracy of 98.53% and 98.52%, respectively, on the first dataset. Furthermore, employing the proposed ML-based model in the CVD diagnosis process shows the expected implications for patients and physicians. In addition, it shows the impact of constructing a real comprehensive dataset to enhance the performance of proposed solutions.
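The hyperparameter optimization named in the abstract follows a simple pattern: try every candidate setting, score each with cross-validation, and keep the best. The sketch below shows that pattern in pure Python with a small K-NN classifier. It illustrates the idea behind grid search with cross-validation and is not scikit-learn's `GridSearchCV` nor the authors' setup; the functions `knn_predict`, `cv_accuracy`, `grid_search`, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Predict the majority label among the k training points nearest to x."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def cv_accuracy(X, y, k, folds=3):
    """Mean accuracy of K-NN over `folds` folds (fold of sample i is i % folds)."""
    correct = 0
    for f in range(folds):
        train = [i for i in range(len(X)) if i % folds != f]
        test = [i for i in range(len(X)) if i % folds == f]
        tX, ty = [X[i] for i in train], [y[i] for i in train]
        correct += sum(knn_predict(tX, ty, X[i], k) == y[i] for i in test)
    return correct / len(X)


def grid_search(X, y, grid, folds=3):
    """Score every candidate k and return the best one with all scores."""
    scores = {k: cv_accuracy(X, y, k, folds) for k in grid}
    return max(scores, key=scores.get), scores


# Hypothetical, well-separated two-class data, interleaved so folds stay balanced.
class0 = [(0.0, 0.0), (0.5, 0.2), (0.2, 0.8), (1.0, 0.5), (0.7, 0.9), (0.3, 0.4)]
class1 = [(5.0, 5.0), (5.5, 5.2), (5.2, 5.8), (6.0, 5.5), (5.7, 5.9), (5.3, 5.4)]
X, y = [], []
for a, b in zip(class0, class1):
    X += [a, b]
    y += [0, 1]

best_k, scores = grid_search(X, y, grid=[1, 3, 5, 7])
print(best_k, scores)
```

A real grid would usually cover several hyperparameters at once, which makes the number of candidate combinations grow multiplicatively.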
Scalable Machine Learning Approaches for Real-Time Anomaly and Outlier Detection in Streaming Environments
Dewi, Deshinta Arrova; Singh, Harprith Kaur Rajinder; Periasamy, Jeyarani; Kurniawan, Tri Basuki; Henderi, Henderi; Hasibuan, M. Said
Journal of Applied Data Sciences Vol 5, No 4: DECEMBER 2024
Publisher : Bright Publisher

DOI: 10.47738/jads.v5i4.444

Abstract

The prevalence of streaming data across various sectors poses significant challenges for real-time anomaly detection due to its volume, velocity, and variability. Traditional data processing methods are often inadequate for such dynamic environments, necessitating robust, scalable, and efficient real-time analysis systems. This study compares two advanced machine learning approaches, LSTM autoencoders and Matrix Profile algorithms, to identify the most effective method for anomaly detection in streaming environments using the NYC taxi dataset. Existing literature on anomaly detection in streaming data highlights various methodologies, including statistical tests, window-based techniques, and machine learning models. Traditional methods like the Generalized ESD test have been adapted for streaming data but often require a full historical dataset to function effectively. In contrast, machine learning approaches, particularly those using LSTM networks, are noted for their ability to learn complex patterns and dependencies, offering promising results in real-time applications. In a comparative analysis, LSTM autoencoders significantly outperformed other methods, achieving an F1-score of 0.22 for anomaly detection, notably higher than other techniques. This model demonstrated superior capability in capturing temporal dependencies and complex data patterns, making it highly effective for the dynamic and varied data in the NYC taxi dataset. The LSTM autoencoder's advanced pattern recognition and anomaly detection capabilities confirm its suitability for complex, high-velocity streaming data environments. Future research should explore the integration of LSTM autoencoders with other machine-learning techniques to further enhance the accuracy, scalability, and efficiency of anomaly detection systems. This study advances our understanding of scalable machine-learning approaches and underscores the critical importance of selecting appropriate models based on the specific characteristics and challenges of the data involved.
The Development of Stacking Techniques in Machine Learning for Breast Cancer Detection
Van FC, Lucky Lhaura; Anam, M. Khairul; Bukhori, Saiful; Mahamad, Abd Kadir; Saon, Sharifah; Nyoto, Rebecca La Volla
Journal of Applied Data Sciences Vol 6, No 1: JANUARY 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i1.416

Abstract

This study addresses the challenges of accurately detecting breast cancer using machine learning (ML) models, particularly when handling imbalanced datasets that often cause model bias toward the majority class. To tackle this, the Synthetic Minority Over-sampling Technique (SMOTE) was applied not only to balance the class distribution but also to improve the model's sensitivity in detecting malignant tumors, which are underrepresented in the dataset. SMOTE was effective in generating synthetic samples for the minority class without introducing overfitting, enhancing the model's generalization on unseen data. Additionally, AdaBoost was employed as the meta model in the stacking framework, chosen for its ability to focus on misclassified instances during training, thereby boosting the overall performance of the combined base models. The study evaluates several models and combinations, with K-Nearest Neighbors (KNN) + SMOTE achieving 97% in accuracy, precision, recall, and F1-score. Similarly, C4.5 + Hyperparameter Tuning + SMOTE reached 95% in all metrics. The stacking model with Logistic Regression (LR) as the meta model and SMOTE also performed strongly, with accuracy, precision, recall, and F1-score all at 97%. The best result was obtained using the combination of Stacking AdaBoost + Hyperparameter Tuning + SMOTE, reaching an accuracy of 98%. These findings highlight the effectiveness of combining SMOTE with stacking techniques to develop robust predictive models for medical applications. The novelty of this study lies in the integration of SMOTE and advanced stacking methods, particularly using AdaBoost and Logistic Regression, to address the issue of class imbalance in medical datasets. Future work will explore deploying this model in clinical settings for accurate and timely breast cancer detection.
Machine Learning Models for Predicting Flood Events Using Weather Data: An Evaluation of Logistic Regression, LightGBM, and XGBoost
Maharina, Maharina; Paryono, Tukino; Fauzi, Ahmad; Indra, Jamaludin; Sihabudin, Sihabudin; Harahap, Muhammad Khoiruddin; Rizki, Lutfi Trisandi
Journal of Applied Data Sciences Vol 6, No 1: JANUARY 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i1.503

Abstract

This study examines flood prediction in Jakarta, Indonesia, a pressing concern due to its significant implications for public safety and urban management. Machine Learning (ML) presents promising methodologies for accurately forecasting floods by leveraging weather data. However, flood prediction in Jakarta remains challenging due to the city’s highly variable weather patterns, including fluctuations in rainfall, humidity, temperature, and wind characteristics. Existing methods often struggle with these complexities, as they rely on traditional ML models such as K-Nearest Neighbors (KNN), which may not capture certain patterns or provide high accuracy and robustness. Therefore, this study proposes three ML methods, Logistic Regression (LR), LightGBM, and XGBoost, to predict floods accurately. Five performance metrics (accuracy, area under the curve (AUC), precision, recall, and F1-score) were used to measure and compare the performance of the algorithms. The proposed method consists of three main processes. The first process involves data preprocessing and evaluation using 14 different ML models. In the second process, additional feature engineering is applied to improve the quality of the data. Finally, the third process combines the previous steps with oversampling techniques and cross-validation methods. This structured approach aims to enhance the overall performance of the analysis. The experimental results show that Process 3 significantly improves performance compared to Processes 1 and 2. The model predicts floods with an accuracy score of 93.82% for LR, 96.67% for XGBoost, and 96.81% for LightGBM. Thus, the proposed model offers a solution for operational decision-making in flood risk management, including flood mitigation planning.
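Four of the five metrics listed in the abstract can be computed directly from the counts of true and false positives and negatives. The following is a minimal sketch of those definitions. AUC is left out because it needs predicted scores rather than hard labels. The function `classification_metrics` and the label vectors are hypothetical and are not drawn from the Jakarta weather data.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for binary labels (1 = flood)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels, invented for illustration.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

On imbalanced data such as rare flood days, accuracy alone can look high even when recall is poor, which is one reason to report the metrics side by side.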