Contact Name
Husni Teja Sukmana
Contact Email
husni@bright-journal.org
Phone
+62895422720524
Journal Mail Official
jads@bright-journal.org
Editorial Address
Gedung FST UIN Jakarta, Jl. Lkr. Kampus UIN, Cemp. Putih, Kec. Ciputat Tim., Kota Tangerang Selatan, Banten 15412
Location
Kota Adm. Jakarta Pusat,
DKI Jakarta
INDONESIA
Journal of Applied Data Sciences
Published by Bright Publisher
ISSN : -     EISSN : 27236471     DOI : doi.org/10.47738/jads
One of the current hot topics in science is data: how can datasets be used in scientific and scholarly research in a more reliable, citable and accountable way? Data is of paramount importance to scientific progress, yet most research data remains private. Enhancing the transparency of the processes applied to collect, treat and analyze data will help to render scientific research results reproducible and thus more accountable. The datasets themselves should also be accessible to other researchers, so that research publications, dataset descriptions, and the actual datasets can be linked. The journal Data provides a forum to publish methodical papers on processes applied to data collection, treatment and analysis, as well as for data descriptors publishing descriptions of a linked dataset.
Articles 518 Documents
Efficient Fruit Grading and Selection System Leveraging Computer Vision and Machine Learning Dewi, Deshinta Arrova; Kurniawan, Tri Basuki; Thinakaran, Rajermani; Batumalay, Malathy; Habib, Shabana; Islam, Muhammad
Journal of Applied Data Sciences Vol 5, No 4: DECEMBER 2024
Publisher : Bright Publisher

DOI: 10.47738/jads.v5i4.443

Abstract

Automated fruit grading is crucial to overcoming the time and accuracy challenges posed by manual methods, which are often limited by subjective human judgment. This study introduces an intelligent grading system leveraging computer vision and AI to improve speed and consistency in assessing fruit quality. Using high-resolution imaging and advanced feature extraction, including grayscale processing, binarization, and enhancement, the system achieves non-destructive, efficient sorting for fruits like apples, bananas, and oranges. Grayscale processing reduces image complexity while preserving essential details, binarization isolates the fruit from its background, and enhancement highlights critical features. Notably, the Edge Pixel method proved most effective, achieving 79.20% accuracy in grading, while the Grayscale Pixel method reached 93.94% accuracy for fruit types. Edge Pixel also achieved 80.32% in differentiating grading types, showcasing its ability to capture essential shapes and edges. Fruits are classified into four grades: Grade_01 (highest quality), Grade_02 (minor imperfections), Grade_03 (notable defects but consumable), and Grade_04 (unfit for consumption). A specialized dataset supports model training, ensuring practical real-world application. The study concludes that this automated system offers significant improvements over traditional grading, providing a scalable, objective, and reliable solution for the agricultural sector, ultimately enhancing productivity and quality assurance.
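The preprocessing steps named in the abstract (grayscale conversion, binarization, and edge-based features) can be illustrated with a minimal sketch. This is not the authors' pipeline: the luminance weights, the fixed threshold of 128, the 4-neighbour edge definition, and the function names are all assumptions chosen for illustration, and the image is a synthetic 5x5 array rather than a real fruit photograph.

```python
def to_grayscale(rgb_image):
    """Convert rows of (R, G, B) tuples to luminance using ITU-R BT.601 weights (assumed)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_image]


def binarize(gray_image, threshold=128):
    """Mark a pixel as foreground (1) if it is at or above the threshold, else background (0)."""
    return [[1 if px >= threshold else 0 for px in row] for row in gray_image]


def count_edge_pixels(binary_image):
    """Count foreground pixels that touch background or the image border (4-neighbourhood)."""
    height, width = len(binary_image), len(binary_image[0])
    count = 0
    for y in range(height):
        for x in range(width):
            if binary_image[y][x] != 1:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                outside = not (0 <= ny < height and 0 <= nx < width)
                if outside or binary_image[ny][nx] == 0:
                    count += 1
                    break
    return count


# Synthetic example: a bright 3x3 "fruit" on a dark 5x5 background.
background, fruit = (10, 10, 10), (200, 180, 40)
image = [[fruit if 1 <= y <= 3 and 1 <= x <= 3 else background for x in range(5)]
         for y in range(5)]
gray = to_grayscale(image)
binary = binarize(gray)
edge_pixels = count_edge_pixels(binary)
```

In a real system a count like `edge_pixels`, or the raw grayscale values, would be one of many features passed to a classifier. The abstract does not say how its Edge Pixel and Grayscale Pixel features are defined, so this sketch only shows the general idea.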
Identifying Key Factors Causing Flooding Using Machine Learning Gama, Adie Wahyudi Oktavia; Dennatan, Monalisa; Dharmayasa, I Gusti Ngurah Putu; Maw, Me Me; Sugiana, I Putu; Suryanti, Irma
Journal of Applied Data Sciences Vol 6, No 1: JANUARY 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i1.463

Abstract

The impact of flooding extends beyond physical and infrastructural damage, affecting social, economic, and environmental dimensions. This study aims to identify the key factors influencing flooding by developing a decision tree model. The research method applies the C4.5 algorithm to build a decision tree model using flood factors such as rainfall, soil type, elevation, land use, and distance from rivers. The model is then applied to data from 57 past flood events to determine key contributors to flooding in Denpasar City, Bali, Indonesia. The analysis showed that land elevation is the most influential factor, with areas below 28 meters above sea level having a 71% likelihood of being vulnerable to flooding. Additionally, the model reveals previously unknown patterns contributing to flood vulnerability among the factors considered. These insights give a deeper understanding of how these factors combine to affect flood vulnerability. The model's effectiveness was evaluated using a confusion matrix, resulting in an accuracy of 90%, a precision of 100%, a sensitivity of 90%, a specificity of 100%, and an F1 score of 94%, demonstrating its strong predictive power in identifying areas at risk of flooding. Although this study is limited by the availability of data, the focus on Denpasar City, and the potential omission of other relevant attributes, it advances flood risk assessment by applying machine learning to provide practical insights that could enhance flood management strategies, with potential applications to other urban areas facing similar risks.
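The evaluation metrics listed in the abstract all derive from the four cells of a binary confusion matrix. The sketch below shows the standard formulas. The counts passed in are invented for illustration and are not the study's data, and the function name `confusion_metrics` is hypothetical.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Compute standard binary classification metrics from confusion-matrix counts."""
    def safe_div(numerator, denominator):
        # Return 0.0 when a metric is undefined (empty denominator).
        return numerator / denominator if denominator else 0.0

    precision = safe_div(tp, tp + fp)
    sensitivity = safe_div(tp, tp + fn)      # also called recall
    specificity = safe_div(tn, tn + fp)
    accuracy = safe_div(tp + tn, tp + fp + fn + tn)
    f1 = safe_div(2 * precision * sensitivity, precision + sensitivity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts: 18 true positives, 0 false positives,
# 2 false negatives, 5 true negatives.
metrics = confusion_metrics(tp=18, fp=0, fn=2, tn=5)
```

With no false positives, precision and specificity are both 1.0. A sensitivity of 0.9 then gives F1 = 2(1.0)(0.9)/(1.9), roughly 0.947, which is in line with the combination of 100% precision, 90% sensitivity, and 94% F1 reported in the abstract.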
Applying Structural Equation Modelling for Examining the Impact of Quality Dimensions in Improving the Adoption of Digital-Learning Platforms Alkhdour, Tayseer; Almaiah, Mohammed Amin; Shishakly, Rima; AlAli, Rommel
Journal of Applied Data Sciences Vol 6, No 1: JANUARY 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i1.518

Abstract

Although a number of studies have demonstrated the significance of quality characteristics in improving the success of digital-learning platforms, there is little research on the impact of quality dimensions on system adoption and usage. Our research therefore investigated the impact of quality indicators, namely quality of service, quality of learning content and information, and quality of system, on the usability of digital-learning platforms. These three quality characteristics were determined to be the essential components affecting the adoption of digital-learning platforms among learners. The study revealed that system quality was the most critical factor influencing the perceived ease of use and usefulness of digital-learning platforms. Information quality also had a significant impact on both perceived ease of use and usefulness, and service quality affected these usability factors as well. The findings indicate that system quality significantly influenced usability factors, specifically perceived ease of use and perceived usefulness (H1: β = 0.321; H2: β = 0.366). Service quality was also found to have a significant effect on both usability factors, ease of use and usefulness (H5: β = 0.371; H6: β = 0.366). These results help establish the importance of the quality components that developers in institutions of higher education can use to enhance the experience of their digital-learning platforms.
Analyzing Audience Sentiments in Digital Comedy: A Study of YouTube Comments Using LSTM Models Supriyono, Supriyono; Wibawa, Aji Prasetya; Suyono, Suyono; Kurniawan, Fachrul
Journal of Applied Data Sciences Vol 5, No 4: DECEMBER 2024
Publisher : Bright Publisher

DOI: 10.47738/jads.v5i4.393

Abstract

The main objective of this paper is to analyze audience sentiment towards stand-up comedy content on the YouTube platform, specifically comments on stand-up comedy videos from Kompas TV, using the Long Short-Term Memory (LSTM) method. This research contributes to a deeper understanding of how audiences engage with humorous content through a sentiment analysis approach that uses the LSTM model, which can capture complex nuances in humorous content, such as sarcasm, irony, and cultural references. The research methodology involves crawling data from YouTube, where user comments are extracted and processed through several stages of data cleaning, such as removing duplicate content, normalizing text, and filtering out irrelevant comments. Once the data is prepared, the LSTM model is trained to classify positive, negative, and neutral sentiments, with accuracy rates of 85% for positive sentiment, 80% for negative sentiment, and 78% for neutral sentiment. The main results show that the LSTM model successfully classifies sentiments, although it struggles with the more ambiguous neutral sentiments. The figures and tables included in this study illustrate the relationship between the number of views, likes, and the sentiment classification of the comments. One notable finding is a strong positive correlation between the number of views and video likes. The conclusions of this study underscore the need for model improvements to handle neutral sentiment better and capture the complexity of humor content. The implications of this research are useful for content creators and digital marketers in understanding and responding to audience preferences more effectively. They also pave the way for further research in sentiment analysis on more specific content genres on digital platforms.
Health and Socio-Demographic Risk Factors of Childhood Stunting: Assessing the Role of Factor Interactions Through the Development of an AI Predictive Model Hariguna, Taqwa; Sarmini, Sarmini; Azis, Abdul
Journal of Applied Data Sciences Vol 5, No 4: DECEMBER 2024
Publisher : Bright Publisher

DOI: 10.47738/jads.v5i4.612

Abstract

Stunting is a significant global health problem, especially in developing countries such as Indonesia. This study aims to develop and evaluate an artificial intelligence (AI)-based predictive model to identify the risk of stunting in children using the CatBoost algorithm, which is a combination of Weighted Apriori and XGBoost. This model is designed to utilize the advantages of each algorithm in handling data with variable weights to improve prediction accuracy. Feature analysis shows that "Height (cm)" and "Age (months)" are the main indicators in classifying children's nutritional status. Model evaluation shows a high accuracy of 94.85%, precision of 95%, recall of 94.85%, and F1 score of 94.84%. The Kappa coefficient and Matthews Correlation Coefficient (MCC) reached 93.13% and 93.19%, respectively, while ROC-AUC reached 99.70%. These findings indicate that the CatBoost model can provide highly accurate results in detecting the risk of stunting and offer in-depth insights into risk factors that can improve the effectiveness of health interventions. This study fills the gap in the literature by integrating the Weighted Apriori and XGBoost algorithms, providing a significant contribution to early detection of stunting and supporting government efforts to reduce the prevalence of stunting in Indonesia and other regions.
Optimizing Survival Prediction in Children Undergoing Hematopoietic Stem Cell Transplantation through Enhanced Chaotic Harris Hawk Deep Clustering Arthi, R.; Priscilla, G Maria; Maidin, Siti Sarah; Yang, Qingxue
Journal of Applied Data Sciences Vol 6, No 1: JANUARY 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i1.468

Abstract

Cancer can affect individuals of all ages, including both children and adults. Diagnosing pediatric cancer can be challenging due to its rarity. Screening for pediatric cancer is typically not recommended, as it may cause harm to the children. One of the specialized treatments for pediatric cancer is Hematopoietic Stem Cell Transplantation (HSCT), which replaces the patient's existing blood cells with healthy bone marrow cells from a donor. However, forecasting survival rates following pediatric HSCT is crucial and poses challenges in early detection. Many machine learning algorithms have been developed to predict the risk of transplant outcomes, which depends on the type of disease or the patient's comorbidity. In this work, survival prediction for children who have undergone HSCT is enhanced through the introduction of a deep learning model based on behavioral characteristics. The primary aim of this model is to identify and differentiate between the patterns of malignancy, non-malignancy, and hematopoietic conditions within the dataset of bone marrow transplant patients. Existing unsupervised machine learning algorithms cluster instances around randomly selected centroids, which often results in local optima and early convergence, reducing accuracy. Hence, the present approach introduces Chaotic mapping Harris Hawk Optimization (CHHO) to enhance the conventional k-means clustering procedure, due to its significantly reduced computational complexity. To understand the pattern of the bone marrow transplant dataset, the deep clustering model, with its autoencoder and decoder, discriminates the labelled instances. With the inferred knowledge, the proposed CHHO with Deep Clustering Model (CHHO-DCM) performs effective clustering of instances with the advantage of both local and global optimization. The simulation outcomes substantiate the effectiveness of the proposed CHHO-DCM model, as it attains the highest precision among the prevailing clustering models in predicting the survival of pediatric patients undergoing HSCT.
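One ingredient mentioned in the abstract, replacing purely random centroid selection with a chaotic sequence, can be sketched in isolation. The example below is not the CHHO-DCM method. It assumes the logistic map with r = 4 as the chaotic map (the abstract does not say which map is used), and the function names and starting value `x0` are hypothetical.

```python
def logistic_map_sequence(x0, n, r=4.0):
    """Generate n values of the logistic map x -> r * x * (1 - x), starting from x0 in (0, 1)."""
    sequence = []
    x = x0
    for _ in range(n):
        x = r * x * (1.0 - x)
        sequence.append(x)
    return sequence


def chaotic_initial_centroids(points, k, x0=0.37):
    """Place k initial centroids inside the data's bounding box using a chaotic sequence."""
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]
    chaos = logistic_map_sequence(x0, k * dims)
    centroids = []
    for i in range(k):
        # Scale each chaotic value from [0, 1] into the range of its dimension.
        centroids.append(tuple(
            lows[d] + chaos[i * dims + d] * (highs[d] - lows[d])
            for d in range(dims)
        ))
    return centroids


# Toy 2-D data, not drawn from the bone marrow transplant dataset.
toy_points = [(0.0, 10.0), (2.0, 14.0), (4.0, 12.0)]
initial_centroids = chaotic_initial_centroids(toy_points, k=2)
```

The centroids produced this way would then seed an ordinary k-means loop. In the paper's approach the chaotic map is combined with Harris Hawk Optimization and a deep clustering model, none of which is reproduced here.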
Sentiment Analysis of the Kampus Merdeka Program on Twitter Using Support Vector Machine and a Feature Extraction Comparison: TF-IDF vs. FastText Afuan, Lasmedi; Hidayat, Nurul; Nofiyati, Nofiyati; As'ad, Mohamad Faris
Journal of Applied Data Sciences Vol 5, No 4: DECEMBER 2024
Publisher : Bright Publisher

DOI: 10.47738/jads.v5i4.436

Abstract

The Kampus Merdeka program, launched by the Indonesian Ministry of Education, Culture, Research, and Technology in 2020, aims to enhance students' skills through hands-on work experience. Considering the rising significance of social media, particularly Twitter, in gauging public opinion, this research focuses on analyzing the sentiment towards the Kampus Merdeka program. The primary objective is to classify the sentiments expressed in tweets related to the program and compare two feature extraction techniques—TF-IDF and FastText—to identify the best approach for transforming text data into numerical vectors. The sentiment classification model was built using the Support Vector Machine (SVM) algorithm, a machine learning technique known for its accuracy in text classification. A total of 16,730 tweets were collected and analyzed, yielding an accuracy of 73% for FastText and 72% for TF-IDF. Results show that FastText is more effective in capturing semantic relationships, leading to higher accuracy in sentiment classification. Findings indicate that the public sentiment towards the Kampus Merdeka program is predominantly positive (60.7%), with negative and neutral sentiments at 33.5% and 5.8%, respectively. The success of the FastText method underscores the importance of advanced feature extraction techniques in text classification. The novelty of this research lies in its use of FastText for educational policy evaluation, providing a new perspective on using sentiment analysis to assess public perception of educational programs.
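To make the TF-IDF side of the comparison concrete, the sketch below computes TF-IDF weights by hand using one common variant: term frequency normalized by document length, and inverse document frequency as log(N / df). The toy sentences are invented, the function name `tf_idf` is hypothetical, and the study's actual preprocessing, weighting variant, and SVM training are not reproduced.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document, using tf = count/len and idf = log(N/df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Invented toy "tweets", not taken from the study's 16,730-tweet corpus.
toy_tweets = [
    "program helps students",
    "program wastes time",
    "students like program",
]
vectors = tf_idf(toy_tweets)
```

Here the word "program" appears in every document, so its IDF is log(1) = 0 and it carries no weight, while a word unique to one tweet such as "wastes" gets the largest weight. TF-IDF treats words as unrelated symbols, whereas FastText builds dense vectors from subword information, which is why the abstract credits FastText with capturing semantic relationships.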
Assessing Novice Voter Resilience on Disinformation During Indonesia Elections 2024 with Naïve Bayes Classifier Hari, Yulius; Yanggah, Minny Elisa; Paramita, Adi Suryaputra
Journal of Applied Data Sciences Vol 6, No 1: JANUARY 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i1.489

Abstract

With the rise of social media platforms, the spread of fake news has become a significant concern. The 2024 presidential election was dominated by novice voters, who are exposed to a lot of news from social media. As first-time voters, they get most of their information and news exposure from social media. This is exacerbated by the fact that influencers are used to lead opinions. This research measures the resilience of novice voters in dealing with hoax news and compares it with a Naïve Bayes classifier assessing the news. The purpose of this research is to make novice voters aware and less easily polarized, in order to prevent national disintegration due to disinformation and hoax news. This research also develops a database of content and categories for hoax news from novice voter data with a classification model. Data collection was carried out offline and online through interviews and questionnaires with a total of 283 respondents from various study programs at two private universities in East Java. From the data, a classification approach using the Naïve Bayes method was built to help recommend whether a news item is a hoax or news that can be verified. The results show that the Naïve Bayes classification model achieves an accuracy of up to 90.303% in categorizing a news story as a hoax, dubious news, or valid news. In contrast, the average accuracy of first-time voters is only 29.68%, which means that they are very vulnerable to hoax news, due to the many perceptions and assumptions in public comments that bias their views.
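A minimal sketch of the classification idea follows, assuming a multinomial Naïve Bayes model over whitespace-separated words with Laplace (add-one) smoothing. The training headlines and the function names are invented for illustration. The study's three categories are reduced to two here to keep the example short, and none of its data or features are reproduced.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(samples):
    """Count class frequencies and per-class word frequencies from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in samples:
        tokens = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary


def predict(model, text):
    """Return the label with the highest log-posterior under add-one smoothing."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        denominator = sum(word_counts[label].values()) + len(vocabulary)
        for token in text.lower().split():
            if token in vocabulary:  # words never seen in training are ignored
                score += math.log((word_counts[label][token] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented training headlines, not from the study's questionnaire data.
toy_samples = [
    ("shocking secret miracle cure", "hoax"),
    ("candidate secretly banned voting shocking", "hoax"),
    ("commission publishes official vote schedule", "valid"),
    ("official results announced by commission", "valid"),
]
model = train_naive_bayes(toy_samples)
label_a = predict(model, "shocking miracle secret")
label_b = predict(model, "official commission schedule")
```

Extending the sketch to the three categories named in the abstract (hoax, dubious, valid) only requires adding training samples with a third label, since the scoring loop already iterates over whatever labels were seen in training.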
Spam Feature Selection Using Firefly Metaheuristic Algorithm Abualhaj, Mosleh M; Hiari, Mohammad O; Alsaaidah, Adeeb; Al-Zyoud, Mahran; Al-Khatib, Sumaya
Journal of Applied Data Sciences Vol 5, No 4: DECEMBER 2024
Publisher : Bright Publisher

DOI: 10.47738/jads.v5i4.336

Abstract

This paper presents a novel method for improving spam detection by utilizing the Firefly Algorithm (FA) for feature selection. The FA, a bio-inspired metaheuristic optimization algorithm, is applied to identify the most relevant features from the ISCX-URL2016 dataset, which contains 72 features. By balancing exploration (searching for new solutions) and exploitation (focusing on the best solutions), FA is able to effectively reduce the feature space from 72 to 31 features. This reduction improves model efficiency without sacrificing performance, as only the most impactful features are retained for the classification task. The selected features were then used to train three machine learning classifiers: Decision Tree (DT), Gradient Boost Tree (GBT), and Naive Bayes (NB). Each classifier's performance was evaluated based on accuracy, with DT achieving the highest accuracy of 99.81%, GBT achieving 99.70%, and NB scoring 90.33%. The superior performance of the DT algorithm is attributed to its ability to handle non-linear relationships and high-dimensional data, making it particularly well-suited for the FA-selected features. This combination of FA for feature selection and DT for classification demonstrates significant improvements in spam detection performance, highlighting the importance of selecting the most relevant features. The results show that by reducing the dimensionality of the dataset, the FA algorithm not only accelerates the classification process but also enhances detection accuracy.
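The exploration and exploitation balance described above can be sketched with a simplified firefly search over feature subsets. This is an illustration under stated assumptions, not the paper's implementation: positions are continuous values in [0, 1] thresholded at 0.5 to select a feature, the parameters `alpha`, `beta0`, and `gamma` are arbitrary defaults, and the fitness function is a hypothetical stand-in (fixed relevance weights minus a per-feature penalty) where a real study would more likely use classifier performance.

```python
import math
import random


def firefly_feature_selection(fitness, n_features, n_fireflies=10, n_iterations=30,
                              alpha=0.2, beta0=1.0, gamma=1.0, seed=0):
    """Search for a 0/1 feature mask that maximizes fitness(mask) using firefly-style moves."""
    rng = random.Random(seed)
    positions = [[rng.random() for _ in range(n_features)] for _ in range(n_fireflies)]

    def to_mask(position):
        return [1 if value > 0.5 else 0 for value in position]

    brightness = [fitness(to_mask(p)) for p in positions]
    best_index = max(range(n_fireflies), key=lambda i: brightness[i])
    best_mask, best_score = to_mask(positions[best_index]), brightness[best_index]

    for _ in range(n_iterations):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if brightness[j] <= brightness[i]:
                    continue
                # Exploitation: move toward the brighter firefly, with attraction
                # decaying with squared distance. Exploration: add a random step.
                dist_sq = sum((a - b) ** 2 for a, b in zip(positions[i], positions[j]))
                beta = beta0 * math.exp(-gamma * dist_sq)
                positions[i] = [
                    min(1.0, max(0.0, a + beta * (b - a) + alpha * (rng.random() - 0.5)))
                    for a, b in zip(positions[i], positions[j])
                ]
                brightness[i] = fitness(to_mask(positions[i]))
                if brightness[i] > best_score:
                    best_mask, best_score = to_mask(positions[i]), brightness[i]
    return best_mask, best_score


# Hypothetical relevance weights for six features; each selected feature costs 0.2.
TOY_RELEVANCE = [0.9, 0.05, 0.8, 0.02, 0.7, 0.01]


def toy_fitness(mask):
    return sum(w for w, bit in zip(TOY_RELEVANCE, mask) if bit) - 0.2 * sum(mask)


selected_mask, selected_score = firefly_feature_selection(toy_fitness, n_features=6)
```

The returned mask is the best subset seen during the search, so it is never worse than the best firefly in the initial random population. Being a metaheuristic, the search is not guaranteed to find the global optimum.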
Data Visualization of Climate Patterns in Indonesia Using Python and Looker Studio Dashboard: A Visual Data Mining Approach Refianti, Rina; Mutiara, Achmad Benny; Ariyanto, Ananda Satria
Journal of Applied Data Sciences Vol 5, No 4: DECEMBER 2024
Publisher : Bright Publisher

DOI: 10.47738/jads.v5i4.420

Abstract

Climate has a significant impact on the lives of Indonesian people. Information about climate patterns, when presented visually and interactively, can greatly enhance understanding of climate conditions in Indonesia. This study aims to produce a visualization of climate pattern data in Indonesia that can be accessed online by the general public, serving as a valuable resource for climate information. The study highlights the ability to display historical trends for a 10-year period (2010-2020) through interactive visuals, which load information according to user-defined filters, enabling diverse presentations of data. The research employs the Visual Data Mining method, encompassing Project Planning, Data Preparation, and Data Analysis phases. Additionally, Exploratory Data Analysis techniques were utilized in the data analysis phase. The data was cleaned and processed using the Python programming language with libraries such as pandas, numpy, seaborn, and matplotlib. Visualizations were created using Looker Studio tools and published on a website, providing accessible climate pattern information in Indonesia via the Internet. The final results of this research indicate that the developed climate visualization dashboard successfully delivers detailed insights into sunlight duration, temperature, humidity, rainfall, and wind speed across various Indonesian regions. Users can effectively monitor climate trends and weather changes. The dashboard also demonstrates significant seasonal variations and differences in climate patterns between provinces. Performance metrics reveal that the dashboard meets its Key Performance Indicators, achieving a click-through ratio of 40.1%, an average search-engine page position of 4.8, and positive user experience scores. The Climate Pattern Dashboard still has room for further development and research. Important aspects include expanding data coverage to multiple decades for observing significant climate patterns and applying sophisticated prediction methods such as machine learning algorithms for future climate change projections.