Contact Name
Husni Teja Sukmana
Contact Email
husni@bright-journal.org
Phone
+62895422720524
Journal Mail Official
jads@bright-journal.org
Editorial Address
Gedung FST UIN Jakarta, Jl. Lkr. Kampus UIN, Cemp. Putih, Kec. Ciputat Tim., Kota Tangerang Selatan, Banten 15412
Location
Kota Adm. Jakarta Pusat,
DKI Jakarta
INDONESIA
Journal of Applied Data Sciences
Published by Bright Publisher
ISSN : -     EISSN : 2723-6471     DOI : doi.org/10.47738/jads
One of the current hot topics in science is data: how can datasets be used in scientific and scholarly research in a more reliable, citable, and accountable way? Data is of paramount importance to scientific progress, yet most research data remains private. Enhancing the transparency of the processes applied to collect, treat, and analyze data will help to render scientific research results reproducible and thus more accountable. The datasets themselves should also be accessible to other researchers, so that research publications, dataset descriptions, and the actual datasets can be linked. The journal Data provides a forum to publish methodical papers on processes applied to data collection, treatment, and analysis, as well as data descriptors that describe a linked dataset.
Articles 518 Documents
Student Engagement in E-Learning During Crisis: An Unsupervised Machine Learning and Exploratory Data Analysis Approach Daoud, Rachid Ait; Amine, Abdellah; Abouelmehdi, Karim; Razouk, Ayoub
Journal of Applied Data Sciences Vol 6, No 1: JANUARY 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i1.458

Abstract

The lockdown caused by COVID-19 forced educational institutions to rapidly adopt e-learning, which revealed significant challenges related to student engagement. Following the CRISP-DM (Cross-Industry Standard Process for Data Mining) methodology, the present work aims to provide teachers and university administrators with a framework based on unsupervised machine learning and exploratory data analysis to identify engagement levels and understand the potential reasons for low engagement. Various data sources, including Microsoft Teams logs, demographic data, and educational data, were merged to create a comprehensive dataset containing the measures most relevant to this approach. The study was structured around three main research questions. First, we sought to identify the most effective Microsoft Teams measures for identifying students' engagement levels. Then, our analysis compared different clustering models (two-level, three-level, and four-level models) to determine which is most accurate in identifying low-engaged students. Finally, we examined the demographic and educational factors influencing low student engagement. The results revealed that, by applying the Sequential Forward Selection (SFS) technique, ScreenShareTime, VideoTime, NbrViewedVideos, Recency, and AvgTeamsSessionDay are the most relevant Microsoft Teams engagement metrics; using these selected measures improved the silhouette width from 0.37 to 0.70. The four-level clustering model (Low, Medium, High, and Super) proved most effective in identifying low-engaged students. Analysis of factors showed that low engagement is primarily related to limited living conditions, with 66% of low-engaged students having low incomes. In addition, 50% do not use online services, and 62% of low-engaged students took more than three years to reach their final year, indicating pre-existing academic difficulties. These findings provide educational institutions with valuable insights to enhance student engagement in distance learning, particularly during crisis periods such as the COVID-19 pandemic.
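The silhouette width quoted in the abstract above measures how well each point sits inside its own cluster relative to the nearest other cluster. The following is a minimal sketch of that measure, not the authors' code: the function name `silhouette_width`, the Euclidean distance, and the toy points are all assumptions made for illustration.

```python
from math import dist

def silhouette_width(points, labels):
    """Mean silhouette width over all points (Euclidean distance).

    Assumes at least two clusters and at least two distinct points.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # common convention: singleton clusters score 0
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Toy data (made up): two tight, well-separated groups.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_width(points, [0, 0, 1, 1])  # roughly 0.90
bad = silhouette_width(points, [0, 1, 0, 1])   # negative: clusters overlap
print(round(good, 2), round(bad, 2))
```

Values near 1 indicate compact, well-separated clusters, which is why a rise from 0.37 to 0.70 is read as a clear improvement.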
ARP Spoofing Attack Detection Model in IoT Network using Machine Learning: Complexity vs. Accuracy Alsaaidah, Adeeb; Almomani, Omar; Abu-Shareha, Ahmad Adel; Abualhaj, Mosleh M; Achuthan, Anusha
Journal of Applied Data Sciences Vol 5, No 4: DECEMBER 2024
Publisher : Bright Publisher

DOI: 10.47738/jads.v5i4.374

Abstract

Spoofing attacks targeting the Address Resolution Protocol (ARP) are common cyber-attacks in IoT environments. In such an attack, the attacker sends fake messages over a local area network to deceive users and interfere with the traffic sent to and from them. Detecting these attacks therefore requires continuously monitoring network gateways and routers to capture and analyze the transmitted traffic. However, such traffic data poses three major problems: 1) much of the data is irrelevant to ARP attacks, 2) spoofing can be implemented in a very large number of ways, and 3) the data must be processed quickly to limit the delay introduced by the processing stage. Accordingly, this paper proposes a detection approach using supervised machine learning algorithms. The focus of this paper is to show the tradeoff between speed and accuracy in order to offer various solutions based on the required quality. Various algorithms were tested to find a solution that balances time requirements and accuracy. Results are reported using all features and using various feature selection techniques, for both simple classifiers and ensemble learning algorithms. The proposed approach is evaluated on an IoT network intrusion dataset (IoTID20) collected from different IoT devices. The results showed that the highest accuracy is obtained using the RF classifier with a subset of features produced by the wrapper technique: 99.74%, with a running time of 305 milliseconds. However, if time is more critical for a given application, DT can be used with the whole feature set, giving an accuracy of 99.41% with a running time of 11 milliseconds.
Analysis of Seismic Data in Sumatra using Robust K-Means Clustering Rafflesia, Ulfasari; Rosadi, Dedi; Sari, Devni Prima; Novianti, Pepi
Journal of Applied Data Sciences Vol 6, No 1: JANUARY 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i1.523

Abstract

Indonesia is located within the Pacific Ring of Fire and frequently experiences significant seismic activity, rendering the region susceptible to hazards. Sumatra, an island in the western part of the country, lies near the Eurasian and Indo-Australian tectonic plates. Over the past five years, an observable uptick in seismic events has been recorded in Sumatra. This research aimed to cluster the Sumatra region’s seismic data using the k-means algorithm and its extensions, including trimmed and robust sparse k-means, to determine the characteristics and patterns of seismic events. The k-means clustering algorithm operates effectively on many datasets but performs poorly in the presence of outliers, and preliminary inspection revealed outliers in the seismic data. The clustering analysis identified two main clusters, supported by multivariate and spatial outlier detection during preprocessing. The first cluster, encompassing 62% of seismic events, is located offshore near the Mentawai seismic gap, characterized by shallow depths (33–41 km) and magnitudes of 4.5–5.0 Ms. The second cluster, representing 28% of events, includes both mainland and offshore regions, associated with the Sumatran Fault system and slab deformation zones, at moderate depths (54–154 km) with magnitudes of 4.3–4.4 Ms. Rare deep-focus events exceeding depths of 214 km were identified as outliers. Evaluation using Silhouette, Davies-Bouldin, and Dunn indices determined that k=2 was the optimal number of clusters. This study contributes by integrating robust clustering methods to handle outliers, enhancing the reliability of seismic data analysis, and demonstrates the value of applying trimmed and robust sparse k-means algorithms to improve clustering performance in regions with complex tectonic activity.
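To illustrate why trimming helps when outliers are present, here is a toy one-dimensional sketch of the trimmed k-means idea: at each iteration the points farthest from their nearest center are set aside before the centers are updated. It is a simplification under stated assumptions, not the authors' implementation; the function name `trimmed_kmeans_1d`, its parameters, and the sample values are hypothetical.

```python
def trimmed_kmeans_1d(values, centers, trim_fraction, iterations=10):
    """Toy 1-D trimmed k-means: ignore the farthest points when updating centers."""
    centers = list(centers)
    n_keep = len(values) - int(trim_fraction * len(values))
    trimmed = []
    for _ in range(iterations):
        # Rank point indices by distance to their nearest center.
        order = sorted(
            range(len(values)),
            key=lambda i: min(abs(values[i] - c) for c in centers),
        )
        kept, trimmed = order[:n_keep], order[n_keep:]
        # Assign each kept point to its nearest center.
        groups = [[] for _ in centers]
        for i in kept:
            nearest = min(range(len(centers)), key=lambda k: abs(values[i] - centers[k]))
            groups[nearest].append(values[i])
        # Move each center to the mean of its group (unchanged if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    # The trimmed points are those set aside in the final iteration.
    return centers, [values[i] for i in trimmed]

# Made-up values: two groups plus one extreme point.
sample = [10, 11, 12, 50, 51, 52, 500]
centers, outliers = trimmed_kmeans_1d(sample, centers=[10, 50], trim_fraction=0.15)
print(centers, outliers)  # [11.0, 51.0] [500]
```

With plain k-means the extreme value would pull the second center far from 51; trimming keeps both centers on the bulk of the data and reports the extreme value separately.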
The Efficacy of Online Gamification in Improving Basic English Skills for Fourth-Grade Students Pasawano, Tiamyod; Sangsawang, Thosporn
Journal of Applied Data Sciences Vol 5, No 4: DECEMBER 2024
Publisher : Bright Publisher

DOI: 10.47738/jads.v5i4.410

Abstract

The study aimed to achieve three main objectives: 1) to develop an online gamification system using digital learning platforms for teaching English to Grade 4 students, following the E1/E2 = 80/80 efficiency criterion, 2) to compare students' achievement in Basic English through online gamification, and 3) to assess students' satisfaction with the use of online gamification in learning Basic English. The sample comprised 30 Grade 4 students from Settabutr Upathum School in the academic year 2022, selected through purposive random sampling. Research instruments included online Zoom classes, lesson plans, and interactive learning platforms. The study employed the mean, standard deviation, and dependent-samples t-tests for data analysis. The results revealed an E1/E2 efficiency value of 70.00/69.00, falling short of the 80/80 criterion. Several factors, such as the comprehensive nature of testing macro skills using digital media beyond cognitive abilities, may have contributed to not meeting the set criterion. Furthermore, a significant improvement in learning achievement in Basic English was observed among Grade 4 students who used online gamification compared to traditional methods, with higher scores in achievement tests at a significance level of 0.05. Finally, students expressed a good level of satisfaction with the online gamification approach in learning Basic English.
Analyzing the Impact of Publicity and e-WOM on Indonesian Tourists’ Visit Intention to Seoul through Destination Awareness and Preference: A Structural Equation Modeling Approach Herstanti, Ghassani; Suhud, Usep; Handaru, Agung Wahyu
Journal of Applied Data Sciences Vol 5, No 4: DECEMBER 2024
Publisher : Bright Publisher

DOI: 10.47738/jads.v5i4.531

Abstract

This research explores the role of digital publicity and electronic word of mouth (E-WOM) in shaping Indonesian tourists' intentions to visit Seoul. By integrating destination awareness and preference as mediating variables, the study provides a holistic view of how publicity and E-WOM interact to influence visit intentions. Digital publicity, including news coverage and promotional content on social media, raises initial awareness of Seoul by highlighting its attractions, culture, and experiences. E-WOM, expressed through online reviews, travel blogs, and social media shares, further enhances the perception of Seoul by providing authentic, peer-driven narratives. These user-generated insights are particularly impactful, as they foster trust and add an emotional dimension to tourists’ perception of the destination. Using a structural equation modeling approach, the study analyzes survey responses from Indonesian tourists to validate six core hypotheses, examining the direct and indirect effects of publicity and E-WOM on destination awareness, preference, and visit intention. Results indicate that both digital publicity and E-WOM significantly contribute to tourists' awareness and preference for Seoul, with preference being a particularly strong predictor of visit intention. The findings underscore the importance of aligning digital publicity efforts with targeted E-WOM strategies, enabling tourism marketers to build both cognitive awareness and emotional appeal, which ultimately drive visit intention. These insights are valuable for tourism stakeholders aiming to enhance destination marketing strategies, as they suggest that a combined approach, leveraging both structured publicity and organic E-WOM, can effectively increase a destination’s appeal. By focusing on creating authentic, accessible content and fostering positive online word of mouth, tourism authorities can better attract tourists and establish Seoul as a top choice for Indonesian travelers.
Searching Sahih Hadiths Based on Queries using Neural Models and FastText Susanti, Sari; Najiyah, Ina; Ramdhani, Yudi; Herliana, Asti; Muckti, Masaldi Kharisma; Oktaviani, Fani Rahma
Journal of Applied Data Sciences Vol 6, No 1: JANUARY 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i1.467

Abstract

Hadith is the second source of Islamic law after the Qur’an, and the availability of accurate and easily accessible information about hadith is crucial, as it directly affects a person’s belief (aqidah). This highlights the importance of having hadith collections as essential guidance in everyday life. Today, digital versions of hadiths are available in various applications, e-books, and websites. However, users often complain that these sources are incomplete and do not contain the entire collection of the Prophet's hadiths from al-Kutub as-Sittah. Additionally, the complex presentation of these digital resources makes it difficult to find relevant hadiths efficiently. This study aims to improve access to accurate and relevant hadith information, focusing specifically on al-Kutub as-Sittah, using an Information Retrieval (IR) system that searches for hadiths based on keywords. IR is employed because it has proven effective in retrieving precise documents according to the search terms. A neural network is used to match user queries with the document collection, while FastText word embedding is used for text representation. FastText is particularly useful for detecting similar meanings across different words, which is essential when interpreting Indonesian-translated hadiths that require nuanced understanding. The dataset used in this study consists of 31,275 Indonesian-translated hadiths from al-Kutub as-Sittah. The study found that many hadith translations use archaic language, so query reformulation is needed to retrieve the right hadith, because users often enter queries using contemporary words. It also found that word2vec performs worse than FastText in weighting words in hadith translations. The results indicate that the neural network performs well in retrieving relevant hadith content according to the user’s queries or keywords. With 70% of the data used for training and 30% for testing, the Recall value was 0.7721 and the Precision value was 0.75112.
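For readers unfamiliar with the two retrieval metrics reported above, the sketch below shows how precision and recall are conventionally computed for a single query. The function name `precision_recall` and the document identifiers are hypothetical; the abstract does not say how the authors aggregated scores across queries.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall for one query, given document identifiers."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were returned
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Made-up identifiers: the system returns four hadiths, two of which are
# among the three that a human judged relevant to the query.
p, r = precision_recall(["h1", "h2", "h3", "h4"], ["h2", "h4", "h7"])
print(p, round(r, 4))  # 0.5 0.6667
```

Precision answers "how much of what was returned is relevant", while recall answers "how much of what is relevant was returned".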
Aspect-Based Sentiment Analysis of Healthcare Reviews from Indonesian Hospitals based on Weighted Average Ensemble Setiawan, Esther Irawati; Tjendika, Patrick; Santoso, Joan; Ferdinandus, FX; Gunawan, Gunawan; Fujisawa, Kimiya
Journal of Applied Data Sciences Vol 5, No 4: DECEMBER 2024
Publisher : Bright Publisher

DOI: 10.47738/jads.v5i4.328

Abstract

Public assessments are essential for evaluating hospital quality and meeting patient demand for superior medical treatment. This study offers a novel approach to aspect-based sentiment analysis (ABSA), which consists of aspect extraction, sentiment classification, and aspect classification. The goal is to examine 6,711 patient reviews of 20 Indonesian hospitals collected from Google, broken down by categories including cost, doctor, nurse, and others. For example, the sample contains 469 positive, 66 negative, and 7 neutral reviews for cleanliness, and 93 positive, 125 negative, and 19 neutral reviews for pricing, covering a range of attitudes. Using the Conditional Random Field (CRF) approach, aspect term extraction was refined by adjusting word features and positional tags, improving the F1-score from 0.9447 to 0.9578. Of the two strategies used for aspect classification, the Support Vector Machine (SVM) model achieved the highest F1-score, 0.8424. Adding sentiment words improved sentiment classification, with SVM reaching the best F1-score of 0.7913. For aspect and sentiment classification, a Weighted Average Ensemble approach incorporating SVM, Naïve Bayes, and K-Nearest Neighbors was employed, yielding F1-scores of 0.7881 and 0.8413, respectively. The innovative aspects of this work are the use of an ensemble technique for sentiment and aspect classification and the incorporation of hyperparameter optimization in CRF for aspect term extraction, which led to notable performance gains.
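A weighted average ensemble, as named in the abstract above, combines the class probabilities of several models using per-model weights and picks the class with the highest combined score. The sketch below assumes each model already outputs a probability per sentiment label; the function name, the weights, and the probabilities are illustrative and do not come from the paper.

```python
def weighted_average_ensemble(probabilities, weights):
    """Combine per-model class probabilities with a weighted average.

    probabilities: one dict per model, mapping label -> probability.
    weights: one non-negative weight per model (need not sum to 1).
    Returns the winning label and the combined scores.
    """
    total = sum(weights)
    labels = probabilities[0].keys()
    combined = {
        label: sum(w * p[label] for w, p in zip(weights, probabilities)) / total
        for label in labels
    }
    return max(combined, key=combined.get), combined

# Made-up outputs for one review from three models (e.g. SVM, Naive Bayes, KNN).
svm = {"positive": 0.6, "negative": 0.3, "neutral": 0.1}
nb = {"positive": 0.2, "negative": 0.7, "neutral": 0.1}
knn = {"positive": 0.5, "negative": 0.4, "neutral": 0.1}

label, scores = weighted_average_ensemble([svm, nb, knn], weights=[0.5, 0.2, 0.3])
print(label)  # positive
```

In practice the weights are often chosen from each model's validation performance, so a stronger model has more say in the final label.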
Leveraging Data Analytics for Student Grade Prediction: A Comparative Study of Data Features Misinem, Misinem; Kurniawan, Tri Basuki; Dewi, Deshinta Arrova; Zakaria, Mohd Zaki; Nazmi, Che Mohd Alif
Journal of Applied Data Sciences Vol 5, No 4: DECEMBER 2024
Publisher : Bright Publisher

DOI: 10.47738/jads.v5i4.442

Abstract

In educational settings, a persistent challenge lies in accurately identifying and supporting students at risk of underperformance or grade retention. Traditional approaches often fall short by applying generalized interventions that fail to address specific academic needs, leading to ineffective outcomes and increased grade repetition. This study advocates for integrating machine learning algorithms into educational assessment practices to address these limitations. By leveraging historical and current performance data, machine learning models can help identify students needing additional support early in their academic journey, allowing for precise and timely interventions. This research examines the effectiveness of three machine learning algorithms: Naive Bayes, Deep Learning, and Decision Trees. Naive Bayes, known for its simplicity and efficiency, is well-suited for initial data screening. Deep Learning excels at uncovering complex patterns in large datasets, making it ideal for nuanced predictions. Decision Trees, with their interpretable and actionable outputs, provide clear decision paths, making them particularly advantageous for educational applications. Among the models tested, the Decision Tree algorithm demonstrated the highest performance, achieving an accuracy rate of 86.68%. This level of accuracy underscores its suitability for educational contexts where decisions need to be based on reliable, interpretable data. The results strongly support the broader application of Decision Tree analysis in educational practices. By implementing this model, educational administrators can better identify at-risk students, tailor interventions to meet individual needs, and ultimately improve student success rates. This study suggests that Decision Trees could become a vital tool in data-driven strategies to enhance student retention and optimize academic outcomes.
Security Issues and Weaknesses in Blockchain Cloud Infrastructure: A Review Article Albaroodi, Hala A.; Anbar, Mohammed
Journal of Applied Data Sciences Vol 6, No 1: JANUARY 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i1.324

Abstract

Cloud computing has become an essential technology due to its ability to provide scalable infrastructure and data services at a low cost and with minimal effort. It is widely adopted across various IT sectors and excels in providing flexible and scalable solutions for storage, computation, and networking. However, despite its widespread adoption, information security concerns remain a significant challenge, hampering its full potential. Issues such as data breaches, insufficient access controls, privacy risks, and vulnerability to external attacks persist, making security a critical obstacle for cloud computing’s growth. At the same time, blockchain technology has emerged as a promising solution for addressing these security challenges. Celebrated for ensuring data integrity, authenticity, and confidentiality, blockchain’s decentralized structure offers a potential safeguard against the risks cloud systems face. For instance, blockchain’s ability to maintain an immutable, tamper-proof ledger and decentralized control can mitigate unauthorised access risks, thereby enhancing cloud environments' transparency and security. One of blockchain’s core components is the consensus protocol, a method through which a network of nodes validates transactions without needing to trust any single entity. In the case of Bitcoin, users follow the Proof of Work algorithm, dedicating hardware and energy resources to solve cryptographic puzzles and verify transactions. This decentralized verification process addresses fraud concerns, but it also brings challenges such as high energy consumption and network centralization, particularly in regions with cheap electricity. These concerns have led to worries about collusion risks and policy changes affecting the stability of the network. Blockchain’s decentralized nature has sparked significant interest, especially in its potential to enhance cloud computing security. Its ability to provide tamper-proof transaction logs, eliminate single points of failure, and grant users more control over data aligns well with the security demands of cloud environments. However, blockchain itself faces challenges, including scalability issues and its association with black-market trading due to its open-access model. Despite these concerns, blockchain’s integration into cloud systems presents a unique opportunity for addressing key security obstacles, thereby offering more robust solutions for corporate and financial applications.
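The cryptographic puzzle behind Proof of Work, mentioned in the abstract above, can be sketched in a few lines: a miner searches for a nonce that makes the block's hash meet a difficulty target, and anyone can check the result with a single hash. This is a deliberately simplified illustration, not Bitcoin's actual scheme (which uses double SHA-256 over a binary block header and a numeric target); the function names and the "leading zero hex digits" rule are assumptions.

```python
import hashlib

def block_hash(block_data, nonce):
    """SHA-256 hex digest of the block data concatenated with the nonce."""
    return hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()

def mine(block_data, difficulty):
    """Find the smallest nonce whose hash starts with `difficulty` zero hex digits."""
    prefix = "0" * difficulty
    nonce = 0
    while not block_hash(block_data, nonce).startswith(prefix):
        nonce += 1  # expensive part: brute-force search
    return nonce

def verify(block_data, nonce, difficulty):
    """Checking a claimed solution costs one hash, regardless of difficulty."""
    return block_hash(block_data, nonce).startswith("0" * difficulty)

nonce = mine("alice pays bob 5", difficulty=3)
print(nonce, verify("alice pays bob 5", nonce, 3))
```

Each extra zero digit multiplies the expected search effort by 16 while verification stays constant, which is the asymmetry that makes tampering costly and also explains the high energy consumption the abstract notes.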
Enhancing Sharia Stock Price Forecasting using a Hybrid ARIMA-LSTM with Locally Weighted Scatterplot Smoothing Regression Approach Gunaryati, Aris; Mutiara, Achmad Benny; Puspitodjati, Sulistyo
Journal of Applied Data Sciences Vol 6, No 1: JANUARY 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i1.514

Abstract

Predicting Sharia stock prices is complex because they exhibit high volatility and non-linear patterns. Improving forecast accuracy requires a technique suited to the existing data pattern. One technique currently under development is to combine two forecasting models into a hybrid. This study proposes a hybrid autoregressive integrated moving average (ARIMA) and long short-term memory (LSTM) model with the locally weighted scatterplot smoothing (LOWESS) linear regression technique. The model is built by fitting a linear regression between the actual values and the predictions of the ARIMA and LSTM models using the LOWESS technique. The dataset consists of the closing stock prices of four Indonesian Islamic banking companies. The hybrid ARIMA-LSTM model with LOWESS linear regression significantly outperforms the individual ARIMA and LSTM models, producing better values of mean square error (MSE), root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) on both the training and testing datasets. The proposed hybrid model effectively reduces noise and captures complex patterns in the Sharia stock price dataset, yielding more accurate predictions. The accuracy values for the training and testing datasets were, respectively, 97.6% and 98.3% (BANK.JK), 98.3% and 98.2% (BRIS.JK), 99.4% and 99.5% (BTPN.JK), and 97.7% and 99.3% (PNBS.JK).
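The four error metrics named in the abstract above have standard definitions, sketched below for a pair of actual and predicted series. The function name `forecast_errors` and the sample prices are made up for illustration; the abstract does not state how its "accuracy" percentages were derived from these metrics, so no such conversion is shown.

```python
from math import sqrt

def forecast_errors(actual, predicted):
    """Standard forecast error metrics; MAPE assumes no actual value is zero."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n                 # mean square error
    mae = sum(abs(e) for e in errors) / n                # mean absolute error
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n  # percent
    return {"MSE": mse, "RMSE": sqrt(mse), "MAE": mae, "MAPE": mape}

# Made-up closing prices and forecasts.
metrics = forecast_errors(actual=[100, 200, 400], predicted=[110, 190, 400])
print({name: round(value, 3) for name, value in metrics.items()})
```

MSE and RMSE penalize large misses more heavily, MAE treats all misses linearly, and MAPE expresses the error relative to the price level, which makes it comparable across stocks with different prices.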