Contact Name
Husni Teja Sukmana
Contact Email
husni@bright-journal.org
Phone
+62895422720524
Journal Mail Official
jads@bright-journal.org
Editorial Address
Gedung FST UIN Jakarta, Jl. Lkr. Kampus UIN, Cemp. Putih, Kec. Ciputat Tim., Kota Tangerang Selatan, Banten 15412
Location
Kota adm. jakarta pusat,
Dki jakarta
INDONESIA
Journal of Applied Data Sciences
Published by Bright Publisher
ISSN : -     EISSN : 27236471     DOI : doi.org/10.47738/jads
One of the current hot topics in science is data: how can datasets be used in scientific and scholarly research in a more reliable, citable and accountable way? Data is of paramount importance to scientific progress, yet most research data remains private. Enhancing the transparency of the processes applied to collect, treat and analyze data will help to render scientific research results reproducible and thus more accountable. The datasets themselves should also be accessible to other researchers, so that research publications, dataset descriptions, and the actual datasets can be linked. The journal Data provides a forum to publish methodical papers on processes applied to data collection, treatment and analysis, as well as data descriptors that describe a linked dataset.
Articles 54 Documents
Search results for issue "Vol 6, No 2: MAY 2025": 54 Documents
Sentiment Analysis on Slang Enriched Texts Using Machine Learning Approaches Prastyo, Priyo Agung; Berlilana, Berlilana; Tahyudin, Imam
Journal of Applied Data Sciences Vol 6, No 2: MAY 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i2.626

Abstract

This study explores sentiment analysis of slang-enriched user reviews using machine learning techniques, specifically Naive Bayes, Support Vector Machine (SVM), and Random Forest, to classify user sentiment into Positive, Negative, and Neutral categories while addressing challenges posed by informal and conversational language through slang normalization. A lexicon-based scoring method was employed to standardize slang terms such as “gak,” “aja,” and “banget,” ensuring consistency in sentiment analysis. The results indicate that Neutral sentiment dominates the dataset (51%), followed by Negative (28%) and Positive (21%), with lexicon-based scores confirming this distribution. Negative sentiment exhibits a broader intensity range, reflecting user dissatisfaction primarily related to network quality, service reliability, and pricing, as evident from recurring terms like “sinyal” (signal), “jaringan” (network), and “mahal” (expensive). Word cloud visualizations reinforce these findings, highlighting the prevalence of these concerns in user feedback. Performance evaluation of the machine learning models reveals that SVM and Random Forest achieved the highest accuracy (96%), significantly outperforming Naive Bayes (73%), demonstrating their effectiveness in handling high-dimensional text data and accurately classifying slang-rich content. These findings underscore the importance of slang normalization in preprocessing, as it significantly enhances sentiment classification accuracy. This study provides actionable insights for service providers, helping them identify and address key sources of user dissatisfaction. Future research can explore deep learning models such as BERT and LSTM to further enhance sentiment analysis by capturing contextual relationships within text data, while topic modeling techniques could uncover deeper thematic patterns in user feedback, enabling data-driven strategies to improve customer satisfaction.
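The slang normalization step described in this abstract can be illustrated with a minimal sketch. The lexicon below is a hypothetical three-entry mapping built from the slang terms the abstract mentions; the study's actual lexicon and scoring method are not given here, so the function name and mappings are assumptions for illustration only.

```python
import re

# Hypothetical slang lexicon (assumption): maps informal Indonesian tokens
# to standard forms. The study's real dictionary is not reproduced here.
SLANG_LEXICON = {
    "gak": "tidak",      # "not"
    "aja": "saja",       # "just"
    "banget": "sangat",  # "very"
}

def normalize_slang(text, lexicon=SLANG_LEXICON):
    """Lowercase the text, split it into word tokens, and replace slang tokens."""
    tokens = re.findall(r"\w+", text.lower())
    return " ".join(lexicon.get(tok, tok) for tok in tokens)

print(normalize_slang("Sinyal gak stabil, mahal banget"))
```

Normalizing before vectorization means that "gak" and "tidak" end up as the same feature, which is one plausible reason such a step could help the downstream classifiers.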
Optimizing Sentiment Analysis on Imbalanced Hotel Review Data Using SMOTE and Ensemble Machine Learning Techniques Putra, Pandu Pratama; Anam, M. Khairul; Chan, Andi Supriadi; Hadi, Abrar; Hendri, Nofri; Masnur, Alkadri
Journal of Applied Data Sciences Vol 6, No 2: MAY 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i2.618

Abstract

This research addresses the challenge of imbalanced sentiment classes in hotel review datasets obtained from Traveloka by integrating SMOTE (Synthetic Minority Oversampling Technique) with ensemble machine learning methods. The study aimed to enhance the classification of Positive, Negative, and Neutral sentiments in customer reviews. Data preprocessing techniques, including tokenization, stemming, and stopword removal, prepared the textual data for analysis. Various machine learning models—CART, KNN, Naive Bayes, and Random Forest—were evaluated individually and in ensemble configurations such as Bagging, Stacking, Soft Voting, and Hard Voting. The Stacking ensemble approach, utilizing Logistic Regression as a meta-classifier, demonstrated superior performance with an accuracy, precision, recall, and F1-score of 88%, outperforming Bagging (86%), Hard Voting (84%), and Soft Voting (81%). The findings highlight the effectiveness of SMOTE in balancing sentiment classes, particularly improving the classification of underrepresented Neutral and Negative categories. The novelty of this study lies in the comprehensive use of ensemble techniques combined with SMOTE, which significantly enhanced prediction stability and accuracy compared to previous approaches. These results provide valuable insights into leveraging advanced machine learning techniques for sentiment analysis, offering practical implications for improving customer experience and service quality in the hospitality industry.
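The core idea of SMOTE, as used in this abstract, is to create synthetic minority-class samples by interpolating between a minority point and one of its nearest minority neighbours. The sketch below shows that interpolation in plain Python on small numeric vectors. It is not the study's pipeline: the function name `smote_sample`, the choice of `k`, and the toy data are assumptions, and real text data would first be converted to feature vectors.

```python
import random

def smote_sample(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between minority points.

    minority: list of equal-length numeric tuples (needs at least two points).
    k: number of nearest minority neighbours considered (assumed value).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point, excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(smote_sample(minority, n_new=3))
```

Because every synthetic point lies on a segment between two real minority points, oversampling stays inside the region the minority class already occupies.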
Enhancing Digital Marketing Strategies with Machine Learning for Analyzing Key Drivers of Online Advertising Performance Berlilana, Berlilana; Hariguna, Taqwa; El Emary, Ibrahiem M. M.
Journal of Applied Data Sciences Vol 6, No 2: MAY 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i2.658

Abstract

The rapid growth of digital advertising has underscored the need for data-driven strategies to optimize campaign performance. This study applies machine learning techniques to analyze online advertising data, aiming to identify key performance drivers and provide actionable insights for optimizing marketing strategies. The dataset includes metrics such as clicks, displays, costs, and revenue, which were preprocessed, analyzed, and modeled using ensemble methods, including Random Forest and Gradient Boosting. These ensemble methods were chosen for their ability to handle high-dimensional data, mitigate overfitting, and capture complex, nonlinear relationships between variables. Random Forest, with its bagging approach, enhances generalization by reducing variance, while Gradient Boosting incrementally corrects errors by focusing on hard-to-predict instances, improving overall predictive performance. Descriptive analysis revealed significant variability in campaign outcomes, with cost and user engagement emerging as primary predictors of revenue. Machine learning models demonstrated strong predictive accuracy, with Random Forest achieving 92% accuracy and an F1-score of 89%. Visualizations such as feature importance charts, correlation heatmaps, and learning curves validated the robustness of the models and highlighted key insights, including inefficiencies in cost allocation and the limited impact of certain categorical features like placement. The study emphasizes the potential of machine learning to optimize digital marketing strategies by identifying critical factors that influence campaign success. The findings provide a scalable framework for resource allocation, audience targeting, and strategic decision-making in online advertising. Future research could further enhance predictions by incorporating additional features, such as audience demographics and temporal trends, to provide deeper insights into campaign dynamics.
Development of Skyline Query Algorithm for Individual Preference Recommendation in Streaming Data Amin, Ruhul; Djatna, Taufik; Annisa, Annisa; Sitanggang, Sukaesih
Journal of Applied Data Sciences Vol 6, No 2: MAY 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i2.599

Abstract

The ability of a recommendation system to deliver relevant outcomes is significantly influenced by its adaptability to the dynamic nature of individual user preferences. Data-streaming-based recommendation systems face substantial challenges in aligning recommendations with rapid shifts in user preferences. Previous research on the development of skyline query algorithms has predominantly focused on processing efficiency and parallel performance optimization, yet has not addressed the dynamic nature of individual user preferences, an essential factor for generating relevant and responsive recommendations in streaming data environments. This study aims to develop a skyline query algorithm called Distributed Data Skyline (DDSky) to provide recommendations based on dynamic individual user preferences within data-streaming contexts. DDSky leverages the Recency, Frequency, Monetary, and Rating (RFMRT) model to capture real-time changes in user preferences. This model is integrated with parallel skyline computation and structured to enhance data processing efficiency on a large scale. The parallel processing approach divides tasks into smaller subtasks executed simultaneously across multiple threads. This strategy enables the simultaneous processing of attributes such as price, distance, and individual user preferences, thereby delivering relevant recommendations that respond to real-time changes in user preferences. The DDSky algorithm was evaluated using a local dataset from the JALITA application and compared with the Eager algorithm. The results demonstrated that DDSky outperformed Eager, achieving an average recall value of 0.45 and an F1-measure of 0.55, compared to Eager's recall value of 0.33 and F1-measure of 0.47. Furthermore, DDSky achieved an average precision of 0.73, which closely approached Eager's precision of 0.82. Additionally, DDSky exhibited optimal throughput performance for datasets containing up to 10,000 items with high flexibility across various data types. With its unique technical approach, DDSky delivers recommendations that are more responsive and relevant to dynamic user preferences, establishing its superiority in data-streaming-based recommendation systems.
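For readers unfamiliar with skyline queries, the sketch below shows the basic dominance test that any skyline algorithm builds on: a point is in the skyline if no other point is at least as good on every attribute and strictly better on one. This is a plain nested-loop skyline, not the DDSky algorithm itself; the attribute pairs (price, distance) and the "lower is better" convention are assumptions for illustration.

```python
def dominates(a, b):
    """True if a is no worse than b on every attribute and strictly better
    on at least one (assumption: lower values are better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(points):
    """Return the points that no other point dominates, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical items described by (price, distance).
items = [(100, 5.0), (80, 7.0), (120, 6.0), (90, 4.0)]
print(skyline(items))
```

A preference-aware variant would add per-user attributes (for example, scores derived from recency, frequency, monetary value, and rating) to each tuple before running the same dominance test.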
Improving Evaluation Metrics for Text Summarization: A Comparative Study and Proposal of a Novel Metric Junadhi, Junadhi; Agustin, Agustin; Efrizoni, Lusiana; Okmayura, Finanta; Habibie, Dedi Rahman; Muslim, Muslim
Journal of Applied Data Sciences Vol 6, No 2: MAY 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i2.547

Abstract

This research evaluates and compares the effectiveness of various evaluation metrics in text summarization, focusing on the development of a new metric that holistically measures summary quality. Commonly used metrics, including ROUGE, BLEU, METEOR, and BERTScore, were tested on three datasets: CNN/DailyMail, XSum, and PubMed. The analysis revealed that while ROUGE achieved an average score of 0.65, it struggled to capture semantic nuances, particularly for abstractive summarization models. In contrast, BERTScore, which incorporates semantic representation, performed better with an average score of 0.75. To address these limitations, we developed the Proposed Metric, which combines semantic similarity, n-gram overlap, and sentence fluency. The Proposed Metric achieved an average score of 0.78 across datasets, surpassing conventional metrics by providing more accurate assessments of summary quality. This research contributes a novel approach to text summarization evaluation by integrating semantic and structural aspects into a single metric. The findings highlight the Proposed Metric's ability to capture contextual coherence and semantic alignment, making it suitable for real-world applications such as news summarization and medical research. These results emphasize the importance of developing holistic metrics for better evaluation of text summarization models.
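The abstract describes the Proposed Metric as a combination of semantic similarity, n-gram overlap, and sentence fluency, but does not state how the components are computed or weighted. The sketch below is therefore only an illustration under stated assumptions: `ngram_overlap_f1` is a simple ROUGE-style overlap F1, and `combined_score` uses equal weights, which are hypothetical. The semantic and fluency scores are taken as inputs rather than computed.

```python
from collections import Counter

def ngram_overlap_f1(candidate, reference, n=1):
    """F1 of clipped n-gram overlap between a candidate and a reference summary."""
    def ngrams(text):
        toks = text.lower().split()
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    overlap = sum((cand & ref).values())  # clipped counts of shared n-grams
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

def combined_score(semantic, overlap, fluency, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of three component scores in [0, 1].
    The equal weights are an assumption, not the paper's values."""
    w_s, w_o, w_f = weights
    return w_s * semantic + w_o * overlap + w_f * fluency

overlap = ngram_overlap_f1("the cat", "the cat sat on")
print(overlap, combined_score(semantic=0.9, overlap=overlap, fluency=0.8))
```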
HU Variance Moment Optimizes Keyframe Selection Based on Deep Learning for Violence Detection Putri, Sukmawati Anggraeni; Andono, Pulung Nurtantio; Purwanto, Purwanto; Soeleman, Moch Arief
Journal of Applied Data Sciences Vol 6, No 2: MAY 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i2.648

Abstract

Violence in public spaces poses a serious threat to individuals and society. Manual monitoring and violence detection require much time and human resources, ultimately hindering detection accuracy and speed. Therefore, an automated method is needed to detect violence and ensure fast and efficient action. Along with technological advances, violence detection research has adopted various methods and models, including deep learning models such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs). In this study, the classification process for detecting violence and non-violence uses the VGG19 model, a CNN model that performs well with limited computing resources. In addition, the Long Short-Term Memory (LSTM) model is the best RNN model for processing temporal data in videos. However, performance decreases when noise and irrelevant data enter the classification process. Therefore, to optimize deep learning performance, this study selects keyframes during frame extraction in the pre-processing phase using the Hu Variance Moment technique. This method calculates each frame's Hu and Variance Moment values and selects keyframes based on high Hu values. Next, we use Adaptive Moment Estimation (Adam) to optimize the gradient of the selected keyframes. This study produces a Hu19LSTM model tested on three datasets: Hockey Fight, Crowd, and AIRTLab. The proposed Hu19LSTM model produces an accuracy of 97% on the Hockey Fight dataset, 97% on the Crowd dataset, and 95% on the AIRTLab dataset. These results indicate that the Hu19LSTM model reaches an accuracy of 97% on the Hockey Fight and Crowd datasets.
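The keyframe selection step can be sketched as follows, assuming that "high Hu values" refers to the first Hu invariant moment of each grayscale frame. The abstract does not say which Hu invariant is used or how the variance moment enters the selection, so this sketch covers only the Hu part; the function names and the tiny toy frames are hypothetical.

```python
def hu_first_invariant(frame):
    """First Hu invariant (eta20 + eta02) of a 2D list of grayscale intensities."""
    m00 = sum(sum(row) for row in frame)
    if m00 == 0:
        return 0.0
    # Intensity-weighted centroid.
    cx = sum(x * v for row in frame for x, v in enumerate(row)) / m00
    cy = sum(y * v for y, row in enumerate(frame) for v in row) / m00
    # Second-order central moments.
    mu20 = sum((x - cx) ** 2 * v for row in frame for x, v in enumerate(row))
    mu02 = sum((y - cy) ** 2 * v for y, row in enumerate(frame) for v in row)
    # Scale normalization: eta_pq = mu_pq / m00^2 when p + q = 2.
    return (mu20 + mu02) / m00 ** 2

def select_keyframes(frames, k):
    """Indices of the k frames with the highest score, returned in temporal order."""
    scores = [hu_first_invariant(f) for f in frames]
    top = sorted(range(len(frames)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)

frames = [[[1, 0, 1]], [[0, 1, 0]], [[1, 0, 0, 0, 1]]]
print(select_keyframes(frames, k=2))
```

Returning the indices in temporal order keeps the selected frames usable as a sequence for a temporal model such as an LSTM.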
Optimization of Recommender Systems for Image-Based Website Themes Using Transfer Learning Wahid, Arif Mu'amar; Hariguna, Taqwa; Karyono, Giat
Journal of Applied Data Sciences Vol 6, No 2: MAY 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i2.671

Abstract

Recommender systems play a crucial role in personalizing user experiences in e-commerce, digital media, and web design. However, traditional methods such as Collaborative Filtering and Content-Based Filtering struggle to account for visual preferences, limiting their effectiveness in domains where aesthetics influence decision-making, such as website theme recommendations. These systems face challenges such as data sparsity, cold-start problems, and an inability to capture intricate visual features. To address these limitations, this study integrates Convolutional Neural Networks (CNNs) with advanced recommendation models, including Inception V3, DeepStyle, and Visual Neural Personalized Ranking (VNPR), to enhance the accuracy and personalization of visually-aware recommender systems. A quantitative research approach was employed, using controlled experiments to evaluate different combinations of feature extractors and recommendation models. Data was sourced from ThemeForest, a widely used platform for website themes, and underwent preprocessing to ensure consistency. The models were evaluated using precision, recall, F1 score, Mean Average Precision (MAP), and Normalized Discounted Cumulative Gain (NDCG) to measure recommendation quality. The results indicate that Inception V3 + VNPR outperforms other model combinations, achieving the highest accuracy in personalized theme recommendations. The integration of transfer learning further improved feature extraction and performance, even with limited training data. These findings underscore the importance of combining deep learning-based feature extraction with recommendation models to improve visually-driven recommendations. This study provides a comparative analysis of CNN-based recommender systems and contributes insights for optimizing recommendations in visually complex domains. Despite these improvements, dataset diversity remains a limitation that affects generalizability. Future research could explore alternative CNN architectures, such as ResNet and DenseNet, and incorporate user feedback mechanisms to further enhance recommendation accuracy and adaptability.
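Of the ranking metrics listed in this abstract, NDCG is the least self-explanatory, so a minimal sketch of it follows. It uses the common linear-gain formulation with a log2 rank discount; whether the study used this or the exponential-gain variant is not stated, so treat the formulation and the toy relevance lists as assumptions.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: each relevance is divided by log2(rank + 1),
    with ranks starting at 1."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg_at_k(relevances, k):
    """NDCG@k for a ranked list of relevance grades (linear gain, assumed)."""
    ideal_dcg = dcg(sorted(relevances, reverse=True)[:k])
    return dcg(relevances[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0

# Relevance of recommended themes in the order the system ranked them.
print(ndcg_at_k([3, 2, 1], k=3))
print(ndcg_at_k([0, 1], k=2))
```

A perfectly ordered list scores 1.0, and placing a relevant item lower in the list reduces the score through the logarithmic discount.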
Sentimental Analysis of Legal Aid Services: A Machine Learning Approach Khosa, Joe; Mashao, Daniel; Olanipekun, Ayorinde
Journal of Applied Data Sciences Vol 6, No 2: MAY 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i2.521

Abstract

Legal Aid services in South Africa, administered by Legal Aid South Africa (SA), aim to provide essential legal representation to vulnerable individuals lacking financial resources. Despite the organization's significant role, there is a pervasive public perception that the quality of these state-funded services is substandard, often leading to negative attitudes towards the organization. This research employs sentiment analysis to evaluate client perceptions of Legal Aid SA's services, using a dataset of 5,246 entries collected from Twitter and the internal client feedback system between 2019 and 2024. The study utilizes various machine learning algorithms, including Naive Bayes, Stochastic Gradient Descent (SGD), Random Forest, Support Vector Classification (SVC), Logistic Regression, and Extreme Gradient Boosting (XGBoost), to analyze sentiment polarity and classify feedback into positive, neutral, and negative sentiments. Model performance was assessed using accuracy, precision, recall, and F1 scores. The SVC and XGBoost models demonstrated superior performance, achieving testing accuracies of 90.10% and 90.00%, respectively. In contrast, Naive Bayes and Logistic Regression lagged, with test accuracies of 82.00% and 85.00%, respectively. The findings reveal that most responses are either neutral or positive, suggesting a predominantly favourable impression of Legal Aid services. This research not only aims to enhance Legal Aid SA's service offerings but may also provide valuable insights for similar organizations globally.
The Integration of DEMATEL and SAW Methods for Developing a Research Performance Assessment Model for Lecturers Sutoyo, Muh. Nurtanzis; Paliling, Alders
Journal of Applied Data Sciences Vol 6, No 2: MAY 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i2.550

Abstract

This work aims to integrate two decision analytic methodologies, DEMATEL and SAW, to develop a comprehensive and effective model for assessing research performance among lecturers. These methods aim to rectify the deficiencies of traditional evaluation models, which often neglect the complexity of interconnections among performance metrics. This study utilizes research performance data from lecturers, encompassing publication count, journal quality, impact, funding, and cooperation. DEMATEL is used to delineate and assess the interrelationships among the evaluation criteria, and SAW is then employed to calculate aggregate scores using the weights obtained from the DEMATEL analysis. The results indicate that the quantity of publications significantly influences research quality, followed by research impact and journal quality. Alternative A, with a maximum score of 0.996, demonstrated that the lecturer excelled in nearly all categories. A clear and objective evaluation methodology was developed by integrating DEMATEL with SAW. The development of more flexible criterion weights to accommodate shifts in academic practices and research priorities is a significant implication for future investigations. To evaluate this model's appropriateness and effectiveness in various academic contexts, it must be further assessed across multiple topic areas and types of educational institutions. This study facilitates the implementation of big data technology in academic performance evaluation, enhancing the accuracy and relevance of assessment methods.
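The SAW (Simple Additive Weighting) step can be sketched as follows: each criterion is normalized by its column maximum and the normalized values are combined with the criterion weights. In the study the weights come from the DEMATEL analysis; the weights, criterion names, and lecturer data below are hypothetical, and all criteria are assumed to be benefit criteria (higher is better) with positive values.

```python
def saw_scores(alternatives, weights):
    """Simple Additive Weighting for benefit criteria.

    alternatives: {name: {criterion: value}}, values assumed positive.
    weights: {criterion: weight}, assumed to sum to 1.
    """
    col_max = {c: max(alt[c] for alt in alternatives.values()) for c in weights}
    return {
        name: sum(w * alt[c] / col_max[c] for c, w in weights.items())
        for name, alt in alternatives.items()
    }

# Hypothetical data: two lecturers scored on two criteria.
lecturers = {
    "A": {"publications": 10, "impact": 4},
    "B": {"publications": 5, "impact": 4},
}
weights = {"publications": 0.6, "impact": 0.4}  # stand-in for DEMATEL-derived weights
print(saw_scores(lecturers, weights))
```

An alternative that holds the maximum on every criterion scores exactly 1.0, which is why a score such as 0.996 indicates near-top performance across categories.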
Applying the Smooth Transition Autoregressive Model for Discovering the Nonlinear Cointegration Relationship Between the Interest Rate and Inflation in Vietnam Nguyen, Ha Thanh
Journal of Applied Data Sciences Vol 6, No 2: MAY 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i2.568

Abstract

Interest rates and inflation are two key macroeconomic indicators that have a direct impact on a country’s economy. The Fisher hypothesis addresses the relationship between these two variables, with its core idea being that nominal interest rates and inflation have a positive long-term relationship, while real interest rates remain constant. The primary objective of this study is to explore the relationship between interest rates and inflation in Vietnam during the period from 2007 to 2023. Unlike previous studies, this research, based on Vietnam's specific context, employs the Smooth Transition Autoregressive (STAR) model. This approach allows for testing nonlinear cointegration, overcoming the limitations of traditional cointegration methods. The study identifies that interest rates and inflation exhibit long-term co-movement, adhering to a common trend. When these two variables deviate from their equilibrium position, they rapidly adjust back to equilibrium, governed by an asymmetric logarithmic transition function. The findings challenge the one-to-one relationship proposed by the Fisher hypothesis, revealing a more complex link between interest rates and inflation. Additionally, the study highlights the interactive nature of Vietnam’s monetary and financial markets. It demonstrates that monetary policy tools can influence the financial market, while the long-term nominal interest rate emerges as a potential indicator of inflation. These insights provide significant implications for policymakers aiming to stabilize the economy through effective monetary and financial strategies. This research further confirms the effectiveness of nonlinear cointegration methods and the STAR model in macroeconomic analysis. The article also presents an interesting finding regarding the Fisher hypothesis in a developing country like Vietnam.
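For reference, the relationships named in this abstract can be written out in their textbook forms. The Fisher relation and the two-regime STAR specification below are standard formulations, not equations taken from the article. The logistic transition function is an assumption: it is the usual asymmetric choice in STAR models, and the abstract does not give the exact form the author used. The symbols ($i_t$, $r_t$, $\pi^e_t$, $x_t$, $s_t$, $\gamma$, $c$) are notation introduced here.

```latex
% Fisher hypothesis: nominal rate = real rate + expected inflation
i_t = r_t + \pi^{e}_{t}

% Two-regime STAR model with transition variable s_t (standard form)
y_t = \phi_1^{\top} x_t \,\bigl(1 - G(s_t;\gamma,c)\bigr)
    + \phi_2^{\top} x_t \, G(s_t;\gamma,c) + \varepsilon_t

% Logistic transition function (assumed): \gamma > 0 sets the speed of
% transition between regimes, c is the threshold
G(s_t;\gamma,c) = \frac{1}{1 + \exp\bigl(-\gamma\,(s_t - c)\bigr)}
```

Under the strict Fisher hypothesis, a one-point rise in expected inflation raises the nominal rate by one point with $r_t$ constant; a regime-dependent adjustment governed by $G$ is one way in which that one-to-one relationship can fail to hold.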