Contact Name
Abdullah
Contact Email
abdialam@gmail.com
Phone
+628127580419
Journal Mail Official
data.science.ins@gmail.com
Editorial Address
Jl. Soebrantas Gg. Jelutung Indah no 49 Tembilahan Indragiri Hilir Riau
Location
Kab. Indragiri Hilir,
Riau
INDONESIA
Data Science Insights
Published by PT Visi Media Network
ISSN : - | EISSN : 3031-1268 | DOI : https://doi.org/10.63017/jdsi.v3i2
Data Science Insights (ISSN 3031-1268, online), published by PT Visi Media Network, is a journal that publishes research articles within the following focus and scope: Data Science and Machine Learning; Data Science and AI; Blockchain and Advanced Data Science; Cloud Computing and Big Data; Business Intelligence and Big Data; Statistical Foundation for Data Science; Probability and Statistics for Data Science; Statistical Inference via Data Science; Big Data and Business Analytics; Statistical Thinking in Business; Data Driven Statistical Methods; Statistical Methods for Spatial Data Analytics; Statistical Techniques for Data Analysis; Data Science in Communication; Information and Communication Technology; Graph Data Management for Social Network Applications; Metadata for Information Management; Information/Data: Organization and Access; Information Science and Electronic Engineering; Big Data and Social Science; Data Communication and Computer Network; ICT & Data Analytics. The journal is published twice a year.
Articles 28 Documents
Implementation of k-Means Algorithm for Coffee Sales Clustering Kevin, Edbert
Data Science Insights Vol. 3 No. 1 (2025): Journal of Data Science Insights
Publisher : PT Visi Media Network

DOI: 10.63017/jdsi.v3i1.102

Abstract

Coffee is one of the most widely consumed beverages worldwide, with a rich history spanning centuries. Coffee is derived from the beans of the Coffea species, primarily Coffea arabica and Coffea canephora (robusta), and is prized not only for its stimulating effects but also for its complex flavor profile. This paper examines the diverse roles of coffee in human culture, its impact on health, and the global coffee industry. Coffee contains bioactive compounds, including caffeine, antioxidants, and diterpenes, which have been studied for their potential health benefits, such as improved cognitive function and reduced risk of certain chronic diseases. However, excessive consumption can lead to negative effects, including sleep disturbances and cardiovascular problems. In addition, the environmental and social impacts of coffee cultivation, including issues related to sustainability, fair trade, and climate change, are critically examined. The paper concludes with a discussion of emerging trends in coffee research, including innovations in processing methods, the rise of specialty coffees, and the growing importance of ethical sourcing in an increasingly globalized market. This comprehensive review emphasizes the need for a balanced understanding of coffee’s benefits and challenges, highlighting its role as a cultural staple and a commodity in the global economy.
Application of Decision Tree Algorithm for Classification of Rice Yields in Sumatra candra, wily
Data Science Insights Vol. 2 No. 2 (2024): Journal of Data Science Insights
Publisher : PT Visi Media Network

DOI: 10.63017/jdsi.v2i2.103

Abstract

Rice is the main food crop in Indonesia, and rice farming dominates much of the country's agricultural sector, including on the island of Sumatra. A common problem is identifying which areas of Sumatra produce the most rice each year. This study aims to classify the areas of Sumatra that produce the most rice. The dataset, taken from Kaggle, contains 225 records and is tested using the Decision Tree algorithm and several other algorithms. Tableau is used to visualize which areas produce the most rice. With the Decision Tree algorithm, an accuracy of 97.78% was obtained using a data split of 0.8 for training data and 0.2 for testing data.
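As a rough sanity check on the figures in this abstract, the sketch below works through the split arithmetic. It assumes the 0.8/0.2 split was applied to all 225 records and that accuracy means correct predictions divided by test records. The count of 44 correct predictions is an inference from the reported percentage, not a number stated in the abstract.

```python
# Illustrative arithmetic only; the paper's exact split procedure is not described here.
total_records = 225     # dataset size reported in the abstract
test_fraction = 0.2     # share held out for testing

test_size = round(total_records * test_fraction)
train_size = total_records - test_size

# Assumption: 44 correct predictions, inferred from the reported 97.78%.
correct = 44
accuracy = correct / test_size

print(train_size, test_size)   # 180 45
print(f"{accuracy:.2%}")       # 97.78%
```

Under these assumptions, the reported accuracy corresponds to a single misclassified record in a 45-record test set, so the estimate rests on a small sample.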
Predicting Student Performance using Linear Regression Kennedy Kassy, Max
Data Science Insights Vol. 3 No. 2 (2025): Journal of Data Science Insights
Publisher : PT Visi Media Network

DOI: 10.63017/jdsi.v3i2.104

Abstract

This study explores how to measure and predict student performance using several machine learning algorithms in order to determine which model produces the best predictions. The dataset was obtained from Kaggle, the data science and machine learning community website, and has 6 attributes: (1) Hours Studied, (2) Previous Scores, (3) Extracurricular Activities, (4) Sleep Hours, (5) Sample Question Papers Practiced, and (6) Performance Index. The data was cleaned and explored using Microsoft Excel, Google Colab, and Tableau, and the models were developed using RapidMiner and Google Colab. The algorithms used were k-NN, SVM, Linear Regression, Generalized Linear Model, and Deep Learning. The Root Mean Squared Error (RMSE) values obtained were 2.455 (k-NN), 2.072 (SVM), 2.013 (Linear Regression), 2.030 (Generalized Linear Model), and 2.364 (Deep Learning). By RMSE, Linear Regression gives the best result. When retested, Linear Regression obtained an RMSE of 2.015 and a coefficient of determination (R²) of 0.989, meaning the model explains about 98.9% of the variance in the target variable.
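For readers unfamiliar with the two metrics reported in this abstract, the following is a minimal sketch of how RMSE and R² are commonly computed. The function names and the sample values are hypothetical and are not taken from the study, which used RapidMiner and Google Colab.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error: square root of the mean squared residual."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Hypothetical scores, for illustration only.
actual = [50.0, 60.0, 70.0, 80.0]
predicted = [52.0, 58.0, 71.0, 79.0]

print(round(rmse(actual, predicted), 3))       # 1.581
print(round(r_squared(actual, predicted), 3))  # 0.98
```

RMSE is expressed in the units of the predicted quantity, while R² is a unitless share of variance explained, so the two are not interchangeable with classification accuracy.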
Prediction of Heart Disease Attack Risk using Deep Learning Algorithm Mishel, Michelle Virya Effendy
Data Science Insights Vol. 3 No. 1 (2025): Journal of Data Science Insights
Publisher : PT Visi Media Network

DOI: 10.63017/jdsi.v3i1.107

Abstract

The heart is a muscular organ that acts as the main pump in the human circulatory system, pumping oxygen-rich blood throughout the body and returning blood containing carbon dioxide to be purified. Coronary heart disease, caused by arterial blockages due to plaque buildup (fat, cholesterol, and other substances), is often the leading cause of heart attacks as blood flow to the heart muscle is reduced. This condition is one of the leading causes of death worldwide, making it necessary to have an accurate method to detect this disease early. This study aims to help predict the risk of heart disease based on gender using data mining. Data mining facilitates heart disease diagnosis, particularly in helping doctors determine whether a patient suffers from heart disease based on early symptoms that appear. The author uses five data mining algorithms: Naïve Bayes, K-Nearest Neighbor (KNN), Decision Tree, Random Forest, and Deep Learning. The research results show that the Deep Learning model is the best algorithm for predicting heart disease symptoms. Additionally, using the right predictive model can help reduce the risk of delayed diagnosis. Therefore, the predictive model with this algorithm is recommended for implementation in hospitals to help detect heart disease symptoms in patients more accurately and efficiently. This way, early diagnosis can be made to improve patient recovery chances and reduce mortality rates due to heart disease.
Predicting Forest Fires using Five Machine Learning Algorithms Manik, Rian Delober
Data Science Insights Vol. 2 No. 2 (2024): Journal of Data Science Insights
Publisher : PT Visi Media Network

DOI: 10.63017/jdsi.v2i2.114

Abstract

This research aims to develop a prediction model for forest fires using five machine learning algorithms: Decision Tree, K-Nearest Neighbors (KNN), Random Forest, Naïve Bayes (Kernel), and Rule Induction. The data used in this research was taken from [www.kaggle.com]. Data pre-processing techniques such as missing value imputation, data normalization, and feature selection were applied to ensure the quality of the data used in the prediction model. The results show that the algorithms differ in how well they predict the forest fires that occur each month, with some algorithms achieving higher accuracy and precision. Further analysis discusses the advantages and disadvantages of each algorithm as well as the practical implications of implementing them in the environment.
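Two of the pre-processing steps named in this abstract, missing value imputation and data normalization, can be sketched as follows. This is a minimal illustration assuming mean imputation and min-max scaling. The abstract does not say which variants the study used, and the function names and sample values are hypothetical.

```python
def impute_mean(values):
    """Replace missing entries (None) with the mean of the known entries."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]

def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical temperature readings with one missing value.
temperature = [10.0, None, 30.0, 20.0]
filled = impute_mean(temperature)
scaled = min_max_normalize(filled)

print(filled)  # [10.0, 20.0, 30.0, 20.0]
print(scaled)  # [0.0, 0.5, 1.0, 0.5]
```

Scaling matters most for distance-based methods such as KNN, where a feature with a large numeric range would otherwise dominate the distance calculation.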
Digital Data Collection among Low ICT-Literate Rural Communities: A Case Study using Google Forms via Smartphones Wan Ishak, Wan Hussain; Yamin, Fadhilah; Ismail, Risyawati Mohamed; Mustafar, Mastora; Abu Bakar, Siti Zakiah
Data Science Insights Vol. 3 No. 2 (2025): Journal of Data Science Insights
Publisher : PT Visi Media Network

DOI: 10.63017/jdsi.v3i2.121

Abstract

This study investigates the use of Google Forms as a digital tool for daily livestock monitoring among rural, low ICT-literate chicken farmers in Malaysia. A total of 198 responses were collected via smartphones through WhatsApp-distributed forms, allowing participants to self-report poultry conditions while reducing the need for frequent site visits. While the approach proved accessible and cost-effective, analysis revealed significant data quality issues, including inconsistent data entry (e.g., mixed numeric and textual values), unstructured categorical responses, duplicate submissions, ambiguous placeholder values, and the absence of unique identifiers. These challenges limited the reliability and usability of the dataset for meaningful analysis. To address these issues, the study recommends implementing structured input fields, validation rules, unique respondent IDs, and user training materials tailored to low digital literacy. This paper highlights both the potential and pitfalls of digital self-reporting tools in underserved rural contexts and provides practical recommendations for improving data quality in similar monitoring efforts. The findings offer valuable guidance for researchers and practitioners designing data collection systems in constrained environments.
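The data quality issues listed in this abstract (mixed numeric and textual values, placeholder values, duplicate submissions) could be handled after collection with checks along the following lines. This is only a sketch. The field names `farmer_id`, `date`, and `dead_birds` and the sample rows are hypothetical, and the study itself recommends fixing these issues at the form level through structured inputs and validation rules.

```python
import re

def parse_count(raw):
    """Return an integer count, or None when the entry is not clearly numeric."""
    if raw is None:
        return None
    # Accept a number optionally followed by a unit word, e.g. "3" or "2 birds".
    match = re.fullmatch(r"(\d+)\s*[A-Za-z]*", str(raw).strip())
    return int(match.group(1)) if match else None

def deduplicate(rows, key_fields):
    """Keep the first row seen for each combination of key fields."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row.get(field) for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

# Hypothetical responses, for illustration only.
responses = [
    {"farmer_id": "F01", "date": "2025-01-05", "dead_birds": "3"},
    {"farmer_id": "F01", "date": "2025-01-05", "dead_birds": "3"},        # duplicate
    {"farmer_id": "F02", "date": "2025-01-05", "dead_birds": "2 birds"},  # mixed text
    {"farmer_id": "F03", "date": "2025-01-05", "dead_birds": "-"},        # placeholder
]

cleaned = [
    dict(row, dead_birds=parse_count(row["dead_birds"]))
    for row in deduplicate(responses, ["farmer_id", "date"])
]
print([row["dead_birds"] for row in cleaned])  # [3, 2, None]
```

Deduplication here depends on a respondent identifier, which is exactly what the collected dataset lacked. Without one, duplicates can only be guessed from matching content.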
Evaluating and Deploying Predictive Models for Weather Classification Lie, Jolin Arfina
Data Science Insights Vol. 2 No. 2 (2024): Journal of Data Science Insights
Publisher : PT Visi Media Network

DOI: 10.63017/jdsi.v2i2.177

Abstract

Weather is the condition of the atmosphere in a specific location over a relatively short period of time, described through various parameters such as temperature, air pressure, wind speed, humidity, and other atmospheric phenomena. It differs from climate, which refers to the average atmospheric conditions over a large area and a long time period studied under the field of climatology. Weather can vary from hot to cold, wet to dry, and windy to calm. It is influenced by dynamic changes in the Earth’s atmosphere, including warming and cooling processes. In recent years, weather changes have become more frequent and unpredictable, significantly affecting daily human activities. Therefore, an intelligent system capable of detecting and predicting weather conditions is increasingly needed. This study aims to apply classification algorithms to predict weather conditions based on relevant meteorological parameters. The algorithms used include k-Nearest Neighbor, Random Forest, Naïve Bayes, Decision Tree, and Deep Learning. Given the irregularity and complexity of weather patterns, manual prediction becomes unreliable. Although it is impossible to predict the weather with absolute certainty, computational methods can provide reasonably accurate estimations. Based on the evaluation results, the Random Forest algorithm demonstrated the highest accuracy among the tested models. Furthermore, the final model was successfully deployed using Python, enabling real-time predictions on incoming weather data.
Comparative Analysis of Data Visualization Techniques for Rainfall Data Wan Ishak, Wan Hussain; Yamin, Fadhilah; Maidin, Siti Sarah; Husin, Abdullah
Data Science Insights Vol. 3 No. 2 (2025): Journal of Data Science Insights
Publisher : PT Visi Media Network

DOI: 10.63017/jdsi.v3i2.204

Abstract

Rainfall data is essential for applications such as climate monitoring, agricultural planning, flood forecasting, and water resource management. However, the interpretation of this data is often hindered by its high volume, variability, and multi-scale temporal nature. Effective visualization is critical not only for summarizing complex datasets but also for uncovering patterns, detecting anomalies, and facilitating informed decision-making. Despite the availability of numerous visualization techniques, selecting the most suitable method for rainfall data, especially across varying temporal resolutions, is a challenging task. This study presents a comparative analysis of widely used data visualization techniques in the context of rainfall data. The methodology was structured into three phases: understanding the nature of rainfall data, reviewing relevant visualization techniques, and conducting a comparative content analysis. A SWOT (Strengths, Weaknesses, Opportunities, and Threats) evaluation was used to assess each technique’s analytical potential, while a temporal suitability comparison was performed across five time granularities: yearly, monthly, weekly, daily, and hourly. Findings show that no single technique is universally effective. Instead, each method demonstrates specific strengths and limitations depending on the temporal scale and analytical objective. Line charts and bar charts are well-suited for lower-frequency data, while heat maps and scatter plots are more effective for high-resolution, time-sensitive patterns. Box plots and histograms provide valuable insights into data distribution and variability, whereas map-based visualizations excel in spatial analysis but require enhancements for temporal exploration. The study concludes that visualization effectiveness depends on aligning method selection with data characteristics and analytical goals. A thoughtful combination of techniques is often necessary to achieve clarity, reduce misinterpretation, and enhance decision support in rainfall data analysis.
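The time granularities compared in this abstract are usually derived from the same underlying readings by aggregation before plotting. The sketch below shows one way to roll hourly readings up into daily, monthly, or yearly totals. The function name, the choice of summing, and the sample readings are assumptions for illustration and do not come from the study.

```python
from collections import defaultdict
from datetime import datetime

def aggregate_rainfall(readings, granularity):
    """Sum (timestamp, millimetres) readings into daily, monthly, or yearly totals."""
    formats = {"daily": "%Y-%m-%d", "monthly": "%Y-%m", "yearly": "%Y"}
    fmt = formats[granularity]
    totals = defaultdict(float)
    for timestamp, millimetres in readings:
        # The formatted timestamp acts as the bucket key for the chosen granularity.
        totals[timestamp.strftime(fmt)] += millimetres
    return dict(totals)

# Hypothetical hourly readings in millimetres.
hourly = [
    (datetime(2024, 1, 1, 9), 1.5),
    (datetime(2024, 1, 1, 10), 2.5),
    (datetime(2024, 1, 2, 9), 4.0),
    (datetime(2024, 2, 1, 9), 3.0),
]

print(aggregate_rainfall(hourly, "daily"))
# {'2024-01-01': 4.0, '2024-01-02': 4.0, '2024-02-01': 3.0}
print(aggregate_rainfall(hourly, "monthly"))
# {'2024-01': 8.0, '2024-02': 3.0}
```

Coarser buckets give fewer points and suit line or bar charts, while the raw hourly series retains the detail that heat maps and scatter plots are better at showing.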
