Contact Name: Abdullah
Contact Email: abdialam@gmail.com
Phone: +628127580419
Journal Mail Official: data.science.ins@gmail.com
Editorial Address: Jl. Soebrantas Gg. Jelutung Indah no 49 Tembilahan Indragiri Hilir Riau
Location: Kab. Indragiri Hilir, Riau, INDONESIA
Data Science Insights
Published by PT Visi Media Network
ISSN: - | EISSN: 3031-1268 | DOI: https://doi.org/10.63017/jdsi.v3i2
Data Science Insights (ISSN 3031-1268, online) is a journal published by PT Visi Media Network. It publishes research articles within the following focus and scope: Data Science and Machine Learning; Data Science and AI; Blockchain and Advanced Data Science; Cloud Computing and Big Data; Business Intelligence and Big Data; Statistical Foundation for Data Science; Probability and Statistics for Data Science; Statistical Inference via Data Science; Big Data and Business Analytics; Statistical Thinking in Business; Data Driven Statistical Methods; Statistical Methods for Spatial Data Analytics; Statistical Techniques for Data Analysis; Data Science in Communication; Information and Communication Technology; Graph Data Management for Social Network Applications; Metadata for Information Management; Information/Data: Organization and Access; Information Science and Electronic Engineering; Big Data and Social Science; Data Communication and Computer Network; ICT & Data Analytics. The journal is published twice a year.
Articles 6 Documents
Assessing the Efficiency and Accuracy of K-Means Clustering Compared to Other Clustering Techniques
Khan, Iliyas; Daud, Hanita Binti Daud; Zainuddin, Nooraini binti Zainuddin; Sokkalingam, Rajalingam Sokkalingam; Azad, Abdus Samad Azad; Samad, Abdussamad; Suleiman, Ahmad Abubakar Suleiman
Data Science Insights Vol. 3 No. 2 (2025): Journal of Data Science Insights
Publisher : PT Visi Media Network

DOI: 10.63017/jdsi.v3i2.23

Abstract

Clustering is an important method in data analysis, but it faces challenges because datasets differ in nature, which makes some algorithms less effective and slow. Choosing the most effective clustering method for a dataset requires evaluating both accuracy and computational speed, which poses a significant challenge for today's researchers. To address these issues, the current study compares different clustering methods on several datasets, including iris, seed, and well log, to evaluate their accuracy and execution speed. Results show that K-means performs better with large datasets: as sample size increases, the accuracy of the K-means algorithm tends to improve. The execution time of K-means is influenced by the number of features in the dataset, with datasets having more features typically requiring more time to process. The mean shift and spectral clustering algorithms perform well on small datasets, but they take a long time to run.
Cluster Analysis of Superstore Data using K-Means and K-Medoids for Product Delivery Insights
Sarumaha, Intan chintia; Foureshtree, Ajeng Cahyani; Jocelyn, Angela; Santoso, Jeffri; Hutabarat, Fernando
Data Science Insights Vol. 3 No. 2 (2025): Journal of Data Science Insights
Publisher : PT Visi Media Network

DOI: 10.63017/jdsi.v3i2.34

Abstract

Understanding the relationship between consumer patterns and overall market trends, and improving a company's operational efficiency by optimizing the delivery process, is a difficult challenge. Using sales data from Super Store available on the Kaggle website, this study aims to identify predictable consumer patterns through cluster analysis and to explore how delivery efficiency can be improved based on a better understanding of consumer needs and preferences. The research uses the K-Means and K-Medoids clustering methods to group product subcategories into three categories: best-selling, in-selling, and not-selling. Data transformation, exploratory analysis, model building, and cluster performance evaluation were conducted with analytical tools such as Microsoft Excel, Tableau, and RapidMiner. The results show that the K-Medoids algorithm provides better clustering performance than K-Means, with a Davies-Bouldin Index value of -0.867 for K-Medoids and -0.519 for K-Means. This indicates that K-Medoids is more suitable for describing the characteristics of the data. The most in-demand cluster consists of the machines and copiers product sub-categories.
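As an illustration of the evaluation metric named in this abstract, the following is a minimal sketch of the standard Davies-Bouldin Index in plain Python. It is not the authors' RapidMiner workflow; the function names and the toy clusters are hypothetical. In the standard definition the index is non-negative and lower values indicate more compact, better separated clusters. Some tools report the value with a flipped sign, so the sign convention of any reported figure should be checked against the tool that produced it.

```python
import math


def centroid(points):
    """Arithmetic mean of a list of equal-length numeric points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def davies_bouldin(clusters):
    """Standard Davies-Bouldin Index for a list of clusters (lists of points)."""
    centers = [centroid(c) for c in clusters]
    # Scatter of a cluster: mean distance from its points to its centroid.
    scatter = [
        sum(math.dist(p, ctr) for p in c) / len(c)
        for c, ctr in zip(clusters, centers)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
    return total / k


# Hypothetical toy data: same cluster shapes, different separation.
tight = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
loose = [[(0.0, 0.0), (0.0, 2.0)], [(3.0, 0.0), (3.0, 2.0)]]

print(davies_bouldin(tight))
print(davies_bouldin(loose))
```

The well separated pair of clusters yields the smaller index, which is the behaviour the standard definition is designed to capture.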
Comprehensive Approach to Weather Prediction with the Random Forest Algorithm
Pedro Joyarieb; Silalahi, Vian Candra; Anggelica, Vallencia; Ongso, Khatrina Kelly
Data Science Insights Vol. 3 No. 2 (2025): Journal of Data Science Insights
Publisher : PT Visi Media Network

DOI: 10.63017/jdsi.v3i2.35

Abstract

Weather, the state of the atmosphere, is very important in everyday life. Accurate weather predictions can help people anticipate and deal with weather changes that affect daily activities. This research aims to develop an effective weather prediction model using machine learning algorithms. We use three popular machine learning algorithms: Random Forest, Support Vector Machine (SVM), and Decision Tree. The data consists of historical weather data, including air temperature, air humidity, rainfall, wind direction, air pressure, wind speed, and solar radiation. The results show that the Random Forest algorithm has the highest accuracy, with a prediction rate of 83%. The SVM algorithm is next, with a prediction rate of 78%, while the Decision Tree algorithm has a prediction rate of 72%. These findings show that Random Forest is the most effective of the three algorithms for predicting weather, especially air temperature and rainfall. The research has practical implications for increasing the accuracy of weather predictions, which can help society anticipate and deal with weather changes that affect daily activities. In the future, this research can be used as a basis for developing more accurate and reliable weather prediction systems.
Predicting Student Performance using Linear Regression
Kennedy Kassy, Max
Data Science Insights Vol. 3 No. 2 (2025): Journal of Data Science Insights
Publisher : PT Visi Media Network

DOI: 10.63017/jdsi.v3i2.104

Abstract

This study explores how to measure and predict student performance using various machine learning algorithms to determine the model that produces the best predictions. The data was obtained from the Kaggle data science and machine learning community website as a dataset with 6 attributes: (1) Hours Studied, (2) Previous Scores, (3) Extracurricular Activities, (4) Sleep Hours, (5) Sample Question Papers Practiced, and (6) Performance Index. The data was cleaned and explored using Microsoft Excel, Google Colab, and Tableau. Models were developed using RapidMiner and Google Colab. The algorithms used in the study were k-NN, SVM, Linear Regression, Generalized Linear Model, and Deep Learning. The Root Mean Squared Error (RMSE) values obtained were 2.455 (k-NN), 2.072 (SVM), 2.013 (Linear Regression), 2.030 (Generalized Linear Model), and 2.364 (Deep Learning). The RMSE values show that Linear Regression gives the best results. When retested, Linear Regression obtained an RMSE of 2.015 and an R-squared (R²) of 0.989, meaning the model explains about 98.9% of the variance in the Performance Index.
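For readers unfamiliar with the two metrics reported in this abstract, the sketch below computes RMSE and R² in plain Python. The function names and the sample values are hypothetical and unrelated to the study's dataset; the study itself used RapidMiner and Google Colab.

```python
import math


def rmse(actual, predicted):
    """Root Mean Squared Error: square root of the mean squared residual."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(actual))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical performance-index values and model predictions.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 32.0, 38.0]

print(rmse(actual, predicted))
print(r_squared(actual, predicted))
```

RMSE is expressed in the units of the target variable, while R² is the share of the target's variance that the model accounts for, which is why the two figures are usually reported together.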
Digital Data Collection among Low ICT-Literate Rural Communities: A Case Study using Google Forms via Smartphones
Wan Ishak, Wan Hussain; Yamin, Fadhilah; Ismail, Risyawati Mohamed; Mustafar, Mastora; Abu Bakar, Siti Zakiah
Data Science Insights Vol. 3 No. 2 (2025): Journal of Data Science Insights
Publisher : PT Visi Media Network

DOI: 10.63017/jdsi.v3i2.121

Abstract

This study investigates the use of Google Forms as a digital tool for daily livestock monitoring among rural, low ICT-literate chicken farmers in Malaysia. A total of 198 responses were collected via smartphones through WhatsApp-distributed forms, allowing participants to self-report poultry conditions while reducing the need for frequent site visits. While the approach proved accessible and cost-effective, analysis revealed significant data quality issues, including inconsistent data entry (e.g., mixed numeric and textual values), unstructured categorical responses, duplicate submissions, ambiguous placeholder values, and the absence of unique identifiers. These challenges limited the reliability and usability of the dataset for meaningful analysis. To address these issues, the study recommends implementing structured input fields, validation rules, unique respondent IDs, and user training materials tailored to low digital literacy. This paper highlights both the potential and pitfalls of digital self-reporting tools in underserved rural contexts and provides practical recommendations for improving data quality in similar monitoring efforts. The findings offer valuable guidance for researchers and practitioners designing data collection systems in constrained environments.
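The kind of checks recommended in this abstract (validation rules, unique respondent IDs, duplicate detection) can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' procedure: the field names `farmer_id`, `date`, and `dead_chickens` are hypothetical, as are the sample rows.

```python
def validate_responses(rows):
    """Split form rows into (clean, problems). Field names are hypothetical."""
    clean, problems, seen = [], [], set()
    for i, row in enumerate(rows):
        farmer_id = str(row.get("farmer_id", "")).strip()
        raw_count = str(row.get("dead_chickens", "")).strip()
        key = (farmer_id, row.get("date"))
        if not farmer_id:
            problems.append((i, "missing respondent ID"))
        elif not raw_count.isdecimal():
            # Catches textual values and placeholders such as "-".
            problems.append((i, "non-numeric count: " + repr(raw_count)))
        elif key in seen:
            problems.append((i, "duplicate submission"))
        else:
            seen.add(key)
            clean.append({
                "farmer_id": farmer_id,
                "date": row.get("date"),
                "dead_chickens": int(raw_count),
            })
    return clean, problems


# Hypothetical sample responses showing each problem type.
rows = [
    {"farmer_id": "F01", "date": "2025-01-01", "dead_chickens": "3"},
    {"farmer_id": "F01", "date": "2025-01-01", "dead_chickens": "3"},
    {"farmer_id": "F02", "date": "2025-01-01", "dead_chickens": "two"},
    {"farmer_id": "", "date": "2025-01-01", "dead_chickens": "1"},
    {"farmer_id": "F03", "date": "2025-01-01", "dead_chickens": "-"},
]

clean, problems = validate_responses(rows)
for index, reason in problems:
    print(index, reason)
```

Checks like these are easiest to enforce at entry time in the form itself; a post-hoc script such as this one only flags rows that already slipped through.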
Comparative Analysis of Data Visualization Techniques for Rainfall Data
Wan Ishak, Wan Hussain; Yamin, Fadhilah; Maidin, Siti Sarah; Husin, Abdullah
Data Science Insights Vol. 3 No. 2 (2025): Journal of Data Science Insights
Publisher : PT Visi Media Network

DOI: 10.63017/jdsi.v3i2.204

Abstract

Rainfall data is essential for applications such as climate monitoring, agricultural planning, flood forecasting, and water resource management. However, the interpretation of this data is often hindered by its high volume, variability, and multi-scale temporal nature. Effective visualization is critical not only for summarizing complex datasets but also for uncovering patterns, detecting anomalies, and facilitating informed decision-making. Despite the availability of numerous visualization techniques, selecting the most suitable method for rainfall data, especially across varying temporal resolutions, is a challenging task. This study presents a comparative analysis of widely used data visualization techniques in the context of rainfall data. The methodology was structured into three phases: understanding the nature of rainfall data, reviewing relevant visualization techniques, and conducting a comparative content analysis. A SWOT (Strengths, Weaknesses, Opportunities, and Threats) evaluation was used to assess each technique’s analytical potential, while a temporal suitability comparison was performed across five time granularities: yearly, monthly, weekly, daily, and hourly. Findings show that no single technique is universally effective. Instead, each method demonstrates specific strengths and limitations depending on the temporal scale and analytical objective. Line charts and bar charts are well-suited for lower-frequency data, while heat maps and scatter plots are more effective for high-resolution, time-sensitive patterns. Box plots and histograms provide valuable insights into data distribution and variability, whereas map-based visualizations excel in spatial analysis but require enhancements for temporal exploration. The study concludes that visualization effectiveness depends on aligning method selection with data characteristics and analytical goals. A thoughtful combination of techniques is often necessary to achieve clarity, reduce misinterpretation, and enhance decision support in rainfall data analysis.
