Journal of Dinda : Data Science, Information Technology, and Data Analytics
Journal of Dinda : Data Science, Information Technology, and Data Analytics is a publication medium for research results in the fields of, but not limited to, Data Science, Information Technology, and Data Analytics. It is published twice a year, in February and August. The journal is managed by the Data Engineering Research Group, Faculty of Informatics, Telkom Purwokerto Institute of Technology. Journal of Dinda is a medium for scientific studies resulting from research, thinking, and critical-analytic studies regarding Data Science, Informatics, and Information Technology. The journal is expected to foster enthusiasm in education, research, and community service, and to develop into a supporting reference for academics.
FOCUS AND SCOPE
Journal of Dinda : Data Science, Information Technology, and Data Analytics receives scientific articles with the scope of research on: Machine Learning, Deep Learning, Artificial Intelligence, Databases, Statistics, Optimization, Natural Language Processing, Big Data and Cloud Computing, Bioinformatics, Computer Vision, Speech Processing, Information Theory and Models, Data Mining, Mathematical, Probabilistic and Statistical Theories, Machine Learning Theories, Models and Systems, Social Science, Information Technology
Articles
87 Documents
Comparison of C4.5 and Naive Bayes Algorithm Methods in Prediction of Student Graduation on Time (Case Study: Information Systems Study Program)
Disty Dikriani;
Alvina Tahta Indal Karim
Indonesian Journal of Data Science, IoT, Machine Learning and Informatics Vol 3 No 1 (2023): February
Publisher : Research Group of Data Engineering, Faculty of Informatics
DOI: 10.20895/dinda.v3i1.782
In tertiary institutions, students are an important parameter in the evaluation of study programs. Predicting student graduation deserves special attention, and early identification of students is an important step. One way to process information for predicting student graduation is data mining, which is particularly useful when a university, especially a study program, does not yet have an early classification of on-time graduation. The ITTP Information Systems (IS) study program is one such program. The determinants of graduation for ITTP IS students include GPA, TOEFL scores, and total credits. The purpose of this research is to find out which attributes have the most influence in predicting the graduation of ITTP IS students. The prediction uses two classification methods, the C4.5 algorithm and Naïve Bayes. The classification is used to determine which attributes affect the prediction of on-time graduation and to compare the two methods. The results show that a training set size of 70% gives the best accuracy compared with other training set sizes. Comparing the accuracy of the two methods, the C4.5 algorithm has good accuracy when the training set size is 70%, and Naïve Bayes has higher accuracy when the training set size is 75%. The C4.5 decision tree places GPA at the root, indicating that it is the most influential attribute for predicting on-time graduation. The research is expected to serve as a reference for the ITTP IS study program in formulating policies on on-time graduation and for future researchers making predictions in the same field.
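The abstract does not describe how C4.5 ranks attributes, so the following is only a minimal sketch of the gain ratio criterion that C4.5 is generally known to use when choosing a split. The toy records, the attribute names `gpa_band` and `toefl_band`, and the restriction to categorical attributes are assumptions made for illustration; they are not the paper's data or implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """Gain ratio of a categorical attribute: information gain / split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the class labels after splitting on the attribute.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy records (not from the paper).
gpa_band = ["high", "high", "high", "low", "low", "low"]
toefl_band = ["pass", "fail", "pass", "fail", "pass", "fail"]
on_time = ["yes", "yes", "yes", "no", "no", "no"]

ratios = {
    "gpa_band": gain_ratio(gpa_band, on_time),
    "toefl_band": gain_ratio(toefl_band, on_time),
}
# The attribute with the highest gain ratio would become the root of the tree.
best_attribute = max(ratios, key=ratios.get)
```

In this toy data the GPA band separates the two classes perfectly, so it would be chosen as the root, which mirrors the kind of result the abstract reports.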
The Descriptive Analysis of Perceptions of ITTP Data Science Students regarding Face-to-Face Learning Plans
Wahyu Nouval Aghniya;
Lutfhi Rakan Nabila;
Rizky Ananda Putra
Indonesian Journal of Data Science, IoT, Machine Learning and Informatics Vol 3 No 2 (2023): August
Publisher : Research Group of Data Engineering, Faculty of Informatics
DOI: 10.20895/dinda.v3i2.1028
The fluctuating number of Covid-19 cases has forced educational units to consider which learning methods to apply in the future, while also paying attention to students' responses. Students hold varied views, and many are able to think maturely when assessing matters related to their future interests. This research was conducted to find out student perceptions of face-to-face learning plans during the pandemic at IT Telkom Purwokerto. Several variables can influence each student's perception. The population in this study is all undergraduate students of the IT Telkom Purwokerto Faculty of Informatics in 2021, with judgment/expert sampling as the sampling technique. The instrument used is a questionnaire. The data analysis method is descriptive quantitative analysis. The results show that 37 answers (56.1%) strongly agreed on the questions regarding facilities and infrastructure, 45 answers (68.2%) strongly agreed regarding service quality, and 17 answers (25.8%) strongly agreed on the questions regarding student perceptions. Answers that disagreed accounted for less than 15%, and as little as 0%, in each variable. Thus, most students agree with face-to-face learning and attending lectures, as do the parents of each student.
Dominant Requirements for Student Graduation in the Faculty of Informatics using the C4.5 Algorithm
Alvina Tahta Indal Karim;
Sudianto Sudianto
Indonesian Journal of Data Science, IoT, Machine Learning and Informatics Vol 3 No 2 (2023): August
Publisher : Research Group of Data Engineering, Faculty of Informatics
DOI: 10.20895/dinda.v3i2.1040
Graduating on time is one of the indicators in the achievement and ranking of educational institutions. Achieving on-time graduation is essential for an institution to balance incoming and graduating students. The problem is that the attributes for graduating on time carry varying weights, so the determining attributes need to be known in order to anticipate and achieve on-time graduation. The purpose of this study is to find the dominant attributes in predicting on-time graduation for students. The attributes used are credit scores (Semester Credit Units), GPA scores (Grade Point Average), and English scores (TOEFL). The method used is the C4.5 Algorithm, which is one of the classification methods in data mining. The data set consists of 262 records, split randomly into training and testing data with a composition of 80:20. The data is processed using the data mining process by creating decision trees. The decision tree results using the C4.5 Algorithm show that the GPA value is the most influential attribute in predicting a student's graduation time. In addition, predictions based on the decision tree of the C4.5 Algorithm with criterion = 'gini' and max_depth = 5 showed an accuracy of 77%.
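A random 80:20 split of 262 records can be sketched as follows. The function name `train_test_split_indices`, the seed, and the rounding rule are assumptions made for illustration; the abstract does not state how the split was implemented, and a different rounding rule would shift one record between the two sets.

```python
import random


def train_test_split_indices(n_records, test_fraction=0.2, seed=42):
    """Shuffle record indices and split them into training and testing sets."""
    indices = list(range(n_records))
    # A fixed seed (an assumption here) makes the random split reproducible.
    random.Random(seed).shuffle(indices)
    n_test = round(n_records * test_fraction)
    return indices[n_test:], indices[:n_test]


# 262 records, as reported in the abstract.
train_idx, test_idx = train_test_split_indices(262)
```

With this rounding rule the split yields 210 training and 52 testing records.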
Minimalist DCT-based Depthwise Separable Convolutional Neural Network Approach for Tangut Script
Agi Prasetiadi;
Julian Saputra;
Imada Ramadhanti;
Asti Dwi Sripamuji;
Risa Riski Amalia
Indonesian Journal of Data Science, IoT, Machine Learning and Informatics Vol 3 No 2 (2023): August
Publisher : Research Group of Data Engineering, Faculty of Informatics
DOI: 10.20895/dinda.v3i2.1106
The Tangut script, a lesser-explored dead script comprising numerous characters, has received limited attention in deep learning research, particularly in the field of optical character recognition (OCR). Existing OCR studies primarily focus on widely-used characters like Chinese characters and employ deep convolutional neural networks (CNNs) or combinations with recurrent neural networks (RNNs) to enhance accuracy in character recognition. In contrast, this study takes a counterintuitive approach to develop an OCR model specifically for the Tangut script. We utilize shorter layers with slimmer filters using a depthwise separable convolutional neural network (DSCNN) architecture. Furthermore, we preprocess the dataset using a frequency-based transformation, namely the Discrete Cosine Transform (DCT). The results demonstrate successful training of the model, showcasing faster convergence and higher accuracy compared to traditional deep neural networks commonly used in OCR applications.
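The two ingredients named in the abstract can be illustrated with a short sketch. It shows an unnormalized one-dimensional DCT-II (for images the transform is typically applied along rows and then columns) and compares the weight count of a standard convolution with that of a depthwise separable one. The kernel size and channel counts are hypothetical and are not taken from the paper.

```python
import math


def dct2_1d(signal):
    """Unnormalized DCT-II of a 1-D sequence."""
    n = len(signal)
    return [
        sum(x * math.cos(math.pi / n * (i + 0.5) * k) for i, x in enumerate(signal))
        for k in range(n)
    ]


def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard 2-D convolution (bias terms ignored)."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel, c_in, c_out):
    """Weights in a depthwise convolution followed by a 1x1 pointwise convolution."""
    return kernel * kernel * c_in + c_in * c_out


# A constant signal has all its energy in the first (zero-frequency) coefficient.
flat = dct2_1d([1.0, 1.0, 1.0, 1.0])

# Hypothetical layer: 3x3 kernel, 32 input channels, 64 output channels.
standard = standard_conv_params(3, 32, 64)
separable = separable_conv_params(3, 32, 64)
```

For this hypothetical layer the standard convolution needs 18432 weights and the separable one 2336, which illustrates why such an architecture can be called minimalist.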
Classification of Drug Types using Decision Tree Algorithm
Alissiyah Putri;
Dani Azka Faz;
Felis Tigris Hafizhulloh
Indonesian Journal of Data Science, IoT, Machine Learning and Informatics Vol 3 No 2 (2023): August
Publisher : Research Group of Data Engineering, Faculty of Informatics
DOI: 10.20895/dinda.v3i2.1203
The accurate classification of drugs plays a crucial role in various areas of pharmaceutical research and development. In recent years, machine learning techniques have emerged as powerful tools for drug classification tasks. This paper presents a study on drug classification using machine learning techniques implemented in Python. The objective of this research is to explore the effectiveness of different machine learning algorithms in accurately classifying drugs based on their molecular properties and characteristics. The dataset used in this study consists of a diverse collection of drug compounds with annotated class labels. Several popular machine learning algorithms, including decision trees, are implemented and evaluated using Python libraries such as scikit-learn. The dataset is pre-processed to handle missing values, normalize features, and reduce dimensionality using appropriate techniques. Experimental results demonstrate the performance of each algorithm in terms of accuracy, precision, recall, and F1-score. The findings of this study highlight the potential of machine learning techniques in accurately classifying drugs and provide valuable insights into the selection and optimization of algorithms for drug classification tasks. The Python implementation serves as a practical guide for researchers and practitioners interested in applying machine learning for drug classification purposes.
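The four evaluation metrics named in the abstract can be computed from true and predicted labels as in the sketch below. The class names `class_a` and `class_b` and the label lists are hypothetical; the abstract reports neither the actual classes nor the scores.

```python
def classification_metrics(y_true, y_pred, positive):
    """Accuracy plus precision, recall and F1-score for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels (not from the paper).
y_true = ["class_a", "class_a", "class_a", "class_b", "class_b", "class_b"]
y_pred = ["class_a", "class_a", "class_b", "class_a", "class_b", "class_b"]

metrics = classification_metrics(y_true, y_pred, positive="class_a")
```

For a multi-class problem the same function can be called once per class and the per-class values averaged.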
Classification of Sleep Disorders Using Random Forest on Sleep Health and Lifestyle Dataset
Idfian Azhar Hidayat
Indonesian Journal of Data Science, IoT, Machine Learning and Informatics Vol 3 No 2 (2023): August
Publisher : Research Group of Data Engineering, Faculty of Informatics
DOI: 10.20895/dinda.v3i2.1215
This study aims to classify sleep disorders using the Random Forest method on the Sleep Health and Lifestyle dataset. This dataset contains information about sleep, lifestyle, and relevant health factors. In this study, the dataset was processed and divided into training and testing subsets. The Random Forest model was trained using the training subset with sleep and health-related features. The split quality in each decision tree was measured using the Gini Index. The model was evaluated using the testing subset to measure its accuracy and classification performance. The evaluation results showed that the Random Forest model could accurately predict sleep disorders. Analysis of class distributions, correlation relationships between features, and visualization by gender provided insights into the factors that influence sleep disorders. This research can potentially contribute to the field of health and medicine, especially in the recognition and diagnosis of sleep disorders.
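The Gini Index used to score a candidate split can be sketched as follows. The label values and the example split are hypothetical and only illustrate the calculation; they are not taken from the dataset or the paper.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def split_gini(left, right):
    """Size-weighted Gini impurity of the two child nodes produced by a split."""
    total = len(left) + len(right)
    return len(left) / total * gini_impurity(left) + len(right) / total * gini_impurity(right)


# Hypothetical node with two equally frequent classes.
parent = ["none"] * 4 + ["insomnia"] * 4
left = ["none", "none", "none", "insomnia"]
right = ["none", "insomnia", "insomnia", "insomnia"]

parent_gini = gini_impurity(parent)
child_gini = split_gini(left, right)
# A split is useful when it lowers the weighted impurity.
impurity_decrease = parent_gini - child_gini
```

Each tree in the forest would pick, at every node, the split with the largest impurity decrease among the features it considers.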
Design Of A Decision Support System For The Graduation Of New Student Candidates Based On MVC
Fivy Nur Safitri;
Daniel Yeri Kristiyanto
Indonesian Journal of Data Science, IoT, Machine Learning and Informatics Vol 4 No 1 (2024): February
Publisher : Research Group of Data Engineering, Faculty of Informatics
DOI: 10.20895/dinda.v4i1.1341
The selection process for new students in the field of education is crucial and warrants careful consideration. IT Telkom Purwokerto has a dedicated division, the Admission Unit, responsible for the selection of new students. However, this process often encounters errors, such as miscalculations in the average scores of three subjects, discrepancies between new student data and graduation guideline data, and prolonged simulation processes for graduation. This study proposes a solution to these issues through the implementation of an MVC-based Decision Support System (DSS) for determining the eligibility of new student admissions. The Prototype methodology was chosen to develop the system. The criteria used in this research to determine new student admissions involve various factors, including the chosen high school major, interest in the offered majors, average mathematics scores, and the average scores of three main subjects: mathematics, Bahasa Indonesia, and English. The outcome of this research is an MVC-based decision support system that determines the admission status of new students. It is anticipated that this MVC-based decision support system will not only aid relevant personnel in the admission decision process but also mitigate potential issues that may arise. The research contributes to the efficiency and accuracy of the new student selection process at IT Telkom Purwokerto.
Random Forest Machine Learning for Spam Email Classification
Rizky Ageng;
Rafdhani Faisal;
Solahuddin Ihsan
Indonesian Journal of Data Science, IoT, Machine Learning and Informatics Vol 4 No 1 (2024): February
Publisher : Research Group of Data Engineering, Faculty of Informatics
DOI: 10.20895/dinda.v4i1.1363
This research discusses the crucial role of email as a main element in digital communication, facilitating information transfer and serving as an advertising platform. However, the problem of email spam, which involves sending unsolicited commercial messages, has had negative impacts such as consuming large amounts of resources and disrupting user experience. With its affordable cost and ease of sending messages to thousands of recipients, email spam includes product promotions, pornographic material, viruses and irrelevant content. The impact includes loss of time and damage to the user's computer resources. To address this problem, email services provide advanced spam filters that use email content analysis and machine learning techniques. This research focuses on the use of the Random Forest Classification algorithm as a basis for filtering spam emails. Although Random Forest is known to have strong classification capabilities, the risk of overfitting is a challenge. Therefore, this study adopts the Randomized Search CV method to identify the best parameter combination, ensuring the reliability of the model in dealing with the complexity of diverse email datasets. With this approach, this research contributes to the development of effective solutions to reduce the impact of email spam in digital communications.
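The idea behind a randomized hyperparameter search can be sketched in plain Python. In scikit-learn this is provided by `RandomizedSearchCV`, which scores each sampled combination by cross-validation. Here the search space, the function `toy_score`, and the seed are hypothetical stand-ins, since the abstract does not list the parameters that were searched.

```python
import random


def randomized_search(param_space, score_fn, n_iter=10, seed=0):
    """Sample n_iter random parameter combinations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        candidate = {name: rng.choice(options) for name, options in param_space.items()}
        score = score_fn(candidate)
        if score > best_score:
            best_params, best_score = candidate, score
    return best_params, best_score


# Hypothetical search space; the names mirror common Random Forest hyperparameters.
param_space = {
    "n_estimators": [50, 100, 200],
    "max_depth": [5, 10, 20],
    "min_samples_leaf": [1, 2, 4],
}


def toy_score(params):
    """Stand-in for mean cross-validated accuracy (not a real model evaluation)."""
    return 1.0 - abs(params["max_depth"] - 10) / 100 - abs(params["min_samples_leaf"] - 2) / 100


best_params, best_score = randomized_search(param_space, toy_score, n_iter=20, seed=0)
```

Limiting tree depth and requiring a minimum number of samples per leaf are common ways to reduce overfitting, which is why such parameters are typical candidates for the search.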
Prediction of Obesity Classification Using K-Means Clustering
Aditya Wildan;
Helmy Akmal Burhansyah;
Choki Ferdiansyah
Indonesian Journal of Data Science, IoT, Machine Learning and Informatics Vol 4 No 1 (2024): February
Publisher : Research Group of Data Engineering, Faculty of Informatics
DOI: 10.20895/dinda.v4i1.1366
This paper aims to determine the difference between people who are obese and those who are not, and to classify the level of obesity by grouping individuals with the K-Means clustering algorithm. The work is part of obesity prevention efforts, in the hope that a deeper understanding of the distribution of obesity within specific categories can help design more specific and effective interventions and encourage more precise and targeted preventive measures. The study uses datasets from Kaggle to distinguish underweight from overweight people. The data was processed using data mining techniques with the K-Means method, which produced four clusters. Cluster 0 contains only women, aged 45 to 60 years, who are relatively thin to normal weight. Cluster 1 contains only men, with reported age ranges of over 40 years and 55 to 60 years; people in this cluster are overweight or obese. Cluster 2 consists mostly of women aged 15-70 years, with women aged 55-60 years as the highest proportion; in general, they have a normal weight. Cluster 3 contains many underweight individuals aged 10-45 years, with the highest proportion at the age of 20-25 years. The results show that men have a higher likelihood of suffering from obesity than women. Therefore, obesity prevention is needed, for example by adopting a healthy lifestyle.
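The K-Means procedure named in the abstract can be sketched with Lloyd's algorithm in plain Python. The (age, BMI) points, the two initial centroids, and the use of two clusters instead of the study's four are assumptions made to keep the example small; they are not the Kaggle data.

```python
import math


def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centroids = list(initial_centroids)
    assignments = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_assignments = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignments, centroids


# Hypothetical (age, BMI) records.
points = [(22, 18.0), (24, 19.0), (50, 31.0), (55, 33.0)]
assignments, centroids = kmeans(points, [points[0], points[2]])
```

Because the features are on different scales, standardizing them before clustering is a common choice; whether the study did so is not stated in the abstract.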
K-Means Clustering Algorithm: A Study on Unemployment Rates in Districts/Cities in Three Highest Provinces
Mohammad Dian Purnama;
Mutia Eva Mustafidah
Indonesian Journal of Data Science, IoT, Machine Learning and Informatics Vol 4 No 1 (2024): February
Publisher : Research Group of Data Engineering, Faculty of Informatics
DOI: 10.20895/dinda.v4i1.1419
Unemployment is a recurring issue every year, particularly in provinces with high unemployment rates, posing economic and social challenges. West Java, Riau Islands, and Banten are identified as the three provinces with the highest unemployment rates, exceeding 8% in the year 2022. Hence, this study aims to delve into the unemployment scenario in these provinces, considering various influencing factors drawn from relevant previous research. The primary objective of this research is to obtain the classification results of regencies/cities in West Java, Riau Islands, and Banten based on unemployment indicators. The findings reveal four clusters: Cluster 1 comprises 13 regencies/cities with the lowest unemployment rates, Cluster 2 includes 4 regencies/cities with low unemployment rates, Cluster 3 consists of 13 regencies/cities with moderate unemployment rates, and Cluster 4 encompasses 12 regencies/cities with high unemployment rates.