Contact Name
Aji Prasetya Wibawa
Contact Email
keds.journal@um.ac.id
Phone
+62818539333
Journal Mail Official
keds.journal@um.ac.id
Editorial Address
Semarang St. No 5, Malang, Indonesia
Location
Kota Malang,
Jawa Timur
INDONESIA
Knowledge Engineering and Data Science
ISSN: -     EISSN: 2597-4637     DOI: https://doi.org/10.17977
Knowledge Engineering and Data Science (2597-4637), KEDS, brings together researchers, industry practitioners, and potential users, to promote collaborations, exchange ideas and practices, discuss new opportunities, and investigate analytics frameworks on data-driven and knowledge base systems.
Articles 98 Documents
Evidence of Students’ Academic Performance at the Federal College of Education Asaba Nigeria: Mining Education Data
Ojugoa, Arnold Adimabua; Odiakaose, Christopher Chukwufunaya; Emordi, Frances; Ako, Rita Erhovwo; Adigwe, Winifred; Anazia, Kizito Eluemonor; Geteloma, Victor
Knowledge Engineering and Data Science Vol 6, No 2 (2023)
Publisher : Universitas Negeri Malang

DOI: 10.17977/um018v6i22023p145-156

Abstract

One main objective of higher education is to provide quality education to its students. One way to achieve the highest level of quality in the higher education system is to discover knowledge that supports prediction of student enrolment in a particular course, alienation from the traditional classroom teaching model, detection of unfair means used in online examinations, detection of abnormal values in students’ result sheets, and prediction of students’ performance. This knowledge is hidden in educational data sets and can be extracted through data mining techniques. The present paper aims to justify the capabilities of data mining techniques in the context of higher education by offering a data mining model for the higher education system in the university. In this research, the classification task is used to evaluate students’ performance. Of the many approaches available for data classification, the decision tree method is used here. In this way, we extract knowledge that describes students’ summative performance at semester’s end, helps to identify dropouts and students who need special attention, and allows teachers to provide appropriate advising and counseling.
Deep Learning for Multi-Structured Javanese Gamelan Note Generator
Arik Kurniawati; Eko Mulyanto Yuniarno; Yoyon Kusnendar Suprapto
Knowledge Engineering and Data Science Vol 6, No 1 (2023)
Publisher : Universitas Negeri Malang

DOI: 10.17977/um018v6i12023p41-56

Abstract

Javanese gamelan, a traditional Indonesian musical style, has several song structures called gendhing. Gendhing (songs) are written in conventional notation and require gamelan musicians to recognize patterns in the structure of each song. Previous research on gendhing has usually focused on artistic and ethnomusicological perspectives, whereas this study explores the correlation between gendhing as traditional Indonesian music and deep learning technology that takes over the task of gamelan composers. This research proposes CNN-LSTM to generate notation for ricikan struktural instruments as an accompaniment to Javanese gamelan music compositions, based on balungan notation, rhythm, song structure, and gatra information. The proposed method (CNN-LSTM) is compared with LSTM and CNN. The musical data in this study is represented using numerical notation for the main melody in balungan notation. The experimental results showed that the CNN-LSTM model performed better than the LSTM and CNN models, with accuracy values of 91.9%, 91.5%, and 91.2% for CNN-LSTM, LSTM, and CNN, respectively. The note distance for the Sampak song structure is 4 for the CNN-LSTM model, 8 for the LSTM model, and 12 for the CNN model. The smaller the note distance, the closer the output is to the original notation provided by the gamelan composer. This study is relevant for novice gamelan musicians who are interested in learning karawitan, especially in understanding ricikan struktural music notation and the gamelan art of composing a song.
Optimizing Random Forest Algorithm to Classify Player's Memorisation via In-game Data
Akmal Vrisna Alzuhdi; Harits Ar Rosyid; Mohammad Yasser Chuttur; Shah Nazir
Knowledge Engineering and Data Science Vol 6, No 1 (2023)
Publisher : Universitas Negeri Malang

DOI: 10.17977/um018v6i12023p103-113

Abstract

Assessment of a player's knowledge in game education has been around for some time. Traditional evaluation in and around a gaming session may disrupt the players' immersion. This research uses an optimized Random Forest to construct a non-invasive prediction of a game education player's Memorization via in-game data. First, we obtained the dataset from a 3-month survey that recorded in-game data of 50 players who played 4-15 game stages of Chem Fight (a test case game). Next, we generated three variants of the dataset via preprocessing: a resampling method (SMOTE), normalization (min-max), and a combination of resampling and normalization. Then, we trained and optimized three Random Forest (RF) classifiers to predict the player's Memorization. We chose RF because it can generalize well given the high-dimensional dataset, and we optimized it through its hyperparameter n_estimators. We implemented a Grid Search Cross Validation (GSCV) method to identify the best value of n_estimators. We then used the statistics of the GSCV results to reduce n_estimators by observing the region of interest shown by the graphs of the classifiers' performance. Overall, the classifiers fitted using the BEST n_estimators (i.e., 89, 31, 89, and 196 trees) from GSCV performed well, with around 80% accuracy. Moreover, we successfully identified a smaller number of n_estimators (OPTIMAL), which at least halved the BEST n_estimators. All classifiers were retrained using the OPTIMAL n_estimators (37, 12, 37, and 41 trees). We found that the performance of the classifiers remained relatively steady at ~80%. This means that we successfully optimized the Random Forest for predicting a player's Memorization when playing the Chem Fight game. The automated technique presented in this paper can monitor student interactions and evaluate their abilities based on in-game data. As such, it can offer objective data about the skills used.
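The step from BEST to OPTIMAL n_estimators described above can be sketched as follows. This is only an illustration under stated assumptions: the abstract says the smaller value was found by inspecting performance graphs, so the tolerance rule `tol`, the function name, and the score table are all hypothetical, and the scores are placeholders rather than results from the paper.

```python
def pick_best_and_optimal(cv_scores, tol=0.01):
    """cv_scores maps n_estimators -> mean cross-validated accuracy.

    BEST is the tree count with the highest score (ties go to fewer trees).
    OPTIMAL is the smallest tree count whose score is within `tol` of the
    best score. The tolerance rule is an assumption for illustration.
    """
    best_n = max(cv_scores, key=lambda n: (cv_scores[n], -n))
    best_score = cv_scores[best_n]
    optimal_n = min(n for n, s in cv_scores.items() if s >= best_score - tol)
    return best_n, optimal_n


# Placeholder scores for illustration only; not results from the paper.
scores = {10: 0.76, 20: 0.795, 40: 0.80, 80: 0.802, 160: 0.801}
best, optimal = pick_best_and_optimal(scores)
print(best, optimal)
```

In practice the score table would come from a grid search over n_estimators; the rule above only shows how a much smaller forest can be chosen when accuracy is nearly flat.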
Maximum Marginal Relevance and Vector Space Model for Summarizing Students' Final Project Abstracts
Gunawan Gunawan; Fitria Fitria; Esther Irawati Setiawan; Kimiya Fujisawa
Knowledge Engineering and Data Science Vol 6, No 1 (2023)
Publisher : Universitas Negeri Malang

DOI: 10.17977/um018v6i12023p57-68

Abstract

Automatic summarization uses a computer program to shorten a text document into a summary that retains the essential parts of the original. It is necessary for dealing with information overload as the amount of data keeps increasing. A summary presents the main contents of an article in a concise form and tells the reader the essence of its central idea. The simple concept of a summary is to take the essential parts of the article and present them back in condensed form. The steps in this research start with the user selecting or searching for the text documents to be summarized, with the keywords in the abstract serving as a query. The proposed approach performs text preprocessing on the documents: sentence breaking, case folding, word tokenizing, filtering, and stemming. The preprocessed text is weighted by term frequency-inverse document frequency (tf-idf), then scored for query relevance using the vector space model and for sentence similarity using cosine similarity. The next stage is maximum marginal relevance for sentence extraction. The proposed approach provides comprehensive summarization compared with another approach. The test results are compared with manual summaries, which produce an average precision of 88%, recall of 61%, and f-measure of 70%.
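The tf-idf, cosine similarity, and maximum marginal relevance stages named above can be sketched roughly as below. This is a minimal sketch, not the authors' implementation: it skips filtering and stemming, uses whitespace tokenizing, and the smoothed idf formula, the function names, and the trade-off parameter `lam` are all assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (a dict) per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed idf (an assumption) keeps common terms from vanishing.
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1)
                        for t, c in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(sentences, query, k=2, lam=0.7):
    """Pick k sentence indices, trading query relevance against redundancy."""
    tokens = [s.lower().split() for s in sentences]
    vectors = tfidf_vectors(tokens + [query.lower().split()])
    sent_vecs, query_vec = vectors[:-1], vectors[-1]
    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(sent_vecs[i], query_vec)
            redundancy = max((cosine(sent_vecs[i], sent_vecs[j])
                              for j in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        pick = max(candidates, key=score)
        selected.append(pick)
        candidates.remove(pick)
    return selected


# Made-up sentences for illustration only.
sentences = [
    "data mining predicts student performance",
    "data mining predicts student performance well",
    "gamelan music uses balungan notation",
]
chosen = mmr_select(sentences, "student performance data mining", k=2, lam=0.3)
print(chosen)
```

With a low `lam`, the near-duplicate second sentence is penalized for redundancy, so the selection moves on to a sentence that adds new content; a higher `lam` favors query relevance instead.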
Multivariate Analysis Approach to Factor-Affected Tuberculosis Disease
Gultom, Zuli Agustina; Siregar, Farid Akbar; Tanjung, Mahardika Abdi Prawira; Hazidar, Al-Hamidy
Knowledge Engineering and Data Science Vol 6, No 2 (2023)
Publisher : Universitas Negeri Malang

DOI: 10.17977/um018v6i22023p114-128

Abstract

Tuberculosis is a disease caused by infection with the Mycobacterium tuberculosis complex. Tuberculosis can attack organs besides the lungs, such as the pleura, lining of the brain, lining of the heart, lymph glands, bones, joints, skin, intestines, kidneys, urinary tract, and genitals. The disease is found in densely populated settlements with poor sanitation, a lack of ventilation and sunlight, and a lack of rest. The factors analyzed in this research are population density (X1), number of HIV/AIDS cases (X2), number of toddlers who experience nutrition (X3), number of toddlers who receive BCG immunization (X4), number of toddlers who get exclusive breastfeeding (X5), total families with PHBS (X6), number of residents with healthy homes (X7), number of families with clean water facilities (X8), number of families with ownership of latrine sanitation (X9), number of families with landfills (X10), number of families with a waste management site (X11), number of elementary education facilities (X12), number of junior school education facilities (X13), number of senior school education facilities (X14), number of institutions fostered by neighborhood health (X15), number of Posyandu (X16), life expectancy (X17), literacy rate (X18), Human Development Index (X19), and number of tuberculosis sufferers (X20). This research aims to analyze which variables influence the prevalence rate of tuberculosis in the city of Surabaya. The method used is multivariate analysis, comprising factor analysis, cluster analysis, biplot analysis, and discriminant analysis. The discriminant analysis determines accuracy by calculating the value (1-APER). The results show that the number of HIV/AIDS cases, the number of residents with healthy homes, and the number of families with ownership of sanitation (latrine, landfills, waste management) have a high correlation with the spread of tuberculosis in Surabaya. Meanwhile, areas with a high rate of tuberculosis are Tambaksari, Wonokromo, Sawahan, and Semampir. The classification analysis accuracy level was 90.32%, and the accuracy of the resulting model or discriminant function was very high. Discriminant analysis can therefore be used to predict tuberculosis prevalence rates.
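The accuracy measure (1-APER) mentioned above is based on the apparent error rate, the share of observations that the discriminant function misclassifies. A small sketch of that arithmetic follows; the confusion matrix is a hypothetical two-class example, not data from the study.

```python
def aper(confusion):
    """Apparent error rate: misclassified count divided by total count.

    confusion[i][j] is the number of class-i observations assigned to class j.
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return (total - correct) / total


# Hypothetical two-class confusion matrix, not data from the study.
matrix = [[18, 2],
          [1, 19]]
error = aper(matrix)
accuracy = 1 - error
print(error, accuracy)
```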
K-Means Clustering and Multilayer Perceptron for Categorizing Student Business Groups
Miftahul Walid; Norfiah Lailatin Nispi Sahbaniya; Hozairi Hozairi; Fajar Baskoro; Arya Yudhi Wijaya
Knowledge Engineering and Data Science Vol 6, No 1 (2023)
Publisher : Universitas Negeri Malang

DOI: 10.17977/um018v6i12023p69-78

Abstract

This study was driven by the East Java provincial government's requirement to assess the transaction levels of the Student Business Group (KUS) in the SMA Double Track program. These transaction levels are a basis for allocating supplementary financial aid to each business group. The system's primary objective is to assist the provincial government of East Java in making well-informed choices about the distribution of supplementary capital to the KUS. The classification technique employed in this study is the multilayer perceptron. Because target data for classification had limited availability, the K-Means Clustering method is used to generate it by dividing the transaction level attribute into three distinct groups: (0) low transactions, (1) medium transactions, and (2) high transactions. The clustering process uses three features: (1) income, (2) spending, and (3) profit. These three features are also used as input data for the classification procedure. The classification procedure using the Multilayer Perceptron processed a dataset of 1383 data points. The training data constituted 80% of the dataset, while the remaining 20% was allocated for testing. To evaluate the constructed model, the training error was assessed using K-Fold cross-validation, yielding an average accuracy score of 0.92. In the present study, the classification technique yielded an accuracy of 0.96. This model aims to classify scenarios in which the dataset lacks prior target data.
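The label-generation step described above, where K-Means produces the three target classes that a classifier is later trained on, can be sketched as follows. This is a plain-Python sketch under assumptions: the deterministic initialization, the rule that ranks clusters by centroid income to name them low, medium, and high, the function names, and the sample rows are all hypothetical and not taken from the paper.

```python
def kmeans(points, k=3, iterations=50):
    """Plain Lloyd's k-means on tuples; returns (labels, centroids)."""
    # Deterministic start (an assumption): spread the initial centroids
    # across the points sorted by their feature sum.
    ordered = sorted(points, key=sum)
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old centroid if a cluster ends up empty.
            updated.append(
                tuple(sum(col) / len(members) for col in zip(*members))
                if members else centroids[c]
            )
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


def transaction_levels(points):
    """Map each row to 0 (low), 1 (medium), or 2 (high)."""
    labels, centroids = kmeans(points, k=3)
    # Rank clusters by centroid income (hypothetical naming rule).
    order = sorted(range(3), key=lambda c: centroids[c][0])
    rank = {cluster: level for level, cluster in enumerate(order)}
    return [rank[lab] for lab in labels]


# Hypothetical (income, spending, profit) rows for illustration only.
groups = [(10, 8, 2), (12, 9, 3), (50, 40, 10),
          (55, 42, 13), (120, 90, 30), (130, 95, 35)]
levels = transaction_levels(groups)
print(levels)
```

The resulting levels would then serve as the target column for a supervised classifier such as the multilayer perceptron named in the abstract.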
Systematic Literature Review on Ontology-based Indonesian Question Answering System
Admojo, Fadhila Tangguh; Lajis, Adidah; Nasir, Haidawati
Knowledge Engineering and Data Science Vol 6, No 2 (2023)
Publisher : Universitas Negeri Malang

DOI: 10.17977/um018v6i22023p129-144

Abstract

Question-Answering (QA) systems, at the intersection of natural language processing, information retrieval, and knowledge representation, aim to provide efficient responses to natural language queries. These systems have seen extensive development in English, but languages like Indonesian present unique challenges and opportunities. This literature review paper delves into the state of ontology-based Indonesian QA systems, highlighting critical challenges. The first challenge lies in sentence understanding, variations, and complexity. Most systems rely on syntactic analysis and struggle to grasp sentence semantics. Complex sentences, especially in Indonesian, pose difficulties in parsing, semantic interpretation, and knowledge extraction. Addressing these linguistic intricacies is pivotal for accurate responses. Secondly, template-based SPARQL query construction, commonly used in Indonesian QA systems, suffers from semantic gaps and inflexibility. Advanced techniques like semantic matching algorithms and dynamic template generation can bridge these gaps and adapt to evolving ontologies. Thirdly, lexical gaps and ambiguity hinder QA systems. Bridging vocabulary mismatches between user queries and ontology labels remains a challenge. Strategies like synonym expansion, word embedding, and ontology enrichment must be explored further to overcome these challenges. Lastly, the review discusses the potential of developing multi-domain ontologies to broaden the knowledge coverage of QA systems. While this presents complex linguistic and ontological challenges, it offers the advantage of responding to a wide range of user queries across domains. This literature review identifies crucial challenges in developing ontology-based Indonesian QA systems and suggests innovative approaches to address them.
Round-Robin Algorithm in Load Balancing for National Data Centers
I Kadek Wahyu Sudiatmika; Gede Indrawan; Sariyasa Sariyasa
Knowledge Engineering and Data Science Vol 6, No 1 (2023)
Publisher : Universitas Negeri Malang

DOI: 10.17977/um018v6i12023p79-91

Abstract

The Provincial Government of Bali assumes a crucial role in administering various public service applications to meet the requirements of its community, traditional villages, and regional apparatus. Nevertheless, growing traffic and the uneven distribution of requests have resulted in substantial server burdens, which may jeopardize the operation of applications and heighten the likelihood of downtime. Efficient load distribution is essential for tackling these difficulties, and the Round Robin algorithm is often used for this purpose. However, the current body of research has not extensively examined the distinct circumstances of on-premise servers in the Bali Provincial Government. This study addresses that gap by evaluating the effectiveness of the Round Robin algorithm in load-balancing on-premise servers inside the Bali Provincial Government. It assesses the appropriateness of the algorithm within this context, with the ultimate goal of providing practical and implementable suggestions. These findings can help optimize system efficiency and minimize periods of inactivity, thereby enhancing the provision of vital public services across Bali. This study provides insights for enhancing server infrastructure and load-balancing strategies through empirical evaluation and comprehensive analysis. Its findings are valuable for the Bali Provincial Government and serve as a reference for other organizations facing challenges managing server loads. This study signifies a notable advancement in establishing reliable and practical public service applications within Bali.
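For readers unfamiliar with the Round Robin algorithm evaluated above, the core idea is that each incoming request goes to the next server in a fixed rotation. A minimal sketch follows; the class name and server names are hypothetical, and a production load balancer would add health checks and concurrency handling that are left out here.

```python
from collections import Counter
from itertools import cycle


class RoundRobinBalancer:
    """Hands out backend servers in a fixed rotating order."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._rotation = cycle(list(servers))

    def next_server(self):
        """Return the server that should receive the next request."""
        return next(self._rotation)


# Hypothetical server names for illustration only.
balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
assignments = [balancer.next_server() for _ in range(7)]
load = Counter(assignments)
print(assignments)
print(load)
```

Because the rotation ignores how busy each server is, request counts stay within one of each other, but the actual load can still be uneven when requests differ in cost.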
Recurrent Session Approach to Generative Association Rule based Recommendation
Armanda, Tubagus Arief; Wardhani, Ire Puspa; Akhriza, Tubagus M.; Admira, Tubagus M. Adrie
Knowledge Engineering and Data Science Vol 6, No 2 (2023)
Publisher : Universitas Negeri Malang

DOI: 10.17977/um018v6i22023p199-214

Abstract

This article introduces a generative association rule (AR)-based recommendation system (RS) using a recurrent neural network approach, applied when a user searches for an item in a browsing session. It is proposed to overcome the limitations of the traditional AR-based RS, which implements query-based sessions that are not adaptive to input series and thus fails to generate recommendations. The dataset used is accurate retail transaction data from online stores in Europe. The contribution of the proposed method is a next-item prediction model using LSTM, in which the model is trained on strings of association rules rather than strings of items in a purchase transaction. The proposed model predicts the next item generatively, while the traditional method does so discriminatively. As a result, for an array of items that the user has viewed in a browsing session, the model can always recommend the following items when traditional methods cannot. In addition, the results of user-centered validation on several metrics show that although the level of accuracy (similarity) between recommended products and products seen by users is only 20%, other metrics such as novelty, diversity, attractiveness, and enjoyability reach above 70%.
Deep Learning Approaches with Optimum Alpha for Energy Usage Forecasting
Wibawa, Aji Prasetya; Utama, Agung Bella Putra; Akbari, Ade Kurnia Ganesh; Fadhilla, Akhmad Fanny; Triono, Alfiansyah Putra Pertama; Paramarta, Andien Khansa’a Iffat; Setyaputri, Faradini Usha; Hernandez, Leonel
Knowledge Engineering and Data Science Vol 6, No 2 (2023)
Publisher : Universitas Negeri Malang

DOI: 10.17977/um018v6i22023p170-187

Abstract

Energy use is an essential aspect of many human activities, from individual to industrial scale. However, increasing global energy demand and the challenges posed by environmental change make understanding energy use patterns crucial. Accurate predictions of future energy consumption can greatly influence decision-making, supply-demand stability, and energy efficiency. Energy use data often exhibits time-series patterns, which creates complexity in forecasting. To address this complexity, this research utilizes Deep Learning (DL) models: Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Long Short-term Memory (LSTM), Bidirectional LSTM (Bi-LSTM), and Gated Recurrent Unit (GRU). The main objective is to improve the accuracy of energy usage forecasting by optimizing the alpha value in exponential smoothing. The results showed that all DL methods experienced improved accuracy when using the optimum alpha. LSTM has the best MAPE, RMSE, and R2 values compared to the other methods. This research promotes energy management, decision-making, and efficiency by providing an innovative framework for accurate forecasting of energy use, thus contributing to a sustainable and efficient energy system.
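To make the role of alpha concrete, simple exponential smoothing computes s[0] = x[0] and s[t] = alpha * x[t] + (1 - alpha) * s[t-1], so a larger alpha follows the raw data more closely. The sketch below shows one possible way to pick alpha, a grid search that minimizes one-step-ahead squared error. The abstract does not say how the optimum alpha is obtained, so the search rule, function names, and sample series are assumptions for illustration.

```python
def exponential_smoothing(series, alpha):
    """s[0] = x[0]; s[t] = alpha * x[t] + (1 - alpha) * s[t-1]."""
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


def one_step_sse(series, alpha):
    """Sum of squared errors when s[t-1] is used as the forecast of x[t]."""
    smoothed = exponential_smoothing(series, alpha)
    return sum((x - s) ** 2 for x, s in zip(series[1:], smoothed[:-1]))


def best_alpha(series, step=0.01):
    """Grid search over alpha in (0, 1]; an assumed selection rule."""
    count = int(round(1 / step))
    candidates = [round(i * step, 10) for i in range(1, count + 1)]
    return min(candidates, key=lambda a: one_step_sse(series, a))


# Made-up series for illustration only.
usage = [1, 2, 3, 4, 5]
alpha = best_alpha(usage)
print(alpha, exponential_smoothing([10, 20], 0.5))
```

For a steadily rising series like this one, the search settles on alpha = 1.0 because any smoothing makes the one-step forecast lag further behind the trend.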
