Articles

Handling Imbalanced Data in K-Nearest Neighbor Algorithm using Synthetic Minority Oversampling Technique-Nominal Continuous
Authors : Anjani Anjani; Hayati, Memi Nor; Surya Prangga
International Journal of Engineering and Computer Science Applications (IJECSA) Vol. 4 No. 2 (2025): September 2025
Publisher : Universitas Bumigora Mataram-Lombok

DOI: 10.30812/ijecsa.v4i2.5142

Abstract

Classification is a part of data mining that aims to predict the class of data using a trained machine learning model. K-Nearest Neighbor (K-NN) is a classification method that assigns a class based on the distance to the nearest neighbors. However, K-NN has limitations in handling imbalanced class distributions. This problem can be addressed by applying a class balancing technique. One such technique is the Synthetic Minority Oversampling Technique for Nominal and Continuous (SMOTE-NC), which is suitable for datasets containing both nominal and continuous variables. The aim of this research is to classify Honda motorcycle loan customer data at Company Z using the K-NN method combined with SMOTE-NC to address data imbalance. The research is experimental and uses 10-fold cross-validation to partition the training and testing data. The input variables are gender, occupation, length of installment, income, installment amount, motorcycle price, and down payment, while the output variable is payment status (current or non-current). The results show that the optimal K value for classification using K-NN with SMOTE-NC is K = 1, with an average APER (Average Probability of Error Rate) of 0.143. The best result is found in subset 8, with an APER value of 0.033. In this subset, out of 61 data points, 34 current-status customers are correctly classified as current and 25 non-current-status customers are correctly classified as non-current, with only one misclassification in each class. The study concludes that the combination of SMOTE-NC and K-NN (K = 1) provides high classification accuracy for imbalanced data and can be used effectively to support credit risk assessment in motorcycle financing.
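The subset 8 figures reported above can be checked by hand, since APER is the share of misclassified observations in the confusion matrix. A minimal Python sketch follows; the function name `aper` and the matrix layout are assumptions made for illustration, not the authors' code.

```python
def aper(confusion):
    """Error rate: misclassified count divided by total count.

    confusion[i][j] holds the number of observations of actual class i
    predicted as class j (layout assumed for this sketch).
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return (total - correct) / total


# Subset 8 as reported in the abstract: 34 + 25 correct, one error per class.
subset_8 = [
    [34, 1],  # actual current: 34 predicted current, 1 predicted non-current
    [1, 25],  # actual non-current: 1 predicted current, 25 predicted non-current
]
print(round(aper(subset_8), 3))  # 2 / 61 -> 0.033
```

The two off-diagonal cells sum to 2 errors out of 61 observations, which agrees with the reported 0.033.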
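Pelatihan Penulisan Karya Tulis Ilmiah Untuk Mendorong Peningkatan Kualitas Siswa Tingkat SMA [Training on Writing Scientific Papers to Encourage Quality Improvement of High School Level Students]
Authors : Purnamasari, Ika; Hayati, Memi Nor; Yuniarti, Desi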
Aksiologiya: Jurnal Pengabdian Kepada Masyarakat Vol 4 No 2 (2020): Agustus
Publisher : Universitas Muhammadiyah Surabaya

DOI: 10.30651/aks.v4i2.3565

Abstract

A scientific paper (karya tulis ilmiah, KTI) is a work written in accordance with scientific rules. Following these rules is the main requirement for the resulting work to be scientifically accountable. The purpose of the training was to foster the interest, enthusiasm, and creative and innovative ideas of class X and XI students of SMAN 5 Samarinda so that they can produce scientific papers that follow the rules of scientific writing. Based on the implementation of the training, it can be concluded that the activity ran well and received full support from the school. All participants followed the activity to the end, with an attendance rate of 100%. They were enthusiastic about asking questions, exploring ideas, and expressing their opinions. In the future, follow-up activities are expected to involve accompanying teachers in order to optimize their role in helping students prepare scientific papers.
Keywords: scientific paper; scientific rules; students.
Evaluating Different K Values in K-Fold Cross Validation for Binary Logistic Regression to Classify Poverty
Authors : Sinaga, Julia Oriana; Fathurahman, M.; Wahyuningsih, Sri; Hayati, Memi Nor
Jurnal Varian Vol. 8 No. 2 (2025)
Publisher : Universitas Bumigora

DOI: 10.30812/varian.v8i2.4403

Abstract

Data mining is essential for decision-makers to analyze and extract insights from data efficiently. Classification is one of the data mining techniques used to organize data based on its features, helping to identify patterns and make predictions. This study evaluates Binary Logistic Regression (BLR), a type of generalized linear model that is suitable for binary outcomes, for classifying poverty depth across Indonesian regencies/cities in 2022, with a focus on the impact of different K values in K-Fold Cross Validation. The dataset includes 514 regencies/cities, with the Poverty Depth Index as the target variable, categorized into high (1) and low (0) levels, using 11 predictor variables. K-Fold Cross Validation was performed with K values of 3, 5, and 10, using accuracy and Area Under Curve (AUC) as evaluation metrics. The mean accuracy values for BLR are 75.7% for K=3, 74.3% for K=5, and 75.1% for K=10. Results show that K=3 offers the highest accuracy in classifying poverty depth in Indonesia, with the lowest standard deviation of 0.03. However, K=10 demonstrates superior discriminative ability in BLR, reflected by a higher AUC value. This study highlights the significant influence of K values in K-Fold Cross Validation on BLR performance.
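To make the role of K concrete, the sketch below shows how 514 observations would be partitioned into test folds for K = 3, 5, and 10. It uses only the Python standard library; the helper name `k_fold_indices` and the shuffling seed are assumptions for illustration and say nothing about how the study itself generated its folds.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Split range(n_samples) into k shuffled, near-equal test folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th index starting at i, so sizes differ by at most 1.
    return [indices[i::k] for i in range(k)]


# 514 regencies/cities, partitioned with the K values named in the abstract.
for k in (3, 5, 10):
    folds = k_fold_indices(514, k)
    sizes = [len(fold) for fold in folds]
    print(k, min(sizes), max(sizes))
```

A smaller K leaves larger test folds and smaller training sets per round, which is one reason accuracy and its spread can differ across K values.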
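ANALISIS CREDIT SCORING TERHADAP STATUS PEMBAYARAN BARANG ELEKTRONIK DAN FURNITURE MENGGUNAKAN BOOTSTRAP AGGREGATING K-NEAREST NEIGHBOR [Credit Scoring Analysis of the Payment Status of Electronic Goods and Furniture Using Bootstrap Aggregating K-Nearest Neighbor]
Authors : Astuti, Putri Sri; Hayati, Memi Nor; Goejantoro, Rito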
BAREKENG: Jurnal Ilmu Matematika dan Terapan Vol 15 No 4 (2021): BAREKENG: Jurnal Ilmu Matematika dan Terapan
Publisher : PATTIMURA UNIVERSITY

DOI: 10.30598/barekengvol15iss4pp735-744

Abstract

Classification is the process of grouping objects that have the same characteristics into several categories. This study applies a combination of classification algorithms, namely Bootstrap Aggregating K-Nearest Neighbor, in credit scoring analysis. The aim is to classify the credit payment status of electronic goods and furniture at PT KB Finansia Multi Finance in 2020 and to determine the level of accuracy produced. Credit payment status is grouped into two categories, namely current and non-current. Seven independent variables are used to describe the characteristics of the debtor: age, number of dependents, length of stay, years of service, income, amount of payment, and payment period. Applying the classification algorithm in credit scoring analysis is expected to assist creditors in deciding whether to accept or reject credit applications from prospective debtors. The results showed that the best accuracy obtained from the Bootstrap Aggregating K-Nearest Neighbor algorithm was 92.308%, with a data proportion of 90:10, m=80%, C=73, and K=5.
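The idea of bootstrap aggregating K-NN can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the authors' implementation: it assumes m is the fraction of training rows drawn in each bootstrap resample and C is the number of resamples, and the toy data, function names, and parameter names are all hypothetical.

```python
import math
import random
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k training rows closest to `query`."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def bagged_knn_predict(train, query, k=5, n_replicates=73, fraction=0.8, seed=0):
    """Vote across K-NN predictions made on bootstrap resamples of `train`."""
    rng = random.Random(seed)
    size = max(k, int(fraction * len(train)))
    votes = Counter()
    for _ in range(n_replicates):
        resample = rng.choices(train, k=size)  # sampling with replacement
        votes[knn_predict(resample, query, k)] += 1
    return votes.most_common(1)[0][0]


# Hypothetical two-feature toy data, not the study's debtor records.
toy = [((x, x + 1), "current") for x in range(10)]
toy += [((x + 20, x + 21), "non-current") for x in range(10)]
print(bagged_knn_predict(toy, (2, 3)))
```

Each resample yields one K-NN vote, and the final class is the majority across all resamples, which tends to stabilize predictions compared with a single K-NN fit.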
IMPLEMENTATION OF THE FUZZY GUSTAFSON-KESSEL METHOD ON GROUPING DISTRICTS/CITIES IN KALIMANTAN ISLAND BASED ON POVERTY ISSUES FACTORS
Authors : Paradilla, Yunda Sasha; Hayati, Memi Nor; Sifriyani, Sifriyani
BAREKENG: Jurnal Ilmu Matematika dan Terapan Vol 17 No 1 (2023): BAREKENG: Journal of Mathematics and Its Applications
Publisher : PATTIMURA UNIVERSITY

DOI: 10.30598/barekengvol17iss1pp0125-0134

Abstract

Cluster analysis summarizes data by grouping objects based on shared characteristics. One cluster analysis method is Fuzzy Gustafson-Kessel (FGK), which is a development of the Fuzzy C-Means (FCM) method. The FGK method can adapt the shape of the cluster membership function to the data. This study aims to determine the optimal number of groups based on the Partition Coefficient (PC) and Classification Entropy (CE) validity indexes and to group 56 districts/cities on the island of Kalimantan based on poverty issue factors in 2021. The optimal number of groups using the FGK method, based on the PC and CE validity indexes, is two. The first group and the second group each consist of 28 districts/cities on Kalimantan Island.
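For readers unfamiliar with the two validity indexes, the sketch below computes them from a fuzzy membership matrix using their commonly cited definitions: PC is the mean of the squared memberships and CE is the negative mean of u ln u. The membership values and function names are hypothetical and are not taken from the study.

```python
import math


def partition_coefficient(memberships):
    """PC = (1/N) * sum of squared memberships; closer to 1 means crisper."""
    n = len(memberships)
    return sum(u * u for row in memberships for u in row) / n


def classification_entropy(memberships):
    """CE = -(1/N) * sum of u * ln(u); closer to 0 means crisper."""
    n = len(memberships)
    return -sum(u * math.log(u) for row in memberships for u in row if u > 0) / n


# Hypothetical membership degrees for four objects in two clusters.
u = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
print(partition_coefficient(u), classification_entropy(u))
```

When candidate cluster counts are compared, a higher PC and a lower CE both point toward a better-separated partition.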
PENERAPAN SPATIAL DURBIN MODEL PADA DATA PENYAKIT MALARIA DI INDONESIA [Application of the Spatial Durbin Model to Malaria Disease Data in Indonesia]
Authors : Nabilla, Maghrisa Ayu; Hayati, Memi Nor; Sifriyani, Sifriyani
Jurnal Teknologi Informasi: Jurnal Keilmuan dan Aplikasi Bidang Teknik Informatika Vol. 19 No. 2 (2025): Jurnal Teknologi Informasi : Jurnal Keilmuan dan Aplikasi Bidang Teknik Inform
Publisher : Universitas Palangka Raya

DOI: 10.47111/jti.v19i2.20334

Abstract

The Spatial Durbin Model (SDM) is an extension of the Spatial Autoregressive (SAR) model that adds spatial lag effects of the independent variables to the spatial lag of the dependent variable. The parameter estimation used in this study is the maximum likelihood estimator. Parameter estimation for the SDM is performed at each observation location using spatial weighting. The spatial weights are calculated based on queen contiguity and customized contiguity weighting methods. This study aims to obtain the SDM and identify the factors influencing the number of malaria cases in Indonesia in 2023. The Lagrange Multiplier (LM) test indicates that there is a spatial lag in the dependent variable, with the parameter ρ being significant at a significance level of α = 0.1. Based on the results of the SDM analysis, the factors directly influencing the number of malaria cases in Indonesia in 2023 are the percentage of poor population, the number of medical personnel, and the percentage of households with access to adequate drinking water services. Meanwhile, the factors that have an indirect or spatial lag effect are the open unemployment rate and the percentage of poor population.
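In the notation commonly used in the spatial econometrics literature (not necessarily the symbols used in the paper itself), the model described above can be written as follows, where W is the spatial weight matrix, ρ is the spatial lag coefficient of the dependent variable, and θ collects the spatial lag effects of the independent variables:

```latex
\[
  \mathbf{y} = \rho \mathbf{W}\mathbf{y} + \alpha \boldsymbol{\iota}_n
             + \mathbf{X}\boldsymbol{\beta} + \mathbf{W}\mathbf{X}\boldsymbol{\theta}
             + \boldsymbol{\varepsilon},
  \qquad \boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^{2}\mathbf{I}_n)
\]
```

Setting θ = 0 removes the lagged independent variables and leaves the SAR model.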
Pendampingan Desain Infografis dengan Statistika dan Sains Data Bagi Siswa/Siswi MAN 1 Kota Samarinda [Infographic Design Mentoring with Statistics and Data Science for Students of MAN 1 Kota Samarinda]
Authors : Muhammad Fathurahman; Dani, Andrea Tri Rian; Fauziyah, Meirinda; Darnah; Goenjatoro, Rito; Hayati, Memi Nor; Prangga, Surya; Siringoringo, Meiliyani; Oroh, Chiko Zet
Journal of Research Applications in Community Service Vol. 4 No. 3 (2025): Journal of Research Applications in Community Service
Publisher : Universitas Nahdlatul Ulama Sunan Giri Bojonegoro

DOI: 10.32665/jarcoms.v4i3.5158

Abstract

This community service activity aims to provide mentoring in infographic design that integrates statistics and data science, and to improve data literacy among students of MAN 1 Kota Samarinda. In a digital era marked by easy access to information, students still lack understanding of how to use technology, particularly in infographic design based on statistics and data science. Infographics are an effective tool for presenting information visually, making complex data quicker and easier to understand. The Canva application was chosen as the platform for this mentoring because of its ease of use, which allows students to create independently. Based on the results of the initial test, students had not yet made optimal use of data science in creating infographic designs. This activity was therefore designed to give participants the understanding and practical skills to use visual technology in managing and conveying data-based information more effectively and innovatively. Through this community service method, participants are expected to improve their understanding of and skills in infographic design, as well as their data science literacy, which can be applied in teaching and learning activities, especially in processing and presenting statistical data.
COMPARISON OF K-NEAREST NEIGHBOR AND NAÏVE BAYES CLASSIFICATION METHODS FOR STATUS OF TODDLER NUTRITION DATA AT BAQA SAMARINDA SEBERANG COMMUNITY HEALTH CENTER
Authors : Annabaa Aulia, Muzizah; Goejantoro, Rito; Hayati, Memi Nor
Jurnal Statistika Universitas Muhammadiyah Semarang Vol 13, No 1 (2025): Jurnal Statistika Universitas Muhammadiyah Semarang
Publisher : Department Statistics, Faculty Mathematics and Natural Science, UNIMUS

DOI: 10.26714/jsunimus.13.1.2025.1-13

Abstract

Classification is the task of assessing data objects in order to assign them to one of a number of available classes. The naïve Bayes method is a statistical classifier that estimates the probability of membership in a class. The K-Nearest Neighbor (K-NN) method, meanwhile, is a supervised method used for classification. The aim of this research is to classify the nutritional status of toddlers at the Baqa Samarinda Seberang Community Health Center in 2022 using the naïve Bayes algorithm and the K-NN algorithm. Based on accuracy calculations and confusion matrices, the highest accuracy for the naïve Bayes method was 82.15%, with a Press's Q value of 168, at a proportion of 90% training data and 10% testing data. The highest accuracy for the K-NN method was 90.57%, obtained with 3-NN, 5-NN, 7-NN, and 9-NN, with a Press's Q value of 187.65, also at a proportion of 90% training data and 10% testing data. From these results, it was concluded that the K-NN method worked better than the naïve Bayes method in classifying the nutritional status of toddlers at the Baqa Samarinda Seberang Community Health Center.
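Press's Q, reported above alongside accuracy, tests whether a classifier performs better than chance. A minimal sketch of the usual textbook formula follows; the function name and the input numbers are hypothetical and do not reproduce the study's sample sizes, which the abstract does not give.

```python
def press_q(n_total, n_correct, n_classes):
    """Press's Q = (N - n*K)^2 / (N * (K - 1)).

    N is the number of classified cases, n the number classified correctly,
    and K the number of classes.
    """
    return (n_total - n_correct * n_classes) ** 2 / (n_total * (n_classes - 1))


# Hypothetical example: 100 test cases, 90 correct, 2 classes.
print(press_q(100, 90, 2))  # (100 - 180)^2 / 100 -> 64.0
```

The statistic is compared with a chi-square critical value with one degree of freedom (3.841 at the 5% level); larger values indicate classification that is significantly better than chance.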
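PENGELOMPOKAN PROVINSI DI INDONESIA BERDASARKAN DATA JUMLAH KEJADIAN DAN DAMPAK BENCANA BANJIR MENGGUNAKAN METODE FUZZY C-MEANS [Grouping of Provinces in Indonesia Based on Data on the Number and Impact of Flood Disasters Using the Fuzzy C-Means Method]
Authors : Hayati, Memi Nor; Goejantoro, Rito; Siringoringo, Meiliyani; Purnamasari, Ika; Yuniarti, Desi; Nida, Khairun; Messakh, Gerald Claudio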
VARIANSI: Journal of Statistics and Its application on Teaching and Research Vol. 6 No. 01 (2024)
Publisher : Program Studi Statistika Fakultas MIPA UNM

DOI: 10.35580/variansiunm167

Abstract

Cluster analysis is a technique used to find groups of similar data objects. The Fuzzy C-Means (FCM) method is a data grouping method in which the assignment of each data point to a cluster is determined by its degree of membership. This study aims to determine the optimal number of clusters based on the Modified Partition Coefficient (MPC) validity index and to determine the optimal grouping of 34 provinces in Indonesia based on data on the number of events and the impact of floods in 2017-2021. The optimal number of clusters using the FCM method, based on the MPC value, is two: the first cluster consists of 27 provinces in Indonesia and the second cluster consists of 7 provinces in Indonesia.
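The MPC index rescales the partition coefficient so that it no longer drifts with the number of clusters c. A minimal sketch using the commonly cited definition MPC = 1 - c/(c - 1) * (1 - PC) is given below; the membership values and the function name are hypothetical, not the study's data.

```python
def modified_partition_coefficient(memberships):
    """MPC = 1 - c/(c-1) * (1 - PC), where PC is the mean squared membership."""
    n = len(memberships)     # number of objects
    c = len(memberships[0])  # number of clusters
    pc = sum(u * u for row in memberships for u in row) / n
    return 1 - (c / (c - 1)) * (1 - pc)


# Hypothetical membership degrees for four objects in two clusters.
u = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
print(modified_partition_coefficient(u))
```

MPC ranges from 0 (all memberships equal) to 1 (crisp partition), and the candidate cluster count with the largest MPC is taken as optimal.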
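Peramalan Nilai Tukar Petani Kalimantan Timur Menggunakan Metode Neural Network [Forecasting the Farmer Exchange Rate of East Kalimantan Using the Neural Network Method]
Authors : Rahmah, Putri Aulia; Hayati, Memi Nor; Cahyaningsih, Ariyanti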
Indonesian Journal of Applied Statistics and Data Science Vol. 2 No. 1 (2025): Mei
Publisher : Universitas Mataram

DOI: 10.29303/ijasds.v2i1.5855

Abstract

The farmer exchange rate (NTP) is a significant indicator for measuring the purchasing power of Indonesian farmers, who are the main actors in the agricultural sector. Agriculture is one of the main sectors in Indonesia, including in East Kalimantan Province. This study aims to predict and forecast the NTP of East Kalimantan Province using the Neural Network (NN) method with the backpropagation algorithm. The data used are the NTP data of East Kalimantan Province for the period January 2020 to September 2024, obtained from the BPS of East Kalimantan Province. This study tested five NN architecture models with different numbers of layers in the hidden layer, namely 1, 2, 3, 4, and 5 layers in the hidden layer. The study was conducted using 1 input variable, a learning rate of 0.01, a maximum of 10,000 iterations, and a threshold of 0.5. Based on the training process, it was concluded that the best NN architecture for forecasting the NTP of East Kalimantan Province is the NN with 5 layers in the hidden layer, with a MAPE of 2.087%.
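MAPE, the accuracy measure reported above, is the mean of the absolute percentage errors between actual and forecast values. A minimal sketch follows; the series below is hypothetical and is not the NTP data used in the study.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    errors = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100 * sum(errors) / len(errors)


# Hypothetical actual and forecast values, for illustration only.
actual = [100.0, 110.0, 120.0]
forecast = [98.0, 112.2, 120.0]
print(round(mape(actual, forecast), 3))
```

Because each error is divided by the actual value, MAPE is scale-free, which makes it convenient for comparing the five architectures on the same series.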