Articles

OPTIMALISASI VALIDITAS KLASTERISASI IPM MELALUI PENERAPAN VARIASI DISTANCE MEASURE PADA ALGORITMA K-MEANS++
Sipayung, Sardo Pardingotan; Efendi, Syahril
JOURNAL OF SCIENCE AND SOCIAL RESEARCH Vol 8, No 4 (2025): November 2025
Publisher : Smart Education

DOI: 10.54314/jssr.v8i4.5420

Abstract

The Human Development Index (HDI) is an important indicator for measuring the quality of regional development across the dimensions of health, education, and decent living standards. In North Sumatra Province, HDI achievements still differ significantly between districts and cities, so a data-based analytical approach is needed to map development patterns objectively. This study aims to optimize the validity of regional HDI clustering by applying the K-Means++ algorithm with varied distance measures. It uses a quantitative approach with an unsupervised learning method. The data analyzed include HDI, Average Length of Schooling (ALS), and Adjusted Per Capita Expenditure, sourced from the Central Statistics Agency. The research stages comprise data preprocessing and standardization, determining the optimal number of clusters with the Elbow method, applying the K-Means++ algorithm, and evaluating cluster quality with the Davies–Bouldin Index (DBI) and the Purity Index. Clustering performance was also compared across Euclidean, Manhattan, and Cosine distances. The results show that the optimal number of clusters is three, representing high, medium, and low levels of human development. A DBI value of 0.60 and a Purity Index of 0.61 indicate good clustering quality. Euclidean and Manhattan distances performed better than Cosine distance.

Keywords: Human Development Index; K-Means++; Clustering; Distance Measure; Davies–Bouldin Index; Purity Index.
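The three distance measures and the Purity Index named in this abstract can be illustrated with a minimal standard-library sketch. It is not the authors' implementation; the function names and the toy labels in the usage note are assumptions made for illustration only.

```python
import math
from collections import Counter

def euclidean(a, b):
    # Straight-line distance between two equal-length vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # Sum of absolute coordinate differences.
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    # 1 minus cosine similarity; assumes neither vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def purity(cluster_ids, true_labels):
    # For each cluster, count its most frequent reference label,
    # then divide the sum of those counts by the number of points.
    by_cluster = {}
    for c, t in zip(cluster_ids, true_labels):
        by_cluster.setdefault(c, Counter())[t] += 1
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(true_labels)
```

For example, `purity([0, 0, 0, 1, 1, 1], ["high", "high", "low", "low", "low", "high"])` counts two majority members in each cluster, four of six points in total. Cosine distance ignores vector magnitude, which is one plausible reason it can behave differently from the other two on standardized indicator data.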
ANALISIS PENGELOMPOKAN KARAKTERISTIK SISWA MENGGUNAKAN METODE K-MEANS DALAM PERSPEKTIF FILSAFAT SAINS KOMPUTER
Sipayung, Sardo Pardingotan; Nasution, Mahyuddin K. M.
JOURNAL OF SCIENCE AND SOCIAL RESEARCH Vol 8, No 4 (2025): November 2025
Publisher : Smart Education

DOI: 10.54314/jssr.v8i4.5153

Abstract

The development of information technology in education is changing the way students construct and access knowledge, but differences in academic ability, motivation, discipline, and digital literacy often lead to learning disparities. This study grouped student characteristics using K-Means clustering and reviewed them from the perspective of the philosophy of computer science: ontology, epistemology, axiology, logic, and induction. The data from 120 students included academic scores, learning motivation, discipline, and digital literacy. After normalization, the number of clusters was determined using the Elbow and Silhouette methods, and cluster quality was evaluated using the Davies–Bouldin Index. The findings indicate an optimal number of three clusters, with a Silhouette value of 0.466 and a DBI of 0.733, indicating fairly good and stable clustering. The three clusters describe: 1) highly motivated students with high digital literacy; 2) disciplined students with good academic performance but moderate digital skills; 3) students with low motivation and low digital literacy who require a personalized and empathetic learning approach. Ontologically, data is not just numbers but the manifestation of students' digital existence in the modern learning space. Epistemologically, knowledge is formed inductively from students' interactions with technology and data. Axiologically, the clustering results support fairness in digital learning through an approach tailored to student characteristics. The dimensions of logic and induction present the clustering process as a scientific pattern of thinking that moves from observation to meaningful rational generalization. The findings support a balance between algorithmic rationality and human values in digital education.

Keywords: K-Means Clustering; Philosophy of Computer Science; Ontology; Epistemology; Axiology; Student Characteristics; Digital Learning.
Analisis Sentimen Masyarakat terhadap Budaya Batak di Twitter Menggunakan Metode Naive Bayes
Situmorang, Cristina; Hutauruk, Amelia Sanna Maria; Sipayung, Sardo Pardingotan; Zebua, Wilfred Raimond
Jurnal Ilmu Komputer dan Teknik Informatika Vol. 2 No. 1 (2026): Januari 2026
Publisher : CV. Raskha Media Group

DOI: 10.64803/juikti.v2i1.117

Abstract

Social media platforms such as Twitter have become an open space for people to voice opinions and views on many issues, including local culture. This study analyzes public sentiment on Twitter toward Batak culture using the Naive Bayes algorithm. Data were collected by web crawling for tweets containing keywords related to Batak culture. The research stages include text preprocessing (data cleaning, tokenization, stopword removal, and normalization), followed by sentiment classification with the Naive Bayes algorithm. Sentiment was classified into the main categories of positive and negative. The results show that most public opinion on Twitter is positive, especially regarding the richness of Batak customs, music, and cuisine. There is also negative sentiment, related to cultural stereotypes and a lack of preservation. This study gives an overview of public perception on social media and can inform cultural practitioners and the government in preserving and promoting Batak culture more effectively.
Flood And Landslide Severity Mapping In North Sumatra Using Random Forest
Manurung, Evaldo; Sipayung, Sardo Pardingotan
Jurnal Sosial Teknologi Vol. 6 No. 2 (2026): Jurnal Sosial dan Teknologi
Publisher : CV. Green Publisher Indonesia

DOI: 10.59188/jurnalsostech.v6i2.32697

Abstract

Floods and landslides are recurrent hydrometeorological hazards that cause significant environmental damage and socioeconomic losses in many regions of Indonesia, including North Sumatra. Complex topography, high rainfall intensity, land-use changes, and rapid urban development have increased the exposure and vulnerability of several districts to these disasters. This study aims to classify the severity of flood- and landslide-affected areas in North Sumatra using an integrated Geographic Information System (GIS) and Random Forest (RF) approach. The research was conducted using the CRISP-DM framework, which includes data collection, data preprocessing, feature weighting using the Analytical Hierarchy Process (AHP), model development with the RF algorithm, and spatial validation using historical disaster records. Five main conditioning factors were used as model inputs: rainfall, slope, land cover, soil type, and elevation. Hazard severity was classified into three categories: low, moderate, and severe. The results indicate that the RF model achieved strong predictive performance, with high precision, recall, F1-score, and an excellent ROC-AUC value, demonstrating the reliability of the proposed approach. Spatial analysis shows that Mandailing Natal, South Tapanuli, and Humbang Hasundutan are the most severely affected districts, mainly due to high rainfall, steep slopes, and land degradation. This study concludes that the GIS–RF framework provides an effective decision-support tool for regional disaster risk management and can support evidence-based planning for flood and landslide mitigation in North Sumatra.
Prediksi Risiko Penyakit Diabetes Menggunakan Algoritma K-Nearest Neighbor
Sagala, Lauren Patricia; Sitanggang, Romualda; Sipayung, Sardo Pardingotan
Jurnal Pendidikan Tambusai Vol. 10 No. 1 (2026)
Publisher : LPPM Universitas Pahlawan Tuanku Tambusai, Riau, Indonesia

DOI: 10.31004/jptam.v10i1.36610

Abstract

Diabetes mellitus is one of the most common non-communicable diseases and can cause many serious complications if it is not treated promptly. This study uses the K-Nearest Neighbor (KNN) algorithm to predict the risk of diabetes. The dataset consists of several medical metrics, including age, glucose level, blood pressure, body mass index (BMI), insulin, and diabetes history, and was obtained from Kaggle. The research stages comprise data collection, preprocessing, splitting the data into training and test sets, applying the KNN algorithm, and evaluating model performance with accuracy and a confusion matrix. The test results show that KNN can predict diabetes risk accurately, particularly for certain values of k. KNN can therefore serve as one aid to decision-making when predicting the likelihood of developing diabetes at an early age.
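The KNN step described in this abstract can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the study's code: the function name `knn_predict`, the two-feature layout (glucose, BMI), and the values in the usage note are all hypothetical, and a real pipeline would normally scale the features before measuring distances.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    # Rank every training point by Euclidean distance to the query.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    # Majority vote among the k nearest labels.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]
```

With hypothetical training rows `[(85, 22.0), (90, 24.5), (95, 23.0), (150, 31.0), (160, 33.5), (170, 35.0)]` labelled `[0, 0, 0, 1, 1, 1]`, the query `(155, 32.0)` has its three nearest neighbours in class 1. The choice of k changes which neighbours vote, which is consistent with the abstract's remark that accuracy depends on k.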
Prediksi Kelulusan Mahasiswa Menggunakan Logistic Regression dan Random Forest Berdasarkan Data Akademik
Sitanggang, Armando Agasi; Lase, Marsindra Yanti; Sipayung, Sardo Pardingotan
Jurnal Pendidikan Tambusai Vol. 10 No. 1 (2026)
Publisher : LPPM Universitas Pahlawan Tuanku Tambusai, Riau, Indonesia

DOI: 10.31004/jptam.v10i1.36623

Abstract

On-time student graduation is a crucial indicator of institutional quality and of study program accreditation. However, manual monitoring of the risk of late graduation is often hard to carry out early. This study compares the performance of the Logistic Regression and Random Forest algorithms in predicting student graduation from early-semester academic data. The dataset comprises 120 student records, split into 100 training records and 20 test records, with input parameters consisting of GPA for semesters 1-4, total credits (SKS), and grades in fundamental courses (Algorithms and Databases). The experiments were run in the Orange Data Mining software. The results show that Logistic Regression performed better, with a Classification Accuracy (CA) of 0.750 (75%), compared with 0.683 (68.3%) for Random Forest. This finding indicates that graduation patterns in the academic dataset used tend to have a strong linear relationship. The study concludes that Logistic Regression is the more effective choice for an early warning system that detects students at risk of graduating late, so that university management can provide well-targeted academic interventions.
Analisis Algoritma K-Means untuk Pengelompokan Tingkat Kemisikinan di Kota Medan
Tambunan, Dwito Julian; Simbolon, Agustina; Marmata, Sri Ulina Br; Sipayung, Sardo Pardingotan
Jurnal Pendidikan Tambusai Vol. 10 No. 1 (2026)
Publisher : LPPM Universitas Pahlawan Tuanku Tambusai, Riau, Indonesia

DOI: 10.31004/jptam.v10i1.36715

Abstract

Poverty is a multidimensional social problem and remains a major challenge in urban areas, including the city of Medan. Addressing poverty requires a data-based approach so that policies are well targeted. This study aims to group poverty levels in Medan using the K-Means algorithm based on socioeconomic indicators. The data are secondary data from the Central Statistics Agency (BPS) of Medan for 2023, with the subdistrict (kecamatan) as the unit of analysis. The variables include average income, unemployment rate, education level, and housing conditions. The research stages comprise data preprocessing, determining the optimal number of clusters with the Elbow method, and clustering with the K-Means algorithm. The results show that poverty levels in Medan can be grouped into three clusters: high poverty, moderate poverty, and low poverty. This grouping is expected to give the local government a basis for setting regional priorities and formulating more effective, better-targeted poverty alleviation policies.
Public Sentiment Analysis of Free Nutritious Meal Program Discourse on Social Media X Using Support Vector Machine N-Gram Features Based
Silaban, Daniel; Simatupang, Gracia; Sipayung, Sardo Pardingotan
Tech-E Vol. 9 No. 2 (2026): TECH-E (Technology Electronic)
Publisher : Fakultas Sains dan Teknologi-Universitas Buddhi Dharma

DOI: 10.31253/te.v9i2.4355

Abstract

The Free Nutritious Meal Program is a government policy aimed at improving the nutritional quality of society and has generated diverse public responses on social media. This study aims to analyze public sentiment toward the Free Nutritious Meal Program on social media X using the Support Vector Machine (SVM) algorithm with N-Gram features and Term Frequency–Inverse Document Frequency (TF-IDF) weighting. The data were collected through a crawling process from social media X, resulting in 1,014 tweets. After data cleaning, 931 tweets were obtained and labeled into two sentiment classes, namely positive and negative. The research stages include text preprocessing, N-Gram feature extraction (unigram and bigram), classification using the SVM algorithm, and model evaluation using the 10-Fold Cross-Validation method with the assistance of the RapidMiner tool. The experimental results show that the SVM model achieved an accuracy of 79.59%. Although the precision value for the negative class is relatively high, the recall and F1-score remain relatively low due to the imbalance in data distribution. Overall, the results indicate that public sentiment toward the Free Nutritious Meal Program on social media X is dominated by positive sentiment. The findings of this study are expected to serve as an initial evaluation for the government in understanding public perceptions of the implementation of the program.
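The unigram and bigram feature extraction mentioned in this abstract can be sketched as follows. This is a minimal illustration, assuming whitespace tokenization and lowercasing; the study itself used RapidMiner, and the function names here are hypothetical. TF-IDF weighting and the SVM classifier are not shown.

```python
def ngrams(tokens, n):
    # Slide a window of size n over the token list.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def unigram_bigram_features(text):
    # Lowercase, split on whitespace, then emit unigrams followed by bigrams.
    tokens = text.lower().split()
    return ngrams(tokens, 1) + ngrams(tokens, 2)
```

Calling `unigram_bigram_features("Free nutritious meal")` yields the three single words followed by the two adjacent word pairs. Bigrams let a classifier see short phrases whose sentiment differs from that of their individual words.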
Penerapan Algoritma K-Means dalam Pengelompokan Indeks Harga Perdagangan Besar (IHPB) Produk Logam, Mesin, dan Perlengkapannya Tahun 2025
LumbanBatu, Vio Br; Napitupulu, Virzinia; Sipayung, Sardo Pardingotan
Jurnal Ilmu Komputer dan Informatika | E-ISSN : 3063-9026 Vol. 2 No. 3 (2026): Januari - Maret
Publisher : GLOBAL SCIENTS PUBLISHER

Abstract

The Wholesale Price Index (WPI) is an important economic indicator used to measure price changes at the wholesale level. The metal, machinery, and equipment product group plays a strategic role in supporting the industrial and national development sectors. Price fluctuations in this product group need to be analyzed systematically to identify their movement patterns. This study aims to classify the Wholesale Price Index (WPI) of metal, machinery, and equipment products in 2025 using the K-Means clustering algorithm. The data used in this study consist of annual WPI values obtained from the official publications of Statistics Indonesia (BPS). The research stages include data collection, data preprocessing, data normalization using the Min-Max method, determination of the optimal number of clusters, application of the K-Means algorithm, and analysis of clustering results. The number of clusters used is K = 3, representing low, medium, and high price index groups. The results show that the K-Means algorithm is effective in grouping WPI data based on the similarity of price index values. The clustering results provide a clearer overview of price movement patterns and can be used to support economic analysis, price monitoring, and policy decision-making in the industrial sector.
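The Min-Max normalization step named in this abstract rescales each value to the range 0 to 1 before clustering. A minimal sketch, with a hypothetical function name and illustrative index values rather than BPS data:

```python
def min_max_normalize(values):
    # Map the smallest value to 0.0 and the largest to 1.0.
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: avoid division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

For instance, `min_max_normalize([100, 150, 200])` places the middle value halfway between the extremes. Rescaling in this way keeps any one feature with a large numeric range from dominating the distance calculations in K-Means.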
Analisis Pola Pembelian Pada Platform E-Commerce Menggunakan Algoritma Apriori Untuk Mengidentifikasi Tren Produk 2025
Ritonga, Margan Rizkiano; Siringoringo, Maysya Faiftin; Sipayung, Sardo Pardingotan
Jurnal Ilmu Komputer dan Informatika | E-ISSN : 3063-9026 Vol. 2 No. 3 (2026): Januari - Maret
Publisher : GLOBAL SCIENTS PUBLISHER

Abstract

The growth of e-commerce platforms has generated large volumes of transaction data that can be used to analyze consumer behavior and product trends. This study analyzed purchasing patterns using the Apriori algorithm to identify product trends for 2025. The method involved data collection, data cleaning, transformation into itemsets, and extraction of association rules based on support, confidence, and lift values. The results indicated that several product combinations were frequently purchased together and showed strong relationships, reflecting consumer preferences. Products in the electronics, fashion, and household categories showed increasing demand and were likely to become trends in 2025. These findings can support e-commerce managers in developing effective marketing strategies, optimizing inventory management, and improving data-driven product recommendation systems to enhance competitiveness and customer satisfaction.
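The support, confidence, and lift values used to rank association rules can be computed directly from a list of transactions. The sketch below shows only these three metrics, not the full Apriori candidate-generation procedure; the function names and the product names in the usage note are hypothetical.

```python
def support(transactions, itemset):
    # Fraction of transactions that contain every item in the itemset.
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def rule_metrics(transactions, antecedent, consequent):
    # Metrics for the rule antecedent -> consequent.
    # Assumes both sides occur in at least one transaction.
    sup_both = support(transactions, set(antecedent) | set(consequent))
    sup_a = support(transactions, antecedent)
    sup_c = support(transactions, consequent)
    confidence = sup_both / sup_a
    lift = confidence / sup_c
    return sup_both, confidence, lift
```

Given five hypothetical baskets `[{"phone", "case"}, {"phone", "case", "charger"}, {"phone", "charger"}, {"shirt", "shoes"}, {"case"}]`, `rule_metrics(baskets, {"phone"}, {"case"})` gives a support of 2/5 and a confidence of 2/3. A lift above 1 suggests the two items appear together more often than independence would predict.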
Co-Authors Ade Linhar P Alex Rikki Andreas, Kevin Antonius Siagian, Novriadi Baehaqi Barus, Paskalia Br Batubara, Muhammad Iqbal Br Ginting, Anirma Kandida Cristina Situmorang Efendi, Syahril Fernando, Juniko Frans Steven Pakpahan Frans, Paulina Gorat Gaol, Sasmita Lumban Garingging, Cesia Trisani Saragih Ginting, Anirma Ginting, Anirma Kandinda Giovani, Aritonang Girsang, Jahanra Gracia Simatupang Gulo, Jelita Astrid Harianja, Andy Paul Hasugian , Paska Marto Hia, Hikmat Pengertian Hulu, Setiani Hutauruk, Amelia Sanna Maria Lahagu, Marlinus Lahagu, Nicolas Elsada Lase, Marsindra Yanti Limbeng, Yuni br Lubis, Maria Angelina Lumban Gaol, Fortina Lumbanbatu, Noperla Anjelisari LumbanBatu, Vio Br Maha, Yadi Limanta Mahyuddin K. M Nasution Manalu, Ester Manurung, Evaldo Manurung, Saut Maria Angelina Lubis Marmata, Sri Ulina Br Maruwahal Sijabat, Ramson Rikson Matondang, Zekson Aizona Meri Nova Marito Br Sipahutar Naibaho, Marcel Naibaho, Wirma Nainggolan, Kevin Marcho Napitupulu, Virzinia Nunes, Minaldinu Deyesus Panggabean, Jusnan Pasaribu, Adri Purba, Ade Purba, Jhonatan Purba, Marta Rahmawati Rajagukguk, Jonatan Carlos riang, rya Ricardo, Erich Ritonga, Margan Rizkiano Sagala, Lauren Patricia Sagala, Masdiana Saragih, Dea Ananda Sembiiring, Dia Alemisa br Sembiring, Boy Mountavani Sembiring, Brema Aprilta Sembiring, Dessianna Natalia Siagian, Novriadi Antonius Sianturi, Firman Torino SIBURIAN, MANANDA TURE Sihombing, Carlo Poda Boromeo Sihotang, Yuli Pitriani Br Silaban, Daniel Silalahi, Rasit Junaedi Simanjuntak, Richard Parlindungan Simanjuntak, Theresya Simbolon, Agustina Simbolon, Cantriya Simbolon, Daniel S. 
Simbolon, Yoel Sinaga, Elvis Lavenius sinaga, lotar mateus Sinaga, Rafael Grealdi Sirait, Juan Sebastian Siringoringo, Maysya Faiftin Sitanggang, Armando Agasi Sitanggang, Romualda Sitanggang, Roni Gabe Situmorang, Cristina Situmorang, Yudi Yohannes Sorang Pakpahan Surbakti, Efrans Tambunan, Dwito Julian Tambunan, Yosua Tampubolon, Albert Julio Tampubolon, Amsal Tarigan, Jenheri Rejeki TONNI LIMBONG Tulus Pramita Sihaloho Zakarias Situmorang Zebua, Wilfred Raimond Zekson Matondang