PERFORMANCE OF ROBUST SUPPORT VECTOR MACHINE CLASSIFICATION MODEL ON BALANCED, IMBALANCED AND OUTLIERS DATASETS Muhammad Ardiansyah Sembiring; Herman Saputra; Riki Andri Yusda; Sutarman Sutarman; Erna Budhiarti Nababan
JITK (Jurnal Ilmu Pengetahuan dan Teknologi Komputer) Vol. 10 No. 1 (2024): JITK Issue August 2024
Publisher : LPPM Nusa Mandiri

DOI: 10.33480/jitk.v10i1.5272

Abstract

In machine learning, classification models are important for identifying patterns and grouping data. Support Vector Machine (SVM) and Robust SVM are two widely used models. SVM finds an optimal hyperplane to separate data classes, while Robust SVM is designed to deal with uncertainty and noise in the data, making it more resistant to outliers. SVM, however, has limitations in dealing with class imbalance and outliers: class imbalance biases the model toward predicting the majority class, and outliers can distort the decision boundary. This research compares the performance of SVM and Robust SVM on balanced, imbalanced, and outlier-contaminated datasets. Both models were implemented and compared in Python with scikit-learn. Key features include automatic data preprocessing, model training, and evaluation with metrics such as accuracy, precision, recall, and F1 score. The results show that Robust SVM is more accurate on balanced datasets and very effective in dealing with class imbalance, achieving a maximum accuracy of 100%. On datasets with outliers, Robust SVM maintains stable accuracy, demonstrating its robustness. This research contributes to correspondence management by providing more reliable classification models, improving data processing accuracy, and supporting more informed decision making in software development.
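A minimal scikit-learn sketch of the kind of comparison the abstract describes. The dataset, the 90/10 imbalance, and the use of class_weight="balanced" as a stand-in for the paper's Robust SVM are all illustrative assumptions, not the authors' exact setup.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Synthetic imbalanced dataset (90/10 class split) standing in for the paper's data.
X, y = make_classification(n_samples=1000, n_features=20, weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

models = {
    "standard SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    # class_weight='balanced' penalizes minority-class errors more heavily, one
    # common way to make an SVM less biased toward the majority class.
    "weighted SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf", class_weight="balanced")),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(name,
          f"acc={accuracy_score(y_test, pred):.3f}",
          f"prec={precision_score(y_test, pred):.3f}",
          f"rec={recall_score(y_test, pred):.3f}",
          f"f1={f1_score(y_test, pred):.3f}")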
Performance Analysis Of The Combination Of Blum Blum Shub And Rc5 Algorithm In Message Security Rambe, Basyit Mubarroq; Nababan, Erna Budhiarti; Nasution, Mahyuddin KM
JOURNAL OF INFORMATICS AND TELECOMMUNICATION ENGINEERING Vol. 7 No. 2 (2024): Issues January 2024
Publisher : Universitas Medan Area

DOI: 10.31289/jite.v7i2.10937

Abstract

This research aims to enhance message security in the RC5 algorithm by integrating it with the Blum Blum Shub (BBS) algorithm. The rapid growth in data and information exchange, driven by advancements in information and communication technology, demands robust security against attacks such as eavesdropping, interruption, and data modification. Cryptography, with both symmetric and asymmetric keys, is a solution for maintaining message confidentiality. The RC (Rivest Cipher) family, specifically RC5, has become a popular choice in network applications due to its speed and variable key length. This study improves the quality of encryption keys by integrating BBS, a mathematical pseudorandom number generator. RC5 and BBS are used together to secure messages, producing ciphertext that is difficult to predict and smaller in file size than that of standard RC5. The test results show that processing speed is independent of the number of characters in the plaintext, while the encrypted file size produced by the RC5-BBS combination is smaller than with RC5 alone. In conclusion, integrating BBS into RC5 can enhance the security and efficiency of the encryption algorithm, with potential for widespread application in cryptography-based data security.
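An illustrative Python sketch of the BBS generator named above, producing key bytes that could feed an RC5-style key schedule. The primes and seed are toy values chosen for readability; a real deployment would use large secret primes p and q, both congruent to 3 mod 4, and a seed coprime to p*q.

def bbs_bits(seed, p=10007, q=10039, n_bits=128):
    """Yield pseudorandom bits via x_{i+1} = x_i^2 mod M, emitting the low bit."""
    M = p * q                      # p = q = 3 (mod 4) for the full BBS guarantees
    x = seed % M
    for _ in range(n_bits):
        x = (x * x) % M            # square-and-reduce step
        yield x & 1                # least significant bit of the state

# Turn the bit stream into key bytes.
bits = list(bbs_bits(seed=123456789, n_bits=128))
key = bytes(int("".join(map(str, bits[i:i+8])), 2) for i in range(0, 128, 8))
print(key.hex())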
Analysis of Dimensional Reduction Effect on K-Nearest Neighbor Classification Method Taufiqurrahman, Taufiqurrahman; Nababan, Erna Budhiarti; Efendi, Syahril
Sinkron : jurnal dan penelitian teknik informatika Vol. 5 No. 2B (2021): Article Research October 2021
Publisher : Politeknik Ganesha Medan

DOI: 10.33395/sinkron.v6i1.11234

Abstract

Classification algorithms often struggle on high-dimensional data, resulting in a decrease in classification accuracy. Dimensionality reduction is one way to let a classification algorithm work faster and more effectively and to improve its accuracy and performance. When classifying data with the K-Nearest Neighbor (K-NN) algorithm, some features may contribute nothing to the classification, so dimensionality reduction is required. In this study, the dimensionality reduction methods used are Linear Discriminant Analysis (LDA) and Principal Component Analysis (PCA); classification uses K-NN, and performance is analyzed with a confusion matrix. The datasets used are Arrhythmia, ISOLET, and CNAE-9, obtained from the UCI Machine Learning Repository. The results show that classifiers with LDA outperform those with PCA on datasets with more than 100 attributes. On the Arrhythmia dataset, performance improves for K-NN with K=3 and K=5; the best performance is obtained by LDA+K-NN with K=3, which yields an accuracy of 98.53%, while the lowest is K-NN without reduction at K=3. On the ISOLET dataset, the best results are again obtained on LDA-reduced data, with the best performance at K=5 and the lowest for PCA+K-NN at K=3. On the CNAE-9 dataset, the best performance is also achieved by LDA+K-NN, while the lowest is PCA+K-NN at K=3.
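A minimal sketch of the comparison described above: reduce dimensionality with PCA or LDA, then classify with K-NN at K=3 and K=5. The dataset (scikit-learn's digits) and the component counts are stand-ins for the UCI sets and settings used in the paper.

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

reducers = {"none": None,
            "PCA": PCA(n_components=9),
            # LDA is supervised and allows at most (n_classes - 1) components.
            "LDA": LinearDiscriminantAnalysis(n_components=9)}

for k in (3, 5):
    for name, reducer in reducers.items():
        if reducer is None:
            Xtr, Xte = X_train, X_test
        else:
            Xtr = reducer.fit_transform(X_train, y_train)
            Xte = reducer.transform(X_test)
        knn = KNeighborsClassifier(n_neighbors=k).fit(Xtr, y_train)
        print(f"{name}+K-NN (K={k}): accuracy={accuracy_score(y_test, knn.predict(Xte)):.4f}")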
Convolutional Neural Network Activation Function Performance on Image Recognition of The Batak Script Muis, Abdul; Zamzami, Elviawaty Muisa; Nababan, Erna Budhiarti
Sinkron : jurnal dan penelitian teknik informatika Vol. 8 No. 1 (2024): Research Articles, January 2024
Publisher : Politeknik Ganesha Medan

DOI: 10.33395/sinkron.v9i1.13192

Abstract

Deep Learning is a subset of machine learning that is widely used to solve problems in various fields. One popular deep learning architecture is the Convolutional Neural Network (CNN), whose layers perform feature extraction automatically, which is why it is widely used in image recognition. However, CNN performance with the tanh activation function is still relatively low, so selecting the right activation function is necessary to improve accuracy. This study analyzes the effect of the activation function on image recognition of the Batak script. The results show that CNN models using the ReLU and ELU functions achieve higher accuracy than the model using tanh. The CNN model using ELU produces the best accuracy in the training process, 99.71%, with an error value of 0.0108. In testing, the highest accuracy is achieved by the CNN model using ReLU, with an accuracy of 94.11%, an error value of 0.3282, a precision of 0.9411, a recall of 0.9411, and an F1-score of 0.9416.
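A small Keras sketch of the activation-function comparison above. The architecture, 64x64 grayscale input, and 33 output classes are illustrative assumptions (not the paper's exact model); only the swapped activation is the point.

from tensorflow.keras import layers, models

def build_cnn(activation, input_shape=(64, 64, 1), n_classes=33):
    # n_classes=33 is a placeholder for the number of Batak script characters.
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation=activation),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation=activation),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation=activation),
        layers.Dense(n_classes, activation="softmax"),
    ])

for act in ("tanh", "relu", "elu"):
    model = build_cnn(act)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    # model.fit(x_train, y_train, epochs=10, validation_data=(x_val, y_val))
    print(act, "parameters:", model.count_params())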
The role of Louvain-coloring clustering in the detection of fraud transactions Mardiansyah, Heru; Suwilo, Saib; Nababan, Erna Budhiarti; Efendi, Syahril
International Journal of Electrical and Computer Engineering (IJECE) Vol 14, No 1: February 2024
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijece.v14i1.pp608-616

Abstract

Clustering is a data mining technique capable of grouping very large amounts of data to gain new knowledge through unsupervised learning, across many data types and fields. One business process that needs this technique is banking, where fraud is often encountered in transactions; this motivates clustering fraudulent transaction data. The clustering algorithm used is Louvain's algorithm, which can cluster data at large scale by representing it as a graph. The Louvain algorithm is then optimized with graph coloring to ease labeling in the research. In this study, 33,491 non-fraud and 241 fraud transaction records were clustered. The results show that the Louvain-coloring approach clusters the fraudulent transactions with an accuracy of about 90%.
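A minimal NetworkX sketch of the Louvain-plus-coloring idea: cluster a transaction graph with Louvain, then greedily color the community graph so adjacent clusters get distinct labels. The toy graph is an assumption standing in for the banking transaction network.

import networkx as nx
from networkx.algorithms.community import louvain_communities

# Toy graph: nodes are accounts, edges are transactions between them.
G = nx.karate_club_graph()  # stand-in for the real transaction network

communities = louvain_communities(G, seed=42)
print(f"Louvain found {len(communities)} communities")

# Build the graph of communities and color it so neighbors differ.
node_to_comm = {n: i for i, comm in enumerate(communities) for n in comm}
community_graph = nx.Graph()
community_graph.add_nodes_from(range(len(communities)))
for u, v in G.edges():
    cu, cv = node_to_comm[u], node_to_comm[v]
    if cu != cv:
        community_graph.add_edge(cu, cv)
coloring = nx.greedy_color(community_graph)  # community index -> color label
print(coloring)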
Development of the fuzzy grid partition methods in generating fuzzy rules for the classification of data set Marbun, Murni; Sitompul, Opim Salim; Nababan, Erna Budhiarti; Sihombing, Poltak
Bulletin of Electrical Engineering and Informatics Vol 13, No 3: June 2024
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/eei.v13i3.5378

Abstract

The main weakness of complex, sizeable fuzzy rule systems is the difficulty of interpreting the data for classification. Classification interpretation can be affected when rules are reduced and important rules are removed. Experiments using the fuzzy grid partition (FGP) approach on high-dimensional data show that the number of fuzzy rules to be generated still grows exponentially with the number of attributes. The solution proposed here is a hybrid method combining the advantages of the rough set method and the FGP method, called the fuzzy grid partition rough set (FGPRS) method. On the Iris data, the rough set approach reduces the number of attributes and objects so that redundant values are minimized and the fuzzy rules produced by the FGP method are more concise. The number of fuzzy rules is reduced by 50% at K = 2, by 66.7% at K = 3, and by 75% at K = 4. In the classification test, the FGPRS method achieves an accuracy rate of 83.33%, and all data can be classified.
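An illustrative NumPy sketch of fuzzy grid partitioning itself: each feature axis is divided into K triangular membership functions, and each grid cell becomes a candidate fuzzy rule. The ranges and K are assumptions for demonstration, not the paper's settings.

import numpy as np

def triangular_partitions(lo, hi, K):
    """Return K evenly spaced triangular membership functions over [lo, hi]."""
    centers = np.linspace(lo, hi, K)
    width = (hi - lo) / (K - 1)
    def membership(x):
        # Degree of x in each of the K fuzzy sets (vectorized over the sets).
        return np.clip(1 - np.abs(x - centers) / width, 0, 1)
    return centers, membership

centers, mf = triangular_partitions(0.0, 1.0, K=3)
print("centers:", centers)
print("memberships of x=0.4:", mf(0.4))
# With d features and K partitions per axis, the grid yields K**d candidate
# rules, which is the exponential growth the rough-set reduction aims to curb.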
Classification Model with Logistic Regression and Recursive Feature Elimination on Imbalanced Data Sutarman; Arisandi, Dedy; Kurniawan, Edi; Nababan, Erna Budhiarti; Siringoringo, Rimbun
Jurnal Teknologi Informasi dan Ilmu Komputer Vol 11 No 4: August 2024
Publisher : Fakultas Ilmu Komputer, Universitas Brawijaya

DOI: 10.25126/jtiik.1148198

Abstract

Logistic regression is a widely popular classification method used extensively in various studies, and it can yield good results in both classification and prediction problems. Large feature sets, however, impose a computational burden and reduce classification performance. Three datasets were used in this research: Bank Marketing, Glass, and Musk II. They are sourced from the UCI Repository and have different characteristics, posing two challenges: class imbalance and a large number of features. The research has two main stages, preprocessing and classification. Preprocessing applies feature selection through recursive feature elimination (RFE) and data balancing with the SMOTE technique; classification applies logistic regression. The ridge regression technique (L2 regularization) is applied to prevent overfitting during validation of the logistic regression (LR) model. Model performance is evaluated with confusion matrices and ROC curves. The results show that feature selection and class balancing have a positive impact: in the Receiver Operating Characteristic (ROC) analysis, the LR+RFE+SMOTE model achieves an area under the curve of 93%, better than four other classification models, namely Naïve Bayes, Decision Tree, K-NN, and Random Forest.
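A minimal sketch of the LR+RFE+SMOTE pipeline described above, using scikit-learn and imbalanced-learn. The synthetic data stands in for the Bank Marketing / Glass / Musk II sets, and the parameter choices are illustrative assumptions.

from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline  # keeps SMOTE out of the test data
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=2000, n_features=50, n_informative=10,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

pipe = Pipeline([
    ("rfe", RFE(LogisticRegression(max_iter=1000), n_features_to_select=10)),
    ("smote", SMOTE(random_state=0)),               # oversample only during fit
    # penalty='l2' is ridge-style regularization, as in the study.
    ("lr", LogisticRegression(penalty="l2", max_iter=1000)),
])
pipe.fit(X_train, y_train)
auc = roc_auc_score(y_test, pipe.predict_proba(X_test)[:, 1])
print(f"ROC AUC: {auc:.3f}")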
Performance Level Analysis On Learning Vector Quantization And Kohonen Algorithms Pasaribu, Roni Fredy Halomoan; Zarlis, Muhammad; Nababan, Erna Budhiarti
Sinkron : jurnal dan penelitian teknik informatika Vol. 9 No. 1 (2025): Research Article, January 2025
Publisher : Politeknik Ganesha Medan

DOI: 10.33395/sinkron.v9i1.14313

Abstract

Biometric identification is an alternative for security systems and covers physiological and behavioral characteristics. Physiological characteristics are relatively stable physical traits such as fingerprints, palm lines, facial features, tooth patterns, and the retina of the eye; behavioral characteristics include the signature, speech patterns, or typing rhythm. A signature serves as proof in a document that the signing party knows and agrees to its entire contents. The signature pattern recognition system has several stages: the signature image is produced by scanning, the resulting digital image is cropped (scaled) manually, and the subsequent steps are thresholding, edge detection, image division, and input-value representation. The methods used to recognize signature patterns are the learning vector quantization (LVQ) artificial neural network and the Kohonen self-organizing map (SOM). In LVQ, the initial weights are updated using existing patterns, whereas the Kohonen SOM takes random initial weights and updates them until the network can classify itself into the desired number of classes. These neural network methods require a relatively long time, influenced by the large number of data samples used to update the trained weights. The results show that a learning rate in the range 10^-2 < α ≤ 0.2 produces better signature pattern recognition accuracy.
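A minimal NumPy sketch of the LVQ1 update described above: the winning prototype moves toward a training sample of its own class and away from a sample of another class. The data, codebook size, and learning rate here are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))             # stand-in for signature feature vectors
y = rng.integers(0, 2, size=200)           # two signature classes

# One prototype (codebook vector) per class, seeded from the class means.
prototypes = np.stack([X[y == c].mean(axis=0) for c in (0, 1)])
proto_labels = np.array([0, 1])

alpha = 0.1                                # learning rate within 10^-2 < alpha <= 0.2
for epoch in range(20):
    for xi, yi in zip(X, y):
        winner = np.argmin(np.linalg.norm(prototypes - xi, axis=1))
        sign = 1.0 if proto_labels[winner] == yi else -1.0
        prototypes[winner] += sign * alpha * (xi - prototypes[winner])
    alpha *= 0.95                          # decay the learning rate each epoch

pred = proto_labels[np.argmin(np.linalg.norm(X[:, None] - prototypes, axis=2), axis=1)]
print("training accuracy:", (pred == y).mean())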
Detection of the Use of Mask to Prevent the Spread of COVID-19 Using SVM, Haar Cascade Classifier, and Robot Arm Pratiwi, Andini; Nababan, Erna Budhiarti; Amalia
Data Science: Journal of Computing and Applied Informatics (JoCAI) Vol. 6 No. 2 (2022)
Publisher : Talenta Publisher

DOI: 10.32734/jocai.v6.i2-9289

Abstract

In the effort to slow the spread of COVID-19 by enforcing health protocols such as mask wearing, supervision is needed, especially of people who are not yet wearing masks or have difficulty doing so. In this research, the system uses a robotic arm to identify whether visitors are wearing masks and automatically distributes a mask when a user is detected without one. Face detection uses the Haar Cascade Classifier algorithm, and an SVM (Support Vector Machine) classifies users as wearing a mask or not. For a user detected without a mask, a myCobot-Pi robotic arm with a suction pump distributes a mask. Using the myCobot-Pi, a Raspberry Pi-based robotic arm, allows the system to run on devices with minimal specifications and size. In trials on 41 detection cases, the system correctly detected mask use in 29 cases. In addition, this study replaces the mask's packaging with a PP plastic sheet protector because the suction pump can carry it properly.
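A minimal OpenCV sketch of the detection stage described above: a Haar cascade finds faces, and a pre-trained SVM (training not shown) classifies each face crop as masked or not. The resize-and-flatten feature step and the svm_model object are assumptions, since the paper's exact features are not specified here.

import cv2
import numpy as np

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def classify_faces(frame, svm_model):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    results = []
    for (x, y, w, h) in faces:
        # Resize each face crop to a fixed size and flatten it as a crude
        # feature vector for the SVM (e.g. a scikit-learn SVC).
        crop = cv2.resize(gray[y:y+h, x:x+w], (64, 64)).flatten().astype(np.float32)
        label = svm_model.predict(crop.reshape(1, -1))[0]   # 1 = mask, 0 = no mask
        results.append(((x, y, w, h), label))
    return results  # a "no mask" result would trigger the myCobot-Pi dispenser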
The Dimensional Reduction in Microarray Data Classification Using Minimum Redundancy Maximum Relevance and Random Forest Harahap, Lailan; Nababan, Erna Budhiarti; Efendi, Syahril
The Indonesian Journal of Computer Science Vol. 12 No. 1 (2023)
Publisher : AI Society & STMIK Indonesia

DOI: 10.33022/ijcs.v12i1.3133

Abstract

In Indonesia, the 2018 Riskesdas data show a cancer prevalence of 1.79 per 1,000 population. Because of this high prevalence, early cancer detection is needed. One way to detect cancer is microarray technology, which can monitor thousands of gene expressions simultaneously in a single experiment. However, microarray data are high-dimensional, so dimensionality reduction of the microarray data for prostate cancer and gastric cancer is needed to remove redundant attributes and improve classification accuracy. Reduction was performed using MRMR (FCQ and FCD) with k = 10, 20, 30, 40, 50, 60, 70, 80, 90, and 100, and classification using a Random Forest (RF) of 100 trees. The best accuracy on the prostate cancer data is 100% with FCQ at k=10, versus 95% without reduction, and the lowest is 52% with FCD at k=90. On the gastric cancer data, the best accuracy is 100% with both FCQ and FCD at all k, and the lowest is 83% without reduction.
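A minimal sketch of the MRMR-then-Random-Forest pipeline described above, using a greedy mutual-information MRMR in the FCQ (quotient) form: relevance divided by mean redundancy. The synthetic data and parameters are assumptions standing in for the prostate/gastric microarray sets.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression
from sklearn.model_selection import cross_val_score

# Synthetic high-dimensional data standing in for the microarray sets.
X, y = make_classification(n_samples=100, n_features=100, n_informative=10,
                           random_state=0)

def mrmr_fcq(X, y, k=10):
    """Greedy MRMR, FCQ form: pick the feature maximizing relevance/redundancy."""
    relevance = mutual_info_classif(X, y, random_state=0)
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            # Redundancy: mean MI between candidate j and features already chosen.
            redundancy = mutual_info_regression(
                X[:, selected], X[:, j], random_state=0).mean()
            score = relevance[j] / (redundancy + 1e-9)
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected

features = mrmr_fcq(X, y, k=10)
rf = RandomForestClassifier(n_estimators=100, random_state=0)  # 100 trees, as in the study
print("5-fold CV accuracy:", cross_val_score(rf, X[:, features], y, cv=5).mean())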