Articles

Found 39 Documents

Comparative analysis of ReliefF-SVM and CFS-SVM for microarray data classification
Mochamad Agusta Naofal Hakim; Adiwijaya Adiwijaya; Widi Astuti
International Journal of Electrical and Computer Engineering (IJECE) Vol 11, No 4: August 2021
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijece.v11i4.pp3393-3402

Abstract

Cancer is one of the main causes of death worldwide; the World Health Organization (WHO) listed it among the top causes of death in 2018. Detecting cancer symptoms is therefore paramount for treating the disease and reducing the resulting deaths. Many studies have developed data mining approaches that detect symptoms of cancer by classifying human gene expression data. One popular approach uses DNA microarray data. However, DNA microarray data has a very large number of dimensions, which can harm classification accuracy. Therefore, before classification, a feature selection technique must be applied to eliminate features that carry no information useful to the classifier. The feature selection techniques used in this study were ReliefF and correlation-based feature selection (CFS), and the classification technique was the support vector machine (SVM). Several testing schemes were applied to compare the performance of ReliefF and CFS with SVM. The results showed that ReliefF outperformed CFS as a microarray data classification approach.
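
The idea of scoring features before classification can be illustrated with basic Relief, the simpler predecessor of ReliefF. This is only a sketch under stated assumptions, not the procedure used in the paper: it uses a single nearest hit and nearest miss per sample (ReliefF averages over k neighbours and handles multiple classes), it assumes features are already scaled to [0, 1], and the function name and toy data are hypothetical.

```python
def relief_weights(samples, labels):
    """Basic Relief: reward features that differ on the nearest miss
    and penalise features that differ on the nearest hit.
    Assumes every class has at least two samples and features lie in [0, 1]."""
    n_features = len(samples[0])
    weights = [0.0] * n_features

    def distance(a, b):
        # Manhattan distance between two samples.
        return sum(abs(x - y) for x, y in zip(a, b))

    for i, (x, label) in enumerate(zip(samples, labels)):
        hits = [s for j, (s, l) in enumerate(zip(samples, labels))
                if j != i and l == label]
        misses = [s for s, l in zip(samples, labels) if l != label]
        hit = min(hits, key=lambda s: distance(x, s))
        miss = min(misses, key=lambda s: distance(x, s))
        for f in range(n_features):
            weights[f] += (abs(x[f] - miss[f]) - abs(x[f] - hit[f])) / len(samples)
    return weights


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
toy_samples = [[0.0, 0.2], [0.1, 0.9], [1.0, 0.3], [0.9, 0.8]]
toy_labels = [0, 0, 1, 1]
print(relief_weights(toy_samples, toy_labels))
```

Features with the highest weights would be kept and passed to the classifier; how many to keep is a tuning choice that the abstract does not specify.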
Building Synonym Sets for English WordNet with Robust Clustering using Links Method
Sarah Suryaningsih; Moch Arif Bijaksana; Widi Astuti
Jurnal Pendidikan Informatika (EDUMATIC) Vol 4, No 1 (2020): Edumatic: Jurnal Pendidikan Informatika
Publisher : Universitas Hamzanwadi

DOI: 10.29408/edumatic.v4i1.2063

Abstract

In English WordNet, the synonym set is an important component that represents the similarity of meaning between words. The synonym sets are built from the Oxford Thesaurus, accessed through lexico.com, which serves as part of the lexical database. The extraction process applied to the Oxford Thesaurus produces synonym sets of words that share the same meaning. WordNet differs from an ordinary dictionary in that each word is interconnected with other words. One method for this approach is Robust Clustering Using Links (ROCK), which uses similarity values together with the extracted synonym sets to build a lexical database. The main purpose of developing the English WordNet is therefore to produce accurate synonym sets using clustering techniques. Evaluation uses the F-measure, calculated against a gold standard. With the ROCK method, the accuracy of the output produced from the input dataset increases. A more accurate English wordnet can also serve as a role model that supports research and development of wordnets for other languages. Using the ROCK method also improves the accuracy of the resulting English wordnet compared with the previous method, hierarchical clustering. Given this improvement, the study concludes that ROCK is a good method for developing the English wordnet.
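
The core quantity in ROCK is the link count between two items, that is, the number of neighbours they share. The sketch below shows that step only, under assumptions that are not taken from the paper: Jaccard similarity over synonym lists, each word counted as its own neighbour, a threshold named theta, and a made-up toy thesaurus.

```python
from itertools import combinations

# Hypothetical toy thesaurus: each word maps to its listed synonyms.
toy_thesaurus = {
    "big": {"large", "huge", "great"},
    "large": {"big", "huge", "great"},
    "huge": {"big", "large", "enormous"},
    "tiny": {"small", "little"},
    "small": {"tiny", "little"},
}


def jaccard(a, b):
    """Jaccard similarity of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def neighbors(data, theta):
    """Map each word to the words whose similarity is at least theta.
    Assumption: every word is its own neighbour."""
    result = {w: {w} for w in data}
    for u, v in combinations(data, 2):
        # Add the word itself to its profile so direct synonyms overlap.
        if jaccard(data[u] | {u}, data[v] | {v}) >= theta:
            result[u].add(v)
            result[v].add(u)
    return result


def links(nbrs):
    """link(u, v) = number of neighbours shared by u and v."""
    return {(u, v): len(nbrs[u] & nbrs[v]) for u, v in combinations(nbrs, 2)}


link_counts = links(neighbors(toy_thesaurus, theta=0.5))
print(link_counts[("big", "large")])   # 3
print(link_counts[("big", "tiny")])    # 0
```

A full ROCK run would then repeatedly merge the pair of clusters with the best link-based goodness score; that merging loop is omitted here.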
Comparative Analysis of Support Vector Machine-Recursive Feature Elimination and Chi-Square on Microarray Classification for Cancer Detection with Naïve Bayes
Talitha Kayla Amory; Adiwijaya Adiwijaya; Widi Astuti
Journal of Data Science and Its Applications Vol 3 No 2 (2020): Journal of Data Science and Its Applications
Publisher : Telkom University

DOI: 10.34818/jdsa.2020.3.62

Abstract

Cancer is a deadly disease known worldwide. According to the World Health Organization (WHO), cancer is the second leading cause of death globally and was responsible for an estimated 9.6 million deaths in 2018. One well-known technique for cancer detection is the DNA microarray. DNA microarray technology lets researchers analyze thousands of gene expression profiles at the same time to determine whether a person has cancer. However, one problem with DNA microarray data is its large number of features, which makes feature selection necessary. To overcome this problem, this study uses Support Vector Machine-Recursive Feature Elimination (SVM-RFE) and Chi-Square for feature selection and Naïve Bayes for classification. Accuracy with feature selection is compared against accuracy without it. The accuracy of the two feature selection methods is also compared to find which one works better in combination with Naïve Bayes. To give an overall picture of performance, the study also considers precision, recall, and F1-score. The best accuracy results obtained were 100% on lung cancer data with both SVM-RFE and Chi-Square, 99.6% on ovarian cancer with SVM-RFE, 93.7% on breast cancer with SVM-RFE, and 90% on colon cancer with SVM-RFE.
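
Chi-Square feature selection scores each feature by how strongly it depends on the class label. A minimal sketch follows, assuming each gene's expression has already been discretised into a binary high/low value and the class is binary; the function name and the toy vectors are hypothetical and not taken from the paper.

```python
def chi_square(feature, labels):
    """Chi-square statistic between a binary feature and binary labels,
    computed from the 2x2 table of observed versus expected counts."""
    n = len(labels)
    score = 0.0
    for f in (0, 1):
        for c in (0, 1):
            observed = sum(1 for x, y in zip(feature, labels) if x == f and y == c)
            expected = feature.count(f) * labels.count(c) / n
            if expected > 0:
                score += (observed - expected) ** 2 / expected
    return score


toy_labels = [1, 1, 0, 0]                    # hypothetical class labels
print(chi_square([1, 1, 0, 0], toy_labels))  # 4.0, feature follows the class
print(chi_square([1, 0, 1, 0], toy_labels))  # 0.0, feature is independent
```

Features would then be ranked by score and the top-ranked ones passed to the classifier.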
Cancer Detection based on Microarray Data Classification Using FLNN and Hybrid Feature Selection
Ghozy Ghulamul Afif; Adiwijaya; Widi Astuti
Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) Vol 5 No 4 (2021): Agustus 2021
Publisher : Ikatan Ahli Informatika Indonesia (IAII)

DOI: 10.29207/resti.v5i4.3352

Abstract

Cancer is the second deadliest disease in the world after heart disease. According to the WHO's report on cancer, in 2018 there were around 18.1 million cancer cases worldwide and a total of 9.6 million deaths. Given these figures and the growth of bioinformatics technology, early detection is needed: bioinformatics can be used to diagnose cancer so that the patient is treated immediately, which helps reduce the number of cancer deaths. DNA microarray data, one such bioinformatics technology, is becoming popular for analyzing and diagnosing cancer in the medical world. DNA microarray data contains a very large number of genes, so a dimensionality reduction method is needed that reduces the features used in classification by selecting the most influential ones. The selected features are then used to classify and predict whether a person has cancer. In this research, a hybrid approach combines Information Gain as a filter method and a Genetic Algorithm as a wrapper method to reduce dimensions, with FLNN as the classification method. In testing, the highest accuracy was 90.26% on colon cancer data, 85.63% on breast cancer, 100% on lung cancer and ovarian cancer, and 94.10% on prostate cancer.
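
The filter stage of such a hybrid scheme ranks features by Information Gain, the reduction in class entropy once a feature's value is known. The sketch below covers that stage only and assumes discretised feature values; the Genetic Algorithm wrapper and the FLNN classifier are not shown, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """H(labels) minus the weighted entropy of labels within each feature value."""
    n = len(labels)
    remainder = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


toy_labels = [1, 1, 0, 0]                          # hypothetical class labels
print(information_gain([1, 1, 0, 0], toy_labels))  # 1.0, fully informative
print(information_gain([1, 0, 1, 0], toy_labels))  # 0.0, uninformative
```

In a filter-then-wrapper design, only the features that survive this ranking would be handed to the more expensive wrapper search.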
Development Synonym Set for the English Wordnet Using the Method of Comutative and Agglomerative Clustering
Munirsyah Munirsyah; Moch. Arif Bijaksana; Widi Astuti
Jurnal Sisfokom (Sistem Informasi dan Komputer) Vol 9, No 2 (2020): JULI
Publisher : ISB Atma Luhur

DOI: 10.32736/sisfokom.v9i2.855

Abstract

Wordnet is a collection of words that interpret or present a meaning, and an important part of its development is the synonym set, or synset. Building synonym sets requires synonyms and relies on the commutative nature of words. An English thesaurus serves as the reference data from which the synonyms are taken. Broadly speaking, Wordnet differs from a dictionary in that the meaning of a word is related to other words, and determining that equivalence requires a commutative process. The commutative method makes this process easier and produces candidate synonym sets. The candidates cannot be used directly; a word-grouping process must be carried out to produce the final synonym sets. One way to group words is clustering, and this study uses Agglomerative Clustering. Agglomerative clustering uses a threshold value to determine the number of repetitions, or as the condition for stopping the iteration. This study tests threshold values from 0.1 to 1 to find the one that produces the best synonym sets, and calculates the resulting accuracy. Accuracy calculation and evaluation use the F-measure to find the best results.
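
One possible reading of the commutative step is a symmetry check: a pair of words becomes a synonym candidate only when each word lists the other in the thesaurus. The sketch below illustrates that reading and nothing more; it is an assumption about the method, and the function name and toy thesaurus are hypothetical.

```python
def commutative_pairs(thesaurus):
    """Return unordered word pairs in which each word lists the other as a synonym."""
    pairs = set()
    for word, synonyms in thesaurus.items():
        for other in synonyms:
            if other != word and word in thesaurus.get(other, set()):
                pairs.add(frozenset((word, other)))
    return pairs


# Hypothetical toy thesaurus: "content" does not list "happy" back.
toy_thesaurus = {
    "happy": {"glad", "content"},
    "glad": {"happy"},
    "content": {"satisfied"},
}
print(commutative_pairs(toy_thesaurus))
```

Only the pair of "happy" and "glad" survives here; such candidate pairs would then be grouped by the agglomerative clustering step described in the abstract.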
Implementation of Modified Backpropagation with Conjugate Gradient as Microarray Data Classifier with Binary Particle Swarm Optimization as Feature Selection for Cancer Detection
Muhammad Naufal Mukhbit Amrullah; Adiwijaya Adiwijaya; Widi Astuti
Jurnal Sisfokom (Sistem Informasi dan Komputer) Vol 9, No 3 (2020): NOVEMBER
Publisher : ISB Atma Luhur

DOI: 10.32736/sisfokom.v9i3.978

Abstract

Cancer is one of the deadliest diseases in the world and needs to be handled as early as possible. One way to detect the presence of cancer cells early is to use microarray data. Microarray data stores human gene expression, which can be used to classify cancer cells. One challenge of using microarray data, however, is that its vast number of features is out of proportion to its small number of samples. To resolve that problem, dimensionality reduction is needed to reduce the number of features in microarray data. Binary Particle Swarm Optimization (BPSO) is one method for reducing the dimensionality of microarray data that can increase classification performance. However, when combined with Backpropagation, BPSO still shows relatively low performance. In this research, Modified Backpropagation with Conjugate Gradient is used to classify data that has been reduced with BPSO. The average accuracy of BPSO+CGBP is 86.1%, an improvement over BPSO+BP, which averaged 80.8%.
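
In binary PSO, each particle is a bit mask over the features: a 1 keeps the feature and a 0 drops it. The sketch below shows one update step for a single particle in the commonly used sigmoid form. The inertia weight w, the coefficients c1 and c2, and the function name are illustrative assumptions and not values from the paper.

```python
import math
import random


def bpso_step(position, velocity, personal_best, global_best,
              rng, w=0.7, c1=1.5, c2=1.5):
    """One binary PSO update for one particle.
    Velocity moves toward the personal and global bests; each bit is then
    resampled as 1 with probability sigmoid(velocity)."""
    new_position, new_velocity = [], []
    for x, v, p, g in zip(position, velocity, personal_best, global_best):
        v = w * v + c1 * rng.random() * (p - x) + c2 * rng.random() * (g - x)
        prob_one = 1.0 / (1.0 + math.exp(-v))
        new_velocity.append(v)
        new_position.append(1 if rng.random() < prob_one else 0)
    return new_position, new_velocity


rng = random.Random(0)  # seeded so that repeated runs match
mask, vel = bpso_step([0, 1, 0, 1], [0.0] * 4, [1, 1, 0, 0], [1, 0, 0, 1], rng)
print(mask)  # a new 0/1 feature mask
```

In a complete search, each mask would be scored by training the classifier on the selected features, and the personal and global bests would be updated from those scores.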
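Klasifikasi Multi Label pada Hadis Bukhari Terjemahan Bahasa Indonesia Menggunakan Mutual Information dan k-Nearest Neighbor [Multi-Label Classification of Indonesian-Translated Bukhari Hadith Using Mutual Information and k-Nearest Neighbor]
Afrian Hanafi; Adiwijaya Adiwijaya; Widi Astuti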
Jurnal Sisfokom (Sistem Informasi dan Komputer) Vol 9, No 3 (2020): NOVEMBER
Publisher : ISB Atma Luhur

DOI: 10.32736/sisfokom.v9i3.980

Abstract

Hadith is the second source of law for Muslims after the Qur'an. It comprises the words, actions, and stipulations of the Prophet Muhammad, also referred to as his sunnah. To make it easier for Muslims to apply the teachings of the hadiths, a classification system is needed that can assign a hadith to one class or to a combination of two of the three classes, which is called multi-label classification. Various techniques are available for building a text classification system, one of which is k-Nearest Neighbor (KNN). KNN is a simple and effective method for text classification, but it is weak at processing data with high vector dimensions: computation time rises and classification efficiency becomes very low. Mutual Information (MI) is used as a feature selection method to reduce vector dimensions because it can show how strongly a feature contributes to a correct class prediction. This study uses the Problem Transformation Method with the Binary Relevance (BR) approach to carry out multi-label classification. The best result obtained was a Hamming loss of 0.0886, meaning about 91.14% of label assignments were correct, with a computation time of 595 seconds, using MI for feature selection but without stemming.
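
Hamming loss is the fraction of individual label assignments that are wrong, so the reported 0.0886 corresponds to 1 - 0.0886 = 0.9114, or 91.14% correct assignments. A minimal sketch of the metric follows, assuming each document's labels are encoded as a 0/1 vector over the three classes; the function name and toy vectors are hypothetical.

```python
def hamming_loss(true_labels, predicted_labels):
    """Fraction of (document, label) slots where the prediction differs from the truth."""
    total = 0
    wrong = 0
    for truth, pred in zip(true_labels, predicted_labels):
        for t, p in zip(truth, pred):
            total += 1
            wrong += t != p
    return wrong / total


# Hypothetical example: two documents, three labels each, one wrong slot.
toy_true = [[1, 0, 0], [0, 1, 1]]
toy_pred = [[1, 0, 1], [0, 1, 1]]
print(hamming_loss(toy_true, toy_pred))

# Relation between the reported loss and the share of correct assignments.
print(round((1 - 0.0886) * 100, 2))  # 91.14
```

Under Binary Relevance, each of the three label columns would be predicted by its own binary classifier before the loss is computed over all slots.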
Building Synsets for Indonesian WordNet using ROCK (Robust Clustering Using Links) Algorithm
Mubaroq Iqbal; Moch. Arif Bijaksana; Widi Astuti
Jurnal Sisfokom (Sistem Informasi dan Komputer) Vol 9, No 2 (2020): JULI
Publisher : ISB Atma Luhur

DOI: 10.32736/sisfokom.v9i2.853

Abstract

In the development of Indonesian WordNet, the synonym set is an important part that represents the similarity of meaning between words. Synonym sets are built using the Indonesian Thesaurus as the lexical database. The extraction process applied to the Indonesian Thesaurus yields synonym sets of words that share a similar word sense. In general, WordNet and a dictionary differ in their main focus: a dictionary usually focuses on a single word, while WordNet focuses on the meaning of words and their connections to other words. As explained in previous research, synonym sets have been constructed using several approaches, namely clustering and WSD (Word Sense Disambiguation). In this article, the approach used to produce synonym sets is the ROCK (Robust Clustering Using Links) algorithm, which uses similarity and link values. The resulting synonym sets are then used for lexical database development. The main focus of this article is therefore to produce synonym sets through clustering and to calculate their accuracy with the F-Measure, using a gold standard for performance calculation and evaluation.
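
Evaluating generated synsets against a gold standard with the F-measure can be sketched as follows. The matching rule is an assumption: a generated synset counts as correct only if it is identical to a gold synset, and the paper may use a different rule. The function name and example synsets are hypothetical.

```python
def f_measure(predicted, gold):
    """F-measure of predicted synsets against gold synsets (exact-match assumption)."""
    predicted = {frozenset(s) for s in predicted}
    gold = {frozenset(s) for s in gold}
    correct = len(predicted & gold)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)  # share of generated synsets that are right
    recall = correct / len(gold)          # share of gold synsets that were found
    return 2 * precision * recall / (precision + recall)


# Hypothetical synsets: two of the three generated sets match the gold standard.
toy_predicted = [{"big", "large"}, {"tiny", "small"}, {"cat", "dog"}]
toy_gold = [{"big", "large"}, {"tiny", "small"}]
print(f_measure(toy_predicted, toy_gold))
```

Here precision is 2/3 and recall is 1, so the score is their harmonic mean.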
Deteksi Konten Gereflekter pada Cerita Anak Menggunakan Naïve Bayes Classifier [Detection of Gereflekter Content in Children's Stories Using the Naïve Bayes Classifier]
Mayya Tania Wewengkang; Dana Sulistiyo Kusumo; Widi Astuti
JURNAL MEDIA INFORMATIKA BUDIDARMA Vol 4, No 2 (2020): April 2020
Publisher : STMIK Budi Darma

DOI: 10.30865/mib.v4i2.2015

Abstract

Textbooks and storybooks are used as sources of knowledge. When children read a book, they try to interpret each word and sentence in it. This becomes a problem if the book contains vulgar words and indecent sentences. Such content is not allowed for children at the elementary school level. In this research, we call such content gereflekter content. To address the problem, this research builds a system that detects gereflekter content in the text of children's stories, which serve as the data set. The system is built with the Naïve Bayes Classifier (NBC) and evaluated in two scenarios using accuracy, precision, and recall, because the data set is imbalanced: the negative class contains more data than the positive class. In the evaluation, the test scenarios produced a high average precision of 99.01%, whereas recall averaged above 50%. From these two values, it can be concluded that the model does not yet detect the class reliably, but is highly trustworthy when it does.
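
The combination of very high precision and modest recall arises when a classifier flags few items but is almost always right about the ones it flags. The sketch below reproduces that pattern on made-up labels; the data and the function name are hypothetical and unrelated to the paper's data set.

```python
def precision_recall(true_labels, predicted_labels, positive=1):
    """Precision and recall for the positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical imbalanced set: 4 positives, 6 negatives.
# The model flags only 2 items, and both are truly positive.
toy_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
print(precision_recall(toy_true, toy_pred))  # (1.0, 0.5)
```

Every flagged item is correct, so precision is perfect, yet half of the positive items go undetected, so recall is only 0.5.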