Articles
Pembobotan Kata berdasarkan Kluster untuk Peringkasan Otomatis Multi Dokumen (Cluster-Based Word Weighting for Automatic Multi-Document Summarization)
Lukman Hakim;
Fadli Husein Wattiheluw;
Agus Zainal Arifin;
Aminul Wahib
Jurnal Linguistik Komputasional Vol. 1, No. 2 (2018)
Publisher : Indonesia Association of Computational Linguistics (INACL)
DOI: 10.26418/jlk.v1i2.7
Multi-document summarization is a technique for extracting information: a set of sentences that relevantly describes the contents of an entire document collection. Several algorithms with various criteria have been proposed; in general, they consist of preprocessing, clustering, and representative-sentence selection stages to produce highly relevant summaries. The clustering stage is one of the most important for summarization, yet existing research cannot determine the number of clusters to be formed. Therefore, we propose a hierarchical clustering technique. It measures the similarity between sentences using cosine similarity and clusters sentences based on their similarity values. The two clusters with the highest similarity are merged into one, and this merging process continues until a single cluster remains. Experimental results on the Document Understanding Conference (DUC) 2004 dataset, using two scenarios with 132, 135, 137, and 140 clusters, show fluctuating values: a smaller number of clusters does not guarantee an increase in the ROUGE-1 score. With the same number of clusters, the proposed method has a lower ROUGE-1 value than the previous method, because at 140 clusters the similarity values within each cluster decreased.
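The merging loop described in this abstract can be sketched as plain agglomerative clustering over cosine similarities. This is an illustrative sketch, not the authors' implementation: sentences are represented as term-frequency dicts, and cluster-to-cluster similarity is taken as the maximum pairwise sentence similarity (single linkage), which is an assumption.

```python
import math

def cosine(a, b):
    """Cosine similarity between two term-frequency dicts."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def agglomerate(sentences, n_clusters):
    """Repeatedly merge the two most similar clusters until
    n_clusters remain (or, with n_clusters=1, a single cluster)."""
    clusters = [[s] for s in sentences]
    while len(clusters) > n_clusters:
        best, pair = -1.0, (0, 1)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = max(cosine(x, y) for x in clusters[i] for y in clusters[j])
                if sim > best:
                    best, pair = sim, (i, j)
        i, j = pair
        clusters[i].extend(clusters.pop(j))
    return clusters
```

Stopping at a fixed cluster count here stands in for cutting the merge hierarchy at the evaluated sizes (132-140) mentioned in the abstract.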
Metode Pembobotan Berbasis Topik dan Kelas untuk Berita Online Berbahasa Indonesia (Topic- and Class-Based Weighting Method for Indonesian Online News)
Maryamah Maryamah;
Made Agus Putra Subali;
Lailly Qolby;
Agus Zainal Arifin;
Ali Fauzi
Jurnal Linguistik Komputasional Vol. 1, No. 1 (2018)
Publisher : Indonesia Association of Computational Linguistics (INACL)
DOI: 10.26418/jlk.v1i1.4
Clustering news documents manually depends on human ability and accuracy and can therefore introduce errors into the grouping process. It is thus necessary to group news documents automatically, which requires a term-weighting method such as TF.IDF.ICF. In this paper we propose a new weighting algorithm, TF.IDF.ICF.ITF, to cluster documents automatically from statistical data patterns, so that the errors of manual grouping can be reduced and the process made more efficient. K-Means++ is a development of the K-Means algorithm that improves the initial cluster initialization stage; it is easy to implement and yields more stable results. K-Means++ groups documents using the weights from the Inverse Class Frequency (ICF) stage. ICF extends class-based weighting to term weighting in documents: terms that appear in many classes receive a small weight because they are less informative. Testing with certain queries over several numbers of best features shows that the TF.IDF.ICF.ITF method gives less than optimal results.
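A rough sketch of class-based weighting of this kind follows. It is not the paper's exact TF.IDF.ICF.ITF formula: the ITF factor is omitted, and the +1 smoothing in ICF (so terms present in every class keep a nonzero weight) is an assumption of this sketch.

```python
import math
from collections import Counter

def tf_idf_icf(docs, labels):
    """Class-based term weighting: TF * IDF * ICF per (document, term).
    ICF = log(C / cf) + 1 penalizes terms occurring in many classes.
    docs: list of token lists; labels: one class label per document."""
    n_docs, classes = len(docs), set(labels)
    df = Counter()                       # number of documents containing t
    for tokens in docs:
        for t in set(tokens):
            df[t] += 1
    cf = Counter()                       # number of classes containing t
    for c in classes:
        for t in {t for d, l in zip(docs, labels) if l == c for t in d}:
            cf[t] += 1
    weights = {}
    for i, tokens in enumerate(docs):
        for t, tf in Counter(tokens).items():
            idf = math.log(n_docs / df[t])
            icf = math.log(len(classes) / cf[t]) + 1
            weights[(i, t)] = tf * idf * icf
    return weights
```

Feeding these weight vectors to K-Means++ (for example scikit-learn's `KMeans(init="k-means++")`) would complete the pipeline the abstract describes.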
PREFERENCE BASED TERM WEIGHTING FOR ARABIC FIQH DOCUMENT RANKING
Khadijah Fahmi Hayati Holle;
Agus Zainal Arifin;
Diana Purwitasari
Jurnal Ilmu Komputer dan Informasi (Journal of Computer Science and Information) Vol. 8, No. 1 (2015)
Publisher : Faculty of Computer Science - Universitas Indonesia
DOI: 10.21609/jiki.v8i1.283
In document retrieval, besides the suitability of the query to the search results, there is also a subjective user assessment that can be a deciding factor in document ranking. This preference aspect appears in fiqh document searching: people tend to prefer a certain fiqh methodology without rejecting the others. It is therefore necessary to investigate a preference factor in addition to the relevance factor in document ranking. This research proposes a preference-based term-weighting method to rank documents according to user preference. The proposed method, Inverse Preference Frequency with an α value (IPFα), is combined with term weighting based on the document index and book index, so it captures both the relevance and the preference aspect. We calculate a preference value with IPF term weighting, then multiply by α the preference values of terms that match the query. Combined with the existing weighting methods, IPFα becomes TF.IDF.IBF.IPFα. Experiments on a dataset of Arabic fiqh documents, evaluated with recall, precision, and f-measure, show that the proposed term weighting ranks documents in the right order according to user preference, with recall reaching 75%, precision 100%, and f-measure 85.7%.
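The core idea, an inverse-frequency preference weight boosted by α for terms that match the user's query, might be sketched as follows. The log(N/pf) form and the default α = 2 are illustrative assumptions of this sketch, not the paper's definitions.

```python
import math

def ipf_alpha(term, docs, preferred, alpha=2.0):
    """Preference-weight sketch: an inverse preference frequency
    IPF = log(N / pf), multiplied by alpha when the term is in the
    user's preferred (query) vocabulary.
    docs: list of term sets; pf: number of documents containing term."""
    pf = sum(1 for d in docs if term in d)
    ipf = math.log(len(docs) / pf) if pf else 0.0
    return ipf * alpha if term in preferred else ipf
```

In the full method this factor would multiply the existing TF.IDF.IBF weight to give TF.IDF.IBF.IPFα.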
INFORMATION RETRIEVAL OF TEXT DOCUMENT WITH WEIGHTING TF-IDF AND LCS
Munjiah Nur Saadah;
Rigga Widar Atmagi;
Dyah S. Rahayu;
Agus Zainal Arifin
Jurnal Ilmu Komputer dan Informasi (Journal of Computer Science and Information) Vol. 6, No. 1 (2013)
Publisher : Faculty of Computer Science - Universitas Indonesia
DOI: 10.21609/jiki.v6i1.216
Information retrieval of text documents requires a method that can return documents with high relevance to the user's request. One important step in the process is the weighting stage of text representation. Using LCS to adjust Tf-Idf weighting considers the appearance of the same word order in the query and in the document text. However, a very long but irrelevant document can produce a weight that fails to represent the document's true relevance. This research proposes an LCS weight for word order that accounts for a document's length relative to the average document length in the corpus. The method returns text documents effectively: adding a word-order feature normalized by the ratio of the document's length to the documents in the corpus yields precision and recall values on par with the method of Tasi et al.
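The word-order feature is a classic longest-common-subsequence computation over token lists. The length normalization shown in `lcs_score` is an illustrative guess at the kind of corpus-average ratio the abstract describes, not the paper's exact formula.

```python
def lcs_len(query, doc):
    """Longest common subsequence length of two token lists, i.e. the
    longest word order shared by query and document."""
    m, n = len(query), len(doc)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = (dp[i - 1][j - 1] + 1 if query[i - 1] == doc[j - 1]
                        else max(dp[i - 1][j], dp[i][j - 1]))
    return dp[m][n]

def lcs_score(query, doc, avg_doc_len):
    """Order feature normalized by document length relative to the corpus
    average, so a very long document is not over-rewarded."""
    if not doc or not query:
        return 0.0
    return lcs_len(query, doc) / (len(query) * len(doc) / avg_doc_len)
```

This score would then be combined with the ordinary Tf-Idf weight of the query terms.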
CORTICAL BONE SEGMENTATION USING WATERSHED AND REGION MERGING BASED ON STATISTICAL FEATURES
Mamluatul Hani`ah;
Christian Sri Kusuma Aditya;
Aryo Harto;
Agus Zainal Arifin
Jurnal Ilmu Komputer dan Informasi (Journal of Computer Science and Information) Vol. 8, No. 2 (2015)
Publisher : Faculty of Computer Science - Universitas Indonesia
DOI: 10.21609/jiki.v8i2.305
Research on biomedical images is a subject that has attracted many researchers' interest, because a biomedical image can contain important information that helps analyze a disease. One existing line of research in this field uses dental panoramic radiographs to detect osteoporosis; the analyzed area is the width of the cortical bone. Determining that width requires proper segmentation of the dental panoramic radiograph. This study proposes integrating the watershed method with region merging based on statistical features for cortical bone segmentation on dental panoramic radiographs. Watershed segmentation is performed on the gradient magnitude of the input image. The over-segmentation that watershed produces is resolved by region merging based on statistical features: mean, standard deviation, and variance. The similarity of adjacent regions is measured with a weighted Euclidean distance over these features. The merging process incorporates as many background regions as possible while keeping object regions from being merged. The segmentation result succeeds in forming the contours of the cortical bone, with an average accuracy of 93.211%, and average sensitivity and specificity of 93.858% and …, respectively.
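The merging criterion can be sketched directly from the abstract: compute the three statistical features per region, then compare adjacent regions with a weighted Euclidean distance. The feature weights and merge threshold below are illustrative values, not the paper's.

```python
import math

def region_stats(pixels):
    """Mean, standard deviation, and variance of a region's gray levels."""
    n = len(pixels)
    mean = sum(pixels) / n
    var = sum((p - mean) ** 2 for p in pixels) / n
    return (mean, math.sqrt(var), var)

def weighted_euclidean(f1, f2, weights=(1.0, 1.0, 1.0)):
    """Weighted Euclidean distance between two feature tuples."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, f1, f2)))

def should_merge(region_a, region_b, threshold=10.0):
    """Merge two adjacent regions when their statistical features are close."""
    return weighted_euclidean(region_stats(region_a), region_stats(region_b)) < threshold
```

Applying this test to adjacent background regions first mirrors the abstract's strategy of absorbing background while protecting the cortical-bone object.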
USER EMOTION IDENTIFICATION IN TWITTER USING SPECIFIC FEATURES: HASHTAG, EMOJI, EMOTICON, AND ADJECTIVE TERM
Yuita Arum Sari;
Evy Kamilah Ratnasari;
Siti Mutrofin;
Agus Zainal Arifin
Jurnal Ilmu Komputer dan Informasi (Journal of Computer Science and Information) Vol. 7, No. 1 (2014)
Publisher : Faculty of Computer Science - Universitas Indonesia
DOI: 10.21609/jiki.v7i1.252
Twitter is a social media application whose content can signal user emotion. Identifying user emotion can be utilized in commercial, health, political, and security domains. The problem with emotion identification in tweets is that the unstructured short text messages make it difficult to extract the main features. In this paper, we propose a new framework for identifying the tendency of user emotions using specific features: hashtag, emoji, emoticon, and adjective terms. Preprocessing is applied in the first phase, and then user emotions are identified with a kNN classifier. The proposed method achieves good results, close to the ground truth, with an accuracy of 92%.
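A minimal sketch of the feature extraction and kNN step follows. The emoticon map and adjective lexicon are tiny illustrative stand-ins (not the paper's resources), emoji handling is omitted for brevity, and feature-key overlap is used as a deliberately simple similarity.

```python
import re
from collections import Counter

EMOTICONS = {':)': 'joy', ':(': 'sadness', ':D': 'joy'}   # illustrative map
ADJECTIVES = {'happy', 'sad', 'angry'}                    # illustrative lexicon

def tweet_features(text):
    """Extract the specific features the paper names: hashtags,
    emoticons, and adjective terms."""
    feats = Counter()
    for tag in re.findall(r'#(\w+)', text.lower()):
        feats['hashtag:' + tag] += 1
    for emoticon, emotion in EMOTICONS.items():
        if emoticon in text:
            feats['emoticon:' + emotion] += 1
    for word in re.findall(r'[a-z]+', text.lower()):
        if word in ADJECTIVES:
            feats['adj:' + word] += 1
    return feats

def knn_predict(feats, train, k=3):
    """Majority vote over the k training tweets sharing the most
    feature keys with the query tweet."""
    nearest = sorted(train, key=lambda pair: -len(feats.keys() & pair[0].keys()))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```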
COVERAGE, DIVERSITY, AND COHERENCE OPTIMIZATION FOR MULTI-DOCUMENT SUMMARIZATION
Khoirul Umam;
Fidi Wincoko Putro;
Gulpi Qorik Oktagalu Pratamasunu;
Agus Zainal Arifin;
Diana Purwitasari
Jurnal Ilmu Komputer dan Informasi (Journal of Computer Science and Information) Vol. 8, No. 1 (2015)
Publisher : Faculty of Computer Science - Universitas Indonesia
DOI: 10.21609/jiki.v8i1.278
A good summary of multiple documents on similar topics helps users get useful information. Such a summary must have extensive coverage, minimum redundancy (high diversity), and smooth connections among sentences (high coherence). Multi-document summarization that considers all three is therefore needed. In this paper we propose a novel multi-document summarization method that optimizes the coverage, diversity, and coherence of the summary's sentences simultaneously, integrating the self-adaptive differential evolution (SaDE) algorithm to solve the optimization problem. A sentence-ordering algorithm based on a topical-closeness approach is performed within the SaDE iterations to improve coherence among the summary's sentences. Experiments on the Text Analysis Conference (TAC) 2008 data sets showed that the proposed method generates summaries with coherence scores 29-41.2 times better, and ROUGE scores 46.97-64.71% better, than methods that consider only coverage and diversity, respectively.
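The objective being optimized can be illustrated as a toy fitness over the three criteria. The equal weighting of the three terms is an assumption of this sketch; in the paper, SaDE searches for the summary maximizing the (differently combined) objective, and a topical-closeness ordering step runs inside its iterations.

```python
def fitness(summary, doc_sentences, sim):
    """Toy fitness over the three criteria:
    coverage  - how well summary sentences cover the documents,
    diversity - how dissimilar summary sentences are to each other,
    coherence - similarity between consecutive summary sentences.
    sim is any symmetric similarity on sentence representations."""
    coverage = sum(max(sim(s, d) for s in summary) for d in doc_sentences) / len(doc_sentences)
    pairs = [(a, b) for i, a in enumerate(summary) for b in summary[i + 1:]]
    diversity = 1 - (sum(sim(a, b) for a, b in pairs) / len(pairs) if pairs else 0)
    coherence = (sum(sim(summary[i], summary[i + 1]) for i in range(len(summary) - 1))
                 / (len(summary) - 1)) if len(summary) > 1 else 0.0
    return (coverage + diversity + coherence) / 3
```

Note the tension the paper addresses: a redundant summary scores high on coherence but low on coverage and diversity, so the three criteria must be optimized jointly.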
GRAMMATICAL EVOLUTION FOR FEATURE EXTRACTION IN LOCAL THRESHOLDING PROBLEM
Go Frendi Gunawan;
Sonny Christiano Gosaria;
Agus Zainal Arifin
Jurnal Ilmu Komputer dan Informasi (Journal of Computer Science and Information) Vol. 5, No. 2 (2012)
Publisher : Faculty of Computer Science - Universitas Indonesia
DOI: 10.21609/jiki.v5i2.197
The varying lighting intensity in a document image makes it difficult to threshold the image, and the conventional statistical approach is not robust enough to solve such a problem. A different threshold value is needed for each part of the image, and choosing the threshold value of each part can be viewed as a classification problem in which the best features must be found. This paper proposes a new approach that uses grammatical evolution to extract those features. In the proposed method, the goodness of each feature is calculated independently, and the best features are then used for the classification task instead of the original features. In our experiment, the new features produced very good results, with only 5 misclassifications in 45 cases.
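The heart of grammatical evolution is mapping an integer genome through a grammar into an expression, here a candidate feature built from local image statistics. The grammar, the codon wrap-around, and the depth cap that forces termination are illustrative choices of this sketch, not the paper's setup.

```python
GRAMMAR = {
    'expr': [['expr', 'op', 'expr'], ['var']],
    'op':   [['+'], ['-'], ['*']],
    'var':  [['mean'], ['stddev'], ['min'], ['max']],
}

def derive(genome, symbol='expr', pos=0, depth=0):
    """Map an integer genome to a feature expression: each codon
    (mod the number of productions) picks a rule; codons wrap around.
    Returns (token list, next codon position)."""
    rules = GRAMMAR[symbol]
    if depth > 4:                       # force the terminal alternative
        rules = [rules[-1]]
    choice = rules[genome[pos % len(genome)] % len(rules)]
    pos += 1
    out = []
    for sym in choice:
        if sym in GRAMMAR:
            sub, pos = derive(genome, sym, pos, depth + 1)
            out.extend(sub)
        else:
            out.append(sym)
    return out, pos
```

Evolved expressions like `['min', '+', 'max']` would then be evaluated per image part as candidate features, each scored independently as the abstract describes.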
Iterated Region for Interactive Image Segmentation on Dental Panoramic Radiograph
Biandina Meidyani;
Lailly S. Qolby;
Ahmad Miftah Fajrin;
Agus Zainal Arifin;
Dini Adni Navastara
Jurnal Ilmu Komputer dan Informasi (Journal of Computer Science and Information) Vol. 12, No. 1 (2019)
Publisher : Faculty of Computer Science - Universitas Indonesia
DOI: 10.21609/jiki.v12i1.613
Image segmentation is the process of separating foreground from background. Segmentation of a low-contrast image such as a dental panoramic radiograph is not easily determined, and segmentation accuracy determines the success or failure of the final analysis. The segmentation process can be ambiguous: if an ambiguous area is not selected as a region, clustering errors may occur. To resolve this ambiguity, we propose a new iterated region-merging process for dental panoramic radiograph images. The proposed method starts from the user's marking and works iteratively to label the surrounding regions. In each iteration, the region with the minimal gray-level difference is merged, so the unknown regions are significantly reduced. The experiment shows that the proposed method is effective, with average ME and RAE of 0.04% and 0.06%, respectively.
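The iterative labeling loop can be sketched as follows. Regions are reduced to a mean gray level plus an explicit adjacency list, and the `max_diff` stopping threshold is an illustrative assumption; the paper works on actual radiograph regions.

```python
def iterated_merge(regions, seeds, max_diff=30):
    """Iteratively label unknown regions from user-marked seeds: in each
    pass, the unknown region whose mean gray level is closest to an
    already-labeled neighbor is merged into that neighbor's label.
    regions: {region_id: (mean_gray, [neighbor ids])}
    seeds:   {region_id: 'fg' or 'bg'}"""
    labels = dict(seeds)
    changed = True
    while changed:
        changed = False
        best = None
        for rid, (mean, neighbors) in regions.items():
            if rid in labels:
                continue
            for nb in neighbors:
                if nb in labels:
                    diff = abs(mean - regions[nb][0])
                    if best is None or diff < best[0]:
                        best = (diff, rid, labels[nb])
        if best and best[0] <= max_diff:
            labels[best[1]] = best[2]   # merge the closest unknown region
            changed = True
    return labels
```

Each pass labels exactly one region, so labels propagate outward from the user's marking and the set of unknown regions shrinks, matching the abstract's description.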