Jurnal Teknologi Informasi dan Ilmu Komputer
Vol 1, No 2 (2014)

Semantic Clustering and Representative Sentence Selection for Multi-Document Summarization (Semantic Clustering Dan Pemilihan Kalimat Representatif Untuk Peringkasan Multi Dokumen)

Pasnur
Santika, Putu Praba
Syaifuddin, Gus Nanang



Article Info

Publish Date
01 Nov 2014

Abstract

Coverage and saliency are the main problems in multi-document summarization. A good summary should cover as many as possible of the important (salient) concepts in the source documents. This research aims to develop a new multi-document summarization method based on semantic clustering and the selection of a representative sentence from each cluster. The proposed method builds on the principles of Latent Semantic Indexing (LSI) and Similarity Based Histogram Clustering (SHC) to form sentence clusters semantically, and combines the Sentence Information Density (SID) and Sentence Cluster Keyword (SCK) features to select the representative sentence of each cluster. Tests were performed on the Document Understanding Conference (DUC) 2004 Task 2 dataset, and the results were measured using Recall-Oriented Understudy for Gisting Evaluation (ROUGE). The results show that the proposed method achieves an average ROUGE-1 score of 0.395 and an average ROUGE-2 score of 0.106.

Keywords: multi-document summarization, latent semantic indexing, similarity based histogram clustering, sentence information density, sentence cluster keyword
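The ROUGE-1 and ROUGE-2 scores reported in the abstract are n-gram recall measures: the fraction of n-grams in a reference summary that also appear in the generated summary. The following is a minimal sketch of that idea for a single reference, assuming whitespace tokenization and lowercasing. It is not the official ROUGE toolkit and not the authors' evaluation setup; the function names `ngrams` and `rouge_n_recall` are hypothetical, and stemming, stopword handling, and multiple references are all omitted.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Simplified ROUGE-N recall against a single reference summary.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Clip each match so a repeated n-gram is credited at most as often
    # as it occurs in the candidate summary.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
print(rouge_n_recall(candidate, reference, n=1))  # 5 of 6 reference unigrams match
print(rouge_n_recall(candidate, reference, n=2))  # 3 of 5 reference bigrams match
```

Scores from this sketch are not directly comparable to the published figures, because the toolkit settings used in the paper are not stated on this page.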

Copyrights © 2014






Journal Info

Abbrev

JTIIK

Publisher

Subject

Computer Science & IT Engineering

Description

Jurnal Teknologi Informasi dan Ilmu Komputer (JTIIK) is a national journal published since 2014 by the Faculty of Computer Science (Fakultas Ilmu Komputer, FILKOM), Universitas Brawijaya (UB), Malang. JTIIK publishes research articles in the fields of information technology and computer science. JTIIK is committed ...