Jurnal Nasional Teknik Elektro dan Teknologi Informasi
Vol 7 No 3: August 2018

Keyphrase Extraction in Cluster Merging Based on Maximum-Common-Subgraph

Adhi Nurilham (Institut Teknologi Sepuluh Nopember)
Diana Purwitasari (Institut Teknologi Sepuluh Nopember)
Chastine Fatichah (Institut Teknologi Sepuluh Nopember)



Article Info

Publish Date
10 Sep 2018

Abstract

Document clustering based on topic similarity helps users search a collection of scientific articles. Topic labels are necessary for describing the subjects of the document clusters. Clusters with related subjects or contextual similarities can be merged to produce more descriptive labels. Relations between words in one context can be modelled as a graph. Instead of single words, this paper proposes labeling clusters of scientific articles with phrases, using graph-based cluster merging. The proposed method begins with K-Means++ for clustering the scientific articles. Then, candidate phrases are extracted from the document clusters using Frequent Phrase Mining, which is inspired by the Apriori algorithm. Each resulting cluster has a representation graph built from the extracted phrases. An indicator value, calculated with Maximum Common Subgraph (MCS), shows the structural similarity between graphs. Clusters are merged if their graphs have similar structures. The topic labels of the clusters are keyphrases extracted from the representation graph of the merged clusters using the TopicRank algorithm. The merging process, which is the contribution of this paper, considers the topic distribution within clusters for phrase extraction. The proposed method is evaluated based on the topic coherence of the merged cluster labels. The results show that the proposed method can improve topic coherence on the merged clusters, with the MCS graph size percentage as the key factor. Further observation shows that the merged cluster labels are consistent with the MCS graph.
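
The merging criterion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that graph nodes are uniquely labeled phrases, in which case the maximum common subgraph reduces to the intersection of the node and edge sets. It also assumes that the "MCS graph size percentage" is the MCS size divided by the size of the larger graph. The function names, the co-occurrence rule for edges, and the threshold value are hypothetical.

```python
from itertools import combinations


def phrase_graph(docs_phrases):
    """Build an undirected co-occurrence graph for one cluster.

    Assumption: nodes are phrases, and an edge links two phrases
    that appear in the same document of the cluster.
    """
    nodes, edges = set(), set()
    for phrases in docs_phrases:
        unique = sorted(set(phrases))
        nodes.update(unique)
        edges.update(frozenset(pair) for pair in combinations(unique, 2))
    return nodes, edges


def mcs_ratio(g1, g2):
    """Size of the common subgraph relative to the larger graph.

    With uniquely labeled nodes, the maximum common subgraph is the
    intersection of the node sets and of the edge sets.
    """
    n1, e1 = g1
    n2, e2 = g2
    common = len(n1 & n2) + len(e1 & e2)
    larger = max(len(n1) + len(e1), len(n2) + len(e2))
    return common / larger if larger else 0.0


def should_merge(g1, g2, threshold=0.3):
    """Merge two clusters when their graphs share enough structure.

    The threshold is a hypothetical value, not taken from the paper.
    """
    return mcs_ratio(g1, g2) >= threshold
```

For example, given the clusters `[["topic model", "document clustering"], ["document clustering", "graph"]]` and `[["document clustering", "graph"], ["graph", "neural network"]]`, each graph has three nodes and two edges. They share two nodes and one edge, so `mcs_ratio` returns 0.6 and `should_merge` returns `True` at the default threshold.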

Copyrights © 2018






Journal Info

Abbrev

JNTETI

Subject

Computer Science & IT Control & Systems Engineering Electrical & Electronics Engineering Energy Engineering

Description

Topics cover the fields of (but not limited to): 1. Information Technology: Software Engineering, Knowledge and Data Mining, Multimedia Technologies, Mobile Computing, Parallel/Distributed Computing, Artificial Intelligence, Computer Graphics, Virtual Reality 2. Power Systems: Power Generation, ...