Articles

Found 15 Documents

Rhetorical Sentence Classification for Automatic Title Generation in Scientific Article Jan Wira Gotama Putra; Masayu Leylia Khodra
TELKOMNIKA (Telecommunication Computing Electronics and Control) Vol 15, No 2: June 2017
Publisher : Universitas Ahmad Dahlan

DOI: 10.12928/telkomnika.v15i2.4061

Abstract

In this paper, we propose work on rhetorical corpus construction and a sentence classification model experiment that can be incorporated into the automatic paper title generation task for scientific articles. Rhetorical classification is treated as sequence labeling. A rhetorical sentence classification model is useful in tasks that consider a document's discourse structure. We performed experiments using datasets from two domains: computer science (CS dataset) and chemistry (GaN dataset). We evaluated the models using 10-fold cross-validation (0.70-0.79 weighted average F-measure) as well as on-the-run evaluation (0.30-0.36 error rate at best). We argue that our models performed best when the imbalanced data was handled using a SMOTE filter.
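
To illustrate the class-imbalance handling mentioned above, the following is a minimal sketch, not the authors' code, of oversampling minority rhetorical classes with SMOTE inside 10-fold cross-validation; it assumes scikit-learn and imbalanced-learn and uses placeholder data.

from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

# Placeholder corpus: sentences labeled with rhetorical categories.
sentences = ["we propose a new labeling method"] * 40 + ["results show an improvement"] * 10
labels = ["METHOD"] * 40 + ["RESULT"] * 10

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),   # sentence features (any featurizer would do)
    ("smote", SMOTE()),             # oversample the minority rhetorical classes
    ("clf", LinearSVC()),           # stand-in sentence classifier
])

# 10-fold cross-validation with weighted F-measure, as reported in the abstract.
scores = cross_val_score(pipeline, sentences, labels, cv=10, scoring="f1_weighted")
print(scores.mean())
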
Review of Local Descriptor in RGB-D Object Recognition Ema Rachmawati; Iping Supriana Suwardi; Masayu Leylia Khodra
TELKOMNIKA (Telecommunication Computing Electronics and Control) Vol 12, No 4: December 2014
Publisher : Universitas Ahmad Dahlan

DOI: 10.12928/telkomnika.v12i4.388

Abstract

The emergence of the RGB-D (Red-Green-Blue-Depth) sensor, which is capable of providing depth and RGB images, gives hope to the computer vision community. Moreover, the use of local features has increased over the last few years and has shown impressive results, especially in the field of object recognition. This article attempts to provide a survey of the recent technical achievements in this area of research. We review the use of local descriptors as the feature representation extracted from RGB-D images in instance- and category-level object recognition. We also highlight the involvement of depth images and how they can be combined with RGB images in constructing a local descriptor. Three different approaches are used to incorporate depth images into a compact feature representation: a classical distribution-based approach, the kernel trick, and feature learning. In this article, we show that the involvement of depth data successfully improves the accuracy of object recognition.
Hierarchical multi-label news article classification with distributed semantic model based features Ivana Clairine Irsan; Masayu Leylia Khodra
International Journal of Advances in Intelligent Informatics Vol 5, No 1 (2019): March 2019
Publisher : Universitas Ahmad Dahlan

DOI: 10.26555/ijain.v5i1.168

Abstract

Automatic news categorization is essential to automatically handle the classification of multi-label news articles in online portals. This research employs several potential methods to improve the performance of a hierarchical multi-label classifier for Indonesian news articles. The first method uses a Convolutional Neural Network (CNN) to build the top-level classifier. The second method improves classification performance by calculating the average of the word vectors obtained from a distributed semantic model. The third method combines lexical and semantic features by multiplying word term frequency (lexical) with the word vector average (semantic). The model built using Calibrated Label Ranking as the multi-label classification method and trained with the Naïve Bayes algorithm achieved the best F1-measure of 0.7531. The multiplication of word term frequency and the average of word vectors was also used to build this classifier. This configuration improved multi-label classification performance by 4.25% compared to the baseline. The distributed semantic model that gave the best performance in this experiment was a 300-dimension word2vec model trained on Wikipedia articles. The multi-label classification model's performance is also influenced by the news release date: the larger the gap between the training and testing periods, the lower the model's performance.
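
As a rough illustration of the third method above (multiplying lexical term frequency with the semantic word-vector average), the following numpy sketch computes one possible reading of that feature; the embeddings and tokens are hypothetical stand-ins for the 300-dimension Wikipedia word2vec model used in the paper.

import numpy as np
from collections import Counter

# Hypothetical tiny embeddings; the paper uses 300-dimension word2vec of Wikipedia articles.
embeddings = {
    "pemerintah": np.array([0.1, 0.3, 0.2]),
    "ekonomi":    np.array([0.4, 0.1, 0.0]),
    "tumbuh":     np.array([0.2, 0.2, 0.5]),
}

def document_feature(tokens, embeddings, dim=3):
    # One reading of "term frequency x word vector average": weight each word's
    # vector by its frequency in the document, then average.
    known = [t for t in tokens if t in embeddings]
    if not known:
        return np.zeros(dim)
    tf = Counter(known)
    return sum(tf[w] * embeddings[w] for w in tf) / sum(tf.values())

print(document_feature(["ekonomi", "ekonomi", "tumbuh"], embeddings))
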
TRANSFORMING RHETORICAL DOCUMENT PROFILE INTO TAILORED SUMMARY OF SCIENTIFIC PAPER Masayu Leylia Khodra; Mohammad Dimas; Dwi Hendratmo Widyantoro; E. Aminudin Aziz; Bambang Riyanto Trilaksono
Jurnal Ilmiah Kursor Vol 6 No 3 (2012)
Publisher : Universitas Trunojoyo Madura


Abstract

Since the abstract of a scientific paper is author-biased, the information readers require may not be included in the abstract. A tailored summary may help them obtain a summary based on their information needs. This research is the first to implement a tailored summary system for scientific papers. The tailored summary applies information extraction that transforms a scientific paper into a Rhetorical Document Profile, a structured representation of paper content based on a rhetorical scheme of fifteen slots. This research adapted a building plan that used a rhetorical scheme of seven slots. We also implemented a tailored summary system. After generating an initial summary, surface repair is conducted to improve summary readability. Each sentence in the initial summary is combined with a template phrase based on the syntax-tree combination method. There are five groups of template phrases provided in surface repair. We constructed evaluation standards by asking five human raters. The best method for the sentence selection subsystem, which uses Maximal Marginal Importance-Multi Sentence, employs TF.IDF weighting with a precision/recall of 0.61. The surface repair subsystem has an acceptance rate of 0.91.
RESTRICTED CONTENT CLASSIFICATION BASED ON VIDEO METADATA AND COMMENTS (CASE STUDY: YOUTUBE.COM) Stefanus Thobi Sinaga; Masayu Leylia Khodra
Jurnal Ilmiah Kursor Vol 7 No 4 (2014)
Publisher : Universitas Trunojoyo Madura


Abstract

Restricted content classification is the activity of labeling video content into two categories: non-restricted content that is appropriate for all audiences and restricted content that is not appropriate for minor audiences (under 18 years). On Youtube, restricted content classification is processed manually by staff based on reports submitted by the user community. This research builds an automatic restricted content classification system that can classify Youtube videos based on their text metadata (title, description) and comments. The system uses the best classification model obtained from experiments on a collected Youtube video dataset. The video title and description are chosen as classification attributes since they contain the main information about the video provided by the uploader. Meanwhile, comments are chosen as an additional classification attribute under the assumption that they provide the necessary information when the title and description do not adequately represent the video. Our experiment shows that the best classification model, with an F-measure of 83.45%, is achieved by using lexical features on a dataset built from video titles and descriptions (without comments), Support Vector Machines as the classification algorithm, and binary feature weighting. Using this model, a restricted content classification system based on text metadata and video comments has been built. Keywords: Classification, Restricted Content, Support Vector Machines.
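
A minimal sketch of the winning configuration described above (binary term weighting over title + description text with an SVM); this is illustrative only, assuming scikit-learn, with hypothetical example texts.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical training data: concatenated video title + description, with labels.
texts = ["tutorial memasak rendang resep mudah"] * 20 + ["konten khusus dewasa 18+"] * 20
labels = ["non_restricted"] * 20 + ["restricted"] * 20

model = make_pipeline(
    CountVectorizer(binary=True),   # binary feature weighting, as in the abstract
    LinearSVC(),                    # Support Vector Machines classifier
)
model.fit(texts, labels)
print(model.predict(["resep rendang mudah untuk pemula"]))
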
Toward a New Approach in Fruit Recognition using Hybrid RGBD Features and Fruit Hierarchy Property Ema Rachmawati; Iping Supriana; Masayu Leylia Khodra
Proceeding of the Electrical Engineering Computer Science and Informatics Vol 4: EECSI 2017
Publisher : IAES Indonesia Section

DOI: 10.11591/eecsi.v4.1029

Abstract

We present a hierarchical multi-feature classification (HMC) system for the multiclass fruit recognition problem. Our approach to HMC exploits the advantages of combining multimodal features and the fruit hierarchy property. In the construction of hybrid features, we take advantage of color features for the fruit recognition problem and combine them with a 3D shape feature from the depth channel of RGBD (Red, Green, Blue, Depth) images. Meanwhile, given a set of fruit species and varieties with a preexisting hierarchy among them, we consider the problem of assigning images to one of these fruit varieties from the point of view of the hierarchy. We report on computational experiments using this approach. We show that the use of the hierarchy structure along with hybrid RGBD features can improve classification performance.
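
As a rough sketch of a hybrid RGB-D feature (not the descriptor used in the paper, which builds a 3D shape feature from the depth channel), the following concatenates per-channel color histograms with a plain depth histogram using numpy.

import numpy as np

def hybrid_rgbd_feature(rgb, depth, bins=8):
    # rgb: HxWx3 uint8 image; depth: HxW depth map.
    color_hist = [np.histogram(rgb[..., c], bins=bins, range=(0, 256), density=True)[0]
                  for c in range(3)]
    d = depth[np.isfinite(depth)]
    depth_hist = np.histogram(d, bins=bins, density=True)[0]
    return np.concatenate(color_hist + [depth_hist])   # simple concatenation of modalities

rgb = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)
depth = np.random.rand(64, 64)
print(hybrid_rgbd_feature(rgb, depth).shape)   # (32,) with 8 bins per channel
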
Automatic Tailored Multi-Paper Summarization based on Rhetorical Document Profile and Summary Specification Masayu Leylia Khodra; Dwi Hendratmo Widyantoro; E. Aminudin Aziz; Bambang Riyanto Trilaksono
Journal of ICT Research and Applications Vol. 6 No. 3 (2012)
Publisher : LPPM ITB

DOI: 10.5614/itbj.ict.2012.6.3.4

Abstract

In order to assist researchers in addressing time constraints and low relevance in using scientific articles, an automatic tailored multi-paper summarization (TMPS) is proposed. In this paper, we extend Teufel's tailored summary to deal with multiple papers and a more flexible representation of user information needs. Our TMPS extracts a Rhetorical Document Profile (RDP) from each paper and presents a summary based on user information needs. Building Plan Language (BPLAN) is introduced as a formalization of Teufel's building plan and is used to represent the summary specification, which is a more flexible representation of user information needs. Surface repair is embedded within the BPLAN for improving the readability of the extractive summary. Our experiment shows that the average performance of the RDP extraction module is 94.46%, which promises high-quality extracts for summary composition. A generality evaluation shows that our BPLAN is flexible enough to compose various forms of summary. A subjective evaluation provides evidence that the surface repair operators can improve the readability of the resulting summary.
Automatic Title Generation in Scientific Articles for Authorship Assistance: A Summarization Approach Jan Wira Gotama Putra; Masayu Leylia Khodra
Journal of ICT Research and Applications Vol. 11 No. 3 (2017)
Publisher : LPPM ITB

DOI: 10.5614/itbj.ict.res.appl.2017.11.3.3

Abstract

This paper presents a study on automatic title generation for scientific articles considering sentence information types known as rhetorical categories. A title can be seen as a high-compression summary of a document. A rhetorical category is an information type conveyed by the author of a text for each textual unit, for example: background, method, or result of the research. The experiment in this study focused on extracting the research purpose and research method information for inclusion in a computer-generated title. Sentences are classified into rhetorical categories, after which these sentences are filtered using three methods. Three title candidates whose contents reflect the filtered sentences are then generated using a template-based or an adaptive K-nearest neighbor approach. The experiment was conducted using two different dataset domains: computational linguistics and chemistry. Our study obtained a 0.109-0.255 F1-measure score on average for computer-generated titles compared to original titles. In a human evaluation the automatically generated titles were deemed 'relatively acceptable' in the computational linguistics domain and 'not acceptable' in the chemistry domain. It can be concluded that rhetorical categories have unexplored potential to improve the performance of summarization tasks in general.
Word Embedding for Rhetorical Sentence Categorization on Scientific Articles Ghoziyah Haitan Rachman; Masayu Leylia Khodra; Dwi Hendratmo Widyantoro
Journal of ICT Research and Applications Vol. 12 No. 2 (2018)
Publisher : LPPM ITB

DOI: 10.5614/itbj.ict.res.appl.2018.12.2.5

Abstract

A common task in summarizing scientific articles is employing the rhetorical structure of sentences. Determining rhetorical sentences itself passes through a process of text categorization. To achieve good performance, several works in text categorization have employed word embeddings. This paper presents rhetorical sentence categorization of scientific articles using word embeddings to capture semantically similar words. A comparison between Word2Vec and GloVe is shown. First, two experiments were evaluated using five classifiers, namely Naïve Bayes, Linear SVM, IBK, J48, and Maximum Entropy. Then, the best classifier from these first two experiments was employed. This research shows that Word2Vec CBOW performed better than Skip-Gram and GloVe. The best experimental result came from Word2Vec CBOW on 20,155 resource papers from the ACL-ARC, with features from Teufel and the previous-label feature. In this experiment, Linear SVM produced the highest F-measure performance at 43.44%.
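
The CBOW versus Skip-gram comparison above can be sketched as follows; this is not the paper's setup, it assumes the gensim 4.x API (vector_size; older releases used size) and a toy corpus, with averaged word vectors fed to a Linear SVM.

import numpy as np
from gensim.models import Word2Vec
from sklearn.svm import LinearSVC

# Toy tokenized corpus standing in for ACL-ARC sentences.
corpus = [["we", "propose", "a", "method"],
          ["results", "show", "significant", "improvement"]] * 50

cbow = Word2Vec(corpus, vector_size=50, sg=0, min_count=1)   # sg=0 -> CBOW
skip = Word2Vec(corpus, vector_size=50, sg=1, min_count=1)   # sg=1 -> Skip-gram

def sentence_vector(tokens, model):
    # Average the embeddings of in-vocabulary tokens.
    vecs = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(model.vector_size)

X = [sentence_vector(s, cbow) for s in corpus]
y = ["METHOD", "RESULT"] * 50
clf = LinearSVC().fit(X, y)
print(clf.predict([sentence_vector(["we", "propose"], cbow)]))
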
Using Graph Pattern Association Rules on Yago Knowledge Base Wahyudi Wahyudi; Masayu Leylia Khodra; Ary Setijadi Prihatmanto; Carmadi Machbub
Journal of ICT Research and Applications Vol. 13 No. 2 (2019)
Publisher : LPPM ITB

DOI: 10.5614/itbj.ict.res.appl.2019.13.2.6

Abstract

The use of graph pattern association rules (GPARs) on the Yago knowledge base is proposed. Extending association rules for itemsets, GPARs can help to discover regularities between entities in a knowledge base. A rule-generated graph pattern (RGGP) algorithm was used for extracting rules from the Yago knowledge base and a GPAR algorithm for creating the association rules. Our research resulted in 1114 association rules, with standard confidence at 50.18%, better than partial completeness assumption (PCA) confidence at 49.82%. Besides that, the computation time for standard confidence was also better than that for PCA confidence.
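
For the two confidence measures compared above, the following toy sketch computes standard confidence and PCA confidence for a single rule over a handful of triples, using the usual definitions for knowledge-base rule mining; the facts and rule are made up.

# Toy knowledge base of (subject, predicate, object) triples.
kb = {
    ("alice", "livesIn", "paris"),
    ("alice", "wasBornIn", "paris"),
    ("bob", "livesIn", "london"),
    ("bob", "wasBornIn", "manchester"),
    ("carol", "livesIn", "rome"),        # carol has no wasBornIn fact at all
}

# Rule under evaluation: livesIn(x, y) => wasBornIn(x, y)
body = [(s, o) for (s, p, o) in kb if p == "livesIn"]
head = {(s, o) for (s, p, o) in kb if p == "wasBornIn"}
head_subjects = {s for (s, _) in head}          # subjects with any wasBornIn fact

support = sum(1 for pair in body if pair in head)       # pairs satisfying body and head
std_conf = support / len(body)                          # standard confidence: 1/3
pca_body = [p for p in body if p[0] in head_subjects]   # body pairs whose subject has
                                                        # some known wasBornIn value
pca_conf = support / len(pca_body)                      # PCA confidence: 1/2

print(std_conf, pca_conf)
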