Garuda - Garba Rujukan Digital

Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control

Vol 3, No 4, November 2018

Rahutomo, Faisal
Hafidh Ayatullah, Ahmad

Publish Date
15 Oct 2018

This paper describes the academic basis of an open Indonesian dataset published on Mendeley Data with DOI: 10.17632/d7vx5cc92y.1 [1]. The dataset is an Indonesian-language expansion of the Microsoft Research Video Description Corpus, an open dataset containing about 120 thousand sentences. The corpus is a useful resource because its sentences form a set of roughly parallel descriptions of more than 2,000 video snippets in 35 languages. Both paraphrase and bilingual relations are available, but Indonesian descriptions are not included in the corpus. Therefore, this paper describes the research effort to expand the dataset with Indonesian. The research collected 43,753 description texts for 1,959 short videos, parallel with Microsoft's dataset. To add further value to the dataset, similarity metrics were calculated over the texts. The metrics were cosine, Jaccard, Euclidean, and Manhattan, with average results of 0.22, 0.33, 2.38, and 6.08, respectively.
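The four metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how the texts were tokenized or weighted, so the whitespace tokenizer, the raw term-count vectors, and the function names `tokenize` and `similarity_metrics` are all assumptions made for illustration. Cosine and Jaccard are similarities (higher means more alike), while Euclidean and Manhattan are distances (lower means more alike).

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (an assumed, simplistic tokenizer)."""
    return text.lower().split()


def similarity_metrics(text_a, text_b):
    """Compute cosine, Jaccard, Euclidean, and Manhattan values for two texts.

    Texts are represented as raw term-count vectors over their shared
    vocabulary; this representation is an assumption, not the paper's.
    """
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    vocab = sorted(set(counts_a) | set(counts_b))

    # Counter returns 0 for missing words, so both vectors align on vocab.
    vec_a = [counts_a[word] for word in vocab]
    vec_b = [counts_b[word] for word in vocab]

    # Cosine similarity: dot product divided by the product of the norms.
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    cosine = dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    # Jaccard similarity: shared distinct words over all distinct words.
    set_a, set_b = set(counts_a), set(counts_b)
    union = set_a | set_b
    jaccard = len(set_a & set_b) / len(union) if union else 0.0

    # Euclidean (L2) and Manhattan (L1) distances between the count vectors.
    euclidean = math.sqrt(sum((a - b) ** 2 for a, b in zip(vec_a, vec_b)))
    manhattan = sum(abs(a - b) for a, b in zip(vec_a, vec_b))

    return {
        "cosine": cosine,
        "jaccard": jaccard,
        "euclidean": euclidean,
        "manhattan": manhattan,
    }


if __name__ == "__main__":
    # Hypothetical descriptions, not taken from the dataset.
    print(similarity_metrics("seorang pria bermain gitar",
                             "seorang wanita bermain gitar"))
```

Averaging such values over many description pairs of the same video would yield corpus-level figures of the kind reported in the abstract, although the pairing scheme used there is not specified here.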

Journal Info

Abbrev

kinetik

Publisher

Universitas Muhammadiyah Malang

Subject

Computer Science & IT; Control & Systems Engineering; Electrical & Electronics Engineering; Energy Engineering

Description

Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control is published by Universitas Muhammadiyah Malang. It is an open-access journal in the field of Informatics and Electrical Engineering. This journal is available for researchers who want to improve ...

Article Info

Indonesian Dataset Expansion of Microsoft Research Video Description Corpus and Its Similarity Analysis