Journal : Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control

Indonesian Dataset Expansion of Microsoft Research Video Description Corpus and Its Similarity Analysis
Faisal Rahutomo; Ahmad Hafidh Ayatullah
Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control Vol 3, No 4, November 2018
Publisher : Universitas Muhammadiyah Malang

DOI: 10.22219/kinetik.v3i4.680

Abstract

This paper describes the academic basis of an open Indonesian dataset published in Mendeley Data with DOI: 10.17632/d7vx5cc92y.1 [1]. The dataset is an Indonesian-language expansion of the Microsoft Research Video Description Corpus, an open dataset that contains about 120 thousand sentences. The corpus is a useful resource because its sentences form a set of roughly parallel descriptions of more than 2,000 video snippets in 35 languages. Both paraphrase and bilingual relations are available, but the corpus has no Indonesian descriptions. Therefore, this paper describes the research effort to expand the dataset for the Indonesian language. The research collected 43,753 description texts of 1,959 short videos, parallel with Microsoft’s dataset. To add further value to the dataset, similarity metrics were calculated for the texts. The metrics were cosine, Jaccard, Euclidean, and Manhattan, with average results of 0.22, 0.33, 2.38, and 6.08, respectively.
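The four metrics named in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the texts were tokenized or vectorized, so the lowercase whitespace tokenization and raw term-frequency vectors used here are assumptions, and the function names `tokenize` and `similarity_metrics` are hypothetical. Cosine and Jaccard are computed as similarities, Euclidean and Manhattan as distances; this is not necessarily the authors' procedure.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the paper's
    # actual preprocessing is not given in the abstract.
    return text.lower().split()


def similarity_metrics(text_a, text_b):
    """Compare two description texts using term-frequency vectors."""
    freq_a = Counter(tokenize(text_a))
    freq_b = Counter(tokenize(text_b))
    vocab = set(freq_a) | set(freq_b)

    # Cosine similarity: dot product divided by the product of vector norms.
    dot = sum(freq_a[t] * freq_b[t] for t in vocab)
    norm_a = math.sqrt(sum(c * c for c in freq_a.values()))
    norm_b = math.sqrt(sum(c * c for c in freq_b.values()))
    cosine = dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    # Jaccard similarity: shared distinct tokens over all distinct tokens.
    shared = set(freq_a) & set(freq_b)
    jaccard = len(shared) / len(vocab) if vocab else 0.0

    # Euclidean and Manhattan distances between the two count vectors.
    # Counter returns 0 for tokens missing from one of the texts.
    euclidean = math.sqrt(sum((freq_a[t] - freq_b[t]) ** 2 for t in vocab))
    manhattan = sum(abs(freq_a[t] - freq_b[t]) for t in vocab)

    return {
        "cosine": cosine,
        "jaccard": jaccard,
        "euclidean": euclidean,
        "manhattan": manhattan,
    }
```

For a made-up pair of descriptions such as "seorang pria bermain gitar" and "seorang pria memainkan gitar" (not taken from the dataset), this sketch gives a cosine of 0.75, a Jaccard of 0.6, a Euclidean distance of about 1.41, and a Manhattan distance of 2. Averaging such values over all text pairs for each video would yield corpus-level figures of the kind the abstract reports.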