Karwan Jacksi
University of Zakho

Published: 3 Documents
Articles


State of the art document clustering algorithms based on semantic similarity
Karwan Jacksi; Niyaz Salih
Jurnal Informatika Vol 14, No 2 (2020): May 2020
Publisher : Universitas Ahmad Dahlan

DOI: 10.26555/jifo.v14i2.a17513

Abstract

The continued success of the Internet has greatly increased the number of text documents in electronic form, and techniques for grouping these documents into meaningful clusters have become critical. Traditional clustering methods were based on statistical features, so clustering relied on syntactic rather than semantic notions. As a result, these techniques grouped dissimilar data together because of polysemy and synonymy. An important solution to this issue is document clustering based on semantic similarity, in which documents are grouped by meaning rather than by keywords. In this research, eighty papers that use semantic similarity in different fields were reviewed; forty of them, which apply semantic similarity to document clustering and were published in the seven years from 2014 to 2020, were selected for in-depth study. A comprehensive literature review of all the selected papers is provided, with a detailed comparison of their clustering algorithms, tools, and evaluation methods. This helps in the implementation and evaluation of document clustering. The reviewed work was also used to guide the preparation of the proposed research. Finally, an intensive discussion comparing the works is presented, and the results of our research are shown in figures.

A state-of-the-art survey on semantic similarity for document clustering using GloVe and density-based algorithms
Shapol M. Mohammed; Karwan Jacksi; Subhi R. M. Zeebaree
Indonesian Journal of Electrical Engineering and Computer Science Vol 22, No 1: April 2021
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijeecs.v22.i1.pp552-562

Abstract

Semantic similarity is the process of identifying semantically relevant data. The traditional way of identifying document similarity relies on synonymous keywords and syntax. Semantic similarity, in contrast, finds similar data using the meaning of words and semantics. Clustering groups objects that share features and properties into a cluster and separates them from objects with different features and properties. In semantic document clustering, documents are clustered using semantic similarity techniques together with similarity measures. One common family of techniques for clustering documents is density-based clustering, which uses the density of data points as the main strategy for measuring the similarity between them. In this paper, a state-of-the-art survey is presented that analyzes density-based algorithms for clustering documents. Furthermore, the similarity and evaluation measures used with the selected algorithms are investigated to identify the most common ones. The review revealed that the most used density-based algorithms in document clustering are DBSCAN and DPC. The most effective similarity measure used with density-based algorithms, specifically DBSCAN and DPC, is cosine similarity, with F-measure used for performance and accuracy evaluation.
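
As an illustration of the cosine measure named in this abstract, the following is a minimal sketch rather than the surveyed authors' method. It assumes raw term counts in place of GloVe embeddings, and the helper names `cosine_similarity` and `neighbors` are hypothetical. The `neighbors` function shows the kind of radius query that density-based algorithms such as DBSCAN build on.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between two texts, using raw term counts (an assumption)."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    # Dot product over the terms that both texts share.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


def neighbors(docs: list[str], index: int, eps: float) -> list[int]:
    """Indices of documents within cosine distance eps of docs[index]."""
    return [
        j for j, doc in enumerate(docs)
        if 1.0 - cosine_similarity(docs[index], doc) <= eps
    ]


docs = [
    "kurdish language processing",
    "processing kurdish language",
    "density based clustering",
]
print(neighbors(docs, 0, 0.1))
```

Because term counts only match identical words, this sketch still suffers from the synonymy problem; swapping in word-embedding vectors would be the semantic variant.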

Towards a Complete Kurdish NLP Pipeline: Challenges and Opportunities
Karwan Jacksi; Dastan Maulud; Ismael Ali
Jurnal Informatika Vol 17, No 1 (2023): January 2023
Publisher : Universitas Ahmad Dahlan

DOI: 10.26555/jifo.v17i1.a26053

Abstract

With the rapid growth of Kurdish language content on the web, there is high demand for making this information readable and processable by machines. Accomplishing this requires a Kurdish Natural Language Processing (KNLP) pipeline. Natural Language Processing (NLP) is the field that enables computers to process human language. In its efforts to bridge the communication gap between humans and computers, NLP draws on a wide range of fields, including computer science and computational linguistics. There have been some notable efforts toward creating the KNLP pipeline. However, it does not support the complete set of NLP tasks needed to enable semantic web and text mining applications. This paper surveys the work done in the field of NLP for the Kurdish language, its applications, and its linguistic challenges.