Journal of Applied Data Sciences
Vol 6, No 3: September 2025

Formalization of Morphological Rules for Kazakh Nouns in the New Latin Alphabet

Zhetkenbay, Lena
Sharipbay, Altynbek
Razakhova, Bibigul
Bekmanova, Gulmira
Barlybayev, Alibek
Nazyrova, Aizhan
Yergesh, Banu



Article Info

Publish Date
12 Jul 2025

Abstract

This study presents a hybrid computational model for formalizing and predicting morphological inflections of Kazakh nouns written in the new Latin alphabet. The motivation stems from limitations in previous systems based on Cyrillic orthography, which often misrepresented key phonological features such as vowel harmony and consonant assimilation. The main objective is to develop a linguistically informed and computationally efficient system to support Natural Language Processing (NLP) for Kazakh during its transition to Latin script. The methodology combines rule-based grammar formalization with a machine learning approach, specifically a Bayesian Regularization Backpropagation Neural Network (BR-BPNN). A manually curated dataset of 1,000 Latin-script Kazakh nouns was annotated for various morphological forms. Each word was encoded at the character level using a custom dictionary (kazlat_dict), with the final four letters of the word serving as the feature vector. Formal logic and regular expressions were used to model morphological rules such as pluralization and case endings, incorporating vowel harmony, consonant softness, and sonority. These rules provided the training labels for the BR-BPNN model. The trained model achieved 91.5% accuracy, 89.4% precision, and a correlation coefficient (R) above 0.98, supporting the effectiveness of the hybrid system. A user interface prototype was developed to demonstrate practical utility: users enter a root noun and receive suffix predictions with confidence scores and linguistic explanations. The novelty of this work lies in integrating linguistic theory with machine learning for a low-resource Turkic language. It offers a foundation for intelligent Kazakh language tools, including spell checkers, grammar correctors, and educational platforms. Future work will extend the system to other parts of speech and explore contextual modeling to improve handling of ambiguous or irregular forms.
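
The encoding and rule-labeling steps described in the abstract might look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' implementation: the contents of kazlat_dict are not given here, so KAZLAT_DICT is a hypothetical stand-in that simply numbers the letters; the letter inventory is assumed to follow the 2021 Latin alphabet proposal; and plural_suffix is a simplified textbook version of the plural rule (onset chosen by the final sound, vowel chosen by harmony) that ignores loanwords and irregular forms. Input is assumed to be lowercase.

```python
import re

# Hypothetical stand-in for the paper's kazlat_dict: letters are numbered
# from 1, and 0 is reserved for padding. The real mapping may differ.
ALPHABET = "aäbdefgğhiıjklmnñoöpqrsştuūüvyz"
KAZLAT_DICT = {ch: i for i, ch in enumerate(ALPHABET, start=1)}


def encode_tail(word, width=4):
    """Encode the last `width` letters as integers, left-padded with 0."""
    codes = [KAZLAT_DICT[ch] for ch in word[-width:]]
    return [0] * (width - len(codes)) + codes


# Assumed vowel classes for harmony; "i" and "u" are treated as ambiguous
# and skipped when looking for the harmony-deciding vowel.
BACK_VOWELS = "aoūy"
HARMONY_VOWEL = re.compile(r"[aoūyäeöüı]")
ENDS_VOWEL_OR_R = re.compile(r"[aäeiıoöuūüyr]$")
ENDS_SONORANT_OR_ZJ = re.compile(r"[lmnñzj]$")


def plural_suffix(root):
    """Return a plural suffix label for a lowercase root (simplified rule)."""
    vowels = HARMONY_VOWEL.findall(root)
    if not vowels:
        raise ValueError(f"no harmony-deciding vowel in {root!r}")
    vowel = "a" if vowels[-1] in BACK_VOWELS else "e"
    if ENDS_VOWEL_OR_R.search(root):
        onset = "l"   # after vowels, r, and the glides written i / u
    elif ENDS_SONORANT_OR_ZJ.search(root):
        onset = "d"   # after l, m, n, ñ, z, j
    else:
        onset = "t"   # after the remaining consonants
    return onset + vowel + "r"


if __name__ == "__main__":
    for root in ["bala", "qalam", "mektep", "köl"]:
        print(root, encode_tail(root), root + plural_suffix(root))
```

In a pipeline of the kind the abstract describes, the output of encode_tail would be the input feature vector and the output of plural_suffix would be the training label for that vector.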

Copyrights © 2025






Journal Info

Abbrev

JADS

Subject

Computer Science & IT; Control & Systems Engineering; Decision Sciences, Operations Research & Management

Description

One of the current hot topics in science is data: how can datasets be used in scientific and scholarly research in a more reliable, citable and accountable way? Data is of paramount importance to scientific progress, yet most research data remains private. Enhancing the transparency of the processes ...