Journal of Applied Data Sciences
Vol 5, No 3: SEPTEMBER 2024

Data Processing and Optimization in the Development of Machine Learning Systems: Detailed Requirements Analysis, Model Architecture, and Anti-Data Drift Strategies

Boyko, Nataliya



Article Info

Publish Date
31 Jul 2024

Abstract

The relevance of this research comes from the growing use of machine learning systems across industries, which requires reliable data processing and optimization. The study has several aims. It develops a machine learning system for data processing and optimization that predicts employee departure from internal company data. It analyzes the subject area and existing approaches, defines the model architecture, and describes the developed system. It validates the application's performance on test data and develops strategies to counteract data drift. To achieve these aims, the study applies machine learning algorithms, including decision trees, logistic regression, and neural networks, together with architectural approaches used in machine learning systems whose input data carry little information. It employs multi-generation model architectures, ensemble methods with LightGBM for robust prediction, and dynamic adaptation strategies to handle feature and data drift. The main result is a machine learning and data pre-processing system for recognizing the risk of employee dismissal, which can serve as a basis for similar services in IT companies. The object of the study is a system that predicts the probability of a particular employee leaving within a given period. The study also shows how to cope with the difficulties of building a solution on data of low information content and poor quality. Despite the poor data quality and the high class imbalance, the implemented application produces valuable results for the business. The practical significance of the study is that the developed system can be used to predict and prevent employee losses. This helps increase team stability, improve the efficiency of personnel management, and strengthen the competitiveness of enterprises.
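The abstract names anti-data-drift strategies without detailing how drift is detected. One widely used check for feature drift is the Population Stability Index (PSI). PSI compares the binned distribution of a feature at training time with its distribution in a new batch of data. The sketch below is a minimal illustration under our own assumptions and is not the study's documented method. The function names `population_stability_index` and `drift_status` are hypothetical. The 0.1 and 0.25 thresholds are a common industry rule of thumb rather than values taken from the paper.

```python
import bisect
import math
from typing import Sequence


def population_stability_index(
    reference: Sequence[float],
    current: Sequence[float],
    n_bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """Hypothetical helper: PSI of one numeric feature, reference vs. current.

    Bin edges come from quantiles of the reference (training-time) sample,
    so each bin holds roughly the same share of the reference data.
    """
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    ref_sorted = sorted(reference)
    # Interior edges at reference quantiles; repeated values collapse into
    # a single edge, which keeps discrete features from producing empty bins.
    edges = sorted(
        {ref_sorted[int(len(ref_sorted) * k / n_bins)] for k in range(1, n_bins)}
    )

    def shares(sample: Sequence[float]) -> list[float]:
        counts = [0] * (len(edges) + 1)
        for x in sample:
            # bisect_right returns how many edges are <= x, i.e. the bin index.
            counts[bisect.bisect_right(edges, x)] += 1
        # Floor at eps so an empty bin does not cause log(0) or division by 0.
        return [max(c / len(sample), eps) for c in counts]

    ref_share = shares(reference)
    cur_share = shares(current)
    # PSI = sum over bins of (current - reference) * ln(current / reference).
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_share, cur_share))


def drift_status(psi: float) -> str:
    # Assumed rule-of-thumb thresholds; not taken from the study.
    if psi < 0.1:
        return "stable"
    if psi < 0.25:
        return "moderate drift"
    return "significant drift"
```

A monitoring job could compute this per feature between the training snapshot and each new batch of HR data. A "significant drift" result might then trigger retraining or a switch to another model generation. That is one possible reading of the abstract's "dynamic adaptation strategies", and the paper's actual mechanism may differ.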

Copyright © 2024






Journal Info

Abbrev

JADS

Subject

Computer Science & IT; Control & Systems Engineering; Decision Sciences, Operations Research & Management

Description

One of the current hot topics in science is data: how can datasets be used in scientific and scholarly research in a more reliable, citable and accountable way? Data is of paramount importance to scientific progress, yet most research data remains private. Enhancing the transparency of the processes ...