Manual assessment of descriptive answers is often time-consuming, error-prone, and subject to bias. While artificial intelligence (AI) has made significant strides, current automated evaluation methods typically rely on simplistic metrics such as word counts or predefined terms, which lack a deeper understanding of the content and are highly dependent on curated datasets. As demand for automated grading systems grows, there is an increasing need to evaluate not only descriptive answers but also code-based responses. This study addresses these challenges by applying natural language processing (NLP) and deep learning (DL) techniques, testing three baseline models: multinomial Naïve Bayes (MNB), bidirectional long short-term memory (Bi-LSTM), and bidirectional encoder representations from transformers (BERT). We propose EvalBERT, a BERT-based model fine-tuned on domain-specific academic corpora using central processing unit (CPU) acceleration. EvalBERT automates grading for both descriptive and C programming exams and offers features such as readability statistics and error detection. Experimental results show that EvalBERT achieves 94.86% accuracy, outperforming the baseline models by 1.22 percentage points while halving training time. Additionally, EvalBERT is the first model pre-trained on academic corpora for this purpose. An interactive user interface, E-Pariksha, was also developed for administering and taking exams online. EvalBERT provides precise assessments, enabling educators to evaluate student performance more accurately and offer more detailed feedback.
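To illustrate the kind of fine-tuning described above, the following is a minimal sketch (not the authors' released code) of adapting a BERT-style model to score student answers against reference answers. The checkpoint name, the hypothetical file graded_answers.csv, its columns (student_answer, reference_answer, score), and the regression formulation are all illustrative assumptions, not details taken from the paper.

```python
# Sketch: fine-tune a BERT model to predict a grade for (student answer, reference answer) pairs.
# Assumes a hypothetical CSV "graded_answers.csv" with columns: student_answer, reference_answer, score.
import pandas as pd
import torch
from torch.utils.data import DataLoader, Dataset
from transformers import BertTokenizerFast, BertForSequenceClassification

class AnswerPairDataset(Dataset):
    def __init__(self, frame, tokenizer, max_len=256):
        self.frame = frame
        self.tokenizer = tokenizer
        self.max_len = max_len

    def __len__(self):
        return len(self.frame)

    def __getitem__(self, idx):
        row = self.frame.iloc[idx]
        # Encode the student answer and reference answer as a sentence pair.
        enc = self.tokenizer(
            row["student_answer"], row["reference_answer"],
            truncation=True, padding="max_length",
            max_length=self.max_len, return_tensors="pt",
        )
        item = {k: v.squeeze(0) for k, v in enc.items()}
        item["labels"] = torch.tensor(float(row["score"]))  # regression target (grade)
        return item

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
# num_labels=1 makes the model treat the task as regression (MSE loss).
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=1)

train_data = AnswerPairDataset(pd.read_csv("graded_answers.csv"), tokenizer)
loader = DataLoader(train_data, batch_size=16, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for epoch in range(3):
    for batch in loader:
        optimizer.zero_grad()
        loss = model(**batch).loss
        loss.backward()
        optimizer.step()
```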