This study analyzes the effectiveness of deep learning models, particularly ChatGPT, in detecting errors in students’ mathematical problem-solving processes and compares the automated assessment results with manual assessment conducted by lecturers. The research employed a quantitative descriptive-comparative approach involving 37 first-semester university students in a mathematics education program. Data were collected through written essay tests on plane geometry topics, questionnaires, and documentation. The written responses were assessed manually by lecturers and automatically by ChatGPT, with a focus on conceptual, procedural, and computational errors. Data analysis combined descriptive statistics with a comparative analysis of score differences and of the consistency between the two assessment methods. The results show that the average score differences between manual and ChatGPT assessment were relatively small, ranging from 0.4 to 4.5 points, indicating a high level of accuracy and consistency in the automated assessment. ChatGPT demonstrated advantages in efficiency, objectivity, and speed, while manual assessment remained superior in interpreting implicit reasoning and contextual understanding. These findings suggest that ChatGPT has strong potential as an automated assessment tool to support mathematics educators, particularly in identifying student error patterns systematically, although human judgment remains necessary for comprehensive pedagogical interpretation.
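As a rough illustration of the descriptive and comparative analysis described above, the sketch below shows how score differences and consistency between manual and ChatGPT grading could be computed. The score arrays, the paired t-test, and the Pearson correlation are illustrative assumptions for this sketch, not the study's actual data or procedure.

```python
# Minimal sketch of a descriptive/comparative analysis of two graders.
# The arrays below are hypothetical placeholder scores, not the study's data.
import numpy as np
from scipy import stats

# Hypothetical scores (0-100) for the same set of student responses,
# graded manually by lecturers and automatically by ChatGPT.
manual = np.array([78, 65, 90, 72, 84, 58, 95, 69, 81, 74], dtype=float)
chatgpt = np.array([80, 63, 88, 75, 83, 60, 93, 70, 79, 77], dtype=float)

# Descriptive comparison: mean scores and mean (absolute) score difference.
diff = chatgpt - manual
print(f"Mean manual score:  {manual.mean():.1f}")
print(f"Mean ChatGPT score: {chatgpt.mean():.1f}")
print(f"Mean difference:    {diff.mean():+.1f} points")
print(f"Mean |difference|:  {np.abs(diff).mean():.1f} points")

# Comparative analysis: paired t-test for systematic score differences
# and Pearson correlation as a simple measure of consistency.
t_stat, p_value = stats.ttest_rel(chatgpt, manual)
r, r_p = stats.pearsonr(manual, chatgpt)
print(f"Paired t-test: t = {t_stat:.2f}, p = {p_value:.3f}")
print(f"Pearson r between graders: r = {r:.2f} (p = {r_p:.3f})")
```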