Combining workflow automation with multimodal artificial intelligence offers a new approach to building an intelligent digital receipt recording system. This study designs an automatic transaction processing system that integrates n8n as the workflow engine, Google Gemini AI as the multimodal inference model, and a Telegram Bot as the conversational interface. The system is deployed in a self-hosted, Docker-based environment so that the workflow executes locally without depending on cloud hosting, which improves data security and reduces operational costs. An experimental software engineering method was applied using 33 test scenarios: 20 image inputs and 13 text inputs. The system extracted the key transaction fields (store name, total amount, and transaction date) under a range of real-world conditions, including blurred images, faded ink, missing text segments, tilted receipts, and imperfect handwriting. Evaluation with a confusion matrix yielded 100% accuracy, precision, recall, and F1-score, meaning that every system output matched the expected result across the test scenarios. The system also showed stable performance, with average processing times of 15.8 seconds for text inputs and 17 to 18.5 seconds for low-resolution images. These results indicate that combining workflow automation with multimodal AI provides an effective and adaptive solution for automatic transaction recording.
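As a reading aid, the four reported metrics can be derived from confusion-matrix counts as in the minimal sketch below. The function name `classification_metrics` and the split of the 33 scenarios into true positives and true negatives are assumptions made for illustration; the abstract states only that no outputs were misclassified.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical split of the 33 scenarios (not given in the source):
# 30 valid inputs correctly processed, 3 invalid inputs correctly rejected.
metrics = classification_metrics(tp=30, fp=0, fn=0, tn=3)
print(metrics)
```

With zero false positives and zero false negatives, all four metrics equal 1.0 regardless of how the 33 scenarios divide between the positive and negative classes, which is consistent with the 100% figures reported.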
Copyrights © 2025