A significant number of machine learning techniques use convolutional neural networks (CNNs). Hardware acceleration is essential because of the heavy computational demands of CNNs and the need for better energy efficiency and lower response time. This manuscript presents a resource-optimized CNN-based hardware accelerator on a field-programmable gate array (FPGA) platform. The design uses the LeNet-5 architecture to classify handwritten digits from the MNIST dataset. The CNN accelerator applies three optimization approaches to reduce hardware resource usage: fixed-point (FP) data representation with shortened bit widths, approximate multiply-accumulate (MAC) operations, and loop unrolling combined with pipelining. These approaches improve the latency, resource utilization, and overall performance of the CNN architecture. The CNN-based accelerator is designed and implemented on the Xilinx Zynq platform with a high-level synthesis (HLS) tool. The proposed MAC unit obtains a latency of 3.58 ms at an on-chip frequency of 120.69 MHz. The design uses only 96 block random access memories (BRAMs) and 106 digital signal processing (DSP) units, and achieves an accuracy of 99.01%. The proposed accelerator improves on state-of-the-art accelerators in terms of area, power, and accuracy.
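The first two optimizations, shortened fixed-point data and approximate MAC operations, can be illustrated with a small software model. The sketch below is only an illustration under stated assumptions: the abstract does not give the bit widths or the approximation scheme, so the 16-bit format with 8 fractional bits, the truncation of low-order product bits, and the function names `to_fixed`, `to_float`, and `mac_fixed` are all hypothetical choices. Loop unrolling and pipelining are HLS directives and are not modeled here.

```python
def to_fixed(x: float, frac_bits: int = 8, total_bits: int = 16) -> int:
    """Quantize a real value to a signed fixed-point integer, saturating on overflow.

    The 16-bit width with 8 fractional bits is an assumed format, not the paper's.
    """
    scaled = int(round(x * (1 << frac_bits)))
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, scaled))


def to_float(q: int, frac_bits: int = 8) -> float:
    """Convert a fixed-point integer back to a real value."""
    return q / (1 << frac_bits)


def mac_fixed(weights, activations, frac_bits: int = 8, drop_bits: int = 0) -> int:
    """Multiply-accumulate over fixed-point inputs.

    Each product carries 2 * frac_bits fractional bits. Setting drop_bits > 0
    zeroes the lowest bits of every product, one simple (assumed) way to
    approximate the multiplier. The sum is rescaled to frac_bits at the end.
    """
    acc = 0
    for w, a in zip(weights, activations):
        prod = w * a
        prod = (prod >> drop_bits) << drop_bits  # truncate low-order bits
        acc += prod
    return acc >> frac_bits
```

For example, calling `mac_fixed` on the quantized weights `[0.5, -0.25]` and activations `[2.0, 1.0]` and passing the result through `to_float` recovers the exact dot product for these values, since all of them are representable in the assumed format. Raising `drop_bits` trades accuracy for a narrower multiplier, which is the general idea behind approximate MAC units.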