Attention correlated appearance and motion feature followed temporal learning for activity recognition
Manh-Hung Ha; The-Anh Pham; Dao Thi Thanh; Van Luan Tran
International Journal of Electrical and Computer Engineering (IJECE) Vol 13, No 2: April 2023
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijece.v13i2.pp1510-1521

Abstract

Recent advances in deep neural networks have achieved fairly good accuracy in multi-class activity identification. However, existing methods are limited in capturing complex spatial-temporal dependencies. In this work, we design a two-stream fusion attention (2SFA) module connected to a one-layer temporal bidirectional gated recurrent unit (GRU) model, with classification by a prediction voting classifier (PVC), to recognize the action in a video. In the proposed deep neural network (DNN), 2SFA captures appearance information from red-green-blue (RGB) frames and motion from optical flow, and the two streams are correlated by the proposed fusion attention (FA) to form the input of a temporal network. The temporal network uses a single bidirectional GRU layer, which is preferred for temporal understanding because it showed practical advantages when compared against six temporal network topologies on the UCF101 dataset. Meanwhile, the newly proposed classifier scheme, PVC, applies multiple nearest class mean (NCM) classifiers and the SoftMax function to the multiple features output by the temporal network, and then votes over their predictions for high-performance classification. The experiments achieve the best average accuracy of 70.8% on HMDB51 and 91.9%, the second best, on UCF101 among 2D ConvNet methods for action recognition.
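
The voting idea described in the abstract above can be illustrated with a small, self-contained sketch. It is not the authors' PVC: the abstract does not state the voting rule, so simple majority voting is assumed here, and the helper names (`class_means`, `ncm_predict`, `vote`) and the toy data are hypothetical.

```python
import math
from collections import Counter

def class_means(features, labels):
    """Average the feature vectors of each class."""
    sums, counts = {}, Counter(labels)
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def ncm_predict(x, means):
    """Nearest class mean: pick the class whose mean is closest to x."""
    return min(means, key=lambda y: math.dist(x, means[y]))

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def vote(predictions):
    """Majority vote (an assumption); ties go to the earliest prediction."""
    counts = Counter(predictions)
    best = max(counts.values())
    return next(p for p in predictions if counts[p] == best)

# Toy usage: two hypothetical feature views of one clip plus softmax logits.
train_x = [[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [0.8, 1.0]]
train_y = ["walk", "walk", "run", "run"]
means = class_means(train_x, train_y)
views = [[0.9, 0.9], [0.7, 0.8]]
logits = {"walk": 0.3, "run": 1.2}
probs = softmax(list(logits.values()))
softmax_label = list(logits)[probs.index(max(probs))]
label = vote([ncm_predict(v, means) for v in views] + [softmax_label])
```

Each NCM classifier and the softmax head contributes one predicted label, and the final label is whichever class receives the most votes.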

Plant pathology identification using local-global feature level based on transformer
Manh-Hung Ha; Duc-Chinh Nguyen; Manh-Tuan Do; Dinh-Thai Kim; Xuan-Hai Le; Ngoc-Thanh Pham
Indonesian Journal of Electrical Engineering and Computer Science Vol 34, No 3: June 2024
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijeecs.v34.i3.pp1582-1592

Abstract

Deep learning plays a crucial role in addressing the challenge of plant disease identification in agriculture. Detecting plant diseases requires extensive effort, a comprehensive understanding of various plant diseases, and considerable processing time. Balancing speed and accuracy in predicting leaf diseases can significantly improve crop production and reduce environmental damage. In this paper, we examine diseases of popular agricultural plants. We propose a novel model that predicts crop pathology on a global-local feature space based on transformer aggregation. In particular, we use refined features from different layers to correlate the semantics of high-level and low-level features. In addition, to capture the extended temporal scale across the entire image, we employ a transformer to discern long-range dependencies among frames. The enhanced features incorporating these dependencies are then fed into a classifier for preliminary crop pathology prediction. The PlantVillage dataset and the VietNam strawberry disease (VNStr) dataset were used for training and disease classification in the experiments. Extensive experiments show that the proposed method achieves 99.18% and 94.05% accuracy on PlantVillage and VNStr, respectively. After evaluation, the model was deployed on Android devices, making it easy to use.
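
As a rough illustration of how a transformer relates distant features, the sketch below implements plain scaled dot-product self-attention over a list of feature vectors. It is an assumption-laden simplification and not the paper's model: the learned query, key, and value projections of a real transformer layer are omitted, and the function name and toy patch features are hypothetical.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Scaled dot-product self-attention with identity projections.

    tokens: list of equal-length feature vectors, e.g. one per image patch.
    Every output vector is a weighted average of all input vectors, so each
    position can draw on features from anywhere in the input.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        # Similarity of this token to every token, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return out

# Toy usage with three hypothetical 2-D patch features.
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = self_attention(patches)
```

Because the attention weights for each token sum to one, each output is a convex combination of the inputs, which is what lets low-level and high-level cues from different locations be mixed.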

Top-Heavy CapsNets Based on Spatiotemporal Non-Local for Action Recognition
Manh-Hung Ha
Journal of Computing Theories and Applications Vol. 2 No. 1 (2024): JCTA 2(1) 2024
Publisher : Universitas Dian Nuswantoro

DOI: 10.62411/jcta.10551

Abstract

To effectively comprehend human actions, we have developed a Deep Neural Network (DNN) that utilizes inner spatiotemporal non-locality to capture meaningful semantic context for efficient action identification. This work introduces the Top-Heavy CapsNet as a novel approach for video analysis, incorporating a 3D Convolutional Neural Network (3DCNN) to apply the thematic actions of local classifiers for effective classification based on motion from the spatiotemporal context in videos. The DNN comprises multiple layers, including a 3DCNN, a Spatial Depth-Based Non-Local (SBN) layer, and a Deep Capsule network (DCapsNet). First, the 3DCNN extracts structured and semantic information from RGB and optical flow streams. Second, the SBN layer processes feature blocks with spatial depth to emphasize visually advantageous cues, potentially aiding in action differentiation. Finally, DCapsNet exploits vectorized prominent features more effectively to represent objects and various action features for the final label determination. Experimental results demonstrate that the proposed DNN achieves an average accuracy of 97.6%, surpassing conventional DNNs on the traffic police dataset. Furthermore, the proposed DNN attains average accuracies of 98.3% and 80.7% on the UCF101 and HMDB51 datasets, respectively. This underscores the applicability of the proposed DNN for effectively recognizing diverse actions performed by subjects in videos.
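
Capsule layers represent features as vectors whose length can act as a confidence score. A minimal sketch of the standard "squash" nonlinearity used in capsule networks is given below; whether DCapsNet uses exactly this form is an assumption, since the abstract does not say, and the `eps` parameter is added here only to avoid division by zero.

```python
import math

def squash(s, eps=1e-9):
    """Standard capsule squash: keep the direction of s, map its length into [0, 1).

    v = (|s|^2 / (1 + |s|^2)) * (s / |s|)
    """
    norm_sq = sum(v * v for v in s)
    norm = math.sqrt(norm_sq)
    scale = norm_sq / (1.0 + norm_sq) / (norm + eps)
    return [scale * v for v in s]

# Toy usage: a long vector keeps its direction but its length shrinks below 1.
capsule = squash([3.0, 4.0])
length = math.sqrt(sum(v * v for v in capsule))
```

Short input vectors are shrunk toward zero and long ones approach unit length, so the output length can be read as the strength of the detected feature.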