A multi-modal or multi-view dataset is captured from various sources (e.g., RGB and depth) of the same subject at the same time. Combining different cues still faces many challenges, such as heterogeneous data and complementary information. In addition, existing methods for multi-modal recognition typically consist of discrete blocks: extracting features from separate data streams, combining the features, and classifying gestures. To address these challenges, we propose two novel end-to-end hand posture recognition frameworks that integrate all steps, from capturing various types of cues (RGB and depth images) to classifying hand gesture labels, into a single convolutional neural network (CNN) system. Both frameworks use a ResNet50 backbone pretrained on the ImageNet dataset, and are named the attention convolution module (ACM) and the gated concatenation module (GCM). Both are deployed, evaluated, and compared on various multi-modal hand posture datasets. Experimental results show that our proposed methods outperform state-of-the-art (SOTA) techniques.
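The abstract does not detail the internals of the gated concatenation module; as a minimal NumPy sketch of one plausible form of gated concatenation, the per-modality backbone features are concatenated and then scaled by a learned sigmoid gate (the gate parameters `W_g`, `b_g` and feature dimension `d` here are hypothetical, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_concat(rgb_feat, depth_feat, W_g, b_g):
    """Fuse two modalities by gating their concatenated features.

    rgb_feat, depth_feat: (batch, d) backbone feature vectors.
    W_g: (2d, 2d) gate weights, b_g: (2d,) gate bias -- hypothetical
    learned parameters standing in for the paper's GCM.
    """
    fused = np.concatenate([rgb_feat, depth_feat], axis=1)  # (batch, 2d)
    gate = sigmoid(fused @ W_g + b_g)                       # per-feature weights in (0, 1)
    return gate * fused                                     # gated concatenation

# Toy features standing in for ResNet50 outputs on RGB and depth inputs.
d = 4
rgb = rng.standard_normal((2, d))
depth = rng.standard_normal((2, d))
W_g = rng.standard_normal((2 * d, 2 * d)) * 0.1
b_g = np.zeros(2 * d)
out = gated_concat(rgb, depth, W_g, b_g)
print(out.shape)  # (2, 8): both modalities fused into one vector per sample
```

Because the gate lies in (0, 1), each fused feature is attenuated relative to the raw concatenation, letting the network learn how much each modality contributes before classification.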
Copyright © 2022