Visual simultaneous localization and mapping (Visual SLAM) based on RGB-D images comprises two main tasks: building a map of the environment and simultaneously tracking the location and motion trajectory of the image sensor, the latter also called visual odometry (VO). Visual SLAM and VO are used in many applications, such as robotic systems, autonomous mobile robots, assistive systems for the blind, human-machine interaction, and industry. With the rapid development of deep learning (DL), DL has been applied to building Visual SLAM and VO systems from image sensor data (RGB-D images) and has produced impressive results. To provide an overall picture of this development, and to summarize the results, challenges, and advantages of DL models for solving Visual SLAM and VO problems, this paper proposes a taxonomy for a complete survey organized around three approaches based on RGB-D images: (1) using DL for the modules of the Visual SLAM and VO framework (depth estimation, optical flow estimation, visual odometry, mapping, and loop closure detection); (2) using DL modules to supplement the Visual SLAM and VO framework (feature extraction, semantic segmentation, pose estimation, map construction, loop closure detection, and other modules); (3) using end-to-end DL to build Visual SLAM and VO systems. The surveyed studies are presented in order of methods, datasets, and evaluation measures, and detailed results on each dataset are also reported. In particular, the challenges faced by studies using DL to build Visual SLAM and VO systems are analyzed, and some of our planned further studies are introduced.
Copyright © 2024