Introduction to ROS2 Navigation and SLAM


This application note gives a quick introduction to navigation concepts and to the high-level architecture of the ROS2 navigation stack (Nav2), which is used in the i.MX 8MP NavQ+ software distribution. For an in-depth description, please read the Nav2 documentation.

The application note assumes that the reader has basic familiarity with the key concepts of the Robot Operating System 2 (ROS2), such as nodes, services, actions, and topics. For a more in-depth description, please read the ROS2 documentation.

Basic Navigation Concepts

The Nav2 navigation stack consists of these basic parts:

  1. Behavior Trees.

  2. Navigation Servers.

  3. State Estimation Components.

  4. Environmental Representation Components.

Behavior Trees

Behavior Trees provide a formal structure for navigation logic. They define which actions a robot should take in each situation it encounters while navigating to a target. As a high-level component of the navigation stack, Behavior Trees map navigation actions to the corresponding navigation servers.

For example, a Behavior Tree can define logic that sequences actions: compute a path to the goal position, navigate along the computed path, run recovery actions if the robot gets stuck, and recompute the path if recovery doesn’t help or a new navigation goal has been activated.
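The sequencing described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the Nav2 behavior tree implementation: the node classes (`Action`, `Sequence`, `Recovery`), the `build_demo_tree` helper, and the simulated "stuck on the first attempt" behavior are all hypothetical names and assumptions made for this example.

```python
from typing import Callable, List

SUCCESS, FAILURE = "SUCCESS", "FAILURE"


class Action:
    """Leaf node: wraps a callable that returns SUCCESS or FAILURE."""

    def __init__(self, name: str, fn: Callable[[], str]):
        self.name = name
        self.fn = fn

    def tick(self) -> str:
        return self.fn()


class Sequence:
    """Ticks children in order and fails as soon as one child fails."""

    def __init__(self, children: List):
        self.children = children

    def tick(self) -> str:
        for child in self.children:
            if child.tick() == FAILURE:
                return FAILURE
        return SUCCESS


class Recovery:
    """Ticks `main`; on failure runs `recovery` and retries `main`."""

    def __init__(self, main, recovery, retries: int):
        self.main = main
        self.recovery = recovery
        self.retries = retries

    def tick(self) -> str:
        attempts = 0
        while True:
            if self.main.tick() == SUCCESS:
                return SUCCESS
            if attempts >= self.retries:
                return FAILURE
            attempts += 1
            if self.recovery.tick() == FAILURE:
                return FAILURE


def build_demo_tree(log: List[str]) -> Recovery:
    """Builds a tree whose path following fails once, then succeeds."""
    state = {"follow_attempts": 0}

    def compute_path() -> str:
        log.append("compute_path")
        return SUCCESS

    def follow_path() -> str:
        state["follow_attempts"] += 1
        log.append("follow_path")
        # Assumption for the demo: the robot is stuck on the first attempt only.
        return FAILURE if state["follow_attempts"] == 1 else SUCCESS

    def back_up() -> str:
        log.append("back_up")
        return SUCCESS

    navigate = Sequence([
        Action("compute_path", compute_path),
        Action("follow_path", follow_path),
    ])
    return Recovery(navigate, Action("back_up", back_up), retries=2)


if __name__ == "__main__":
    log: List[str] = []
    print(build_demo_tree(log).tick(), log)
```

In this sketch the tree only decides what to do next; the leaf callables stand in for requests that a real tree would send to the planner, controller, and recovery servers.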

Navigation Servers

Navigation servers are the heart of the navigation stack. The navigation pipeline distinguishes navigation servers by purpose:

  • Planners

  • Controllers

  • Recovery behaviors

The navigation stack implements more navigation servers, but they are intentionally omitted here to keep the introduction simple.

The planner’s main goal is to compute paths. Depending on the use case, these can be different types of paths. An example is a path that allows a robot to move from its current position to the navigation goal. The navigation stack provides several ready-to-use planners implementing various planning algorithms.

The controller’s main goal is to follow the path previously computed by the planner. The navigation stack provides several ready-to-use controllers suitable for various robot types.

Because the environment changes, following the path exactly is not always possible. When the controller can’t follow the path, it reports an error, which triggers recovery behaviors such as recomputing the path from the current location, spinning the robot in place, or backing it up a short distance.

State Estimation Components

The navigation stack operates on positioning data expressed in different coordinate frames. A frame can be thought of as a coordinate system with a unique name and a defined coordinate transformation relative to another frame, called its parent frame. Frames therefore form hierarchies, which makes it possible to transform coordinates from one frame to another by applying the transformation of each linked frame on the path between the two.
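A minimal sketch of such a frame hierarchy in 2D follows. It is not the ROS2 tf2 API: the `FRAMES` table, the `laser` frame, all numeric poses, and the helper names are assumptions chosen for illustration. For brevity it only resolves a frame against the root of the hierarchy rather than between two arbitrary frames.

```python
import math
from typing import Dict, Tuple

Pose2D = Tuple[float, float, float]  # x, y, yaw in radians


def compose(a: Pose2D, b: Pose2D) -> Pose2D:
    """Chains two transforms: first b (child), then a (parent)."""
    ax, ay, ayaw = a
    bx, by, byaw = b
    return (
        ax + math.cos(ayaw) * bx - math.sin(ayaw) * by,
        ay + math.sin(ayaw) * bx + math.cos(ayaw) * by,
        ayaw + byaw,
    )


def apply(t: Pose2D, point: Tuple[float, float]) -> Tuple[float, float]:
    """Transforms a point from the child frame into the parent frame."""
    x, y, _ = compose(t, (point[0], point[1], 0.0))
    return (x, y)


# Hypothetical hierarchy: child frame -> (parent frame, pose of child in parent).
FRAMES: Dict[str, Tuple[str, Pose2D]] = {
    "odom": ("map", (1.0, 0.0, 0.0)),
    "base_link": ("odom", (2.0, 0.0, math.pi / 2)),
    "laser": ("base_link", (0.1, 0.0, 0.0)),
}


def pose_in_root(frame: str, frames: Dict[str, Tuple[str, Pose2D]]) -> Pose2D:
    """Walks from `frame` up to the root, composing each link on the way."""
    pose: Pose2D = (0.0, 0.0, 0.0)
    while frame in frames:
        parent, child_in_parent = frames[frame]
        pose = compose(child_in_parent, pose)
        frame = parent
    return pose


if __name__ == "__main__":
    laser_in_map = pose_in_root("laser", FRAMES)
    # A point 1 m in front of the (hypothetical) laser, expressed in the root frame.
    print(apply(laser_in_map, (1.0, 0.0)))
```

Each link stores only the transform to its direct parent, so changing one link (for example, moving the robot) automatically changes the resolved pose of every frame below it.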

There are three main coordinate frames which have special meaning in the navigation stack:

  • base_link - a coordinate frame rigidly attached to the robot base. The robot’s base is always at the origin of this coordinate system.

  • odom - a fixed-world coordinate system. The position of a mobile platform in the odom frame can drift over time without bounds. This drift makes the odom frame useless as a long-term global reference.

  • map - a fixed-world coordinate system. The position of a mobile platform, relative to the map frame, should not significantly drift over time.

The main coordinate frames effectively split the state estimation components into two main parts:

  • Localization, or the global positioning system, which provides the map -> odom coordinate transformation.

  • Odometry, or the local positioning system, which provides the odom -> base_link coordinate transformation.
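The relationship between the two transformations can be sketched as follows. Given the robot pose reported by odometry (odom -> base_link) and the robot pose estimated by localization in the map frame, the map -> odom transformation is the one that makes the two agree. The function names and all numeric poses below are assumptions for illustration only.

```python
import math
from typing import Tuple

Pose2D = Tuple[float, float, float]  # x, y, yaw in radians


def compose(a: Pose2D, b: Pose2D) -> Pose2D:
    """Chains two 2D transforms: first b, then a."""
    ax, ay, ayaw = a
    bx, by, byaw = b
    return (
        ax + math.cos(ayaw) * bx - math.sin(ayaw) * by,
        ay + math.sin(ayaw) * bx + math.cos(ayaw) * by,
        ayaw + byaw,
    )


def inverse(t: Pose2D) -> Pose2D:
    """Inverts a 2D rigid transform."""
    x, y, yaw = t
    c, s = math.cos(yaw), math.sin(yaw)
    return (-c * x - s * y, s * x - c * y, -yaw)


def map_to_odom(map_to_base: Pose2D, odom_to_base: Pose2D) -> Pose2D:
    """Correction that reconciles odometry with the localization estimate."""
    return compose(map_to_base, inverse(odom_to_base))


if __name__ == "__main__":
    odom_to_base = (2.0, 1.0, 0.30)   # hypothetical drifting odometry pose
    map_to_base = (2.5, 0.8, 0.35)    # hypothetical localization estimate
    correction = map_to_odom(map_to_base, odom_to_base)
    # Chaining map -> odom with odom -> base_link reproduces the map pose.
    print(correction, compose(correction, odom_to_base))
```

With this split, odometry can keep publishing a smooth local pose at a high rate, while localization only needs to update the slowly changing correction.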

Global positioning can use GPS, SLAM (Simultaneous Localization and Mapping), motion capture, and other methods. The navigation stack provides amcl, an Adaptive Monte Carlo Localization implementation that uses a particle filter to localize the robot in a static map. It also provides SLAM Toolbox as the default SLAM algorithm for localizing the robot and generating a static map.

Local positioning (odometry) can come from many sources, including LIDAR, RADAR, wheel encoders, VIO (Visual Inertial Odometry), IMUs (Inertial Measurement Units), depth sensors, or RGB cameras. The goal of odometry is to provide a smooth and continuous local frame based on robot motion. The navigation stack provides components for working with many popular hardware options. VoxelBotics provides a software component for working with the PMD Flexx2 depth cameras.
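To give a feel for particle-filter localization, here is a minimal one-dimensional sketch. It is not the amcl implementation: the corridor world, the single range sensor pointing at a wall, the noise values, and the fixed particle count (no adaptation of the number of particles) are all assumptions made for this example.

```python
import math
import random
from typing import List, Tuple


def mcl_step(particles: List[float], control: float, measurement: float,
             wall_x: float, motion_noise: float, sensor_noise: float,
             rng: random.Random) -> List[float]:
    """One predict / update / resample cycle of a 1D particle filter."""
    # 1. Predict: move every particle by the commanded displacement plus noise.
    moved = [p + control + rng.gauss(0.0, motion_noise) for p in particles]
    # 2. Update: weight each particle by how well it explains the range reading.
    weights = [
        math.exp(-0.5 * ((measurement - (wall_x - p)) / sensor_noise) ** 2)
        for p in moved
    ]
    if sum(weights) == 0.0:
        return moved  # no particle explains the reading; keep the prediction
    # 3. Resample: draw particles in proportion to their weights.
    return rng.choices(moved, weights=weights, k=len(moved))


def run_demo(seed: int = 0) -> Tuple[float, float]:
    """Returns (true position, estimated position) after ten steps."""
    rng = random.Random(seed)
    wall_x = 10.0          # hypothetical wall the range sensor points at
    true_x = 2.0           # unknown to the filter
    particles = [rng.uniform(0.0, 10.0) for _ in range(500)]
    for _ in range(10):
        true_x += 0.5
        reading = (wall_x - true_x) + rng.gauss(0.0, 0.1)
        particles = mcl_step(particles, 0.5, reading, wall_x, 0.05, 0.1, rng)
    estimate = sum(particles) / len(particles)
    return true_x, estimate


if __name__ == "__main__":
    print(run_demo())
```

The particles start spread over the whole corridor and collapse around the positions that are consistent with the sensor readings, which is the same principle a 2D particle filter applies to laser scans and a static map.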

Environmental Representation

The environmental representation is the way the robot perceives its environment. It also acts as the central place where various algorithms and data sources combine their information into a single space. This space is then used by the controllers, planners, and recoveries to compute their tasks safely and efficiently.

The current environmental representation is a costmap: a regular 2D grid of cells, each holding a cost that marks the cell as unknown, free, occupied, or inflated. The costmap is then searched to compute a global plan (by the planner) or sampled to compute local control efforts (by the controller).
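As an illustration of how a costmap can be searched for a global plan, the sketch below runs Dijkstra's algorithm over a small grid. It is not one of the Nav2 planners: the cost values (0 for free, 254 for occupied, 255 for unknown, values in between for inflated cells), the 4-connected neighborhood, and the example grid are assumptions made for this example.

```python
import heapq
from typing import Dict, List, Optional, Tuple

FREE, LETHAL, UNKNOWN = 0, 254, 255  # assumed cost convention for this sketch

Cell = Tuple[int, int]  # (row, col)


def plan(costmap: List[List[int]], start: Cell, goal: Cell) -> Optional[List[Cell]]:
    """Dijkstra over a 4-connected grid; returns a list of cells or None."""
    rows, cols = len(costmap), len(costmap[0])
    best: Dict[Cell, float] = {start: 0.0}
    parent: Dict[Cell, Cell] = {}
    queue = [(0.0, start)]
    while queue:
        cost, cell = heapq.heappop(queue)
        if cell == goal:
            path = [cell]
            while cell in parent:
                cell = parent[cell]
                path.append(cell)
            return path[::-1]
        if cost > best[cell]:
            continue  # stale queue entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            cell_cost = costmap[nr][nc]
            if cell_cost >= LETHAL:
                continue  # occupied or unknown cells are not traversable
            # One unit per step, plus a penalty for entering inflated cells.
            new_cost = cost + 1.0 + cell_cost / LETHAL
            if new_cost < best.get((nr, nc), float("inf")):
                best[(nr, nc)] = new_cost
                parent[(nr, nc)] = cell
                heapq.heappush(queue, (new_cost, (nr, nc)))
    return None


# Hypothetical costmap: a two-cell wall (254) surrounded by inflated cells (128).
COSTMAP = [
    [0,   0,   0,   0, 0],
    [0, 128, 254, 128, 0],
    [0, 128, 254, 128, 0],
    [0,   0,   0,   0, 0],
]

if __name__ == "__main__":
    print(plan(COSTMAP, (2, 0), (2, 4)))
```

Because inflated cells carry an extra penalty, the search prefers a route that keeps some distance from the wall even when a route hugging the wall has the same number of steps.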

Various costmap layers are implemented as plugins to buffer information into the costmap. This includes information from LIDAR, RADAR, sonar, depth images, etc.

Costmap layers can be created to detect and track obstacles in the scene for collision avoidance using camera or depth sensors. Additionally, layers can be created to algorithmically change the underlying costmap based on some rule or heuristic. Finally, they may be used to buffer live data into the 2D or 3D world for binary obstacle marking.
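The layering idea can be sketched as a sequence of functions that each update a shared master grid. This is not the Nav2 costmap plugin interface: the layer functions, the cost values, and the constant inflation cost are assumptions for illustration (real inflation layers typically decay the cost with distance from the obstacle).

```python
from typing import Iterable, List, Tuple

FREE, INFLATED, LETHAL = 0, 128, 254  # assumed cost values for this sketch

Grid = List[List[int]]


def static_layer(master: Grid, static_map: Grid) -> None:
    """Copies costs from a previously built map, keeping the higher cost."""
    for r, row in enumerate(static_map):
        for c, value in enumerate(row):
            master[r][c] = max(master[r][c], value)


def obstacle_layer(master: Grid, hits: Iterable[Tuple[int, int]]) -> None:
    """Marks cells in which a sensor currently reports an obstacle."""
    for r, c in hits:
        master[r][c] = LETHAL


def inflation_layer(master: Grid, radius: int) -> None:
    """Raises the cost of cells within `radius` cells of any lethal cell."""
    rows, cols = len(master), len(master[0])
    lethal = [(r, c) for r in range(rows) for c in range(cols)
              if master[r][c] == LETHAL]
    for r, c in lethal:
        for nr in range(max(0, r - radius), min(rows, r + radius + 1)):
            for nc in range(max(0, c - radius), min(cols, c + radius + 1)):
                if master[nr][nc] < INFLATED:
                    master[nr][nc] = INFLATED


def build_costmap(static_map: Grid, hits: Iterable[Tuple[int, int]],
                  radius: int) -> Grid:
    """Applies the layers in order to a fresh master grid."""
    master = [[FREE] * len(static_map[0]) for _ in static_map]
    static_layer(master, static_map)
    obstacle_layer(master, hits)
    inflation_layer(master, radius)
    return master


if __name__ == "__main__":
    empty_map = [[FREE] * 5 for _ in range(5)]
    for row in build_costmap(empty_map, hits=[(2, 2)], radius=1):
        print(row)
```

The order of the layers matters in this sketch: inflation runs last so that it also pads obstacles that were added by the sensor layer rather than only those in the static map.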

Simultaneous Localization and Mapping (SLAM)

Simultaneous localization and mapping (SLAM) solves the computational problem of constructing or updating a map of an unknown environment while simultaneously keeping track of the location of a mobile platform (i.e. the robot) within it.

SLAM is required where a mobile platform operates in an environment for which the map is partial or entirely absent.

SLAM can also be used where a mobile platform operates in an environment whose map is known a priori; however, these use cases are also suited to other localization methods such as AMCL.

SLAM is applicable to both 2D and 3D motion. In the latter case it is commonly known as 3D SLAM; camera-based variants are known as visual SLAM (vSLAM).

The navigation stack provides SLAM Toolbox as the default 2D SLAM implementation.

SLAM Toolbox operates on data acquired from LIDARs, which commonly deliver 360° laser scans with Time of Flight (ToF) measurements across the full rotation. SLAM Toolbox uses these scans to build the map and to localize the mobile platform on the map by matching newly read laser scans against the stored map, thus correcting the odom position drift, which may become substantial over time.
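The idea of correcting a drifted pose by matching a scan against a stored map can be sketched with a brute-force search. This is not the matching algorithm used by SLAM Toolbox: the rectangular room, the simulated 360-beam scan, the translation-only search window, and all function names are assumptions made for this example.

```python
import math
from typing import List, Set, Tuple

Pose2D = Tuple[float, float, float]  # x, y, yaw in radians


def simulate_scan(pose: Pose2D, x_min: float, x_max: float,
                  y_min: float, y_max: float, beams: int = 360) -> List[float]:
    """Ranges from `pose` to the walls of an axis-aligned rectangular room."""
    x, y, yaw = pose
    ranges = []
    for i in range(beams):
        a = yaw + 2.0 * math.pi * i / beams
        ca, sa = math.cos(a), math.sin(a)
        tx = (x_max - x) / ca if ca > 0 else (x_min - x) / ca if ca < 0 else math.inf
        ty = (y_max - y) / sa if sa > 0 else (y_min - y) / sa if sa < 0 else math.inf
        ranges.append(min(tx, ty))
    return ranges


def score(ranges: List[float], pose: Pose2D, occupied: Set[Tuple[int, int]],
          resolution: float) -> int:
    """Counts scan end points that land in occupied cells of the stored map."""
    x, y, yaw = pose
    hits = 0
    for i, r in enumerate(ranges):
        a = yaw + 2.0 * math.pi * i / len(ranges)
        cell = (math.floor((x + r * math.cos(a)) / resolution),
                math.floor((y + r * math.sin(a)) / resolution))
        hits += cell in occupied
    return hits


def match(ranges: List[float], guess: Pose2D, occupied: Set[Tuple[int, int]],
          resolution: float, window: int = 3) -> Tuple[Pose2D, int]:
    """Tries translations around `guess` and keeps the best-scoring pose."""
    gx, gy, gyaw = guess
    best_pose, best_score = guess, -1
    for dx in range(-window, window + 1):
        for dy in range(-window, window + 1):
            candidate = (gx + dx * resolution, gy + dy * resolution, gyaw)
            s = score(ranges, candidate, occupied, resolution)
            if s > best_score:
                best_pose, best_score = candidate, s
    return best_pose, best_score


def demo() -> Tuple[Pose2D, int]:
    resolution = 0.1
    # Stored map: one-cell-thick walls of a 41 x 31 cell room.
    occupied = {(i, j) for i in range(41) for j in range(31)
                if i in (0, 40) or j in (0, 30)}
    true_pose = (1.5, 1.0, 0.0)
    ranges = simulate_scan(true_pose, 0.05, 4.05, 0.05, 3.05)
    drifted_guess = (1.7, 0.9, 0.0)  # hypothetical pose reported by odometry
    return match(ranges, drifted_guess, occupied, resolution)


if __name__ == "__main__":
    print(demo())
```

A full 360° scan sees walls in every direction, so only one translation lines all of them up with the map; a scan that covers a single wall would match equally well at many positions along that wall.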

Due to implementation specifics, SLAM Toolbox operates poorly with narrow-angle laser scans, which produce many false scan matches. For this reason, using a PMD Flexx2 camera, with a field of view only 56° wide, as the scan source would render navigation impossible.

The navigation stack doesn’t provide a default 3D SLAM implementation at the moment. Although several ROS2-compatible 3D SLAM implementations are available, none of them is mature enough to be included in the navigation stack.

Using SLAM on NavQ+

For NavQ+, the software release provides ready-to-use SLAM-based navigation built on the navigation stack, SLAM Toolbox, the iRobot Create3 mobile platform, and the Slamtec RPLIDAR A1.

In addition, the software release provides a ready-to-use ROS2 node for the PMD Flexx2 camera which, once integrated, improves the obstacle avoidance of the mobile platform in cases where the LIDAR can’t “see” obstacles below the scanned plane.

The following application notes describe how to prepare the NavQ+, the iRobot Create3, and the other components for SLAM-based navigation with and without the PMD Flexx2 camera.

Related Topics