What is SLAM?

SLAM demo animation.
SLAM stands for Simultaneous Localization and Mapping: the computational problem of constructing or updating a map of an unknown environment while simultaneously keeping track of an agent's location within it.

Applications

Object Detection
Parking Lot Annotation
Lane Annotation
Lane Reprojection
HD Map [source]

SLAM has various applications, including:

Object Detection

Undistorted LiDAR point clouds are fed into 3D object detection models to identify objects such as vehicles and pedestrians within the environment.

Parking Lot Annotation

First, SLAM is used to acquire a map of the parking lot. A rendering method then generates a ground-level map to allow for more accurate annotation.

Lane Annotation

Similar to parking lot annotation, SLAM creates a high-resolution ground map, which is then used to accurately annotate traffic lanes and road markings.

High-Definition (HD) Map Creation

SLAM is a foundational technology for building HD maps. The dense 3D point cloud of the environment is processed to extract detailed information about road geometry, signs, and other critical features.

Visual SLAM vs. LiDAR SLAM

Visual SLAM (left) vs. LiDAR SLAM (right)
| Feature | Visual SLAM | LiDAR SLAM |
| --- | --- | --- |
| Hardware Cost | Cheap | Expensive |
| Algorithm | Complex | Simple |
| Geometry Accuracy | No accurate geometry | Accurate geometry |

Basics of LiDAR

Different types of LiDAR: mechanical, solid-state, and hybrid [source]

There are different types of LiDAR technologies:

  • Mechanical LiDAR

    • Pros: 360-degree field of view, high performance, and lower cost
    • Cons: Wear and failure over time
    • Use cases: Data collection, testing, robotaxis
  • Solid-State LiDAR

    • Pros: Simple, cheap, reliable
    • Cons: Low power density, short detection distance
    • Use cases: Near-range blind-spot LiDAR
  • Hybrid LiDAR

    • Pros: Low cost, highly reliable, long-distance detection
    • Cons: Limited field of view
    • Use cases: Front-facing LiDAR

Problem Formulation

The core problems in SLAM can be formulated as optimization problems:

  • Mapping

    $$\underset{\mathbf{m}}{\text{min}} \sum_{t=1}^{T}||\mathbf{z}_{t}-h(\mathbf{x}_{t},\mathbf{m})||_{2}^{2}$$
  • Localization

    $$\underset{\mathbf{x}_{1:T}}{\text{min}} \sum_{t=1}^{T}||\mathbf{z}_{t}-h(\mathbf{x}_{t},\mathbf{m})||_{2}^{2}$$
  • SLAM

    $$\underset{\mathbf{x}_{1:T},\mathbf{m}}{\text{min}} \sum_{t=1}^{T}||\mathbf{z}_{t}-h(\mathbf{x}_{t},\mathbf{m})||_{2}^{2}$$

Where:

  • $\mathbf{x}$ is the pose of the robot.
  • $\mathbf{z}$ is the observation.
  • $h(\cdot)$ is the observation function.
  • $\mathbf{m}$ is the map.
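
To make the notation concrete, here is a minimal Python sketch (a toy 2D setting, not any particular library) that evaluates the SLAM least-squares cost, assuming the map $\mathbf{m}$ is a set of point landmarks and $h$ returns a landmark's position in the robot frame; the function names are illustrative.

```python
import numpy as np

def h(pose, landmark):
    """Observation model h(x, m): a 2D landmark seen from the robot frame."""
    x, y, theta = pose
    c, s = np.cos(theta), np.sin(theta)
    dx, dy = landmark[0] - x, landmark[1] - y
    # Rotate the world-frame offset by -theta into the robot frame.
    return np.array([c * dx + s * dy, -s * dx + c * dy])

def slam_cost(poses, landmarks, observations):
    """Sum over t of ||z_t - h(x_t, m)||_2^2.

    observations: iterable of (t, j, z) -- at time t the robot observed
    landmark j at position z in its own frame.
    """
    cost = 0.0
    for t, j, z in observations:
        r = z - h(poses[t], landmarks[j])
        cost += float(r @ r)
    return cost
```

Mapping minimizes this cost over the landmarks with the poses fixed, localization over the poses with the map fixed, and full SLAM over both at once.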

Undistorting LiDAR Points

Distorted LiDAR points (vehicle in motion).
Undistorted LiDAR points (motion compensated).

To accurately process LiDAR data, it’s necessary to undistort the points. This is particularly important when the ego vehicle moves. Undistortion can be achieved using an Inertial Measurement Unit (IMU), or a motion model that can provide short-term accurate movement information.
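
As a rough sketch of the idea, the function below motion-compensates one LiDAR sweep under a constant-velocity assumption, interpolating the ego pose between the start and end of the sweep (as an IMU or motion model would provide); the interface is hypothetical.

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def undistort_scan(points, timestamps, pose_start, pose_end):
    """Re-express each point in the sweep-end frame.

    points     : (N, 3) points in the sensor frame at capture time
    timestamps : (N,) per-point times normalized to [0, 1] over the sweep
    pose_start : (R0, t0) ego pose at sweep start (scipy Rotation, 3-vector)
    pose_end   : (R1, t1) ego pose at sweep end
    """
    R0, t0 = pose_start
    R1, t1 = pose_end
    slerp = Slerp([0.0, 1.0], Rotation.concatenate([R0, R1]))

    out = np.empty_like(points)
    for i, (p, s) in enumerate(zip(points, timestamps)):
        # Ego pose at the instant this point was measured.
        R_s = slerp([s])[0]
        t_s = (1.0 - s) * t0 + s * t1
        # Sensor frame -> world frame -> sweep-end frame.
        p_world = R_s.apply(p) + t_s
        out[i] = R1.inv().apply(p_world - t1)
    return out
```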

Iterative Closest Point (ICP)

ICP (Iterative Closest Point) aligning two point clouds.

Recall that when doing SLAM, the pose of the robot and the map of the environment are unknown and tightly coupled, creating a “chicken-and-egg” problem. However, we observe that the first scan of the LiDAR already provides a rough estimate of the environment. Without loss of generality, we can set the first scan as the rough map and its corresponding pose as the identity matrix. Then, we can match the following scans to the map and update the pose of the robot. The commonly used algorithm for matching is the Iterative Closest Point (ICP).

ICP is a key algorithm used in SLAM. The process involves:

  1. Association: for each point $\mathbf{x}_{i}$, find its closest point in the target cloud: $$j^{*}_{i}=\underset{j\in\{1,...,M\}}{\text{argmin}}||\mathbf{R}\mathbf{x}_{i}+\mathbf{t}-\mathbf{y}_{j}||_{2}$$
  2. Minimization: solve for the transform that minimizes the matched-pair error: $$\underset{\mathbf{R},\mathbf{t}}{\text{argmin}}\quad E(\mathbf{R},\mathbf{t})=\sum_{i=1}^{N}e_{i}(\mathbf{R},\mathbf{t})^{2}=\sum_{i=1}^{N}||\mathbf{R}\mathbf{x}_{i}+\mathbf{t}-\mathbf{y}_{j^{*}_{i}}||_{2}^{2}$$

We continue these two steps iteratively until the robot’s pose converges. For a more detailed explanation of ICP, please refer to Iterative Closest Point Uncovered: Mathematical Foundations and Applications.
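
The following is a minimal point-to-point ICP sketch in Python (NumPy/SciPy) that mirrors the two steps above: nearest-neighbor association with a KD-tree, then a closed-form SVD (Kabsch) minimization. It is a simplified illustration rather than a production implementation.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp(source, target, max_iters=50, tol=1e-6):
    """Point-to-point ICP: find R, t with R @ source + t ≈ target.

    source, target: (N, 3) and (M, 3) point clouds.
    """
    R, t = np.eye(3), np.zeros(3)
    tree = cKDTree(target)
    prev_err = np.inf
    for _ in range(max_iters):
        moved = source @ R.T + t
        # Step 1 -- Association: nearest neighbor in the target cloud.
        dists, idx = tree.query(moved)
        matched = target[idx]
        # Step 2 -- Minimization: closed-form SVD (Kabsch) solution.
        mu_s, mu_m = moved.mean(axis=0), matched.mean(axis=0)
        H = (moved - mu_s).T @ (matched - mu_m)
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        dR = Vt.T @ D @ U.T
        dt = mu_m - dR @ mu_s
        R, t = dR @ R, dR @ t + dt   # accumulate the increment
        err = dists.mean()
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return R, t
```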

Motion Model

Because SLAM relies on estimating the relative pose between the current scan and either previous scans or a map, incorporating motion priors can significantly improve performance. A vehicle motion model—leveraging control data such as IMU readings, steering angle, and wheel speed—provides valuable prior information that enhances SLAM accuracy. With a motion model, the SLAM problem can be formulated as follows:

$$ \begin{aligned} \underset{\mathbf{m},\mathbf{x}_{1:T}}{\text{min}}& \sum_{t=1}^{T}||\mathbf{z}_{t}-h(\mathbf{x}_{t},\mathbf{m})||_{2}^{2} \\ \text{s.t. } &\mathbf{x}_{t}=f(\mathbf{x}_{t-1},\mathbf{u}_{t})+ \mathbf{n}_{t} \text{ for } t=1,...,T \end{aligned} \tag{1} $$

where $\mathbf{u}_{t}$ is the control input at time $t$, and $\mathbf{n}_{t}$ is the noise.
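
As one concrete choice of $f(\mathbf{x}_{t-1},\mathbf{u}_{t})$, here is a simple 2D kinematic (unicycle) model driven by forward speed and yaw rate; production systems typically use richer models (e.g., a bicycle model or IMU preintegration), so treat this as an illustrative sketch.

```python
import numpy as np

def motion_model(pose, control, dt):
    """f(x, u): propagate a 2D pose (x, y, theta) one time step.

    control = (v, omega): forward speed and yaw rate, e.g. from wheel
    odometry and an IMU gyroscope.
    """
    x, y, theta = pose
    v, omega = control
    return np.array([
        x + v * np.cos(theta) * dt,
        y + v * np.sin(theta) * dt,
        theta + omega * dt,
    ])
```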

Kalman Filter

Problem (1) is a sequential optimization problem, and the Kalman Filter provides an efficient recursive way to solve it (for the nonlinear $f$ and $h$ used in SLAM, its extended variant is applied). It involves two steps:

  • Predict: Predict the next pose by propagating the current pose and control input through the motion model (the noise $\mathbf{n}_{t}$ enters only through its covariance).
$$ \hat{\mathbf{x}}_{t} = f(\mathbf{x}_{t-1}, \mathbf{u}_{t}) $$
  • Update: Update the current pose using the current observation.
$$ \min_{\mathbf{m},\mathbf{x}_{t}} \| \mathbf{z}_{t} - h(\mathbf{x}_{t}, \mathbf{m}) \|_{2}^{2} $$
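
For intuition, here is a minimal linear Kalman Filter in Python. The SLAM models $f$ and $h$ are nonlinear, so in practice they are linearized first (the Extended Kalman Filter); below, $F$ and $H$ stand for the (linearized) motion and observation models, and $Q$ and $R$ for the process and measurement noise covariances.

```python
import numpy as np

def kf_predict(x, P, F, B, u, Q):
    """Predict: propagate the state mean and covariance."""
    x = F @ x + B @ u          # motion model f(x, u)
    P = F @ P @ F.T + Q        # the noise n_t enters through Q
    return x, P

def kf_update(x, P, z, H, R):
    """Update: correct the prediction with the observation z."""
    y = z - H @ x                     # innovation z_t - h(x_t)
    S = H @ P @ H.T + R               # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)    # Kalman gain
    x = x + K @ y
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P
```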

Please refer to Demystifying Kalman Filters: From Classical Estimation to Bayesian Inference for more details.

Post-Processing

The ground surface before post-processing.
SLAM post-processing.
The ground surface after post-processing.

As we can see, the map is built by appending scans one after another, which leads to an accumulation of errors. Post-processing is therefore necessary to correct this long-term drift by jointly optimizing the map and the robot's poses. First, we find the points that belong to the same feature (e.g., a surface or a line). Then, we identify the poses from which those points were observed. Finally, we optimize those poses so the feature conforms to its natural shape, for example making a surface feature more planar or a line feature straighter, as sketched below.
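
As an illustration of the surface case, the sketch below fits a plane to the points assigned to one surface feature and returns their point-to-plane residuals; a joint optimizer would then adjust the contributing poses to drive these residuals toward zero. This is a simplified fragment, not the full optimization.

```python
import numpy as np

def plane_residuals(points):
    """Signed point-to-plane distances for points on one surface feature.

    points: (N, 3) points, gathered from several scans, that are assumed
    to lie on the same planar surface (e.g., the ground).
    """
    centroid = points.mean(axis=0)
    # Plane normal = direction of least variance (smallest singular vector).
    _, _, Vt = np.linalg.svd(points - centroid)
    normal = Vt[-1]
    return (points - centroid) @ normal
```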

SLAM Challenges in Autonomous Driving

Key challenges of LiDAR SLAM in autonomous driving include:

High-Speed Scenarios

When a vehicle travels at high speeds, the resulting point clouds become sparse, leading to less overlap between consecutive scans. This sparsity makes it difficult to accurately match scans and can degrade the quality of the localization and mapping.

Large-Scale Mapping

Mapping extensive areas, such as entire cities, consumes significant memory resources. Furthermore, performing post-processing and global optimization on such large maps is computationally expensive and challenging to manage.

Highly Repetitive Environments

Environments like highways and tunnels are often highly repetitive, with few distinctive features. This geometric similarity makes it hard for matching algorithms to determine the relative pose between scans, which degrades localization.

Dynamic Environments

Autonomous driving environments are inherently dynamic, filled with moving objects such as other vehicles and pedestrians. Most SLAM algorithms operate on the fundamental assumption that the environment is static. Therefore, additional techniques are required to identify, track, and remove these dynamic elements from the point cloud to prevent them from corrupting the map and the vehicle's localization.