What is SLAM?

Applications





SLAM has various applications, including:
Object Detection
Undistorted LiDAR point clouds are fed into 3D object detection models to identify objects such as vehicles and pedestrians within the environment.
Parking Lot Annotation
First, SLAM is used to acquire a map of the parking lot. A rendering method then generates a ground-level map to allow for more accurate annotation.
Lane Annotation
Similar to parking lot annotation, SLAM creates a high-resolution ground map, which is then used to accurately annotate traffic lanes and road markings.
High-Definition (HD) Map Creation
SLAM is a foundational technology for building HD maps. The dense 3D point cloud of the environment is processed to extract detailed information about road geometry, signs, and other critical features.
Visual SLAM vs. LiDAR SLAM


| Feature | Visual SLAM | LiDAR SLAM |
|---|---|---|
| Hardware Cost | Cheap | Expensive |
| Algorithm | Complex | Simple |
| Geometry Accuracy | No accurate geometry | Accurate geometry |
Basics of LiDAR



There are different types of LiDAR technologies:
Mechanical LiDAR
- Pros: 360-degree field of view, high performance, and lower cost
- Cons: Wear and failure over time
- Use cases: Data collection, testing, Robotaxi
Solid-State LiDAR
- Pros: Simple, cheap, reliable
- Cons: Low power density, short detection distance
- Use cases: Near-range blind-spot LiDAR
Hybrid LiDAR
- Pros: Low cost, highly reliable, long distance detection
- Cons: Limited field of view
- Use cases: Front-facing LiDAR
Problem Formulation
The core problems in SLAM can be formulated as optimization problems:
Mapping
$$\underset{\mathbf{m}}{\text{min}} \sum_{t=1}^{T}||\mathbf{z}_{t}-h(\mathbf{x}_{t},\mathbf{m})||_{2}^{2}$$
Localization
$$\underset{\mathbf{x}_{1:T}}{\text{min}} \sum_{t=1}^{T}||\mathbf{z}_{t}-h(\mathbf{x}_{t},\mathbf{m})||_{2}^{2}$$
SLAM
$$\underset{\mathbf{x}_{1:T},\mathbf{m}}{\text{min}} \sum_{t=1}^{T}||\mathbf{z}_{t}-h(\mathbf{x}_{t},\mathbf{m})||_{2}^{2}$$
Where:
- $\mathbf{x}_{t}$ is the pose of the robot at time $t$.
- $\mathbf{z}_{t}$ is the observation at time $t$.
- $h(\cdot)$ is the observation function.
- $\mathbf{m}$ is the map.
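To make the difference between the three problems concrete, here is a minimal sketch on a toy 1D world. The observation function `h` (landmark positions relative to the pose), the landmark values, and the poses are all hypothetical choices for illustration; real LiDAR observation functions are nonlinear and need an iterative solver.

```python
import numpy as np

def h(x_t, m):
    """Hypothetical 1D observation function: landmark positions relative to the pose."""
    return m - x_t

def cost(x, m, z):
    """Sum over t of ||z_t - h(x_t, m)||^2, the objective shared by all three problems."""
    return sum(float(np.sum((z_t - h(x_t, m)) ** 2)) for x_t, z_t in zip(x, z))

# Ground truth, used only to simulate noise-free observations z_t.
m_true = np.array([2.0, 5.0, 9.0])   # three landmarks on a line
x_true = np.array([0.0, 1.0, 2.5])   # three robot poses
z = [h(x_t, m_true) for x_t in x_true]

# Localization: the map is known, the poses are free.
# For this linear h the least-squares minimizer is a simple mean.
x_est = np.array([np.mean(m_true - z_t) for z_t in z])

# Mapping: the poses are known, the map is free.
m_est = np.mean([z_t + x_t for x_t, z_t in zip(x_true, z)], axis=0)

# SLAM: both are free, so shifting poses and map together leaves the cost unchanged.
shifted_cost = cost(x_true + 3.0, m_true + 3.0, z)
```

In this toy setup the SLAM objective cannot tell a solution apart from a shifted copy of it, which is one reason SLAM systems usually pin down a reference pose.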
Undistorting LiDAR Points


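To process LiDAR data accurately, the points must first be undistorted. A scanning LiDAR collects the points of one scan over a finite sweep period rather than at a single instant, so when the ego vehicle moves during the sweep, the points are measured from different sensor poses and the assembled scan is skewed. Undistortion compensates for this motion using an Inertial Measurement Unit (IMU) or a motion model that provides accurate short-term movement information.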
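As a rough illustration, the sketch below undistorts a 2D scan under a constant-velocity, constant-yaw-rate assumption. The function name `undistort_scan`, the per-point timestamps, and the simplification that translation grows linearly in the sweep-start frame are assumptions made for this example, not a description of any particular LiDAR driver.

```python
import numpy as np

def rot2d(theta):
    """2D rotation matrix for an angle in radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def undistort_scan(points, timestamps, velocity, yaw_rate):
    """Re-express every point in the sensor frame at the start of the sweep.

    points:     (N, 2) points, each in the sensor frame at its own capture time
    timestamps: (N,) seconds elapsed since the sweep started
    velocity:   (2,) assumed constant velocity, expressed in the sweep-start frame
    yaw_rate:   assumed constant yaw rate in rad/s
    """
    out = np.empty_like(points, dtype=float)
    for i, (p, t) in enumerate(zip(points, timestamps)):
        # Assumed sensor pose at time t relative to the sweep start.
        out[i] = rot2d(yaw_rate * t) @ p + velocity * t
    return out

# Simulate one static object seen five times during a 0.1 s sweep at 15 m/s.
q = np.array([10.0, 2.0])
velocity = np.array([15.0, 0.0])
yaw_rate = 0.2
timestamps = np.linspace(0.0, 0.1, 5)
raw = np.array([rot2d(yaw_rate * t).T @ (q - velocity * t) for t in timestamps])
fixed = undistort_scan(raw, timestamps, velocity, yaw_rate)
```

In the raw scan the single static object is smeared over more than a meter; after undistortion all five measurements coincide again.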
Iterative Closest Point (ICP)

Recall that when doing SLAM, the pose of the robot and the map of the environment are unknown and tightly coupled, creating a “chicken-and-egg” problem. However, we observe that the first scan of the LiDAR already provides a rough estimate of the environment. Without loss of generality, we can set the first scan as the rough map and its corresponding pose as the identity matrix. Then, we can match the following scans to the map and update the pose of the robot. The commonly used algorithm for matching is the Iterative Closest Point (ICP).
ICP is a key algorithm used in SLAM. Given the points $\mathbf{x}_{i}$ ($i=1,...,N$) of the current scan and the points $\mathbf{y}_{j}$ ($j=1,...,M$) of the map, each iteration involves two steps:
- Association: for each $\mathbf{x}_{i}$, find the closest map point under the current pose estimate: $$j^{*}=\underset{j\in\{1,...,M\}}{\text{argmin}}||\mathbf{R}\mathbf{x}_{i}+\mathbf{t}-\mathbf{y}_{j}||_{2}$$
- Minimization: update the pose by minimizing the distances between the associated pairs: $$\underset{\mathbf{R},\mathbf{t}}{\text{argmin}}\quad E(\mathbf{R},\mathbf{t})=\sum_{i=1}^{N}e_{i}(\mathbf{R},\mathbf{t})^{2}=\sum_{i=1}^{N}||\mathbf{R}\mathbf{x}_{i}+\mathbf{t}-\mathbf{y}_{j^{*}}||_{2}^{2}$$
We repeat these two steps until the robot's pose converges. For a more detailed explanation of ICP, please refer to Iterative Closest Point Uncovered: Mathematical Foundations and Applications.
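The two steps can be sketched as a minimal point-to-point ICP in NumPy. This is an illustrative version under simplifying assumptions: brute-force nearest-neighbor search, no outlier rejection, and the closed-form SVD (Kabsch) solution for the minimization step. Production systems typically use a k-d tree and more robust variants. The grid of map points in the demo is hypothetical.

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Closed-form minimizer of sum_i ||R src_i + t - dst_i||^2 (Kabsch / SVD)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    D = np.eye(src.shape[1])
    D[-1, -1] = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ D @ U.T
    t = dst_c - R @ src_c
    return R, t

def icp(source, target, max_iters=50, tol=1e-10):
    """Estimate R, t such that R @ source_i + t matches its closest target point."""
    dim = source.shape[1]
    R, t = np.eye(dim), np.zeros(dim)
    prev_err = np.inf
    for _ in range(max_iters):
        moved = source @ R.T + t
        # Association: brute-force nearest neighbor under the current pose.
        d2 = ((moved[:, None, :] - target[None, :, :]) ** 2).sum(axis=2)
        j_star = d2.argmin(axis=1)
        # Minimization: best rigid transform for the current associations.
        R, t = best_rigid_transform(source, target[j_star])
        err = np.mean(np.sum((source @ R.T + t - target[j_star]) ** 2, axis=1))
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return R, t

# Demo on a hypothetical 5x5 grid of map points, 2 m apart.
gx, gy = np.meshgrid(np.arange(-4.0, 5.0, 2.0), np.arange(-4.0, 5.0, 2.0))
target = np.column_stack([gx.ravel(), gy.ravel()])
theta = 0.05
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
t_true = np.array([0.3, -0.2])
source = (target - t_true) @ R_true  # chosen so that R_true @ x_i + t_true = y_i
R_est, t_est = icp(source, target)
```

Because ICP only converges locally, the demo keeps the true motion well below half the point spacing so that the initial associations are correct. With larger motions a good initial guess becomes important.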
Motion Model
Because SLAM relies on estimating the relative pose between the current scan and either previous scans or a map, incorporating motion priors can significantly improve performance. A vehicle motion model—leveraging control data such as IMU readings, steering angle, and wheel speed—provides valuable prior information that enhances SLAM accuracy. With a motion model, the SLAM problem can be formulated as follows:
$$ \begin{aligned} \underset{\mathbf{m},\mathbf{x}_{1:T}}{\text{min}}& \sum_{t=1}^{T}||\mathbf{z}_{t}-h(\mathbf{x}_{t},\mathbf{m})||_{2}^{2} \\ \text{s.t. } &\mathbf{x}_{t}=f(\mathbf{x}_{t-1},\mathbf{u}_{t})+ \mathbf{n}_{t} \text{ for } t=1,...,T \end{aligned} \tag{1} $$
where $f(\cdot)$ is the motion model, $\mathbf{u}_{t}$ is the control input at time $t$, and $\mathbf{n}_{t}$ is the noise.
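One possible choice for the motion model $f$ is a planar unicycle model, sketched below. The state layout `(px, py, yaw)`, the control layout `(speed, yaw_rate)`, and the fixed time step `dt` are assumptions for this example. A real vehicle model might instead use a bicycle model driven by steering angle and wheel speed.

```python
import numpy as np

def motion_model(x_prev, u, dt):
    """f(x_{t-1}, u_t) for a planar unicycle.

    x_prev: (px, py, yaw) pose at the previous step
    u:      (speed, yaw_rate) control input, assumed constant over dt
    """
    px, py, yaw = x_prev
    speed, yaw_rate = u
    return np.array([
        px + speed * np.cos(yaw) * dt,
        py + speed * np.sin(yaw) * dt,
        yaw + yaw_rate * dt,
    ])

def predict_trajectory(x0, controls, dt):
    """Chain the motion model to get a noise-free pose prior for every step."""
    poses = [np.asarray(x0, dtype=float)]
    for u in controls:
        poses.append(motion_model(poses[-1], u, dt))
    return np.array(poses)

# One second of driving at 10 m/s, straight and with a gentle left turn.
straight = predict_trajectory([0.0, 0.0, 0.0], [(10.0, 0.0)] * 10, dt=0.1)
turning = predict_trajectory([0.0, 0.0, 0.0], [(10.0, 0.5)] * 10, dt=0.1)
```

The predicted pose can serve as the initial guess for scan matching, which is where a motion prior tends to help most.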
Kalman Filter
Problem (1) is a sequential optimization problem. The Kalman Filter provides an efficient way to solve it. It involves two steps:
- Predict: Propagate the previous pose estimate through the motion model using the control input $\mathbf{u}_{t}$.
- Update: Correct the predicted pose using the current observation $\mathbf{z}_{t}$.
Please refer to Demystifying Kalman Filters: From Classical Estimation to Bayesian Inference for more details.
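The predict and update steps can be sketched for the linear case as follows. The classical Kalman Filter assumes a linear motion model, a linear observation function, and Gaussian noise. The nonlinear $f$ and $h$ of a real SLAM system would call for a variant such as the Extended Kalman Filter. The matrices `F`, `B`, `H`, `Q`, `R` and the 1D scenario are hypothetical values chosen for illustration.

```python
import numpy as np

def kf_predict(x, P, u, F, B, Q):
    """Predict: propagate the estimate and its covariance through the motion model."""
    x_pred = F @ x + B @ u
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred

def kf_update(x_pred, P_pred, z, H, R):
    """Update: correct the prediction with the current observation."""
    innovation = z - H @ x_pred
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    x = x_pred + K @ innovation
    P = (np.eye(len(x)) - K @ H) @ P_pred
    return x, P

def run_filter(x0, P0, controls, observations, F, B, H, Q, R):
    """Alternate predict and update over a sequence of controls and observations."""
    x, P = x0, P0
    for u, z in zip(controls, observations):
        x, P = kf_predict(x, P, u, F, B, Q)
        x, P = kf_update(x, P, z, H, R)
    return x, P

# Hypothetical 1D example: the robot advances 1 m per step and observes its position.
F = B = H = np.eye(1)
Q = np.array([[0.01]])   # assumed motion noise variance
R = np.array([[0.25]])   # assumed observation noise variance
controls = [np.array([1.0])] * 20
observations = [np.array([p]) for p in np.cumsum([1.0] * 20)]
# The filter starts 5 m away from the true initial position of 0 m.
x_final, P_final = run_filter(np.array([5.0]), np.eye(1), controls,
                              observations, F, B, H, Q, R)
```

The observations here are noise-free to keep the example deterministic. Even so, the filter only gradually trusts them over its own wrong initial estimate, which shows how the gain balances prediction against measurement.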
Post-Processing



Because the map is built by appending scans one after another, small matching errors accumulate into long-term drift. Post-processing addresses this drift by jointly optimizing the map and the robot's poses. First, we find the points that belong to the same feature (e.g., a surface or a line). Then, we identify the poses from which those points were observed. Finally, we optimize those poses so that the feature conforms to its natural shape, for example by making a surface feature closer to a plane or a line feature closer to a straight line.
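One way to quantify how closely a surface feature matches a plane is the smallest eigenvalue of the covariance of its points, which is zero for a perfect plane. The sketch below only evaluates this cost for a correct and a drifted pose. A full post-processing step would search over the poses to minimize it. The function names, the ground-plane grid, and the 0.02 rad pitch drift are assumptions for illustration.

```python
import numpy as np

def to_world(points, R, t):
    """Transform sensor-frame points (N, 3) into the world frame with pose (R, t)."""
    return points @ R.T + t

def plane_cost(world_points):
    """Smallest eigenvalue of the point covariance: zero for a perfect plane."""
    centered = world_points - world_points.mean(axis=0)
    cov = centered.T @ centered / len(world_points)
    return np.linalg.eigvalsh(cov)[0]  # eigvalsh returns eigenvalues in ascending order

def rot_y(theta):
    """Rotation about the y axis (a pitch error) by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

# The same flat ground patch (z = 0) seen in two scans taken 5 m apart.
gx, gy = np.meshgrid(np.arange(5.0), np.arange(-2.0, 3.0))
local = np.column_stack([gx.ravel(), gy.ravel(), np.zeros(gx.size)])
scan1_world = to_world(local, np.eye(3), np.zeros(3))
t2 = np.array([5.0, 0.0, 0.0])

# Second scan placed with its correct pose versus a pose with pitch drift.
good = np.vstack([scan1_world, to_world(local, np.eye(3), t2)])
drifted = np.vstack([scan1_world, to_world(local, rot_y(0.02), t2)])
cost_good = plane_cost(good)
cost_drifted = plane_cost(drifted)
```

The drifted pose bends the merged ground surface, which raises the cost. Pulling the cost back toward zero by adjusting the pose is what pushes the surface back toward a plane.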
SLAM Challenges in Autonomous Driving
Key challenges of LiDAR SLAM in autonomous driving include:
High-Speed Scenarios
When a vehicle travels at high speeds, the resulting point clouds become sparse, leading to less overlap between consecutive scans. This sparsity makes it difficult to accurately match scans and can degrade the quality of the localization and mapping.
Large-Scale Mapping
Mapping extensive areas, such as entire cities, consumes significant memory resources. Furthermore, performing post-processing and global optimization on such large maps is computationally expensive and challenging to manage.
Highly Repetitive Environments
Environments like highways and tunnels are often highly repetitive, with few unique features. Because consecutive scans look nearly identical, matching algorithms struggle to determine the relative pose between them, which degrades localization.
Dynamic Environments
Autonomous driving environments are inherently dynamic, filled with moving objects like other vehicles and pedestrians. Most SLAM algorithms operate on the fundamental assumption that the environment is static. Therefore, additional techniques are required to identify, track, and remove these dynamic elements from the point cloud to prevent them from corrupting the map and vehicle’s localization.