Posts

Rethinking Data Augmentation in End-to-End Learning

Suppose the ego position relative to the center lane is denoted as $X$. We do random position augmentation along the lateral direction. Let $Y$ denote the position after augmentation as $Y = X+ Z$, where $Z$ is the random position augmentation. Assume $X\sim p_x$, $Z\sim p_z$ with its probability density function $f_X(x)$ and $f_Z(z)$, respectively. So, the distribution of $Y$ can be computed as below. First, we will compute the accumulated distribution of $Y$, then can compute the density distribution. Let $F_Y(y)$ denote the accumulated distribution of $Y$, then ...

A Gentle Introduction to Diffusion Models and Flow Matching

Diffusion models have emerged as a powerful and flexible class of generative models, underpinning recent breakthroughs in image, audio, and scientific data generation. This article offers a gentle, beginner-friendly overview of diffusion models and the related flow matching framework, demystifying their mathematical foundations and practical training procedures. We will delve into the core concepts, guiding equations, and step-by-step training algorithms, aiming to provide readers with both an intuition for how these models work and a roadmap for implementing them in practice. This article serves as notes on papers [1] and [2]. ...

LiDAR Extrinsic Parameter Adjustment for SLAM Recalibration

Occasionally, we discover that the LiDAR extrinsic parameters are inaccurate. In such cases, we aim to recalibrate the SLAM poses and maps based on the updated parameters, without the need to rerun the entire SLAM process. By doing so, we can keep the annotations and the original SLAM map, which saves human effort and computational resources. Raw Data Given LiDAR points in the LiDAR coordinate system $$\mathbf{p}^{orig} = \bigcup_{t=0:T}\{\mathbf{p}_{t,i}\}_{i=1}^{N_t}$$Where $t$ is the time index, $i$ is the point index at time $t$, and $N_t$ is the number of points at time $t$. ...

Bundle Adjustment for LiDAR SLAM: Mathematical Formulation and Optimization

In this post, we will discuss the post-processing of LiDAR SLAM. We mainly focus on its problem formulation. The content of this post follows papers [1] and [2]. Problem Formulation Factor graph representation of bundle adjustment formulation. (Fig. 1 of [2]) With LiDAR poses, each denoted by $\mathbf{T}_j = (\mathbf{R}_j,\mathbf{t}_j)$ $(j=1,\ldots,M_p)$, the bundle adjustment refers to simultaneously determining all the LiDAR poses (denoted by $\mathbf{T} = (\mathbf{T}_1,\cdots,\mathbf{T}_{M_p})$) and feature parameters (denoted by $\boldsymbol{\pi} = (\pi_1,\cdots,\pi_{M_f})$), such that the reconstructed map agrees with the LiDAR measurements to the best extent. Denote $c(\pi_i,\mathbf{T})$ the map consistency due to the $i$-th feature; a straightforward BA formulation is ...

Probabilistic Collision Loss: Bounds and Soft Distance Maps for Autonomous Driving

This post explores collision loss design for end-to-end autonomous driving training, focusing on extending PLUTO’s binary occupancy map approach to handle probabilistic maps. The method enables safer autonomous driving by providing smooth, uncertainty-aware collision avoidance while maintaining computational efficiency. Using Signed Distance Map with Binary Occupancy Map How PLUTO Builds the Loss Map [2] Vehicle Model In PLUTO and other planning algorithms, the vehicle is modelded as a series of overlapping discs. ...

A State Machine for Object Tracking

This post contains a state machine diagram that illustrates the typical lifecycle of an object in a tracking system. Object Tracking Management State Machine.

Wayformer Paper Reading

This post provides a technical deep dive into the Wayformer paper [1], a key publication in the field of motion forecasting. Training Overview An overview of the deep learning training pipeline, illustrating the data flow and key components involved during model training. Model Overview of the One-Stage E2E model One staged E2E model. Overview of the Two-Stage E2E model Two staged E2E model. Details of the Two-Stage E2E Model Overview of the Wayformer model. Model Structure Overview (a) (b) The left figure shows the encoder and decoder of the Wayformer model. The right figure shows the details of the encoder [1]. Feature Embedding/Feature Projection $$\mathbf{f}\in \mathbb{R}^{T \times N\times D} \to \mathbf{x}_{input} \in \mathbb{R}^{(T \cdot N) \times d}$$Where $T$ is the number of time history, $N$ is the number of entities, $D$ is the number of features, and $d=256$. ...

LiDAR-SLAM Decoded: From Point Clouds to Precision Maps

What is SLAM? SLAM demo. SLAM stands for Simultaneous Localization and Mapping. It is a computational problem of constructing or updating a map of an unknown environment while simultaneously keeping track of an agent's location within it. Applications Object Detection Parking Lot Annotation Lane Annotation Lane Reprojection HD Map [source] SLAM has various applications, including: ...

Perspective-n-Point (PnP) Problem

In this post, we will discuss the perspective-n-point (PnP) problem. We will start with the problem definition. Then, gradient-based optimization methods will be introduced. Finally, we will discuss two global optimization methods. Problem Formulation The core task of the Perspective-n-Point (PnP) problem is to determine the pose—specifically, the rotation and translation—of a calibrated camera in 3D space. This is achieved by using a set of known 3D points in the world and their corresponding 2D projections observed on the camera’s image sensor. ...

Connecting Points with Grace: A Study on Natural Path Generation Methods

Given a starting point, starting direction, ending point, and ending direction, the goal is to generate a feasible and “natural” path that connects the two points. This path should adhere to vehicle dynamics and avoids obstacles. However, it is hard to define what is “natural”. Path Planning In this section, we review some classical path planning methods. We ignore the algorithmic details and focus on the resulting path shapes to provide an overview. The demonstration images are generated using the code repository [1]. ...