This post explores collision loss design for end-to-end autonomous driving training, focusing on extending PLUTO’s binary occupancy map approach to handle probabilistic maps. The method enables safer autonomous driving by providing smooth, uncertainty-aware collision avoidance while maintaining computational efficiency.
Using Signed Distance Map with Binary Occupancy Map
How PLUTO Builds the Loss Map [2]
Vehicle Model
In PLUTO and other planning algorithms, the vehicle is modeled as a series of overlapping discs.

Distance Map
The PLUTO planner for imitation learning-based planning uses auxiliary losses for safety, including a collision loss.
PLUTO starts with a binary occupancy map (voxel grid):
- 1 = obstacle or off-road cell
- 0 = free/drivable cell
From this map, PLUTO computes a Euclidean Signed Distance Field (ESDF) [1]:
Each cell stores the shortest Euclidean distance to the nearest obstacle.
Positive distances indicate free space, zero indicates contact with an obstacle.
The distance map $D_f(p)$ is defined as:
$$ D_f(p) = \min_{q\in G}\left(d(p, q) + f(q)\right) $$Many algorithms for computing the distance transform of point sets use the equivalent definition
$$ D(p) = \min_{q\in G}\left( d(p, q) + 1(q)\right) $$where $1(q)$ is an indicator function for membership in $P$:
$$ 1(q) = \begin{cases} 0, & \text{if } q\in P \\ \infty, & \text{otherwise} \end{cases} $$
The distance map can be computed with linear computational complexity [1]. An example of the computed signed Euclidean distance map is shown below.

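For concreteness, here is a minimal sketch of building a signed ESDF from a binary occupancy grid using SciPy's distance_transform_edt (rather than the linear-time algorithm of [1]); the grid contents and the 0.1 m resolution are illustrative assumptions.
import numpy as np
from scipy.ndimage import distance_transform_edt

# Illustrative binary occupancy grid: 1/True = obstacle or off-road, 0/False = free.
occ = np.zeros((100, 100), dtype=bool)
occ[40:60, 40:60] = True                      # a square obstacle
resolution = 0.1                              # assumed cell size in meters
dist_outside = distance_transform_edt(~occ)   # free cells: distance to nearest obstacle
dist_inside = distance_transform_edt(occ)     # obstacle cells: distance to nearest free cell
signed_dist = (dist_outside - dist_inside) * resolution   # positive outside, negative inside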
Loss Formulation
For each predicted trajectory point, PLUTO covers the vehicle with several covering circles (each circle has a center $c$ and radius $R$).
The circle’s center is projected into the signed distance map.
The signed distance $d(c)$ is obtained via bilinear interpolation.
A hinge loss penalizes intrusions into obstacles:
$$ \ell_\text{collide}(c) = \max\bigl(0,\, R + \varepsilon - d(c)\bigr), $$
where:
$R$ = circle radius
$\varepsilon$ = safety margin
This loss is summed over all circles and trajectory points to discourage collisions.
Note: The loss value itself does not provide guidance for avoiding collisions. It is the distance gradients that provide the guidance information.
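A minimal sketch of this hinge loss, assuming the signed distances at the covering-circle centers have already been sampled from the distance map (the tensor values, radius, and margin below are illustrative):
import torch

def collision_loss(d, radius, margin=0.1):
    # Hinge penalty for every circle whose center is closer to an obstacle than R + eps.
    return torch.clamp((radius + margin) - d, min=0.0).sum()

# d[i, j] = sampled signed distance at trajectory point i, covering circle j.
d = torch.tensor([[2.0, 1.5, 0.3],
                  [1.8, 1.2, 0.1]])
loss = collision_loss(d, radius=1.0)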
Why Binary Masks Are Limiting
Binary masks assume perfect knowledge of which cells are occupied or free. Real mapping systems (e.g., from LiDAR or cameras) usually output a probability of occupancy $p(y) \in [0,1]$. If you threshold $p(y)$ to make it binary, you lose uncertainty:
High threshold → unsafe (vehicle might hit uncertain obstacles).
Low threshold → too conservative (avoiding many safe areas).
We need a way to create a soft distance field directly from the probability map.
Extending to Probability Maps
Probabilistic / Stochastic Distance Transform
We use the log-sum-exp soft-min identity:
$$ \min_i x_i = \lim_{\alpha \to \infty} \left( -\frac{1}{\alpha} \ln \sum_i e^{-\alpha x_i} \right) $$If we set $x_i = \|c - y\|^2$ and $\alpha = 1 / (2\sigma^2)$:
$$ d_\text{soft}(c) = - 2 \sigma^2 \ln \sum_{y} p(y) \exp \left( - \frac{ \|c-y\|^2 }{ 2 \sigma^2 } \right),$$where:
- $p(y)$ = occupancy probability at voxel $y$
- $\sigma$ = smoothing parameter
Key points:
- When $p(y)$ is binary and $\sigma \to 0$, the Gaussian kernel becomes a Dirac delta (see Appendix A), and
$$ d_\text{soft}(c) \to \min_{y:\, p(y)=1} \|c - y\|^2, $$
which is exactly the squared distance used in the signed distance map.
- When $p(y)$ is probabilistic, $d_\text{soft}(c)$ smoothly decreases when high-probability obstacles are near.
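As a minimal sketch, the soft distance at a single query point can be evaluated directly from the probability map; the map layout, resolution, and $\sigma$ below are illustrative assumptions.
import numpy as np

def soft_distance(prob_map, c, resolution=0.1, sigma=0.5):
    # Evaluate d_soft(c) by summing over all voxels of the probabilistic map.
    h, w = prob_map.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    centers = np.stack([ys, xs], axis=-1) * resolution          # voxel centers in meters
    sq_dist = ((centers - np.asarray(c)) ** 2).sum(axis=-1)     # ||c - y||^2
    weights = prob_map * np.exp(-sq_dist / (2.0 * sigma ** 2))
    return -2.0 * sigma ** 2 * np.log(weights.sum() + 1e-12)    # small eps avoids log(0)

prob_map = np.zeros((64, 64))
prob_map[30, 30] = 0.9          # a single uncertain obstacle cell
d = soft_distance(prob_map, c=(2.0, 2.0))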
Loss Function
This probabilistic signed distance map can be plugged into the same PLUTO hinge loss:
$$\ell_\text{collide}(c) = \max(0, (R + \varepsilon)^2 - d_\text{soft}(c))$$and remains fully differentiable. Here, we use $(R+\varepsilon)^2$ due to the soft distance converging to the squared minimal distance.
Practical Considerations
$\sigma$ choice: Small $\sigma$ ≈ sharp distance field (like ESDF) but noisy. Large $\sigma$ ≈ smooth gradients but blurrier obstacle boundaries.
Summation range: The sums are over all voxels $y$ in the grid, but for efficiency you only consider voxels within a radius (e.g., $3\sigma$ for Gaussian).
Differentiability: Both the Gaussian-based and convolution-based methods are differentiable.
Thresholding fallback: You can still threshold the probability map to get a binary map and run the normal SDF. But this discards uncertainty and isn’t recommended if you want smooth, safe behavior.
Summary
- Binary occupancy map → Signed Distance Map → PLUTO hinge loss
- Probability map needs a soft distance field:
  - Probabilistic Distance Transform: uses log-sum-exp with Gaussian smoothing; becomes the ESDF as $\sigma\to0$.
  - Kernel Convolution: convolves occupancy probabilities with a vehicle-shaped kernel to measure expected collision.
Reduce Computational Complexity
2D CNN
We start from the probabilistic formula
$$ d_\text{soft}(c) = - 2 \sigma^2 \ln \sum_{y} p(y) \exp \left( - \frac{ \|c-y\|^2 }{ 2 \sigma^2 } \right)$$The exponential term decays quickly as the distance increases, so we only need to sum over the points in the vicinity of the current pixel. This gives the approximation
$$ d_\text{soft}(c) = - 2 \sigma^2 \ln \sum_{y\in V_c} p(y) \exp \left( - \frac{ \|c-y\|^2 }{ 2 \sigma^2 } \right), $$where we define $V_c = \{x| \|x-c\| \le r\}$.
1D CNN
With this approximation, the computation is equivalent to Gaussian smoothing of the probability map. If we do a direct 2D convolution on the 2D probabilistic map, its complexity is $WHK^2$, where $K$ is the width of the kernel.
However, the Gaussian kernel is separable:
$$\begin{aligned} d_\text{soft}(c) &= -2 \sigma^2 \ln \sum_{z\in V_c} p(z) \exp \left( - \frac{ \|c-z\|^2 }{ 2 \sigma^2 } \right)\\ &= -2 \sigma^2 \ln \sum_{z\in V_c} p(z) \exp \left( - \frac{ (c_x-z_x)^2 + (c_y - z_y)^2 }{ 2 \sigma^2 } \right)\\ &=-2 \sigma^2 \ln \sum_{z\in V_c} p(z) \exp \left( - \frac{ (c_x-z_x)^2}{ 2 \sigma^2 } \right)\exp\left(-\frac{(c_y-z_y)^2}{2\sigma^2} \right)\\ &=-2\sigma^2\ln \sum_{z_y \in V_{c_y}}\left( \sum_{z_x \in V_{c_x}} p(z_x, z_y) \exp\left(-\frac{(c_x-z_x)^2}{2\sigma^2} \right)\right) \exp\left(-\frac{(c_y-z_y)^2}{2\sigma^2} \right) \end{aligned}. $$Then the 2D CNN can be done along the x and y axis separately. Thus, its computational complexity can be reduced to $WHK$.
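A minimal sketch of this separable computation over the whole map, using two 1D convolutions with an unnormalized Gaussian kernel; the truncation at $4\sigma$, the resolution, and $\sigma$ are illustrative assumptions.
import numpy as np
from scipy.ndimage import convolve1d

def soft_distance_map(prob_map, resolution=0.1, sigma=0.5):
    radius = int(np.ceil(4.0 * sigma / resolution))
    offsets = np.arange(-radius, radius + 1) * resolution
    kernel = np.exp(-offsets ** 2 / (2.0 * sigma ** 2))         # unnormalized 1D Gaussian
    # Convolve along x, then along y; equivalent to the truncated 2D sum over V_c.
    smoothed = convolve1d(prob_map, kernel, axis=1, mode="constant")
    smoothed = convolve1d(smoothed, kernel, axis=0, mode="constant")
    return -2.0 * sigma ** 2 * np.log(smoothed + 1e-12)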
Patch Method
When the number of predicted points is much smaller than the number of pixels in the map, one simple method is to crop a patch around each predicted pose, compute the distance map on the patch, and sample the value.
Binarize Probability Map
Sometimes, the curb probability map contains too much noise. In such cases, it is better to use a threshold to convert the probability map to binary format. Then we can use the signed Euclidean distance map or follow the CNN method to compute the distance map. The left figure shows the distance map using the original probabilistic map. The right figure shows the distance from the binarized map. Due to noisy curbs, the probabilistic distance map is not reliable, while the binarized map provides a clear and accurate distance map.


Sampling
Both our method and PLUTO rely on sampling the cost map. The cost itself does not provide guidance on how to avoid the curb; the gradients of the cost map do. Here, we employ the bilinear sampling method [5].
Bilinear Interpolation
Suppose that we want to find the value of the unknown function $f$ at the point $(x,y)$. It is assumed that we know the value of $f$ at the four points $Q_{11} = (x_1, y_1)$, $Q_{12} = (x_1, y_2)$, $Q_{21} = (x_2, y_1)$, and $Q_{22} = (x_2, y_2)$.

Repeated Linear Interpolation
$$\begin{aligned} f(x,y) &= \frac{y_2 - y}{y_2 - y_1} f(x, y_1) + \frac{y - y_1}{y_2 - y_1} f(x, y_2)\\ &= \frac{y_2 - y}{y_2 - y_1} \left( \frac{x_2 - x}{x_2 - x_1} f(Q_{11}) + \frac{x - x_1}{x_2 - x_1} f(Q_{21}) \right) + \frac{y - y_1}{y_2 - y_1} \left( \frac{x_2 - x}{x_2 - x_1} f(Q_{12}) + \frac{x - x_1}{x_2 - x_1} f(Q_{22}) \right)\\ &= \frac{1}{(x_2 - x_1)(y_2 - y_1)} \Bigl( f(Q_{11})(x_2 - x)(y_2 - y) + f(Q_{12})(x_2 - x)(y - y_1) + f(Q_{21})(x - x_1)(y_2 - y) + f(Q_{22})(x - x_1)(y - y_1) \Bigr)\\ &= \frac{1}{(x_2 - x_1)(y_2 - y_1)} \begin{bmatrix} x_2 - x & x - x_1 \end{bmatrix} \begin{bmatrix} f(Q_{11}) & f(Q_{12}) \\ f(Q_{21}) & f(Q_{22}) \end{bmatrix} \begin{bmatrix} y_2 - y \\ y - y_1 \end{bmatrix}. \end{aligned}$$
Polynomial Fit
An alternative way is to write the solution to the interpolation problem as a multilinear polynomial
$$ f(x, y) \approx a_{00} + a_{10}x + a_{01}y + a_{11}xy,$$where the coefficients are found by solving the linear system
$$ \begin{bmatrix} 1 & x_1 & y_1 & x_1 y_1 \\ 1 & x_1 & y_2 & x_1 y_2 \\ 1 & x_2 & y_1 & x_2 y_1 \\ 1 & x_2 & y_2 & x_2 y_2 \end{bmatrix} \begin{bmatrix} a_{00} \\ a_{10} \\ a_{01} \\ a_{11} \end{bmatrix} = \begin{bmatrix} f(Q_{11}) \\ f(Q_{12}) \\ f(Q_{21}) \\ f(Q_{22}) \end{bmatrix}, $$yielding the result
$$ \begin{bmatrix} a_{00} \\ a_{10} \\ a_{01} \\ a_{11} \end{bmatrix} = \frac{1}{(x_2 - x_1)(y_2 - y_1)} \begin{bmatrix} x_2 y_2 & -x_2 y_1 & -x_1 y_2 & x_1 y_1 \\ -y_2 & y_1 & y_2 & -y_1 \\ -x_2 & x_2 & x_1 & -x_1 \\ 1 & -1 & -1 & 1 \end{bmatrix} \begin{bmatrix} f(Q_{11}) \\ f(Q_{12}) \\ f(Q_{21}) \\ f(Q_{22}) \end{bmatrix}. $$For typical interpolation of images/grids, use repeated linear interpolation: it is simpler and usually faster overall. Switch to the polynomial form only if you will reuse the same cell's coefficients many times or you specifically want closed-form derivatives.
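A minimal sketch of repeated linear interpolation inside one grid cell, following the derivation above (the sample coordinates and corner values are illustrative):
def bilinear(x, y, x1, x2, y1, y2, f11, f12, f21, f22):
    # f11 = f(Q11) at (x1, y1), f12 = f(Q12) at (x1, y2),
    # f21 = f(Q21) at (x2, y1), f22 = f(Q22) at (x2, y2).
    denom = (x2 - x1) * (y2 - y1)
    return (f11 * (x2 - x) * (y2 - y)
            + f12 * (x2 - x) * (y - y1)
            + f21 * (x - x1) * (y2 - y)
            + f22 * (x - x1) * (y - y1)) / denom

value = bilinear(0.3, 0.7, 0.0, 1.0, 0.0, 1.0, f11=1.0, f12=2.0, f21=3.0, f22=4.0)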
Sample the Distance Map
Next, we construct the loss function. The PLUTO paper first computes the distance transform map and then samples it at the centers of the covering discs. This method is simple and effective if the distance map can be computed offline. The mathematical formulation is
$$ d(x) = \mathcal{S}\bigl(d(y(x))\bigr), $$where $\mathcal{S}$ is the sampling operator. PLUTO uses bilinear sampling to obtain the distance values at the disc centers.
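In PyTorch, this sampling step can be sketched with torch.nn.functional.grid_sample, assuming the disc centers have already been converted to the normalized [-1, 1] grid coordinates (the map size and center values below are illustrative):
import torch
import torch.nn.functional as F

dist_map = torch.randn(1, 1, 256, 256)                            # [B, C, H, W] distance map
centers = (torch.rand(1, 1, 40, 2) * 2.0 - 1.0).requires_grad_()  # disc centers in [-1, 1] (x, y)
sampled = F.grid_sample(dist_map, centers, mode="bilinear", align_corners=True)
# sampled: [1, 1, 1, 40]; gradients flow back to the centers through the bilinear weights.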
Sample the Probability Map Directly
Instead of directly sampling the distance map, we sample the probability of the map.
$$ d(x) = - 2 \sigma^2 \ln \sum_{y\in V_x} \mathcal{S}(p(y)) \exp \left( - \frac{ \|x-y\|^2 }{ 2 \sigma^2 } \right). $$Further, we can predefine a set of grid offsets around the current pose and pre-compute their corresponding weights,
$$ d(x) = - 2 \sigma^2 \ln \sum_{i,j} \mathcal{S}(p(x_u + \delta x_i, x_v + \delta y_j))w_{i,j}, $$where $w_{i,j} = \exp \left( - \frac{ \|[\delta x_i, \delta y_j]\|^2 }{ 2 \sigma^2 } \right)$.
The drawback of this method is that the original $p(y)$ may not provide enough gradient guidance. This happens particularly when $p(y)$ is almost binary.
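A minimal sketch of this direct sampling with precomputed weights, assuming a 9×9 offset grid, $\sigma = 0.5$, a 0.1 m resolution, and a pose already expressed in the normalized [-1, 1] coordinates (all illustrative):
import torch
import torch.nn.functional as F

sigma, resolution = 0.5, 0.1
offsets = torch.arange(-4, 5, dtype=torch.float32) * resolution       # predefined dx_i / dy_j in meters
dx, dy = torch.meshgrid(offsets, offsets, indexing="ij")
w = torch.exp(-(dx ** 2 + dy ** 2) / (2.0 * sigma ** 2))              # precomputed weights w_ij

prob_map = torch.rand(1, 1, 256, 256)                                 # [B, C, H, W] probability map
map_extent = 256 * resolution                                         # assumed metric width of the map
pose = torch.tensor([0.1, -0.2], requires_grad=True)                  # query pose in [-1, 1] coords
grid = pose + torch.stack([dx, dy], dim=-1).reshape(1, 1, -1, 2) * (2.0 / map_extent)
p = F.grid_sample(prob_map, grid, mode="bilinear", align_corners=True).reshape(-1)
d_soft = -2.0 * sigma ** 2 * torch.log((p * w.reshape(-1)).sum() + 1e-12)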
Another Direct Sampling Method
- The ego vehicle is represented by 20×10 grid points on a 5 m × 2.5 m rectangle (enough to cover the segmentation resolution).
- Rotate and translate the ego vehicle according to the positions and headings calculated from the predicted trajectories.
- Map the transformed ego vehicle onto the curbside segmentation map and sample the map values at those points.
- Aggregate the sampled values and compute the loss.
A sketch of these steps follows the coordinate conversion below.

From BEV coordinate to image sampling coordinate
Convert the BEV range $x\in [-64, 128]$, $y\in [-32, 32]$ to the sampling range $w\in[-1, 1]$, $h\in[-1, 1]$:
$$ \begin{aligned} w &= (-y+32)/(32 + 32)\cdot 2-1,\\ h &= (-x + 128)/(128+64)\cdot 2-1 \end{aligned} $$
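A minimal sketch of the direct sampling pipeline above: build the 20×10 grid of ego points, transform them by one predicted pose, and convert the BEV coordinates to the normalized sampling range (the pose values are illustrative; the BEV extents follow the formulas above):
import math
import torch

xs = torch.linspace(-2.5, 2.5, 20)                          # longitudinal grid (5 m)
ys = torch.linspace(-1.25, 1.25, 10)                        # lateral grid (2.5 m)
gx, gy = torch.meshgrid(xs, ys, indexing="ij")
pts = torch.stack([gx.reshape(-1), gy.reshape(-1)], dim=-1) # [200, 2] points in the ego frame

x0, y0, yaw = 10.0, 3.0, 0.2                                # one predicted pose (illustrative)
c, s = math.cos(yaw), math.sin(yaw)
rot = torch.tensor([[c, -s], [s, c]])
bev = pts @ rot.T + torch.tensor([x0, y0])                  # points in the BEV frame

# BEV range x in [-64, 128], y in [-32, 32]  ->  sampling range [-1, 1]
w = (-bev[:, 1] + 32.0) / 64.0 * 2.0 - 1.0
h = (-bev[:, 0] + 128.0) / 192.0 * 2.0 - 1.0
# (w, h) can now be stacked into a grid and fed to a bilinear sampler over the segmentation map.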
The two sampling methods are almost the same.
Choosing Parameters for a Probabilistic Signed Distance Map
In this section, we are using the 1D CNN method to create the distance map. Recall the distance map is defined as
$$ d_\text{soft}(c) = - 2 \sigma^2 \ln \sum_{y\in V_c} p(y) \exp \left( - \frac{ \|c-y\|^2 }{ 2 \sigma^2 } \right), $$and the loss function is
$$ \ell_\text{collide}(c) = \max\bigl(0, (R + \varepsilon)^2 - d(c)\bigr). $$
The Kernel Size
When doing the CNN, we need to choose the kernel size, which corresponds to choosing $V_c$. Here we choose the points within a $4\sigma$ distance of $c$, i.e., $V_c = \{x \mid \|x-c\|\le 4\sigma\}$. The kernel size is then
$$ N_k = 2 \max \left(\left\lceil \frac{4\sigma}{\Delta}\right\rceil, \left\lceil \frac{R+\varepsilon}{\Delta}\right\rceil\right) + 1, $$where $\Delta$ is the resolution of the probabilistic map. The first term, $\lceil 4\sigma/\Delta\rceil$, ensures the kernel is large enough to cover the points within $4\sigma$ of $c$; the second term, $\lceil (R+\varepsilon)/\Delta\rceil$, ensures it covers the points within $R+\varepsilon$ of $c$.
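A one-line sketch of this rule (parameter values are illustrative):
import math

def kernel_size(sigma, R, eps, delta):
    return 2 * max(math.ceil(4 * sigma / delta), math.ceil((R + eps) / delta)) + 1

N_k = kernel_size(sigma=0.5, R=1.0, eps=0.1, delta=0.4)   # 2 * max(5, 3) + 1 = 11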
The Scale and Soft Margin Parameters
We have two more parameters, $\sigma$ and $\varepsilon$. Our rule of thumb is that when there is only a single point-mass curb cell, the loss should activate once the ego vehicle is within $R+\varepsilon$ of it.
According to the inequality (see Appendix A)
$$ m^2-2\sigma^2\log n \le d_{\text{soft}}(c)\le m^2, $$where $m$ is the minimal distance to the curb. Assume the nearest curb is located at the origin, so that we have
$$\|\mathbf{x}\|^2 - 2\sigma^2 \log n \le d_{\text{soft}}(\|\mathbf{x}\|^2) \le \|\mathbf{x}\|^2, $$where we write $d_{\text{soft}}(\mathbf{x}) = d_{\text{soft}}(\|\mathbf{x}\|^2)$ with a slight abuse of notation.
So the margin violation term satisfies
$$(R+\varepsilon)^2-\|\mathbf{x}\|^2 \le (R+\varepsilon)^2 - d_{\text{soft}}(\|\mathbf{x}\|^2)\le (R+\varepsilon)^2+\tau\log n -\|\mathbf{x}\|^2, $$where $\tau = 2\sigma^2$. Remember that $d_{\text{soft}}$ can take negative values. Let us plot the hinge loss as a function of $\|\mathbf{x}\|^2$:
Python Code: Hinge Loss Visualization
# Visualize the guaranteed and curb-dependent regions of the hinge loss
# as a function of the squared distance ||x||^2.
import numpy as np
import matplotlib.pyplot as plt
# Define parameters
R = 2.0
eps = 0.5
tau = 1.0
n = 10
# Define x values (squared norm)
x_squared = np.linspace(-1, (R+eps)**2 + tau*np.log(n) + 1, 400)
# Define left and right functions
left_func = (R+eps)**2 - x_squared
right_func = (R+eps)**2 + tau*np.log(n) - x_squared
# Apply hinge loss
left_hinge = np.maximum(0, left_func)
right_hinge = np.maximum(0, right_func)
# Start plotting
fig, ax = plt.subplots(figsize=(8,5))
# Plot hinge lines
ax.plot(x_squared, left_hinge, label=r'Hinge[$(R+\varepsilon)^2 - \|x\|^2$]', linewidth=2)
ax.plot(x_squared, right_hinge, label=r'Hinge[$(R+\varepsilon)^2 + \tau \log n - \|x\|^2$]', linewidth=2, linestyle='--')
# Shade gap and guaranteed loss areas
ax.fill_between(x_squared, left_hinge, right_hinge, color='orange', alpha=0.3, label='Curb-dependent loss')
ax.fill_between(x_squared, 0, left_hinge, color='skyblue', alpha=0.4, label='Guaranteed loss')
# Remove ticks
ax.set_xticks([])
ax.set_yticks([])
# Ensure equal axis scaling
ax.set_aspect('equal')
# Move left spine (y-axis) to x=0 (the origin)
ax.spines['left'].set_position('zero')
# Hide top and right spines
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
# Make bottom spine visible and set its position
ax.spines['bottom'].set_position(('data',0))
# Label the origin "0"
ax.text(0, 0, "0", fontsize=12, ha='right', va='top', color='black', fontweight='bold')
# Intersection points
left_intersection = (R+eps)**2
right_intersection = (R+eps)**2 + tau*np.log(n)
# Labels at intersection points
ax.text(left_intersection, 0, r'$(R+\varepsilon)^2$', fontsize=12, ha='center', va='top', color='blue', fontweight='bold')
ax.text(right_intersection, 0, r'$(R+\varepsilon)^2 + \tau \log n$', fontsize=12, ha='center', va='top', color='orange', fontweight='bold')
# Labels
ax.set_xlabel(r'$\|x\|^2$')
ax.set_ylabel('Hinge loss value')
ax.legend()
ax.grid(False)
plt.tight_layout()
plt.show()

The blue area is the loss that is guaranteed regardless of the map; the orange gap depends on the parameters and on the distribution of the probabilistic map.
So, the real soft margin is at $(R+\varepsilon)^2+2\sigma^2\log n$. If we set $\varepsilon=0$, the soft margin gap is $2\sigma^2\log n$.
For example, suppose we want the soft margin gap to be $1$ m. In our case, the effective number of contributing cells is approximately $n = 3^2$, so $2\sigma^2\log 3^2 = 4\sigma^2\log 3 \approx 4.4\sigma^2 = 1$, which gives $\sigma \approx 0.48$.
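The same rule of thumb as a tiny sketch (gap target and $n$ follow the example above):
import math

gap, n = 1.0, 9
sigma = math.sqrt(gap / (2.0 * math.log(n)))   # ~0.477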


A more precise estimation of the loss upper bound can be found in Appendix B.
Appendix
Appendix A
Log–Sum–Exp “soft‑min” identity
For real numbers $x_1,\dots,x_n$ and a positive “temperature” (or smoothing) parameter $\tau > 0$:
$$ \text{softmin}_{\tau}(x_1,\dots,x_n) = -\tau \log \sum_{i=1}^n e^{-\frac{x_i}{\tau}} $$As $\tau \to 0^+$,
$$ \text{softmin}_{\tau}(x_1, x_2, \dots, x_n) \to \min_i x_i $$Why does it converge to the min?
Let $m = \min_i x_i$. Rewrite the sum by factoring out the smallest term:
$$ \sum_{i=1}^n e^{-\frac{x_i}{\tau}} = e^{-\frac{m}{\tau}}\sum_{i=1}^n e^{-\frac{x_i - m}{\tau}} $$Plug into the soft‑min:
$$ -\tau\log\sum_{i=1}^n e^{-\frac{x_i}{\tau}} = m -\tau \log\sum_{i=1}^n e^{-\frac{x_i -m}{\tau}} $$Each term inside the log is at most $1$ (since $x_i - m \ge 0$), and one of them equals $e^0=1$, so the sum lies between $1$ and $n$. Thus,
$$ m-\tau \log n \le -\tau\log\sum_{i=1}^n e^{-\frac{x_i}{\tau}} \le m $$As $\tau \to 0$, the lower bound converges to $m$, so the soft-min converges to the minimum.
Intuition
Log–sum–exp is a smooth version of “min” (or “max”). The exponential makes smaller $x_i$ dominate the sum; the log and $-\tau$ rescale back to the original domain.
Temperature $\tau$ controls smoothness.
Small $\tau$ → very sharp, almost the true minimum.
Large $\tau$ → averages more of the terms (smoother).
The gradient (useful in optimization) of the softmin function is obtained as follows. Define $f = -\tau \log \sum_{i=1}^n e^{-\frac{x_i}{\tau}}$; then
$$\frac{\partial f}{\partial x_i} = \frac{e^{-\frac{x_i}{\tau}}}{\sum_{j=1}^n e^{-\frac{x_j}{\tau}}} = \text{softmax}\left(-\frac{x}{\tau}\right)_i. $$Interpretation:
The gradient weights are a soft assignment to the argmin.
As $\tau\to 0$, the gradient becomes a one‑hot vector at the actual minimum index.
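A quick illustrative check with autograd that the soft-min gradient matches these softmax weights (the values of $\tau$ and $x$ are arbitrary):
import torch

tau = 0.5
x = torch.tensor([3.0, 1.0, 2.0], requires_grad=True)
f = -tau * torch.logsumexp(-x / tau, dim=0)
f.backward()
print(x.grad)                         # gradient of the soft-min
print(torch.softmax(-x / tau, dim=0)) # identical softmax weights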
Numerical stability (log‑sum‑exp trick)
To avoid overflow/underflow, subtract the smallest value (or any reference):
import torch
x, tau = torch.tensor([3.0, 1.0, 2.0]), 0.1  # illustrative inputs and temperature
m = x.min()
softmin = -tau * torch.log(torch.exp(-(x - m) / tau).sum()) + m
This is equivalent to what we did algebraically.
There is also a continuous version of the softmin function.
For a continuous function $f(y)$:
$$ \text{softmin}_{\tau} f = -\tau \log \int e^{-f(y)/\tau}\, dy. $$As $\tau \to 0$, it converges to $\inf_y f(y)$
(assuming the integral is well-defined). This is the Laplace approximation principle.
The max version of the soft function is
$$ \text{softmax}_{\tau}(x) = \tau \log\sum_{i=1}^n e^{x_i/\tau}. $$As $\tau\to 0$, $\text{softmax}_{\tau}(x)\to\max_i x_i$.
Appendix B
Refined Lower Bound of the Soft Distance
Recall our soft distance map is defined as
$$d_{\text{soft}}(c) = -2\sigma^2\log\sum_y p(y)e^{-\frac{\|c-y\|^2}{2\sigma^2}}$$Assume $y_{\min}$ is the point with the minimal distance to $c$ and $p(y_{\min})=1$. Then we have
$$\begin{aligned} d_{\text{soft}}(c) &= -2\sigma^2\log\sum_y p(y)e^{-\frac{\|c-y\|^2}{2\sigma^2}}\\ &=\|c-y_{\min}\|^2 - 2\sigma^2\log\sum_{y}p(y)e^{-\frac{\|c-y\|^2 - \|c-y_{\min}\|^2}{2\sigma^2}}\\ &\ge \|c-y_{\min}\|^2 - 2\sigma^2\log\sum_{i=-K}^{K}\sum_{j=-K}^{K}e^{-\frac{(i\Delta)^2+(j\Delta)^2}{2\sigma^2}} \end{aligned} $$where $K$ is half the kernel size and $\Delta$ is the resolution of the map.
Then we need to estimate the summation term. Let us define
$$S(K,\Delta, \sigma) = \sum_{i=-K}^K\sum_{j=-K}^K e^{-\frac{(i\Delta)^2+(j\Delta)^2}{2\sigma^2}}$$We can approximate this sum with a Gaussian integral, provided that $K$ is large and $\frac{\Delta}{\sigma}$ is small. So we have
$$ S(K, \Delta, \sigma) \approx \frac{2\pi\sigma^2}{\Delta^2} $$Thus, the distance is lower bounded by
$$ d_{\text{soft}}(c) \ge \|c-y_{\min}\|^2 - 2\sigma^2\log\frac{2\pi\sigma^2}{\Delta^2} $$If we want this correction term to be $0.3$, then solving $2\sigma^2\log\frac{2\pi\sigma^2}{\Delta^2} = 0.3$ with $\Delta=0.4$ gives $\sigma\approx0.32$.
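The equation for $\sigma$ has no closed form, so here is a small sketch solving it by bisection (with $\Delta = 0.4$ and the $0.3$ target from the example above):
import math

def correction(sigma, delta=0.4):
    return 2.0 * sigma ** 2 * math.log(2.0 * math.pi * sigma ** 2 / delta ** 2)

lo, hi, target = 0.2, 1.0, 0.3
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if correction(mid) < target else (lo, mid)
print(round(mid, 2))   # ~0.32, consistent with the estimate above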
Appendix C
Algorithm for Distance Transform
Please refer to [1]. The computational complexity is $\mathcal{O}(WH)$.
References
[1] P. F. Felzenszwalb and D. P. Huttenlocher, “Distance transforms of sampled functions,” Theory of Computing, vol. 8, no. 1, pp. 415–428, Sep. 2012, doi:10.4086/toc.2012.v008a019
[2] J. Cheng, Y. Chen, and Q. Chen, “PLUTO: Pushing the Limit of Imitation Learning‑based Planning for Autonomous Driving,” arXiv preprint arXiv:2404.14327, Apr. 22, 2024.
[3] A. Meijster, J. B. T. M. Roerdink, and W. H. Hesselink, “A general algorithm for computing distance transforms in linear time,” in Mathematical Morphology and its Applications to Image and Signal Processing, J. Goutsias, L. Vincent, and D. S. Bloomberg, Eds. Dordrecht, The Netherlands: Kluwer Academic, 2000, pp. 331–340. https://pure.rug.nl/ws/files/14632734/c2.pdf
[4] T. Strutz, “The Distance Transform and its Computation — An Introduction,” arXiv, vol. arXiv:2106.03503 v2, Feb. 24, 2023, arXiv Working Paper TECHP/2021/06. https://arxiv.org/pdf/2106.03503
[5] “Bilinear interpolation,” Wikipedia, The Free Encyclopedia, last modified August 6, 2025. [Online]. Available: https://en.wikipedia.org/wiki/Bilinear_interpolation