When projecting a 3D object onto the camera image plane, we usually use the pinhole model. However, the pinhole model projects one point at a time. For a solid object, we also need to account for how the object's extent interacts with the camera's field of view, especially when the object is close to the camera and only partly visible. In the following, we use the view frustum to cull the object and then project the remaining part onto the image plane.

Object Projection

Use View Frustum to Cull the Object Box

View frustum culling

Algorithm

  1. Compute the camera frustum from the camera parameters (intrinsics, region of perception, near and far depths): the frustum is the intersection of six half-spaces, each bounded by a surface. Each surface is described by a point and a normal vector, and each normal vector points toward the region the camera is interested in, that is, the inside of the frustum;

  2. Construct the object: an object is ideally modeled as a cuboid, which comprises six convex polygons. The vertices of each polygon are ordered so that the normal given by the right-hand rule points toward the inside of the object;

  3. Use the frustum to cut the object:

    1. One surface clips one convex polygon;

    2. Six surfaces clip one convex polygon;

    3. Six surfaces clip six convex polygons.
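Steps 1 and 2 can be sketched as follows. This is a minimal illustration, not the original implementation: the function names (`frustum_planes`, `box_faces`, `point_in_frustum`), the camera frame convention (x right, y down, z forward), an undistorted pinhole camera, and an axis-aligned cuboid are all assumptions made for the example.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def frustum_planes(fx, fy, cx, cy, width, height, near, far):
    """Return six (point, inward_normal) planes in the camera frame."""
    def ray(u, v):
        # Back-project pixel (u, v) to a ray direction at depth 1.
        return ((u - cx) / fx, (v - cy) / fy, 1.0)

    corners = [ray(0, 0), ray(width, 0), ray(width, height), ray(0, height)]
    inside = ray(width / 2, height / 2)  # a direction known to be visible
    origin = (0.0, 0.0, 0.0)
    planes = []
    for i in range(4):
        # Each side plane contains the camera center and two adjacent corner rays.
        n = cross(corners[i], corners[(i + 1) % 4])
        if dot(n, inside) < 0:
            n = tuple(-c for c in n)  # flip so the normal points into the frustum
        planes.append((origin, n))
    planes.append(((0.0, 0.0, near), (0.0, 0.0, 1.0)))   # near plane
    planes.append(((0.0, 0.0, far), (0.0, 0.0, -1.0)))   # far plane
    return planes

def box_faces(center, half_size):
    """Return six quads of an axis-aligned cuboid, wound so normals point inward."""
    corners = [tuple(c + (h if (k >> axis) & 1 else -h)
                     for axis, (c, h) in enumerate(zip(center, half_size)))
               for k in range(8)]
    loops = [(0, 1, 3, 2), (4, 5, 7, 6), (0, 1, 5, 4),
             (2, 3, 7, 6), (0, 2, 6, 4), (1, 3, 7, 5)]
    faces = []
    for loop in loops:
        face = [corners[i] for i in loop]
        n = cross(sub(face[1], face[0]), sub(face[2], face[1]))
        if dot(n, sub(center, face[0])) < 0:
            face.reverse()  # reverse the winding so the normal points inward
        faces.append(face)
    return faces

def point_in_frustum(p, planes):
    # A point is inside when it lies on the inner side of all six planes.
    return all(dot(sub(p, q), n) >= 0 for q, n in planes)
```

Orienting the normals by testing against a known interior direction avoids relying on a particular corner ordering, which is an easy place to introduce sign errors.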

A Surface Clips a Polygon

Please refer to [1] for more details.
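As a rough illustration, clipping a convex polygon against one surface can be written in the style of Sutherland-Hodgman clipping: walk the edges, keep the vertices on the inner side, and insert an intersection point wherever an edge crosses the surface. The function names and the tolerance `eps` are assumptions for this sketch, which is not necessarily the exact formulation in [1].

```python
def clip_polygon_by_plane(polygon, point, normal, eps=1e-9):
    """Keep the part of a convex polygon on the side the normal points to."""
    def signed(p):
        # Signed distance (scaled by |normal|) from p to the plane.
        return sum((pi - qi) * ni for pi, qi, ni in zip(p, point, normal))

    out = []
    count = len(polygon)
    for i in range(count):
        cur, nxt = polygon[i], polygon[(i + 1) % count]
        dc, dn = signed(cur), signed(nxt)
        if dc >= -eps:
            out.append(cur)  # current vertex is inside (or on the plane)
        if (dc > eps and dn < -eps) or (dc < -eps and dn > eps):
            # The edge crosses the plane: add the intersection point.
            t = dc / (dc - dn)
            out.append(tuple(c + t * (m - c) for c, m in zip(cur, nxt)))
    return out

def clip_polygon_by_planes(polygon, planes):
    """Clip a convex polygon against every (point, normal) plane in turn."""
    for point, normal in planes:
        polygon = clip_polygon_by_plane(polygon, point, normal)
        if not polygon:
            break  # nothing left inside the frustum
    return polygon
```

Because both the polygon and the frustum are convex, the result of each step is again a convex polygon, so the same routine can be applied for all six surfaces and then for all six faces of the object.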

Distorted Image

Green line indicates the boundary of the perception area, blue line the boundary of camera, and black line the frustum boundary

Since we use only the four corner vertices to compute the frustum, its sides are flat, whereas the actual frustum of a distorted image is curved. Objects culled by this boundary usually still extend outside the perception area, so we should further clip them to the perception area.
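One simple way to do this final step, assuming the perception area can be approximated by an axis-aligned rectangle in image coordinates, is to intersect the projected 2D bounding box with that rectangle. The function name `clamp_box` and the `(u_min, v_min, u_max, v_max)` box layout are hypothetical and chosen only for this sketch.

```python
def clamp_box(box, region):
    """Intersect two boxes given as (u_min, v_min, u_max, v_max).

    Returns None when the box lies entirely outside the region.
    """
    u_min = max(box[0], region[0])
    v_min = max(box[1], region[1])
    u_max = min(box[2], region[2])
    v_max = min(box[3], region[3])
    if u_min >= u_max or v_min >= v_max:
        return None  # no overlap with the perception area
    return (u_min, v_min, u_max, v_max)
```

If the perception area has a more complex outline, a polygon clipping routine in 2D would be needed instead of this rectangle intersection.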

Advantages

It can handle an object that is close to the camera (for example, when the detection box lies on the boundary of the camera image). It also yields the correct bounding box when parts of the object have negative depth.
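The negative-depth problem can be seen on a single edge. In the sketch below, one endpoint lies behind the camera; projecting it directly with the pinhole model mirrors it to the wrong side of the image, while clipping the edge at the near depth first keeps it on the correct side. The intrinsics, the near depth of 0.5, and the function names are assumptions made for the example.

```python
def project(p, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame point (x, y, z) to pixels."""
    x, y, z = p
    return (fx * x / z + cx, fy * y / z + cy)

def clip_segment_to_near(a, b, near):
    """Return the part of segment a-b with z >= near, or None if there is none."""
    za, zb = a[2], b[2]
    if za < near and zb < near:
        return None
    if za >= near and zb >= near:
        return (a, b)
    t = (near - za) / (zb - za)  # where the segment crosses z = near
    m = tuple(p + t * (q - p) for p, q in zip(a, b))
    return (a, m) if za >= near else (m, b)

fx = fy = 1000.0
cx, cy = 960.0, 540.0
a = (1.0, 0.0, 2.0)    # in front of the camera, to the right
b = (1.0, 0.0, -1.0)   # behind the camera, still to the right

naive_u = project(b, fx, fy, cx, cy)[0]          # -40.0: lands left of the center
start, end = clip_segment_to_near(a, b, 0.5)
clipped_u = project(end, fx, fy, cx, cy)[0]      # 2960.0: stays right of the center
```

The clipped endpoint projects outside the image on the right, which is where the edge actually leaves the field of view; the final bounding box can then be limited to the image or perception area.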

Usage and Limitations

  1. The frustum is represented as the intersection of six surfaces, which forms a truncated pyramid in 3D space;

  2. The frustum should be convex;

  3. The image is undistorted;

  4. Since each surface is used to clip a convex polygon, we must assume that each face of the object is a convex polygon (which may be why objects are often represented by small triangles). Note that the object itself does not need to be convex. If a face is non-convex, we can decompose it into convex sub-polygons or use a more advanced clipping algorithm.
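The convexity assumption in item 4 can be checked before clipping. A minimal sketch, assuming each face is a simple (non-self-intersecting) planar polygon given as a list of 3D vertices: the face is convex when the cross products of consecutive edges all point in the same direction. The function name `is_convex_planar` is hypothetical.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def is_convex_planar(polygon, eps=1e-12):
    """Check convexity of a simple planar polygon given as 3D vertices."""
    count = len(polygon)
    if count < 3:
        return False
    reference = None
    for i in range(count):
        a = polygon[i]
        b = polygon[(i + 1) % count]
        c = polygon[(i + 2) % count]
        turn = cross(sub(b, a), sub(c, b))
        if dot(turn, turn) <= eps:
            continue  # collinear vertices do not affect convexity
        if reference is None:
            reference = turn
        elif dot(reference, turn) < 0:
            return False  # a reflex vertex turns the other way
    return reference is not None
```

A face that fails this check would need to be decomposed, for example into triangles, before it is passed to a convex clipping routine.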

Extensions and Improvements

We can further refine the shape of vehicles. For example:

[Figure: an example of a refined vehicle shape]

Examples

In this example, we show the results of projecting a 3D object onto each camera using camera frustum culling.

A top-down view of the camera positions, their frustums, the ego car, and the object.
2D bounding boxes of the object in the different cameras. The blue line is the camera frustum boundary, and the red line is the projected 2D bounding box. The first row shows the front and rear views; the second row shows the right-front and right-rear views.

References

[1] Joy, Kenneth I. “Clipping.” On-Line Computer Graphics Notes, Visualization and Graphics Research Group, Department of Computer Science, University of California, Davis.