Original article was published by Nabil MADALI on Deep Learning on Medium

# Deep Hierarchical Feature Learning on Point Sets in a Metric Space

In the past, there was very little research on deep learning for point sets, and PointNet opened the door in this regard. However, PointNet cannot capture the local structure induced by the metric the points live in, which limits its ability to recognize fine-grained patterns and to generalize to complex scenes. The original authors therefore proposed PointNet++, which addresses this problem.

In PointNet++, the authors use the distance metric of the space to partition the point set into overlapping local regions (which can be understood as patches). On this basis, local (shallow) features are first extracted from the geometric structure of a small neighborhood; the range is then enlarged and higher-level features are extracted on top of these local features, until features for the entire point set are obtained. This process is analogous to feature extraction in a CNN: low-level features are extracted first, and as the receptive field grows, the extracted features become increasingly high-level.

We are interested in analyzing geometric point sets, which are collections of points in a Euclidean space. A particularly important type of geometric point set is the point cloud captured by 3D scanners, e.g., from appropriately equipped autonomous vehicles. As a set, such data has to be invariant to permutations of its members. In addition, the distance metric defines local neighborhoods that may exhibit different properties. For example, the density and other attributes of points may not be uniform across different locations — in 3D scanning, the density variability can come from perspective effects, radial density variations, motion, etc.

The overall network structure of PointNet++ is as follows:

The hierarchy of PointNet++ consists of a number of set abstraction levels. At each level, a set of points is processed and abstracted to produce a new set with fewer elements.

The set abstraction level is composed of three key layers:

- Sampling Layer
- Grouping Layer
- PointNet Layer

**The sampling layer**: Given input points {x_1, x_2, …, x_n}, iterative farthest point sampling (FPS) is used to choose a subset of N’ points from the input points, which define the centroids of the local regions.
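The FPS procedure can be sketched in a few lines of NumPy. The function name is illustrative, and starting from index 0 is a simplification (implementations typically start from a random point):

```python
import numpy as np

def farthest_point_sampling(points, n_samples):
    """Iterative farthest point sampling (FPS) sketch.

    points: (N, 3) array of xyz coordinates.
    Returns indices of n_samples points such that each newly chosen
    point is the one farthest from the set already selected.
    """
    n = points.shape[0]
    selected = np.zeros(n_samples, dtype=np.int64)
    # distance from every point to its nearest already-selected centroid
    dist = np.full(n, np.inf)
    farthest = 0  # arbitrary start; a random start is common in practice
    for i in range(n_samples):
        selected[i] = farthest
        d = np.sum((points - points[farthest]) ** 2, axis=1)
        dist = np.minimum(dist, d)
        farthest = int(np.argmax(dist))
    return selected
```

Compared with uniform random sampling, FPS covers the point set more evenly, which is why it is preferred for choosing centroids.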

**The grouping layer** constructs a local area set by finding the adjacent points around the centroid.

The input to this layer is a point set of size N × (d + C) and the coordinates of a set of centroids of size N’ × d.

The output consists of groups of point sets of size N’ × K × (d + C), where each group corresponds to a local region and K is the number of points in the neighborhood of the centroid.

They use ball query: a ball of radius R is drawn around each centroid, and the points of the cloud falling inside each ball are treated as one cluster.
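A minimal ball-query sketch in NumPy, assuming a brute-force distance computation and the common padding trick of repeating the first neighbour when fewer than K points fall inside the ball:

```python
import numpy as np

def ball_query(points, centroids, radius, k):
    """Group up to k points within `radius` of each centroid.

    points: (N, 3), centroids: (N', 3).
    Returns an (N', k) index array; regions with fewer than k neighbours
    are padded by repeating the first neighbour found.
    """
    groups = []
    for c in centroids:
        d2 = np.sum((points - c) ** 2, axis=1)
        idx = np.nonzero(d2 <= radius ** 2)[0]
        if idx.size == 0:
            idx = np.array([int(np.argmin(d2))])  # fall back to nearest point
        if idx.size < k:
            idx = np.concatenate([idx, np.full(k - idx.size, idx[0])])
        groups.append(idx[:k])
    return np.stack(groups)
```

Unlike k-nearest-neighbour grouping, ball query guarantees a fixed neighbourhood scale, which makes the learned local features more generalizable across regions of varying density.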

**The PointNet layer** uses a small PointNet network to encode local region patterns into feature vectors.

The input consists of N’ local regions of points with data size N’ × K × (d + C). In the output, each local region is abstracted by its centroid and a local feature that encodes the centroid’s neighborhood. The output data size is N’ × (d + C’).
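The core of this step can be sketched as a shared per-point transformation followed by symmetric max-pooling over the K points of each region. The single linear + ReLU layer and the toy weights `w` and `b` are hypothetical stand-ins for a learned shared MLP:

```python
import numpy as np

def pointnet_layer(groups, w, b):
    """Minimal PointNet layer sketch.

    groups: (N', K, d + C) local regions, with coordinates translated
    so each centroid sits at the origin of its local frame.
    w: (d + C, C') shared per-point weight matrix, b: (C',) bias.
    The same linear + ReLU layer is applied to every point, then the
    result is max-pooled over the K neighbours, giving one C'-dim
    feature per region.
    """
    h = np.maximum(groups @ w + b, 0.0)  # shared MLP applied point-wise
    return h.max(axis=1)                 # symmetric max-pooling over K
```

The max-pooling is what makes the output invariant to the ordering of the K points within each region.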

In this second generation of PointNet, the authors mainly draw on the idea of the multi-layer receptive field in CNNs. A CNN repeatedly scans the image with convolution kernels, layer by layer, taking inner products with the pixels, so that the receptive field of later feature maps grows and each pixel aggregates more information. PointNet++ imitates this structure: it first samples local neighborhoods of the whole point cloud, treats the points inside each neighborhood as a local patch, and runs PointNet once to extract a feature. After many such operations the number of points becomes smaller and smaller, while each remaining point is a local feature extracted by PointNet from many points in the previous layer; that is, each point carries more and more information.

The classification network only needs to extract local features layer by layer and finally summarize the global features to output the classification results.

The segmentation network first extracts a global feature from the point cloud and then gradually up-samples from this global feature. The general process is as follows:

The authors discuss several up-sampling methods in the paper. The simplest is to always sample all points as centroids in every abstraction layer, but this leads to high computational cost.

Instead, they adopt a hierarchical propagation strategy with distance-based interpolation and cross-level skip links.

They achieve feature propagation by interpolating the feature values f of the N_l points at the coordinates of the N_{l−1} points. Among the many choices for interpolation, they use an inverse-distance-weighted average based on the k nearest neighbors.
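The inverse-distance-weighted interpolation can be sketched as follows; `k=3` matches the paper's default, while the function name and the `eps` guard against division by zero are illustrative:

```python
import numpy as np

def interpolate_features(xyz_dense, xyz_sparse, feats_sparse, k=3, eps=1e-8):
    """Inverse-distance-weighted interpolation over k nearest neighbours.

    Propagates features from the N_l sparse points (xyz_sparse,
    feats_sparse) back to the denser N_{l-1} points (xyz_dense).
    """
    out = np.empty((xyz_dense.shape[0], feats_sparse.shape[1]))
    for i, p in enumerate(xyz_dense):
        d2 = np.sum((xyz_sparse - p) ** 2, axis=1)
        nn = np.argsort(d2)[:k]
        w = 1.0 / (d2[nn] + eps)   # inverse squared-distance weights
        w /= w.sum()               # normalize weights to sum to 1
        out[i] = w @ feats_sparse[nn]
    return out
```

Points close to a sparse centroid inherit its feature almost unchanged, while points between centroids receive a blend weighted by proximity.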

The interpolated features on the N_{l−1} points are then concatenated with skip-linked point features from the corresponding set abstraction level. The concatenated features are passed through a “unit PointNet”, which is similar to a 1×1 convolution in CNNs: a few shared fully connected and ReLU layers update each point’s feature vector. The process is repeated until features have been propagated to the original set of points.

# Conclusion

PointNet++ is a powerful neural network architecture for processing point sets sampled in a metric space. It recursively partitions the input point set into nested regions and is very effective at learning hierarchical features with respect to the distance metric. To address the problem of non-uniform point sampling, two new set abstraction layers are proposed that intelligently aggregate multi-scale information according to local point density.

These contributions enable PointNet++ to achieve state-of-the-art performance on challenging 3D point cloud benchmarks.

# References

- Charles R. Qi, Hao Su, Kaichun Mo, Leonidas J. Guibas. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space.
- Charles R. Qi, Hao Su, Kaichun Mo, Leonidas J. Guibas. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation.