Cyclist detection and tracking based on multi-layer laser scanner

The technology of Artificial Intelligence (AI) brings tremendous possibilities for autonomous vehicle applications. One of the essential tasks of an autonomous vehicle is environment perception using machine learning algorithms. Since cyclists are vulnerable road users, cyclist detection and tracking are important perception sub-tasks for autonomous vehicles to avoid vehicle-cyclist collisions. In this paper, a robust method for cyclist detection and tracking is presented based on a multi-layer laser scanner, i.e., IBEO LUX 4L, which obtains a four-layer point cloud from the local environment. First, the laser points are partitioned into individual clusters using a subarea-based Density-Based Spatial Clustering of Applications with Noise (DBSCAN) method. Then, a 37-dimensional feature set is optimized by the Relief algorithm and Principal Component Analysis (PCA) to produce two new feature sets. Support Vector Machine (SVM) and Decision Tree (DT) classifiers are further combined with the three feature sets, respectively. Moreover, the Multiple Hypothesis Tracking (MHT) algorithm and a Kalman filter based on the Current Statistical (CS) model are applied to track moving cyclists and estimate their motion state. The performance of the proposed cyclist detection and tracking method is validated in a real road environment.

Motivated by the analysis of the object detection studies, a novel method for cyclist detection and tracking using a multi-layer laser scanner is proposed in this paper. IBEO LUX 4L is adopted as the main sensor and mounted on the front bumper of the autonomous vehicle, as shown in Fig. 1. First, a subarea-based DBSCAN method is developed to segment the point cloud into clusters. Second, three feature sets are each combined with SVM and DT classifiers, so that six classifiers are obtained in total. Then, the MHT algorithm based on the Kalman filter is used to track multiple detected cyclists. The proposed cyclist detection and tracking method is validated to perform well in real road scenes. The contributions of this paper are twofold: (1) It is the first attempt to separate the raw point cloud into several subareas based on the density distribution; (2) The CS model is selected as the motion model of the cyclist, and the MHT algorithm is used to track multiple cyclists.
The remainder of this paper is organized as follows: Section "Related works" reviews the related studies. Section "Point cloud clustering" describes the point cloud clustering process. Section "Feature extraction" presents the feature extraction. Section "Classification" introduces the cyclist classifiers. Section "Tracking" presents the cyclist tracking algorithm. Experimental results are given in Section "Experiments". Section "Conclusion" concludes this study.

Related works
In the field of cyclist detection and tracking, most state-of-the-art methods rely on machine vision sensors [10][11][12], owing to the advantages of high-resolution images with color and texture information. Zangenehpour [10] presented a cyclist detection method based on the histogram of oriented gradients feature for video datasets from crowded traffic scenes. Li [11] described a unified framework for cyclist detection, which included a novel detection proposal and a discriminative deep model based on Fast R-CNN with over 50000 images. Tian [12] explored geometric constraints with various camera setups and built cascaded detectors with multiple classifiers to detect cyclists from multiple viewpoints. Bieshaar [13] used a 3D convolutional neural network to detect the motion of cyclists in image sequences with spatio-temporal features. Liu [14] utilized a region proposal method based on aggregated channel features to solve the cyclist detection problem for high-resolution images. However, cyclist detection and tracking based on vision sensors remain challenging, since the camera is susceptible to illumination and cyclists vary widely in appearance, pose and occlusion. Compared with vision sensors, laser scanners can collect the spatial and motion information of the detected objects [15,16]. According to the number of laser layers, laser scanners are divided into three types, namely 2D, 2.5D and 3D. A 2D laser scanner generates a sparse single-layer point cloud, which is inadequate for object classification in a real driving environment. 3D laser scanners are usually installed on top of autonomous vehicles and produce a dense 3D point cloud that covers the full field of view for global environment construction. However, the high cost of 3D laser scanners limits their wide application. Compared with the other kinds of laser scanners, a 2.5D laser scanner, e.g., IBEO LUX 4L, is a more practical option for object classification.
Many algorithms have been presented for object classification using 2.5D multi-layer laser scanners [17][18][19][20][21][22][23][24]. Wang et al. [19] built SVM classifiers based on a 120-dimensional point cloud feature set for object recognition. Huang et al. [20] captured the discontinuities of the point cloud with a distance threshold to segment the clusters, and trained three categories of classifiers to recognize the dynamic obstacles in the driving environment. Magnier et al. [21] employed belief theory to detect and track moving objects using a multi-layer laser scanner. Kim et al. [22] extracted geometrical features from four-layer point cloud data and detected pedestrians with an RBFAK classifier. Carballo et al. [23] used several intensity-based features for pedestrian detection. Arras et al. [24] conducted pedestrian classification with 14 static features of human legs in an indoor environment. Compared with the large body of research on the detection of other road users, e.g., vehicles and pedestrians, the study of cyclist detection using laser scanners is limited. Subirats et al. [8] employed a single-layer laser scanner to monitor the road surface and count the cyclists in real traffic flow, and the measured length of the detected cyclists depends on their speed. Prabhakar et al. [9] presented the last line check method to detect and count Powered Two Wheelers (PTWs) with the laser scanner at a fixed angle. In general, the existing studies on cyclist detection using laser scanners mainly count the number of cyclists while ignoring the variation of the cyclists' motion state. In this paper, the proposed method for cyclist detection and tracking using a multi-layer laser scanner aims at capturing the moving cyclist with high accuracy.

Point cloud clustering
The range of the scanned obstacles affects the density of the returned point cloud directly. In real driving environment, the point cloud varies frame by frame, and the number of the surrounding clusters is uncertain. DBSCAN clustering method is often used to deal with the uneven point cloud, since there is no need for DBSCAN to set the number of the clusters in advance, and the input parameters are the minimum number of the laser points in the cluster and the neighbor radius [25].
The traditional DBSCAN method may segment multiple obstacles at close range as one cluster, while a single obstacle at long range may be split into several clusters. Therefore, a subarea-based DBSCAN method is proposed in this paper. First, the density distribution curve is generated for the uneven point cloud, and the subareas are divided based on the characteristics of the density distribution. Then, the optimal neighbor radius is calculated for each subarea, and DBSCAN is applied to the point cloud in each subarea. The point cloud data of one frame in a campus environment is taken as an example. First, the point cloud in the region of interest is divided into several subareas along the motion direction of the ego-vehicle carrying the laser scanner. Since an excessive number of subareas may increase the computation cost and a smaller number of subareas may lead to some empty subareas, the number of subareas needs to be selected properly. According to the statistical analysis, the relation between the number of subareas and the number of points is given by formula (1). The ratio between the number of points in each subarea and the total number of points is defined as the density of that subarea. The density histogram is used to describe the distribution of the point cloud. For the example frame, there are 424 points in the region of interest, and the number of subareas equals 10.55 according to formula (1). Thus, the number of subareas is set to 10. The density distribution curve is obtained from the density histogram, as shown in Fig. 2. A wave crest of the density distribution curve indicates a dense distribution, while a wave valley corresponds to a sparse distribution. A wave valley of this curve is an inflection point of the point cloud density variation, and it is also considered as a candidate partition point for subareas.
Note that the valley between two wave crests is not always an appropriate partition point, since partitioning there may be redundant. If the density difference between two adjacent subareas is not too large, there is no need to divide these two subareas, and they can be regarded as one subarea with uniform density. Therefore, the density difference threshold of the subarea is set as a constant value T, and the subareas are divided only when the difference between two adjacent wave peaks is greater than T. According to the attributes of the point cloud from the LUX 4L laser scanner, the density difference threshold T is set as 50. As shown in Fig. 2, the density differences among the three adjacent peaks $X_{P1}$, $X_{P2}$ and $X_{P3}$ of the density distribution curve are greater than the threshold T. Thus, the valleys $X_1$ and $X_2$ among these three peaks are regarded as two partition points, and the partition locations are $X_1 = 12$ m and $X_2 = 20$ m. Finally, three subareas with different densities are obtained, as shown in Fig. 3.
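As a rough illustration of the partition rule described above, the following Python sketch computes a per-subarea density histogram and selects valley bins between adjacent peaks whose density difference exceeds a threshold. The function names, the equal-width binning and the use of density fractions as the threshold unit are assumptions made for this example; formula (1) and the unit of the threshold T used in the paper are not reproduced here.

```python
def subarea_densities(xs, x_min, x_max, n_bins):
    """Fraction of points in each of n_bins equal-width bins on [x_min, x_max)."""
    counts = [0] * n_bins
    width = (x_max - x_min) / n_bins
    for x in xs:
        if x_min <= x < x_max:
            # Clamp to guard against floating-point rounding at the upper edge.
            counts[min(int((x - x_min) / width), n_bins - 1)] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def partition_bins(density, threshold):
    """Indices of valley bins separating adjacent peaks that differ by more than threshold."""
    n = len(density)
    # Local maxima of the density curve; the two end bins may also be peaks.
    peaks = [i for i in range(n)
             if (i == 0 or density[i] > density[i - 1])
             and (i == n - 1 or density[i] >= density[i + 1])]
    cuts = []
    for left, right in zip(peaks, peaks[1:]):
        if abs(density[left] - density[right]) > threshold:
            # The valley is the lowest bin strictly between the two peaks.
            cuts.append(min(range(left + 1, right), key=lambda i: density[i]))
    return cuts
```

With a small threshold almost every valley becomes a partition point, while a large threshold merges neighboring subareas of similar density, which is the trade-off the threshold T controls.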
The original DBSCAN algorithm [25] and the proposed subarea-based DBSCAN algorithm are both used to cluster the point cloud of this example frame. In the clustering results from these two DBSCAN algorithms, each color of the point cloud denotes an individual cluster. The subarea-based DBSCAN algorithm separates two neighboring obstacles successfully, while the original DBSCAN algorithm, which uses a global neighbor radius, clusters the two neighboring obstacles into one obstacle. Thus, the subarea-based DBSCAN algorithm achieves better clustering performance than the original DBSCAN algorithm.
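The difference between a global neighbor radius and a per-subarea radius can be sketched in plain Python as follows. This is a minimal illustration rather than the authors' implementation: the names `dbscan` and `subarea_dbscan`, the partition along the first coordinate, and the example radii are all assumptions, and a cluster that straddles a partition boundary would be split by this simple version.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Plain DBSCAN on 2-D points; returns one label per point (-1 means noise)."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)       # j is a core point, so keep expanding
    return labels


def subarea_dbscan(points, cuts, eps_per_subarea, min_pts):
    """Run DBSCAN separately in each subarea along x, each with its own radius."""
    bounds = [-math.inf] + list(cuts) + [math.inf]
    labels = [NOISE] * len(points)
    offset = 0
    for k, eps in enumerate(eps_per_subarea):
        idx = [i for i, p in enumerate(points) if bounds[k] <= p[0] < bounds[k + 1]]
        sub = dbscan([points[i] for i in idx], eps, min_pts)
        for i, lab in zip(idx, sub):
            if lab != NOISE:
                labels[i] = lab + offset  # keep cluster ids unique across subareas
        offset += max(sub, default=-1) + 1
    return labels
```

For two close obstacles at about 5 m and one sparse obstacle at about 25 m, a single large radius merges the two near obstacles, whereas `subarea_dbscan(points, [12], [0.3, 0.65], 2)` keeps all three apart in this toy setting.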

Feature extraction
A 37-dimensional feature set is proposed for cyclist detection and tracking based on the point cloud characteristics of the cyclist. This feature set includes 11 number-of-points-based features, 16 geometric features and 10 statistical features, as listed in Tables 1, 2 and 3. Feature 23 denotes the length of the polyline which connects the horizontal projection points in ascending order of horizontal coordinate value. Features 28-31 denote the convexity of the scan points at each layer, where the convexity is equal to the distance between the center of the fitting circle and the origin point minus the distance between the geometric center of all the scan points and the origin point. Feature 37 denotes the residual ratio of the bounding areas for the two edges of the laser points vs. the middle points, and this feature indicates that the middle of the point cloud cluster is denser than the edges. The effectiveness of the feature set is important for the performance of the classifier. However, a combination of multiple independent features does not always classify better than a single feature. Therefore, the optimization of the feature set is necessary to reduce the redundancy among multiple features. The Relief algorithm [26] and the PCA method [27] are employed in the optimization process.

Relief algorithm
The Relief algorithm is a feature selection method based on multi-variable filtering, and it uses sample learning to determine the weight of each feature. Each feature is evaluated by the classification performance difference between samples of the same category and samples of different categories.
Feature table excerpt: the total area of the bounding rectangles at four layers; the average area of the bounding rectangles at four layers.
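The weighting idea behind Relief can be sketched as follows. This is a simplified, deterministic variant written for illustration: it visits every sample instead of drawing a random subset, uses the Manhattan distance, and assumes features already scaled to [0, 1] and at least two samples per class. The function name `relief_weights` is hypothetical, and the exact variant used in the paper is not specified in the text.

```python
def relief_weights(X, y):
    """Relief-style feature weights: reward features that differ on the nearest
    sample of another class (miss) and agree on the nearest sample of the same
    class (hit)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d

    def dist(a, b):
        return sum(abs(p - q) for p, q in zip(a, b))

    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        hit = min(hits, key=lambda j: dist(X[i], X[j]))
        miss = min(misses, key=lambda j: dist(X[i], X[j]))
        for f in range(d):
            w[f] += (abs(X[i][f] - X[miss][f]) - abs(X[i][f] - X[hit][f])) / n
    return w
```

Features with larger weights separate the classes better, so a reduced feature set can be formed by keeping the top-ranked features.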

SVM classifier
With its strong generalization capability, SVM is utilized as an independent classifier. First, each feature is normalized using its maximum and minimum values, where f is the normalized feature value, and $f_{\max}$ and $f_{\min}$ are the maximum and minimum feature values, respectively. Since the Radial Basis Function (RBF) can achieve nonlinear mapping with few parameters, RBF is selected as the kernel function: $$K(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^2\right),$$ where the parameter γ and the penalty factor C are searched by grid optimization and cross validation using the LIBSVM toolbox [28]. In our test, the optimal parameters are C = 2068 and γ = 0.006821.
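A small sketch of the two preprocessing ingredients mentioned above is given below. The scaling to the range [0, 1] is an assumption (the target range of the normalization is not stated in the text), and the function names are hypothetical; the kernel follows the usual RBF form $\exp(-\gamma \lVert u - v \rVert^2)$.

```python
import math


def min_max_scale(column):
    """Scale one feature column to [0, 1] using its own minimum and maximum."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]   # constant feature carries no information
    return [(v - lo) / (hi - lo) for v in column]


def rbf_kernel(u, v, gamma):
    """RBF kernel value between two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))
```

With a small γ such as the reported 0.006821, the kernel decays slowly with distance, so unscaled features with large numeric ranges would otherwise dominate the squared distance; this is one common reason for normalizing before training.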

DT classifier
Since the features are discrete variables, the ID3 algorithm is selected as the DT classifier. The main scheme of the DT classifier is as follows. The feature with the largest information gain is selected as the classification benchmark. The classification process is repeated until a decision tree with complete classification ability is constructed. The entropy of a random variable x is: $$H(x) = -\sum_{i=1}^{n} p(x_i) \log p(x_i).$$ The entropy of each attribute of the dataset D is: $$H(D) = -\sum_{i=1}^{n} p_i \log p_i,$$ where $p_i$ represents the ratio of the number of samples for the ith-dimensional feature to the number of total features, and n represents the number of total features. For the given feature $F_i$, the conditional entropy of the dataset D is: $$H(D \mid F_i) = \sum_{j} \frac{|D_j|}{|D|} H(D_j),$$ where $|D_j|$ represents the number of samples in the jth subset of D induced by $F_i$, and $|D|$ represents the number of all samples in the dataset D. On the basis of the entropy $H(D)$ and the conditional entropy $H(D \mid F_i)$ of the feature $F_i$, the information gain of the feature $F_i$ is calculated by: $$g(D, F_i) = H(D) - H(D \mid F_i).$$
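The ID3 splitting criterion can be illustrated with a short Python sketch. It assumes discrete feature values and base-2 logarithms (the base is a convention chosen here, not taken from the text), and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the conditional entropy given one discrete feature."""
    n = len(labels)
    subsets = {}
    for value, label in zip(feature_values, labels):
        subsets.setdefault(value, []).append(label)
    conditional = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - conditional
```

At each node, ID3 would evaluate `information_gain` for every remaining feature and split on the one with the largest value.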

Tracking
Object tracking can predict the future motion state and avoid the missing detection caused by temporary occlusion. Moving object tracking consists of data association and motion estimation. The data association procedure associates the same objects in successive frames, and the motion estimation procedure uses the filter method to estimate the position and speed of the associated objects. In this section, MHT algorithm is combined with Kalman filter algorithm to track the cyclist based on the CS model.

Data association
The MHT algorithm selects the optimal association hypothesis for the same object in two consecutive frames, so the calculation of the hypothesis probability is critical. From the tracking start time to the kth time step, all the measurements are recorded as $Z^k = \{Z(1), Z(2), \ldots, Z(k)\}$, and all the hypotheses obtained by the MHT algorithm at the kth time step are recorded as $\Omega^k = \{\Omega_i^k,\ i = 1, 2, \ldots, I_k\}$. The hypothesis probability $P_i^k$ of the hypothesis $\Omega_i^k$ at the kth time step is its probability given all the measurements: $$P_i^k = P\left(\Omega_i^k \mid Z^k\right).$$ Assume that $\Omega_i^k$ is obtained by the association hypothesis $\varphi_k$ between the hypothesis $\Omega_g^{k-1}$ at the previous frame and the measurements $Z(k)$ at the current frame. Bayes' theorem is utilized to compute the hypothesis probability: $$P_i^k = \frac{1}{c}\, P\left(Z(k) \mid \Omega_g^{k-1}, \varphi_k, Z^{k-1}\right) P\left(\varphi_k \mid \Omega_g^{k-1}, Z^{k-1}\right) P\left(\Omega_g^{k-1} \mid Z^{k-1}\right),$$ where c is the normalization constant. In terms of the association hypothesis $\varphi_k$, the number of measurements associated with new objects at the current frame is denoted as $N_{NT}(h)$, the number of measurements associated with false objects is denoted as $N_{FT}(h)$, the number of measurements associated with previous objects is denoted as $N_{DT}(h)$, and the number of all the objects is $M_K$. Assuming that the number of existing detected objects obeys a binomial distribution, and that the number of new objects and the number of false objects both obey Poisson distributions, the probability of the association hypothesis can be obtained, where $P_D$ is the detection probability, $\beta_{FT}$ is the probability density of false alarms, $\beta_{NT}$ is the probability density of new objects, and $F_n(\lambda)$ is the Poisson distribution with rate parameter λ. Combining these terms gives the hypothesis probability. After the probability of each possible association hypothesis is obtained, all the association probabilities are represented by one correlation matrix. The hypothesis H with the maximum association probability is selected as the optimal hypothesis.
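To make the hypothesis selection step concrete, the sketch below scores a set of candidate hypotheses using a product form that is commonly cited in the MHT literature: detection and missed-detection terms, false-alarm and new-object densities, measurement likelihoods, and the parent hypothesis probability. Whether the paper uses exactly this form is an assumption, and the dictionary keys and the function name are hypothetical.

```python
def hypothesis_probabilities(hypotheses, p_d, beta_ft, beta_nt):
    """Normalized probabilities of candidate association hypotheses.

    Each hypothesis is a dict with:
      parent_prob  probability of the parent hypothesis at the previous frame
      n_targets    number of previously known objects
      n_dt         measurements assigned to previously known objects
      n_ft         measurements declared false alarms
      n_nt         measurements declared new objects
      likelihoods  measurement likelihoods of the assigned measurements
    """
    raw = []
    for h in hypotheses:
        score = h["parent_prob"]
        score *= p_d ** h["n_dt"] * (1 - p_d) ** (h["n_targets"] - h["n_dt"])
        score *= beta_ft ** h["n_ft"] * beta_nt ** h["n_nt"]
        for likelihood in h["likelihoods"]:
            score *= likelihood
        raw.append(score)
    c = sum(raw)                      # normalization constant
    return [s / c for s in raw]
```

The optimal hypothesis is then the index of the largest returned probability, for example `max(range(len(p)), key=p.__getitem__)`.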

Motion model
The motion state of the object is usually described by one of several common models, including the constant velocity model, the constant acceleration model and the CS model. The motion state of the cyclist varies frame by frame, and sudden acceleration or deceleration may happen at any time. To avoid a large cumulative error, the CS model is used to estimate the motion state of the cyclist. The CS model is time-correlated: the mean value is the estimate of the acceleration at the current time step, and the probability distribution of the acceleration is represented by the revised Rayleigh distribution [29]. In the CS model, the acceleration model of the cyclist is as follows: $$\ddot{x}(t) = \bar{a} + a(t), \qquad \dot{a}(t) = -\alpha a(t) + w(t),$$ where $\bar{a}$ is the mean value of the acceleration, $a(t)$ is the zero-mean time-correlated acceleration component, α is the reciprocal of the time constant, and $w(t)$ is white noise with zero mean and variance $\sigma_w^2 = 2\alpha\sigma_a^2$, in which $\sigma_a^2$ is the acceleration variance. Thus, the CS model for the cyclist is established by: $$\dddot{x}(t) = -\alpha\,\ddot{x}(t) + \alpha\,\bar{a} + w(t).$$
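For a sampled sensor, the continuous model above has to be propagated over one scan period. The sketch below integrates the noise-free acceleration model exactly over a step T for one axis. The function name, the argument names and the example values of α and T are assumptions chosen for illustration, not parameters reported in the paper.

```python
import math


def cs_predict(state, a_mean, alpha, T):
    """Propagate (position, velocity, acceleration) over one step T with the
    noise-free CS model: the acceleration relaxes toward a_mean at rate alpha."""
    x, v, a = state
    e = math.exp(-alpha * T)
    # Terms multiplying the current acceleration.
    f13 = (alpha * T - 1 + e) / alpha ** 2
    f23 = (1 - e) / alpha
    # Terms multiplying the mean acceleration a_mean.
    u1 = (-T + alpha * T ** 2 / 2 + (1 - e) / alpha) / alpha
    u2 = T - (1 - e) / alpha
    u3 = 1 - e
    return (x + T * v + f13 * a + u1 * a_mean,
            v + f23 * a + u2 * a_mean,
            e * a + u3 * a_mean)
```

When the mean acceleration is set equal to the current acceleration estimate, as the CS model does, this prediction reduces to ordinary constant-acceleration kinematics, which is a convenient sanity check.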

Filtering algorithm
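The state variable of the cyclist is: $$X(t) = \left[x(t), \dot{x}(t), \ddot{x}(t), y(t), \dot{y}(t), \ddot{y}(t)\right]^T,$$ and the state equation applies the CS model along each direction: $$\dddot{x}(t) = -\alpha\,\ddot{x}(t) + \alpha\,\bar{a}_x + w_x(t), \qquad \dddot{y}(t) = -\alpha\,\ddot{y}(t) + \alpha\,\bar{a}_y + w_y(t),$$ where $x(t)$, $\dot{x}(t)$, $\ddot{x}(t)$, $y(t)$, $\dot{y}(t)$, $\ddot{y}(t)$ represent the position, speed and acceleration of the cyclist along the X and Y directions, respectively. $\bar{a}_x$ and $\bar{a}_y$ represent the average acceleration along the X and Y directions. $w_x(t)$ and $w_y(t)$ denote the zero-mean white noise in the X and Y directions, with variances $\sigma_{wx}^2 = 2\alpha\sigma_{ax}^2$ and $\sigma_{wy}^2 = 2\alpha\sigma_{ay}^2$. $\sigma_{ax}^2$ and $\sigma_{ay}^2$ represent the acceleration variance along the X and Y directions.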
The observation equation is: $$Z(k) = H(k)X(k) + v(k),$$ where $H(k)$ is the observation matrix and $v(k)$ represents Gaussian noise with zero mean and variance $R(k)$. The cyclist motion system is then modeled by combining the Kalman filter with the CS model.
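The measurement update of the Kalman filter can be sketched for one axis as follows. The sketch assumes that only the position is observed, i.e. an observation matrix H = [1, 0, 0] for the per-axis state (position, velocity, acceleration); this choice, the function name and the example numbers are assumptions for illustration.

```python
def kalman_update(x, P, z, r):
    """Kalman measurement update for a 3-state axis with a position-only measurement.

    x: state estimate [position, velocity, acceleration]
    P: 3x3 covariance as a list of lists
    z: measured position, r: measurement noise variance
    """
    s = P[0][0] + r                        # innovation variance H P H^T + R
    k = [P[i][0] / s for i in range(3)]    # Kalman gain P H^T / s
    innovation = z - x[0]
    x_new = [x[i] + k[i] * innovation for i in range(3)]
    # Covariance update (I - K H) P.
    P_new = [[P[i][j] - k[i] * P[0][j] for j in range(3)] for i in range(3)]
    return x_new, P_new
```

In a CS-model filter, the mean acceleration used in the next prediction is typically taken from the updated acceleration estimate `x_new[2]`, which is how the model adapts to sudden acceleration or deceleration.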

Datasets
The independent clusters of the point cloud samples are segmented by the subarea-based DBSCAN algorithm at each frame. In the experiments, a camera synchronously collected the scene image, which was used to manually mark the positive and negative point cloud samples of the cyclists. Figure 7 shows the real scene at one frame and the classification result in the point cloud scene. In this figure, each color of the point cloud denotes one scan layer of the laser scanner. The red and black rectangles denote the cyclist and non-cyclist, respectively. The positive samples include cyclists with various poses, and the negative samples include pedestrians, vehicles, lamp posts, trees and other non-cyclist objects. Some positive and negative samples are shown in Fig. 8. In total, we extracted 3500 samples, including 1500 positive samples and 2000 negative samples. Fivefold cross-validation is utilized to improve the generalization of the classifier.
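The fivefold split of the 3500 samples can be sketched as below. The interleaved assignment of indices to folds is an assumption made to keep the example deterministic; in practice the samples would usually be shuffled or stratified by class first, and the function name is hypothetical.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) for each of k interleaved folds."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test
```

With 3500 samples and k = 5, each round trains on 2800 samples and tests on the remaining 700, and every sample is used for testing exactly once.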

Detection algorithm validation
To demonstrate the performance of multiple classifiers for cyclist detection, six classifiers are constructed: (1) SVM + $F$, (2) SVM + $F_P$, (3) SVM + $F_R$, (4) DT + $F$, (5) DT + $F_P$ and (6) DT + $F_R$. The ROC curve is used to evaluate the performance of these classifiers, as shown in Fig. 9. AUC (area under the ROC curve) is the critical parameter to measure the cyclist detection performance, and a classifier with a larger AUC is superior. Table 5 lists the results of each classifier. It indicates that the SVM classifier outperforms the DT classifier with the same feature set for cyclist detection. For the SVM classifier, the features extracted by the PCA method give a better recognition result than the other feature sets, while the features from the Relief algorithm perform the best for the DT classifier. For the same classifier, the classification performance of the original 37-dimensional feature set is the worst. This means that redundancy exists in the original feature set, and the proposed feature selection methods are essential. In general, the AUC values of all six classifiers exceed 0.93, which demonstrates that each classifier shows good performance for cyclist detection.
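AUC can be computed directly from classifier scores without drawing the curve, since it equals the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one. The sketch below uses this pairwise definition; the function name and the label convention (1 for cyclist, 0 for non-cyclist) are assumptions.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    pos = [s for s, label in zip(scores, labels) if label == 1]
    neg = [s for s, label in zip(scores, labels) if label == 0]
    # A positive scoring above a negative counts 1, a tie counts 0.5.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise form is quadratic in the number of samples, which is acceptable for a few thousand samples; a rank-based formulation would be preferable for much larger sets.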

Tracking algorithm validation
To evaluate the proposed cyclist tracking algorithm, the point cloud of moving cyclists on campus is collected using the LUX 4L mounted on a stationary vehicle. As shown in Fig. 10, three cyclists move away frame by frame. We set the detection probability $P_d = 0.9$, and the probability that the correct measurement falls into the tracking gate $P_g = 0.99$. The tracking results for Frame 80, Frame 160 and Frame 240 are shown in Fig. 10. The black rectangle denotes the detected cyclist, and the speed is explicitly labelled in kilometers per hour. The tails dragged by the three cyclists are the historical tracks. The results indicate that the combination of the MHT algorithm with the Kalman filter based on the CS model can accurately track multiple cyclists.
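The gate probability can be related to a distance threshold as in the sketch below. It assumes a two-dimensional position measurement with Gaussian innovation, for which the squared Mahalanobis distance follows a chi-square distribution with two degrees of freedom; the function names and the 2x2 innovation covariance `S` are hypothetical.

```python
import math


def gate_threshold(p_g):
    """Chi-square threshold (2 degrees of freedom) that contains probability p_g."""
    return -2.0 * math.log(1.0 - p_g)


def in_gate(innovation, S, p_g):
    """True if the 2-D innovation lies inside the tracking gate."""
    (a, b), (c, d) = S
    det = a * d - b * c
    vx, vy = innovation
    # Squared Mahalanobis distance v^T S^{-1} v for a 2x2 matrix S.
    d2 = (d * vx * vx - (b + c) * vx * vy + a * vy * vy) / det
    return d2 <= gate_threshold(p_g)
```

Under these assumptions, a gate probability of 0.99 corresponds to a threshold of about 9.21 on the squared Mahalanobis distance.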
To further verify the motion estimation performance of the proposed tracking method, the estimated motion parameters of Cyclist 2 are compared with on-board GPS data. The lateral and longitudinal positions in the tracking process are shown in Fig. 11, and the velocities along the lateral and longitudinal directions are shown in Fig. 12. The point cloud of the cyclist varies with the range frame by frame, and this variation makes the cyclist center unstable. As a result, the measured position and speed of the cyclist fluctuate noticeably and do not match the real motion of the cyclist. In contrast, stable position and speed estimates of the moving cyclists are obtained from the proposed Kalman filter method based on the CS model. Thus, the proposed tracking method has good applicability.

Conclusion
In this paper, a cyclist detection and tracking method is presented based on a multi-layer laser scanner. Firstly, a subarea-based DBSCAN algorithm is developed to segment the uneven point cloud based on the density distribution. Secondly, considering the point cloud characteristics of the cyclists, we construct a 37-dimensional feature set including number-of-points, geometric and statistical features. The Relief algorithm and PCA are further used to optimize the feature set, respectively. Then, the three feature sets are combined with SVM and DT classifiers to generate six cyclist classifiers, and the detected cyclists are tracked using the MHT algorithm and the Kalman filter based on the CS motion model. Experimental results show that the subarea-based clustering algorithm can effectively segment the laser points into independent clusters. For the SVM classifier, the feature set extracted by the PCA method gives a better classification result than the other feature sets, while the feature set from the Relief algorithm performs the best for the DT classifier. The proposed cyclist tracking method can obtain the position and speed of moving cyclists robustly. Because of the sparsity of the laser points, future work will combine various sensors to achieve accurate detection and tracking of long-range cyclists.