This section describes an indoor 3D localisation system for VR applications. A Hough transform algorithm is applied to detect the indoor boundary walls, and a multiple-Kinect selection method is proposed to localise the user’s position regardless of the direction the user is facing.
Indoor boundary detection from 3D point clouds
To estimate the locations of the indoor walls, we describe a framework for plane detection in 3D point clouds, as shown in Fig. 1. The framework mainly includes the registration of 3D point clouds, a height histogram of the 3D points, non-ground point segmentation and planar surface detection.
An indoor environment typically contains six large planes: four surrounding walls, the floor and the roof. This project aims to segment the non-ground walls from the detected planes to estimate the size of the environment. A height histogram, as shown in Fig. 2, is first used to estimate the voxel distribution over height [14]. Since points located on the floor or roof surfaces share approximately the same height value, the two peaks of the height histogram are taken to be the floor and roof surfaces. After these peaks are filtered out, the remaining points are segmented as non-ground points.
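A minimal sketch of this segmentation step is given below, assuming the registered point cloud is an N × 3 NumPy array with the y axis as the vertical (height) direction; the bin width and the function name are illustrative choices rather than values from the paper.

```python
import numpy as np

def segment_non_ground(points, bin_size=0.05):
    """Remove floor/roof points via the two peaks of a height histogram.

    points   : (N, 3) array of x, y, z coordinates (y assumed vertical).
    bin_size : histogram bin width in metres (hypothetical value).
    """
    heights = points[:, 1]
    edges = np.arange(heights.min(), heights.max() + bin_size, bin_size)
    hist, edges = np.histogram(heights, bins=edges)

    # The two largest peaks are taken as the floor and roof surfaces.
    peak_bins = np.argsort(hist)[-2:]
    keep = np.ones(len(points), dtype=bool)
    for b in peak_bins:
        in_peak = (heights >= edges[b]) & (heights < edges[b + 1])
        keep &= ~in_peak          # filter out the floor/roof points
    return points[keep]           # remaining non-ground points
```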
In indoor environments, the boundary wall planes form a cuboid shape. Since most LiDAR points are projected onto the walls, the wall points mapped onto the x–z plane combine into four straight lines: opposite lines are parallel and neighbouring lines are orthogonal to each other. For indoor boundary detection, a Hough transform algorithm is applied to estimate the parameters of the mapped lines on the x–z plane from the segmented non-ground voxels. A flowchart of the applied Hough transform is shown in Fig. 3.
We assume that the walls are always orthogonal to the x–z plane. Hence, the wall plane is formulated using the following linear Eq. (1):
$$r = x\cos \alpha + z\sin \alpha$$
(1)
As shown in Fig. 4a, r is the distance from the origin to the straight line and α is the angle between the line’s normal and the x axis. The Hough space is defined as the (r–α) plane computed from a set of LiDAR points with x and z coordinates. The sinusoidal curve in Fig. 4b represents the Hough-space image of a single 2D point. As shown in Fig. 4c, the sinusoidal curves computed by the Hough transform from points on the same straight line cross at several points. The r and α coordinates of the maxima in the Hough space are the line parameters.
Most of the points lie on the wall planes and thus form several straight lines on the x–z plane. Therefore, after all sensed indoor points are mapped to (r, α) coordinates using the Hough transform, the four peaks of the Hough space are recognised as the parameters of the boundary wall planes. Each (r, α) cell in the Hough space records the count of LiDAR points mapped to it, i.e. its occurrence frequency; the four peaks lie in the intensive areas shown in Fig. 4d.
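The voting step of Eq. (1) can be sketched as follows, assuming the non-ground points have already been projected onto the x–z plane as an N × 2 array; the cell resolutions are illustrative values rather than the paper’s settings.

```python
import numpy as np

def hough_accumulator(points_xz, r_res=0.05, alpha_res=np.deg2rad(1.0)):
    """Vote each projected point into an (r, alpha) accumulator (Eq. 1).

    points_xz : (N, 2) array of non-ground points on the x-z plane.
    r_res, alpha_res : Hough-space cell sizes (hypothetical values).
    """
    alphas = np.arange(-np.pi / 2.0, np.pi / 2.0, alpha_res)
    r_max = np.linalg.norm(points_xz, axis=1).max()
    r_edges = np.arange(-r_max, r_max + r_res, r_res)

    acc = np.zeros((len(r_edges), len(alphas)), dtype=np.int64)
    cols = np.arange(len(alphas))
    for x, z in points_xz:
        # Each point traces a sinusoid r = x cos(alpha) + z sin(alpha).
        r = x * np.cos(alphas) + z * np.sin(alphas)
        rows = np.clip(np.searchsorted(r_edges, r), 0, len(r_edges) - 1)
        np.add.at(acc, (rows, cols), 1)       # occurrence-frequency counts
    return acc, r_edges, alphas
```

The four dominant peaks of the accumulator then correspond to the four boundary wall lines.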
Figure 5a presents an instance of the occurrence frequency in the Hough space. To segment the intensive areas, the low-frequency cells are filtered out using a threshold based on the occurrence frequency distribution of the cells. The valid cells are segmented as shown in Fig. 5b and classified into several distinguishable blocks using a connected-component labelling (CCL) algorithm. In the CCL algorithm, the label of each cell is initialised to its own index, as shown in Fig. 5c. To mark each distinguishable block with a unique label, each cell forms a clique with its right and bottom neighbours, and the minimum label within the clique (Fig. 5d) is assigned to all cells in that clique. These minimum-label updates are iterated until all labels remain unchanged. The minimum label of a distinguishable block (Fig. 5e) is then the indicator of its connected valid cells. Finally, the (r, α) coordinate of the cell with the largest count in each distinguishable block (Fig. 5f) gives the required straight-line parameters.
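The minimum-label iteration described above can be sketched as follows, assuming the thresholded Hough space is given as a boolean array of valid cells; this is an illustrative re-implementation, not the authors’ code.

```python
import numpy as np

def label_blocks(valid):
    """Group connected valid Hough cells by iterative minimum-label propagation.

    valid : 2D boolean array of cells that survived the frequency threshold.
    Returns an integer image in which each distinguishable block shares one label.
    """
    h, w = valid.shape
    labels = np.arange(h * w).reshape(h, w)   # initial label = cell index
    labels[~valid] = -1                       # invalid cells carry no label

    changed = True
    while changed:                            # iterate until labels are stable
        changed = False
        for i in range(h):
            for j in range(w):
                if not valid[i, j]:
                    continue
                # Clique of the local cell and its valid right/bottom neighbours.
                clique = [(i, j)]
                if j + 1 < w and valid[i, j + 1]:
                    clique.append((i, j + 1))
                if i + 1 < h and valid[i + 1, j]:
                    clique.append((i + 1, j))
                m = min(labels[c] for c in clique)
                for c in clique:
                    if labels[c] != m:
                        labels[c] = m
                        changed = True
    return labels
```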
Adaptive Kinect selection
We propose a wireless sensor network that localises the VR user by integrating multiple Kinects. As shown in Fig. 6, the user’s motion and position datasets are captured from multiple views by the Kinects. The distributed Kinects report the sensed datasets to a VR server via a WiFi network, and an adaptive Kinect is selected using a bivariate Gaussian PDF.
A Kinect is installed at each client to detect the user’s gesture information from a different viewpoint. From the gathered datasets, the effectiveness of each sensor is derived from the user’s distance $d_{i}$ and orientation $\theta_{i}$ relative to the Kinect $k_{i}$. If the user is close to a sensor and facing towards it, the effectiveness of that sensor is high. To select the best sensor, we apply a bivariate Gaussian PDF for the effectiveness estimation, formulated as follows:
$$f_{k_{i}} (d_{i} ,\theta_{i} ) = \frac{{\exp \left[ { - \frac{{\left(\frac{{d_{i} - d_{0} }}{{\sigma_{1} }}\right)^{2} - \frac{{2\rho \left(d_{i} - d_{0} \right)\left(\theta_{i} - \theta_{0} \right)}}{{\sigma_{1} \sigma_{2} }} + \left(\frac{{\theta_{i} - \theta_{0} }}{{\sigma_{2} }}\right)^{2} }}{{2\left(1 - \rho^{2} \right)}}} \right]}}{{2\pi \sigma_{1} \sigma_{2} \sqrt {1 - \rho^{2} } }} .$$
(2)
Here, the variables satisfy $d_{i} \in [0, \infty)$, $\theta_{i} \in [-\pi, \pi)$, $\sigma_{1} = 1$, $\sigma_{2} = 1$ and $\rho \in [-1, 0]$. The adaptive Kinect is selected using a maximum likelihood function expressed as follows:
$$k = \mathop{\arg\max}\limits_{k_{i}} f_{k_{i}} (d_{i} ,\theta_{i} ) .$$
(3)
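A sketch of Eqs. (2) and (3) in code is given below; the mean values d0 and theta0 and the correlation rho are hypothetical defaults chosen only for illustration, with σ1 = σ2 = 1 as stated above.

```python
import numpy as np

def effectiveness(d, theta, d0=1.5, theta0=0.0, sigma1=1.0, sigma2=1.0, rho=-0.5):
    """Bivariate Gaussian effectiveness of one Kinect (Eq. 2).

    d, theta : user's distance and orientation relative to the Kinect.
    d0, theta0, rho : hypothetical mean distance/orientation and correlation.
    """
    u = (d - d0) / sigma1
    v = (theta - theta0) / sigma2
    q = (u * u - 2.0 * rho * u * v + v * v) / (2.0 * (1.0 - rho ** 2))
    return np.exp(-q) / (2.0 * np.pi * sigma1 * sigma2 * np.sqrt(1.0 - rho ** 2))

def select_kinect(measurements):
    """Pick the index of the Kinect maximising the effectiveness (Eq. 3).

    measurements : list of (d_i, theta_i) pairs, one per Kinect.
    """
    scores = [effectiveness(d, th) for d, th in measurements]
    return int(np.argmax(scores))

# Example: three Kinects; the user is 1.4 m from the second one and facing it.
best = select_kinect([(3.0, 2.0), (1.4, 0.1), (2.5, -1.0)])
```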