A 3D localisation method in indoor environments for virtual reality applications
- Wei Song†1,
- Liying Liu†1,
- Yifei Tian†1,
- Guodong Sun†2,
- Simon Fong†3 and
- Kyungeun Cho†4
© The Author(s) 2017
Received: 26 April 2017
Accepted: 6 October 2017
Published: 13 October 2017
Virtual Reality (VR) has recently developed rapidly as a means of human–computer interaction. Users wearing VR headsets gain an immersive experience when interacting with a 3-dimensional (3D) world. We use a light detection and ranging (LiDAR) sensor to capture a 3D point cloud of the real world. To match the scale of the virtual environment to the user’s real world, this paper develops a boundary wall detection method based on the Hough transform algorithm. After the Hough space is segmented using a threshold, a connected-component-labelling (CCL) algorithm classifies the segmented regions into several distinguishable blocks. The four largest peaks among these blocks are extracted as the parameters of the wall planes, and the virtual environment is scaled to the size of the real environment. To synchronise the position of the user with that of his/her avatar in the virtual world, a wireless Kinect network is proposed for user localisation. Multiple Kinects are mounted in an indoor environment to sense the user’s information from different viewpoints, so the proposed method supports omnidirectional detection of the user’s position and gestures. To verify the performance of the proposed system, we developed a VR game using several Kinects and a Samsung Gear VR device.
Keywords: Kinect, LiDAR, Hough transform, Connected-component-labelling, Virtual reality
In recent years, head-mounted displays have been widely developed for Virtual Reality (VR) simulations and video games. However, because users wear stereoscopic displays over their eyes, they cannot see their real environment. Moreover, the boundary of a virtual environment traditionally does not match that of the user’s real environment. As a result, collisions between the user and real-world objects frequently occur in VR applications, causing a poor user experience.
To create an adaptive virtual environment, the boundary of the real environment must be measured so that users can be warned before they collide with it. In this work, a light detection and ranging (LiDAR) sensor is used to detect the 3D point cloud of the surrounding environment, and large planar regions in the point cloud are recognised as the boundary walls. To detect the boundary of an indoor environment, this paper develops a boundary wall detection method based on the Hough transform algorithm. After the Hough transform is applied to the LiDAR data, the intensive regions of the Hough space are segmented using a threshold, and a connected-component-labelling (CCL) algorithm classifies the segmented regions into several distinguishable blocks. The Hough coordinates of the four largest peaks among these blocks are taken as the wall plane parameters. By scaling the virtual environment to the range of the real environment, the user is able to act in the virtual environment without collisions, which enhances the user experience.
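The thresholding, labelling and peak-picking steps can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors’ implementation: the Hough accumulator is assumed to be a list of rows of vote counts, the CCL step uses 4-connectivity with a breadth-first search, and the names `label_components`, `wall_peaks`, `threshold` and `n_walls` are hypothetical.

```python
from collections import deque


def label_components(mask):
    """Label 4-connected regions of True cells using breadth-first search.

    Returns (labels, count): labels[i][j] is 0 for background cells and
    1..count for cells belonging to a connected region.
    """
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for i in range(rows):
        for j in range(cols):
            if mask[i][j] and labels[i][j] == 0:
                count += 1  # start a new region
                labels[i][j] = count
                queue = deque([(i, j)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


def wall_peaks(acc, threshold, n_walls=4):
    """Return the (row, col) cells of the n_walls strongest blocks.

    The accumulator is thresholded, the intensive cells are grouped into
    blocks by CCL, and only the highest cell of each block is kept, so one
    noisy blob can never yield two peaks.
    """
    mask = [[votes >= threshold for votes in row] for row in acc]
    labels, _ = label_components(mask)
    best = {}  # block label -> (votes, row, col) of its highest cell
    for i, row in enumerate(acc):
        for j, votes in enumerate(row):
            k = labels[i][j]
            if k and (k not in best or votes > best[k][0]):
                best[k] = (votes, i, j)
    strongest = sorted(best.values(), reverse=True)[:n_walls]
    return [(i, j) for _, i, j in strongest]
```

Keeping only the strongest cell per labelled block means that a second high cell lying in the same connected blob cannot be reported as a separate wall. The choice of 4-connectivity is an assumption; 8-connectivity would additionally merge blobs that touch only diagonally.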
Human skeleton tracking using the RGB and depth sensors of the Microsoft Kinect has been widely applied to interactions between users and virtual objects in VR applications. When a Kinect is used to acquire a user’s gestures, the user needs to stand in front of it, within a limited distance, and face it; otherwise, the sensed signals are weak and inaccurate. For omnidirectional detection, this paper proposes a multiple-Kinect network based on a bivariate Gaussian probability density function (PDF). In the system, multiple Kinect sensors installed in an indoor environment detect a user’s gesture information from different viewpoints. The data sensed by these distributed clients are sent to a VR management server, which selects an adaptive Kinect based on the user’s distance from and orientation towards each sensor. Because only small datasets, namely the user’s position and body joints, are delivered from the Kinect clients to the server, the real-time transmission requirement is satisfied.
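To give a sense of how small the per-frame payload from a Kinect client can be, the sketch below packs one body frame into bytes with Python’s standard `struct` module. The layout (a client id, the user’s position and orientation, then 25 joint positions as little-endian float32 values) and the names `encode_frame` and `decode_frame` are hypothetical, chosen only for illustration; the paper does not specify its message format. The count of 25 joints matches the skeleton reported by Kinect v2 body tracking.

```python
import struct

NUM_JOINTS = 25  # Kinect v2 body tracking reports 25 joints per tracked body

# Hypothetical little-endian layout: uint8 client id, user position (x, y, z),
# user orientation in radians, then (x, y, z) for each joint, all float32.
PACKET_FORMAT = "<B4f" + f"{3 * NUM_JOINTS}f"
PACKET_SIZE = struct.calcsize(PACKET_FORMAT)


def encode_frame(client_id, position, orientation, joints):
    """Serialise one body frame on a Kinect client into a compact byte string."""
    if len(joints) != NUM_JOINTS:
        raise ValueError(f"expected {NUM_JOINTS} joints, got {len(joints)}")
    flat = [coord for joint in joints for coord in joint]
    return struct.pack(PACKET_FORMAT, client_id, *position, orientation, *flat)


def decode_frame(payload):
    """Inverse of encode_frame, as run on the VR management server."""
    values = struct.unpack(PACKET_FORMAT, payload)
    client_id, x, y, z, orientation = values[:5]
    coords = values[5:]
    joints = [tuple(coords[i:i + 3]) for i in range(0, len(coords), 3)]
    return client_id, (x, y, z), orientation, joints
```

With this layout each frame occupies 317 bytes (1 + 4 × 4 + 75 × 4), so even at the Kinect v2’s 30 Hz body-frame rate a client sends under 10 kB per second.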
The remainder of this paper is organised as follows. The “Related works” section provides an overview of related work. The “A 3D localisation system” section describes the 3D localisation system, including the boundary wall detection method and adaptive Kinect selection in the wireless sensor network. The “Experiments” section presents the experimental results. Finally, the “Conclusions” section concludes this paper.
Related works
To realise virtual–physical collaboration, environmental recognition methods such as plane and feature detection have been researched. Zucchelli et al. detected planes from stereo images using a motion-based segmentation algorithm, and the planar parameters were extracted automatically with projective distortions. The traditional Hough transform is commonly used to detect straight lines and other geometric shapes in images; Trucco et al. detected planes in the disparity space using a Hough-like algorithm. With these methods, matching errors arose when outliers overlapped the plane regions.
To detect continuous planes, Hulik et al. optimised a 3D Hough transform to extract large planes from LiDAR and Kinect RGB-D data. A Gaussian smoothing function removed noise from the Hough space to preserve the accuracy of plane detection, and a caching technique was applied to point registration to speed up updates of the Hough space. Compared with the traditional plane detection algorithm, random sample consensus (RANSAC), the 3D Hough transform was faster and more stable. To extract maxima from the Hough space, this method applied a sliding window with a pre-computed Gaussian kernel. However, when dense noise surrounds a line, it extracts more than one peak within a single connected segmented region. To keep line estimation stable, this paper applies a CCL algorithm so that only one peak is extracted from each distinguishable region.
To localise a user and recognise his/her motion, the Kinect is a popular sensing device in VR development, as it reports the user’s location and gesture information. However, a single Kinect can only capture the front of a user who faces the sensor. To also capture the user’s back, Chen et al. used multiple Kinects to reconstruct a complete 3D mesh of the segmented foreground human voxels with colour information. To track people in unconstrained environments, Sun et al. proposed a pairwise skeleton matching scheme based on the sensing results of multiple Kinects, in which skeleton joints were calibrated and tracked across consecutive frames using a Kalman filter. With this method, we found that different Kinects reported different joint positions, because the surfaces they sensed differed between viewpoints.
To acquire accurate data from multiple sensors, Chua et al. addressed the sensor selection problem in a smart house using a naïve Bayes classifier, a decision tree and k-nearest-neighbour algorithms. Sevrin et al. proposed a people localisation system with a multi-Kinect trajectory fusion algorithm, which adaptively selected the best of the available Kinects to detect people with high accuracy. Following these sensor selection methods, we developed a reliable wireless sensor network for VR applications that enables users to walk and interact freely with virtual objects.
A 3D localisation system
This section describes an indoor 3D localisation system for VR applications. A Hough transform algorithm is applied to detect the indoor boundary walls, and a multiple-Kinect selection method is proposed to localise the user’s position from any orientation.
Indoor boundary detection from 3D point clouds
The wall planes contain most of the sensed points, and on the x–z plane the points of each wall form a straight line. Therefore, after the Hough transform maps all the sensed indoor points to (r, α) coordinates, the four peaks of the Hough space are recognised as the parameters of the boundary wall planes. Each (r, α) cell in the Hough space records the number of LiDAR points mapped to it, that is, its occurrence frequency. The four peaks lie within the intensive areas, as shown in Fig. 4d.
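A minimal sketch of the voting step is given below, assuming the LiDAR points have already been projected onto the x–z plane and that each line is parameterised as r = x cos α + z sin α with α in [0, π). The bin sizes (`r_step`, `n_angles`), the range bound `r_max` and the function names are illustrative assumptions, not values from the paper.

```python
import math


def hough_accumulate(points, r_max, r_step=0.05, n_angles=180):
    """Vote (x, z) points into an (r, alpha) accumulator.

    Each point votes once per angle alpha_k = pi * k / n_angles, for the
    line r = x*cos(alpha) + z*sin(alpha). Rows index r in [-r_max, r_max]
    with spacing r_step; columns index alpha. Each cell counts the points
    mapped to it.
    """
    n_r = int(round(2 * r_max / r_step)) + 1
    acc = [[0] * n_angles for _ in range(n_r)]
    trig = [(math.cos(math.pi * k / n_angles), math.sin(math.pi * k / n_angles))
            for k in range(n_angles)]
    for x, z in points:
        for k, (c, s) in enumerate(trig):
            i = int(round((x * c + z * s + r_max) / r_step))
            if 0 <= i < n_r:  # ignore votes outside the chosen r range
                acc[i][k] += 1
    return acc


def cell_to_line(i, k, r_max, r_step=0.05, n_angles=180):
    """Map accumulator cell (i, k) back to (r in metres, alpha in radians)."""
    return i * r_step - r_max, math.pi * k / n_angles
```

A wall at x = 2 m, for example, sends all of its points to the single cell (r = 2, α = 0), whereas at every other angle its votes spread over several r bins, which is why wall planes appear as peaks in the accumulator.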
Adaptive Kinect selection
Experiments
In this section, we analyse the performance of the proposed method for detecting indoor boundary walls from LiDAR points and illustrate a VR application developed using the proposed 3D localisation method. The experiments were conducted using one Velodyne HDL-32E LiDAR and two Microsoft Kinect2 sensors. The wall detection method was executed on a computer with a 3.20 GHz Intel® Core™ Quad CPU, a GeForce GT 770 graphics card and 4 GB of RAM. The Kinects captured the user’s gestures on two clients, each a NUC mini PC with a 3.1 GHz Intel® Core™ i7-5557U CPU and 16 GB of RAM. The VR client ran on a Samsung Gear VR with a Samsung Galaxy Note 4 mounted in it. The Note 4 had a 2.7 GHz Qualcomm Snapdragon quad-core CPU, 3 GB of RAM, a display resolution of 2560 × 1440 pixels and the Android 4.4 operating system.
The indoor environment was estimated to be 9.94 m long and 7.54 m wide, and the virtual environment was resized to match it so as to achieve virtual–physical synchronisation. The wall detection method was run during an initialisation step before the VR application started. Using the proposed system, we developed a VR boxing game, as shown in Fig. 9.
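Once four wall lines are known, the room’s length and width follow from the distances between the two pairs of parallel walls, and the virtual room can then be stretched by per-axis scale factors. The sketch below assumes the (r, α) convention r = x cos α + z sin α, an angular tolerance of 10° for treating two walls as parallel, and a hypothetical virtual room size; the function names are illustrative only.

```python
import math


def _angle_gap(a, b):
    """Smallest angle between two line directions (lines are pi-periodic)."""
    d = abs(a - b) % math.pi
    return min(d, math.pi - d)


def _parallel_distance(w1, w2):
    """Distance between two (near-)parallel lines given as (r, alpha)."""
    (r1, a1), (r2, a2) = w1, w2
    if abs(a1 - a2) > math.pi / 2:
        r2 = -r2  # (r, alpha) and (-r, alpha + pi) describe the same line
    return abs(r1 - r2)


def room_extent(walls, angle_tol=math.radians(10)):
    """Pair four (r, alpha) wall lines into two parallel pairs and return
    (length, width), i.e. the two wall-to-wall distances, largest first."""
    if len(walls) != 4:
        raise ValueError("expected exactly four wall lines")
    # The partner of wall 0 is the wall whose direction is closest to it.
    j = min(range(1, 4), key=lambda k: _angle_gap(walls[0][1], walls[k][1]))
    pairs = [(0, j), tuple(k for k in range(1, 4) if k != j)]
    distances = []
    for a, b in pairs:
        if _angle_gap(walls[a][1], walls[b][1]) > angle_tol:
            raise ValueError("walls do not form two parallel pairs")
        distances.append(_parallel_distance(walls[a], walls[b]))
    return tuple(sorted(distances, reverse=True))


def scale_factors(real_extent, virtual_extent):
    """Per-axis factors that stretch the virtual room onto the measured room."""
    return tuple(real / virtual for real, virtual in zip(real_extent, virtual_extent))
```

For instance, walls at x = ±4.97 m and z = ±3.77 m give a 9.94 m × 7.54 m room, matching the dimensions estimated above; fitting a hypothetical 10 m × 8 m virtual scene to it would use factors of 0.994 and 0.9425.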
In the system, the user’s location and orientation were detected by two Kinects. When the player faced a Kinect at a distance of between 2 and 6 m, the motion information was sensed precisely. Through the experiments, we found that d₀ = 5.0 and θ₀ = 0.0 give the optimal position for Kinect detection. By selecting an effective Kinect, the user was able to move freely and interact with the virtual boxer from any direction. Meanwhile, the server’s monitor rendered the game visualisation synchronously with the VR display. The processing speed of our application, including data sensing, transmission and visualisation, exceeded 35 fps, which met the real-time requirements.
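The selection rule can be sketched as scoring each Kinect’s observation (d, θ) with a bivariate Gaussian PDF centred at the optimal values d₀ = 5.0 and θ₀ = 0.0 reported above and choosing the sensor with the highest likelihood. The standard deviations, the zero correlation, the interpretation of θ as the angle in radians between the user’s facing direction and the direction to the sensor, and the 2–6 m gating are assumptions for illustration; the function names are hypothetical.

```python
import math

D0, THETA0 = 5.0, 0.0             # optimal distance and orientation reported in the experiments
SIGMA_D, SIGMA_THETA = 1.5, 0.6   # hypothetical spreads, not taken from the paper
RHO = 0.0                         # assumed correlation between distance and orientation


def bivariate_gaussian_pdf(d, theta, mu=(D0, THETA0), sigma=(SIGMA_D, SIGMA_THETA), rho=RHO):
    """Density of a bivariate normal distribution evaluated at (d, theta)."""
    (mu_d, mu_t), (s_d, s_t) = mu, sigma
    u, v = (d - mu_d) / s_d, (theta - mu_t) / s_t
    q = (u * u - 2 * rho * u * v + v * v) / (1 - rho * rho)
    return math.exp(-0.5 * q) / (2 * math.pi * s_d * s_t * math.sqrt(1 - rho * rho))


def select_kinect(observations, d_range=(2.0, 6.0)):
    """Pick the Kinect whose observation of the user is most likely.

    observations maps a Kinect id to (distance in metres, theta in radians),
    where theta is assumed to be the angle between the user's facing direction
    and the direction to that Kinect, wrapped to [-pi, pi]. Kinects whose
    distance lies outside d_range are ignored; None is returned if none remain.
    """
    lo, hi = d_range
    candidates = {k: obs for k, obs in observations.items() if lo <= obs[0] <= hi}
    if not candidates:
        return None
    return max(candidates, key=lambda k: bivariate_gaussian_pdf(*candidates[k]))
```

Because the PDF decreases monotonically with the Mahalanobis distance from (d₀, θ₀), picking the maximum-likelihood sensor amounts to picking the Kinect whose view of the user is closest to the optimum, with the σ values controlling how distance is traded against orientation.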
Conclusions
To provide a free-movement environment for VR applications, this paper presented a 3D localisation method for virtual–physical synchronisation. For environmental detection, we used a Velodyne HDL-32E LiDAR sensor to capture the surrounding 3D point clouds. Based on the Hough transform, a plane detection algorithm was proposed to extract the indoor walls from the point clouds and estimate the dimensions of the surrounding environment, and the virtual environment was then resized accordingly. To match the user’s position between the real and virtual worlds, a wireless Kinect network was proposed for omnidirectional detection of the user’s location. In the sensor selection process, we applied a bivariate Gaussian PDF and maximum likelihood estimation to select an adaptive Kinect. In future work, we will integrate touch sensors into the system for virtual–physical collaboration.
WS and LL described the proposed algorithms and wrote the whole manuscript. YT and GS implemented the experiments. SF and KC revised the manuscript. All authors read and approved the final manuscript.
This research was supported by the National Natural Science Foundation of China (61503005), and by NCUT XN024-95. This paper is a revised version of a paper entitled ‘A Wireless Kinect Sensor Network System for Virtual Reality Applications’ presented in 2016 at Advances in Computer Science and Ubiquitous Computing-CSA-CUTE2016, Bangkok, Thailand.
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
- Dick A, Torr P, Cipolla R (2004) Automatic 3d modeling of architecture. In: Proc. 11th British Machine Vision Conf. pp 372–381
- Mukhopadhyay P, Chaudhuri B (2015) A survey of hough transform. Pattern Recognit 48(3):993–1010
- Ales P, Oldrich V, Martin V et al (2015) Use of the image and depth sensors of the Microsoft Kinect for the detection of gait disorders. Neural Comput Appl 26(7):1621–1629
- Mohammed A, Ahmed S (2015) Kinect-based humanoid robotic manipulator for human upper limbs movements tracking. Intell Control Autom 6(1):29–37
- Song W, Sun G, Fong S et al (2016) A real-time infrared LED detection method for input signal positioning of interactive media. J Converg 7:1–6
- Junho A, Richard H (2015) An indoor augmented-reality evacuation system for the Smartphone using personalized Pedometry. Hum Centric Comput Inf Sci 2:18
- Zucchelli M, Santos-Victor J, Christensen HI (2002) Multiple plane segmentation using optical flow. In: Proc. 13th British Machine Vision Conf. pp 313–322
- Trucco E, Isgro F, Bracchi F (2003) Plane detection in disparity space. In: Proc. IEE Int. Conf. Visual Information Engineering. pp 73–76
- Hulik R, Spanel M, Smrz P, Materna Z (2014) Continuous plane detection in point-cloud data based on 3D hough transform. J Vis Commun Image R 25(1):86–97
- Schnabel R, Wahl R, Klein R (2007) Efficient RANSAC for point-cloud shape detection. Comput Graph Forum 26(2):214–226
- Song W, Tian Y, Fong S, Cho K, Wang W, Zhang W (2016) GPU-accelerated foreground segmentation and labeling for real-time video surveillance. Sustainability 8(10):916–936
- Chen Y, Dang G, Chen Z et al (2014) Fast capture of personalized avatar using two Kinects. J Manuf Syst 33(1):233–240
- Sun S, Kuo C, Chang P (2016) People tracking in an environment with multiple depth cameras: a skeleton-based pairwise trajectory matching scheme. J Vis Commun Image R 35:36–54
- Chua SL, Foo LK (2015) Sensor selection in smart homes. Procedia Comput Sci 69:116–124
- Sevrin L, Noury N, Abouchi N et al (2015) Preliminary results on algorithms for multi-kinect trajectory fusion in a living lab. IRBM 36:361–366
- Erkan B, Nadia K, Adrian FC (2015) Augmented reality applications for cultural heritage using Kinect. Hum Centric Comput Inf Sci 5(20):1–8
- Li M, Song W, Song L, Huang K, Xi Y, Cho K (2016) A wireless kinect sensor network system for virtual reality applications. Lect Notes Electr Eng 421:61–65