In this paper, color and IR images of the VR HMD and the LM attached to it are acquired with a color-IR camera to estimate the positions of both devices. Figure 1 depicts the methodological flow of the proposed system.

Color and IR images should be acquired at the same time and taken from at least three different positions. This paper utilizes a Microsoft Kinect to acquire color and IR images simultaneously. Figure 2 shows an example of the Kinect capturing the LM-attached VR HMD.

The bundle adjuster estimates the positional relationship between the pixels of the acquired color images. When IR camera images of the VR HMD and LM are obtained and processed by the proposed method, the 3D positional relationship of all image pixels can be estimated in a single coordinate system.

On the other hand, the BLOB detector estimates the coordinates of the IR LEDs shown in the IR images. The IR pixel coordinates are classified into an HMD group and an LM group, and when the IR LED coordinates are grouped by the number of IR LEDs, the clusters of all IR LEDs can be estimated from the IR image. Finally, the equipment to which each IR LED cluster corresponds is determined, and the 3D positions of the HMD and LM can be expressed in one coordinate system.

Using the proposed method, three or more color and IR images can be used to represent the positions of the VR HMD and LM in a single 3D coordinate system. The relative position of the LM attached to the HMD can then be measured by substituting the physical position difference of each device's IR LEDs into this coordinate system.

### Estimation of image-pixel relative 3D location using the bundle adjuster

The bundle adjuster uses the BAA to derive the 3D locations of all pixels from the gathered images. The BAA outputs a point cloud and as many transform matrices as there are image frames. Consequently, the space coordinate of each 2D image pixel becomes known.

The bundle adjuster operates as follows. Infrared images of the HMD with the LM attached are input and converted into a series of sparse matrices. From these sparse matrices, the sparse bundle adjuster estimates a 3D point cloud and transform matrices (one per infrared image). Multiplying a projection matrix by the 3D point cloud yields the 2D point coordinates projected onto the corresponding 2D image.

First, the pixel values of the obtained image are used to produce a sparse matrix, *M*_{t}, to reduce the computation amount of the bundle adjustment. Here, *i*_{t,x,y} refers to a pixel of the image acquired at time *t*, taking a grayscale value in the range from 0 to 255. Assuming that α refers to the infrared image width and β to its height, *M*_{t} has the size α × β, and the elements of the sparse matrix are denoted *m*_{t,x,y}.

In addition, threshold θ is a constant compared against each pixel and determined by user testing. As specified in Eq. (1), if *i*_{t,x,y} is less than threshold θ, zero is assigned to *m*_{t,x,y}; otherwise, one is assigned.

$$m_{t,x,y} = \left\{ {\begin{array}{*{20}c} {0, \quad i_{t,x,y} < \theta } \\ {1, \quad i_{t,x,y} \ge \theta } \\ \end{array} } \right.$$

(1)
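The thresholding of Eq. (1) can be sketched as follows; this is an illustrative implementation, not the authors' code, and the threshold value used here is an arbitrary placeholder.

```python
import numpy as np

def threshold_mask(image, theta):
    """Eq. (1): m_{t,x,y} = 0 where i_{t,x,y} < theta, 1 otherwise."""
    return (image >= theta).astype(np.uint8)

# Tiny illustrative IR frame; theta=128 is an assumed value
frame = np.array([[10, 200],
                  [130, 50]], dtype=np.uint8)
M_t = threshold_mask(frame, theta=128)
# The nonzero coordinates give the sparse representation of M_t
coords = np.argwhere(M_t)
print(M_t)  # [[0 1]
            #  [1 0]]
```

Storing only the nonzero coordinates is what makes the matrix effectively sparse and keeps the subsequent bundle adjustment cheap.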

A single point cloud *P* = {(*x*_{0},*y*_{0},*z*_{0}), (*x*_{1},*y*_{1},*z*_{1}),…} and *n* transform matrices are calculated to acquire the 3D locations of the image pixels after receiving *n* sparse matrices.

Once *n* infrared images and sparse matrices are input, 3D location coordinates are estimated from the sparse matrices through the sparse bundle adjuster to configure point cloud *P*. Figure 3 shows the IR LED detection method based on the BAA.

This paper is based on HMD images obtained from different viewpoints. In Fig. 3, the circles refer to image pixels. Common feature points are searched by comparing feature points between pixels across all pixels. Once the feature points are found, the 3D location *p*(*X*, *Y*, *Z*) that minimizes the change value over all pixels inside the image is calculated using the Levenberg–Marquardt optimization technique [24].

The bundle adjuster thus estimates the 3D space coordinates of image pixels from a number of 2D color images; as a result, the point cloud and transform matrices are derived via the BAA.
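The core of the adjustment can be illustrated with a minimal sketch: recovering one 3D point from its 2D projections in several views by minimizing reprojection error with Levenberg–Marquardt. The camera matrices and point below are assumed toy values, not data from the paper, and `scipy.optimize.least_squares` stands in for the paper's sparse bundle adjuster.

```python
import numpy as np
from scipy.optimize import least_squares

# Two illustrative 3x4 camera (transform) matrices, assumed for the sketch
P1 = np.array([[1., 0., 0., 0.],
               [0., 1., 0., 0.],
               [0., 0., 1., 0.]])
P2 = np.array([[1., 0., 0., -1.],
               [0., 1., 0., 0.],
               [0., 0., 1., 0.]])
X_true = np.array([0.2, -0.1, 4.0])  # hypothetical 3D point

def project(P, X):
    """Project homogeneous 3D point X through camera matrix P."""
    h = P @ np.append(X, 1.0)
    return h[:2] / h[2]

# Observed 2D projections (noiseless here for clarity)
obs = [project(P, X_true) for P in (P1, P2)]

def residuals(X):
    """Reprojection error stacked over all views."""
    return np.concatenate([project(P, X) - o for P, o in zip((P1, P2), obs)])

# Levenberg-Marquardt minimization, as in the paper's optimization step
sol = least_squares(residuals, x0=np.array([0., 0., 1.]), method='lm')
```

In the actual system, this residual is summed over all feature points and all camera poses simultaneously, which is what makes the problem sparse and motivates the sparse bundle adjuster.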

### Pixel matching using infrared BLOB detector

Although the bundle adjuster can estimate 3D space coordinates from the 2D image pixels, it must still be determined whether a pixel lies on the HMD or the LM. In the BLOB detector, infrared images of the HMD and LM are input, and the pixels emitted from the devices' IR LEDs become point clouds. Accordingly, it can be determined whether each pixel in the 2D image lies within the infrared light of the HMD or the LM.

The BLOB detector generates circle-shaped clusters from the images of the HMD and LM input via the infrared sensor. It then calculates each cluster's center point and the distance to the point farthest from that center. The three clusters with the largest such distances become the clusters of the infrared image where the IR LEDs of the LM are located.

The BLOB detector receives *n* infrared images *i*_{t} and outputs cluster *G*_{t,p}, center point *c*_{t,p}, and *d*_{t,p}, the distance from the center point to the farthest point inside the cluster. The *p*-th cluster of infrared image *i*_{t} obtained at time *t* is defined as *G*_{t,p}, and *g*_{t,p,q} refers to the *q*-th element in cluster *G*_{t,p}. The center point of cluster *G*_{t,p} is defined as *c*_{t,p} = (*c*_{t,p,x},*c*_{t,p,y}), and the distance to the point farthest from the center point is defined as *d*_{t,p}.

The BLOB detection algorithm is applied to derive the 2D locations of the IR LEDs in the infrared image. In the images, surrounding objects are removed based on infrared scattering. A BLOB appears as a circle at the part where the IR LED light source is captured, and the Hough circle transform algorithm is used to extract circle shapes from the image. The center point *c*_{t,p} = (*c*_{t,p,x},*c*_{t,p,y}) of each circle obtained thereafter can be calculated via Eq. (2). The Euclidean distance *E*_{d} is derived via Eq. (3), and the distance *d*_{t,p} to the point farthest from the center point is derived via Eq. (4).

$$c_{t,p} = \frac{1}{n} \left (\sum\limits_{q = 0}^{n - 1} {g_{t,p,q,x} } ,\sum\limits_{q = 0}^{n - 1} {g_{t,p,q,y} } \right)$$

(2)

$$E_{d} (a,b) = \sqrt {(x_{a} - x_{b} )^{2} + (y_{a} - y_{b} )^{2} + (z_{a} - z_{b} )^{2} }$$

(3)

$$d_{t,p} = \text{max} \left( {E_{d} \left( {c_{t,p} ,g_{t,p,0} } \right),E_{d} \left( {c_{t,p} ,g_{t,p,1} } \right), \ldots } \right)\quad \left( {g_{t,p,q} \in G_{t,p} } \right)$$

(4)
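Equations (2)–(4) amount to computing a cluster's centroid and its maximum point-to-centroid distance, which can be sketched as follows; the cluster points are illustrative values, not measurements from the paper.

```python
import numpy as np

def cluster_center_and_radius(points):
    """Eq. (2): center c_{t,p} = coordinate-wise mean of the cluster.
    Eqs. (3)-(4): d_{t,p} = largest Euclidean distance from the
    center to any point of the cluster."""
    pts = np.asarray(points, dtype=float)
    c = pts.mean(axis=0)
    d = np.max(np.linalg.norm(pts - c, axis=1))
    return c, d

# Hypothetical 2D cluster of IR pixels
G = [(0.0, 0.0), (4.0, 0.0), (2.0, 3.0)]
c_tp, d_tp = cluster_center_and_radius(G)
print(c_tp, d_tp)  # center (2, 1), radius sqrt(5)
```

The radius *d*_{t,p} computed here is exactly the membership test threshold used in the next step.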

Afterward, cluster *G*_{t,p}, center point *c*_{t,p}, and distance *d*_{t,p}, which were produced by the BLOB detector, are input together with 2D point cloud *P*′_{t}. Point cloud *P*′_{t} is partitioned into clusters *G*′_{t,p}: each point of *P*′_{t} is placed in *G*′_{t,p} when it lies inside cluster *G*_{t,p}. In this way, it is determined whether each point of the estimated 2D point cloud *P*′_{t} corresponds to a 2D IR LED location. Specifically, if the Euclidean distance, calculated via Eq. (3), between center point *c*_{t,p} obtained from the BLOB detector and a 2D location coordinate *p*′_{t}(*x*′_{t},*y*′_{t}) is less than *d*_{t,p}, that coordinate is included in cluster *G*′_{t,p}. The clusters *G*′_{t,p}, which together contain 2D point cloud *P*′, are then output.
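This membership test can be sketched directly; the point coordinates, centers, and radii below are hypothetical stand-ins for the BLOB detector's outputs.

```python
import numpy as np

def assign_points_to_clusters(points2d, centers, radii):
    """Place each projected 2D point p' into cluster G'_{t,p} when its
    Euclidean distance to center c_{t,p} is less than d_{t,p}."""
    clusters = [[] for _ in centers]
    for p in points2d:
        for k, (c, d) in enumerate(zip(centers, radii)):
            if np.linalg.norm(np.asarray(p) - np.asarray(c)) < d:
                clusters[k].append(p)
    return clusters

# One assumed cluster at the origin with radius 2: the first point is
# inside it, the second is not
clusters = assign_points_to_clusters(
    [(1.0, 1.0), (10.0, 10.0)],
    centers=[(0.0, 0.0)], radii=[2.0])
print(clusters)  # [[(1.0, 1.0)]]
```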

Infrared clustering can thus be achieved from infrared images using the BLOB detector. Because the locations of the HMD and LM cannot be determined from the 2D color images alone, clustering based on infrared images is conducted to locate each device inside the image.

### Detection of relative 3D location between HMD and LM using location estimator

The point cloud in the 3D space coordinate system of the 2D color image pixels is estimated by the bundle adjuster, and the 2D locations of the IR LEDs in a 2D infrared image are determined by the BLOB detector. The location estimator then estimates the 3D locations of the HMD and LM by finding the 3D locations of the IR LEDs in the 3D space coordinate system. Using the size of each infrared cluster, it can be determined whether that cluster corresponds to an IR LED cluster of the HMD or of the LM. The device shapes can be estimated by determining the 3D locations of the devices' IR LEDs.

Once the 2D point coordinates corresponding to all 3D point coordinates are calculated, the BLOB detector determines whether each 2D point lies inside a cluster of the LM or the HMD. The center of the 3D point coordinates corresponding to the 2D points inside the LM clusters is calculated; this becomes the 3D location coordinate of the LM. Similarly, the center of the 3D point coordinates corresponding to the 2D points inside the HMD clusters becomes the 3D location coordinate of the HMD.

The location detector receives 3D point cloud *P* acquired from the 2D point detector and clusters *G*′_{t,p} produced by the BLOB detector. It then outputs the origin coordinate of the LM, *P*_{L} = (*x*_{l},*y*_{l},*z*_{l}), and the origin coordinate of the HMD, *P*_{O} = (*x*_{o},*y*_{o},*z*_{o}). The 3D locations of the IR LEDs of the HMD and LM are shown in Fig. 4.

The location detector receives 3D point cloud *P*, approximated by the sparse bundle adjuster, and the *n* transform matrices. It then derives 2D point cloud *P*′ by identifying the location relationship between 3D point cloud *P* and the infrared images. The 2D point cloud projected onto the infrared image is *P*′_{t} = {(*x*′_{t,0},*y*′_{t,0}), (*x*′_{t,1},*y*′_{t,1}),…}. All points of 3D point cloud *P* are multiplied by each transform matrix; consequently, 2D point cloud *P*′, the 2D locations of the images formed on the 2D camera coordinate system, is derived. When 3D point cloud *P* is multiplied by a transform matrix, each point of *P* is first transformed into a column vector and then multiplied. The 3D point cloud *P* and the 2D point cloud *P*′ projected from *P* are thereby produced.
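The projection step can be sketched as below; the 3×4 transform matrix and the point cloud are assumed toy values for illustration only.

```python
import numpy as np

def project_cloud(P3d, T):
    """Project a 3D point cloud P through a 3x4 transform matrix T to
    obtain the 2D point cloud P' on the image plane. Each point is
    written as a homogeneous column vector before multiplication."""
    homog = np.hstack([P3d, np.ones((len(P3d), 1))])
    proj = (T @ homog.T).T                 # column-vector multiplication
    return proj[:, :2] / proj[:, 2:3]      # perspective divide

# Hypothetical point cloud and identity camera matrix
P = np.array([[0.0, 0.0, 2.0],
              [1.0, 1.0, 2.0]])
T = np.hstack([np.eye(3), np.zeros((3, 1))])
P_prime = project_cloud(P, T)
print(P_prime)  # [[0.  0. ]
                #  [0.5 0.5]]
```

Repeating this for each of the *n* transform matrices yields the per-frame 2D point clouds *P*′_{t} compared against the infrared clusters.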

Furthermore, the 3D location of cluster *G*′_{t,p} can be calculated using 3D point cloud *P*, 2D point cloud *P*′, and cluster *G*′_{t,p}, which is acquired from the BLOB detector. For all points *p*′ of 2D point cloud *P*′ included in *G*′_{t,p}, the mean of the corresponding point coordinates in 3D point cloud *P* before projection is calculated and substituted for center point *c*′_{t,p} of *G*′_{t,p}. Once the 3D center points of the clusters are derived, the original 3D location *P*_{L} of the LM and the original 3D location *P*_{O} of the HMD are calculated.

The mean of the center points of the three clusters *G*′_{t,p} with the largest radii *d*′_{t,p} is set as the 3D location coordinate *P*_{L} of the LM's IR LEDs. The mean of the center points of the remaining clusters is set as the 3D location coordinate *P*_{O} of the HMD's IR LEDs. If the original 3D location of the LM is moved to the origin of the HMD coordinate system, the changed 3D location of the LM becomes *P*′_{L} = (*x*_{o} − *x*_{l}, *y*_{o} − *y*_{l}, *z*_{o} − *z*_{l}). Figure 5 shows the derived 3D locations of the LM and HMD.

When a user employs the LM-attached HMD, the origins of the LM and HMD coordinate systems coincide if *P*′_{L} = (*x*_{o} − *x*_{l}, *y*_{o} − *y*_{l}, *z*_{o} − *z*_{l}) is used as the origin of the LM coordinate system.
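The final selection and offset computation can be sketched as follows; the cluster centers and radii are hypothetical values, not measurements from the paper.

```python
import numpy as np

def relative_lm_origin(centers, radii):
    """The three clusters with the largest radii d'_{t,p} are taken as
    the LM's IR LEDs, the rest as the HMD's; the LM origin is then
    expressed relative to the HMD origin: P'_L = P_O - P_L."""
    order = np.argsort(radii)[::-1]        # clusters sorted by radius
    lm_idx, hmd_idx = order[:3], order[3:]
    P_L = np.mean([centers[i] for i in lm_idx], axis=0)   # LM origin
    P_O = np.mean([centers[i] for i in hmd_idx], axis=0)  # HMD origin
    return P_O - P_L

# Five assumed 3D cluster centers; the first three have the largest radii
centers = np.array([[0., 0., 0.], [3., 0., 0.], [0., 3., 0.],
                    [10., 0., 0.], [12., 0., 0.]])
radii = np.array([5., 4., 3., 1., 1.])
P_L_rel = relative_lm_origin(centers, radii)
print(P_L_rel)  # [10. -1.  0.]
```

Using `P_L_rel` as the origin of the LM coordinate system aligns it with the HMD coordinate system, which is the paper's stated goal.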

The location estimator can thus identify the type and 3D location of each IR LED cluster. Accordingly, the original 3D location of each device can be determined from the 3D locations of its IR LEDs.