Human motion recognition based on SVM in VR art media interaction environment

To address human motion recognition in multimedia interaction scenarios in virtual reality (VR) environments, a motion classification and recognition algorithm based on linear discriminant analysis (LDA) and support vector machine (SVM) is proposed. First, a kernel function is introduced into linear discriminant analysis for nonlinear projection, mapping the training samples into a high-dimensional subspace to obtain the best classification feature vector; this handles the nonlinearity of the data and enlarges the differences between samples. Second, a genetic algorithm is used to search for the optimal SVM parameters, exploiting its strength in multi-dimensional space optimization. The test results show that, compared with other classification and recognition algorithms, the proposed method performs well on multiple indicators of human motion recognition, with higher recognition accuracy and better robustness.

In digital performance, body language often expresses the true feelings of actors better than natural language. Therefore, in the virtual environment, accurate recognition of motion in human-computer interaction is especially important. At this stage, mainstream human motion recognition methods mainly use machine vision technology, drawing on advanced computer disciplines such as image processing, pattern recognition, and machine learning. Among them, the image processing method based on spatiotemporal features and the machine learning method based on representation features are more robust and have become the mainstream of current research [25][26][27][28][29]. Although their computational complexity is high, both methods can recognize continuous motion and interaction. The research direction chosen in this paper is a machine learning based approach, in which the captured dance behavior is digitally recognized and presented. For example, using the Kinect sensor, Shi et al. [27] proposed a human motion recognition method based on the skeleton characteristics of key frames. The method uses the K-means clustering algorithm to extract key frames and two features from human motion video sequences and uses an SVM classifier to classify action sequences. Qin and Li [28] proposed a portable, DSP-based real-time human gesture recognition system, which combines wavelet packet principal component analysis with Linear Discriminant Analysis (LDA). All of the above methods achieve a certain degree of precision and efficiency in human motion recognition. However, human body movements in the VR multimedia art scene are more complicated and change more irregularly, so the motion data are massive and high-dimensional (with nonlinear feature information). Spatial feature extraction therefore needs to reduce the dimension as much as possible while still reflecting the various types of actions. In addition, SVM classifier parameter optimization has room for improvement.
In view of the spatio-temporal continuity of human motion data, two recent CNN-based approaches [30,31] have been proposed. They use convolutional neural networks (CNN) to solve the problem of coherent motion recognition and use spatiotemporal sequences of convolutional neurons to capture the dependence between input data. However, the size of the convolution kernel limits the range over which dependencies between data samples can be captured. Therefore, typical CNN models are not suitable for recognizing multiple complex motions. Murad and Pyun [32] proposed a human motion classification and recognition algorithm based on Deep Recurrent Neural Networks (DRNN). Although its recognition rate is high, training and recognition rely mainly on many GPU parallel operations. This introduces a certain delay and degrades real-time performance, especially in large digital performances. Thus, their algorithm is not suitable for use in real-time evaluation systems.
In this paper, we propose a human motion recognition method based on LDA and SVM (named LDA-GA-SVM) to improve the efficiency and accuracy of human motion recognition in VR human-computer interaction applications. The method addresses two aspects: (1) improving the recognition rate of motion features and (2) improving the accuracy of motion classification. First, a kernel function is introduced into LDA for nonlinear projection to map training samples into a high-dimensional subspace and obtain the best classification feature vector. This handles the nonlinear problem, enlarges the sample differences, and reduces the dimensionality of the vector space, which improves operating efficiency. Second, a genetic algorithm is used to search for the optimal SVM parameters, which exploits the strength of genetic algorithms in multi-dimensional space optimization and improves the recognition rate. The experimental results verify the validity and accuracy of the proposed method.
In the experiments, the motion data of the virtual character in VR human-computer interaction are acquired mainly by an inertial capture device. Wearable inertial sensors capture the posture data of the main bone joints of the human body. Once the motion capture data have been obtained, the data file can be imported into a skeleton-based virtual human model to drive its bone movement.
The rest of this paper is organized as follows. The second section introduces the kernel LDA algorithm used to extract effective human motion features; the third section introduces the genetically optimized SVM algorithm for accurate motion classification; the fourth section presents the experimental analysis in the VR environment, in which the traditional K-means-SVM algorithm and the proposed LDA-GA-SVM algorithm are compared in terms of precision, accuracy, specificity and sensitivity to show the advantages of the proposed method.

Feature extraction based on kernel LDA
Linear discriminant analysis is a linear method commonly used for feature extraction. The LDA algorithm is insensitive to changes in illumination and pose and is therefore widely used in image recognition tasks. However, algorithms such as traditional LDA [33] are essentially linear.
Because human motion in VR scenes is complex and diverse, a linear method cannot extract some of the important high-dimensional nonlinear feature information hidden in the motion data. Therefore, this paper introduces a kernel function into the LDA algorithm for nonlinear projection to extract representation features. Combined with the genetically optimized SVM classifier, this finally realizes the classification and recognition of complex actions.
In the human motion data extraction application, let A be the action matrix. In the LDA algorithm, A is a full-rank matrix whose columns carry class labels and are grouped by class into blocks A_i, where A_i is the collection of data items in the i-th class, n_i is the size of class i, and n is the total number of data items in data set A. Let N_i denote the set of column indices belonging to class i. Following [34], c denotes the global center of A and c_i the local center of each class A_i, and S_b, S_w and S_t denote the inter-class divergence matrix, the intra-class divergence matrix and the total divergence matrix, respectively.
The standard LDA objective function seeks a projection that maximizes the inter-class divergence relative to the intra-class divergence of the projected data. LDA is therefore essentially a linear method: it does not perform well on nonlinear problems, and it suffers from singularity problems, for example when the intra-class divergence matrix is singular. To extract the nonlinear characteristics of the data efficiently, we use kernel LDA to extract features.
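For reference, the quantities named above have the following textbook forms. This is a sketch in standard LDA notation, not a transcription of the paper's own equations: the column symbol a_j, the class count k, the projection matrix G, and the trace-ratio form of the objective are assumptions introduced here.

```latex
% Assumed notation: a_j is the j-th column of A, k is the number of classes,
% and G is the projection matrix being sought.
\begin{align}
c   &= \frac{1}{n} \sum_{j=1}^{n} a_j, \qquad
c_i  = \frac{1}{n_i} \sum_{j \in N_i} a_j \\
S_b &= \sum_{i=1}^{k} n_i \, (c_i - c)(c_i - c)^{T} \\
S_w &= \sum_{i=1}^{k} \sum_{j \in N_i} (a_j - c_i)(a_j - c_i)^{T} \\
S_t &= \sum_{j=1}^{n} (a_j - c)(a_j - c)^{T} = S_b + S_w \\
G^{*} &= \arg\max_{G} \operatorname{trace}\!\left( (G^{T} S_w G)^{-1} \, G^{T} S_b G \right)
\end{align}
```

The inverse in the last line is what fails when S_w is singular, which is the singularity problem mentioned in the text.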
The basic idea is to map the original training samples into a high-dimensional feature space H by a nonlinear transformation and then perform linear discriminant analysis in H. Suppose the nonlinear mapping φ maps X into the high-dimensional feature space H. The Fisher criterion function in H is then defined as in [34], where w is the kernel-space projection vector, u_i is the mean of the samples of the i-th class in H, u is the overall mean, and S^φ_w is the intra-class scatter matrix in H.
The vector w can be expressed in terms of the mapped training samples, where A = X, so that formula (8) can be rewritten in terms of kernel matrices. There, K_t represents the overall scatter matrix of the kernel and K_b the scatter matrix between kernel classes, calculated as in [35], and K_w is the kernel intra-class scatter matrix. Let A_opt denote the set of feature vectors that are optimal solutions maximizing Eq. (13). From Eq. (11) we obtain the kernel-space projection matrix, which in turn gives the projection of any sample point x in kernel space.
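The kernel formulation described above is commonly written as follows. This is a sketch of the standard kernel Fisher discriminant, offered as orientation and not as the paper's exact equations: the coefficient vector α, the kernel function k, the kernel columns κ_l, and the kernel class means m_i and m are notation assumed here, and the numbering of the original equations is not reproduced.

```latex
% Assumed notation: k(x, y) = \varphi(x)^{T}\varphi(y) is the kernel function,
% \kappa_l = (k(x_1, x_l), \dots, k(x_n, x_l))^{T} is the l-th kernel column,
% m_i and m are the class-wise and overall means of the kernel columns.
\begin{align}
J(w) &= \frac{w^{T} S_b^{\varphi} w}{w^{T} S_w^{\varphi} w}, \qquad
w = \sum_{j=1}^{n} \alpha_j \, \varphi(x_j) \\
m_i &= \frac{1}{n_i} \sum_{l \in N_i} \kappa_l, \qquad
m = \frac{1}{n} \sum_{l=1}^{n} \kappa_l \\
K_b &= \sum_{i} n_i \, (m_i - m)(m_i - m)^{T} \\
K_w &= \sum_{i} \sum_{l \in N_i} (\kappa_l - m_i)(\kappa_l - m_i)^{T}, \qquad
K_t = K_b + K_w \\
J(\alpha) &= \frac{\alpha^{T} K_b \, \alpha}{\alpha^{T} K_w \, \alpha} \\
y(x) &= w^{T} \varphi(x) = \sum_{j=1}^{n} \alpha_j \, k(x_j, x)
\end{align}
```

Because K_t = K_b + K_w, maximizing the ratio with K_t in the denominator yields the same optimal directions as maximizing it with K_w, so either matrix can appear in the criterion.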

Motion data collection
Unlike the image processing method based on spatiotemporal features, the machine learning method based on representation features used in this paper requires motion data acquisition tools with faster transmission speed and higher precision. The commercially available Microsoft Kinect sensor cannot meet these accuracy requirements in multimedia interaction scenarios in the virtual reality environment, so a motion data acquisition device based on inertial sensors is employed instead. The wearable hardware devices required for motion acquisition in the VR interactive environment during a digital performance are shown in Fig. 4, and the hardware parameters are shown in Table 1.

Motion data classification based on genetic optimization SVM
This section analyzes the parameter search for an SVM [36] with a Gaussian radial basis kernel function. Different choices of the penalty factor C and the kernel function parameter σ yield SVMs with different performance, so a genetic algorithm is used to optimize these two parameters. The crossover operation in the genetic algorithm is based on floating-point coding [37] and is controlled by a random number a in the range (0, 1). The mutation operation uses the uniform mutation operator: at every mutation point, the original gene value is replaced by a random value selected from the specified interval of that gene, determined by a random number r in the range (0, 1), the upper limit U_max of the gene position and the lower limit U_min of the gene position [27]. The fitness function is defined in terms of E, the sum of squared errors, and a constant b. The main idea of the improved SVM is thus to optimize the penalty factor C and the kernel function parameter σ of the SVM through a genetic algorithm.
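The operators described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: arithmetic crossover is assumed as the floating-point crossover, the fitness form b / (1 + E) is an assumption (the text only says the fitness depends on E and b), and the function names and parameter ranges are hypothetical.

```python
import random


def arithmetic_crossover(parent1, parent2, rng):
    """Blend two floating-point chromosomes gene by gene (assumed crossover form)."""
    a = rng.random()  # mixing coefficient; random() samples from [0, 1)
    child1 = [a * x + (1.0 - a) * y for x, y in zip(parent1, parent2)]
    child2 = [a * y + (1.0 - a) * x for x, y in zip(parent1, parent2)]
    return child1, child2


def uniform_mutation(chromosome, bounds, p_mut, rng):
    """With probability p_mut, replace a gene by a uniform draw from its interval."""
    mutated = []
    for gene, (u_min, u_max) in zip(chromosome, bounds):
        if rng.random() < p_mut:
            r = rng.random()
            gene = u_min + r * (u_max - u_min)
        mutated.append(gene)
    return mutated


def fitness(sum_squared_error, b=1000.0):
    """ASSUMED fitness form: larger when the sum of squared errors E is smaller."""
    return b / (1.0 + sum_squared_error)


rng = random.Random(42)
bounds = [(0.1, 100.0), (0.01, 10.0)]  # hypothetical ranges for C and sigma
child1, child2 = arithmetic_crossover([10.0, 0.5], [50.0, 2.0], rng)
print(child1, child2)
print(uniform_mutation(child1, bounds, p_mut=0.007, rng=rng))
```

Each chromosome here is the pair (C, σ), so both children of a crossover stay inside the interval spanned by their parents, and a mutated gene always stays inside its own bounds.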

Human motion recognition realization
The main steps of the proposed human motion recognition method are shown in Fig. 5. The early steps mainly search for the optimal parameters required by the SVM, using the global search capability of the genetic algorithm to improve SVM classification performance. The specific steps are as follows:
Step 1. Collect human motion data.
Step 2. Perform kernel matrix feature extraction based on the LDA algorithm.
Step 3. Search for SVM parameters with the genetic algorithm and determine whether they are optimal.
Step 4. If the parameters are optimal, the search is complete and the parameters are recorded; otherwise, continue the search.
Step 5. Classify with the optimized SVM classifier and output the classification result.

Fig. 5 Main steps of the proposed motion recognition method
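The search loop of Steps 3 and 4 can be sketched as below. This is an illustrative skeleton, not the authors' code: roulette-wheel selection, elitism, the fitness form b / (1 + E) and the parameter ranges are assumptions, and `toy_error` is a hypothetical stand-in for the error of an SVM trained with a given (C, σ). The default population size, number of generations, and crossover and mutation probabilities follow the values reported in the experimental settings.

```python
import random


def ga_search(error_fn, bounds, pop_size=50, generations=30,
              p_cross=0.8, p_mut=0.007, b=1000.0, seed=0):
    """Return the parameter vector with the highest fitness found by a simple GA.

    error_fn must return a non-negative error; pop_size is assumed to be even.
    """
    rng = random.Random(seed)

    def fitness(ind):
        # ASSUMED fitness form: decreases as the error E grows.
        return b / (1.0 + error_fn(ind))

    # Initial population, drawn uniformly inside the bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    best = max(pop, key=fitness)

    for _ in range(generations):
        # Roulette-wheel selection (an assumed choice of selection scheme).
        weights = [fitness(ind) for ind in pop]
        parents = rng.choices(pop, weights=weights, k=pop_size)

        # Arithmetic crossover on consecutive pairs of parents.
        children = []
        for p1, p2 in zip(parents[0::2], parents[1::2]):
            if rng.random() < p_cross:
                a = rng.random()
                c1 = [a * x + (1.0 - a) * y for x, y in zip(p1, p2)]
                c2 = [a * y + (1.0 - a) * x for x, y in zip(p1, p2)]
            else:
                c1, c2 = list(p1), list(p2)
            children += [c1, c2]

        # Uniform mutation: redraw a gene from its interval with probability p_mut.
        pop = [[rng.uniform(lo, hi) if rng.random() < p_mut else g
                for g, (lo, hi) in zip(ind, bounds)] for ind in children]

        pop[0] = best  # elitism (assumed): never lose the best individual so far
        best = max(pop, key=fitness)
    return best


def toy_error(params):
    """Hypothetical stand-in for the SVM error, smallest at C = 10, sigma = 0.5."""
    c, sigma = params
    return ((c - 10.0) / 10.0) ** 2 + (sigma - 0.5) ** 2


bounds = [(0.1, 100.0), (0.01, 10.0)]  # hypothetical search ranges for C and sigma
best_c, best_sigma = ga_search(toy_error, bounds)
print(best_c, best_sigma, toy_error([best_c, best_sigma]))
```

In a real pipeline, `error_fn` would train the SVM on the kernel LDA features with the candidate (C, σ) and return its validation error. Step 5 then trains the final classifier with the returned parameters.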

Experimental environment
The experimental data consist of real-time motion acquisition data from inertial sensors, 20 GB in total. The experimental data set contains 10 types of actions of successively increasing complexity. The system structure of the VR multimedia art scene is shown in Fig. 6. The hardware and software parameters of the experimental environment are shown in Table 2. The relevant parameters of the test algorithm are: population size 50, maximum number of generations 30, crossover probability 0.8, mutation probability 0.007, b = 1000, α = 0.5, r = 0.2.

Evaluation indicators
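To quantify the performance of the proposed method, the four evaluation indicators most commonly used in the action classification field are selected [38][39][40]: Precision, Accuracy, Specificity and Sensitivity. They are calculated as follows:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\mathrm{Specificity} = \frac{TN}{TN + FP}$$
$$\mathrm{Sensitivity} = \frac{TP}{TP + FN}$$
where TP represents the number of correctly classified positive samples, TN represents the number of correctly classified negative samples, FP represents the number of samples incorrectly classified as positive, and FN represents the number of samples incorrectly classified as negative (Table 3).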
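As a small worked example, the four indicators can be computed from the confusion counts using their standard definitions. The function name and the sample counts below are hypothetical and are not taken from the experiments; for a 10-class problem the counts would be formed per class in a one-vs-rest manner, which is an assumption about the evaluation protocol.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard binary definitions; all denominators are assumed to be non-zero."""
    return {
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "specificity": tn / (tn + fp),
        "sensitivity": tp / (tp + fn),
    }


# Hypothetical counts for one action class treated as the positive class.
metrics = classification_metrics(tp=90, tn=85, fp=15, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Precision and sensitivity share the numerator TP but differ in the denominator, which is why a classifier can score well on one and poorly on the other.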

Experimental results
In the experiment, 10 dance motion types are obtained from the recognition test data, as shown in Fig. 3. The recognition performance for the 10 types of dance motion is shown in Table 4, where the LDA-GA-SVM algorithm proposed in this paper is compared with the K-means-SVM algorithm [27]. Table 4 shows that the proposed algorithm increases the average Precision and Accuracy by 4.401% and 4.903%, respectively. The comparison charts in Figs. 7 and 8 show that the Precision and Accuracy of the LDA-GA-SVM algorithm are higher than those of the K-means-SVM algorithm at every test point and that its curves are relatively smooth and stable. That is, the LDA-GA-SVM algorithm performs better across all 10 motion types. This is because the genetic algorithm has advantages in multi-dimensional space optimization and a good global search ability. In addition, the proposed algorithm achieves a more balanced result on both Specificity and Sensitivity. The Specificity and Sensitivity mean values of the two algorithms are 90.833%, 92.128%, 92.78%, and 94.006%, respectively. The comparison in Fig. 10 shows that the Sensitivity curves of the two algorithms gradually separate over time, and Figs. 9 and 10 show that the index values of the LDA-GA-SVM algorithm are higher than those of the K-means-SVM algorithm; that is, the sensitivity of the LDA-GA-SVM algorithm is higher. This is due to the kernel LDA feature extraction, which overcomes the linearity limitation of traditional LDA and enlarges the sample differences, so that performance is more stable. In summary, in terms of precision, accuracy, specificity and sensitivity, the LDA-GA-SVM algorithm proposed in this paper is superior to the K-means-SVM algorithm and can solve the problem of motion recognition in digital performance in VR environments.

Conclusion
In this paper, we combine the kernel LDA algorithm with the genetically optimized SVM algorithm to achieve human motion classification and recognition, in order to improve the accuracy of human motion recognition in VR human-computer interaction applications. Introducing a kernel function in LDA for nonlinear projection to map training samples into a high-dimensional subspace, and obtaining the best