
Human motion recognition based on SVM in VR art media interaction environment


In order to solve the problem of human motion recognition in multimedia interaction scenarios in a virtual reality environment, a motion classification and recognition algorithm based on linear discriminant analysis and the support vector machine (SVM) is proposed. First, a kernel function is introduced into linear discriminant analysis for nonlinear projection, mapping the training samples into a high-dimensional subspace to obtain the best classification feature vectors; this effectively solves the nonlinear problem and expands the sample differences. Second, a genetic algorithm is used to optimize the parameter search of the SVM, making full use of the advantages of genetic algorithms in multi-dimensional space optimization. The test results show that, compared with other classification and recognition algorithms, the proposed method performs well on multiple performance indicators of human motion recognition, with higher recognition accuracy and better robustness.


Today, with the rapid development of computer technologies such as the Internet of Things (IoT), wireless communications, edge computing, and data mining [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18], advanced multimedia technologies emerge one after another. Owing to its "immersive" realism, Virtual Reality (VR) can offer users a new experience through more natural and realistic human–computer interaction [19,20,21]. Many multimedia applications based on VR technology have gradually become hotspots of the future culture, art and entertainment markets, such as virtual shopping communities, immersive virtual reality games, virtual landscape roaming and virtual art stage performances [22,23,24]. Among them, multimedia human–computer interaction in art scenes must capture and recognize human body motion accurately and in real time in order to achieve a better interaction effect and artistic sensory experience. To enable more natural and effective communication between people and computers, a motion recognition interactive system must accurately identify complex and varied human actions. As shown in Fig. 1, to digitally preview a dance in a digital performance, the motion of the stage dancers is first captured. Then, as shown in Fig. 2, the captured dance behavior is digitally recognized and presented. Figure 3 shows the interaction of the recognized actions in the VR scenario.

Fig. 1

Motion capture of stage dancers

Fig. 2

The dance behavior after the capture is digitally recognized and presented

Fig. 3

The identified actions are interactively applied in the VR scene

In the process of digital performance, body language can often express the actors' true feelings better than natural language. Therefore, in a virtual environment, accurate recognition of human motion for human–computer interaction is especially important. At present, mainstream human motion recognition methods mainly use machine vision technology, drawing on computer disciplines such as image processing, pattern recognition, and machine learning. Among them, image processing methods based on spatiotemporal features and machine learning methods based on representation features offer higher robustness and have become the mainstream of current research [25,26,27,28,29]. Although their computational complexity is high, these two kinds of motion recognition methods can recognize continuous motion and interaction. This paper adopts a machine learning based approach. For example, using the Kinect sensor, Shi et al. [27] proposed a human motion recognition method based on the skeleton characteristics of key frames; it uses the K-means clustering algorithm to extract key frames and two features from human motion video sequences and uses an SVM classifier to classify action sequences. Qin and Li [28] proposed a DSP-based real-time recognition system for portable human gestures that combines wavelet packet principal component analysis with Linear Discriminant Analysis (LDA). All the above methods achieve a certain degree of precision and efficiency in human motion recognition. However, human body movements in VR multimedia art scenes are more complicated and change more irregularly, so the motion data is massive and high-dimensional with nonlinear feature information; spatial feature extraction therefore needs to reduce the dimensionality as much as possible while still reflecting the various types of actions. In addition, SVM classifier parameter optimization has room for improvement.

In view of the spatio-temporal continuity of human motion data, two recent CNN-based approaches [30, 31] were proposed. They used convolutional neural networks (CNNs) to solve the problem of coherent motion recognition and used convolutional spatiotemporal sequences to capture the dependence between input data. However, the size of the convolution kernel limits the range of dependencies captured between data samples, so typical CNN models are not suitable for recognizing multiple complex motions. Murad and Pyun [32] proposed a human motion classification and recognition algorithm based on Deep Recurrent Neural Networks (DRNNs). Although its recognition rate is high, the training and recognition process relies heavily on parallel GPU operations, which introduces a certain delay and affects real-time performance, especially in large digital performances. Thus, their algorithm is not suitable for use in real-time evaluation systems.

In this paper, we propose a human motion recognition method based on LDA and SVM (named LDA-GA-SVM) to improve the efficiency and accuracy of human motion recognition in VR human–computer interaction applications. The method focuses on two aspects: (1) improving the recognition rate of motion features, and (2) improving the accuracy of motion classification. First, a kernel function is introduced into LDA for nonlinear projection, mapping the training samples into a high-dimensional subspace to obtain the best classification feature vectors; this effectively solves the nonlinear problem, expands the sample differences, and reduces the dimensionality of the vector space, improving operating efficiency. Second, a genetic algorithm is used to optimize the parameter search of the SVM, making full use of the advantages of genetic algorithms in multi-dimensional space optimization and improving the recognition rate. The experimental results verify the validity and accuracy of the proposed method.

In addition, during the experiments, the motion data of the virtual character in human–computer interaction in the VR environment is acquired mainly by an inertial capture device. Wearable inertial sensors capture the posture data of the main skeletal joints of the human body; after the motion capture data is obtained, the data file can be imported into a skeletal virtual human model to drive the model's bone movement.

The rest of this paper is organized as follows. The second section introduces the use of the kernel decision LDA algorithm to extract effective human motion features; the third section introduces the use of the genetically optimized SVM algorithm for accurate motion classification; the fourth section presents the experimental analysis in the VR environment, in which the traditional K-means-SVM algorithm and the proposed LDA-GA-SVM algorithm are compared in terms of precision, accuracy, specificity and sensitivity, demonstrating the advantages of the proposed method.

Feature extraction based on kernel decision LDA

Linear discriminant analysis is a linear method commonly used for feature extraction. The LDA algorithm is insensitive to changes in illumination and pose and is therefore widely used in image recognition tasks. However, traditional LDA and related algorithms [33] are essentially linear.

Due to the complexity and diversity of human motion in VR scenes, some important high-dimensional nonlinear feature information hidden in motion data cannot be extracted. Therefore, this paper introduces a kernel function in the LDA algorithm for nonlinear projection to extract expression features. Combined with the genetically optimized SVM classifier, the complex action classification and recognition is finally realized.

In the human motion data extraction application, let A be the action matrix. In the LDA algorithm, A is a full rank matrix with class labels:

$$ {\mathbf{A}} = [a_{1} \ldots a_{n} ] = [B_{1} \ldots B_{k} ] \in {\mathbf{R}}^{m \times n} $$

Among them, each \( a_{i} (1 \le i \le n) \) is a data point in m-dimensional space. Each block matrix \( B_{i} \in {\mathbf{R}}^{m \times n_{i}} (1 \le i \le k) \) is the collection of data items in the i-th class, where \( n_{i} \) is the size of class i; the total number of data items in data set \( {\mathbf{A}} \) is \( n \). Let \( N_{i} \) denote the set of column indices belonging to class i. The global center \( c \) of \( {\mathbf{A}} \) and the local center \( c_{i} \) of each class \( B_{i} \) are expressed as follows [34], where \( e \) and \( e_{i} \) are all-ones vectors of length \( n \) and \( n_{i} \), respectively:

$$ c = \frac{1}{n}Ae,\quad c_{i} = \frac{1}{{n_{i} }}B_{i} e_{i}, \quad i = 1, \ldots ,k $$


$$ S_{b} = \sum\limits_{i = 1}^{k} {n_{i} (c_{i} - c)(c_{i} - c)^{T} } $$
$$ S_{w} = \sum\limits_{i = 1}^{k} {\sum\limits_{{j \in N_{i} }}^{{}} {(a_{j} - c_{i} )(a_{j} - c_{i} )^{T} } } $$
$$ S_{t} = \sum\limits_{j = 1}^{n} {(a_{j} - c)(a_{j} - c)^{T} } $$

Among them, \( S_{b} \), \( S_{w} \) and \( S_{t} \) are called the inter-class scatter matrix, intra-class scatter matrix and total scatter matrix, respectively.

$$ S_{t} = S_{b} + S_{w} $$

Then, the standard LDA objective function is:

$$ G = \mathop {\arg \max }\limits_{{G \in {\mathbf{R}}^{m \times l} }} {\text{trace}}\left( {\left( {G^{T} S_{t} G} \right)^{ - 1} \left( {G^{T} S_{b} G} \right)} \right) $$
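As a numerical check of the construction above, the scatter matrices and the identity \( S_{t} = S_{b} + S_{w} \) can be computed directly; the following is a minimal NumPy sketch on synthetic data (the variable names are illustrative, not from the paper's implementation).

```python
import numpy as np

def lda_scatter(A, labels):
    """Inter-class (Sb), intra-class (Sw) and total (St) scatter matrices
    for a data matrix A of shape (m, n), one column per sample."""
    m, n = A.shape
    c = A.mean(axis=1, keepdims=True)            # global center c = (1/n) A e
    Sb = np.zeros((m, m))
    Sw = np.zeros((m, m))
    for k in np.unique(labels):
        Bk = A[:, labels == k]                   # block matrix of class k
        ck = Bk.mean(axis=1, keepdims=True)      # local center c_k
        Sb += Bk.shape[1] * (ck - c) @ (ck - c).T
        Sw += (Bk - ck) @ (Bk - ck).T
    return Sb, Sw, Sb + Sw                       # St = Sb + Sw

# two synthetic classes in 3-dimensional space
rng = np.random.default_rng(0)
A = np.hstack([rng.normal(0.0, 1.0, (3, 20)), rng.normal(2.0, 1.0, (3, 30))])
labels = np.array([0] * 20 + [1] * 30)
Sb, Sw, St = lda_scatter(A, labels)
```

The returned \( S_{t} \) coincides with the directly computed total scatter \( \sum_{j} (a_{j} - c)(a_{j} - c)^{T} \), confirming the decomposition.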

It can be seen that the LDA algorithm is essentially a linear method, so its effect is poor when dealing with nonlinear problems, and it suffers from singularity issues. In order to efficiently extract the nonlinear characteristics of the data, we use kernel decision LDA to extract features.

The basic idea is to map the original training data samples to the high-dimensional feature space \( H \) by a nonlinear transformation, and then perform linear discriminant analysis in \( H \). Suppose the nonlinear mapping \( \phi (X) \) maps \( X \) to the high-dimensional feature space \( H \), yielding \( \phi (X) = \{ \phi (x_{1}^{1} ), \ldots ,\phi (x_{i}^{j} ), \ldots ,\phi (x_{c}^{{N_{c} }} )\} \), where \( \phi (x_{i}^{j} ) \in H \) represents the \( x_{i}^{j} \) sample vector in \( H \). Set the kernel matrix to \( {\mathbf{K}} = \phi (X)^{T} \phi (X) = [k_{1}^{1} , \ldots ,k_{i}^{j} , \ldots ,k_{c}^{{N_{c} }} ] \), where \( k_{i}^{j} = \phi (X)^{T} \phi (x_{i}^{j} ) \), and the Fisher criterion function in \( H \) is [34]:

$$ J(w) = \frac{{w^{T} {\mathbf{S}}_{b}^{\phi } w}}{{w^{T} {\mathbf{S}}_{t}^{\phi } w}} $$

Here \( w \) is the kernel-space projection vector.

$$ {\mathbf{S}}_{b}^{\phi } = \sum\limits_{i = 1}^{c} {N_{i} (u_{i} - u)(u_{i} - u)^{T} } $$
$$ {\mathbf{S}}_{w}^{\phi } = \sum\limits_{i = 1}^{c} {\sum\limits_{j = 1}^{{N_{i} }} {(\phi (x_{i}^{j} ) - u_{i} )(\phi (x_{i}^{j} ) - u_{i} )^{T} } } $$

where \( u_{i} \) is the mean of the i-th class samples in \( H \), \( u \) is the overall mean, \( {\mathbf{S}}_{b}^{\phi } \) is the inter-class scatter matrix, and \( {\mathbf{S}}_{w}^{\phi } \) is the intra-class scatter matrix. \( w \) can be expressed as:

$$ {\mathbf{w}} = \phi (X){\mathbf{a}} $$

where \( {\mathbf{a}} \) is the expansion coefficient vector. The Fisher criterion \( J(w) \) can then be expressed as:

$$ J({\mathbf{a}}) = \frac{{{\mathbf{a}}^{T} {\mathbf{K}}_{b} {\mathbf{a}}}}{{{\mathbf{a}}^{T} {\mathbf{K}}_{t} {\mathbf{a}}}} $$

Among them, \( {\mathbf{K}}_{t} \) represents the overall scatter matrix of the kernel, and \( {\mathbf{K}}_{b} \) represents the scatter matrix between kernel classes, calculated as follows [35]:

$$ {\mathbf{K}}_{w} = \sum\limits_{i = 1}^{c} {\sum\limits_{j = 1}^{{N_{i} }} {({\mathbf{k}}_{i}^{j} - {\mathbf{m}}_{i} )({\mathbf{k}}_{i}^{j} - {\mathbf{m}}_{i} )^{T} } } $$
$$ {\mathbf{K}}_{b} = \sum\limits_{i = 1}^{c} {N_{i} ({\mathbf{m}}_{i}^{{}} - {\mathbf{m}})({\mathbf{m}}_{i}^{{}} - {\mathbf{m}})^{T} } $$
$$ {\mathbf{K}}_{t} = {\mathbf{K}}_{w} + {\mathbf{K}}_{b} $$
$$ {\mathbf{m}}_{i} = \frac{1}{{N_{i} }}\sum\limits_{j = 1}^{{N_{i} }} {{\mathbf{k}}_{i}^{j} } $$
$$ {\mathbf{m}} = \frac{1}{N}\sum\limits_{i = 1}^{c} {\sum\limits_{j = 1}^{{N_{i} }} {{\mathbf{k}}_{i}^{j} } } $$

where \( {\mathbf{K}}_{w} \) is the kernel intra-class scatter matrix. Let \( {\mathbf{A}}_{\text{opt}} \) denote the set of optimal eigenvectors that maximize \( J({\mathbf{a}}) \). From \( {\mathbf{w}} = \phi (X){\mathbf{a}} \) we obtain the kernel-space projection matrix:

$$ {\mathbf{W}}_{\text{opt}} = \phi (X){\mathbf{A}}_{\text{opt}} $$

For any sample point \( x \), its projection in kernel space is given by:

$$ {\mathbf{z}} = {\mathbf{W}}_{opt}^{T} \phi (x) = {\mathbf{A}}_{opt}^{T} \phi (X)^{T} \phi (x) $$
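Under these definitions, the kernel scatter matrices and the projection \( {\mathbf{z}} \) can be sketched as follows. This is a minimal illustration assuming a Gaussian RBF kernel and a small ridge term for numerical stability of the generalized eigenproblem; it is not the authors' implementation.

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    """Gaussian RBF Gram matrix for column-sample matrices X (m,n1), Y (m,n2)."""
    d2 = ((X[:, :, None] - Y[:, None, :]) ** 2).sum(axis=0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kernel_scatter(K, labels):
    """Kernel intra-class (Kw), inter-class (Kb) and total (Kt) scatter."""
    n = K.shape[1]
    m_all = K.mean(axis=1, keepdims=True)        # overall kernel mean m
    Kb = np.zeros((n, n))
    Kw = np.zeros((n, n))
    for c in np.unique(labels):
        Kc = K[:, labels == c]                   # columns k_c^j of class c
        mc = Kc.mean(axis=1, keepdims=True)      # class kernel mean m_c
        Kb += Kc.shape[1] * (mc - m_all) @ (mc - m_all).T
        Kw += (Kc - mc) @ (Kc - mc).T
    return Kw, Kb, Kw + Kb                       # Kt = Kw + Kb

rng = np.random.default_rng(1)
X = np.hstack([rng.normal(0.0, 0.5, (2, 15)), rng.normal(3.0, 0.5, (2, 15))])
y = np.array([0] * 15 + [1] * 15)
K = rbf_kernel(X, X)
Kw, Kb, Kt = kernel_scatter(K, y)

# maximize J(a) = (a^T Kb a) / (a^T Kt a) via a generalized eigenproblem
reg = 1e-6 * np.eye(K.shape[0])                  # small ridge for invertibility
vals, vecs = np.linalg.eig(np.linalg.solve(Kt + reg, Kb))
a_opt = np.real(vecs[:, np.argmax(np.real(vals))])
z = a_opt @ K                                    # projections of the training samples
```

Since \( {\mathbf{K}}_{b} \preceq {\mathbf{K}}_{t} \), the leading generalized eigenvalue lies in (0, 1] and approaches 1 for well-separated classes.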

Proposed human motion recognition method

Motion data collection

Different from image processing methods based on spatiotemporal features, the machine learning method based on representation features used in this paper requires motion data acquisition tools with faster transmission speed and higher precision. In the multimedia interaction scenario of the virtual reality environment, the widely used Microsoft Kinect sensor cannot meet the accuracy requirements; therefore, a motion data acquisition device based on inertial sensors is employed. For the digital performance process in the VR interactive environment, the wearable hardware devices required for motion acquisition are shown in Fig. 4, and the hardware parameters are shown in Table 1.

Fig. 4

Device for collecting action data in a VR interactive environment. a Motion capture of the human body using inertia motion capture. b Human wears inertia motion capture device Noitom. c User wears AR glasses, shoots the real environment through a hand-held camera, and transmits the shot image to the AR glasses display screen in real time by wire. d See the surrounding environment through the cooperation of AR glasses and hand-held camera

Table 1 Sensor parameters

Motion data classification based on genetic optimization SVM

This section analyzes SVM [36] parameter optimization based on the Gaussian radial basis kernel function. Different choices of the penalty factor \( C \) and the kernel function parameter \( \sigma \) yield SVMs with different performance, so a genetic algorithm is used to optimize these two parameters. The crossover operation in the genetic algorithm is based on floating-point coding [37]:

$$ X_{A}^{t + 1} = aX_{A}^{t} + (1 - a)X_{B}^{t} $$
$$ X_{B}^{t + 1} = aX_{B}^{t} + (1 - a)X_{A}^{t} $$

where \( a \) represents a random number with a range of (0, 1).

A uniform mutation operator performs the mutation operation: for each mutation point, a random value drawn from the specified interval of the corresponding gene replaces the original gene value:

$$ X^{\prime} = U_{\text{min} } + r\left( {U_{\text{max} } - U_{\text{min} } } \right) $$

where \( r \) is a random number with a range of (0, 1), Umax is the upper limit of the gene position, and Umin is the lower limit of the gene position [27]. The fitness function is:

$$ f = \frac{b}{1 + E} $$

where \( E \) represents the sum of squared errors and \( b \) represents a constant. The main idea of the improved SVM is to optimize the penalty factor \( C \) and the kernel function parameter \( \sigma \) of the SVM through the genetic algorithm.
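The crossover, mutation and fitness rules above can be sketched as follows. To keep the example self-contained, a toy error surface stands in for the SVM cross-validation error: the search bounds and the optimum assumed in `surrogate_error` are illustrative, not the paper's values.

```python
import numpy as np

rng = np.random.default_rng(42)
BOUNDS = np.array([[0.1, 100.0],   # penalty factor C (assumed search range)
                   [0.01, 5.0]])   # kernel parameter sigma (assumed range)

def surrogate_error(C, sigma):
    # Toy stand-in for the SVM error E; assumed optimum at C = 10, sigma = 0.5.
    # A real fitness evaluation would train/validate an SVM per candidate.
    return (np.log10(C) - 1.0) ** 2 + (sigma - 0.5) ** 2

def fitness(ind, b=1000.0):
    return b / (1.0 + surrogate_error(*ind))        # f = b / (1 + E)

def crossover(xa, xb):
    a = rng.random()                                # random a in (0, 1)
    return a * xa + (1 - a) * xb, a * xb + (1 - a) * xa

def mutate(x, p=0.05):                              # paper uses p = 0.007
    x = x.copy()
    for g in range(len(x)):
        if rng.random() < p:
            lo, hi = BOUNDS[g]
            x[g] = lo + rng.random() * (hi - lo)    # X' = Umin + r (Umax - Umin)
    return x

# random initial population of (C, sigma) candidates
pop = BOUNDS[:, 0] + rng.random((50, 2)) * (BOUNDS[:, 1] - BOUNDS[:, 0])
for _ in range(30):                                 # 30 generations
    f = np.array([fitness(ind) for ind in pop])
    idx = rng.choice(len(pop), size=len(pop), p=f / f.sum())  # roulette selection
    parents = pop[idx]
    children = []
    for i in range(0, len(parents) - 1, 2):
        c1, c2 = crossover(parents[i], parents[i + 1])
        children += [mutate(c1), mutate(c2)]
    pop = np.array(children)
best = pop[np.argmax([fitness(ind) for ind in pop])]
```

In the real system, `surrogate_error` would be replaced by the SVM training/validation error evaluated for each candidate \( (C, \sigma) \) pair.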

Human motion recognition realization

The main steps of the proposed human motion recognition method are shown in Fig. 5. The preliminary stage searches for the optimal parameters required by the SVM, mainly using the global search capability of the genetic algorithm, thereby improving SVM classification performance. The specific steps are as follows:

Fig. 5

Main steps of the proposed motion recognition method

  • Step 1. Collect human motion data.

  • Step 2. Perform kernel matrix feature extraction based on LDA algorithm.

  • Step 3. Search for SVM parameters according to the genetic algorithm and determine whether it is optimal.

  • Step 4. If the parameters are optimal, the search is completed and they are recorded; otherwise, continue searching.

  • Step 5. Classify based on the optimized SVM classifier and output the classification result.
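The five steps can be strung together in a short skeleton. Each stage below is a deliberately simplified stand-in (synthetic data, a placeholder feature projection, fixed "optimized" parameters, and a nearest-centroid classifier in place of the tuned SVM), so the helper names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(7)

def collect_motion_data():                        # Step 1: acquisition (synthetic)
    X = np.vstack([rng.normal(0.0, 1.0, (20, 4)), rng.normal(3.0, 1.0, (20, 4))])
    y = np.array([0] * 20 + [1] * 20)
    return X, y

def extract_features(X):                          # Step 2: placeholder for kernel LDA
    return X - X.mean(axis=0)

def ga_search_params():                           # Steps 3-4: assume the GA converged
    return {"C": 10.0, "sigma": 0.5}

def classify(train_X, train_y, test_X, params):   # Step 5: stand-in for the tuned SVM
    cent = {c: train_X[train_y == c].mean(axis=0) for c in np.unique(train_y)}
    nearest = lambda x: min(cent, key=lambda c: np.linalg.norm(x - cent[c]))
    return np.array([nearest(x) for x in test_X])

X, y = collect_motion_data()
F = extract_features(X)
params = ga_search_params()
pred = classify(F, y, F, params)
accuracy = float((pred == y).mean())
```

The skeleton only mirrors the control flow of Fig. 5; in the real system each placeholder is replaced by the corresponding component described above.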

Experimental analysis and comparison in VR environment

Experimental environment

The experimental data consists of real-time motion data acquired by inertial sensors, 20 GB in total. The data set contains 10 types of actions of successively increasing complexity. The system structure of the VR multimedia art scene is shown in Fig. 6, and the hardware and software parameters of the experimental environment are shown in Table 2. The relevant parameters of the tested algorithm are: population size 50, maximum number of generations 30, crossover probability 0.8, mutation probability 0.007, b = 1000, α = 0.5, r = 0.2.

Fig. 6

Virtual reality system structure

Table 2 Software and hardware parameters of the experimental environment

Evaluation indicators

In order to quantify the performance of the proposed method, the four most commonly used evaluation indicators in the action classification field are selected [38,39,40]: Precision, Accuracy, Specificity and Sensitivity, computed as follows:

$$ {\text{Precision}} = \frac{TP}{TP + FP} $$
$$ {\text{Accuracy}} = \frac{TP + TN}{TP + TN + FN + FP} $$
$$ {\text{Specificity}} = \frac{TN}{FP + TN} $$
$$ {\text{Sensitivity}} = \frac{TP}{TP + FN} $$

where \( TP \) represents the number of positive samples correctly classified, \( TN \) the number of negative samples correctly classified, \( FP \) the number of negative samples incorrectly classified as positive, and \( FN \) the number of positive samples incorrectly classified as negative (Table 3).
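These four indicators follow directly from the confusion counts. The following small helper (an illustrative one-vs-rest computation, one motion type treated as positive at a time) mirrors the formulas above.

```python
import numpy as np

def classification_metrics(y_true, y_pred, positive=1):
    """Precision, Accuracy, Specificity and Sensitivity for one class
    treated as positive (one-vs-rest)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_pred == positive) & (y_true == positive)))
    tn = int(np.sum((y_pred != positive) & (y_true != positive)))
    fp = int(np.sum((y_pred == positive) & (y_true != positive)))
    fn = int(np.sum((y_pred != positive) & (y_true == positive)))
    return {
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "specificity": tn / (fp + tn),
        "sensitivity": tp / (tp + fn),
    }

m = classification_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1])
# TP=2, FN=1, TN=2, FP=1 -> all four indicators equal 2/3 in this toy case
```

For the 10-class experiments, such per-class values would be averaged across motion types.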

Table 3 Motion types

Experimental results

In the experiment, the recognition test data covers the 10 dance motion types shown in Fig. 3. The recognition performance results for the 10 types of dance motion are shown in Table 4, where the proposed LDA-GA-SVM algorithm is compared with the K-means-SVM algorithm [27]. As Table 4 shows, the proposed algorithm increases the averages of the Precision and Accuracy indicators by 4.401% and 4.903%, respectively. From the comparisons in Figs. 7 and 8, the Precision and Accuracy of the LDA-GA-SVM algorithm at each test point are higher than those of the K-means-SVM algorithm, and are relatively smooth and stable. That is, the proposed LDA-GA-SVM algorithm shows excellent performance across the 10 motion types. This is because the adopted genetic algorithm has advantages in multi-dimensional space optimization and good global search ability. In addition, the proposed algorithm achieves more balanced results on Specificity and Sensitivity; the mean Specificity and Sensitivity values of the two algorithms are 90.833%, 92.128%, 92.78%, and 94.006%, respectively. The comparison in Fig. 10 shows that the Sensitivity curves of the two algorithms gradually separate over time, and Figs. 9 and 10 show that the index values of the LDA-GA-SVM algorithm are higher than those of the K-means-SVM algorithm, i.e., the LDA-GA-SVM algorithm is more sensitive. This is because the kernel decision LDA feature extraction solves the nonlinear problem of traditional LDA and expands the sample differences, making the performance more stable. In summary, in terms of precision, accuracy, specificity and sensitivity, the proposed LDA-GA-SVM algorithm is superior to the K-means-SVM algorithm and can solve the motion recognition problem in digital performances in a VR environment.

Table 4 Comparison of experimental results of motion recognition (%)
Fig. 7

Comparison of the precision of the two algorithms

Fig. 8

Comparison of the accuracy of the two algorithms

Fig. 9

Comparison of the specificity of the two algorithms

Fig. 10

Comparison of the sensitivity of the two algorithms


Conclusion

In this paper, we combine the kernel decision LDA algorithm with a genetically optimized SVM algorithm to achieve human motion classification and recognition, in order to improve the accuracy of human motion recognition in VR human–computer interaction applications. A kernel function is introduced into LDA for nonlinear projection, mapping the training samples into a high-dimensional subspace to obtain the best classification feature vectors; this effectively solves the nonlinear problem, expands the sample differences, and reduces the dimensionality of the vector space, improving operating efficiency. In addition, a genetic algorithm is used to optimize the parameter search of the SVM. The experimental results verify the effectiveness and advancement of the proposed method. However, the real-time performance of the algorithm in sample training and testing remains to be studied, and the complexity and scalability of the proposed algorithm will be investigated further.

Availability of data and materials

Not applicable.



Abbreviations

VR: virtual reality
LDA: linear discriminant analysis
SVM: support vector machine
GA: genetic algorithm
LDA-GA-SVM: linear discriminant analysis-genetic algorithm-support vector machine algorithm
K-means-SVM: K-means clustering-support vector machine algorithm


References

1. Wu TY, Chen CM, Wang KH, Wu JM (2019) Security analysis and enhancement of a certificateless searchable public key encryption scheme for IIoT environments. IEEE Access 7:49232–49239
2. Chen CM, Wang KH, Yeh KH, Xiang B, Wu TY (2019) Attacks and solutions on a three-party password-based authenticated key exchange protocol for wireless communications. J Ambient Intell Humaniz Comput 10(8):3133–3142
3. Pan JS, Kong L, Sung TW, Tsai PW, Snasel V (2018) alpha-Fraction first strategy for hierarchical wireless sensor networks. J Internet Technol 19(6):1717–1726
4. Xiong H, Qin Z (2015) Revocable and scalable certificateless remote authentication protocol with anonymity for wireless body area networks. IEEE Trans Inf Forensics Secur 10(7):1442–1455
5. Ni L, Tian F, Ni Q, Yan Y, Zhang J (2019) An anonymous entropy based location privacy protection scheme in mobile social networks. EURASIP J Wirel Commun Netw 2019:93
6. Xiong H, Zhao Y, Peng L, Zhang H, Yeh KH (2019) Partially policy-hidden attribute-based broadcast encryption with secure delegation in edge computing. Future Gener Comput Syst 97:453–461
7. Chen CM, Xiang B, Liu Y, Wang KH (2019) A secure authentication protocol for internet of vehicles. IEEE Access 7(1):12047–12057
8. Wu TY, Chen CM, Wang KH, Meng C, Wang EK (2019) A provably secure certificateless public key encryption with keyword search. J Chin Inst Eng 42(1):20–28
9. Pan JS, Lee CY, Sghaier A, Zeghid M, Xie J (2019) Novel systolization of subquadratic space complexity multipliers based on Toeplitz matrix-vector product approach. IEEE Trans Very Large Scale Integr Syst 27(7):1614–1622
10. Gan W, Lin JC, Fournier-Viger P, Chao HC, Philip SY (2019) HUOPM: high-utility occupancy pattern mining. IEEE Trans Cybern
11. Lin JC, Zhang Y, Zhang B, Fournier-Viger P, Djenouri Y (2019) Hiding sensitive itemsets with multiple objective optimization. Soft Comput
12. Lin JC, Fournier-Viger P, Wu L, Gan W, Djenouri Y, Zhang J (2018) PPSF: an open-source privacy-preserving and security mining framework. In: IEEE international conference on data mining workshops (ICDMW), 17–20 Nov 2018, Singapore, pp 1459–1463
13. Lin JC, Yang L, Fournier-Viger P, Hong TP (2019) Mining of skyline patterns by considering both frequent and utility constraints. Eng Appl Artif Intell 77:229–238
14. Wu JMT, Lin JCW, Tamrakar A (2019) High-utility itemset mining with effective pruning strategies. ACM Trans Knowl Discov Data
15. Zhao Z, Li C, Zhang X, Chiclana F, Viedma EH (2019) An incremental method to detect communities in dynamic evolving social networks. Knowl Based Syst 163:404–415
16. Wang X, Ji S, Liang Y, Leung H, Chiu DKW (2019) An unsupervised strategy for defending against multifarious reputation attacks. Appl Intell
17. Wang J, Gu X, Liu W, Sangaiah AK, Kim HJ (2019) An empower hamilton loop based data collection algorithm with mobile agent for WSNs. Hum Centric Comput Inf Sci 9:18
18. Wang J, Gao Y, Wang K, Sangaiah AK, Lim SJ (2019) An affinity propagation based self-adaptive clustering method for wireless sensor networks. Sensors 19(11):2579
19. Austin PD, Siddall PJ (2019) Virtual reality for the treatment of neuropathic pain in people with spinal cord injuries: a scoping review. J Spinal Cord Med
20. Huang C, Zhang Y, Zhu C, Zhang C, Meng H (2019) Chinese sports basketball teaching tactics training system combined with multimedia interactive model and virtual reality technology. Multimed Tools Appl
21. Hagelsteen K, Johansson R, Ekelund M, Bergenfelz A, Anderberg M (2019) Performance and perception of haptic feedback in a laparoscopic 3D virtual reality simulator. Minim Invasive Ther Allied Technol
22. Zhang F, Ding G, Lin Q, Lin X, Li Z, Li L (2018) Research of simulation of creative stage scene based on the 3DGans technology. J Inf Hiding Multimed Signal Process 9(6):1430–1443
23. Merchant Z, Goetz ET, Cifuentes L, Keeney-Kennicutt W, Davis TJ (2014) Effectiveness of virtual reality-based instruction on students' learning outcomes in K-12 and higher education: a meta-analysis. Comput Educ 70:29–40
24. Zhang F, Ding G, Ma L, Zhu Y, Li Z, Xu L (2018) Research on stage creative scene model generation based on series key algorithms. In: Zhao Y, Wu TY, Chang TH, Pan JS, Jain L (eds) Advances in smart vehicular technology, transportation, communication and applications (VTCA), Smart Innovation, Systems and Technologies, vol 128. Springer, pp 170–177
25. Riecke BE, Veen HA, Bülthoff HH (2015) Visual homing is possible without landmarks: a path integration study in virtual reality. Presence Teleoperators Virtual Environ 11(5):443–473
26. Zhang F, Ding G, Lin X, Chen B, Li Z (2018) An effective method for the abnormal monitoring of stage performance based on visual sensor network. Int J Distrib Sens Netw 14(4):1–11
27. Shi X, Liu Y, Zhang D (2015) Human body motion recognition method based on key frames. J Syst Simul 27(10):2401–2408
28. Qin Q, Li Y (2014) Real-time recognition system of human gestures based on DSP. Electron Technol Appl 40(7):75–78
29. Zhang F, Wu TY, Zheng G (2019) Video salient region detection model based on wavelet transform and feature comparison. EURASIP J Image Video Process
30. Zhang R, Cao S (2019) Real-time human motion behavior detection via CNN using mmWave radar. IEEE Sens Lett 3(2):3500104
31. Li Z, Zheng Z, Lin F, Leung H, Li Q (2019) Action recognition from depth sequence using depth motion maps-based local ternary patterns and CNN. Multimed Tools Appl 78(14):19587–19601
32. Murad A, Pyun JY (2017) Deep recurrent neural networks for human activity recognition. Sensors 17(11):2556
33. Li C, Lu Y, Wu J, Zhang Y, Xia Z, Wang T, Yu D, Chen X, Liu P, Guo J (2018) LDA meets Word2Vec: a novel model for academic abstract clustering. In: Companion proceedings of the 2018 web conference (WWW 2018), 23–27 April 2018, Lyon, France. ACM, New York, pp 1699–1706
34. Yu Y, Pan Z, Hu G, Mo X, Xue J (2016) Kernel dimensionality reduction method based on KLDA. J Univ Sci Technol China 9:749–756
35. Zamani B, A A, Nasersharif B (2014) Evolutionary combination of kernels for nonlinear feature transformation. Inf Sci 274:95–107
36. Jindal A, Dua A, Kaur K, Singh M, Kumar N, Mishra S (2016) Decision tree and SVM-based data analytics for theft detection in smart grid. IEEE Trans Ind Inform 12(3):1005–1016
37. Aslahi-Shahri BM, Rahmani R, Chizari M, Maralani A, Eslami M, Golkar MJ, Ebrahimi A (2016) A hybrid method consisting of GA and SVM for intrusion detection system. Neural Comput Appl 27(6):1669–1676
38. Rostami A, Masoudi M, Ghaderi-Ardakani A, Arabloo M, Amani M (2016) Effective thermal conductivity modeling of sandstones: SVM framework analysis. Int J Thermophys 37(6):59
39. Narang S, Best A, Feng A, Kang S, Manocha D, Shapiro A (2017) Motion recognition of self and others on realistic 3D avatars. Comput Anim Virtual Worlds
40. Wu JM, Tsai MH, Huang YZ, Islam SH, Hassan MM, Alelaiwi A, Fortino G (2019) Applying an ensemble convolutional neural network with Savitzky-Golay filter to construct a phonocardiogram prediction model. Appl Soft Comput 78:29–40



Acknowledgements

The authors thank the handling editor for the great support and all reviewers for their careful reviewing and constructive suggestions.

Authors’ information

Fuquan Zhang received the Ph.D. degree in School of Computer Science & Technology, Beijing Institute of Technology, China in 2019. Currently, he is a professor of Minjiang University, China. He has received silver medal of the 6.18 cross strait staff innovation exhibition, gold medal of nineteenth National Invention Exhibition in 2010. In 2012, his proposed project has won the gold award of the seventh international invention exhibition. He was awarded the “top ten inventor of Fuzhou” honorary title by Fuzhou, China. He is now a director of Fujian Artificial Intelligence Society. His research interests include artificial intelligence and computer vision.

Tsu-Yang Wu received the Ph.D. degree in Department of Mathematics, National Changhua University of Education, Taiwan in 2010. Currently, he is an associate professor in College of Computer Science and Engineering, Shandong University of Science and Technology, China. Previously, he was an assistant professor in the Innovative Information Industry Research Center at Shenzhen Graduate School, Harbin Institute of Technology. He serves as executive editor in Journal of Network Intelligence and as associate editor in Data Science and Pattern Recognition. His research interests include artificial intelligence and information security.

Jeng-Shyang Pan received the Ph.D. degree in Electrical Engineering from the University of Edinburgh, U.K., in 1996. Currently, he is the Director of the Fujian Provincial Key Lab of Big Data Mining and Applications, the Dean of the College of Information Science and Engineering, and an Assistant President at Fujian University of Technology, China. He is an IET Fellow, U.K., and was selected for the Thousand Talent Program in China in 2010. His research interests include artificial intelligence, pattern recognition, and computer vision.

Gangyi Ding is a professor and doctoral supervisor. He received the Ph.D. degree from Beijing Institute of Technology, China, in 1993. In December 2008, he became Dean of the School of Software, Beijing Institute of Technology. He has served as a member of the General Technology Department’s Simulation Technology Expert Group, Vice Chairman of the China Computer Simulation Association, Editor of Computer Simulation Magazine, member of the Quality and Reliability Expert Group of the National Defense Science and Technology Commission, a National 863 Information Technology Specialist, and an expert of the Beijing Multimedia Public Service Platform. In 2011, as project leader, he obtained the Ministry of Education’s approval to establish “Digital Performance” as an interdisciplinary discipline. In 2008, he received the titles of Olympic Liberation Model, Beijing Mass Economic and Technological Innovation Model, and the Beijing Education Innovation Model Award from the Beijing Federation of Trade Unions. In 2009, he was named one of the “Top Ten Capital Education News Figures”. In 2010, he was awarded the title of Beijing Advanced Worker. He won the “Support for Contribution Unit Award” and the “Innovation Achievement Award” for the National Day of the Capital.

Zuoyong Li, Ph.D., is a professor, Executive Deputy Director of the Information Processing and Intelligent Control Key Laboratory of Fujian Province, Director of the E-health Research Center of the Internet Innovation Institute of Minjiang College, and Executive Director of the Fujian Artificial Intelligence Society. In July 2010, he received a Ph.D. degree in computer application from Nanjing University of Science and Technology. He is mainly engaged in image processing, pattern recognition, and machine learning. He was selected for the 2013 Outstanding Youth Research Talents Cultivation Program of Fujian Province and the 2015 New Century Excellent Talents Supporting Program of Fujian Province Universities. In 2015, he was selected for the Young Scholar Program of Minjiang College, and he was named a Fuzhou Education System Advanced Worker in 2013 and a Fuzhou Advanced Educator in 2014.


Funding

This work was supported by the Research Program Foundation of Minjiang University under Grants No. MYK17021, MYK18033, MJW201831408, and MJW201833313; by the Major Project of the Sichuan Province Key Laboratory of Digital Media Art under Grant No. 17DMAKL01; and by the Fujian Province Guiding Project under Grant No. 2018H0028. We also acknowledge the support of the National Natural Science Foundation of China (61772254 and 61871204), the Key Project of the College Youth Natural Science Foundation of Fujian Province (JZ160467), the Fujian Provincial Leading Project (2017H0030), the Fuzhou Science and Technology Planning Project (2016-S-116), the Program for New Century Excellent Talents in Fujian Province Universities (NCETFJ), and the Program for Young Scholars at Minjiang University (Mjqn201601).

Author information




FZ and TYW designed the flowchart and the main algorithms and completed the revision work. JSP designed the experimental environment. GD analyzed the previous related works. ZL analyzed the experimental results. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Tsu-Yang Wu.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


About this article


Cite this article

Zhang, F., Wu, T., Pan, J. et al. Human motion recognition based on SVM in VR art media interaction environment. Hum. Cent. Comput. Inf. Sci. 9, 40 (2019).



Keywords
  • Human motion recognition
  • Virtual reality
  • Interactive technology
  • Support vector machine
  • Linear decision