3D face recognition: a survey

3D face recognition has become a trending research direction in both industry and academia. It inherits advantages from traditional 2D face recognition, such as a natural recognition process and a wide range of applications. Moreover, 3D face recognition systems can accurately recognize human faces even under dim light and with varying facial positions and expressions, conditions in which 2D face recognition systems have great difficulty operating. This paper summarizes the history and the most recent progress in the 3D face recognition research domain. Frontier research results are introduced in three categories: pose-invariant recognition, expression-invariant recognition, and occlusion-invariant recognition. To promote future research, this paper collects information about publicly available 3D face databases. It also lists important open problems.

stages that are identical to the stages in the training phase. In the feature matching stage, the features of the target face are compared with those stored in the feature database, and match scores are calculated. When a match score is sufficiently high, the target face is considered recognized.

3D face acquisition
The acquisition of 3D face samples requires special hardware, which can be categorized into active and passive acquisition systems according to the technology used. Active acquisition systems emit non-visible light, e.g. infrared laser beams, to illuminate the target face, and then measure the reflection to determine the shape of the face. According to the illumination method, active acquisition systems can be further categorized as triangulation-based or structured-light-based. As shown in Fig. 2a, the Minolta Vivid scanner is an example of a triangulation-based 3D scanning system. The scanner measures the emitting and receiving angles of the laser beam and then uses triangulation to determine the exact point of reflection. As the laser beam scans across the face, a precise map is formed by calculating and grouping many reflection points. Triangulation-based systems trade scanning speed for precision: the subject would have to hold still for several minutes before a 3D face map could be acquired [7]. This technology is therefore infeasible for 3D video recording.
Fig. 1 A general 3D face recognition system [6]
Fig. 2 Three popular 3D scanners [11]. a Depth image, b point cloud, c mesh
Compared with triangulation-based systems, structured-light-based systems are more popular in consumer-level 3D face acquisition. Figure 2b shows a Microsoft Kinect, which projects a light pattern, such as a light grid, onto the target face. It then measures the deformation of the light pattern to calculate the surface shape. Structured-light-based systems offer much faster measurements than triangulation-based systems. However, structured-light measurements often contain holes and artifacts, so the acquired 3D face data are less precise than triangulation data [8].
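The triangulation step described above can be illustrated with a minimal sketch. It assumes a simplified planar setup that is not taken from any particular scanner: the laser emitter sits at the origin, the sensor sits at a known baseline distance along the x axis, and both angles are interior angles measured from the baseline. The function name and parameters are hypothetical.

```python
import math

def triangulate_point(baseline_m, emit_angle_rad, receive_angle_rad):
    """Locate a laser spot from the two angles measured against the baseline.

    Assumed geometry: emitter at (0, 0), sensor at (baseline_m, 0).
    Both angles are interior angles of the triangle emitter-sensor-spot.
    Returns (x, z): position along the baseline and distance from it.
    """
    third_angle = math.pi - emit_angle_rad - receive_angle_rad
    if third_angle <= 0:
        raise ValueError("rays do not intersect in front of the baseline")
    # Law of sines: the emitter-to-spot side is opposite the sensor angle.
    emitter_range = baseline_m * math.sin(receive_angle_rad) / math.sin(third_angle)
    x = emitter_range * math.cos(emit_angle_rad)
    z = emitter_range * math.sin(emit_angle_rad)
    return x, z
```

Sweeping the emitting angle across the face and repeating this calculation for every reflection yields the point-by-point map mentioned above, which is also why this kind of acquisition is slow.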
Figure 2c shows a Bumblebee XB3, which is a passive acquisition system [9]. It contains several cameras placed apart from each other. The system matches points observed from different cameras and calculates the exact 3D location of each matched point [10]. The set of matched points forms the 3D face. Systems like the Bumblebee XB3 are often called stereo imaging systems. Such systems rely on good visible-light conditions and usually deliver less precise 3D face data than active 3D face acquisition systems.
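As a rough illustration of how a matched point pair becomes a 3D location, the sketch below assumes an idealized, rectified two-camera rig with a pinhole model. The focal length, principal point, and baseline are hypothetical calibration parameters, not values of any specific device.

```python
def stereo_point(x_left_px, x_right_px, y_px, focal_px, baseline_m, cx_px, cy_px):
    """Back-project one matched pixel pair from a rectified stereo rig.

    Assumes both cameras share the focal length focal_px (in pixels) and the
    principal point (cx_px, cy_px), and are separated by baseline_m.
    Returns (x, y, z) in metres in the left camera frame.
    """
    disparity = x_left_px - x_right_px
    if disparity <= 0:
        raise ValueError("a matched point must have positive disparity")
    z = focal_px * baseline_m / disparity  # depth is inversely proportional to disparity
    x = (x_left_px - cx_px) * z / focal_px
    y = (y_px - cy_px) * z / focal_px
    return x, y, z
```

Because depth is inversely proportional to disparity, a small matching error translates into a comparatively large depth error, which is one way to see why such systems tend to be less precise than active scanners.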

Preprocessing
Acquired 3D face data cannot be used directly as input to feature extraction algorithms because the data contain not only the face but also many distracting features such as hair, ears, neck, eyeglasses, and jewelry. It is true that these features can be helpful when we humans identify each other. However, computers are not as intelligent as we are, at least for now. Features like hair, eyeglasses, and jewelry can change from time to time. Ear and neck features are not reliably identifiable across different head poses. These features can mislead current state-of-the-art 3D face recognition algorithms and should therefore be removed before feature extraction.
The first step of preprocessing is to detect the position and orientation of the face. Geometric transformations are then used to "turn" the face so that it directly faces the camera. Next, clearly identifiable facial parts such as the nose are used to isolate the face area from the areas containing the distracting features. This operation is called segmentation.
The preprocessed facial data samples are often represented in three model formats: depth image, point cloud, and mesh, as shown in Fig. 2. Note that the three model formats do not correspond one-to-one to the three popular 3D scanners. They are formats for representing 3D face data.
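A minimal sketch of nose-based segmentation is given below. It rests on two assumptions that are not stated in the text: the scan has already been rotated to a frontal pose with larger z values closer to the camera, so that the nose tip can be approximated by the point with the largest z, and a fixed crop radius (90 mm here, a hypothetical value) is enough to exclude hair, ears, and neck.

```python
import math

def crop_face(points, radius_mm=90.0):
    """Keep only the points within radius_mm of the estimated nose tip.

    points: list of (x, y, z) tuples in millimetres, frontal pose assumed,
    with larger z meaning closer to the camera.
    Returns (face_points, nose_tip).
    """
    # Heuristic: in a frontal scan the nose tip is the point nearest the camera.
    nose_tip = max(points, key=lambda p: p[2])
    face_points = [p for p in points if math.dist(p, nose_tip) <= radius_mm]
    return face_points, nose_tip
```

Real systems usually detect the nose with curvature or learned landmark detectors, since the nearest-point heuristic fails when the head is rotated or when hair or a hand is closer to the sensor.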
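To make the relationship between two of these formats concrete, the following sketch rasterizes a point cloud into a depth image. The grid size and pixel spacing are hypothetical parameters, and the camera is assumed to look along the z axis with larger z closer to the camera.

```python
def point_cloud_to_depth_image(points, width, height, pixel_mm):
    """Rasterise (x, y, z) points into a height x width grid of z values.

    Cells that receive no point stay None (the holes a real scan may contain).
    When several points fall into one cell, the one with the largest z wins.
    """
    min_x = min(p[0] for p in points)
    min_y = min(p[1] for p in points)
    image = [[None] * width for _ in range(height)]
    for x, y, z in points:
        col = int((x - min_x) / pixel_mm)
        row = int((y - min_y) / pixel_mm)
        if 0 <= row < height and 0 <= col < width:
            if image[row][col] is None or z > image[row][col]:
                image[row][col] = z
    return image
```

A mesh adds connectivity on top of the point cloud by listing which points form each triangle, which is why the three formats can be converted into one another regardless of the scanner used.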

Feature extraction, feature database, and feature matching
The most straightforward school of feature extraction takes the entire face as a single feature vector, which is called the global approach [12]. In this approach, the entire face is stored in the database. In the feature matching stage, the target face is compared with the faces in the database using statistical classification functions [9]. In contrast to the global approach, the component-based approach focuses on local facial characteristics such as the nose and eyes. It uses graph operators to extract the nose and eye regions and stores these local features in the database. When a target face is input for recognition, the component-based approach first extracts the corresponding parts from the target face and then searches the feature database for the matching set of parts [13]. There are also hybrid approaches that combine the features used by the global and the local approaches. At a higher computational cost, the hybrid approach can achieve better recognition accuracy [14].
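The matching stage of the global approach can be sketched as a nearest-neighbour search over stored feature vectors. Cosine similarity and the 0.9 threshold are assumptions chosen for illustration; the surveyed systems use a variety of statistical classifiers.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def identify(probe, gallery, threshold=0.9):
    """Return (identity, score) of the best gallery match.

    gallery: dict mapping identity -> feature vector (the feature database).
    The identity is None when the best score is below the threshold.
    """
    best_id, best_score = None, -1.0
    for identity, feature in gallery.items():
        score = cosine_similarity(probe, feature)
        if score > best_score:
            best_id, best_score = identity, score
    return (best_id if best_score >= threshold else None), best_score
```

A component-based system would run the same search once per facial part and then combine the per-part scores, which is where the extra computational cost of hybrid approaches comes from.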

Methodology
In a 3D face recognition system, the selection of feature extraction and matching methods is very important. Both global and local approaches have been extensively investigated in the literature and are summarized in Table 1.

Performance metrics
This paper uses the following performance measures for 3D face recognition tasks.
The notation used for evaluation is as follows:
• TP: the number of positive samples correctly predicted as positive.
• FN: the number of positive samples incorrectly predicted as negative.
• FP: the number of negative samples incorrectly predicted as positive.
• TN: the number of negative samples correctly predicted as negative.
Here, True and False indicate whether the classification is correct or wrong, and Positive and Negative indicate the predicted class.
The metrics are calculated as follows. Accuracy is the ratio of the number of correctly classified samples to the total number of samples in a given test set: Accuracy = (TP + TN) / (TP + FN + FP + TN). It reflects the ability of the classifier to judge the entire sample set, that is, to correctly identify both positive and negative samples.
Error rate is the complement of accuracy: Error rate = 1 − Accuracy.
Precision is the proportion of true positive samples among the samples judged as positive by the classifier: Precision = TP / (TP + FP). Recall is the proportion of positive samples that the classifier correctly judges as positive: Recall = TP / (TP + FN).
The Fβ-score is the weighted harmonic mean of precision and recall: Fβ = (1 + β²) · Precision · Recall / (β² · Precision + Recall).
The value of β (β > 0) reflects the relative importance of precision and recall in performance evaluation. When β = 1, the commonly used F1 value (F-measure) treats precision and recall as equally important.
F1 is also known as the balanced F-score. When both precision and recall are high, the value of F1 is also high.
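For reference, the measures defined in this section can be computed from the four counts as in the sketch below. It uses the standard definitions of accuracy, precision, recall, and Fβ; the function name and the convention of returning 0.0 for an undefined ratio are choices made here for illustration.

```python
def classification_metrics(tp, fn, fp, tn, beta=1.0):
    """Compute accuracy, error rate, precision, recall, and F-beta from counts."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    b2 = beta * beta
    denom = b2 * precision + recall
    # Weighted harmonic mean of precision and recall.
    f_beta = (1 + b2) * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "precision": precision,
        "recall": recall,
        "f_beta": f_beta,
    }
```

With β = 1 the last value is the F1 score; choosing β > 1 weights recall more heavily, and β < 1 weights precision more heavily.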
The rest of this paper is organized as follows. The "History of face recognition research" section introduces significant research results of 3D face recognition in chronological order, which helps establish a bird's-eye view of this research area. The "Domain research problems" section analyzes current research and summarizes it into domain research problems. The "Research on 3D face databases" section collects up-to-date information about public 3D face databases, which could facilitate future research. The "Research on pose-invariant 3D face recognition" section reviews the technologies that mitigate the pose variation problem in 3D face recognition. The "Research on expression-invariant 3D face recognition" section surveys the technologies that use 3D face information to accurately recognize human faces under different expressions such as laughing or crying. The "Research on occlusion-invariant 3D face recognition" section reviews the 3D face recognition technologies that work when the target faces are partially blocked. The "Open problems and perspectives" section suggests significant problems that are still waiting to be solved in the 3D face recognition area. The "Conclusions and discussion" section concludes this paper.

History of face recognition research
Research in face recognition dates back to the 1960s [15]. From 1964 to 1966, Woodrow W. Bledsoe, along with Helen Chan and Charles Bisson of Panoramic Research, worked on programming computers to recognize human faces. Their program asked the operator to locate the eyes, ears, nose, and mouth in the photo. Distances and other measures computed from these points were then compared with reference data. However, because of its inconvenience, this work did not receive much recognition. Peter Hart at the Stanford Research Institute continued this research and obtained promising results when using a set of images instead of a set of feature points. Since then, many studies have followed on this subject, and a substantial amount of effort has been made to find the optimal face recognition method. In the 1970s, Goldstein, Harmon, and Lesk used 21 specific subjective markers, such as hair color and lip thickness, to automatically identify human faces. The attempt obtained good recognition accuracy. However, the measurements and locations of the features were determined manually, which makes it impractical to apply this method to many faces. In 1991, Turk and Pentland proposed a method that uses principal component analysis (PCA) to handle face data [16]. This method, called the eigenface algorithm, has become a gold standard for face recognition. Later, inspired by eigenface, a large number of similar algorithms were proposed [17][18][19].
In 1997, Christoph von der Malsburg designed a system that can identify people in photos even when the photos are not clear [20]. Following this work, face recognition research diverged into two paths. Face recognition using 3D views was proposed and implemented in systems such as Polar and FaceIt [21].
Although 2D face recognition has achieved considerable success, its accuracy is still significantly affected by changes in pose and illumination conditions [14,22]. Many researchers have turned to 3D face recognition because of its potential to overcome the inherent limitations and drawbacks of 2D face recognition. Moreover, the geometric information provided by 3D face data may result in higher recognition accuracy than the 2D case when the pose and illumination conditions are the same [3,4].
In the late 1980s, [23] used curvature-based methods on a small 3D face database and reached 100% recognition accuracy. In 1996, Gordon's face recognition experiments showed that combining frontal and side views can improve the recognition accuracy [24]. After that, more and more 3D face recognition research was proposed because of the increasing availability of 3D scanning equipment (mainly based on laser and structured light technology).
In 2012, deep learning was first used to analyze and process three-dimensional face images for face recognition [25]. Compared with traditional methods, Deep Convolutional Neural Networks (DCNN) have a great advantage in processing images and video, whereas Recurrent Neural Networks (RNN) perform very well on sequential data such as voice and text [26]. By using deep learning on large-scale face datasets, the recognition accuracy of 2D face recognition has been significantly improved [27]. Deep learning needs large datasets to learn face features and to capture the rich internal information of the data. Large-scale 2D face datasets can be obtained from the Internet. Compared with the 2D case, training discriminative deep features for 3D face recognition is very difficult due to the lack of large-scale 3D face datasets [27]. To solve this problem, Kim et al. [27] proposed taking an existing model trained on 2D faces and adapting it to 3D surface matching with a small amount of 3D face data. Also, [28] proposed a method for generating a large corpus of labeled 3D face identities and their multiple instances for training, and a protocol for merging the most challenging existing 3D datasets for testing. They also proposed the first deep CNN model designed specifically for 3D face recognition, trained on 3.1 million 3D facial scans of 100,000 identities. The proposed training and test datasets are several orders of magnitude larger than the 3D datasets previously reported in the literature. Based on these 3D datasets, the FR3DNet algorithm was proposed and achieved high accuracy in closed-world and open-world recognition scenarios [28].
In [14], many identification techniques were surveyed. Face recognition can be divided into three categories based on feature extraction methods used in the identification process: global approach, component-based approach and hybrid approach. In the global approach, the entire face is used as a single feature vector for feature classification. The component-based approach mainly analyzes the local facial features such as nose and eyes. The hybrid approach uses both global and local features. The hybrid approach is very effective when the face is frontal and the expression does not change.

Domain research problems
Compared with other popular biometric identification technologies such as fingerprint, iris, and retina based recognition, face recognition can identify a person at a greater distance. Therefore, it can be applied to various application scenarios such as crowd monitoring and border control. In many of these application scenarios, 2D face images cannot be accurately recognized due to variations in facial expression, head pose, occlusion, and other factors. Any of these adverse factors could lead to a sharp decrease in recognition efficiency [29].
In 1999, Blanz and Vetter proposed the 3D morphable model (3DMM) synthesis technique and then used this model for 3D face recognition [30]. However, due to the technical limits of 3D scanning technology at the time, their 3D model was reconstructed from 2D images, and reconstructing the 3D model takes a large amount of computation. Many researchers agree that 3DMM plays an important role in face recognition, but the computational complexity of the reconstruction process hinders its applicability [14,31,32,33]. In 2003, Blanz and Vetter proposed combining 3DMM with 2D image matching technology in order to recognize faces with various head orientations [34]. Unlike [30], their algorithm automatically estimates all 3D scene parameters, including the position and orientation of the head. Through this new initialization process, the robustness and reliability of the face recognition system are significantly improved. It is noteworthy that a 3D facial model synthesized from 2D images is a compromise for when fast 3D scanning technology is not available. Once 3D face data could be scanned directly, models like 3DMM were no longer an active research topic.
In 2003, Wu et al. [35] proposed 3D face recognition by extracting multiple horizontal profiles from the facial range data. One pitfall of this method is that the recognition accuracy decreases significantly when the head pose changes. In [1], Zhang compared the methods and algorithms for 3D face recognition under pose variations and tested the maximum angle that can be recognized when the pose changes. For example, when the face is registered from the front and the face model is extracted using the LBP algorithm in [29], an acceptable recognition accuracy can be retained up to a maximal face rotation of 60°. Our paper also compares the influence of 2D images and 3D models on recognition performance under changes in head pose. Experiments have shown that 3D models are more tolerant to pose changes than 2D models. We summarize this type of research in the "Research on pose-invariant 3D face recognition" section.
Chua et al. [36] use point signatures in 3D facial recognition. To deal with changes in facial expressions, only the rigid part of the face (below the forehead and above the nose) is used. The point signature is also used to locate the reference point in the standardized face model. The images used in the experiment were obtained from different expressions of 6 subjects, and the recognition rate was 100%. The principal component analysis (PCA) method explored by Hesher et al. [37] uses different numbers of feature vectors and image sizes. The image data set used has 37 subjects, each with 6 different facial expressions. Using multiple images in the gallery improves the recognition accuracy [38]. Moreno et al. [39] segment the 3D face model using Gaussian curvature and then create a feature vector based on the segmented regions for recognition. This method achieved 78% recognition accuracy on a dataset of 420 faces from 60 people with different facial expressions. Our paper summarizes this type of research in the "Research on expression-invariant 3D face recognition" section.
When the face is partially blocked, the recognition accuracy suffers. In [40,41], Martinez et al. divided the face model into small areas and proposed a probabilistic approach to match each area locally. The matching results are then combined for face recognition. Colombo and Cusano [42] propose recovering the blocked part algorithmically and then using the recovered face data in recognition. This method is also useful when people wear objects on their face such as a scarf, a hat, or eyeglasses. Our paper summarizes this type of research in the "Research on occlusion-invariant 3D face recognition" section.
In this paper, we review the latest solutions and results from the three classes of face recognition research introduced above. Because all of this research is based on 3D face datasets, the following sections first summarize the currently available public 3D face databases, including the data type of each database, the number of people scanned, the number of scans collected, and the variations in pose, expression, and occlusion.

Research on 3D face databases
There are many large-scale 2D face databases in the world. These databases provide a common platform to evaluate and compare 2D face recognition algorithms. 3D face databases are less common and smaller in scale. Before 2004, there were few publicly available 3D face databases. In recent years, many research institutes have established different kinds of 3D face databases to test and evaluate their own methods for 3D face recognition. Table 1 lists some of the published 3D databases and compares their data formats, the number of faces, the number of models, and the types of scanning devices. Tables 2, 3 and 4 show the 3D databases constructed specifically for recognition algorithms that adapt to expression variation, pose variation, and occlusion variation.
The FRGC [43] database (as shown in Fig. 3c) has had a tremendous influence on the development of 3D face recognition algorithms. It is widely accepted as a standard reference database for evaluating the performance of 3D face recognition algorithms.
The database in [44] (as shown in Fig. 3a) contains 2500 3D scans of 100 individuals acquired using the stereo photography technique. This database contains 6 types of expressions: anger, happiness, sadness, surprise, disgust, and fear. Each type of expression is further tagged with four different levels.
As shown in Fig. 3b, the Bosphorus [45] database contains 3D face images with variations in expression, head pose, and different types of occlusion. This database consists of 4666 3D scans of 105 individuals and was acquired using an Inspeck Mega Capturor II 3D scanner.
As shown in Fig. 3e, the ND-2006 dataset [46] was the largest 3D face dataset at the time of publication, and it is a superset of FRGC v2.0. It contains 13,450 3D scans with 6 different expression tags (neutral, happy, sad, surprised, disgusted, etc.) and was acquired using a Minolta Vivid 910 range scanner. A total of 888 different people were scanned, each of them multiple times. The most frequently scanned person appears 63 times in the database.
The Texas 3D Face Recognition Database (Texas 3DFRD) [47], shown in Fig. 3d, is a set of 1149 pairs of face texture images and 3D scans acquired using the MU-2 stereo imaging system. The database includes 105 adult subjects.
BJUT-3D is a large 3D face dataset of Chinese subjects [44] (as shown in Fig. 3f). It includes 500 Chinese people, 250 women and 250 men. The high-resolution 3D facial data were scanned using a CyberWare 3030 RGB/PS laser scanner.
As shown in Fig. 3g, the CASIA dataset [48] was collected in 2004 using a non-contact 3D digitizer, the Minolta Vivid 910 range scanner, and contains 4624 scans of 123 people. The data set considers not only single variations in pose, expression, and lighting, but also expression changes under the same lighting and pose changes under the same expression.
The 3D Twins Expression Challenge (3D-TEC) data set [49] contains 3D facial scans of 107 pairs of twins, that is, 214 people, each with a smiling and a neutral expression, for a total of 428 scans. Although this data set is ten times smaller than the FRGC v2.0 data set, it is still very representative because it includes twins with different expressions. This database will help promote the development of 3D face recognition technology.
In contrast to 2D face images, 3D models contain geometric information and are insensitive to pose and lighting changes [50,51]. There are two kinds of techniques for acquiring 3D face models: active acquisition and passive acquisition. Examples of active acquisition technologies include triangulation and structured light. The most typical passive acquisition system is a stereo camera [9]. Active acquisition systems such as the Minolta Vivid scanners (shown in Fig. 4a) use triangulation technology. The scanner projects laser light onto the face and then uses a camera to record the image of the light spot. Once the center pixel of the spot is calculated, the position of the laser spot is determined from the triangle formed by the laser spot, the camera, and the laser emitter. The effective range of the triangulation technique can be a few meters, with an accuracy of several millimeters. However, the triangulation process can be time-consuming because the scanner has to reconstruct the 3D face model point by point. With structured light technology, such as in the Microsoft Kinect (shown in Fig. 4b), the scanner projects a pattern onto the face surface, and a camera then captures the pattern as deformed by the face contour. The shape of the face is calculated from the deformation of the pattern. Structured light can acquire 3D face data in real time, but the acquired data may contain a large number of holes and artifacts. In a typical passive acquisition system, such as the Bumblebee XB3 (shown in Fig. 4c), the scanner uses two (or more) cameras to take pictures of the face from different angles. The system uses algorithms to match feature points across the pictures and then calculates the exact position of the feature points by triangulation. Multiple feature points are calculated simultaneously and then used to reconstruct the 3D face model [10,52].
The main pitfall of the stereoscopic system is the relatively low resolution of the reconstructed 3D face scans.
3D face recognition algorithms have different performances on different 3D face databases. Many methods are implemented on a specific 3D face database, and performance on other databases may vary.

Research on pose-invariant 3D face recognition
As shown in Fig. 5, changes in head pose can substantially affect the accuracy of 3D face recognition. Many 3D face recognition systems rely on a frontal face model. Once the head is not upright or the face is rotated away from the front-facing pose, the system has difficulty matching the face scan with the stored face models.
As early as 2003, Song et al. [54] proposed a 3D face recognition method that tolerates large head deflection. The method relies on the geometric information of feature points on the face to "adjust" the head pose in the scanned image. Figure 6 briefly shows the extraction of facial feature points, the determination of the head position, and the recognition process. First, the maximum and minimum curvature points are automatically extracted using the geometric information of the face. These points consist of the bump points and the nasal peak point (NPP). To find the exact position and deflection angle of the head from the input 3D head image, they proposed the Error Compensated SVD (EC-SVD) algorithm, which minimizes the least-squares error and then compensates for the remaining error in the established 3D normalized space. For each axis, the pose is refined starting from the angle acquired by the SVD method, thereby restoring the face model to the frontal pose.
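The SVD step that such pose normalization builds on can be sketched with the standard least-squares rigid alignment of two landmark sets (often called the Kabsch method). This is not the EC-SVD algorithm of [54]: the error compensation stage is omitted, and the function name and the use of NumPy are assumptions made for illustration.

```python
import numpy as np

def estimate_rigid_transform(source, target):
    """Least-squares rotation R and translation t with R @ source_i + t ~ target_i.

    source, target: (N, 3) arrays of corresponding landmarks, e.g. feature
    points on a rotated scan and the same points on a frontal reference.
    """
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    src_centroid = source.mean(axis=0)
    tgt_centroid = target.mean(axis=0)
    # Cross-covariance of the centred landmark sets.
    h = (source - src_centroid).T @ (target - tgt_centroid)
    u, _, vt = np.linalg.svd(h)
    # Guard against a reflection (determinant of -1).
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    translation = tgt_centroid - rotation @ src_centroid
    return rotation, translation
```

Applying the returned rotation and translation to every point of the scan brings it into the frontal reference frame; any further per-axis refinement would start from this estimate.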
Passalis et al. [55] proposed a method that uses facial symmetry to resolve the pose variation problem. This method uses wavelet-based biometric signatures, which are also used in the landmark detection algorithm proposed in [56]. The signatures allow a matching that exploits facial symmetry to compensate for the pose variation (as shown in Fig. 7). Experiments show that this method is suitable for practical scenarios because it requires no manual intervention and the whole process is fully automatic. Moreover, this method is good at handling extreme pose changes, such as a head rotation of nearly 90° that leaves only one side of the face visible.
Perakis et al. [56] proposed an algorithm to handle internal occlusion. The algorithm is based on the annotated face model (AFM). The geometry created by the AFM remains invariant when data are missing. Therefore, this method deals with the incomplete data caused by pose changes. Verification experiments were conducted on the FRGC v2.0 and UND databases. The UND45LR set contains scans in which each person turns the head 45° away from the frontal orientation. For each person, the left pose scan belongs to the training set and the right pose scan belongs to the testing set. Similarly, UND60LR denotes a collection of side scans with a 60° pose.
Fig. 6 Interpose matching using the proposed method (left to right) [55]
In [13], a new 3D surface representation method, namely the multi-scale local binary pattern (MS-LBP) depth map, is proposed. This method is used in conjunction with the shape index (SI) map to increase the distinctiveness of smooth range surfaces. The Scale Invariant Feature Transform (SIFT) is introduced to extract local features and enhance robustness to pose variations. The rank-one recognition rate achieved on the FRGC v2.0 database is 96.1%. Since local facial features are used, this method has been shown to be capable of handling partially occluded facial probes.
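The idea behind a local binary pattern on a depth map can be sketched as follows. This is the basic 8-neighbour operator on a square neighbourhood, with the radius parameter standing in for scale; it is a simplified illustration and not the exact MS-LBP formulation of [13], which may use circular sampling and a different bit ordering.

```python
def lbp_code(depth, row, col, radius=1):
    """8-neighbour local binary pattern code of one pixel in a 2D depth grid."""
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    center = depth[row][col]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        neighbour = depth[row + dr * radius][col + dc * radius]
        # Set the bit when the neighbour is at least as deep as the centre.
        if neighbour >= center:
            code |= 1 << bit
    return code

def lbp_map(depth, radius=1):
    """LBP codes for every pixel that has a full neighbourhood at this radius."""
    rows, cols = len(depth), len(depth[0])
    return [
        [lbp_code(depth, r, c, radius) for c in range(radius, cols - radius)]
        for r in range(radius, rows - radius)
    ]
```

Computing `lbp_map` for several radii gives one code map per scale; because each code depends only on depth differences inside a small neighbourhood, descriptors built from these maps remain usable when other parts of the face are missing.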
Berretti et al. [57] use Scale Invariant Feature Transform (SIFT) key point detection to locate feature points in the depth image and find the facial curves that connect these key points. The authors use the 45° and 60° side scans in the UND database to test their proposal. Since the same organization collected the UND and FRGC v2.0 databases, they found 39 identical faces between the 45° side scans of UND and the frontal scans of FRGC v2.0, and 33 identical faces between the 60° side scans of UND and the frontal scans of FRGC v2.0. Matching is achieved using the curvature information at the landmarks.
In [58], face models are represented by radial curves. To overcome the missing data problem caused by pose variation, the authors used a statistical model in the shape space of the radial curves. This method works well for recognition and reaches 98.36% recognition accuracy for faces looking downwards at 35°. However, the recognition rate drops to 70.49% for right side scans and to 86.89% for left side scans. In addition, a limitation of this method is that it requires manual annotation of the nose tip in side scans.
Mahmood et al. [59] proposed a matching method that uses nose region extraction to cope with large yaw changes (approximately 60° about the yaw axis). To realign the face to the frontal orientation, a pre-defined and pre-trained nose model is used. Face surfaces are represented by local shape descriptors. The effectiveness of this method was evaluated on the GavabDB 3D facial database, which includes both frontal and partially frontal facial scans. Using this method, the recognition accuracies for frontal and partially frontal facial scans are 94% and 90%, respectively.
[Figure: The block diagram of the proposed method [54]]
Ding et al. [60] proposed the PBPR face representation scheme, which is based on the unobstructed facial texture. PBPR can be applied to face images of arbitrary pose, which is its main advantage over other methods. They also proposed the MtFTL model for learning compact feature transformations between poses.

Research on expression-invariant 3D face recognition
Human faces undergo local non-rigid deformation when the expression changes, which reduces the similarity between the scanned face and the trained face models and thereby reduces the accuracy of 3D face recognition algorithms [68]. Figure 8 shows the facial shapes of five typical expressions in 2D and 3D: the neutral expression, happiness, sadness, surprise, and disgust.
Fig. 8 Different types of expressions gathered for subject 04514 and their associated texture and 3D images [69]
3D face recognition methods that can handle expression changes generally fall into two categories: rigid [4,69,70] and non-rigid [71][72][73]. The rigid methods treat the human face as a rigid object. Such methods were popular in the early days. The main idea is that when the facial expression changes, some facial regions always remain unchanged or change only slightly. These regions are considered the rigid areas. The features of the rigid areas are extracted and used in face recognition [74]. The most commonly used rigid areas are the nose, the eyes, and the area near the forehead. Queirolo and Silva [75] use the round area around the nose, the elliptical area around the nose, the face area above the nose, and the entire face area for matching. The combined score of the four parts is used to calculate the similarity between two 3D face images. A modified Simulated Annealing Algorithm (SAA) is then used to find the optimal value of the score. This method was tested on the FRGC v2.0 database and achieved a recognition accuracy of 98.4%. Bornak and Rafiei [76] use the nose area for 3D face recognition. The authors proposed first searching for the nasal area in the center of the image and then extracting the outline of the diagonal area of the nasal area as a feature. Erdogmus et al. [77] proposed another local feature based method. They divided the face into several parts and then calculated the similarity of the corresponding parts between two 3D face images. The conditional density is used to transform the face recognition problem into a probability optimization problem. Miao and Krim [78] use nearest-point alignment and the level set method to search for a region in which two face images match each other, and then use the size of the matching area as the similarity between the two face images.
The method based on the rigid area is relatively simple and easy to implement. However, this type of method discards areas affected by expressions and does not use all the information contained in the 3D face data.
The non-rigid method applies deformation recovery algorithms to the 3D facial scan to counteract the distortion caused by expression variations. Although good recognition methods can be found in both categories, the non-rigid method is more capable of handling expression variations and can extract richer facial information [73]. Non-rigid recognition algorithms are divided into two categories: local methods and holistic methods.

Local feature based expression-invariant approaches
To the best of our knowledge, the first review of 3D face recognition systems based on local processing was composed by Chang et al. [4].
Samir et al. proposed a method of comparing facial shapes by the surface curvature [79]. The basic idea is to approximately represent a facial surface with a limited number of level curves, which are extracted from the depth image. The metric used to compare the facial curves is described in [80]. Experimental results show that this method is robust to various expressions.
Kin-Chung et al. proposed a 3D face recognition system that combines linear discriminant analysis (LDA) and a linear support vector machine (LSVM) [81]. The method obtains summation invariants by capturing local characteristics from multiple regions. Ten sub-regions and the corresponding feature vectors are extracted from the frontal face image, and the summation invariants are computed using the moving frame technique [82]. Fusing LDA and LSVM with a linear optimal fusion rule provides better performance. The reported performance decreases as the expression intensity increases.
Faltemier et al. used the 28 best-performing facial sub-regions for 3D face recognition [83]. A total of 38 sub-regions, some of them overlapping, were extracted from each face image. Using an ICP algorithm, each sub-region of the probe image is matched against the corresponding region of a gallery image. The highest Rank-one recognition rate achieved by single-region matching was 90.2%, which motivated the use of fusion strategies. A modified Borda count fusion method yields an overall Rank-one recognition rate of 97.2%. Although the facial information is incomplete in some images of the FRGC v2.0 database, this algorithm still performs well.
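To make the idea of rank-level fusion concrete, the following minimal sketch fuses the rankings produced by several regional matchers with a plain Borda count. It is not the modified Borda count of [83], whose details are not given here; the function name `borda_count_fusion`, the gallery identities and the region names in the comments are hypothetical and serve only as an illustration.

```python
from collections import defaultdict

def borda_count_fusion(region_rankings):
    """Fuse per-region rankings of gallery identities with a plain Borda count.

    region_rankings: list of lists; each inner list orders gallery IDs from
    best match to worst match for one facial sub-region.
    Returns the gallery IDs sorted by total Borda points (highest first).
    """
    points = defaultdict(int)
    for ranking in region_rankings:
        n = len(ranking)
        for position, gallery_id in enumerate(ranking):
            # The best match of a region gets n - 1 points, the worst gets 0.
            points[gallery_id] += n - 1 - position
    # Sort by points (descending); ties are broken by ID to stay deterministic.
    return sorted(points, key=lambda g: (-points[g], g))

# Hypothetical rankings from three regional matchers (assumed data).
rankings = [
    ["anna", "ben", "cara"],   # e.g. a nose region
    ["ben", "anna", "cara"],   # e.g. a forehead region
    ["anna", "cara", "ben"],   # e.g. an eye region
]
fused = borda_count_fusion(rankings)
print(fused[0])  # identity with the most Borda points
```

A single region may rank the wrong identity first (the second matcher above), while the fused ranking is driven by the agreement of all regions, which is the motivation for fusion given in the text.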
In [84], a mesh-based 3D face recognition method is proposed and evaluated on the Bosphorus database. Salient points are detected from the maximum and minimum curvatures, surface micro-components are extracted from the local neighborhoods of these salient points, and the final matching score is obtained by combining the results of the two types of salient points. The experimental results on the Bosphorus dataset highlight the effectiveness of the method and its robustness to facial expression variations.
In [85], the meshSIFT algorithm and its application to 3D face recognition are proposed. Salient points are detected as extrema in a scale space, and an orientation is assigned to each of them according to the surface normals in a scale-dependent local neighborhood. The neighborhood of each salient point is then described by a feature vector consisting of concatenated histograms of slant angles and shape indices. Because the descriptor is computed from a local area, it is largely unaffected by expression changes. The number of matching features is used as the similarity measure, which allows expression-invariant 3D face recognition. By exploiting the left-right symmetry of the face to expand the set of feature descriptors, matching features can be found even between scans that do not overlap.
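The shape index used in such local descriptors can be computed from the two principal curvatures of a surface point. The sketch below uses one common convention that maps the index to [0, 1]; which end of the range corresponds to convex or concave shapes depends on the orientation of the surface normals. The function names, the number of bins and the curvature values are assumptions for illustration, and the code does not reproduce the full meshSIFT descriptor of [85].

```python
import math

def shape_index(k1, k2):
    """Shape index in [0, 1] computed from two principal curvatures.

    Uses S = 1/2 - (1/pi) * atan((k1 + k2) / (k1 - k2)) with k1 >= k2.
    Returns None for planar points (k1 == k2 == 0), where S is undefined.
    """
    k1, k2 = max(k1, k2), min(k1, k2)
    if k1 == 0 and k2 == 0:
        return None
    # atan2 covers the umbilic case k1 == k2 without dividing by zero.
    return 0.5 - math.atan2(k1 + k2, k1 - k2) / math.pi

def shape_index_histogram(curvature_pairs, bins=8):
    """Histogram of shape indices over the points of a local neighborhood."""
    hist = [0] * bins
    for k1, k2 in curvature_pairs:
        s = shape_index(k1, k2)
        if s is None:
            continue  # planar points carry no shape index
        hist[min(int(s * bins), bins - 1)] += 1
    return hist

# Hypothetical principal curvatures of five neighboring points (assumed data).
neighborhood = [(1.0, -1.0), (1.0, 0.0), (0.0, -1.0), (2.0, 2.0), (0.0, 0.0)]
print(shape_index(1.0, -1.0))               # symmetric saddle
print(shape_index_histogram(neighborhood))  # the planar point is skipped
```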
Berretti and Werghi [86] propose a 3D face recognition method based on the mesh-DoG keypoint detector and local GH descriptors, together with an original solution to improve the stability of the keypoints and to select the most effective features from the local descriptors. Experiments were conducted to evaluate the effectiveness of the proposed optimizations for stable keypoint detection and feature selection. The recognition accuracy was evaluated on the Bosphorus database, and the results are competitive with existing 3D face recognition solutions based on 3D keypoints.
Tang et al. proposed a local binary pattern (LBP) based method built on a 3D facial segmentation scheme [87]. The face surface is divided into 29 sparse areas and 59 dense areas. They used a nearest-neighbor classifier to perform 3D face recognition (Table 5).
In [88], a 3D face recognition system is built on a new facial feature, the Angular Radial Signature (ARS), which is extracted from the semi-rigid region of the face. Kernel principal component analysis (KPCA) is then applied to the ARSs to extract mid-level features with improved discriminating ability. The mid-level features are concatenated into a single feature vector, which is input into a Support Vector Machine (SVM) to perform face recognition. The method handles facial expression changes across the face scans of different individuals. Extensive experiments on the FRGC v2.0 and SHREC 2008 data sets show excellent recognition performance. Regional Bounding SpheRe descriptors (RBSR) have also been used for effective feature extraction on 3D facial surfaces [89].
In [90], which targets real-life biometric applications of 3D face recognition, a new mesh SIFT-like algorithm for registration-free 3D face recognition under expression variations, occlusion, and pose changes is proposed. It relies on a principal curvature-based 3D keypoint detection algorithm, which can repeatably identify complementary locations with high local curvature on a facial scan.
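As an illustration of how local binary patterns can be applied to depth data, the sketch below computes the basic 8-neighbor LBP code on a small depth grid, collects the codes in a histogram, and classifies a probe with a nearest-neighbor rule. It is a generic LBP example under assumed inputs, not the region scheme of [87]; the function names, the L1 distance and the toy depth patches are hypothetical.

```python
def lbp_code(depth, r, c):
    """Basic 8-neighbor LBP code of pixel (r, c) in a 2D depth grid."""
    center = depth[r][c]
    # Neighbors are visited clockwise, starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if depth[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code

def lbp_histogram(depth):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(depth) - 1):
        for c in range(1, len(depth[0]) - 1):
            hist[lbp_code(depth, r, c)] += 1
    return hist

def nearest_neighbor(probe_hist, gallery):
    """Label of the gallery histogram with the smallest L1 distance."""
    def l1(label):
        return sum(abs(a - b) for a, b in zip(probe_hist, gallery[label]))
    return min(gallery, key=l1)

# Two toy 3x3 depth patches (assumed data).
flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
bumpy = [[1, 2, 3], [8, 5, 4], [7, 6, 9]]
gallery = {"flat": lbp_histogram(flat), "bumpy": lbp_histogram(bumpy)}
print(lbp_code(bumpy, 1, 1), nearest_neighbor(lbp_histogram(bumpy), gallery))
```

In a region-based system, one histogram per facial area would typically be computed and the histograms concatenated before matching.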
Different region based approaches reported so far are summarized in Table 6.

Holistic approaches
Table 7 lists the main holistic approaches based on deformation models and the databases on which they were evaluated. Isometric deformation models belong to the holistic methods. Among them, [96] used the fast marching method to calculate the geodesic distances on the face surface, built a geodesic distance matrix (GDM), and then applied singular value decomposition (SVD) to the GDM to obtain the k largest singular values as the shape descriptor of the human face. Miao [97] calculated a set of iso-geodesic curves on a 3D face surface, and then calculated the evolution vectors between each pair of adjacent iso-geodesic curves. Because the evolution vector is easily affected by deformation in Euclidean space and requires precise face alignment, the author also uses the evolution angle function (EAF) to normalize the evolution vector into a one-dimensional function. In this way, the comparison of two 3D faces is converted into a comparison of two EAF curves. Feng and Krim [98] used 20 iso-geodesic curves centered at the tip of the nose to represent the human face, cut each curve into arcs of equal length, and mapped the arcs to the Euclidean integral invariant space as face features for classification and recognition. This method was tested on the FRGC v2.0 database with a recognition accuracy of 95%.
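The geodesic distance matrix descriptor described above for [96] can be sketched as follows. As a simplifying assumption, geodesic distances are approximated here by shortest paths along mesh edges (Dijkstra's algorithm) rather than by the fast marching method; the function names and the toy four-vertex "surface" are hypothetical. The example shows that bending the surface without stretching it leaves the descriptor unchanged.

```python
import heapq
import numpy as np

def geodesic_distance_matrix(vertices, edges):
    """Approximate pairwise geodesic distances by shortest paths along edges.

    The mesh graph is assumed to be connected; otherwise the matrix would
    contain infinite entries and the SVD below would fail.
    """
    n = len(vertices)
    adj = [[] for _ in range(n)]
    for i, j in edges:
        w = float(np.linalg.norm(vertices[i] - vertices[j]))
        adj[i].append((j, w))
        adj[j].append((i, w))
    dist = np.full((n, n), np.inf)
    for src in range(n):
        dist[src, src] = 0.0
        heap = [(0.0, src)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[src, u]:
                continue  # stale heap entry
            for v, w in adj[u]:
                if d + w < dist[src, v]:
                    dist[src, v] = d + w
                    heapq.heappush(heap, (d + w, v))
    return dist

def gdm_descriptor(vertices, edges, k):
    """The k largest singular values of the geodesic distance matrix."""
    gdm = geodesic_distance_matrix(vertices, edges)
    return np.linalg.svd(gdm, compute_uv=False)[:k]  # sorted descending

# A flat strip and an isometrically bent copy of it (assumed toy data).
flat = np.array([[0.0, 0, 0], [1, 0, 0], [2, 0, 0], [3, 0, 0]])
bent = np.array([[0.0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]])
chain = [(0, 1), (1, 2), (2, 3)]
d_flat = gdm_descriptor(flat, chain, k=2)
d_bent = gdm_descriptor(bent, chain, k=2)
print(np.allclose(d_flat, d_bent))  # bending does not change the descriptor
```

The Euclidean distance between the two end vertices changes from 3 to 1 under the bend, whereas the geodesic distance stays at 3, which is why geodesic representations are attractive for expression-invariant recognition.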
Berretti and Del Bimbo [99] divide the face surface into a series of equidistant geodesic strips, and then establish a direction index table by measuring the spatial displacement between the strips. Finally, the direction index tables of the 3D face models are compared to complete face recognition. This table-based representation greatly reduces the computational complexity, speeds up the search, and is suitable for large-scale face databases. In addition, some scholars use elastic geodesics between facial curves to handle expression changes and obtain high recognition accuracy [91,100]. When the face is represented in geodesic polar coordinates [21], the intrinsic attributes of the face do not change under isometric deformation, so this representation is suitable for expression-invariant 3D face recognition. Classification is done using a PCA classifier that exploits both color and shape information. The experimental results show that the overall performance is significantly improved by using geodesic polar coordinates.

Research on occlusion-invariant 3D face recognition
Obtaining the face information of non-cooperative individuals in an uncontrolled environment may result in certain parts of the face not being captured, for example because of hats or sunglasses, or because the eyes or face are partially covered by hair (Figs. 9, 10). In this case, the missing 3D face data are caused by occlusion by external objects. In addition, a non-frontal pose of the scanned individual may prevent some parts of the face from being captured during scanning, which results in erroneous data; we call this internal occlusion. Although many researchers now deal with recognition under expression variations, few study recognition under occlusion. In the following, we introduce in detail the work on face recognition under occlusion, including the methods used, the databases used and the recognition performance achieved (Table 8).
Colombo et al. [42] proposed a new recovery strategy that can effectively recognize 3D faces even when the faces are partially occluded by unforeseen and unrelated objects (such as scarves, hats, glasses, etc.). The occluded regions are detected by considering their influence on the projection of the face in a suitable face space. Then, the non-occluded regions are used to restore the missing information. Any recognition algorithm can be applied to [45] this recovery strategy. The strategy was used to restore 52 3D faces with various kinds of occlusion and achieved very good results. Alyuz et al. [105] proposed a new 3D face registration and recognition method based on partial face regions, which achieves good recognition performance under expression variations and face occlusion. They proposed a fast and flexible alignment method that uses average regional models (ARMs) to infer local information with the iterative closest point (ICP) algorithm. The scores derived from the local regional matchers are fused to robustly identify probe subjects. In this work, a multi-expression 3D facial database and the Bosphorus 3D face database, which contains a large number of different types of expressions and realistic face occlusions, are used for experimental testing. For occluded faces, good recognition performance was obtained, and the recognition rate increased from 47.05 to 94.12%.
Mayo and Zhang [106] proposed and evaluated a 3D face recognition algorithm based on point cloud rotations, multiple projections, and voted keypoint matching. The basic idea is to rotate each 3D point cloud that represents a person about the x, y, or z-axis and, at each step of the rotation, to project the 3D points onto a 2.5D image. Labeled keypoints are then extracted from the generated 2.5D images, and this much smaller set of keypoints replaces the original face scan and its projections in the face database. In an extensive assessment using the GavabDB 3D facial recognition data set, their method achieved a recognition rate of 95% for neutral expressions and 90% for faces with expressions such as smiling and laughing and for partially occluded faces.
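The rotate-and-project step can be sketched as below. This is a minimal illustration under several assumptions: only rotations about the y-axis are shown, the projection is orthographic onto a small pixel grid, and each pixel keeps the largest z value that falls into it. The keypoint extraction and voting stages of [106] are not included, and the function names, grid size and coordinate range are hypothetical.

```python
import numpy as np

def rotation_y(theta):
    """Rotation matrix for an angle theta (radians) about the y-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def project_to_depth_image(points, size=3, lo=-1.0, hi=1.0):
    """Orthographic projection of an (n, 3) point cloud onto a 2.5D image.

    x and y in [lo, hi] are mapped to pixel columns and rows; each pixel
    stores the largest z that lands on it, and NaN marks empty pixels.
    """
    image = np.full((size, size), np.nan)
    scale = (size - 1) / (hi - lo)
    cols = np.round((points[:, 0] - lo) * scale).astype(int)
    rows = np.round((hi - points[:, 1]) * scale).astype(int)
    for r, c, z in zip(rows, cols, points[:, 2]):
        if 0 <= r < size and 0 <= c < size:
            if np.isnan(image[r, c]) or z > image[r, c]:
                image[r, c] = z
    return image

def multi_view_depth_images(points, angles):
    """One 2.5D image per rotation angle about the y-axis."""
    return [project_to_depth_image(points @ rotation_y(a).T) for a in angles]

# A toy point cloud with three points (assumed data).
cloud = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 0.2], [0.9, 0.9, 0.0]])
frontal, side = multi_view_depth_images(cloud, [0.0, np.pi / 2])
print(frontal)
print(side)
```

In the frontal view the two points on the z-axis fall into the same pixel and only the closer one is kept, while the rotated view separates them, which is why multiple projections retain more of the 3D shape than a single depth image.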
Alyuz et al. [107] proposed a novel occlusion-resistant 3D face recognition system that can cope with severe occlusions caused by hair, hands, and glasses. A two-step registration module first detects the nose region on the curvedness-weighted convex shape index map and then uses a nose-based iterative closest point (ICP) algorithm to perform fine alignment. The occluded region is automatically determined using a generic facial model. After the non-facial parts introduced by the occlusion are removed, Gappy PCA is used to recover the entire face from the non-occluded facial surface. Experimental results obtained on realistically occluded facial images from the Bosphorus 3D face database show that, using score-level fusion of regional Linear Discriminant Analysis (LDA) classifiers, this method improves the Rank-one recognition accuracy significantly, from 76.12 to 94.23%.
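The restoration step with Gappy PCA can be illustrated with a small sketch: the PCA coefficients of an occluded face are estimated by least squares from the non-occluded entries only, and the occluded entries are then filled in from the reconstruction. The synthetic training data, the number of components and the function names are assumptions for illustration; the code is not the implementation used in [107].

```python
import numpy as np

def fit_pca(train, n_components):
    """Mean vector and orthonormal basis (d x k) of the training rows."""
    mean = train.mean(axis=0)
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:n_components].T

def gappy_pca_reconstruct(x, observed, mean, basis):
    """Fill the entries of x where observed is False.

    The PCA coefficients are fitted by least squares on the observed
    entries only; observed entries are kept as they are.
    """
    obs = np.asarray(observed, dtype=bool)
    coeffs, *_ = np.linalg.lstsq(basis[obs], x[obs] - mean[obs], rcond=None)
    full = mean + basis @ coeffs
    return np.where(obs, x, full)

# Synthetic "faces": 6-dimensional vectors lying in a 2D subspace (assumed).
rng = np.random.default_rng(0)
mixing = rng.normal(size=(2, 6))
train = rng.normal(size=(50, 2)) @ mixing + 10.0
mean, basis = fit_pca(train, n_components=2)

face = rng.normal(size=2) @ mixing + 10.0
observed = np.array([True, True, False, True, False, True])
occluded = np.where(observed, face, 0.0)   # occluded entries are unknown
restored = gappy_pca_reconstruct(occluded, observed, mean, basis)
print(np.allclose(restored, face))
```

The recovery is exact here only because the synthetic faces lie exactly in the PCA subspace; on real scans the reconstruction is an approximation whose quality depends on the training set and on how much of the face remains visible.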
Alyuz et al. [108] proposed a fully automatic and effective 3D face recognition method that is robust to face occlusion. To align the occluded surfaces, they use a model-based registration scheme in which the model is adaptively selected according to the facial occlusion. The alignment model is formed by automatically checking the validity of the facial patches and includes only the non-occluded patches. By registering the occluded surface to the adaptively selected model, a one-to-one correspondence between the model and the non-occluded surface points is obtained, so that occluded faces can be registered. Compared with the registration strategy based on the holistic face model (which is usually used for non-occluded surfaces), the adaptive model achieved an improvement of about 20% in identification rate in tests on the Bosphorus and UMB-DB databases. Drira et al. [58] proposed a new geometric framework for analyzing 3D faces, with the specific goals of comparing, matching, and averaging their shapes. They use radial curves emanating from the tip of the nose to represent facial surfaces and use the elastic shape analysis of these curves to form a Riemannian framework for analyzing the shape of the entire facial surface. This representation, together with the elastic Riemannian metric, appears to be natural for measuring facial deformations and is robust to partial occlusions caused by glasses, hair, etc.
Fig. 10 Sample faces from the UMB-DB [58]

Open problems and perspectives
3D face recognition still has many open problems, such as automatic facial expression recognition, age-invariant face recognition and transfer learning.
As an important part of face recognition technology, facial expression (emotion) recognition (FER) has received extensive attention in recent years in the fields of human-computer interaction, security, robot manufacturing, automation, medical care, communication and driving, and has become an active research field in academic and industrial circles [109]. 3D facial expression recognition can overcome the weaknesses of 2D methods and improve recognition accuracy. Some efforts have focused on the recognition of complex and spontaneous emotions rather than on the identification of prototypical emotional expressions that are deliberately displayed [110][111][112][113].
Most face recognition systems are sensitive to age variation. Although some earlier papers proposed recognition methods under age variations [114][115][116][117], many problems remain to be explored.
Transfer learning is a machine learning method in which a model developed for one task is reused as the starting point for a model on a second task. The performance of 2D face recognition algorithms has increased significantly by leveraging the representational power of deep neural networks and large-scale labeled training data. In contrast to 2D face recognition, training discriminative deep features for 3D face recognition is very difficult due to the lack of large-scale 3D face datasets. The authors of [27] show that transfer learning from a CNN trained on 2D face images can work effectively for 3D face recognition by fine-tuning the CNN with a relatively small number of 3D facial scans.
3D facial expression analysis, recognition under age variations and transfer learning constitute three open problems that are still in their infancy.
Future 3D technology will be applied to 3D sensing and 3D visualization. 3D sensing is a depth-sensing technology that augments camera capabilities for facial and object recognition in augmented reality, gaming, autonomous driving, and a wide range of other applications. 3D sensing technology is about to become mainstream as companies such as Apple, Google and Samsung race to incorporate 3D sensors into their next generation of smartphones. 3D visualization is a technology that allows objects to be designed in three-dimensional space with the help of 3D software in order to produce high-quality digital products. 3D visualization will be used in games, cartoons, films and motion comics, which are the main areas driving the development and improvement of 3D visualization production.

Conclusions and discussion
3D face recognition has been an important and popular research area in recent years. More and more researchers are working in this field and presenting their 3D face recognition methods. In this paper, we surveyed some of the latest methods for 3D face recognition under expression, occlusion, and pose variations. We first summarized the publicly available 3D face databases, on which all of the surveyed methods were tested. Almost all researchers use the following three formats of face data: point cloud, mesh and range data. All three types of face data are obtained with 3D scanners.
The recognition methods are mainly divided into two categories: local methods and holistic methods. Although many experiments are carried out with holistic methods, we believe that local methods are more suitable for 3D face recognition. Compared to holistic methods, local methods are more robust to occlusion and can obtain better experimental results.
This survey divided 3D face recognition into three directions: pose-invariant, expression-invariant and occlusion-invariant 3D face recognition. It surveyed methods for pose-invariant face recognition that handle a wide range of poses on publicly available databases. These methods are mainly local methods. For instance, a complete face model can be synthesized by half-face matching [55], and a statistical model in the shape space of radial curves can be used to overcome the missing data problem.
There are two types of methods for expression-invariant 3D face recognition: local feature based approaches and holistic approaches.
This survey also gave a detailed introduction to the work on face recognition under occlusion, including the methods used, the databases used and the recognition performance achieved.
In addition, 3D face recognition technology has been applied in many fields, such as access control and autonomous driving. The iPhone X uses Face ID, a technology that unlocks the phone by using infrared and visible light scans to uniquely identify the user's face. It works in a variety of conditions and is extremely secure. In autonomous driving, the autopilot needs to manage the hand-over between the automated and the manual modes. For a smooth hand-over, it is important to make sure that the driver is alert and ready to take control of the car before the autopilot is disengaged.
To ensure a smooth transition between the modes of operation, Omron introduced 3D facial recognition technology that detects a drowsy or distracted driver. Considering that one out of every six car accidents is attributed to a drowsy or distracted driver, the technology can have a large impact even on the safety of manual driving.