Discriminative histogram taxonomy features for snake species identification
Human-centric Computing and Information Sciences volume 4, Article number: 3 (2014)
Incorrect identification of snakes from their observable visual traits is a major cause of death from snake bites in the tropics. So far, no automatic classification method has been proposed that distinguishes snakes by their taxonomic features for the two major snake families, Elapidae and Viperidae. We identify 38 taxonomically relevant features and build a snake database from 490 sample images of Naja naja (spectacled cobra), 193 sample images of Ophiophagus hannah (king cobra), 88 images of Bungarus caeruleus (common krait), 304 sample images of Daboia russelii (Russell’s viper), 116 images of Echis carinatus (saw-scaled viper) and 108 images of Hypnale hypnale (hump-nosed pit viper).
Snake identification experiments with 13 different classifiers and 12 attribute evaluators show that 15 of the 38 taxonomically relevant features are enough for snake identification. Interestingly, these features were almost equally distributed across the logical groupings of top, side and body views of the snake images, while features from the bottom view played the smallest role in identification.
We find that only a few of the taxonomically relevant snake features are useful for snake identification. These discriminant features are essential for improving the accuracy of snake identification and classification. The study indicates that automated snake identification is useful for practical applications such as medical diagnosis, conservation studies and surveys by interdisciplinary practitioners with little expertise in snake taxonomy.
Snakes are cold-blooded reptiles that most people perceive as deadly to humans [1–5]. Since ancient times, snakes have been worshipped, feared and disliked by people across the world. Snakes remain a painful reality in the daily life of millions of affected people and are among the most misunderstood species [6, 7]. At the same time, they are more perilous than other wild animals because they live close to human habitation. The World Health Organization reports around five million snake bites every year, resulting in millions of envenomations and hundreds of thousands of amputations and deaths. In cities with a high-humidity environment such as Thiruvananthapuram in Kerala, where we started our study, approximately 25–30 snake sightings are reported daily. The majority of these sighted snakes were identified as carrying enough venom to kill a human within a few hours.
In tropical regions of the world, most snake bite cases are caused by four venomous snakes often referred to as the “Big Four”. They are the spectacled cobra (Naja naja), common krait (Bungarus caeruleus), Russell’s viper (Daboia russelii) and saw-scaled viper (Echis carinatus). Two other commonly found snakes that cause many snake bite cases are the king cobra (Ophiophagus hannah) and the hump-nosed pit viper (Hypnale hypnale). For this reason, we restrict our study in this paper to these six deadly snakes [9, 10].
Although anti-venom is produced in sufficient quantities by several public and private manufacturers, most snake bite victims do not have access to good-quality care, and in populous countries like India both morbidity and mortality due to snake bite are high. Because of serious misreporting, the true burden of snake bite is not known. Doctors mostly inject polyvalent anti-venom into the snake bite victim. It is injected without considering which snake has bitten the person, even when the patient can describe some observed features of the snake. Most medical practitioners do not understand snake taxonomy well, which makes it difficult to identify the snake correctly from the remarks of the victim or an eyewitness. The polyvalent anti-venom contains antibodies raised against two or more species of snake, so that it may neutralize the venom injected by a single snake bite. Since a snake bite injects only one type of venom, the remaining non-neutralized part of the polyvalent anti-venom creates a further risk to the patient’s health. Proper identification of the snake is therefore very important for correct medical treatment and for saving the lives of snake bite victims [9–11].
To our knowledge, no research has yet been reported on a computer-based approach to automatically distinguishing snake classes. This may largely be due to the lack of a database for this purpose and limited awareness of snake taxonomy research. The lack of a database of venomous snakes in India makes this research very challenging, as collecting images often involves well-trained snake catchers, photographers and expert biologists. Through this paper we provide an early set of snake images collected with a view to identifying relevant features based on snake taxonomy. In addition, the images contain a wide range of features from different snakes that can help in gaining new understanding of snake taxonomy. Indian snake taxonomy has not been investigated with rigor, and there is a lack of expert taxonomists. This makes first-line snake identification, which is essential for recommending accurate treatment to snake bite victims, difficult in life-threatening situations.
Materials and methods
The snake images for the experiment were collected from forests across different parts of Kerala, India, with the help of snake catchers from Pujappura Panchakarma Serpentarium, Trivandrum, India, through close, year-long interaction with the subjects under study. A total of 1299 images were used for this experiment, obtained from 10–15 wild snakes of each species photographed on different occasions and at different times.
Table 1 shows the taxonomically relevant features and their logical grouping based on the top, bottom, side or body view of the snake in the captured image, and Figure 1 shows the visual description of the taxonomy features for each snake class. The descriptions of the snakes are included as a supplementary file (Additional file 1). In total, 38 taxonomy-based features are identified for creating the feature database from the 1299 snake images collected. There are 490 images of spectacled cobra, 304 images of Russell’s viper, 193 images of king cobra, 88 images of common krait, 116 images of saw-scaled viper and 108 images of hump-nosed pit viper. To create the feature database, the 1299 snake images are manually converted by a taxonomist into feature vectors representing the 38 taxonomically relevant features. This database file is included as supplementary material to this article (Additional file 2).
Feature ranking and selection
Out of the 38 taxonomically relevant features, the top features that have the highest impact on classification are determined. To find the top features from the complete database, the following 12 attribute evaluators are used: ChiSquaredAttributeEval, CfsSubsetEval, ConsistencySubsetEval, FilteredAttributeEval, FilteredSubsetEval, GainRatioAttributeEval, InfoGainAttributeEval, OneRAttributeEval, PrincipalComponents, ReliefFAttributeEval, SVMAttributeEval and SymmetricalUncertAttributeEval, in combination with search methods [21, 22] such as Genetic Search, Greedy Stepwise, Linear Forward Selection, Rank Search, Scatter Search, Subset Size Forward Selection and Ranker. A histogram of the feature counts from these attribute evaluators is then plotted to obtain the ranking of the taxonomically relevant features that are most useful for classification, as shown in Figure 2. The concept of ranking and histograms used in this method is useful for identifying the relevance of the features [23–25]. The rank table is built from this histogram, based on the total number of repetitions of each feature in the experiment. The repetitions of a feature result from ranking the features repeatedly with different feature ranking methods. Features that share the same number of repetitions are then ranked by their average classification score, taken independently for each feature; that is, among features with the same repetition count, the feature with the highest average classification score is ranked first. Table 2 shows the ranking of all 38 features using the attribute evaluators with search method and classification score. The rank list is used to prepare 38 feature subsets containing from 1 to 38 features, starting from the top feature and ending with the last feature of Table 2. The number of features in a feature subset is referred to as the feature size.
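The histogram-based ranking described above can be sketched as a simple vote count. The following is a minimal illustration, not the authors' implementation: the feature names (`f1` to `f4`), the evaluator outputs and the average classification scores are hypothetical placeholders, and each evaluator and search-method combination is assumed to contribute one vote per selected feature.

```python
from collections import Counter

def rank_features(selections, avg_score):
    """Rank features by how often they were selected (histogram count),
    breaking ties by the higher average classification score."""
    counts = Counter()
    for selected in selections.values():
        counts.update(set(selected))  # one vote per evaluator run (assumption)
    features = set(avg_score) | set(counts)
    # Sort by count (descending), then score (descending), then name for stability.
    return sorted(features, key=lambda f: (-counts[f], -avg_score.get(f, 0.0), f))

def nested_subsets(ranked):
    """Feature subsets of size 1..N, each taking the top-k ranked features."""
    return [ranked[:k] for k in range(1, len(ranked) + 1)]

# Hypothetical outputs of three evaluator + search-method combinations.
selections = {
    "InfoGainAttributeEval+Ranker": ["f1", "f2", "f3"],
    "CfsSubsetEval+GreedyStepwise": ["f1", "f3"],
    "ReliefFAttributeEval+Ranker": ["f1", "f2", "f4"],
}
# Hypothetical per-feature average classification scores used for tie-breaking.
avg_score = {"f1": 0.80, "f2": 0.60, "f3": 0.70, "f4": 0.50}

ranked = rank_features(selections, avg_score)
subsets = nested_subsets(ranked)
print(ranked)
```

Here `f2` and `f3` are each selected twice, so the tie is broken in favour of `f3`, which has the higher average score. With 38 features, `nested_subsets` would yield the 38 feature sizes used in the experiments.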
Classifier selection and training
To perform automated snake classification, the following 13 classifiers are used: Bayes Net, Naïve Bayes, Multilayer perceptron, AdaBoostM1, MultiBoostAB, RBF network, IB1, IBk, LWL, NBTree, J48, Random SubSpace and Bagging. In setting up the classification experiment, the database is split into a training set and a test set. The training set is used to train the classifier parameters, while the test set is used to assess the performance of the classifier in terms of classification accuracy, F-score, area under the receiver operating characteristic curve, precision and recall. Selecting a small number of samples per snake class for the training set makes the problem challenging, and performance measures in such situations indicate a classifier’s applicability in practice. In our study, we use 5% of the samples from each snake class for the training set, while the remaining 95% form the test set. The classifier that performs best on these measures can be selected as a possible candidate for implementation.
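The per-class 5%/95% split can be illustrated with the class sizes reported for the database. This is a sketch under stated assumptions: the paper does not say how fractional sample counts are rounded, so rounding up (with at least one training sample per class) is an assumption, and the function name `stratified_split` and the random seed are hypothetical.

```python
import math
import random
from collections import defaultdict

def stratified_split(labels, train_fraction=0.05, seed=0):
    """Return disjoint (train, test) index lists, sampling the given
    fraction of each class for training."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Rounding up is an assumption; the paper does not state the rule.
        n_train = max(1, math.ceil(train_fraction * len(indices)))
        train.extend(indices[:n_train])
        test.extend(indices[n_train:])
    return sorted(train), sorted(test)

# Class sizes as reported for the 1299-image database.
class_counts = {
    "Naja naja": 490,
    "Daboia russelii": 304,
    "Ophiophagus hannah": 193,
    "Bungarus caeruleus": 88,
    "Echis carinatus": 116,
    "Hypnale hypnale": 108,
}
labels = [name for name, n in class_counts.items() for _ in range(n)]
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Under this rounding rule the smallest class, common krait, contributes only 5 training images, which shows how little data each classifier sees at the 5% setting.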
The research and work submitted conform to the guidelines for the care and use of animals in scientific research. We followed the guidelines published by the Indian National Science Academy. The Ethics Committee of Enview R&D Labs approved the research work.
Results and discussion
The feature database of the snakes, as explained in Table 1 and Figure 1, is used for analysing the classification performance on this six-class classification problem. The feature database contains 38 features for each sample. Using Table 2, we perform further experiments on databases with different feature sizes. The samples in each database are randomly split into 5% for the training set and 95% for the test set, and performance is evaluated on the individual classifiers. The selection of features is performed on the training set. To ensure statistical reliability, the selection and testing are repeated 100 times, and the results are reported in Table 3. Testing is done such that the test and training sets do not overlap in samples. Table 3 compares the average performance measures for the 38 feature-size databases. The performance measures are the percentage accuracy of correct classification, F-score, area under the receiver operating characteristic curve, precision and recall. Table 3 shows how the performance measures vary as the feature size, i.e. the number of features in the feature subset, increases. As shown in Table 3, the classification accuracy increases considerably up to feature size 15, which contains the top 15 features of the rank list, and tends to drop from feature size 31. This indicates that the top 15 features alone are enough for automated snake identification, instead of all 38 taxonomically relevant features.
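The repeated-split protocol over increasing feature sizes can be sketched as follows. This is an illustration only: the data are synthetic, the features are assumed to be binary, a 1-nearest-neighbour rule with Hamming distance stands in for the classifiers actually used, and the ranked feature list is taken as a fixed input rather than re-derived on each training set. All function names are hypothetical.

```python
import math
import random
import statistics

def stratified_indices(y, train_fraction, rng):
    """Split sample indices per class into disjoint train and test lists."""
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = max(1, math.ceil(train_fraction * len(indices)))
        train += indices[:n_train]
        test += indices[n_train:]
    return train, test

def predict_1nn(train_X, train_y, x):
    """Label of the training vector with the smallest Hamming distance to x."""
    def distance(row):
        return sum(a != b for a, b in zip(row, x))
    nearest = min(range(len(train_X)), key=lambda i: distance(train_X[i]))
    return train_y[nearest]

def evaluate_feature_sizes(X, y, ranked, sizes, train_fraction=0.05,
                           repeats=100, seed=0):
    """Mean and standard deviation of test accuracy for each feature size."""
    rng = random.Random(seed)
    results = {}
    for k in sizes:
        cols = ranked[:k]  # top-k features from the rank list
        accuracies = []
        for _ in range(repeats):
            train, test = stratified_indices(y, train_fraction, rng)
            train_X = [[X[i][c] for c in cols] for i in train]
            train_y = [y[i] for i in train]
            hits = sum(
                predict_1nn(train_X, train_y, [X[i][c] for c in cols]) == y[i]
                for i in test
            )
            accuracies.append(hits / len(test))
        results[k] = (statistics.mean(accuracies), statistics.stdev(accuracies))
    return results

# Synthetic stand-in data: features 0 and 1 encode the class, 2..5 are noise.
data_rng = random.Random(1)
codes = {"A": (0, 0), "B": (0, 1), "C": (1, 0)}
X, y = [], []
for label, code in codes.items():
    for _ in range(100):
        X.append(list(code) + [data_rng.randint(0, 1) for _ in range(4)])
        y.append(label)

ranked = [0, 1, 2, 3, 4, 5]  # hypothetical rank list, best feature first
results = evaluate_feature_sizes(X, y, ranked, sizes=[1, 2, 6], repeats=5)
for k, (mean_acc, std_acc) in results.items():
    print(k, round(mean_acc, 3), round(std_acc, 3))
```

In this toy setting the two informative features alone separate the classes, and adding noise features cannot help, which mirrors in miniature the observation that a subset of the ranked features can be sufficient.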
Tables 4 and 5 show the performance of automatic snake classification using the Bayes Net, Naïve Bayes, Multilayer perceptron, AdaBoostM1, MultiBoostAB, RBF network, IB1, IBk, LWL, NBTree, J48, Random SubSpace and Bagging classification methods for the top-15 selected feature database and the 38-feature database, respectively. The performance measures are the percentage accuracy of correct classification, F-score, area under the receiver operating characteristic curve, precision and recall. The RBF network, IBk and IB1 classifiers showed higher classification performance than the other classifiers. A classification accuracy above 85% indicates the robustness of the taxonomically relevant features in the automatic classification process. Multilayer perceptron, RBF network, IB1, IBk and J48 show good recognition performance among the tested classifiers at 5% training data. When the training set size is increased to 30%, the multilayer perceptron classifier reaches 94.31 ± 1.00% classification accuracy. The results indicate the difficulty of automatic classification of snakes; nonetheless, they are indicative of its practical use as a first-line prediction of the snake class. These early results open up two major directions of research: (1) identifying the taxonomy features of unknown snakes using automatic feature analysis, and (2) developing accurate feature classification and recognition methods for automatic snake identification. For real-time applications such as diagnosis, an ambitious 100% accuracy is preferred, which these results show to be a challenging problem. In addition, the results on 5% training data are likely to be more useful in real-time systems, since in real applications the size of the test data keeps growing at a higher rate than the training data, mainly because of the labor-intensive processes involved in preparing and validating the training data.
In this paper, we presented an automatic snake identification problem by developing taxonomy-based features targeted for use by computer scientists and herpetologists. The feature-subset analysis indicated that only 15 features are sufficient for snake identification. In a real-life situation, the snake feature database reflects the case in which the bite victim has seen the snake and the class of the snake must be identified from the observed features. In addition to the venom detection research required for treating bite victims, the proposed automatic snake recognition method could provide valuable information for administering correct medication and treatment in life-threatening situations. Surveying snakes in the wild is another major activity in ensuring the preservation of snake populations and diversity. This is, however, a very challenging task that requires prohibitive investments in manpower. Automatic classification using the snake image database can be extended to the analysis of snake images captured remotely with minimal human intervention. Progress in snake taxonomy research has been in decline for the last 60 years, which has resulted in a lack of expertise for environmental surveys and for the help medical practitioners require in emergency situations. With computerized analysis of snake images using the proposed database and classification approach, we hope that more studies will follow and generate interest in this topic.
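For reference, the accuracy, precision, recall and F-score measures named above can be computed from true and predicted labels as sketched below. The labels are made up for illustration, and averaging the per-class values weighted by class size is an assumption, since the paper does not state how per-class values were combined. The area under the ROC curve is omitted because it requires classifier scores rather than hard labels.

```python
from collections import Counter

def per_class_report(y_true, y_pred):
    """Precision, recall and F-score for each class, plus its support."""
    support = Counter(y_true)
    report = {}
    for c in sorted(support):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = support[c] - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f_score = 2 * precision * recall / denom if denom else 0.0
        report[c] = {"precision": precision, "recall": recall,
                     "f_score": f_score, "support": support[c]}
    return report

def weighted_average(report, key):
    """Average a per-class measure, weighting each class by its support."""
    total = sum(r["support"] for r in report.values())
    return sum(r[key] * r["support"] for r in report.values()) / total

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up labels for three of the six classes.
y_true = ["cobra", "cobra", "krait", "krait", "viper", "viper"]
y_pred = ["cobra", "krait", "krait", "krait", "viper", "cobra"]

report = per_class_report(y_true, y_pred)
print(accuracy(y_true, y_pred))
print(weighted_average(report, "precision"),
      weighted_average(report, "recall"),
      weighted_average(report, "f_score"))
```

With support-weighted averaging, the averaged recall equals the overall accuracy, which is a quick consistency check when reading tables that report both.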
Smith MA: Reptilia and Amphibia. Today & Tomorrow’s Printers & Publishers, India; 1981.
Whitaker R, Captain A, Ahmed F: Snakes of India: the field guide. Draco Books, Chengalpattu; 2004.
Mattison C: Snake. Dorling Kindersley, New York, USA; 1999.
Firth SMJWJR: Snake. Scholastic, India; 2002.
Weidensaul S: Snakes of the World. Grange Books Ltd, Chartwell House, London; 1996.
Mertens T: Deadly & Dangerous Snakes. Magic Bean. Era Publications, Flinders Park, South Australia; 1995.
Backshall S: Venomous Animals of the World. Johns Hopkins University Press, Maryland, USA; 2007.
Stevens D: The Big Four Snakes: The Indian Cobra, the Common Krait, the Russell’s Viper, and the Saw-Scaled Viper. Webster’s Digital Services, USA; 2011.
Premawardhena A, De Silva C, Fonseka M, Gunatilake S, De Silva H: Low dose subcutaneous adrenaline to prevent acute adverse reactions to antivenom serum in people bitten by snakes: randomised, placebo controlled trial. BMJ: Brit Med J 1999, 318(7190):1041. 10.1136/bmj.318.7190.1041
Warrell DA: The clinical management of snake bites in the Southeast Asian region. Southeast Asian J Trop Med Public Health 1999, 1(Suppl 1):1–89.
Calvete JJ, Juárez P, Sanz L: Snake venomics. Strategy and applications. J Mass Spectrom 2007, 42(11):1405–1414. 10.1002/jms.1242
Sorower MS, Yeasin M: Robust Classification of Dialog Acts from the Transcription of Utterances. In ICSC 2007. IEEE International Conference on Semantic Computing, 3–10. 2007.
Chanda P, Cho YR, Zhang A, Ramanathan M: Mining of attribute interactions using information theoretic metrics. In Data mining workshops, ICDMW’09. IEEE International Conference on Data Mining, Florida, USA; 2009:350–355.
Devi MI, Rajaram R, Selvakuberan K: Generating best features for web page classification. Webology 5. 2008.
Marquez-Vera C, Romero C, Ventura S: Predicting school failure using data mining. In Proceedings of the 4th International Conference on Educational Data Mining 271–276. 2011.
John GH, Kohavi R, Pfleger K: Irrelevant features and the subset selection problem. In Proceedings of the eleventh international conference on machine learning, Volume 129, San Francisco 121–129. 1994.
Jensen R, Shen Q: Fuzzy-rough sets assisted attribute selection. Fuzzy Systems, IEEE Transactions on 2007, 15: 73–89. 10.1109/TFUZZ.2006.889761
Meng YX: The practice on using machine learning for network anomaly intrusion detection. In IEEE International Conference on Machine Learning and Cybernetics (ICMLC), 2011, Vol. 2, 576–581. 2011.
Indra Devi M, Rajaram R, Selvakuberan K: Automatic web page classification by combining feature selection techniques and lazy learners. In conference on computational intelligence and multimedia applications, 2007. Int Conference on 2007, 2: 33–37.
Koonsanit K, Jaruskulchai C: Band selection for hyperspectral image using principal components analysis and maxima-minima functional. In Knowledge, Information, and Creativity Support Systems. Thailand, Springer; 2011:103–112. 10.1007/978-3-642-24788-0_10
Frank E, Hall M, Holmes G, Kirkby R, Pfahringer B, Witten IH, Trigg L: Weka. In Data Mining and Knowledge Discovery Handbook. Springer, USA; 2005:1305–1314. 10.1007/0-387-25465-X_62
Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH: The WEKA data mining software: an update. ACM SIGKDD Explorations Newsletter 2009, 11: 10–18. 10.1145/1656274.1656278
James AP, Dimitrijev S: Ranked selection of nearest discriminating features. Hum-centric Comput Inform Sci 2012, 2: 12. 10.1186/2192-1962-2-12
Milacic M, James AP, Dimitrijev S: Biologically inspired features used for robust phoneme recognition. International Journal of Machine Intelligence and Sensory Signal Processing 2013, 1(1):46–54. 10.1504/IJMISSP.2013.052867
James AP, Maan AK: Improving feature selection algorithms using normalised feature histograms. Electron Lett 2011, 47(8):490–491. 10.1049/el.2010.3672
Longstaff ID, Cross JF: A pattern recognition approach to understanding the multi-layer perceptron. Pattern Recogn Lett 1987, 5(5):315–319. 10.1016/0167-8655(87)90072-9
Kim SB, Han KS, Rim HC, Myaeng SH: Some effective techniques for naive bayes text classification. Knowledge and Data Engineering, IEEE Transactions on 2006, 18(11):1457–1466. 10.1109/TKDE.2006.180
Freund Y, Schapire RE: A decision-theoretic generalization of on-line learning and an application to boosting. In Computational learning theory, Springer 23–37. 1995.
Benbouzid D, Busa-Fekete R, Casagrande N, Collin FD, Kégl B: MultiBoost: a multi-purpose boosting package. J Mach Learn Res 2012, 13: 549–553.
Buhmann MD: Radial basis functions: theory and implementations, Volume 12. Cambridge university press. 2003.
Aha DW, Kibler D, Albert MK: Instance-based learning algorithms. Machine learning, Boston, USA; 1991.
Atkeson CG, Moore AW, Schaal S: Locally weighted learning for control. Artif Intell Rev 1997, 11(1–5):75–113. 10.1023/A:1006511328852
Kohavi R: Bayes rule based and decision tree hybrid classifier. [US Patent 6,182,058]. 2001.
Kotsiantis SB, Zaharakis ID, Pintelas PE: Machine learning: a review of classification and combining techniques. Artif Intell Rev 2006, 26(3):159–190. 10.1007/s10462-007-9052-3
Ho TK: The random subspace method for constructing decision forests. Pattern Anal Mach Intel, IEEE Transactions on 1998, 20(8):832–844.
Breiman L: Bagging predictors. Mach Learn 1996, 24(2):123–140.
Singhal A, Brown C: Dynamic Bayes net approach to multimodal sensor fusion. In Proceedings of the SPIE-The International Society for Optical Engineering, Volume 3209, 2–10. 1997.
The authors thank the snake catchers in Trivandrum for their assistance with the creation of the database. The assistance of Balaji Balasubramaniam (TRDDC) and Anaswara Krishnan (Department of Zoology, Kerala University) is also acknowledged. Dileep Kumar R would like to thank Prof Ommen V Ommen for the encouragement and support for this research.
We declare that there are no competing interests.
AJ carried out the problem formulation, algorithm development and drafted the manuscript. BM organized the dataset and performed the feature analysis. SS helped in the implementation of algorithm. DR collected the original snake images. All authors read and approved the final manuscript.