From: A survey of cloud-based network intrusion detection analysis
Work | Goal | Dataset(s) | Major approaches | Cloud environment | ML algorithm(s) | Advantages | Challenges |
---|---|---|---|---|---|---|---|
Lee et al. [21] | Monitor internet traffic flow | Simulated NetFlow packets | (1) Packet sampling, (2) Flow aggregation, and (3) MapReduce programming model | Apache Hadoop | None | Flow computation time improved by 72% over legacy tools | Batch-processing jobs and text input file formats difficult to handle; flow analysis tools are not adequately developed for the MapReduce interface |
Singh et al. [65] | P2P botnet detection | Simulated and CAIDA sample datasets | (1) Information gain measurement and (2) Clustering (random forest) in Mahout | Apache Hadoop | Random forest | Process high bandwidth in quasi-real-time, effectively classifies malicious traffic on a cluster | High packet drop rates, detection times still a little too high, cannot respond to newer, more sophisticated threats |
Bhat et al. [67] | Anomaly intrusion detection | NSL-KDD 99 | (1) Naïve Bayes (NB) tree and (2) A hybrid approach of NB tree and random forest | Amazon EC2 | NB tree and random forest | Good performance, high accuracy, low false positive rate for NB tree/random forest hybrid implementation | High false positive rate for non-hybrid implementations |
Chen et al. [20] | Phishing attack detection | Simulated dataset | Apache Hadoop | Eucalyptus, Apache Hadoop, and Amazon EC2 | Collaborative algorithm based on distributed hash tables (DHT) | Practical scheme, can be generalized to other attacks | Not tested with various datasets |
Chen et al. [57] | Intrusion detection | KDD 99, CMDC 2012 | (1) Feature reduction, (2) Vertical compression, and (3) Intrusion detection | Apache Hadoop | OneR algorithm, affinity propagation, KNN, and SVM | Faster than traditional models | No incremental clustering ability; feature reduction and training steps can add significant overhead |
Marnerides et al. [22] | Malware detection | Simulated dataset | (1) Energy estimation, (2) Feature selection, and (3) Covariance analysis | Unknown | Choi-Williams distribution | Effective for identifying Kelihos injection | Not tested with various datasets |
Muthurajkumar et al. [51] | Intrusion detection | Simulated dataset | (1) Feature selection and (2) Fuzzy SVM | Unknown | Rough set based feature selection algorithm (RSFSA), fuzzy SVM | Reduces number of decision attributes and the size of log data, faster than traditional models | Not tested with various datasets |
Vieira et al. [64] | Intrusion detection technique | Simulated dataset | Utilization of grid and cloud computing | Unknown | Feed-forward neural network | Successfully explores communication events to mark intrusion | Large sample period of data is required and training cannot adapt to new threats |
Wang et al. [68] | Network traffic passive measurement | CAIDA dataset (anonymized traffic data collected from equinix-chicago and equinix-sanjose) | IP trace analysis system (IPTAS) | Unknown | None | Useful prototype of a passive traffic analysis tool | Does not provide fine-grained traffic analysis |