Table 1 Network intrusion detection techniques that have been developed utilizing cloud-computing technology

From: A survey of cloud-based network intrusion detection analysis

| Work | Goal | Dataset(s) | Major approaches | Cloud environment | ML algorithm(s) | Advantages | Challenges |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Lee et al. [21] | Monitor internet traffic flow | Simulated NetFlow packets | (1) Packet sampling, (2) Flow aggregation, and (3) MapReduce programming model | Apache Hadoop | None | Flow computation time improved by 72% over legacy tools | Batch-processing jobs and text input file formats are difficult to handle; flow analysis tools are not adequately developed for the MapReduce interface |
| Singh et al. [65] | P2P botnet detection | Simulated and CAIDA sample datasets | (1) Information gain measurement and (2) Clustering (random forest) in Mahout | Apache Hadoop | Random forest | Processes high bandwidth in quasi-real-time; effectively classifies malicious traffic on a cluster | High packet drop rates; detection times still somewhat high; cannot respond to newer, more sophisticated threats |
| Bhat et al. [67] | Anomaly intrusion detection | NSL-KDD 99 | (1) Naïve Bayes (NB) tree and (2) A hybrid approach of NB tree and random forest | Amazon EC2 | NB tree and random forest | Good performance, high accuracy, and low false positive rate for the NB tree/random forest hybrid implementation | High false positive rate for non-hybrid implementations |
| Chen et al. [20] | Phishing attack detection | Simulated dataset | Apache Hadoop | Eucalyptus, Apache Hadoop, and Amazon EC2 | Collaborative algorithm based on distributed hash tables (DHT) | Practical scheme; can be generalized to other attacks | Not tested with various datasets |
| Chen et al. [57] | Intrusion detection | KDD 99, CMDC 2012 | (1) Feature reduction, (2) Vertical compression, and (3) Intrusion detection | Apache Hadoop | OneR algorithm, affinity propagation, KNN, and SVM | Faster than traditional models | No incremental clustering ability; feature reduction and training steps can add significant overhead |
| Marnerides et al. [22] | Malware detection | Simulated dataset | (1) Energy estimation, (2) Feature selection, and (3) Covariance analysis | Unknown | Choi-Williams distribution | Effective for identifying Kelihos injection | Not tested with various datasets |
| Muthurajkumar et al. [51] | Intrusion detection | Simulated dataset | (1) Feature selection and (2) Fuzzy SVM | Unknown | Rough set based feature selection algorithm (RSFSA), fuzzy SVM | Reduces the number of decision attributes and the size of log data; faster than traditional models | Not tested with various datasets |
| Vieira et al. [64] | Intrusion detection technique | Simulated dataset | Utilization of grid and cloud computing | Unknown | Feed-forward neural network | Successfully explores communication events to mark intrusion | A large sample period of data is required, and training cannot adapt to new threats |
| Wang et al. [68] | Network traffic passive measurement | CAIDA dataset (anonymized traffic data collected from equinix-chicago and equinix-sanjose) | IP trace analysis system (IPTAS) | Unknown | None | Useful prototype of a passive traffic analysis tool | Does not provide fine-grained traffic analysis |
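
As a rough illustration of the flow aggregation and MapReduce programming model listed for Lee et al. [21], the sketch below groups packet records into flows using map, shuffle, and reduce steps in plain Python. It is not the authors' implementation and does not use Hadoop; the `Packet` record layout, the 5-tuple flow key, and all function names are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, NamedTuple, Tuple

FlowKey = Tuple[str, str, int, int, str]
Counts = Tuple[int, int]  # (packet count, byte count)


class Packet(NamedTuple):
    # Hypothetical record layout, not taken from the surveyed work.
    src: str
    dst: str
    sport: int
    dport: int
    proto: str
    size: int  # bytes


def map_phase(packets: Iterable[Packet]) -> Iterator[Tuple[FlowKey, Counts]]:
    """Emit one (flow key, (1, bytes)) pair per packet."""
    for p in packets:
        yield (p.src, p.dst, p.sport, p.dport, p.proto), (1, p.size)


def shuffle(pairs: Iterable[Tuple[FlowKey, Counts]]) -> Dict[FlowKey, List[Counts]]:
    """Group emitted values by flow key, as a MapReduce framework would."""
    groups: Dict[FlowKey, List[Counts]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[FlowKey, List[Counts]]) -> Dict[FlowKey, Counts]:
    """Sum packet and byte counts for each flow."""
    return {
        key: (sum(c for c, _ in values), sum(b for _, b in values))
        for key, values in groups.items()
    }


def aggregate_flows(packets: Iterable[Packet]) -> Dict[FlowKey, Counts]:
    """Run the three phases in sequence on an in-memory packet list."""
    return reduce_phase(shuffle(map_phase(packets)))
```

On a real cluster the map and reduce functions would run on separate nodes and the framework would perform the shuffle; here all three steps run in one process only to show the data flow.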