Table 6 Fault tolerance methods in cloud computing systems [93, 94]

From: Reliability and high availability in cloud computing environments: a reference roadmap

FT policy

FT technique

Description

Proactive FT

Preemptive migration

Preemptive migration suspends a process, records its state, transfers it to another node, and resumes the process on the new node. It relies on a feedback-loop control system in which applications are constantly monitored and analyzed
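The four steps above (suspend, record state, transfer, resume) and the monitoring loop can be illustrated with a toy model. This is only a sketch under stated assumptions: `Node`, `Process`, `migrate`, `control_loop`, the temperature metric and the 80-degree threshold are all hypothetical names and values chosen for illustration, not part of the cited work.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    pid: int
    counter: int = 0       # the whole process state in this toy model
    running: bool = True


@dataclass
class Node:
    name: str
    temperature: float = 40.0   # hypothetical health metric
    processes: dict = field(default_factory=dict)


def migrate(proc, source, target):
    """Move one process from source to target, preserving its state."""
    proc.running = False                                    # 1. suspend
    snapshot = {"pid": proc.pid, "counter": proc.counter}   # 2. record state
    del source.processes[proc.pid]
    moved = Process(**snapshot)                             # 3. transfer
    target.processes[moved.pid] = moved                     # 4. resume (running=True)
    return moved


def control_loop(nodes, threshold=80.0):
    """One pass of the feedback loop: monitor, analyze, act."""
    healthy = [n for n in nodes if n.temperature < threshold]
    for node in nodes:
        if node.temperature >= threshold and healthy:
            for proc in list(node.processes.values()):
                # Pick the least loaded healthy node as the target.
                target = min(healthy, key=lambda n: len(n.processes))
                migrate(proc, node, target)
```

In a real system the snapshot would cover memory, open files and network connections, and the loop would run continuously; here a single call to `control_loop` stands in for one monitoring interval.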

Software rejuvenation

Software rejuvenation can be applied proactively because software aging inevitably leads to system failures. The technique schedules periodic reboots of the system; after each reboot, the system resumes from a clean state
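A minimal sketch of the idea follows. It assumes a request count as the rejuvenation schedule (a real deployment would more likely use wall-clock time or measured resource usage), and the names `AgingService` and `RejuvenatingSupervisor` are hypothetical.

```python
class AgingService:
    """Toy service whose internal state grows with every request."""

    def __init__(self):
        self.leaked = []   # stands in for leaked memory, stale handles, etc.

    def handle(self, request):
        self.leaked.append(request)
        return f"ok:{request}"


class RejuvenatingSupervisor:
    """Replaces the service with a fresh instance every max_requests requests."""

    def __init__(self, factory, max_requests=3):
        self.factory = factory
        self.max_requests = max_requests
        self.service = factory()
        self.served = 0
        self.reboots = 0

    def handle(self, request):
        if self.served >= self.max_requests:
            self.service = self.factory()   # "reboot": discard the aged state
            self.served = 0
            self.reboots += 1
        self.served += 1
        return self.service.handle(request)
```

After each simulated reboot, `service.leaked` is empty again, which corresponds to the clean state described above.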

Reactive FT

Checkpointing/restart

Application checkpoint/restart saves the state of a running application so that its execution can later be resumed, on any machine, from the point at which it was checkpointed

After a failure, the application is restarted from the most recent checkpoint instead of being rerun from scratch. It is an efficient fault tolerance technique for computation-intensive applications hosted in the cloud
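As an illustration, the sketch below checkpoints a long-running loop to disk and resumes from the last saved state after a simulated failure. It assumes the application state fits in a small picklable dictionary; `save_checkpoint`, `load_checkpoint`, `run_sum` and the `fail_at` parameter are hypothetical names introduced for this example.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write to a temporary file, then rename, so a crash cannot corrupt the file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)


def load_checkpoint(path, default):
    """Return the saved state, or default if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return default


def run_sum(path, n, every=10, fail_at=None):
    """Sum 0..n-1, saving a checkpoint every `every` iterations."""
    state = load_checkpoint(path, {"i": 0, "total": 0})
    while state["i"] < n:
        if fail_at is not None and state["i"] == fail_at:
            raise RuntimeError("simulated node failure")
        state["total"] += state["i"]
        state["i"] += 1
        if state["i"] % every == 0:
            save_checkpoint(path, state)
    return state["total"]
```

If the first run fails at iteration 25, a second call to `run_sum` with the same path picks up at iteration 20, the last checkpoint, so only five iterations are repeated. Because the checkpoint is an ordinary file, the restart could equally happen on another machine with access to it.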

Replication

Replication is one of the most popular techniques under the reactive policy. In cloud computing, it is applied by keeping multiple replicas of data and services, so that an incoming request can be handled by any of a set of available replicas. Several replicas run on different computing resources to complete the requested task
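The following sketch shows one simple way replicas can mask a failure: writes go to every reachable replica, and a read is served by the first replica that answers. It is an assumption-laden toy (in-memory stores, no consistency protocol, no repair of stale replicas), and `Replica` and `ReplicatedStore` are hypothetical names.

```python
class Replica:
    """One copy of the data, hosted on its own (simulated) resource."""

    def __init__(self, name, available=True):
        self.name = name
        self.available = available
        self.store = {}

    def write(self, key, value):
        if not self.available:
            raise ConnectionError(f"{self.name} is down")
        self.store[key] = value

    def read(self, key):
        if not self.available:
            raise ConnectionError(f"{self.name} is down")
        return self.store[key]


class ReplicatedStore:
    def __init__(self, replicas):
        self.replicas = replicas

    def write(self, key, value):
        """Apply the write to every reachable replica; return the number of acks."""
        acks = 0
        for replica in self.replicas:
            try:
                replica.write(key, value)
                acks += 1
            except ConnectionError:
                pass
        if acks == 0:
            raise ConnectionError("no replica available")
        return acks

    def read(self, key):
        """Return the value from the first replica that can serve the key."""
        for replica in self.replicas:
            try:
                return replica.read(key)
            except (ConnectionError, KeyError):
                continue
        raise LookupError(f"no replica could serve {key!r}")
```

A production system would also need to bring recovered replicas back in sync; this sketch deliberately leaves that out.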

Task resubmission

A failed task can be resubmitted at runtime, either to the same host or to a different one, without interrupting the system workflow
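Task resubmission can be sketched as a retry loop over a list of hosts. The example assumes hosts are plain callables that raise `RuntimeError` on failure and that a round-robin order is acceptable; `make_host` and `run_with_resubmission` are hypothetical helpers, not an API from the cited sources.

```python
def make_host(name, failures=0):
    """Build a simulated host that fails its first `failures` tasks."""
    remaining = {"n": failures}

    def host(task):
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise RuntimeError(f"{name} failed")
        return task()

    return host


def run_with_resubmission(task, hosts, max_attempts=3):
    """Submit task to hosts in round-robin order until one attempt succeeds.

    Returns (result, attempts_used). With a single host, the task is
    resubmitted to the same host; with several, to a different one.
    """
    errors = []
    for attempt in range(max_attempts):
        host = hosts[attempt % len(hosts)]
        try:
            return host(task), attempt + 1
        except RuntimeError as exc:
            errors.append(str(exc))
    raise RuntimeError(f"task failed after {max_attempts} attempts: {errors}")
```

The caller sees only the final result, so the surrounding workflow continues without interruption as long as one attempt succeeds within `max_attempts`.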