Reliability and high availability in cloud computing environments: a reference roadmap

Mesbahi, Mohammad Reza; Rahmani, Amir Masoud; Hosseinzadeh, Mehdi

doi:10.1186/s13673-018-0143-8

Table 6 Fault tolerance methods in cloud computing systems [93, 94]

FT policy	FT technique	Description
Proactive FT	Preemptive migration	Preemptive migration involves suspending a process, recording its state, transferring it to another node, and resuming the process on the new node. It relies on a feedback-loop control system in which applications are continuously monitored and analyzed
Proactive FT	Software rejuvenation	Software rejuvenation is applied proactively because software aging inevitably leads to system failures. The technique schedules periodic reboots of the system. After each reboot, the system resumes from a clean state
Reactive FT	Checkpointing/restart	Application checkpoint/restart saves the state of a running application so that its execution can be resumed later, on any machine, from the point at which it was checkpointed. After a failure, the application is restarted from the most recent checkpoint instead of being rerun from scratch. It is an efficient fault tolerance technique for computation-intensive applications hosted in the cloud
Reactive FT	Replication	Replication is one of the most popular techniques used under the reactive policy. In cloud computing, it is applied by keeping multiple replicas of data and services, so that an incoming request can be handled by any of a set of available replicas. Several replicas run on different computing resources to complete the requested task
Reactive FT	Task resubmission	A failed task can be resubmitted at runtime, either to the same host or to a different one, without interrupting the system workflow
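
The checkpointing/restart row can be illustrated with a minimal sketch. It is not taken from the article: the function name `run_with_checkpoints`, the JSON checkpoint file, and the `fail_at` parameter used to simulate a failure are all assumptions made for illustration. The job sums the step indices, saves its state after every step, and on restart resumes from the last saved checkpoint rather than from step 0.

```python
import json
import os
import tempfile


def run_with_checkpoints(n_steps, ckpt_path, fail_at=None):
    """Sum the integers 0..n_steps-1, checkpointing after every step.

    `fail_at` is a hypothetical knob that simulates a crash before
    the given step is executed.
    """
    # Restart path: restore the saved state if a checkpoint exists.
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            state = json.load(f)
    else:
        state = {"next_step": 0, "total": 0}

    for step in range(state["next_step"], n_steps):
        if fail_at is not None and step == fail_at:
            raise RuntimeError(f"simulated failure at step {step}")
        state["total"] += step
        state["next_step"] = step + 1
        # Write to a temporary file, then rename it over the old
        # checkpoint, so a crash mid-write cannot corrupt the checkpoint.
        tmp_path = ckpt_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, ckpt_path)
    return state["total"]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "ckpt.json")
    try:
        run_with_checkpoints(10, path, fail_at=6)
    except RuntimeError as exc:
        print("first run:", exc)
    # The second run resumes at step 6 instead of starting over.
    print("result after restart:", run_with_checkpoints(10, path))
```

Because the checkpoint is an ordinary file, the restart could equally happen on another machine that can read it, which is the "any machine" property the table mentions.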

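Task resubmission can be sketched in the same spirit. The helper `run_with_resubmission`, the host names, and the round-robin choice of the next host are hypothetical; a real scheduler would apply its own placement policy. The sketch only shows the idea from the table: a failed task is submitted again, to the same host or a different one, and the caller sees a result rather than the intermediate failures.

```python
def run_with_resubmission(task, hosts, max_attempts=3):
    """Run `task(host)` and resubmit it on failure.

    Hosts are tried in round-robin order, so with a single host the
    task is resubmitted to the same host, and with several hosts it
    moves to a different one. Returns (host, result) on success.
    """
    errors = []
    for attempt in range(max_attempts):
        host = hosts[attempt % len(hosts)]
        try:
            return host, task(host)
        except Exception as exc:
            # Record the failure and fall through to the next attempt.
            errors.append((host, exc))
    raise RuntimeError(f"task failed after {max_attempts} attempts: {errors}")


if __name__ == "__main__":
    def flaky_task(host):
        # Hypothetical task that always fails on one particular host.
        if host == "node-a":
            raise ConnectionError("node-a is unreachable")
        return f"done on {host}"

    print(run_with_resubmission(flaky_task, ["node-a", "node-b"]))
```

Bounding the number of attempts is a design choice of this sketch: it keeps a permanently failing task from being resubmitted forever.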