Table 4 Failure classification in cloud computing systems

From: Reliability and high availability in cloud computing environments: a reference roadmap

Failure classification

Failure modes

Description

Software failures

System/application software failure [36]

Cloud tasks and VM hypervisors are software programs running on different computing nodes, and they may contain software faults, bugs, and errors

Database failure [36]

Any database system can suffer a hardware or software failure, so database systems are prone to data loss

Hardware failures

Hardware component failure [65]

Computing resources generally contain hardware components (such as storage devices, processing elements, and memory) that may themselves fail

Network failure [65]

When cloud tasks access remote data sources, the communication channels may break, causing a network failure, especially during long transmissions of large datasets

Cloud management system (CMS) failures

Overflow [66, 67]

The request queue usually has a limit on the maximum number of incoming requests, since waiting too long in the queue can cause a timeout failure for new requests. If the queue is full, new requests are simply dropped, which is called an overflow failure

Timeout [66, 67]

A cloud service commonly has a due time set by its owner or by the service monitor. If the waiting time of a queued request exceeds the due time, a timeout failure occurs, and the timed-out request is dropped from the queue

Data resource missing [66, 67]

In the CMS, the data resource manager is responsible for registering data resources. However, some previously registered data may be removed without the data resource manager being updated, in which case a data resource missing failure occurs

Computing resource missing [66, 67]

A computing resource missing failure is analogous to a data resource missing failure and can also occur in the cloud management system. It happens, for example, when a PC is turned off without notifying the CMS

Security failures

Customer faults [68]

Recent research shows that only a small portion of the security failures affecting cloud service consumers have been the provider’s fault. According to Gartner’s top predictions for IT users for 2016 and beyond, about 95% of cloud security failures through 2020 will be the customer’s fault

Software security breaches [69]

Software security breaches can lead to cloud service failures. When attackers gain access to customer information, such as login data and credits, through security breaches in cloud-based software, the result can be serious problems for customers who rely on cloud-based services for their daily activities

Security policy failure [69]

Misjudging cloud security requirements when drafting a security policy is a major challenge that can lead to system failures. Common mistakes in defining a comprehensive security policy are among the main causes of security failures

Human operational faults

Misoperation [67]

This kind of failure results from accidental faults made by personnel operating or configuring the system, both during system updates and during repairs. How far a misoperation affects the cloud system depends on the level at which the fault occurs

Misconfiguration [67]

Misconfigured network node software can affect a whole cluster or even a whole datacenter in a cloud system. The worst case, however, is misconfiguration of the cloud management software, which can bring down the entire cloud at once

Environmental failures

Environmental disasters [67]

Environmental disasters can play a major role in the dependability of a cloud system. Events such as floods, power outages, and fires are outside the control of the service provider, yet they can interrupt service provision at any time. Because such disasters affect a whole cloud datacenter, they can cause very large-scale service disruption

Cooling system failure [67]

The operation of physical servers in a cloud datacenter also depends on the thermal conditions of the location where they are installed. A failure of the air-conditioning system in that location therefore also causes a failure in service provision: the servers either shut down completely or are under-utilized for offering services, and hence can be regarded as unavailable
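
The overflow and timeout failure modes listed under cloud management system (CMS) failures in the table can be illustrated with a minimal Python sketch of a bounded request queue. The names BoundedRequestQueue and Request, and the capacity and due_time parameters, are hypothetical and chosen only for illustration; they do not correspond to any particular CMS or to an implementation given in the cited references.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical request record: a label and its arrival time in seconds."""
    name: str
    arrival: float


class BoundedRequestQueue:
    """Illustrative queue with a capacity limit and a due time (assumed names)."""

    def __init__(self, capacity: int, due_time: float):
        self.capacity = capacity      # maximum number of queued requests
        self.due_time = due_time      # maximum allowed waiting time in seconds
        self._queue = deque()
        self.overflowed = []          # requests dropped because the queue was full
        self.timed_out = []           # requests dropped because they waited too long

    def submit(self, request: Request) -> bool:
        """Enqueue a request; drop it (overflow failure) if the queue is full."""
        if len(self._queue) >= self.capacity:
            self.overflowed.append(request)
            return False
        self._queue.append(request)
        return True

    def next_request(self, now: float):
        """Return the next request still within its due time, or None.

        Requests whose waiting time exceeds the due time are dropped
        from the queue (timeout failure) and skipped.
        """
        while self._queue:
            request = self._queue.popleft()
            if now - request.arrival > self.due_time:
                self.timed_out.append(request)
                continue
            return request
        return None


# Usage: capacity of two requests, due time of five seconds.
queue = BoundedRequestQueue(capacity=2, due_time=5.0)
queue.submit(Request("r1", arrival=0.0))   # accepted
queue.submit(Request("r2", arrival=4.0))   # accepted
queue.submit(Request("r3", arrival=4.5))   # queue full: overflow failure
served = queue.next_request(now=6.0)       # r1 has waited 6.0 s: timeout; r2 is served
```

In this sketch the two failure modes are recorded separately, which mirrors the distinction the table draws: an overflow is decided at submission time from the queue length, while a timeout is decided at service time from the waiting time.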