From: Reliability and high availability in cloud computing environments: a reference roadmap
Failure classification | Failure modes | Description |
---|---|---|
Software failures | System/application software failure [36] | Cloud tasks and VM hypervisors are software programs running on different computing nodes, and they may contain software faults, bugs, and errors |
Software failures | Database failure [36] | Any database system can suffer a hardware or software failure, so database systems are prone to losing data |
Hardware failures | Hardware component failure [65] | Computing resources generally contain hardware components (such as storage devices, processing elements, and memory) that may themselves fail |
Hardware failures | Network failure [65] | When cloud tasks access remote data sources, the communication channels may break and cause a network failure, especially during long transmissions of large datasets |
Cloud management system (CMS) failures | Overflow | There is usually a limit on the number of incoming requests that can wait in the queue. If the queue is full, new requests are simply dropped, which is called an overflow failure |
Cloud management system (CMS) failures | Timeout | A cloud service commonly has a due time set by its owner or by the service monitor. If the waiting time of a queued request exceeds the due time, a timeout failure occurs and the request is dropped from the queue |
Cloud management system (CMS) failures | Data resource missing | In the CMS, the data resource manager registers data resources. If previously registered data are removed but the data resource manager is not updated, a data resource missing failure occurs |
Cloud management system (CMS) failures | Computing resource missing | Computing resource missing is a failure similar to data resource missing that can also occur in the cloud management system. It happens, for example, when a PC is turned off without notifying the CMS |
Security failures | Customer faults [68] | Recent research shows that only a small portion of the security failures affecting cloud service consumers have been the provider’s fault. According to Gartner’s top predictions for IT users for 2016 and beyond, about 95% of cloud security failures through 2020 will be the customer’s fault |
Security failures | Software security breaches [69] | Software security breaches can lead to cloud service failure. When attackers gain access to customer information such as login data, credits, etc. through breaches in cloud-based software, the result can be serious problems for customers who rely on the cloud for their daily activities |
Security failures | Security policy failure [69] | Misjudging the cloud security requirements when drawing up a security policy is a major challenge and can lead to system failures. Common mistakes in defining a comprehensive security policy are among the main reasons for security failure |
Human Operational Faults | Misoperation [67] | This kind of failure stems from accidental faults made by the personnel who operate or configure the system, both during system updates and during repairs. The extent to which a misoperation affects the cloud system can depend on the level at which the fault occurs |
Human Operational Faults | Misconfiguration [67] | Misconfigured network node software can affect a whole cluster or even a whole datacenter in a cloud system. The worst case, however, is misconfiguration of the cloud management software, which can bring down the entire cloud at once |
Environmental Failures | Environmental disasters [67] | Environmental disasters can play a major role in the dependability of a cloud system. Factors such as floods, power outages, and fires are outside the control of the service provider but can interrupt service provision at any time. Because such disasters affect a whole cloud datacenter, they can cause very large-scale service disruption |
Environmental Failures | Cooling system failure [67] | The operation of physical servers in a cloud datacenter also depends on the thermal conditions of the location where they are installed, so a failure of the air-conditioning system also causes failures in service provision. The servers will either shut down completely or be under-utilized for offering services, and hence can be regarded as unavailable |
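
The two queue-related CMS failures in the table, overflow and timeout, can be illustrated with a small Python sketch. This is only a toy model under stated assumptions, not the mechanism of any particular cloud management system: the class `RequestQueue`, its parameters `max_length` and `due_time`, and the method names are all hypothetical, and time is passed in explicitly as seconds so the behaviour is easy to follow.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class RequestQueue:
    """Toy model of a CMS request queue (all names are hypothetical)."""

    max_length: int  # assumed limit on the number of queued requests
    due_time: float  # assumed maximum waiting time, in seconds
    _items: deque = field(default_factory=deque)  # (request_id, arrival_time)

    def submit(self, request_id: str, now: float) -> bool:
        """Enqueue a request; return False on an overflow failure."""
        if len(self._items) >= self.max_length:
            return False  # queue is full: the new request is dropped
        self._items.append((request_id, now))
        return True

    def drop_timed_out(self, now: float) -> list[str]:
        """Remove and return requests that waited longer than the due time."""
        timed_out = []
        # Requests are stored in arrival order (assuming `now` never
        # decreases between calls), so the oldest one is at the left end.
        while self._items and now - self._items[0][1] > self.due_time:
            request_id, _ = self._items.popleft()
            timed_out.append(request_id)
        return timed_out

    def pending(self) -> list[str]:
        """Return the ids of the requests still waiting in the queue."""
        return [request_id for request_id, _ in self._items]


queue = RequestQueue(max_length=2, due_time=5.0)
queue.submit("a", now=0.0)
queue.submit("b", now=3.0)
overflowed = not queue.submit("c", now=4.0)  # queue already holds two requests
expired = queue.drop_timed_out(now=6.0)      # "a" has waited 6 s, "b" only 3 s
```

In this sketch an overflow failure affects only the newly arriving request, while a timeout failure removes requests that are already queued, which matches the distinction drawn in the table.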