- Open Access
On construction of a cloud storage system with heterogeneous software-defined storage technologies
© The Author(s) 2019
- Received: 13 October 2018
- Accepted: 19 March 2019
- Published: 2 April 2019
With the rapid development of networks and information technologies, cloud computing is becoming increasingly popular, and the range of available cloud services keeps growing. Through cloud services, users can submit their requests to the cloud environment via the Internet and receive the processed results; cloud storage is one example of such a service. Software-Defined Storage (SDS) is a virtualization technology for cloud storage services: it uses software to integrate storage resources and to improve the accessibility and usability of storage services. Many open source projects are currently available for SDS development. This work aims to use these open source projects to integrate hardware and software resources more efficiently. Specifically, we propose a cloud storage system that integrates various open source SDS software to make cloud storage services more compatible and user friendly. The resulting cloud service system can also be managed more conveniently and flexibly. The experimental results demonstrate the benefits of the proposed system.
- Cloud service
- Storage service
- Software-defined storage
- Automatic distribution
In the last decade, cloud computing has attracted increasing attention in both industry and academia [1–8]. It has profoundly changed people’s lives owing to its inherent advantages, such as on-demand self-service, resource pooling and rapid elasticity. With the services provided by cloud computing, users can submit their requests to a cloud environment via the Internet and receive the results once they have been processed there. Among these services, cloud storage is one of the most important [9–12]. Cloud storage turns data storage into a service in which data is outsourced to a cloud server maintained by a cloud provider, so that data can be stored remotely in the cloud efficiently and safely. The service therefore attracts many users, especially enterprises, because it brings appealing benefits, e.g., avoiding capital expenditure on hardware and software and relieving the burden of storage management [13–15].
We implemented a cloud storage system that integrates various SDS technologies using cubic spline interpolation and a file distribution mechanism. The proposed system consisted of three main components: the storage service, the file distribution mechanism and the user service. In addition, since the size of a user’s file cannot be predicted and uploaded files generally differ in size from those used in our measurements, we solved this problem by applying cubic spline interpolation to estimate performance at arbitrary file sizes.
In the system architecture, we used open source software to make the system more compatible. In addition, after a user uploaded a file, it was automatically assigned to an appropriate storage location.
We designed a user-friendly interface through which users could easily upload their files and view both the storage usage percentages and the status of their upload jobs. In addition, managers could freely set the system parameters, which made the system more flexible.
During the early development of cloud services, the exact meaning of “software-defined” services was not yet settled. As software became more important, VMware first proposed the concept of the “software-defined data center”. By applying virtualization to pool hardware resources, software could be used to control how those resources are arranged. When programmable software controls the arrangement of hardware resources, users do not need to think about how to manipulate servers, handle security or allocate resources; in other words, the underlying resources are abstracted away from them [16–18]. Cloud computing opened up further possibilities, allowing software-defined services to be applied as distinct concepts across hardware and software architectures. These concepts in turn enabled the creation of custom functions and the automation of operations. Accordingly, many research papers and commercial products related to software-defined storage have been proposed.
Yang et al.  proposed an integrated storage service. They used the open source software OpenStack  to build and manage cloud services, and used software to integrate storage resources, including Hadoop HDFS, Ceph and Swift on OpenStack, to achieve an SDS design. With such software, different storage devices can be integrated into a single storage array to build a virtual storage pool, so that the services provided to users are not limited by the underlying storage devices. Our work primarily follows the concepts in , but we improve the system architecture and propose a mechanism to store data efficiently. In addition, we provide a new, friendlier user interface.
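The virtual storage pool idea can be illustrated with a minimal Python sketch. It assumes a hypothetical `Backend` interface with `put`/`get` methods; `InMemoryBackend` is a stand-in for real stores such as Swift or Ceph, and `StoragePool` and its placement-policy callback are illustrative names, not part of OpenStack or of the system described here. The point is that callers address one pool while the pool records which backend holds each object.

```python
from typing import Callable, Dict, Mapping, Protocol


class Backend(Protocol):
    """Minimal interface a storage backend must offer (hypothetical)."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryBackend:
    """Dictionary-backed stand-in for a real object store such as Swift or Ceph."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


# A placement policy maps (key, data) to the name of a backend.
Policy = Callable[[str, bytes], str]


class StoragePool:
    """Presents several heterogeneous backends to users as a single store."""

    def __init__(self, backends: Mapping[str, Backend], policy: Policy) -> None:
        self._backends = backends
        self._policy = policy
        self._location: Dict[str, str] = {}  # object key -> backend name

    def put(self, key: str, data: bytes) -> str:
        name = self._policy(key, data)
        self._backends[name].put(key, data)
        self._location[key] = name
        return name

    def get(self, key: str) -> bytes:
        # Users never name a backend; the pool looks up where the object lives.
        return self._backends[self._location[key]].get(key)
```

For example, a toy policy that sends objects larger than 1 KiB to one backend and smaller ones to the other is enough to exercise the sketch; a real system would plug in its own placement logic instead.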
The EMC Virtualization Platform Reinvented (ViPR)  is a logical storage system, not a physical storage system. It can integrate EMC storage and third-party storage in a storage pool, and manage them as a single system while retaining the value of the original storage. ViPR can replicate data across different locations and data centers with different storage products, and provides a unified block store, object store, file system and other services. ViPR also provides a unified metadata service and self-service deployment, as well as measurement and monitoring services.
Ankur Agrrawal et al.  proposed a file system architecture that organizes data and metadata efficiently and enables sharing, while exploiting the power of storage virtualization and maintaining simplicity in such a highly complex and virtualized environment. Tahani Hussain assessed the performance of an existing enterprise network before and after deploying distributed storage systems . In that study, simulating an enterprise network with 680 clients and 54 servers and then redesigning the system improved the storage system throughput by 13.9% and reduced the average response time by 24.4% and the packet loss rate by 38.3%.
Peng and Jiang  proposed a solution for building a cloud storage service system based on an open-source distributed database. Dejun Wang  proposed an efficient cloud storage model for heterogeneous cloud infrastructures and validated the model with numerical examples through extensive testing. He also highlighted how cloud storage systems differ from traditional storage: because cloud storage services operate in a wide range of complex network environments and are designed to meet the demands of large numbers of users, indicators such as performance, data security, reliability and efficiency must all be taken into consideration.
In this section, we introduce the system architecture and its implementation, which adopts open-source software to ease future development and maintenance. The heterogeneous storage technologies integrated into the system are complete, practical object storage systems. In addition, a graphical user interface lets an administrator change the parameters to make the system more flexible.
The cyan components are in charge of calculating hashes in real time.
The pink components are in charge of indexing the hashes of suffix and partition directories, receiving and sending requests to compare the hash of a partition or suffix, and pushing jobs that replicate suffix directories onto the replication queue.
The gray component, called the partition-monitor, periodically checks whether a partition needs to be moved.
The green component, called the suffix-transporter, monitors the replication queue and invokes rsync to synchronize the suffix directories.
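The division of labour in this legend resembles the hash-based replication used by OpenStack Swift: compare per-suffix hashes between replicas and rsync only the suffix directories that differ. The Python sketch below shows that comparison step under simplifying assumptions. A partition is modelled in memory as a mapping from suffix name to `{object name: bytes}`, and `suffix_hash`, `partition_hashes`, `enqueue_replication_jobs` and `drain_queue` are hypothetical names; the transporter only builds the rsync command lines instead of running them.

```python
import hashlib
from collections import deque
from typing import Deque, Dict, List

# Hypothetical in-memory model of one partition: {suffix: {object_name: object_bytes}}.
Partition = Dict[str, Dict[str, bytes]]


def suffix_hash(objects: Dict[str, bytes]) -> str:
    """Hash every object name and content in one suffix directory, in a fixed order."""
    h = hashlib.md5()
    for name in sorted(objects):
        h.update(name.encode() + b"\0")
        h.update(hashlib.md5(objects[name]).digest())
    return h.hexdigest()


def partition_hashes(partition: Partition) -> Dict[str, str]:
    """Index the hash of each suffix in a partition."""
    return {suffix: suffix_hash(objs) for suffix, objs in partition.items()}


def enqueue_replication_jobs(local: Dict[str, str], remote: Dict[str, str],
                             queue: Deque[str]) -> None:
    """Queue every local suffix whose hash is missing or different on the remote replica."""
    for suffix in sorted(local):
        if remote.get(suffix) != local[suffix]:
            queue.append(suffix)


def drain_queue(queue: Deque[str], partition_path: str, remote_host: str) -> List[List[str]]:
    """Turn queued suffixes into rsync command lines; a real transporter
    would hand each command to subprocess.run instead of returning it."""
    commands = []
    while queue:
        suffix = queue.popleft()
        src = f"{partition_path}/{suffix}/"
        dst = f"{remote_host}:{partition_path}/{suffix}/"
        commands.append(["rsync", "-a", src, dst])
    return commands
```

Only suffixes whose hashes disagree generate rsync work, which is what keeps this style of replication cheap when most of a partition is already in sync.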
The implementation of the proposed system consists of three main components: the storage service deployment, the file distribution mechanism and the user services. Each component is introduced in detail in the following subsections.
The deployment of storage services
In the first part, we introduce the storage services. We create VMs that form a storage cluster. Then, we use the open source software OpenStack to build and manage the cloud system.
The mechanism of file distribution
Interpolation using cubic splines has been well studied in [29–31]. In , the basis of cubic spline interpolation was introduced. Miao et al.  employed the cubic spline method to predict the storage volume of a data center by interpolating the storage volume time series, so that an entire time series with the same number of samples as the original series can be reconstructed. In addition, Mastorakis  showed that the cubic spline method is well suited to anomaly detection in cloud environments. A cubic spline is a spline constructed of piecewise third-order polynomials that pass through a set of m control points. The second derivative of each polynomial is commonly set to zero at the endpoints, since this provides the boundary conditions that complete the system of m-2 equations. This produces a so-called “natural” cubic spline and leads to a simple tridiagonal system that can be solved easily to give the coefficients of the polynomials.
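The natural cubic spline described above can be written in a few lines of Python. This is a generic textbook construction, not code from the paper: it solves the m-2 interior equations for the second derivatives \(M_i\), with the natural boundary \(M_0 = M_{m-1} = 0\), using the Thomas algorithm for the tridiagonal system, and then evaluates the piecewise cubic. The function name `natural_cubic_spline` is illustrative.

```python
from bisect import bisect_right
from typing import Callable, List, Sequence


def natural_cubic_spline(xs: Sequence[float], ys: Sequence[float]) -> Callable[[float], float]:
    """Return S(x) interpolating (xs[i], ys[i]) with a natural cubic spline.

    xs must be strictly increasing. The m-2 interior equations are solved for
    the second derivatives M[1..m-2]; the natural boundary fixes M[0] = M[m-1] = 0.
    """
    xs, ys = list(xs), list(ys)
    n = len(xs) - 1  # number of intervals (m = n + 1 control points)
    if n < 1 or len(ys) != len(xs):
        raise ValueError("need at least two points and equal-length xs, ys")
    h = [xs[i + 1] - xs[i] for i in range(n)]
    if any(hi <= 0 for hi in h):
        raise ValueError("xs must be strictly increasing")

    M: List[float] = [0.0] * (n + 1)
    k = n - 1  # number of interior unknowns
    if k > 0:
        # Row for interior point i:
        # h[i-1]*M[i-1] + 2(h[i-1]+h[i])*M[i] + h[i]*M[i+1] = 6*(slope_i - slope_{i-1})
        sub = [h[i - 1] for i in range(1, n)]
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
        sup = [h[i] for i in range(1, n)]
        rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
               for i in range(1, n)]
        # Thomas algorithm: forward elimination ...
        for j in range(1, k):
            w = sub[j] / diag[j - 1]
            diag[j] -= w * sup[j - 1]
            rhs[j] -= w * rhs[j - 1]
        # ... followed by back substitution.
        sol = [0.0] * k
        sol[-1] = rhs[-1] / diag[-1]
        for j in range(k - 2, -1, -1):
            sol[j] = (rhs[j] - sup[j] * sol[j + 1]) / diag[j]
        M[1:n] = sol

    def S(x: float) -> float:
        # Locate the interval containing x; points outside [xs[0], xs[-1]] use the end pieces.
        i = min(max(bisect_right(xs, x) - 1, 0), n - 1)
        hi, a, b = h[i], xs[i + 1] - x, x - xs[i]
        return ((M[i] * a ** 3 + M[i + 1] * b ** 3) / (6.0 * hi)
                + (ys[i] / hi - M[i] * hi / 6.0) * a
                + (ys[i + 1] / hi - M[i + 1] * hi / 6.0) * b)

    return S
```

In the setting of this paper, `xs` would be the file sizes used in the experiments and `ys` the measured speeds, so the spline gives an estimated speed for any uploaded file size within the measured range; outside that range the end cubic pieces are extrapolated, which is less reliable.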
\(f_t(S)\) represents the transfer speed obtained in the transfer speed experiment when the file size is S.
\(f_c(S)\) represents the transfer speed obtained in the storage capacity experiment when the file size is S.
\(\alpha\) and \(\beta\) are the weights, with default values of 0.5. The sum of these two weights equals one.
\(f_K(S)\) represents the resulting transfer speed of the storage service, which is used to compare the performance of the storage services.
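The equation that combines these quantities does not appear in this text. Read together, the definitions suggest a weighted sum, \(f_K(S) = \alpha f_t(S) + \beta f_c(S)\) with \(\alpha + \beta = 1\); the Python sketch below assumes that form. It takes \(f_t\) and \(f_c\) as arbitrary callables (for example, spline interpolants of the measured speeds), and the names `combined_speed` and `choose_backend` are hypothetical.

```python
from typing import Callable, Mapping, Tuple

SpeedFn = Callable[[float], float]  # file size -> estimated transfer speed


def combined_speed(f_t: SpeedFn, f_c: SpeedFn, size: float,
                   alpha: float = 0.5, beta: float = 0.5) -> float:
    """Assumed form f_K(S) = alpha * f_t(S) + beta * f_c(S), with alpha + beta = 1."""
    if abs(alpha + beta - 1.0) > 1e-9:
        raise ValueError("alpha and beta must sum to one")
    return alpha * f_t(size) + beta * f_c(size)


def choose_backend(backends: Mapping[str, Tuple[SpeedFn, SpeedFn]], size: float,
                   alpha: float = 0.5, beta: float = 0.5) -> str:
    """Pick the storage service with the highest f_K for a file of this size."""
    def score(name: str) -> float:
        f_t, f_c = backends[name]
        return combined_speed(f_t, f_c, size, alpha, beta)

    return max(backends, key=score)
```

With the default weights, a service that is slower in one experiment but faster in the other can still be chosen; moving α toward 1 lets the transfer speed experiment dominate the choice.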
In this section, we present the experimental results and evaluate the performance of the system implementation. We first perform efficacy experiments to demonstrate the benefits of our system infrastructure. Next, we measure the speed of each object storage system; these measurements form the basis of the file distribution mechanism. Finally, we present the user interface of our system.
Setup of the experimental environment
Storage environment specifications
Performance evaluations of our system
Based on the earlier network bandwidth measurements, VMs deployed on the same host have almost the same bandwidth. We therefore select swift01, swift02, OpenStack compute01 and OpenStack compute02 to compare their disk write and read speeds. The results show that the VMs cannot take full advantage of the available disk read and write resources, which must be taken into account when deploying the storage system. These I/O tests can also help locate and relieve bottlenecks when problems are encountered. In addition, the disk read and write results help us decide how many VMs to deploy on each physical machine and how best to deploy the storage cluster.
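The text does not name the tool used for the disk tests. As one possible way to obtain comparable numbers on each VM, the Python sketch below times a sequential write (forced to disk with `os.fsync`) and a sequential read of a temporary file; the function name and default sizes are assumptions. Because the file is read back immediately, the read figure is likely to be served largely from the page cache and should be treated as optimistic; dedicated tools such as `fio` or `dd` give more controlled measurements.

```python
import os
import tempfile
import time
from typing import Optional, Tuple


def sequential_write_read_mbps(size_mb: int = 256, block_kb: int = 1024,
                               directory: Optional[str] = None) -> Tuple[float, float]:
    """Return (write MB/s, read MB/s) for one temporary file created in `directory`."""
    block_size = block_kb * 1024
    n_blocks = max(1, (size_mb * 1024) // block_kb)
    block = os.urandom(block_size)  # incompressible test data
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(n_blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # include the time to reach the device, not just the cache
        write_s = time.perf_counter() - start

        start = time.perf_counter()
        with open(path, "rb") as f:
            while f.read(block_size):
                pass
        read_s = time.perf_counter() - start  # likely cache-assisted
    finally:
        os.remove(path)

    total_mb = n_blocks * block_size / (1024 * 1024)
    # Guard against a zero timer delta on very small files.
    return total_mb / max(write_s, 1e-9), total_mb / max(read_s, 1e-9)
```

Running it with `directory` pointing at the disk that backs each VM's storage (for example on swift01 and compute01) gives figures that can be compared directly across machines.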
User interface design
As shown in Fig. 13, the system overview page contains two panels, which show the storage usage percentages and the account list. Three small liquid fill gauges display the percentages for the total usage, the Swift usage and the Ceph usage; more detailed information appears when the mouse hovers over a gauge, as shown in Fig. 13. In addition, a table showing information for all accounts is displayed when the user logs in as an administrator.
The “My Storage” page is the main operating area of our system. When the page loads, a file list is shown in the middle of the page, and a drop-down menu pops up when the user right-clicks a file name, as shown in Fig. 14. The drop-down menu provides four functions: download, delete, rename and detailed information. All functions related to storage operations are available on this page.
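How the total-usage gauge is computed is not stated. One natural choice, assumed in the Python sketch below, is aggregate used space divided by aggregate capacity rather than the mean of the per-backend percentages; the two differ whenever Swift and Ceph have different capacities. The function name and the backend keys are hypothetical.

```python
from typing import Dict


def usage_percentages(used: Dict[str, int], capacity: Dict[str, int]) -> Dict[str, float]:
    """Per-backend usage percentages plus an aggregate 'total' entry.

    Assumes byte counts per backend; 'total' is sum(used) / sum(capacity).
    """
    result = {name: 100.0 * used[name] / capacity[name] for name in capacity}
    result["total"] = 100.0 * sum(used[name] for name in capacity) / sum(capacity.values())
    return result
```

For example, with 50 GB used of 100 GB on Swift and 50 GB used of 400 GB on Ceph, the three gauges would read 50%, 12.5% and 20% total, whereas averaging the two percentages would give 31.25%.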
Friendly user interface: a visualization of the upload progress is provided. This makes it easy for users to monitor and control their uploading jobs.
Supports the upload of multiple files: users can upload multiple files at the same time.
Background processing: users can upload their files in the background while using other functions of the “My Storage” page.
In this work, we implemented a cloud storage system that integrates open source storage software to provide a software-defined storage service. The system uses a distributed cloud architecture to provide highly reliable and scalable cloud services that integrate several software storage technologies. In addition, we provided a highly usable interface to make the proposed system more user friendly. In the future, we plan to build a larger system with more VMs that integrates more heterogeneous storage technologies.
C-TY conceptualized the study and proposed the system design. S-TC implemented the system and wrote the manuscript. Y-WC wrote and revised the manuscript. Y-CS performed the experiments. All authors read and approved the final manuscript.
This work was supported in part by the Ministry of Science and Technology, Taiwan ROC, under Grant Numbers 106-2622-E-029-002-CC3, 107-2221-E-029-008, and 107-2218-E-029-003.
The authors declare that they have no competing interests.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
- Zhou Z, Ota K, Dong M, Xu C (2017) Energy-efficient matching for resource allocation in D2D enabled cellular networks. IEEE Trans Vehicul Technol 66(6):5256–5268
- Xu C, Gao C, Zhou Z, Chang Z, Jia Y (2017) Social network-based content delivery in device-to-device underlay cellular networks using matching theory. IEEE Access 5:924–937
- Mo Y, Peng M, Xiang H, Sun Y, Ji X (2017) Resource allocation in cloud radio access networks with device-to-device communications. IEEE Access 5:1250–1262
- Foster I, Zhao Y, Raicu I, Lu S (2008) Cloud computing and grid computing 360-degree compared. In: Proceedings of the 2008 grid computing environments workshop: 2008; Austin, USA, pp 1–10
- Nurmi D, Wolski R, Grzegorczyk C, Obertelli G, Soman S, Youseff L, Zagorodnov D (2009) The Eucalyptus open-source cloud-computing system. In: Proceedings of the 2009 9th IEEE/ACM international symposium on cluster computing and the grid: 2009; Shanghai, China, pp 124–131
- Satyanarayanan M, Bahl P, Caceres R, Davies N (2009) The case for VM-based cloudlets in mobile computing. IEEE Pervasive Comput 8:14–23
- Buyya R, Yeo CS, Venugopal S (2008) Market-oriented cloud computing: Vision, hype, and reality for delivering IT services as computing utilities. In: Proceedings of the 10th IEEE international conference on high performance computing and communications: 2008; Dalian, China, pp 5–13
- Kim H-W, Jeong Y-S (2018) Secure authentication-management human-centric scheme for trusting personal resource information on mobile cloud computing with blockchain. Human-centric Comput Inform Sci 8(1):11
- Vernik G, Shulman-Peleg A, Dippl S, Formisano C, Jaeger MC, Kolodner EK, Villari M (2013) Data on-boarding in federated storage clouds. In: Proceedings of the 2013 IEEE sixth international conference on cloud computing: 2013; Santa Clara, USA, pp 244–251
- Kolodner EK, Tal S, Kyriazis D, Naor D, Allalouf M, Bonelli L, Brand P, Eckert A, Elmroth E, Gogouvitis SV, Harnik D, Hernandez F, Jaeger MC, Lakew EB, Lopez JM, Lorenz M, Messina A, Shulman-Peleg A, Talyansky R, Voulodimos A, Wolfsthal Y (2011) A cloud environment for data-intensive storage services. In: Proceedings of the 2011 IEEE third international conference on cloud computing technology and science: 29 Nov.-1 Dec. 2011; Athens, Greece, pp 357–366
- Rhea S, Wells C, Eaton P, Geels D, Zhao B, Weatherspoon H, Kubiatowicz J (2001) Maintenance-free global data storage. IEEE Internet Comput 5:40–49
- Mesnier M, Ganger GR, Riedel E (2003) Object-based storage. IEEE Commun Mag 41:84–90
- Mesbahi MR, Rahmani AM, Hosseinzadeh M (2018) Reliability and high availability in cloud computing environments: a reference roadmap. Human-centric Comput Inform Sci 8(1):20
- Zhang Y, Xu C, Liang X, Li H, Mu Y, Zhang X (2017) Efficient public verification of data integrity for cloud storage systems from indistinguishability obfuscation. IEEE Trans Inform Forensic Sec 12(3):676–688
- Ren Z, Wang L, Wang Q, Xu M (2018) Dynamic proofs of retrievability for coded cloud storage systems. IEEE Trans Serv Comput 11(4):685–698
- Li Y, Feng D, Shi Z (2013) An effective cache algorithm for heterogeneous storage systems. Sci World J 2013:693845
- Lin W, Wu W, Wang JZ (2016) A heuristic task scheduling algorithm for heterogeneous virtual clusters. Sci Program 2016:7040276
- Callegati F, Cerroni W, Contoli C (2016) Virtual networking performance in OpenStack platform for network function virtualization. J Elec Comput Eng 2016:266–267
- Yang C-T, Lien W-H, Shen Y-C, Leu F-Y (2015) Implementation of a software-defined storage service with heterogeneous storage technologies. In: Proceedings of the 2015 IEEE 29th international conference on advanced information networking and applications workshops (WAINA): 24-27 March 2015, pp 102–107
- OpenStack. https://www.openstack.org/ (2015)
- EMC ViPR. http://www.emc.com/vipr (2015)
- Agrrawal A, Shankar R, Akarsh S, Madan P (2012) File system aware storage virtualization management. In: Proceedings of the 2012 IEEE international conference on cloud computing in emerging markets (CCEM): 11-12 Oct. 2012; Bangalore, India, pp 1–11
- Hussain T, Marimuthu PN, Habib SJ (2013) Managing distributed storage system through network redesign. In: Proceedings of the 2013 15th Asia-Pacific network operations and management symposium (APNOMS): 25-27 Sept. 2013; Hiroshima, Japan, pp 1–6
- Peng C, Jiang Z (2011) Building a cloud storage service system. Procedia Environ Sci 10:691–696
- Wang D (2011) An efficient cloud storage model for heterogeneous cloud infrastructures. Procedia Eng 23:510–515
- OpenStack Swift. https://wiki.openstack.org/wiki/Swift (2015)
- Weil SA, Brandt SA, Miller EL, Long DD, Maltzahn C (2006) Ceph: A scalable, high-performance distributed file system. In: Proceedings of the 7th symposium on operating systems design and implementation: 6-8 November 2006; Seattle, USA, pp 307–320
- Zheng Q, Chen H, Wang Y, Zhang J, Duan J (2013) COSBench: Cloud object storage benchmark. In: Proceedings of the 4th ACM/SPEC international conference on performance engineering (ICPE 2013): 21-24 April 2013; Prague, Czech Republic, pp 199–210
- Knott GD (2012) Interpolating Cubic Splines. Springer, Berlin
- Miao B, Dou C, Jin X (2016) Main trend extraction based on irregular sampling estimation and its application in storage volume of internet data center. Comput Intell Neurosci 2016:1–12
- Mastorakis G (2015) Resource management of mobile cloud computing networks and environments. IGI Global, Hershey