Implement Efficient Data Storage Measures
Data growth is an inescapable trend: in 2014 IDC and InformationWeek predicted a doubling of volume about every three years through 2020.1 Most strategies for efficient data storage take advantage of one or more of the following concepts, explored in greater detail below:
- Making better use of existing storage hardware
- Reducing the volume of data to be stored
- Using storage equipment that consumes less energy.
Make better use of existing storage hardware
- Storage Tiers
Tiered storage is the assignment of different categories of data to various types of storage media, with the goal of reducing total storage cost. Tiers are determined by performance needs, the cost of the storage media, and how often data is accessed. Tiered storage policies place the most frequently accessed data on the highest performing storage. Rarely accessed data goes on low-performance, cheaper storage.2
Generally speaking, lower performance storage (e.g., lower speed drives, smaller drives, MAID, tape) uses less electricity. To save energy and reduce costs, use high-speed drives only where necessary, and use slower drives for applications that do not require instantaneous response. Storage tiers can also have different levels of data protection. For example, it might be prudent to keep multiple copies of mission critical data, but this requires more hardware. Less important data can be copied just once onto slow-spinning hard disk drives (HDDs), recordable compact disks, or tapes.
Automated storage tiering (AST) is a storage software management feature that dynamically moves data between different disk types and backup levels to meet capacity, performance and cost requirements. In a 2014 survey, 32% of data center administrators reported using automated tiering.3
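The placement rule described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular AST product works: the thresholds, the `DataSet` type, the tier labels, and the sample catalog are all hypothetical, and the sketch looks only at access frequency, whereas a real policy would also weigh performance needs and media cost.

```python
from dataclasses import dataclass

# Hypothetical thresholds (accesses per 30 days); real policies are site-specific.
HOT_THRESHOLD = 100
WARM_THRESHOLD = 5


@dataclass
class DataSet:
    name: str
    accesses_last_30_days: int


def assign_tier(item: DataSet) -> str:
    """Return an illustrative tier label based on access frequency alone."""
    if item.accesses_last_30_days >= HOT_THRESHOLD:
        return "tier 1 (high-speed drives)"
    if item.accesses_last_30_days >= WARM_THRESHOLD:
        return "tier 2 (slower HDDs)"
    return "tier 3 (MAID or tape)"


# Made-up catalog used only to exercise the rule.
catalog = [
    DataSet("order-database", 12_000),
    DataSet("quarterly-reports", 20),
    DataSet("2009-email-archive", 0),
]
placement = {item.name: assign_tier(item) for item in catalog}
for name, tier in placement.items():
    print(f"{name}: {tier}")
```

An automated tiering feature would re-run a rule of this kind periodically and migrate data whose access pattern has changed.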
- Storage virtualization
Storage virtualization is the pooling of physical storage from multiple network storage devices into what appears to be a single storage device that is managed from a central console. Storage virtualization enhances storage performance, enables the use of storage tiers, and makes it easier to expand storage capacity.
Software-defined storage (SDS) environments extend storage virtualization to include features such as deduplication, thin provisioning, snapshots and more. These features are explained below.
Data centers with higher storage utilization (meaning they store more data on the average piece of hardware) need less storage equipment overall, so they need less electricity for equipment and for cooling.
- Thin Provisioning
Thin provisioning is a method of optimizing the efficiency with which available storage space is utilized. In the past, storage space was allocated beyond current needs in anticipation of data growth and increased data complexity. But because applications can suffer performance issues when their storage limits are exceeded, the result was “over provisioning” of storage. Large amounts of storage were brought online but often never fully utilized.
By contrast, thin provisioning presents an application with a virtual volume of just about any size, but allocates physical storage space on a just-enough, just-in-time basis by centrally controlling capacity and allocating space to an application as data is actually written. Thus you can allocate space for an application with data storage needs that you expect to grow in the future, but only power up storage that is currently in use. A 2014 survey revealed that 40% of IT administrators used thin provisioning.4
Gmail offers a good example of thin provisioning. Every Gmail account has a large amount of allocated capacity but, because most Gmail users use only a fraction of it, this "free space" is effectively shared among all Gmail users.
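The allocate-on-write behavior can be sketched as follows. This is a toy model under stated assumptions: the `StoragePool` and `ThinVolume` classes are hypothetical names, capacity is counted in abstract blocks, and real systems add monitoring and alerts so the shared pool is expanded before it runs out.

```python
class StoragePool:
    """Shared pool of physical blocks (counted in blocks, not bytes)."""

    def __init__(self, physical_blocks: int):
        self.physical_blocks = physical_blocks
        self.used_blocks = 0

    def allocate_block(self) -> None:
        if self.used_blocks >= self.physical_blocks:
            raise RuntimeError("pool exhausted: add physical capacity")
        self.used_blocks += 1


class ThinVolume:
    """Volume that advertises a virtual size but allocates only on write."""

    def __init__(self, pool: StoragePool, virtual_blocks: int):
        self.pool = pool
        self.virtual_blocks = virtual_blocks
        self.blocks = {}  # block index -> data, for written blocks only

    def write(self, index: int, data: bytes) -> None:
        if not 0 <= index < self.virtual_blocks:
            raise IndexError("write beyond the volume's virtual size")
        if index not in self.blocks:
            # Physical space is claimed only the first time a block is written.
            self.pool.allocate_block()
        self.blocks[index] = data

    def read(self, index: int) -> bytes:
        if not 0 <= index < self.virtual_blocks:
            raise IndexError("read beyond the volume's virtual size")
        return self.blocks.get(index, b"")  # unwritten blocks read as empty


# Two volumes together advertise 2,000 blocks on a 100-block pool.
pool = StoragePool(physical_blocks=100)
mail_a = ThinVolume(pool, virtual_blocks=1000)
mail_b = ThinVolume(pool, virtual_blocks=1000)
mail_a.write(0, b"hello")
mail_a.write(0, b"hello again")  # overwrite: no new allocation
mail_b.write(7, b"report")
mail_b.write(8, b"photo")
print(f"physical blocks in use: {pool.used_blocks} of {pool.physical_blocks}")
```

The design choice to note is that the pool, not the volume, tracks physical usage, which is what lets many over-sized virtual volumes share a small amount of powered-up hardware.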
Reduce the volume of data to be stored
- Data Compression
Compression is a reduction in the number of bits needed to represent data.5 Compression is performed by software that uses a formula or algorithm to determine how to shrink the size of the data. Compression functionality is built into a wide range of technologies, including storage systems, databases, operating systems and software applications used by enterprise organizations. A 2014 survey revealed that 60 percent of data center administrators use some form of data compression.6
Compressing data can save storage capacity, speed file transfer, and decrease costs for storage hardware and network bandwidth. The main disadvantage of compression is the performance impact and energy consumption resulting from the use of CPU and memory resources to compress and decompress the data.7 Files that are rarely accessed are better candidates for compression than files that are regularly accessed.
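Be aware that some types of data files are already compressed (e.g., JPEG, MPEG, and MP3) and gain little from further compression. When data is also encrypted, compress it before encrypting on writes, and decrypt it before decompressing on reads, because encrypted data appears random and compresses poorly.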
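A short experiment with Python's standard `zlib` module illustrates why the type of data matters. The log line and the sizes are made up for the example, and seeded pseudo-random bytes stand in for data that is already compressed or encrypted, since both look random to a compressor.

```python
import random
import zlib

# Highly repetitive data, similar to log files or sparse database pages.
repetitive = b"2014-02-01 INFO backup job completed without errors\n" * 2000

# Stand-in for already-compressed or encrypted data: pseudo-random bytes.
rng = random.Random(42)
random_like = bytes(rng.getrandbits(8) for _ in range(50_000))


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data)) / len(data)


ratio_repetitive = compression_ratio(repetitive)
ratio_random = compression_ratio(random_like)
print(f"repetitive data: {ratio_repetitive:.1%} of original size")
print(f"random-like data: {ratio_random:.1%} of original size")

# Compression is lossless: decompressing restores the original bytes.
restored = zlib.decompress(zlib.compress(repetitive))
```

Running the compressor on the random-like input still costs CPU time and memory, which is why data of this kind is usually excluded from compression policies.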
- Deduplication
Deduplication software works by retaining one unique instance of a file or data block and replacing all duplicates with a pointer to the original. For example, a typical email system might contain 100 instances of the same one megabyte (MB) file attachment. When the email platform is backed up or archived, all 100 instances are saved, requiring 100 MB of storage space. With data deduplication, only one instance of the attachment is actually stored. Each subsequent instance is just referenced back to the one saved copy. In this example, a 100 MB storage demand could be reduced to only one MB.8
A 2014 survey revealed that 55 percent of data center administrators use some form of deduplication technology.
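The email attachment example can be reproduced with a small content-addressed store. This is a sketch of file-level deduplication only, assuming a hypothetical `DedupStore` class and using SHA-256 digests as the "pointers"; commercial products typically work on blocks or variable-size chunks and handle details omitted here.

```python
import hashlib


class DedupStore:
    """Toy file-level deduplicating store keyed by content hash."""

    def __init__(self):
        self.blocks = {}  # digest -> data, each unique content stored once
        self.files = {}   # file name -> digest (the pointer to the content)

    def put(self, name: str, data: bytes) -> None:
        digest = hashlib.sha256(data).hexdigest()
        self.blocks.setdefault(digest, data)  # keep only the first copy
        self.files[name] = digest

    def get(self, name: str) -> bytes:
        return self.blocks[self.files[name]]

    def logical_bytes(self) -> int:
        """Size the files would occupy without deduplication."""
        return sum(len(self.blocks[d]) for d in self.files.values())

    def physical_bytes(self) -> int:
        """Size actually stored after deduplication."""
        return sum(len(data) for data in self.blocks.values())


MB = 1_000_000
attachment = b"x" * MB  # stand-in for a 1 MB file attachment

store = DedupStore()
for i in range(100):
    store.put(f"mailbox-{i}/attachment.pdf", attachment)

print(f"logical size:  {store.logical_bytes() // MB} MB")
print(f"physical size: {store.physical_bytes() // MB} MB")
```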
- Snapshot technology
A snapshot is the state of a computer system at a particular point in time. To understand how snapshot technology can save energy, consider some of the challenges that backing up data can present to system administrators. Each full backup of a large data set requires a large amount of storage media, and may take a long time to complete. If any user changes data while it is being backed up, problems can arise. For example, if a user moves a file into a directory that has already been backed up, then that file would be completely missing on the backup media, since the backup operation had already taken place before the addition of the file. One approach to safely backing up live data is to temporarily disable write access to data during the backup.
High-availability (e.g., 24/7) systems, however, cannot tolerate such service stoppages, even for something as important as backups. To avoid downtime, high-availability systems may instead perform the backup on a snapshot—a read-only copy of the data set frozen at a point in time—and allow applications to continue writing to their data.9 Snapshots create temporary virtual “copies” of data that only include data changes. (This is why snapshots are sometimes referred to as “delta snapshots.”) Snapshots improve storage efficiency because they require only a fraction of the disk space that an identical copy would require. The average disk space requirements for a snapshot copy are 10% to 20% of the base volume space.
A 2014 survey revealed that 62 percent of data center administrators use some form of storage-based snapshots.
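One common way to build such a snapshot is copy-on-write, sketched below. The `Volume` and `Snapshot` classes are hypothetical and hold blocks in memory; the point is only that the snapshot stores the original contents of blocks that change after it was taken, and reads everything else from the live volume.

```python
class Volume:
    """Live volume whose writes preserve old data for open snapshots."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []

    def snapshot(self):
        snap = Snapshot(self)
        self.snapshots.append(snap)
        return snap

    def write(self, index: int, data: bytes) -> None:
        # Copy-on-write: save the old block before overwriting it.
        for snap in self.snapshots:
            snap.preserve(index, self.blocks[index])
        self.blocks[index] = data


class Snapshot:
    """Read-only view of the volume as it was when the snapshot was taken."""

    def __init__(self, volume: Volume):
        self.volume = volume
        self.saved = {}  # block index -> original content of changed blocks

    def preserve(self, index: int, old_data: bytes) -> None:
        self.saved.setdefault(index, old_data)  # keep only the first version

    def read(self, index: int) -> bytes:
        if index in self.saved:
            return self.saved[index]
        return self.volume.blocks[index]  # unchanged since the snapshot


volume = Volume([f"block-{i}".encode() for i in range(10)])
snap = volume.snapshot()

# The application keeps writing while a backup reads from the snapshot.
volume.write(3, b"new data")
volume.write(3, b"newer data")
volume.write(7, b"other change")

backup = [snap.read(i) for i in range(10)]
print(f"blocks held by snapshot: {len(snap.saved)} of {len(volume.blocks)}")
```

In this run the snapshot holds 2 of 10 blocks, which is the kind of fraction the space figures above refer to; the actual share depends on how much data changes while the snapshot exists.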
Use storage equipment that consumes less energy
- ENERGY STAR certified storage equipment
Data storage products that earn the ENERGY STAR are made by leading OEMs. They usually cost the same as standard products and perform as well or better, but they are designed and/or constructed to save energy. Advantages include more efficient power supplies and features such as compression, deduplication, and snapshots. A list of certified data storage products is available from ENERGY STAR.
- Lower-speed hard drives
Higher spin speeds on high-performance hard disk drives (HDDs), for example 15,000 revolutions per minute (RPM), mean faster read/write speeds. All things being equal, energy consumption is proportional to the cube of disk spin speed. By that rule, a 7,200 RPM drive would draw roughly (7,200 / 15,000)^3 ≈ 11% of the spin-related power of a 15,000 RPM drive. To boost energy efficiency, consider slower HDDs (e.g., 7,200 RPM) for applications where slower read/write speeds won’t adversely impact operations.
- Massive Array of Idle Disks (MAID)
MAID (massive array of idle disks) is a storage technology in which only those disk drives in active use are spinning at any given time. In other words, it powers up individual hard disks only when an application needs to access the data stored on that disk. MAID reduces power consumption and prolongs the lives of the drives.
A MAID system can have hundreds or even thousands of individual disk drives. MAID is often a good solution for “tier 3” data storage (data accessed infrequently). About 20% of data center administrators reported using MAID systems in a 2014 survey by InformationWeek.10
- RAID level
RAID (redundant array of independent disks) is a data storage technology that combines multiple disk drive components into a single logical unit. Different “RAID levels” are defined based on the level of redundancy and performance that they offer. For example, RAID 1 creates a duplicate copy of data, but also doubles hardware and energy consumption. By contrast, RAID 5 guards against a single disk drive failure in a RAID array by reconstructing the failed disk information from data distributed on the remaining drives. The tradeoff in performance and data security makes RAID 5 a good choice for storing data that is not mission critical.
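The reconstruction step in RAID 5 relies on XOR parity, which can be demonstrated in a few lines. This is a simplified sketch: the block contents are made up, all blocks are assumed to be the same length, and real arrays rotate the parity blocks across all drives rather than keeping them on one.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks plus one parity block on a fourth drive.
data_blocks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data_blocks)

# Simulate losing the drive that held the second data block.
surviving = [data_blocks[0], data_blocks[2], parity]

# XOR of everything that survived yields the missing block.
rebuilt = xor_blocks(surviving)
print(rebuilt == data_blocks[1])
```

Because the parity costs one block per stripe regardless of how many data blocks the stripe holds, the overhead is a single drive's worth of capacity, compared with a full duplicate set in RAID 1.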
- Scale-out storage
Scale-out storage is a data storage architecture in which the total amount of disk space can be expanded through the addition of new hardware. In a scale-out system, new hardware can be added and configured as the need arises. When a scale-out system reaches its storage limit, another storage array can be added to expand the system capacity.
Before scale-out storage became popular, enterprises often purchased storage arrays much larger than needed in order to ensure that plenty of disk space would be available for future expansion.11 If that expansion never occurred or requirements turned out to be less than anticipated, much of the originally purchased disk space went to waste, consuming energy all the while.
- Solid-state drives (SSDs)
Like a memory stick, an SSD has no moving parts; data is stored in microchips. By contrast, a hard disk drive uses a mechanical arm with a read/write head that moves to read data from various locations on a spinning storage platter. This difference is what makes an SSD so much faster than a hard disk drive and gives it better performance per watt. SSDs also generate less heat, which can reduce data center cooling costs.
However, the speed and energy-efficiency of SSDs comes with one main tradeoff: they are considerably more expensive per gigabyte than hard-disk storage.
- Tape storage
When data is not being written or accessed, tape storage consumes no energy. In addition, the rise of ransomware (a type of malicious software that encrypts your data until a sum of money is paid) has renewed interest in old-fashioned tape backup systems. This is because tapes do not allow direct data access, and thus provide protection against ransomware.12
Savings & Costs
- Storage virtualization
Storage utilization, which typically averages 30-50% in a non-virtualized environment, can exceed 80% with storage virtualization.13
- Deduplication
Deduplication software can condense the amount of data stored at many organizations by as much as 95%. (See Table 1.) Storing less data requires fewer hardware resources, which in turn consume less energy.
Scenario | Content | Typical Storage Space Savings |
---|---|---|
User documents | Documents, photos, music, videos | 30-50% |
Deployment shares | Software binaries, cab files, symbols files | 70-80% |
Virtualization libraries | Virtual hard disk files | 80-95% |
General file share | All of the above | 50-60% |
Table 1: Typical Windows Server 2012 data deduplication savings for various content types.14
- RAID level
Because it requires only one extra “redundant” disk, RAID 5 saves energy compared to RAID 1. For example, moving data from a 20-disk RAID 1 array to an 11-disk RAID 5 array would reduce storage energy use by 9 / 20 = 45%.
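The arithmetic generalizes to other array sizes. The sketch below assumes, as the example does, that every disk draws the same power and that both arrays hold the same amount of usable data; the function names are illustrative only.

```python
def raid1_disks(data_disks: int) -> int:
    """RAID 1 mirrors every data disk, so the disk count doubles."""
    return 2 * data_disks


def raid5_disks(data_disks: int) -> int:
    """RAID 5 adds one disk's worth of parity to the array."""
    return data_disks + 1


def energy_saving(disks_before: int, disks_after: int) -> float:
    """Fractional saving, assuming equal power draw per disk."""
    return (disks_before - disks_after) / disks_before


data_disks = 10  # usable capacity, in disks
before = raid1_disks(data_disks)
after = raid5_disks(data_disks)
saving = energy_saving(before, after)
print(f"RAID 1: {before} disks, RAID 5: {after} disks, saving: {saving:.0%}")
```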
Tips & Considerations
Energy efficiency has become important in data centers, but managers tend to focus their efficiency efforts on HVAC and airflow. Raise awareness about opportunities to save energy with data storage equipment and add it to the list of criteria by which your organization evaluates new data center purchases.
1 2014 State of Enterprise Storage, by Kurt Marko, InformationWeek Reports, February 2014.
2 Tiered Storage, by Margaret Rouse, TechTarget, August 2016. http://searchstorage.techtarget.com/definition/tiered-storage
3 2014 State of Storage, by Kurt Marko, InformationWeek Reports, February 2014.
4 2014 State of Storage, by Kurt Marko, InformationWeek Reports, February 2014.
5 Compression, by Margaret Rouse, TechTarget, March 2015. http://searchstorage.techtarget.com/definition/compression
6 2014 State of Storage, by Kurt Marko, InformationWeek Reports, February 2014.
7 Compression, by Margaret Rouse, TechTarget, March 2015. http://searchstorage.techtarget.com/definition/compression
8 Data Deduplication (Intelligent Compression or Single-instance Storage), by Margaret Rouse, TechTarget, July 2010. http://searchstorage.techtarget.com/definition/data-deduplication
9 Snapshot (computer storage), Wikipedia. https://en.wikipedia.org/wiki/Snapshot_(computer_storage)
10 2014 State of Storage, by Kurt Marko, InformationWeek Reports, February 2014.
11 Scale-out Storage, by Margaret Rouse, TechTarget, February 2016. http://whatis.techtarget.com/definition/scale-out-storage
12 2014 State of Enterprise Storage, by Kurt Marko, InformationWeek Reports, February 2014.
13 Implementing the IBM SAN Volume Controller and FlashSystem 820, by J. Tate, D. Bryant, C. Burns, J. M. Leite, and D. Senin, IBM Redbooks, September 2013, p. 3.
14 Configuration Manager Distribution Points and Windows Server 2012 Data Deduplication, by Yvette O'Meally, Microsoft Enterprise Mobility and Security Blog, February 18, 2014. https://blogs.technet.microsoft.com/enterprisemobility/2014/02/18/configuration-manager-distribution-points-and-windows-server-2012-data-deduplication/