Friday, February 5, 2010

Managing Information Over Years

Today, data centers face the formidable challenge of provisioning enormous storage capacity at an affordable price while meeting ever-increasing performance demands. The biggest threats facing data centers and the storage industry today are power consumption, housing space and environmental concerns. However, many administrators and planners fail to recognize that the value of data to an organization decreases over time, as it loses its relevance, freshness, and "popularity". One question that administrators should be asking themselves is: why should data that is decreasing in value remain in expensive front-line storage, subject to the same backup, replication, and recovery policies and procedures as key data? Would it not be useful to have a system or methodology in place for analyzing and tracking data freshness, so that storage space could be freed for fresher, more relevant data, and time- and bandwidth-consuming data protection policies relaxed as data loses its value?

Direct Hit!

Applies To: Database managers
USP: Learn to manage information based on relevance to the organization
Primary Link: http://bit.ly/8wdR71
Search Engine Keyword: information lifecycle management

The notable step the storage industry has been forced to take in this regard is Storage Tiering. Here, the capacity to be provisioned is divided into separate pools of storage space with different cost/performance attributes. At the top resides the Tier 1 pool, which is the most expensive but also the highest performing. The bottom tier is occupied by more cost-effective storage arrays. The next challenge is to devise a software layer that intelligently places data into the different tiers according to its value. This concept is variously known as data classification or Information Lifecycle Management (ILM).

What is ILM?
ILM is a concept that encompasses the discovery, classification, analysis, and maintenance of data across the entire period of its useful life. It adds structure and context to data, marking the transition from data to information. ILM is part of the larger concept of Business Continuity Planning, but has become increasingly prominent in the storage arena in recent years thanks to several factors. These include advancements in data storage management techniques and the technology that underpins them, and evolution in the storage environment, including:

  • Coexistence of Fibre Channel and iSCSI (IP-Storage) in the data center

  • SAS and SATA storage coexisting in storage systems

  • Storage consolidation practices, for reducing the use of solitary "islands of data" in direct attached storage (DAS)

  • Regulatory requirements for data archiving and recall (SOX, etc.)

Though many vendors offer ILM services or modules as part of their products, ILM is above all a concept or a strategy, rather than a product. However, for a practical explanation of what the concept embodies, we can safely generalize that many implementations of ILM encompass such components as:

  • Database Management

  • Storage System Performance and Monitoring

  • Storage Capacity Planning and Management

  • Business Controls for Data Degradation and EOL

How is this done?
In a tiered storage system, storage is not merely seen as a container of data. Another important dimension of intelligence is appended to every block, transitioning blocks of data into blocks of information.

Data + Intelligence = Information

The intelligence associated with every block of data forms vital metadata, which automatically tracks the access patterns to that block. Data is first classified, then moved at the block level from tier to tier, based on frequency of access. At the peak of its popularity, data is stored in the fastest, most responsive top-tier storage on hand and is subject to the most stringent replication and backup controls. The ILM system constantly monitors the data's value in comparison to other data; as the data loses value, it is migrated down the chain to less expensive, less powerful storage, where it may not be accessed as frequently or protected as carefully. In the final stage, it is migrated out of the storage system completely. Data of the lowest value is either purged from the system or transferred to other media (e.g., written to tape and delivered to offsite storage), depending on the organization's policy and regulatory requirements for data end-of-life.
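The access-driven movement between tiers can be illustrated with a minimal Python sketch. It is not any vendor's implementation: the tier names, the `BlockInfo` class, and the hot/cold thresholds are all assumptions made for illustration. Each block carries an access counter as its metadata, and a periodic rebalancing pass promotes or demotes the block by one tier.

```python
from dataclasses import dataclass

# Hypothetical tier names, fastest and most expensive first.
TIERS = ["tier1", "tier2", "tier3"]


@dataclass
class BlockInfo:
    """Per-block metadata: the 'intelligence' attached to a block of data."""
    block_id: int
    tier: str = "tier1"
    access_count: int = 0  # accesses seen in the current monitoring window


def record_access(block: BlockInfo) -> None:
    """Update the block's metadata each time the block is read or written."""
    block.access_count += 1


def rebalance(blocks, hot_threshold=100, cold_threshold=10):
    """Move each block at most one tier, based on accesses in the last window.

    The thresholds are illustrative assumptions, not values from any product.
    """
    for block in blocks:
        level = TIERS.index(block.tier)
        if block.access_count >= hot_threshold and level > 0:
            block.tier = TIERS[level - 1]   # promote towards tier 1
        elif block.access_count <= cold_threshold and level < len(TIERS) - 1:
            block.tier = TIERS[level + 1]   # demote towards cheaper storage
        block.access_count = 0              # start a new monitoring window


blocks = [BlockInfo(0), BlockInfo(1, tier="tier2"), BlockInfo(2, tier="tier3")]
for _ in range(150):
    record_access(blocks[1])  # block 1 is popular
for _ in range(50):
    record_access(blocks[2])  # block 2 is neither hot nor cold
rebalance(blocks)
print([b.tier for b in blocks])  # ['tier2', 'tier1', 'tier3']
```

Block 0 is never accessed and drops to tier 2, block 1 is promoted to tier 1, and block 2 stays where it is. A real system would also handle the final stage described above, in which the coldest data leaves the bottom tier for tape or is purged.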

Why ILM?
Having examined how an ILM system can be implemented, we should next look more closely at the reasons why more and more organizations are accepting the need for a comprehensive ILM strategy.

Exponential growth of data
With data growth averaging 80% to 100% every year, managing storage effectively has become a challenging task. Storage administrators face limited budgets, and are charged not only with expanding capacity by purchasing new hardware wisely to meet projected storage needs, but also with optimizing the use of existing capacity, in order to maximize the investment in current storage hardware. Moreover, any changes or additions need to be considered carefully, as the downstream effects of new hardware are often unforeseen, and can quickly wipe out any short-term cost gains.
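To get a feel for what this growth rate means for capacity planning, here is a small Python sketch of compound growth. The function name and the 10 TB starting capacity are assumptions chosen for illustration; only the 80% and 100% rates come from the text.

```python
def projected_capacity(current_tb: float, annual_growth: float, years: int) -> float:
    """Capacity needed after `years` of compound growth.

    annual_growth is a fraction: 0.8 means 80% growth per year.
    """
    return current_tb * (1 + annual_growth) ** years


start_tb = 10.0  # hypothetical starting capacity in terabytes
for growth in (0.8, 1.0):
    needed = projected_capacity(start_tb, growth, years=3)
    print(f"{growth:.0%} growth: {needed:.1f} TB after 3 years")
# 80% growth: 58.3 TB after 3 years
# 100% growth: 80.0 TB after 3 years
```

At these rates, capacity needs multiply roughly six to eight times in three years, which is why optimizing existing capacity matters as much as buying new hardware.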

Data accessibility/freshness
As mentioned at the beginning of this article, data does not have a constant value; rather, that value changes with time, relevance, security, or popularity. Policies and procedures must therefore be put in place to continuously monitor and shift the location (and therefore the accessibility) of data, so that the information in highest demand is in the most accessible location.

Figure: Carbon dioxide emissions of traditional storage servers versus tiered storage servers

Cost (TCO) issues
The overall cost of a storage system is measured not just in the initial price paid for the hardware and its commissioning. The total cost of ownership (TCO) also includes maintenance, power and cooling expenses, together with the cost of staffing and training administrators. As storage arrays grow, power usage (for server operation and cooling) is just one factor that has an enormous impact on the TCO of a storage solution. If less expensive solutions are available, administrators should by all means devise a careful plan to incorporate these components, with some restrictions. Where possible, additional storage technology should be chosen that does not require a significant investment of time and resources to learn its operation. New solutions that are more power or space efficient should be integrated into the array.
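The cost components listed above can be combined in a simple Python sketch. Every figure below is a made-up placeholder in an arbitrary currency unit, and the `total_cost` function is a hypothetical helper, not a standard costing model; the point is only that a system that is cheaper to buy can end up costing more over its service life.

```python
def total_cost(acquisition, annual_power_cooling, annual_maintenance,
               annual_staffing, years):
    """Acquisition cost plus recurring operating costs over the service life."""
    recurring = annual_power_cooling + annual_maintenance + annual_staffing
    return acquisition + recurring * years


# All figures are made-up placeholders in an arbitrary currency unit.
cheap_to_buy = total_cost(acquisition=50_000, annual_power_cooling=20_000,
                          annual_maintenance=5_000, annual_staffing=15_000,
                          years=5)
power_efficient = total_cost(acquisition=80_000, annual_power_cooling=8_000,
                             annual_maintenance=5_000, annual_staffing=15_000,
                             years=5)
print(cheap_to_buy, power_efficient)  # 250000 220000
```

With these placeholder numbers, the array that costs 30,000 more up front is 30,000 cheaper over five years because of its lower power and cooling bill.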

Ability to protect and recover lost data
Because key data has to be protected against loss to ensure business continuity, the term Continuous Data Protection (CDP) has come into being. It describes a scheme for ensuring data survival in the face of disasters such as power or network outages and natural catastrophes, and incorporates techniques such as backups, data snapshots and remote replication to do so. To add to the challenges surrounding data protection, regulatory requirements for the preservation and archiving of several types of corporate data continue to mount.

Data of a particularly sensitive or critical nature must be kept secure and must be available for recall within clearly established time limits if circumstances demand it. A successful ILM implementation therefore integrates well with an organization's backup and recovery solutions along several touch points. ILM dictates that as items age they can be taken offline completely and migrated to tape storage, for example, yet some data must still be available for recall even at this point. Since only a percentage of data has to be protected in the same manner, the ILM solution must be flexible enough to manage varying CDP requirements.
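One way to picture varying protection requirements is a per-tier policy table, sketched below in Python. The tier names, the `ProtectionPolicy` fields, and all the intervals are hypothetical values chosen for illustration; real policies would follow the organization's own regulatory and recovery-time requirements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProtectionPolicy:
    """Hypothetical data protection settings attached to a storage tier."""
    backup_interval_hours: int
    remote_replication: bool
    max_recall_hours: int  # how quickly the data must be retrievable


# Illustrative values only: protection is relaxed as data moves down the tiers.
POLICIES = {
    "tier1": ProtectionPolicy(backup_interval_hours=1, remote_replication=True,
                              max_recall_hours=1),
    "tier2": ProtectionPolicy(backup_interval_hours=24, remote_replication=False,
                              max_recall_hours=8),
    "tape": ProtectionPolicy(backup_interval_hours=168, remote_replication=False,
                             max_recall_hours=72),
}


def policy_for(tier: str) -> ProtectionPolicy:
    """Look up the protection policy that applies to data in the given tier."""
    return POLICIES[tier]


print(policy_for("tier1"))
print(policy_for("tape").max_recall_hours)  # 72
```

Even the offline tier keeps a recall limit in this sketch, reflecting the point that archived data may still have to be retrievable on demand.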

Green data centers
As mentioned earlier, one of the primary challenges facing data centers today is power consumption. While the initial cost of acquiring storage may be low, the high cost of power and cooling can make the TCO very high. In addition to the tangible financial burden this adds, the other, often intangible, concern in such a data center is its environmental impact. Today, global warming and pollution are major hazards that cannot be ignored. There are both regulatory and financial incentives for reducing carbon dioxide emissions, which often results in a direct cost saving through increased carbon credits.

Conclusion
Storage Tiering in enterprise-class storage is becoming a highly desirable feature today. It is only a matter of time before the cost, environmental and performance benchmarks of a tiered system become critical parameters on which decisions of storage system procurement will be based.

Tiered storage servers implementing ILM offer both a cost advantage and better performance. It is important to realize that storage servers with Tiered Storage and ILM enable data centers to reduce their footprint, electricity costs, and CO2 emissions, creating a greener and more eco-friendly data center.
