The Scary Truth Of How Ineffective Data Deduplication Can Wreak Havoc On Your Backup And Recovery Plans

2015-10-29 by Christophe Bertrand

Protecting and ensuring recovery for an ever-increasing volume of data while managing costs is a serious concern for companies of all sizes. It’s no surprise that backups may fail, take up too much space or cost too much, all of which can have a significantly negative impact on an organization’s ability to recover and operate in the event of data loss.

Data deduplication (often referred to as “dedupe”) is one method to address the issue of exponential data growth. By identifying and eliminating portions of redundant information in a data set, deduplication technology can dramatically reduce an organization’s need for storage space and network bandwidth.
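At its core, deduplication works by splitting data into chunks, fingerprinting each chunk with a cryptographic hash, and storing each unique chunk only once. The following is a minimal Python sketch of that idea using fixed-size chunks and SHA-256; real products use more sophisticated variable-size chunking, but the principle is the same.

```python
import hashlib

def dedupe_chunks(data: bytes, chunk_size: int = 4096):
    """Split data into fixed-size chunks and store each unique chunk once.

    Returns (store, recipe): `store` maps a SHA-256 digest to its chunk;
    `recipe` is the ordered list of digests needed to rebuild the data.
    """
    store, recipe = {}, []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # keep only the first copy seen
        recipe.append(digest)
    return store, recipe

def rebuild(store, recipe) -> bytes:
    """Reassemble the original data from the chunk store and recipe."""
    return b"".join(store[d] for d in recipe)

# Highly redundant data: 100 identical 4 KiB blocks.
data = b"x" * 4096 * 100
store, recipe = dedupe_chunks(data)
print(len(recipe), "chunks referenced,", len(store), "stored")
assert rebuild(store, recipe) == data
```

For this (deliberately extreme) input, 100 chunk references resolve to a single stored chunk, which is exactly the kind of reduction that shrinks storage and bandwidth needs.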

That said, many consider this technology standard within every backup and recovery solution, effectively viewing it as a checkbox on the list of product features. Here’s the truth: there’s more to it than meets the eye, and overlooking the differences in this extremely valuable technology can leave you footing a much larger bill than you expected (and an unhappy CTO makes an unhappy IT department).

Before choosing a vendor that provides deduplication – either within a data protection platform or as a standalone product – it’s critical to understand the differences between the variations in technology.

Post-Process and Inline Deduplication: How well deduplication performs is largely based on whether it is “post-process” or “inline.” As its name suggests, post-process deduplication means that incoming data is first stored to disk and then processed for deduplication at a later time. Alternatively, when data is processed for deduplication before being written to disk, this is called inline deduplication. Inline deduplication has the advantage of writing data to disk only once, and is commonly the preferred method when compared to post-process deduplication, which requires extra staging storage and performs more disk writes.
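The difference between the two can be sketched in a few lines of Python. This is a toy illustration, not any vendor’s implementation: inline deduplication checks the fingerprint before anything touches disk, while post-process deduplication re-reads an already-written landing area in a second pass.

```python
import hashlib

def inline_write(store: dict, chunk: bytes) -> bool:
    """Inline dedupe: hash the chunk *before* it hits disk; write only if new."""
    digest = hashlib.sha256(chunk).hexdigest()
    if digest in store:
        return False         # duplicate: no disk write happens at all
    store[digest] = chunk    # exactly one write per unique chunk
    return True

def post_process(landing: list) -> dict:
    """Post-process dedupe: every chunk was already written to a landing
    area; a later pass re-reads all of it and collapses duplicates."""
    store = {}
    for chunk in landing:    # extra I/O: each chunk is read back from disk
        store.setdefault(hashlib.sha256(chunk).hexdigest(), chunk)
    return store
```

In the inline case, the duplicate is rejected up front; in the post-process case, the duplicate consumed disk space and I/O before being eliminated, which is why extra staging capacity is needed.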

Target-Side Deduplication: Target deduplication means that the full set of data is sent across the network and is deduplicated when it reaches the target deduplication appliance. Target deduplication was the first method that achieved widespread success when combined with data protection. Source-side deduplication, on the other hand, begins the process at the data source.

Source-Side Deduplication: Source-side deduplication relies on next-generation backup servers that work in conjunction with agents installed on the clients (the “data source”). The client software communicates with the backup servers to compare new blocks of data and removes redundancies before the data is transferred over the network. Because duplicate data never crosses the network, this form of deduplication yields dramatic savings in terms of bandwidth, required storage and corresponding costs. Source-side deduplication is quickly replacing target deduplication as the preferred method because of its ability to back up only new and unique data at the source.
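The client/server negotiation described above can be sketched as a simple hash exchange. This is a hypothetical, simplified protocol (the class and method names are illustrative, not a real product API): the agent hashes its chunks locally, asks the server which fingerprints are unknown, and ships only those chunks.

```python
import hashlib

class BackupServer:
    """Toy backup server holding the deduplicated chunk store."""
    def __init__(self):
        self.store = {}

    def missing(self, digests):
        """Tell the client which digests the server has never seen."""
        return [d for d in digests if d not in self.store]

    def upload(self, digest, chunk):
        self.store[digest] = chunk

def source_side_backup(server: BackupServer, chunks) -> int:
    """Client-side agent: hash locally, send only chunks the server lacks.

    Returns the number of chunks that actually crossed the 'network'.
    """
    digests = [hashlib.sha256(c).hexdigest() for c in chunks]
    need = set(server.missing(digests))
    sent = 0
    for digest, chunk in zip(digests, chunks):
        if digest in need:
            server.upload(digest, chunk)
            sent += 1
            need.discard(digest)  # a new chunk is still sent only once
    return sent
```

Running the same backup twice sends every unique chunk the first time and nothing at all the second time, which is where the bandwidth savings come from.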

Global Source-Side Deduplication: This takes source-side deduplication a step further by sharing all of an organization’s deduplicated data intelligence across all source systems. Every computer, virtual machine or server that is backed up communicates with a backup server, which acts as the central data store and manages a global database index of files on all machines, everywhere. The backup server does the work of figuring out what needs to be backed up and pulls only new data as required, while eliminating duplicate copies. Don’t be fooled by data protection solutions that limit deduplication to a single storage volume or a single backup job. Solutions that limit the scope of deduplication, such as Windows Server 2012, are doing so to limit the size of the hash database index. True global deduplication works across the entire network for maximum results.
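The payoff of a global index is easiest to see with two machines that share most of their data, as happens when servers run the same operating system image. In this minimal Python sketch (the `GlobalIndex` name and its API are hypothetical, not any vendor’s), the second machine benefits from chunks the first machine already uploaded because the index spans all sources rather than a single job or volume.

```python
import hashlib

class GlobalIndex:
    """One chunk index shared by every machine being backed up."""
    def __init__(self):
        self.store = {}

    def backup(self, machine_chunks) -> int:
        """Back up one machine; return how many chunks it had to send."""
        sent = 0
        for chunk in machine_chunks:
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.store:
                self.store[digest] = chunk
                sent += 1
        return sent

index = GlobalIndex()
os_image = [b"kernel", b"libs", b"drivers"]

# The first server uploads everything; the second, running the same OS
# image, sends only its single unique chunk.
first = index.backup(os_image + [b"database"])
second = index.backup(os_image + [b"mailbox"])
print(first, second)  # 4 1
```

With per-job or per-volume deduplication, the second machine would have re-sent the entire OS image; the global index is what eliminates that.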

So how, exactly, does ineffective deduplication affect your backup and recovery plans?

As you’d imagine, solutions that provide global, source-side deduplication can yield tremendous operational efficiencies and tangible cost savings by reducing the amount of data that is backed up (by up to 92% with some solution providers). So if you’re not using effective deduplication technology, you’re potentially wasting significant money and resources to manage larger data sets than necessary. Effective deduplication also optimizes storage and bandwidth requirements while accelerating data protection and recovery. With a much lower volume of data, the frequency of backups can then be increased to improve recovery point objectives. And when that deduplication solution is seamlessly interwoven with data protection and recovery solutions, simultaneously managing and protecting impossibly huge volumes of data seems a little less daunting.


Christophe Bertrand

Christophe Bertrand is VP of Marketing at Arcserve. Arcserve, a leading provider of data protection and recovery solutions featuring global source-side deduplication, is even sharing real customer data to demonstrate the superiority of this technology, and they’re asking their competitors to do the same. Learn more about Arcserve’s “Size Matters” challenge here: http://arcserve.com/blog/blog/news/show-us-dedupe-size-matters/
