Tactical Advice

De-Duplication Is the New Word in Backup Technology

This story appears in the December 2007 issue of BizTech Magazine.

Critical data is growing at an exponential rate, and tapes are no longer the only option for backup. Data de-duplication technology (also called data reduction or commonality factoring) allows users to store more information on fewer physical disks than has been possible in the past, making the cost of disk backup competitive with tape.

“Although the technology is fairly new, de-duplication is becoming widespread,” says Stephanie Balaouras, an analyst at Forrester Research. “Right now, disk space is three to four times as expensive as tape, but de-duplication can reduce data that needs to be backed up by a ratio of 20 to one. The big question is whether this technology is what puts the last nail in the coffin of tape backup.”
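The arithmetic behind that comparison can be sketched in a few lines. This is only a rough illustration: it uses the figures quoted above (disk at three to four times the cost of tape, a 20-to-one reduction) and assumes, for simplicity, that the tape copy gets no reduction at all. The function name is hypothetical.

```python
def relative_disk_cost(disk_to_tape_cost_ratio, dedup_ratio):
    """Cost of de-duplicated disk per unit of protected data, with tape = 1.0.

    Simplifying assumption: the tape copy is stored without any reduction.
    """
    return disk_to_tape_cost_ratio / dedup_ratio


# Figures from the quote: disk is 3x to 4x the cost of tape, reduction is 20:1.
low = relative_disk_cost(3, 20)
high = relative_disk_cost(4, 20)
print(f"Disk costs roughly {low:.2f} to {high:.2f} times as much as tape")
```

Under those assumptions, de-duplicated disk comes out well below the cost of tape per unit of protected data. Actual ratios depend heavily on how repetitive the data is.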

As the name suggests, the goal of de-duplication is to eliminate redundant data from backups. The technology replaces duplicate copies with much smaller pointers to a shared record. This can take place at the level of either whole records or smaller unique data segments.

For example, if someone e-mails a 10MB Excel file to 10 people on a network and each of them stores it, that translates into 100MB of backup disk space without de-duplication. With whole-record de-duplication, one copy would be stored along with 10 reference pointers. If, however, one of the users changes the name of the file or alters the contents in even the slightest way, the entire altered file will be backed up as a second full copy. With sub-record-level de-duplication, only the changes to the altered file would be saved, with pointers to the original. Both de-duplication methods are usually used in conjunction with traditional compression algorithms, the standard backup tactic for reducing the space consumed on the backup disk.
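The difference between the two approaches can be illustrated with a small sketch. It is not any vendor's implementation: it assumes fixed-size segments identified by a SHA-256 hash, ignores the space taken by the pointers themselves, and scales the example down from 10MB files to 10 segments of 4KB each. The names `CHUNK_SIZE`, `whole_file_dedup` and `chunk_dedup` are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed segment size, in bytes


def whole_file_dedup(files):
    """Bytes stored when byte-for-byte identical files are kept only once."""
    store = {}
    for data in files:
        store.setdefault(hashlib.sha256(data).hexdigest(), data)
    return sum(len(d) for d in store.values())


def chunk_dedup(files, chunk_size=CHUNK_SIZE):
    """Bytes stored when each unique fixed-size segment is kept only once."""
    store = {}
    for data in files:
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            store.setdefault(hashlib.sha256(chunk).hexdigest(), chunk)
    return sum(len(c) for c in store.values())


# A scaled-down "attachment" made of 10 distinct segments.
original = b"".join(bytes([i]) * CHUNK_SIZE for i in range(10))

# Nine recipients keep it unchanged; one recipient alters a single byte.
edited = bytearray(original)
edited[0] ^= 0xFF
files = [original] * 9 + [bytes(edited)]

raw_bytes = sum(len(f) for f in files)
print("no de-duplication:", raw_bytes)                 # 409600
print("whole-record:     ", whole_file_dedup(files))   # 81920
print("sub-record:       ", chunk_dedup(files))        # 45056
```

In this sketch the one-byte edit forces whole-record de-duplication to keep a second full copy, while the segment-level approach stores only one extra 4KB segment. Fixed-size segments are a simplification; an insertion that shifts the rest of the file would defeat them, so this should be read as an illustration of the idea rather than a production design.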

The trend is toward sub-record-level de-duplication. A wide range of products that provide de-duplication is already available, including hardware such as Quantum’s DXi and Cybernetics’ iSCSI SAN and software such as Veritas NetBackup PureDisk. Along with dramatically reducing backup storage space consumption, these technologies cut restore time and eliminate the need to wade through incremental backup tapes. Most systems allow users to restore to a specific date and time, and some make decentralized backups possible.

Proceed With Care

Balaouras warns that while data de-duplication is fast becoming a standard feature in backup systems, the technology is new enough, and there are enough variations among applications, that buyers should proceed with care. A key distinction is whether the data reduction takes place at the source (the backup server) or the target (a virtual tape library or disk appliance). Source-based processing uses much less bandwidth and provides for either local or global backup, but it often requires users to replace their current backup systems or run one system for central office backup and another for remote locations.

Whether de-duplication occurs during or after the backup is also a serious concern. Data reduction is very CPU-intensive and, when performed during the backup, can slow it down. Performing the de-duplication after the initial backup has been completed, however, requires more disk space and means that the data reduction must finish before the next scheduled backup.
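A back-of-the-envelope check of that trade-off might look like the sketch below. Every input in the example call is a hypothetical planning figure except the 20-to-one ratio quoted earlier, and the model assumes that a post-process system must hold the full unreduced backup alongside the reduced copy until de-duplication finishes. The function name `post_process_plan` is also hypothetical.

```python
def post_process_plan(raw_gb, dedup_ratio, backup_hours, dedup_hours, interval_hours):
    """Compare peak disk use and check the time window for post-process de-duplication.

    Assumption: post-processing keeps the full raw backup on disk in addition
    to the reduced copy until the de-duplication pass completes.
    """
    reduced_gb = raw_gb / dedup_ratio
    return {
        "inline_peak_gb": reduced_gb,
        "post_process_peak_gb": raw_gb + reduced_gb,
        "finishes_before_next_backup": backup_hours + dedup_hours <= interval_hours,
    }


# Hypothetical nightly job: 1,000GB backed up in 6 hours, reduced in 10 more.
plan = post_process_plan(raw_gb=1000, dedup_ratio=20,
                         backup_hours=6, dedup_hours=10, interval_hours=24)
print(plan)
```

With these made-up numbers the post-process approach fits inside a 24-hour cycle but needs about 1,050GB of peak disk space, compared with about 50GB when the reduction happens during the backup.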

Before they buy, users should also investigate the scalability and data integrity issues raised by the number of times most systems process the data with de-duplication and checking algorithms, says Balaouras. But de-duplication is here to stay, and it’s accelerating movement toward disk backup, especially among SMBs without large investments in legacy tape systems.

“Tape will be around for a while — for one thing it’s got a better power and cooling profile than disk, and that’s important in today’s data center,” says Balaouras. “But data de-duplication is a reality — it will take some time to sort out the approaches, but it definitely changes the comparison with tape.”

IT Takeaway

To narrow your options, consider the following criteria:

• Location of the de-duplication — backup source or target
• Data integrity
• Scalability
• Maturity of the vendor offering (Some systems have included de-duplication for several years, but in others it’s a new feature.)

Jeff Gross is an IT manager at Tucker Industries in Bensalem, Pa.