Deduplication FAQ

What is data deduplication?

Data deduplication is a data compression technology that reduces data volume by identifying and eliminating redundant data. Early single-instance storage technologies, based on file-grain deduplication, have largely given way to block-grain deduplication, in which each file is divided into multiple blocks. Each block of a file is compared to blocks already known to the system. If a block has been stored before, it is simply referenced rather than stored again. Each unique block, stored only once, is then further compressed using other encoding techniques.
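The block-grain process described above can be sketched in a few lines of Python. This is a minimal illustration, not Arkeia's implementation: the fixed 4096-byte block size, SHA-256 as the block fingerprint, and zlib as the "other encoding technology" are all assumptions, and `BlockStore` is a hypothetical name.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # assumed fixed block size; real products may use variable-size blocks


class BlockStore:
    """Hypothetical in-memory block-grain deduplication store (illustration only)."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}  # block fingerprint -> compressed block

    def write(self, data: bytes) -> list[str]:
        """Split data into blocks, store each unique block once, return block references."""
        refs = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:
                # First time this block is seen: compress it and store it once.
                self.blocks[digest] = zlib.compress(block)
            # Whether new or already known, the file records only a reference.
            refs.append(digest)
        return refs

    def read(self, refs: list[str]) -> bytes:
        """Reassemble a file from its list of block references."""
        return b"".join(zlib.decompress(self.blocks[d]) for d in refs)
```

Writing `b"A" * 8192 + b"B" * 4096` and then `b"A" * 4096 + b"C" * 4096` produces five block references but only three stored blocks, because every "A" block after the first is a reference to the same stored copy.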

What is single-instance storage (SIS) and how is it different from deduplication?

SIS is "file-grain" deduplication. Duplicate copies of the same file, whether or not they share the same name, are detected and only one copy of the file is stored on disk. By comparison, block-grain deduplication operates within files and ensures that only unique blocks are stored, so it delivers greater compression ratios than SIS. For example, if a single word is added to a document, SIS will not recognize any redundancy and will store the entire file as a "new" file, whereas block-grain deduplication will store only the new or changed blocks.
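The word-added example can be made concrete by counting the bytes each approach stores for two versions of a document. This is a sketch under stated assumptions: fixed 4096-byte blocks and SHA-256 fingerprints, with the word appended at the end of the file. With fixed-size blocks, a word inserted in the middle would shift every later block boundary, which is one reason many deduplication products use variable-size (content-defined) blocks instead.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size for this sketch


def sis_stored_bytes(versions: list[bytes]) -> int:
    """File-grain (SIS): a file is stored whole unless an identical file already exists."""
    seen: set[str] = set()
    stored = 0
    for data in versions:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            stored += len(data)
    return stored


def block_stored_bytes(versions: list[bytes]) -> int:
    """Block-grain: only blocks that have not been seen before are stored."""
    seen: set[str] = set()
    stored = 0
    for data in versions:
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in seen:
                seen.add(digest)
                stored += len(block)
    return stored


# Version 1: three distinct 4096-byte blocks; version 2: the same file plus one word.
v1 = b"".join(bytes([i]) * BLOCK_SIZE for i in range(3))
v2 = v1 + b" word"
print(sis_stored_bytes([v1, v2]), block_stored_bytes([v1, v2]))
```

SIS stores both versions in full (24,581 bytes), while block-grain deduplication stores the original three blocks plus a single 5-byte block (12,293 bytes).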

Why is data deduplication important for backup?

Data deduplication is especially important for backup because a large fraction of the data in a typical backup set is duplicated. Many IT shops perform regular full-disk backups of desktop and laptop computers to ensure rapid recovery ("disaster recovery") in case of system loss, and many files (e.g. system files, email attachments) are shared across many of those computers. Data deduplication allows each such file to be backed up only once, no matter how many computers hold a copy. Where multiple, slightly different versions of the same file exist, block-grain deduplication stores only the unique blocks.
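The effect of files shared across many computers can be sketched by hashing whole files across a small, hypothetical fleet. The machine names, file names, and sizes below are invented for illustration; the sketch uses file-grain hashing only, so it shows the cross-machine savings but not the extra savings block-grain deduplication would find in partially changed files.

```python
import hashlib


def backup_fleet(machines: dict[str, dict[str, bytes]]) -> tuple[int, int]:
    """Back up every file on every machine, storing each distinct file content once.

    `machines` maps a hypothetical host name to {path: file contents}.
    Returns (logical_bytes, stored_bytes).
    """
    stored: dict[str, int] = {}  # content hash -> size, shared by all machines
    logical = 0
    for files in machines.values():
        for contents in files.values():
            logical += len(contents)
            digest = hashlib.sha256(contents).hexdigest()
            stored.setdefault(digest, len(contents))  # keep only the first copy
    return logical, sum(stored.values())


system_file = b"S" * 1000  # stands in for a shared operating system file
attachment = b"M" * 500    # stands in for an email attachment sent to everyone
machines = {
    "laptop-1": {"os.dll": system_file, "memo.pdf": attachment, "notes.txt": b"a" * 100},
    "laptop-2": {"os.dll": system_file, "memo.pdf": attachment, "notes.txt": b"b" * 100},
    # Same attachment saved under a different name is still detected as a duplicate.
    "desktop-1": {"os.dll": system_file, "copy of memo.pdf": attachment, "todo.txt": b"c" * 100},
}
print(backup_fleet(machines))
```

The three machines hold 4,800 bytes of files, but only 1,800 bytes need to be stored: one copy of the system file, one copy of the attachment, and the three unique documents.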

Why is source-side data deduplication important for backup?

Source-side data deduplication is important in backup applications because it accelerates the backup process. Faster backups permit shorter backup windows and make backups less disruptive to business operations. Source-side deduplication accelerates backups by reducing network traffic, because only unique blocks are transferred over the network. An important side effect of deduplication is a reduction in the storage required for backups.
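A minimal sketch of the source-side exchange, assuming a simple two-step protocol: the client sends block fingerprints, the server replies with the ones it does not know, and the client then sends only those blocks. `BackupServer`, `missing`, `put`, and `source_side_backup` are hypothetical names, and the fixed block size and SHA-256 fingerprints are assumptions, not a description of Arkeia's wire protocol.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size


class BackupServer:
    """Hypothetical backup server keeping a global index of known blocks."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}      # fingerprint -> block
        self.manifests: list[list[str]] = []    # ordered block references per backup

    def missing(self, digests: list[str]) -> set[str]:
        """Answer the client's query: which of these fingerprints are unknown?"""
        return {d for d in digests if d not in self.blocks}

    def put(self, blocks: dict[str, bytes], manifest: list[str]) -> None:
        """Store newly sent blocks plus the ordered references needed to rebuild the data."""
        self.blocks.update(blocks)
        self.manifests.append(manifest)


def source_side_backup(data: bytes, server: BackupServer) -> int:
    """Deduplicate on the client; return the block bytes sent over the network.

    Fingerprints and references are also sent, but they are small and not counted here.
    """
    chunks: dict[str, bytes] = {}
    order: list[str] = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        chunks[digest] = block
        order.append(digest)
    # Step 1: ask which blocks the server lacks (only fingerprints cross the network).
    needed = server.missing(order)
    # Step 2: send only those blocks, plus the block references for this backup.
    payload = {d: chunks[d] for d in needed}
    server.put(payload, order)
    return sum(len(b) for b in payload.values())
```

Backing up the same 8,192 bytes twice sends 8,192 block bytes the first time and none the second time; adding one new 4,096-byte block to the data sends only that block.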

What does "replication" have to do with deduplication?

Replication of backup sets across a WAN is a useful alternative to transporting tapes by truck. Deduplicating backup data before replication can significantly reduce the time needed to move data across the network, because only unique blocks need to be sent. Backups to the cloud are another case where dedupe can greatly shorten backup windows.
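The time savings can be estimated with simple arithmetic: transfer time is the number of bits actually sent divided by the link bandwidth. The sketch below ignores protocol overhead and assumes a constant deduplication ratio; the 500 GB backup set, 100 Mbit/s link, and 10:1 ratio are hypothetical figures, not measured Arkeia results.

```python
def transfer_hours(backup_gb: float, link_mbps: float, dedup_ratio: float = 1.0) -> float:
    """Estimate WAN transfer time in hours, ignoring protocol overhead.

    backup_gb:   logical backup size in gigabytes (10**9 bytes)
    link_mbps:   usable link bandwidth in megabits per second (10**6 bits/s)
    dedup_ratio: logical size / bytes actually sent (1.0 means no deduplication)
    """
    bits_sent = backup_gb * 1e9 * 8 / dedup_ratio
    seconds = bits_sent / (link_mbps * 1e6)
    return seconds / 3600


print(round(transfer_hours(500, 100), 2))                   # without deduplication
print(round(transfer_hours(500, 100, dedup_ratio=10), 2))   # with an assumed 10:1 ratio
```

Under these assumptions, replicating the backup set takes about 11.1 hours without deduplication and about 1.1 hours with it.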

How does deduplication work in virtualized environments?

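Deduplication is especially beneficial in virtualized environments because of the large amount of redundant operating system code, application code, and data. This redundancy exists both within a single virtual machine image and across images. Deduplication is appropriate even when technologies like VMware's changed block tracking (CBT) are employed: CBT identifies which blocks of a virtual machine have changed since its last backup, but it cannot tell whether those changed blocks duplicate data already stored from this or other virtual machines.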
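The complementary roles of CBT and deduplication can be sketched as follows. CBT supplies the list of changed block indexes; deduplication then checks each changed block against a block store shared by all virtual machines. `backup_changed_blocks` is a hypothetical function (not a VMware API), and the tiny byte strings stand in for real disk blocks.

```python
import hashlib


def backup_changed_blocks(image: list[bytes], changed: list[int], store: dict[str, bytes]) -> int:
    """Back up only the blocks CBT reports as changed, deduplicating them against `store`.

    image:   a VM disk modeled as a list of blocks
    changed: block indexes, as a CBT query might report them
    store:   global block store shared by all VMs (fingerprint -> block)
    Returns the number of blocks that actually had to be stored.
    """
    new_blocks = 0
    for index in changed:
        block = image[index]
        key = hashlib.sha256(block).hexdigest()
        if key not in store:  # CBT says "changed"; dedupe asks "seen anywhere before?"
            store[key] = block
            new_blocks += 1
    return new_blocks


store: dict[str, bytes] = {}
vm_a = [b"os-1", b"os-2", b"app-a"]
vm_b = [b"os-1", b"os-2", b"app-b"]  # e.g. same OS patch applied to both VMs
print(backup_changed_blocks(vm_a, [0, 1, 2], store))
print(backup_changed_blocks(vm_b, [0, 1, 2], store))
```

CBT flags all three blocks of the second VM as changed, but only its application block is new; the patched operating system blocks were already stored from the first VM.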

How does deduplication work with encrypted data?

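With source-side deduplication, data are first deduplicated on the source platform. Next, the unique blocks and block references are encrypted before being sent over the network. The order matters: most encryption schemes produce different ciphertext for identical blocks (for example, by using a random initialization vector), so data encrypted before deduplication would no longer deduplicate.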
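The dedupe-then-encrypt order can be sketched in Python's standard library only. To keep the example self-contained, it uses a TOY stream cipher (SHA-256 in counter mode with a random nonce) that is NOT secure and stands in for a vetted cipher such as AES; the cipher Arkeia actually uses is not stated here. `toy_encrypt`, `toy_decrypt`, and `dedupe_then_encrypt` are hypothetical names.

```python
import hashlib
import os


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """TOY keystream: SHA-256(key || nonce || counter). Not secure; illustration only."""
    stream = bytearray()
    counter = 0
    while len(stream) < length:
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(stream[:length])


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Randomized encryption: a fresh nonce makes identical plaintexts encrypt differently."""
    nonce = os.urandom(16)
    ks = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ k for p, k in zip(plaintext, ks))


def toy_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    nonce, body = ciphertext[:16], ciphertext[16:]
    ks = _keystream(key, nonce, len(body))
    return bytes(c ^ k for c, k in zip(body, ks))


def dedupe_then_encrypt(blocks: list[bytes], key: bytes) -> tuple[list[bytes], bytes]:
    """Deduplicate on plaintext fingerprints first, then encrypt unique blocks and references."""
    index: dict[bytes, int] = {}  # plaintext fingerprint -> position in `unique`
    unique: list[bytes] = []
    refs: list[int] = []
    for block in blocks:
        fp = hashlib.sha256(block).digest()
        if fp not in index:
            index[fp] = len(unique)
            unique.append(block)
        refs.append(index[fp])
    encrypted_blocks = [toy_encrypt(key, b) for b in unique]
    encrypted_refs = toy_encrypt(key, ",".join(map(str, refs)).encode())
    return encrypted_blocks, encrypted_refs
```

With three input blocks of which two are identical, only two encrypted blocks are produced. Encrypting first would have given three distinct ciphertexts and nothing left to deduplicate, since `toy_encrypt(key, b"same")` returns a different result on every call.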

What impact will deduplication have on backup performance?

Arkeia is acquiring Kadena Systems to accelerate backups and restores. Source-side dedupe technology improves performance in two ways:

  • Deduplication accelerates backups and restores by reducing network traffic. If a block to be backed up is already known to the backup server, the block doesn't have to be transferred over the network.
  • Deduplication accelerates restores by allowing more backup sets to be kept on a given amount of disk storage.
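The effect on retention can be estimated by dividing disk capacity by the physical size of each deduplicated backup set. The sketch below treats the deduplication ratio as a constant average across all retained sets, which is a simplification (repeated full backups of the same machines usually deduplicate better than the first one); the 20 TB disk, 2 TB backup set, and 10:1 ratio are hypothetical.

```python
import math


def backup_sets_on_disk(disk_tb: float, full_backup_tb: float, dedup_ratio: float) -> int:
    """How many full backup sets fit on disk, assuming a constant average dedupe ratio.

    dedup_ratio: logical size / physical size (e.g. 10.0 for 10:1); 1.0 means no dedupe.
    """
    # Multiply before dividing so whole-number inputs stay exact in floating point.
    return math.floor(disk_tb * dedup_ratio / full_backup_tb)


print(backup_sets_on_disk(20, 2, 1.0))   # without deduplication
print(backup_sets_on_disk(20, 2, 10.0))  # with an assumed 10:1 ratio
```

Under these assumptions, the same disk holds 10 backup sets without deduplication and 100 with it, so far more restore points remain available on disk.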

While virtually all backup jobs will benefit from source-side deduplication, an administrator can specify clients for which data should not be deduplicated at the source. In this case, the data can either be deduplicated at the media server (i.e. the target) or simply backed up without deduplication. A single Arkeia backup job can mix all three types of backups: source-side deduplication, target-side deduplication, and no deduplication.
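The per-client choice can be pictured as a small policy table: source-side deduplication by default, with explicit overrides for particular clients. This is a hypothetical model, not Arkeia's configuration format; `DedupMode`, `plan_job`, and the client names are invented for illustration.

```python
from enum import Enum


class DedupMode(Enum):
    SOURCE = "source"  # client deduplicates before sending data over the network
    TARGET = "target"  # media server deduplicates data after receiving it
    NONE = "none"      # data is backed up without deduplication


def plan_job(clients: list[str], overrides: dict[str, DedupMode]) -> dict[str, DedupMode]:
    """Assign a mode to each client in one job: source-side unless overridden."""
    return {client: overrides.get(client, DedupMode.SOURCE) for client in clients}


job = plan_job(
    ["web01", "db01", "legacy01"],
    {"db01": DedupMode.TARGET, "legacy01": DedupMode.NONE},
)
for client, mode in job.items():
    print(client, mode.value)
```

Defaulting to source-side deduplication matches the observation that virtually all jobs benefit from it, while the overrides let a single job mix all three modes.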

Will the version of Arkeia Network Backup with deduplication be backwards-compatible with previous versions of Arkeia Network Backup? Will I have to install new agents?

Deduplication done at the target will be backwards-compatible with existing Arkeia Network Backup agents. Source-side deduplication will require agents to be updated to version 9 of Arkeia Network Backup.

Will deduplication be available as an appliance or software?

Deduplication will be available on backup servers deployed as an appliance, as a virtual appliance, or as traditional software. Any Arkeia appliance under maintenance can be upgraded to version 9.0 firmware, so current appliance customers can benefit from source-side deduplication. No hardware upgrades are necessary, because source-side deduplication uses the processors of the client computers to deduplicate and compress data before it travels over the network.