Data Deduplication is the best feature in Server 2012/2012 R2. For any shop, it provides a huge benefit for 5 minutes of work! When configured, data deduplication analyzes files for duplicate chunks, removes the duplicate portions, and replaces them with references to a single copy kept in the volume's chunk store. This can give you some amazing space savings!
Data deduplication is useful for three main scenarios:
- Folder Redirection/Home Folders/User Share: imagine all of the PDFs that are emailed out and saved into each user’s documents.
- Software Distribution/Application shares: a lot of applications share the same components.
- VDI: If you have 100 VMs running the same OS, the disk space saved with data deduplication would be insane! Microsoft saved close to 90% in their environment!
In this guide, we will set up data deduplication and cover some best practices for integration.
How to Configure Data Deduplication in Server 2012/2012 R2
First, data deduplication cannot be configured on a system or boot volume. With that out of the way, pick a machine running Server 2012 or higher. In Server Manager, launch the Add Roles and Features Wizard. Under Server Roles, expand File and Storage Services, then File and iSCSI Services, select the Data Deduplication role service, and finish the wizard. This role does not require a reboot.
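If you prefer PowerShell over the wizard, the same role service can be added from an elevated prompt. A minimal sketch (FS-Data-Deduplication is the feature name on 2012/2012 R2):

# Install the Data Deduplication role service (no reboot required)
Install-WindowsFeature -Name FS-Data-Deduplication

# Verify that it installed
Get-WindowsFeature -Name FS-Data-Deduplication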
Select the File and Storage Services node in Server Manager and then select Volumes. Right-click your data volume and select Configure Data Deduplication…
Change the deduplication type from Disabled to General purpose file server (or check the enable box if you are on Server 2012). Leave the "Deduplicate files older than (in days)" setting at the default value. You may want to exclude certain folders/files from deduplication. For example, SCCM 2012 requires a few folders to be excluded.
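The same settings can be applied with PowerShell if you prefer. A minimal sketch, assuming D: is your data volume; the excluded folder path is only a placeholder for whatever your application requires:

# Enable deduplication on the data volume
# (-UsageType Default is the 2012 R2 general purpose file server profile; omit it on plain 2012)
Enable-DedupVolume -Volume "D:" -UsageType Default

# Optionally adjust the file age and exclude folders that should never be deduplicated
Set-DedupVolume -Volume "D:" -MinimumFileAgeDays 3 -ExcludeFolder "D:\Apps\NoDedup"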
Select Set Deduplication Schedule and check the Enable throughput optimization box. Adjust the duration so that the optimization runs outside of your work hours; the server will be taxed while this optimization occurs. Use the Server Manager performance counters to keep an eye on resources during optimization.
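The same schedule can be created from PowerShell. A minimal sketch; the window below (9 PM for 9 hours on weeknights) is just an example of an after-hours run:

# Create a throughput-optimization window outside of business hours
New-DedupSchedule -Name "NightlyOptimization" -Type Optimization -Start "21:00" -DurationHours 9 -Days Monday,Tuesday,Wednesday,Thursday,Friday

# Review the schedules on the server
Get-DedupSchedule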
Depending on your server size, it may take a day or two for volumes to be completely optimized. If you want to see results quickly, you have two options:
- Launch DDPEval.exe (ex: ddpeval.exe \\Server-01\Data\). This tool is in the System32 folder on any server with the Data Deduplication role installed. You can copy the EXE to any 2008 R2+ machine to evaluate potential savings.
- Start a dedup job with PowerShell (see the progress-check sketch below). The following command will dedup volume D and consume up to 50% of the server's RAM: Start-DedupJob -Volume D: -Type Optimization -Memory 50
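Either way, you can watch the job and check the realized savings with the built-in cmdlets:

# Show running/queued dedup jobs and their progress
Get-DedupJob

# Show per-volume savings once optimization has run
Get-DedupStatus
Get-DedupVolume | Format-List Volume, SavedSpace, SavingsRate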
That wraps up our data deduplication guide. If you wish to learn more, the links below will help:
Fantastic guide. Just built a new File Server migrating from 2008 -> 2012 R2. Migrated JUST our HomeDrives and so far it's saved me 90 GB (19% deduplication rate)… I wonder what will happen when I move over our central share. I'll post the results then!
Awesome news, Chris!
Awesome tip, thanks! I’d like to use this along with BranchCache to save space at the main office and speed things up for a branch office. Any conflict with using the two simultaneously? Do they work well in tandem?
Should not be any issues with combining them.
We had enabled this on a file server and it saved a ton of space. We tested with Acronis at both the file level and the Hyper-V VM restore level, and both worked. However, restoring individual files to a different location on a different server/workstation that did not have deduplication enabled did not work. The files were corrupted and inaccessible. So dedup can certainly cause *some* issues with restores, though your two primary restore jobs will work fine (file to original location, full VM). We didn't do a lot of testing when we found the issue, but I wanted to throw my 2c in that there can be issues with Acronis Backup for HV.
Thanks for your tips, Jason! I saw a corrupt restore of a PST file yesterday. I don't believe data dedupe had anything to do with it (the file was constantly being written to), but I am going to check the way I restored it.
This killed a bunch of little application databases and interfered with my backup. Spent the next two hours undoing what it did.
Application databases are not good candidates for dedup. Those files change way too often. I would limit dedup to volumes containing the three types of data listed above (home folders/user shares, software distribution shares, VDI VHDs on 2012 R2).
Here are the general candidate guidelines from TechNet:
Great candidates for deduplication:
◦ Folder redirection servers
◦ Virtualization depot or provisioning library
◦ Software deployment shares
◦ SQL Server and Exchange Server backup volumes
◦ VDI VHDs (supported only on Windows Server 2012 R2)
Should be evaluated based on content:
◦ Line-of-business servers
◦ Static content providers
◦ Web servers
◦ High-performance computing (HPC)
Not good candidates for deduplication:
◦ Hyper-V hosts
◦ WSUS
◦ Servers running SQL Server or Exchange Server
◦ Files approaching or larger than 1 TB in size
◦ VDI VHDs on Windows Server 2012
I am so dumb! I skipped right over the scenario section and just rolled this out without testing… thanks for pointing me in the right direction and for putting together guides like this!
And if the application databases are on a volume that you really want to dedup, you can exclude those files/folders from the process by using exclusions.
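For example (a sketch; the folder path and file extensions below are only placeholders for your own database locations):

# Skip a folder full of small application databases
Set-DedupVolume -Volume "D:" -ExcludeFolder "D:\Apps\Databases"

# Or skip files by extension
Set-DedupVolume -Volume "D:" -ExcludeFileType "mdf","ldf"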
Does this also work with virtualized servers (Hyper-V)? Or does it impose additional requirements on the SAN environment?
Works on VMs and physical machines, and there are no additional requirements.
Also, are there any negatives from using block-level backup solutions like ShadowProtect or Acronis? I wonder how that would work with Acronis since it does dedup as well during the backup.
According to Microsoft, “Block-based backup applications should work without modification, and they maintain the optimization on the backup media.”
I would imagine that Acronis would just not have anything to dedup. You may even be able to cut your backup times by turning off dedup in Acronis during the backup.
Any negatives you experienced? I thought most dedup systems had extremely high memory requirements?
Other than the CPU/memory hit during the optimization timeframe, I haven't seen any additional load on our servers. It is a post-processing dedup job; new file writes are not altered at all. Files aren't even touched until they are X days old.
I would advise that you check out your performance stats first and ensure that you aren’t close to maxing out memory/CPU on a daily basis.
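A quick way to spot-check that from PowerShell (a sketch using the standard Windows performance counters):

# Sample CPU and available memory every 5 seconds for about a minute
Get-Counter -Counter '\Processor(_Total)\% Processor Time','\Memory\Available MBytes' -SampleInterval 5 -MaxSamples 12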