Week 8 Posting - Data Deduplication

Something I have been curious about for a while, and it came up again this week, is data deduplication. In my on-premises data centers, we always run deduplication and compression on the storage systems to squeeze more usable capacity out of the raw hardware. I wonder how much cloud service providers can deduplicate data across customers, though. Surely they have to do it at scale to make the business model work in their favor.
Based on an article from Microsoft this week, some deduplication can reach as high as 95% efficiency. That figure applies to a specific consumer file type, but do CSPs run the same algorithms? There are also providers, like Box.com, that don't even charge by how much data you store with them; it is all about user licenses with unlimited capacity to store data. I think they must have some way of gaining massive optimizations across users to make that work. It has to come down to scale at the end of the day.
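The basic mechanism behind most of these systems is content-addressed storage: split data into blocks, hash each block, and store each unique block only once, regardless of how many files (or customers) reference it. Here is a minimal sketch of fixed-size block deduplication in Python; the function names, the 4 KiB block size, and the in-memory dict "store" are my own illustrative choices, not how any particular vendor implements it:

```python
import hashlib

def dedupe_blocks(data: bytes, block_size: int = 4096):
    """Split data into fixed-size blocks, keeping one copy per unique block.

    Returns (store, recipe): store maps SHA-256 digest -> block bytes,
    recipe is the ordered list of digests needed to rebuild the data.
    """
    store = {}   # unique blocks, keyed by content hash (illustrative only)
    recipe = []  # per-file list of block references
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # store each unique block once
        recipe.append(digest)
    return store, recipe

def rebuild(store, recipe) -> bytes:
    """Reassemble the original data from block references."""
    return b"".join(store[d] for d in recipe)

# Data with lots of repeated content dedupes well:
data = b"A" * 8192 + b"B" * 4096 + b"A" * 8192
store, recipe = dedupe_blocks(data)
assert rebuild(store, recipe) == data
print(f"logical blocks: {len(recipe)}, unique blocks: {len(store)}")
# prints: logical blocks: 5, unique blocks: 2
```

In this toy example, 20 KiB of logical data is stored as just two physical blocks, which is the kind of savings that makes "unlimited storage" plausible: the more users storing the same popular files, the higher the ratio of logical to physical data. Real systems add variable-size chunking, compression, and encryption considerations on top of this, and cross-customer deduplication in particular has to be weighed against privacy and side-channel concerns.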
