Erasure Coding has become a hot topic in the Hyperconverged Infrastructure (HCI) world since Nutanix announced its implementation (EC-X) at its inaugural user conference in June 2015, and VMware has recently followed with support for Erasure Coding in its 6.2 release for All-Flash deployments.
As this is a new concept to many in the industry, there have been a lot of questions about how it works, what the benefits are and, of course, what the trade-offs are.
In short, regardless of vendor, Erasure Coding allows data to be stored with tuneable levels of resiliency, such as single parity (similar to RAID 5) and double parity (similar to RAID 6). This provides more usable capacity than replication, which is more like RAID 1 with ~50% usable capacity of RAW.
Not dissimilar to RAID 5/6, Erasure Coding implementations have increased write penalties compared to RAID 1-style replication (RF2 for Nutanix or FTT=1 for VSAN).
For example, the write penalties for RAID are as follows:
- RAID 1 = 2
- RAID 5 = 4
- RAID 6 = 6
Similar write penalties apply to Erasure Coding, depending on each vendor's specific implementation and stripe size (dynamic or fixed).
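To make the penalties above concrete, here is a small sketch (the function and table names are my own, purely illustrative) converting front-end write IOPS into the back-end I/O operations actually generated, using the write penalties listed above:

```python
# Write penalties from the list above (back-end I/Os per front-end write).
WRITE_PENALTY = {"RAID 1": 2, "RAID 5": 4, "RAID 6": 6}

def backend_write_iops(frontend_iops: int, scheme: str) -> int:
    """Back-end I/O generated for a given front-end write workload."""
    return frontend_iops * WRITE_PENALTY[scheme]

# A 1,000 write IOPS workload lands very differently on the drives:
for scheme in WRITE_PENALTY:
    print(f"{scheme}: {backend_write_iops(1000, scheme)} back-end IOPS")
# RAID 1: 2000, RAID 5: 4000, RAID 6: 6000
```

The same arithmetic applies to erasure-coded stripes: a single-parity stripe behaves roughly like RAID 5 and a double-parity stripe like RAID 6 for overwrites, subject to each vendor's implementation.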
I have written a number of posts about the Nutanix-specific implementation; for those who are interested, see the following deep dive post:
VMware has also released a post by Christos Karamanolis titled The Use Of Erasure Coding In VMware Virtual SAN 6.2, covering their implementation of Erasure Coding.
The article is well written, and I would like to highlight two quotes from the post which are applicable to any implementation of Erasure Coding, including Nutanix EC-X and VSAN.
> Erasure Coding does not come for free. It has a substantial overhead in operations per second (IOPS) and networking.

> In conclusion, customers must evaluate their options based on their requirements and the use cases at hand. RAID-5/6 may be applicable for some workloads on All-Flash Virtual SAN clusters, especially when capacity efficiency is the top priority. Replication may be the better option, especially when performance is the top priority (IOPS and latency). As always, there is no such thing as one size fits all.
Pros of Erasure Coding:
- Increased usable capacity of RAW storage compared to replication
- Potential to increase the amount of data stored in SSD tier
- Lower cost/GB
- Nutanix EC-X Implementation places parity on capacity tier to increase the effective SSD tier size
Cons of Erasure Coding:
- Higher write overheads
- Higher read impact in the event of a drive/node failure
- Performance will suffer significantly for I/O patterns with a high percentage of overwrites
- Increased computational overheads
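The overwrite and rebuild costs in the cons above can be illustrated with a toy single-parity stripe using XOR. This is a minimal sketch of the general technique, not any vendor's actual implementation:

```python
from functools import reduce

def xor(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length strips byte by byte."""
    return bytes(x ^ y for x, y in zip(a, b))

def parity(strips: list) -> bytes:
    """Single parity strip for a list of data strips."""
    return reduce(xor, strips)

data = [b"AAAA", b"BBBB", b"CCCC"]   # 3 data strips (a 3+1 stripe)
p = parity(data)                      # 1 parity strip

# Degraded read: losing one strip forces reads of ALL surviving strips
# plus parity to reconstruct it -- hence the higher failure-time impact.
lost = data[1]
rebuilt = parity([data[0], data[2], p])
assert rebuilt == lost

# Overwrite: updating one strip requires reading the old data and old
# parity, then writing new data and new parity (read-modify-write),
# which is why overwrite-heavy workloads suffer.
new_strip = b"DDDD"
p = xor(xor(p, data[1]), new_strip)   # fold old data out, new data in
data[1] = new_strip
assert parity(data) == p
```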
Recommended Workloads to use Erasure Coding:
- Write Once Read Many (WORM) workloads are the ideal candidate for Erasure Coding
- File Servers
- Log Servers
- Email (depending on usage)
As many of the strong use cases for Erasure Coding are workloads not requiring high I/O, using Erasure Coding across both performance and capacity tiers can provide significant advantages.
Workloads not ideal for Erasure Coding:
- Anything write/overwrite intensive
- Virtual Desktop Infrastructure (VDI)
This is because VDI is typically very write intensive, which increases the overheads on the software-defined storage. VDI is also typically not capacity intensive thanks to intelligent cloning, so the advantages of Erasure Coding would be minimal.
Regardless of vendor, all Erasure Coding implementations have higher overheads than traditional replication such as Nutanix RF2/RF3 and VSAN's FTT=1/2.
The overheads will vary depending on:
- The configured parity level
- The stripe size (which may vary between vendors)
- The I/O profile, the more write intensive the higher the overheads
- Whether striping is performed inline on all data or post-process on write-cold data
- Whether the stripe is in a degraded state due to a drive/node failure
The usable capacity also varies depending on:
- The number of nodes in a cluster which can limit the stripe size (see the next point)
- The stripe size (dependent on the number of nodes in the cluster)
- E.g.: a 3+1 stripe gives up to 75% usable capacity, and a 4+1 gives up to 80%.
It is important to understand that as the stripe size increases, the resulting gains in usable capacity diminish. As the stripe size increases, so do the overheads on the storage controllers and network. The impact during a failure also increases, as does the risk of a drive or node failure impacting the stripe.
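The diminishing returns are easy to see from the usable-capacity formula for a single-parity stripe, k data strips out of k+1 total. A quick sketch:

```python
def usable_pct(data_strips: int, parity_strips: int = 1) -> float:
    """Usable capacity (%) of a stripe with the given data/parity strips."""
    return 100 * data_strips / (data_strips + parity_strips)

for k in (3, 4, 8, 16):
    print(f"{k}+1 stripe: {usable_pct(k):.1f}% usable")
# 3+1: 75.0%, 4+1: 80.0%, 8+1: 88.9%, 16+1: 94.1%
```

Going from 3+1 to 4+1 gains five points of usable capacity, but doubling the stripe again from 8+1 to 16+1 gains barely five more, while the rebuild and failure-domain exposure keeps growing with every strip added.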
In Part 2, I plan to publish testing examples showing the performance delta between typical replication and Erasure Coding for a write intensive workload.